# G06 - Wall Jump
---

## Index
[1.Introduction](#intro) <br>
[2.Case analysis](#oldCase) <br>
[3.Performance analysis](#performance) <br>
[4.New case proposal](#newCase) <br>
[5.Team](#team)

## <a name="intro"></a>1. Introduction
This notebook has all the information related to <b>G06</b>'s delivery 1. Here you can find an analysis of the <b>Wall Jump</b> example of Unity's ml-agents, as well as a new case proposal based on the same example, with all the information needed to reproduce it yourself.

This notebook also works as a post mortem for our new proposal's implementation.

|![AlexRivero](g06-img/Alex.png)|![DavidRecuero](g06-img/David.png)|
|---|---|
|Alex Rivero Ferràs|David Recuero Redrado|
|alexriveroferras@enti.cat|davidrecueroredrado@enti.cat|

## <a name="oldCase"></a>2. Case analysis

### Rewards
In this example there are 3 rewards given to the agent depending on its performance solving the problem:
- <span style="color:green">+1.0, if the agent reaches the goal</span> - Given to the agent in <i>OnTriggerStay()</i> (using the collider of the goal)
- <span style="color:red">-0.0005, for every step the agent takes</span> - Given to the agent in <i>MoveAgent()</i>. This reward motivates the agent to find the optimal path to the goal (using as few moves as possible)
- <span style="color:red">-1.0, if the agent or the cube falls off the platform</span> - Given to the agent in <i>AgentAction()</i>
![MoveAgent](g06-img/Rewards.png)
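
As a rough illustration of how these three rewards combine, the sketch below adds them up for a single episode. It assumes the rewards are simply additive and that the step penalty applies once per step. The function name `episode_return` and the step counts are ours and are not part of ml-agents.

```python
# Reward values quoted in the list above.
GOAL_REWARD = 1.0
STEP_PENALTY = -0.0005
FALL_PENALTY = -1.0


def episode_return(steps: int, reached_goal: bool = False, fell_off: bool = False) -> float:
    """Sum the rewards of one episode (assumption: rewards are purely additive)."""
    total = steps * STEP_PENALTY
    if reached_goal:
        total += GOAL_REWARD
    if fell_off:
        total += FALL_PENALTY
    return total


# Hypothetical episodes: the step counts are made up for the comparison.
fast = episode_return(steps=200, reached_goal=True)
slow = episode_return(steps=1000, reached_goal=True)
fall = episode_return(steps=200, fell_off=True)
print(fast, slow, fall)
```

Under these assumptions the faster successful episode always ends with the higher return, which is the incentive the step penalty is meant to create.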

### Actions
The agent can take 4 different main actions, each of them with multiple options:
<img style="float: right;" src="g06-img/Movement.gif">
- Forward movement
    - Forward
    - Backwards 
    - No action
- Side movement
    - Left
    - Right
    - No action
- Rotation
    - Left
    - Right
    - No action
- Jump
    - Jump
    - No action
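
The four branches above can be written down as a small table to see how many distinct action combinations the agent chooses from. This is only a sketch: the option names and, in particular, the index order inside each branch are our assumption and may not match the order used in the Unity project.

```python
from itertools import product

# One entry per branch from the list above.
# Assumption: index 0 is "no action"; the real project may order options differently.
BRANCHES = {
    "forward": ["none", "forward", "backward"],
    "side": ["none", "left", "right"],
    "rotation": ["none", "left", "right"],
    "jump": ["none", "jump"],
}

branch_sizes = [len(options) for options in BRANCHES.values()]

# Every combination of one option per branch.
all_actions = list(product(*(range(size) for size in branch_sizes)))


def decode(action):
    """Translate one index per branch into readable option names."""
    if len(action) != len(BRANCHES):
        raise ValueError("expected one index per branch")
    return {
        name: options[index]
        for (name, options), index in zip(BRANCHES.items(), action)
    }


print(branch_sizes, len(all_actions))
print(decode((1, 0, 2, 1)))
```

With branch sizes of 3, 3, 3 and 2, the agent picks one of 3 × 3 × 3 × 2 = 54 combinations at every decision.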
    
### Observations
The agent collects 2 observations in the <i>CollectObservations()</i> function: the position of the agent and a boolean indicating whether the agent is grounded (the agent only receives the goal reward while it is grounded on the goal). It also has 14 ray casts, each able to detect 4 possible objects.
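
The size of the resulting observation can be estimated with a few lines of arithmetic. The sketch below assumes that the position is a 3D vector and that each ray reports one value per detectable object plus two extra values (a "nothing hit" flag and a normalised distance), which is our understanding of the ml-agents ray perception sensor. Check it against the version you have installed.

```python
def vector_observation_size(position_dims: int = 3, grounded_flags: int = 1) -> int:
    """Agent position (assumed to be x, y, z) plus the grounded boolean."""
    return position_dims + grounded_flags


def ray_observation_size(num_rays: int, num_detectable: int) -> int:
    """Assumption: each ray yields one value per detectable object,
    plus a 'hit nothing' flag and a normalised hit distance."""
    return num_rays * (num_detectable + 2)


vector_size = vector_observation_size()
ray_size = ray_observation_size(num_rays=14, num_detectable=4)
print(vector_size, ray_size, vector_size + ray_size)
```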

### How it all works
In this example the agent has to reach a green area tagged as "goal". The environment has 3 different states:
- The goal accessible for the agent <b>without any wall</b> blocking it.
- A <b>small-sized wall</b> blocking the way to the goal. In this case the agent can jump over the wall with a simple jump.
- A <b>big-sized wall</b> blocking the way to the goal. In this case the agent has to push a cube against the wall, jump on the cube and then jump over the wall.

As this example has 3 different states, it would take too long to train it the "hard way" (randomly trying every situation). That's why it uses curriculum learning.<br>
Curriculum learning uses progression to train the agent. In this example, the wall scales up when the agent reaches a threshold. The agent learns to solve the 3 states progressively, making it easier and faster to train.

To do so, the agent uses 2 brains. Depending on the current case, the <i>ConfigureAgent()</i> function passes the matching brain to the behavior parameters.

```c#
void ConfigureAgent(int config)
{
    var localScale = wall.transform.localScale;
    if (config == 0)
    {
        // No wall: the height falls back to 0 unless "no_wall_height" is set.
        localScale = new Vector3(
            localScale.x,
            Academy.Instance.FloatProperties.GetPropertyWithDefault("no_wall_height", 0),
            localScale.z);
        wall.transform.localScale = localScale;
        GiveModel("SmallWallJump", noWallBrain);
    }
    else if (config == 1)
    {
        // Small wall: low enough to clear with a simple jump.
        localScale = new Vector3(
            localScale.x,
            Academy.Instance.FloatProperties.GetPropertyWithDefault("small_wall_height", 4),
            localScale.z);
        wall.transform.localScale = localScale;
        GiveModel("SmallWallJump", smallWallBrain);
    }
    else
    {
        // Big wall: the height is picked uniformly between the curriculum's min and max.
        var min = Academy.Instance.FloatProperties.GetPropertyWithDefault("big_wall_min_height", 8);
        var max = Academy.Instance.FloatProperties.GetPropertyWithDefault("big_wall_max_height", 8);
        var height = min + Random.value * (max - min);
        localScale = new Vector3(
            localScale.x,
            height,
            localScale.z);
        wall.transform.localScale = localScale;
        GiveModel("BigWallJump", bigWallBrain);
    }
}
```
The <i>SmallWallJump</i> brain is used both when there's no wall and when the wall is low enough to jump over. The <i>BigWallJump</i> brain is used when the wall is too high and the agent needs the cube to jump over it.

The parameters <i>small_wall_height</i>, <i>big_wall_min_height</i> and <i>big_wall_max_height</i> are defined in a file called <b>wall_jump.yaml</b> (in the path <i>ml-agents\config\curricula</i>). <i>no_wall_height</i> is not listed there, so <i>ConfigureAgent()</i> falls back to its default value of 0. This file holds the setup for the curriculum learning:

```yaml
BigWallJump:
  measure: progress
  thresholds: [0.1, 0.3, 0.5]
  min_lesson_length: 100
  signal_smoothing: true
  parameters:
    big_wall_min_height: [0.0, 0.922, 1.0, 1.5]
    big_wall_max_height: [0.922, 1.3, 1.5, 1.5]

SmallWallJump:
  measure: progress
  thresholds: [0.1, 0.3, 0.5]
  min_lesson_length: 100
  signal_smoothing: true
  parameters:
    small_wall_height: [0.1, 0.3, 0.5, 0.9222]
```
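
To make the progression concrete, here is a minimal sketch of how the <i>BigWallJump</i> entry could translate training progress into a wall height. It is not the ml-agents implementation: it ignores `min_lesson_length` and `signal_smoothing`, it assumes `progress` is the fraction of training completed (0 to 1), and it assumes a lesson starts as soon as its threshold is exceeded. The function names are ours.

```python
import random

# Values copied from the BigWallJump entry of wall_jump.yaml shown above.
THRESHOLDS = [0.1, 0.3, 0.5]
BIG_WALL_MIN_HEIGHT = [0.0, 0.922, 1.0, 1.5]
BIG_WALL_MAX_HEIGHT = [0.922, 1.3, 1.5, 1.5]


def lesson_for(progress: float) -> int:
    """Lesson index: one step up for every threshold already exceeded."""
    return sum(progress > threshold for threshold in THRESHOLDS)


def sample_wall_height(progress: float, rng: random.Random) -> float:
    """Uniform height between the min and max of the current lesson,
    mirroring `min + Random.value * (max - min)` in ConfigureAgent()."""
    lesson = lesson_for(progress)
    low = BIG_WALL_MIN_HEIGHT[lesson]
    high = BIG_WALL_MAX_HEIGHT[lesson]
    return low + rng.random() * (high - low)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for progress in (0.05, 0.2, 0.4, 0.9):
    print(progress, lesson_for(progress), round(sample_wall_height(progress, rng), 3))
```

Three thresholds produce four lessons, which is why every parameter list in the file has four values.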

Here's a link to the repository's docs where curriculum learning is explained: https://github.com/ENTI-Input-Output/ml-agents/blob/master/docs/Training-Curriculum-Learning.md

## <a name="performance"></a>3. Performance analysis
To test the performance of the training, we trained the agents using both curriculum learning and "normal" reinforcement learning. We expected to see faster training and better results when using curriculum learning, and that's what we obtained:

In the <i>small_wall_jump</i> situation, we can see there is no big difference between the curriculum learning and the usual method (the blue line shows the mean reward progression of the curriculum learning).
![SmallWallJump](g06-img/SmallWallJump_01.png)

However, in the <i>big_wall_jump</i> situation the difference is very noticeable. It took more than 8 million iterations for the agent to achieve a mean reward of ~0.6, whilst using curriculum learning it took 100k iterations.
![BigWallJump](g06-img/BigWallJump_01.png)
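
To put the difference in numbers, the short calculation below compares the two iteration counts quoted above. It treats 8 million as a lower bound for the run without curriculum learning. The variable names are illustrative.

```python
# Iteration counts quoted in the text above.
baseline_iterations = 8_000_000    # "more than 8 million": used as a lower bound
curriculum_iterations = 100_000

speedup = baseline_iterations / curriculum_iterations
print(f"Curriculum learning needed at least {speedup:.0f}x fewer iterations")
```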

We also tried to apply the advice that some authors give in their articles: we removed the negative reward for falling off the platform and instead gave the agent a very small positive reward (+0.002). This accelerated the training, but not with the results we expected (it reached neither the expected mean reward nor the expected behavior). Instead of reaching the goal, the agent learned to always fall off the platform: as it never reached the goal, it never learned that a bigger reward was waiting there.
![PositiveReward](g06-img/Small&BigWallJump_PositiveReward.png)
![AgentFallingOff](g06-img/AgentFallingOff.gif)

<br>
<br>
The main problem we've encountered is that we haven't been able to reproduce the training results that come with the ml-agents project. Even though we didn't change any value or line of code, the agent never learned to jump on the cube to get over the wall. We've reached the conclusion that the project doesn't come with the configuration needed to reproduce the given results.

## <a name="newCase"></a>4. New case proposal
As a new case, we first thought of increasing the height of the wall and adding a hole, forcing the agent to push the cube to the area below the hole and then jump through it. The first problem we found was the one mentioned above: the agent wasn't able to jump on the cube, so it would never go through the hole. Instead, it just stood in front of the wall without doing anything.

Another problem we faced was that the collider used to detect the hole was wider than the wall. Combined with the fact that we were using the <i>AddReward()</i> function, the agent learned to just approach the wall and jump against it so it entered the collider and got the reward. We solved it by decreasing the size of the hole's collider.
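
The toy model below illustrates our reading of this exploit: because an accumulating reward is granted on every entry into the collider, repeated entries can add up to as much as the goal itself. It uses the 0.05 hole reward quoted later in this section and assumes the reward is granted on each entry. `RewardTracker` is a hypothetical stand-in, not the ml-agents agent class.

```python
HOLE_REWARD = 0.05  # per-entry reward, as quoted later in this section
GOAL_REWARD = 1.0


class RewardTracker:
    """Toy stand-in for an agent's reward counter (not an ml-agents class)."""

    def __init__(self) -> None:
        self.cumulative = 0.0

    def add_reward(self, value: float) -> None:
        # Accumulates instead of overwriting, like AddReward().
        self.cumulative += value


def farm_hole(entries: int) -> float:
    """Total reward collected by entering the hole collider `entries` times."""
    tracker = RewardTracker()
    for _ in range(entries):
        tracker.add_reward(HOLE_REWARD)
    return tracker.cumulative


print(farm_hole(20), farm_hole(40))
```

In this model, about 20 entries already match the goal reward, so an agent that can touch the collider without passing through the hole has little reason to look for the goal.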

As we were unable to solve the first problem, we decided to eliminate the <i>big wall jump</i> case and put the hole lower so the agent could jump through it without using the white cube. This time we obtained the expected results:

![NewCaseWorking](g06-img/NewCaseWorking.gif)
![NewCaseGraphic](g06-img/NewCaseGraphic.png)

To configure this new case we did the following:

1. We created a new variable for a new brain (for our case proposal)<br>
![NewCaseVariable](g06-img/NewCaseVariable.png)


2. Then we added our new case to the function <i>ConfigureAgent()</i><br>
![NewCaseBrain](g06-img/NewCaseBrain.png)


3. After that, we created a function to give a reward of 0.05 to the agent when it enters the hole. To do this, we needed to create a tag named Hole<br>
![OnTriggerEnterHole](g06-img/OnTriggerEnterHole.png)


4. For the agent to detect the hole, we needed to add the tag we created to the detectable tags of the 2 Ray Perception Sensors<br>
![NewCaseRaySensor](g06-img/NewCaseRaySensor.png)


5. As we just increased the number of observations, we had to adjust the vector observation size<br>
![NewCaseObservationSize](g06-img/NewCaseObservationSize.png)


6. To create the hole in the wall, we created a new prefab using 4 cubes and a 5th object in trigger mode placed over the hole. The wall is built dynamically by a script that puts all the parts in place depending on the desired size of the hole<br>
![NewCaseDynamicWall](g06-img/NewCaseDynamicWall.png)


7. Finally, we removed the wall gameObject from the <i>WallJumpArea</i> prefab and added our own prefab. Then we added some code to instantiate it in the <i>ConfigureAgent()</i> function<br>
![NewCaseInstantiateWall](g06-img/NewCaseInstantiateWall.png)
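
The layout logic of step 6 can be sketched independently of Unity. The code below is not our actual script (that one is shown in the screenshot above): it is a simplified 2D version that computes where the 4 solid parts and the trigger would go for a given hole. All names, the coordinate convention and the example sizes are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle on the wall plane, given by centre and size."""
    cx: float
    cy: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def build_wall(wall_w, wall_h, hole_w, hole_h, hole_cx, hole_bottom):
    """Return the 4 solid parts and the trigger of a wall with a rectangular hole.

    Convention (assumed): x = 0 is the horizontal centre of the wall, y = 0 the floor.
    """
    left_edge, right_edge = -wall_w / 2, wall_w / 2
    hole_left, hole_right = hole_cx - hole_w / 2, hole_cx + hole_w / 2
    hole_top = hole_bottom + hole_h
    if hole_left < left_edge or hole_right > right_edge or hole_bottom < 0 or hole_top > wall_h:
        raise ValueError("the hole must fit inside the wall")

    parts = {
        # Full-height columns on both sides of the hole.
        "left": Rect((left_edge + hole_left) / 2, wall_h / 2, hole_left - left_edge, wall_h),
        "right": Rect((hole_right + right_edge) / 2, wall_h / 2, right_edge - hole_right, wall_h),
        # Pieces below and above the hole, exactly as wide as the hole.
        "bottom": Rect(hole_cx, hole_bottom / 2, hole_w, hole_bottom),
        "top": Rect(hole_cx, (hole_top + wall_h) / 2, hole_w, wall_h - hole_top),
    }
    # The trigger matches the hole exactly, so it cannot be touched from outside the hole.
    trigger = Rect(hole_cx, hole_bottom + hole_h / 2, hole_w, hole_h)
    return parts, trigger


parts, trigger = build_wall(wall_w=10, wall_h=6, hole_w=2, hole_h=2, hole_cx=0, hole_bottom=1)
for name, rect in parts.items():
    print(name, rect)
print("trigger", trigger)
```

A quick sanity check for a layout like this is that the areas of the 4 parts plus the hole add up to the area of the full wall.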

## <a name="team"></a>5. Team

|![AlexRivero](g06-img/Alex.png)|![DavidRecuero](g06-img/David.png)|
|---|---|
|Alex Rivero Ferràs|David Recuero Redrado|
|alexriveroferras@enti.cat|davidrecueroredrado@enti.cat|