# Wall Jump ML-Agents Example
---
## Introduction
---
This notebook examines the [**WallJump**](https://www.youtube.com/watch?v=NITLug2DIWQ&feature=youtu.be) example from [Unity's ML-Agents](https://github.com/Unity-Technologies/ml-agents) repository. In this example, an agent must get over a wall of variable height to reach the target. When the wall is too high to jump over directly, the agent must first move a box into place and jump onto it.

Specifically, we explain how the variables and rewards of the Unity `WallJump` scene work. We also train a new brain in a modified environment and then use [TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard) graphs to compare its results with those of a brain trained in the unmodified environment.

## Team Members
---
### Miquel Ripoll Fornes
- __*Location*__: Mallorca
- __*Mail*__: miquelripollfornes@enti.cat

### Marc Martos Cabré
- __*Location*__: Barcelona
- __*Mail*__: marcmartoscabre@enti.cat

## Case Analysis
---
First, we take a look at our example in the [Unity's ML-Agents](https://github.com/Unity-Technologies/ml-agents) repository. The scene contains an agent (the blue box) whose goal is to reach the marked area on the ground. The scene has three different configurations.

Each configuration challenges the agent in a different way, so it has to adapt on every iteration. The configurations change the wall height, which in turn changes the agent's behaviour.

In the bigWall configuration, when the agent detects the wall it immediately tries to find a box that it can use to jump over the wall. In the other two cases, it simply moves through the area and jumps over the wall without any problem.

<img align='center' src='img/WallJumpExample.gif'>

### Agent
---
The agent is a blue cube that has **three brains**. Each brain is used in a different situation, depending on the height of the wall, as mentioned above. We use the same brain for the "No wall" and "Small Wall" cases, because the agent does not need to interact with the block to get over the wall; it can just jump over it.

This case has **4 states**: the agent position, the goal position, the wall position and height, and the block position. These states are directly related to the observations.

The agent can perform **four different actions** on each frame, depending on the current state: move forward or backward, rotate around its up vector, move sideways, and jump. These actions are implemented in the `MoveAgent` method, which is called from the `AgentAction` method.

<img align='center' src='img/actions.png'>

To perform these actions, the agent first needs information about the other objects in the scene. For this purpose, the agent has the `CollectObservations` method, which uses raycasts to gather that information.

Finally, the agent has to be trained so that it chooses the best action for the observations it has collected. Training is driven by **rewards**, which are assigned with the `AddReward` and `SetReward` methods.

### Ray Perception
---
As mentioned above, the agent needs to **cast rays** to collect its observations and put them in a vector. To do that, the agent casts rays at different angles and heights from its position. These **rays collide with the objects** in the scene and provide all the information the agent needs about the world, such as the **position** of an object or its **height**. In this example there are **three** kinds of objects a ray can hit:

- Wall: The obstacle the agent faces in the area. The raycast gives us its **distance and height**.
- Goal: The area the agent has to reach. The raycast gives us the **distance** between the agent and the goal.
- Block: The object that helps the agent get over the wall. The raycast gives us the **distance and whether the agent is on it**.

> Note: There is an additional downward raycast that continuously checks whether the agent is **grounded or falling**.

<img align='center' src='img/Raycasts.png' width="200" height="200"> 

In this case we have rays at 7 different angles and 2 different heights, for a total of 14 rays. This means that every detectable object needs 14 `vectorObservation` slots in the agent's brains. This will be useful later, when we implement our own case.
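
The ray arithmetic above can be checked with a short sketch. The constant and function names are illustrative and do not come from the ML-Agents code; the sketch counts only the per-object slots described in this section.

```python
# Counts taken from the text above; all names here are hypothetical.
NUM_ANGLES = 7    # ray directions per height
NUM_HEIGHTS = 2   # heights the rays are cast from


def ray_count(num_angles: int, num_heights: int) -> int:
    """Total number of rays cast on each observation step."""
    return num_angles * num_heights


def slots_for_objects(num_objects: int, rays: int) -> int:
    """Observation slots needed for the detectable objects (one per ray each)."""
    return num_objects * rays


rays = ray_count(NUM_ANGLES, NUM_HEIGHTS)
print(rays)                        # 14
print(slots_for_objects(3, rays))  # 42 (wall, goal and block)
```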

### Rewards
---
Agents learn by trial and error from the feedback they receive in each state. For learning to take place, the appropriate **rewards/punishments** must be given at the right time. Each reward is **associated with an action** taken in a specific state, so with enough experience the agent learns which action maximizes its reward in a given state.

In this example there are 4 kinds of rewards. Three are set with the `SetReward` method, which **overrides** the agent's current step reward and updates the episode reward accordingly. One is added with the `AddReward` method, which **increments** the step and episode rewards by the provided value:

- Agent falls: `SetReward(-1f)`.
- Block falls: `SetReward(-1f)`.
- Agent reaches goal: `SetReward(1f)`.
- Agent moves: `AddReward(-0.0005f)` to current reward.
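
The difference between overriding and incrementing can be illustrated with a toy model. This is a minimal Python sketch of the bookkeeping described above, not the ML-Agents implementation; the `RewardTracker` class and its attribute names are hypothetical.

```python
class RewardTracker:
    """Toy model of the step/episode reward bookkeeping described above."""

    def __init__(self) -> None:
        self.step_reward = 0.0
        self.episode_reward = 0.0

    def set_reward(self, value: float) -> None:
        # Override the step reward and adjust the episode total by the difference.
        self.episode_reward += value - self.step_reward
        self.step_reward = value

    def add_reward(self, increment: float) -> None:
        # Increment both the step reward and the episode total.
        self.step_reward += increment
        self.episode_reward += increment


tracker = RewardTracker()
tracker.add_reward(-0.0005)  # movement penalty for this step
tracker.set_reward(1.0)      # goal reached: replaces the penalty for this step
print(tracker.step_reward, tracker.episode_reward)
```

In this model, the movement penalty added earlier in the same step is replaced once the goal reward is set, so the step reward ends at the goal value rather than at the sum of both.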

The reward values are normalized. Moving the agent adds `-0.0005f` because each episode has a maximum of 2000 steps, so the penalty adds up to `-1f` over a full-length episode. If one of these two values is changed, the other must be changed accordingly.
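
The relationship between the two values can be written out as a small sketch. The helper names are hypothetical; only the numbers 2000 and -0.0005 come from the text.

```python
MAX_STEPS_PER_EPISODE = 2000  # maximum steps per episode, from the text
STEP_PENALTY = -0.0005        # reward added on every move, from the text


def worst_case_penalty(max_steps: int, step_penalty: float) -> float:
    """Total movement penalty if the episode runs for its full length."""
    return max_steps * step_penalty


def normalized_step_penalty(max_steps: int) -> float:
    """Per-step penalty that sums to -1 over a full-length episode."""
    return -1.0 / max_steps


print(worst_case_penalty(MAX_STEPS_PER_EPISODE, STEP_PENALTY))
print(normalized_step_penalty(4000))
```

For example, doubling the episode length to 4000 steps would call for halving the per-step penalty to keep the same worst-case total.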


## Performance Analysis
---
### Parameters

There are many parameters in `trainer_config.yaml` that can change the results of your training runs. We explain the few parameters that we consider most relevant for our experiment:

#### Learning rate:
This parameter controls how strongly each training update changes the agent's behaviour. The lower the value, the slower the agent learns. On the other hand, the value should not be too high: the agent may then settle on the first actions that reward it positively, without exploring enough, and fail to learn the optimal action. Default value: 0.003

#### Max steps:
As the name implies, the higher this parameter is, the longer the agent trains. It is the number of simulation steps that are run before training ends. Default value: 1,000,000

#### Hidden units:
Hidden units is the number of units in each hidden layer of the neural network. More units let the network model more complex combinations of the input factors, so the agent can find more candidate solutions and, among them, a better action. Default value: 256
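
As a rough illustration, the three parameters above could be collected and rendered as a `trainer_config.yaml`-style fragment. This is only a sketch: the section name `WallJumpEnemyLearning` is a hypothetical placeholder, the values are the ones listed in this section, and the small `to_yaml` helper is written here for the example rather than taken from any library.

```python
# Values as listed in this section; the section name is a hypothetical placeholder.
overrides = {
    "WallJumpEnemyLearning": {
        "learning_rate": 0.003,
        "max_steps": 1000000,
        "hidden_units": 256,
    }
}


def to_yaml(config: dict) -> str:
    """Render a two-level dict as indented YAML-style text."""
    lines = []
    for section, params in config.items():
        lines.append(f"{section}:")
        for key, value in params.items():
            lines.append(f"    {key}: {value}")
    return "\n".join(lines)


print(to_yaml(overrides))
```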

### Tweaking parameters:
TODO

<img align='center' src='img/tensorboard.png'>

### Tweaking rewards:
TODO

<img align='center' src='img/tensorboard.png'>

## New Case Proposal
---
For our own case, we decided to add an enemy to the scene. The agent has to dodge it, which makes reaching the goal more difficult.

### Enemy
We added an enemy that the agent cannot jump over and that moves in a straight line along the x-axis. The enemy's speed and size can be modified to adjust the difficulty.
<img width='400px' align='center' src='img/enemy.png'>

### Observations
Since we added a new element to the scene, we had to add more observations to the agent with the help of RayPerception. To do this, we marked the enemy with a tag and added that tag to the agent's script.

### Brain
As for the brain, the only change from the original is that the observation vector has been extended to 88, since new observations have been added, as mentioned above.
<img width='400px' align='center' src='img/brain.png'>
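
The new vector size follows from the ray count given in the Ray Perception section. The sketch below assumes that the enemy tag adds one slot per ray; the original size of 74 is inferred from the numbers in the text (88 minus 14) and is not stated in it. All names are hypothetical.

```python
RAYS = 14             # 7 angles x 2 heights, as described in Ray Perception
NEW_VECTOR_SIZE = 88  # value set in the modified brain, from the text


def extended_size(original_size: int, new_tags: int, rays: int = RAYS) -> int:
    """Observation vector size after adding new detectable tags."""
    return original_size + new_tags * rays


# Inferred, not stated in the text: 88 minus the 14 slots added for the enemy.
inferred_original_size = NEW_VECTOR_SIZE - RAYS

print(inferred_original_size)                             # 74
print(extended_size(inferred_original_size, new_tags=1))  # 88
```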