# Understanding the PushBlock case
___
## 1. Introduction
In this notebook we analyse one of the examples in the ml-agents repository, the PushBlock example.
We also analyse how its performance changes when several parameters are modified. Finally, we look at a slightly modified version of the PushBlock case and compare it with the original.

The team responsible for this work consists of:  

<img src="adria_foto.jpg" style="width: 200px; margin: 10px 0px;  border: 1px solid black;"> 
Adrià Ortiz Navarro  
veric00@gmail.com  
<img src="ferran_foto.jpg" style="width: 200px; margin: 10px 0px;  border: 1px solid black;"> 
Ferran Illa Capellas  
ferran.illa26@gmail.com




## 2. Case analysis
The PushBlock example consists of a small block (the Agent) that has to push a bigger block (the Target) into the goal zone. This takes place on a fairly small square platform surrounded by borders, so neither the Agent nor the Target can fall off.

![pushblock](PushBlock_ss.png)



By default the **rewards** given to the Agent in the PushBlock case are the following:

- +5 for reaching the Goal  
This is the main reward, given when the Agent completes its task.

- -(1/`maxSteps`) for every frame  
This negative reward encourages the Agent to complete its task as quickly as possible, since the total reward decreases with every frame. A time penalty like this will be necessary in any variation of the PushBlock case.
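
To make the effect of the time penalty concrete, here is a minimal sketch of the total reward of one episode as a function of how many frames the Agent needs. The function name and the value of `max_steps` are hypothetical, and the sketch assumes that the only rewards are the two listed above.

```python
def episode_return(frames_taken: int, max_steps: int, goal_reward: float = 5.0) -> float:
    """Total reward of an episode that reaches the Goal after `frames_taken` frames."""
    # Each frame costs 1 / max_steps, so the penalty grows linearly with time.
    time_penalty = frames_taken * (1.0 / max_steps)
    return goal_reward - time_penalty


# Hypothetical episode length limit, chosen only for illustration.
MAX_STEPS = 5000

fast = episode_return(1000, MAX_STEPS)  # Goal reached quickly
slow = episode_return(4000, MAX_STEPS)  # Goal reached late
print(fast, slow)
```

Under these assumptions the fast episode always scores higher than the slow one, which is exactly the pressure the per-frame penalty is meant to create.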

To choose the Agent's next action, the brain needs to receive the **state** of the Agent within the world. In the PushBlock example, this state consists of the following information:

- Distance to Target
- Distance to Goal
- Distance to Walls

It uses raycasts from the Agent to look for the objects mentioned above.
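```csharp
// Maximum length of each ray.
var rayDistance = 12f;
// Angles (in degrees) at which the rays are cast.
float[] rayAngles = { 0f, 45f, 90f, 135f, 180f, 110f, 70f };
// Tags of the objects that the rays can recognise.
var detectableObjects = new[] { "block", "goal", "wall" };
// The same rays are cast twice; only the fourth argument differs (0f and 1.5f).
AddVectorObs(rayPer.Perceive(rayDistance, rayAngles, detectableObjects, 0f, 0f));
AddVectorObs(rayPer.Perceive(rayDistance, rayAngles, detectableObjects, 1.5f, 0f));
```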
For each ray, the `Perceive` function stores categorical information about the detected object (which kind of object it is) together with the distance to that object. This information is sent to the brain through the `AddVectorObs` method.
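
As an illustration of what "categorical information plus distance" can look like, the sketch below encodes a single ray in Python. The layout (one slot per detectable tag, one "nothing hit" flag, one normalised distance) is an assumption about how such an encoding may be organised, and `encode_ray` and `observation_size` are hypothetical helper names, not part of ml-agents.

```python
from typing import List, Optional, Sequence


def encode_ray(detectable: Sequence[str], hit_tag: Optional[str],
               hit_distance: float, ray_distance: float) -> List[float]:
    """Encode one ray as [one slot per tag..., miss flag, normalised distance]."""
    encoding = [0.0] * (len(detectable) + 2)
    if hit_tag in detectable:
        # Mark which kind of object was hit and how far away it is (0..1).
        encoding[list(detectable).index(hit_tag)] = 1.0
        encoding[-1] = hit_distance / ray_distance
    else:
        # Nothing of interest was hit within the ray length.
        encoding[-2] = 1.0
    return encoding


def observation_size(num_rays: int, num_detectable: int, num_perceive_calls: int) -> int:
    """Number of values sent to the brain under the assumed layout."""
    return num_rays * (num_detectable + 2) * num_perceive_calls


tags = ["block", "goal", "wall"]
print(encode_ray(tags, "goal", 6.0, 12.0))  # ray that hits the goal halfway along
print(encode_ray(tags, None, 0.0, 12.0))    # ray that hits nothing
print(observation_size(7, len(tags), 2))    # 7 angles, 3 tags, 2 Perceive calls
```

With this assumed layout, the 7 ray angles, 3 detectable tags and 2 `Perceive` calls of the snippet above would produce 70 values per observation.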

<img src="raycasts.png" style="width: 700px;"> 
&nbsp;



Another important aspect of the process is the **actions**, which are the decisions made by the brain. The PushBlock case has a discrete action space with only 6 possible actions: 4 directions of movement (forward, back, right and left) and 2 directions of rotation (right and left).

Depending on the action received, the corresponding value is stored in the `rotateDir` and `dirToGo` variables, which are then used to move the Agent in the scene.

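```csharp
// Turn the Agent around the chosen rotation axis.
transform.Rotate(rotateDir, Time.fixedDeltaTime * 200f);
// Push the Agent in the chosen direction, scaled by the configured run speed.
agentRB.AddForce(dirToGo * academy.agentRunSpeed, ForceMode.VelocityChange);
```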
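
The mapping from a discrete action to `dirToGo` and `rotateDir` can be sketched in Python as a small lookup table. The ordering of the six actions and the unit magnitudes used here are hypothetical and chosen only for illustration; the actual indices and scaling in the example may differ.

```python
from typing import Dict, Tuple

Vector2 = Tuple[float, float]  # (right, forward) in the Agent's local axes

# Hypothetical ordering of the six discrete actions: (dir_to_go, rotate_dir).
ACTION_TABLE: Dict[int, Tuple[Vector2, float]] = {
    0: ((0.0, 1.0), 0.0),    # move forward
    1: ((0.0, -1.0), 0.0),   # move back
    2: ((1.0, 0.0), 0.0),    # move right
    3: ((-1.0, 0.0), 0.0),   # move left
    4: ((0.0, 0.0), 1.0),    # rotate right
    5: ((0.0, 0.0), -1.0),   # rotate left
}


def decode_action(action: int) -> Tuple[Vector2, float]:
    """Return (dir_to_go, rotate_dir) for a discrete action index."""
    if action not in ACTION_TABLE:
        raise ValueError(f"unknown action index: {action}")
    return ACTION_TABLE[action]


dir_to_go, rotate_dir = decode_action(0)
print(dir_to_go, rotate_dir)
```

Every action sets exactly one of the two values, so each decision either moves the Agent or rotates it, never both.
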
<br><br>
In summary, the learning process goes like this:

At first the Agent takes random actions, and at some point it accidentally pushes the Target into the Goal zone and receives a high reward. From then on, whenever the Agent is in a state similar to one that led to a high reward, it tends to take similar actions. As this situation repeats over time, the policy takes shape. The Agent also gradually learns to complete its task faster because of the per-frame negative reward described above: the sooner it pushes the Target to the Goal, the higher the total reward, so it favours those actions in the future.

## 3. Performance analysis

## 4. New case proposal

How to train:
1. Duplicate a brain, rename it if needed, and set its model to None.
2. Add the brain to the Academy and tick its Control checkbox.
3. Assign the same brain to the Agent GameObject, in the Brain field of PushAgentBasic.
4. In the Anaconda Prompt, type `activate ml-agents` and press Enter.
5. In the Anaconda Prompt, change to the ml-agents folder.
6. Once inside that folder, type `mlagents-learn config/trainer_config.yaml --run-id="NameOfTheLearning" --train`
7. Press Enter, wait a few seconds, and then press the Play button. The Agent will then start training.

If you prefer to train from a standalone executable instead of from the editor, replace steps 6 and 7 with:

6. Build the project as an executable (.exe).
7. From the ml-agents folder, run `mlagents-learn config/trainer_config.yaml --env=FolderOfTheExecutable/Executable --run-id="NameOfTheLearning" --train`
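
The two command variants above differ only in the `--env` option, which the following Python sketch makes explicit. `build_train_command` is a hypothetical helper written for this notebook; it only assembles the argument list from the flags shown in the steps and does not launch anything by itself.

```python
from typing import List, Optional


def build_train_command(run_id: str, env_path: Optional[str] = None,
                        config: str = "config/trainer_config.yaml") -> List[str]:
    """Assemble the mlagents-learn call for editor or executable training."""
    command = ["mlagents-learn", config]
    if env_path is not None:
        # Only needed when training from a built executable.
        command.append(f"--env={env_path}")
    command += [f"--run-id={run_id}", "--train"]
    return command


# Training from the editor (press Play after the trainer starts).
print(" ".join(build_train_command("NameOfTheLearning")))
# Training from a built executable.
print(" ".join(build_train_command("NameOfTheLearning",
                                   env_path="FolderOfTheExecutable/Executable")))
```

The resulting list could be passed to `subprocess.run` from inside the activated ml-agents environment, or simply printed and pasted into the Anaconda Prompt.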

Rewards:
- The Target reaches the Goal: +5.0f
- Every frame in which the Target is not in the Goal: -1f / agentParameters.maxStep
- Every frame in which the Agent is within a distance of 3.0f of the Target: +0.001f
- Every frame in which the Target is within a distance of 1.0f of a wall: -0.01f
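
The combined effect of these four rules on a single frame can be sketched as follows. This is a Python illustration, not the C# code of the project: the function name, its parameters and the value of `max_step` are hypothetical, and it assumes that the frame in which the Target reaches the Goal receives only the +5.0 reward.

```python
def step_reward(target_in_goal: bool, agent_to_target: float,
                target_to_wall: float, max_step: int) -> float:
    """Reward for one frame of the modified PushBlock case."""
    if target_in_goal:
        return 5.0  # main reward, assumed to be the only one on this frame
    reward = -1.0 / max_step       # time penalty
    if agent_to_target <= 3.0:
        reward += 0.001            # bonus for staying close to the Target
    if target_to_wall <= 1.0:
        reward -= 0.01             # penalty for pushing the Target next to a wall
    return reward


MAX_STEP = 5000  # hypothetical value, for illustration only

print(step_reward(False, 2.0, 5.0, MAX_STEP))  # near the Target, away from walls
print(step_reward(False, 2.0, 0.5, MAX_STEP))  # Target pressed against a wall
print(step_reward(True, 0.0, 0.0, MAX_STEP))   # Target in the Goal
```

Note that with the hypothetical `max_step` of 5000 used here, the proximity bonus is larger than the time penalty, so a frame spent close to the Target has a small positive reward unless the Target is next to a wall.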

States:
- The state is the same as in the default case


<img src="newCase_ss.png" style="width: 700px;">

In this map there are a few walls in the way. Two of them are vertical and are not a big problem, but the remaining one cuts the map into two parts, which makes it harder to push the Target into the Goal.