# Wall Jump ML-Agents Example
---
## Introduction
---
This notebook examines the [**WallJump**](https://www.youtube.com/watch?v=NITLug2DIWQ&feature=youtu.be) example from [Unity's ML-Agents](https://github.com/Unity-Technologies/ml-agents) repository. In this example, an agent must get over a wall of variable height to reach a target. When the wall is too high, the agent must first push a box into place and jump on it.

Specifically, we explain how the variables and rewards of the Unity `WallJump` scene work. We also train a new brain in a modified environment and compare its results with those of a brain trained in the unmodified environment, using [TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard) graphs.

## Team Members
---
### Miquel Ripoll Fornes
<img style="float: left; margin-right: 20px;" src='img/miquel.png' width="100" height="100"/> <br/>

- __*Location*__: Mallorca   
- __*Mail*__: miquelripollfornes@enti.cat 
<br/>
<br/>

### Marc Martos Cabré
<img style="float: left; margin-right: 20px;" src='img/marc.png' width="100" height="100"/> <br/>

- __*Location*__: Barcelona
- __*Mail*__: marcmartoscabre@enti.cat

## Case Analysis
---
First of all, we take a look at our example in the [Unity's ML-Agents](https://github.com/Unity-Technologies/ml-agents) repository. The scene contains an agent (the blue box) whose goal is to reach the marked area on the ground. The scene has three different configurations.

Each configuration challenges the agent in a different way, so it has to behave differently from one episode to the next. The configurations change the wall height, which in turn changes the agent's behaviour.

In the `bigWall` configuration, an agent that detects a wall will immediately try to find a box to help it jump over the wall. In the other two cases, it simply moves through the area and jumps over the wall without any problem.

<img align='center' src='img/WallJumpExample.gif'>

### Agent
---
The agent is a blue cube with **three brains**. Each brain is used in a different situation, depending on the height of the wall, as mentioned before. We will use the same brain for the "No wall" case and the "Small Wall" case, because the agent does not need the block to get over the wall and can just jump over it.

This case has **four states**: the agent position, the goal position, the wall position and height, and the block position. These states are directly related to the observations.

The agent can take **four different actions** on each frame, depending on the current state: move forward or backward, rotate around its up vector, move sideways, and jump. These actions are implemented in the `MoveAgent` method, which is called from the `AgentAction` method.

<img align='center' src='img/actions.png'>

To choose these actions, the agent first needs information about the other objects in the scene. For this purpose, the agent has the `CollectObservations` method, which uses raycasts to gather it.

Finally, the agent must be trained to take the correct and optimal action given the collected observations. To do that, **rewards** are assigned with the `AddReward` and `SetReward` methods.

### Ray Perception
---
As we said before, the agent needs to **cast rays** to collect its observations and store them in a vector. To do that, the agent casts rays at different angles and heights from its position. These **rays collide with objects** in the scene and give the agent the information it needs about the world, such as the **position** of the objects or their **height**. In the example, a ray can collide with **three** different kinds of object:

- Wall: the obstacle the agent faces in the area. The raycast gives us its **distance and height**.
- Goal: the area the agent has to reach. The raycast gives us the **distance** between the agent and the goal.
- Block: the object that helps the agent get over the wall. The raycast gives us the **distance and whether the agent is on it**.

> Note: There is another downward raycast that continuously checks whether the agent is **grounded or falling**.

<img align='center' src='img/Raycasts.png' width="200" height="200"> 

In this case we have rays at 7 angles and 2 different heights, for a total of 14 rays. This means that for every detectable object we need 14 `vectorObservation` slots in the agent's brains. This will be useful later, when we implement our own case.

### Rewards
---
Agents learn by trial and error from the feedback they receive in each state. For training to take effect, it is necessary to give the appropriate **rewards/punishments** at the right time. These rewards are **associated with an action** taken in a particular state, so with sufficient experience the agent ends up knowing which action maximizes its reward in a given state.

In this example we have 4 kinds of rewards. Three are set with the `SetReward` method, which **overrides** the agent's current step reward and updates the episode reward accordingly, and one is added with the `AddReward` method, which **increments** the step and episode rewards by the provided value:

- Agent falls: `SetReward(-1f)`.
- Block falls: `SetReward(-1f)`.
- Agent reaches goal: `SetReward(1f)`.
- Agent acts (every step): `AddReward(-0.0005f)` to current reward.
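
The difference between overriding and incrementing can be sketched with a small Python model. This is only an illustration of the behaviour described above, not the ML-Agents source; the class `RewardTracker` and its method names are hypothetical.

```python
class RewardTracker:
    """Toy model of an agent's step and episode rewards (not ML-Agents code)."""

    def __init__(self):
        self.step_reward = 0.0     # reward accumulated during the current step
        self.episode_reward = 0.0  # cumulative reward for the whole episode

    def add_reward(self, increment):
        # Increment both the step reward and the episode total.
        self.step_reward += increment
        self.episode_reward += increment

    def set_reward(self, value):
        # Override the step reward and shift the episode total by the difference.
        self.episode_reward += value - self.step_reward
        self.step_reward = value


tracker = RewardTracker()
tracker.add_reward(-0.0005)  # per-step penalty
tracker.set_reward(1.0)      # goal reached: the step reward is replaced by 1
print(tracker.step_reward, tracker.episode_reward)
```

In this model, a `set_reward` call in the same step discards the small penalty that `add_reward` applied just before it.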

The reward values are normalized. Each step adds `-0.0005f` because an episode lasts at most 2000 steps, so the penalty adds up to `-1f` by the end of a full episode. If we change one of these two values, we must change the other accordingly.
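
The relation between the two values is simple arithmetic. The following Python sketch (the helper `step_penalty` is a hypothetical name) shows how the per-step penalty could be recomputed if the maximum number of steps changed:

```python
import math


def step_penalty(max_steps, total_penalty=-1.0):
    """Per-step reward such that a full-length episode sums to total_penalty."""
    return total_penalty / max_steps


penalty = step_penalty(2000)
print(penalty)  # the value passed to AddReward in the example

# A full episode of 2000 steps accumulates roughly -1 in total.
assert math.isclose(penalty * 2000, -1.0)
```

For instance, doubling the episode length to 4000 steps would call for halving the per-step penalty to keep the same total.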

## Performance Analysis
---
### Parameters

There are many parameters in `trainer_config.yaml` that can change the results of your training runs. Before we start, note that each brain looks up the section of the config file under its own name and takes its parameters from there. If no such section exists, the brain uses the parameters of the default section. We explain only the few parameters we consider most relevant to our experiment, which is to optimize the configuration so that the agent trains more efficiently:

<img src='img/config.png'>

#### Learning rate:
This parameter sets the **strength of each update** applied to the agent's policy during training. The lower it is, the more slowly the agent learns. On the other hand, it should not be very high, because the agent may then settle on the first actions that reward it positively without exploring enough, and so fail to learn the optimal action. <br/>
Default value: `0.0003`
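
As a rough illustration of why the update strength matters, the toy Python sketch below runs plain gradient descent on f(x) = x². It is not PPO and not ML-Agents code; the function `descend` and the chosen rates are illustrative assumptions.

```python
def descend(learning_rate, steps=50, x=1.0):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    for _ in range(steps):
        gradient = 2.0 * x             # derivative of x**2
        x -= learning_rate * gradient  # one update, scaled by the learning rate
    return x


for rate in (0.01, 0.1, 1.1):
    print(rate, abs(descend(rate)))
```

With the smallest rate the value approaches the minimum slowly, with the moderate rate it gets there much faster, and with the largest rate every update overshoots and the value moves away from the minimum.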

#### Max steps:
As the name implies, the higher this parameter is, the longer the agent will train. It is the total number of **simulation steps** run before the training is completed. <br/>
Default value: `1,000,000`

#### Hidden units:
Hidden units is the number of units in each hidden layer of the neural network. A larger value gives the network more capacity to model **complex combinations of the input factors**, so that it can find different solutions. For problems where the action is a very complex interaction between the observation variables, this should be larger. <br/>
Default value: `256`

#### Normalize:
Corresponds to whether normalization is applied to the **vector observation inputs**. This normalization is based on the running average and variance of the vector observation. Normalization can be helpful in cases with **complex continuous control problems**, but may be harmful with simpler discrete control problems. <br/>
Default value: `false`
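
To make "running average and variance" concrete, here is a minimal Python sketch of an online normalizer based on Welford's algorithm. It only illustrates the idea and is not the ML-Agents implementation; the class `RunningNormalizer` and the `eps` parameter are assumptions.

```python
import math


class RunningNormalizer:
    """Tracks a running mean and variance per observation component."""

    def __init__(self, size):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations

    def update(self, obs):
        # Welford's online update, applied to each component.
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs, eps=1e-8):
        # Shift by the running mean and scale by the running standard deviation.
        out = []
        for i, x in enumerate(obs):
            var = self.m2[i] / self.count if self.count > 0 else 1.0
            out.append((x - self.mean[i]) / math.sqrt(var + eps))
        return out


normalizer = RunningNormalizer(size=2)
for observation in ([0.0, 10.0], [2.0, 30.0]):
    normalizer.update(observation)
print(normalizer.normalize([2.0, 30.0]))
```

After normalization, components with very different ranges (here 0 to 2 and 10 to 30) end up on a comparable scale.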

### Tweaking parameters:

The first thing we did was try to reduce the `max_steps` used in training, so we tested whether 500k steps were enough to train our brains (small and big). We saw that the `smallWallBrain` needed only **50k steps** to find a valid solution and between **150k and 200k to optimize** it. The `bigWallBrain`, in contrast, learns **not to die at 50k**, so its training takes longer, and **500k steps are not enough** for it to learn the best way to reach the goal.

Having checked this, we decided to test whether training speed is **proportional** to the `learning_rate`, so we doubled the value from `0.0003` to `0.0006`, hoping that the training would be much more effective. We obtained these results:

<img align='center' src='img/learningRate.png'>

As we can see, the results did **not change** at all, which suggests that the learning rate of this example is already well tuned and that increasing it does not help.

The next step was to change the `normalize` value from `false` to `true` because, as we said before, normalization is helpful in **complex continuous control problems** such as our example, or at least the big wall case. We really didn’t expect much from this change, but...

<img align='center' src='img/normalize.png'>

The `smallWallBrain` showed no change, as we expected, but the `bigWallBrain` gave us a big surprise. With normalization the agent finds a **valid solution at 300k-350k** steps and needs roughly **500k to optimize** it. This was a great help in achieving the reduction in steps we were looking for.

The last step in our experiment was to change `hidden_units` to a larger value, since that is better for complex problems. We changed the value to `512` and obtained this:

<img align='center' src='img/normalize_hu.png'>

In this case, we got the opposite of what we expected. Increasing `hidden_units` to 512 not only failed to improve the training, it also made it less efficient. It seems that the case is not complicated enough for that value, so the initial value is better suited to the example.

Finally, we have our optimized `trainer_config.yaml` file, which significantly reduces the number of steps needed for the most difficult case (_*big wall*_).

| Parameter     | Default Config  | Our Optimized Config |
| ---           |       ---       |         ---          |
| Learning Rate | `0.0003`        | `0.0003`             |
| Max Steps     | `1,000,000`     | `500,000`            |
| Hidden Units  | `256`           | `256`                |
| Normalize     | `false`         | `true`               |


## New Case Proposal
---
For our own case we decided to **add an enemy** to the scene. The agent has to dodge it, which makes the goal harder to reach: if the agent touches the enemy, it dies and the scene is reset. Our initial hypothesis is that the agent will learn to dodge the enemy while learning to jump the wall with the help of the block.

In this case the agent can perform the **same actions** as in the example: move, rotate and jump. On the other hand, the **number of states has increased** because there is now an enemy in the scene.

For this example we will copy `WallJumpAgent.cs`, rename the copy to `WallJumpAgentUpgrade.cs`, and make all the changes in it.

### Enemy
We decided to add an enemy that the agent cannot jump over and that moves in a straight line along the x-axis. Its speed and size can be modified to regulate the difficulty. To implement it, create a `Cube` in the `WallJumpArea` prefab, place it in front of the wall, give it a coherent size _*(we used x=3, y=1.5, z=1)*_ and attach the `EnemyBehavior.cs` script we wrote. The speed of the enemy can also be changed in the inspector.
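
The back-and-forth movement can be sketched in one dimension. The Python function below mirrors the patrol logic of the `EnemyBehavior.cs` script listed in the Annex (same default `speed` and limit); the function name `patrol` and the fixed time step `dt` are assumptions made for the illustration, and collisions are ignored.

```python
def patrol(x, moving_right, speed=2.0, limit=9.0, dt=0.02, steps=1):
    """Advance a 1-D patrol and return the new position and direction."""
    for _ in range(steps):
        # Move one time step in the current direction.
        x += (speed if moving_right else -speed) * dt
        # Turn around once a limit has been reached or passed.
        if x >= limit:
            moving_right = False
        if x <= -limit:
            moving_right = True
    return x, moving_right


position, direction = patrol(0.0, True, steps=1000)
print(position, direction)
```

Because the direction only flips after the limit has been passed, the enemy can overshoot the limit by at most one step of movement (`speed * dt`).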

<img align='center' src='img/enemy.png' width="300">


### Observations
As we said before, we have added a new element to the scene, which means **more states and more observations**. We had to add more observations to the agent with the help of `RayPerception`. For this reason, as we explained in the Case Analysis section, we need to increase the `vectorObservations` size of the **two brains** _*(Big & Small)*_ by 14, from 74 to 88.

<img align='center' src='img/brain.png' width="300">
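
The jump from 74 to 88 can be checked with a short calculation. The Python sketch below assumes that each ray contributes one slot per detectable tag plus two extra slots (a "nothing hit" flag and a hit distance), which is consistent with the sizes quoted above; the function name `ray_observation_size` is hypothetical.

```python
def ray_observation_size(num_angles, num_heights, num_detectable):
    """Number of floats produced by all rays, under the stated assumption."""
    per_ray = num_detectable + 2  # one slot per tag + "no hit" flag + distance
    return num_angles * num_heights * per_ray


# Besides the rays, CollectObservations adds the agent position (3 floats)
# and the grounded flag (1 float).
EXTRA_OBSERVATIONS = 3 + 1

original = ray_observation_size(7, 2, 3) + EXTRA_OBSERVATIONS  # wall, goal, block
upgraded = ray_observation_size(7, 2, 4) + EXTRA_OBSERVATIONS  # + stone (enemy)
print(original, upgraded, upgraded - original)
```

The difference is exactly one slot per ray, that is, 14 slots for the new `stone` tag.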

Once that is done, we need to tell the agent what an enemy is. To do that, first go back to the `Enemy` object and **tag it** as `stone` in the inspector (create this new tag, or a similar one).

<img align='center' src='img/tag.png' width="300">

Now the new tag must be declared in `WallJumpAgentUpgrade.cs` by adding it to the `detectableObjects` array.

<img align='center' src='img/detectable.png' width="400">

### Rewards
To show the agent that the enemy is an obstacle on the way to its target, we decided to give it a **negative reward** every time it comes into contact with the enemy. This reward had to be strong enough to make the agent dodge the enemy, so we used the same value as for the fall of the agent or the block _*(SetReward(-1f))*_. Thus, when the agent touches the enemy, it receives a negative reward and the episode ends.

<img align='center' src='img/collision.png' width="400">

This first version was not enough: the agent learned not to die and to dodge the enemy, but never learned to get past the wall. To try to solve this, we added one more reward to the setup, given the first time the agent brought the block within a predetermined distance of the wall. We varied this reward between 0.2f and 0.5f, but it did not really help us reach the final solution we were looking for, so we ended up discarding it.

<img align='center' src='img/help.png'>

Finally, after some testing, we found the right way to train the agent: the enemy should be present only in some cases, not all of them. To do that, we decided to exclude the enemy in 50% of the Big Wall cases by changing its 'y' position when the enemy resets.

<img align='center' src='img/reset.png' width="450">
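
The 50% figure follows from how the configuration index is drawn and used in the Annex code. The Python sketch below enumerates the six possible values (0 to 5, since the upper bound of the integer `Random.Range(0, 6)` is exclusive); the helper names `wall_type` and `enemy_present` are hypothetical.

```python
def wall_type(config):
    """Map a configuration index to the wall it produces (as in ConfigureAgent)."""
    if config == 0:
        return "none"
    if config == 1:
        return "small"
    return "big"


def enemy_present(config):
    """The enemy is moved out of the area for configurations 2 and 3."""
    return config not in (2, 3)


configs = range(0, 6)  # mirrors the integer Random.Range(0, 6)
big_wall = [c for c in configs if wall_type(c) == "big"]
hidden_share = sum(not enemy_present(c) for c in big_wall) / len(big_wall)
print(big_wall, hidden_share)
```

Four of the six configurations produce a big wall, and the enemy is hidden in two of those four.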

This was our final and best solution for this example, and we can see its results in the TensorBoard graphs.

<img align='center' src='img/finalBoard.png'>

<img align='center' src='img/finalGif.gif'>


## Annex

### EnemyBehavior.cs

In [None]:
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class EnemyBehavior : MonoBehaviour{
    private bool dirRight = true;
    public float len = 9.0f;
    public float speed = 2.0f;

    // Update is called once per frame
    void Update() {
        if (dirRight)
            transform.Translate(Vector2.right * speed * Time.deltaTime);
        else
            transform.Translate(-Vector2.right * speed * Time.deltaTime);

        if (transform.position.x >= len) dirRight = false;
        if (transform.position.x <= -len) dirRight = true;
    }

    void OnCollisionEnter(Collision col) {
        dirRight = !dirRight;
    }

}

### WallJumpAgentUpgrade.cs

In [None]:
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using System.Linq;
using MLAgents;

public class WallJumpAgentUpgrade : Agent
{
    // Depending on this value, the wall will have different height
    int configuration;
    // Brain to use when no wall is present
    public Brain noWallBrain;
    // Brain to use when a jumpable wall is present
    public Brain smallWallBrain;
    // Brain to use when a wall requiring a block to jump over is present
    public Brain bigWallBrain;

    public GameObject ground;
    public GameObject spawnArea;
    Bounds spawnAreaBounds;


    public GameObject goal;
    public GameObject shortBlock;
    public GameObject enemy;
    public GameObject wall;
    Rigidbody shortBlockRB;
    Rigidbody agentRB;
    Material groundMaterial;
    Renderer groundRenderer;
    WallJumpAcademy academy;
    RayPerception rayPer;

    public bool together = false;
    public float jumpingTime;
    public float jumpTime;
    // This is a downward force applied when falling to make jumps look
    // less floaty
    public float fallingForce;
    // Used to check the colliding objects
    public Collider[] hitGroundColliders = new Collider[3];
    Vector3 jumpTargetPos;
    Vector3 jumpStartingPos;

    string[] detectableObjects;

    public override void InitializeAgent()
    {
        academy = FindObjectOfType<WallJumpAcademy>();
        rayPer = GetComponent<RayPerception>();
        configuration = Random.Range(0, 5);
        detectableObjects = new string[] { "wall", "goal", "block", "stone" };

        agentRB = GetComponent<Rigidbody>();
        shortBlockRB = shortBlock.GetComponent<Rigidbody>();
        spawnAreaBounds = spawnArea.GetComponent<Collider>().bounds;
        groundRenderer = ground.GetComponent<Renderer>();
        groundMaterial = groundRenderer.material;

        spawnArea.SetActive(false);
    }


    // Begin the jump sequence
    public void Jump()
    {
        jumpingTime = 0.2f;
        jumpStartingPos = agentRB.position;
    }

    /// <summary>
    /// Does the ground check.
    /// </summary>
    /// <returns><c>true</c>, if the agent is on the ground, 
    /// <c>false</c> otherwise.</returns>
    /// <param name="smallCheck">If <c>true</c>, a single downward raycast
    /// is used; otherwise an overlap box is used.</param>
    public bool DoGroundCheck(bool smallCheck)
    {
        if (!smallCheck)
        {
            hitGroundColliders = new Collider[3];
            Physics.OverlapBoxNonAlloc(
                gameObject.transform.position + new Vector3(0, -0.05f, 0),
                new Vector3(0.95f / 2f, 0.5f, 0.95f / 2f),
                hitGroundColliders,
                gameObject.transform.rotation);
            bool grounded = false;
            foreach (Collider col in hitGroundColliders)
            {

                if (col != null && col.transform != this.transform &&
                    (col.CompareTag("walkableSurface") ||
                     col.CompareTag("block") ||
                     col.CompareTag("wall")))
                {
                    grounded = true; //then we're grounded
                    break;
                }
            }
            return grounded;
        }
        else
        {

            RaycastHit hit;
            Physics.Raycast(transform.position + new Vector3(0, -0.05f, 0), -Vector3.up, out hit,
                1f);

            if (hit.collider != null &&
                (hit.collider.CompareTag("walkableSurface") ||
                 hit.collider.CompareTag("block") ||
                 hit.collider.CompareTag("wall"))
                && hit.normal.y > 0.95f)
            {
                return true;
            }

            return false;
        }
    }


    /// <summary>
    /// Moves a rigidbody towards a position smoothly.
    /// </summary>
    /// <param name="targetPos">Target position.</param>
    /// <param name="rb">The rigidbody to be moved.</param>
    /// <param name="targetVel">The velocity to target during the
    ///  motion.</param>
    /// <param name="maxVel">The maximum velocity possible.</param>
    void MoveTowards(Vector3 targetPos, Rigidbody rb, float targetVel, float maxVel) {
        Vector3 moveToPos = targetPos - rb.worldCenterOfMass;
        Vector3 velocityTarget = moveToPos * targetVel * Time.fixedDeltaTime;
        if (float.IsNaN(velocityTarget.x) == false) {
            rb.velocity = Vector3.MoveTowards(
                rb.velocity, velocityTarget, maxVel);
        }
    }

    public override void CollectObservations() {
            float rayDistance = 20f;
            float[] rayAngles = { 0f, 45f, 90f, 135f, 180f, 110f, 70f };
            AddVectorObs(rayPer.Perceive(
                rayDistance, rayAngles, detectableObjects, 0f, 0f));
            AddVectorObs(rayPer.Perceive(
                rayDistance, rayAngles, detectableObjects, 2.5f, 2.5f));
            Vector3 agentPos = agentRB.position - ground.transform.position;

            AddVectorObs(agentPos / 20f);
            AddVectorObs(DoGroundCheck(true) ? 1 : 0);
    }

    /// <summary>
    /// Gets a random spawn position in the spawningArea.
    /// </summary>
    /// <returns>The random spawn position.</returns>
    public Vector3 GetRandomSpawnPos() {
        Vector3 randomSpawnPos = Vector3.zero;
        float randomPosX = Random.Range(-spawnAreaBounds.extents.x,
                                        spawnAreaBounds.extents.x);
        float randomPosZ = Random.Range(-spawnAreaBounds.extents.z,
                                        spawnAreaBounds.extents.z);

        randomSpawnPos = spawnArea.transform.position +
                                  new Vector3(randomPosX, 0.45f, randomPosZ);
        return randomSpawnPos;
    }

    /// <summary>
    /// Changes the color of the ground for a moment
    /// </summary>
    /// <returns>The Enumerator to be used in a Coroutine</returns>
    /// <param name="mat">The material to be swapped.</param>
    /// <param name="time">The time the material will remain.</param>
    IEnumerator GoalScoredSwapGroundMaterial(Material mat, float time) {
        groundRenderer.material = mat;
        yield return new WaitForSeconds(time); // wait for the given time
        groundRenderer.material = groundMaterial;
    }


    public void MoveAgent(float[] act) {
        AddReward(-0.0005f);
        bool smallGrounded = DoGroundCheck(true);
        bool largeGrounded = DoGroundCheck(false);

        Vector3 dirToGo = Vector3.zero;
        Vector3 rotateDir = Vector3.zero;
        int dirToGoForwardAction = (int) act[0];
        int rotateDirAction = (int) act[1];
        int dirToGoSideAction = (int) act[2];
        int jumpAction = (int) act[3];

        if (dirToGoForwardAction==1)
            dirToGo = transform.forward * 1f * (largeGrounded ? 1f : 0.5f);
        else if (dirToGoForwardAction==2)
            dirToGo = transform.forward * -1f * (largeGrounded ? 1f : 0.5f);
        if (rotateDirAction==1)
            rotateDir = transform.up * -1f;
        else if (rotateDirAction==2)
            rotateDir = transform.up * 1f;
        if (dirToGoSideAction==1)
            dirToGo = transform.right * -0.6f * (largeGrounded ? 1f : 0.5f);
        else if (dirToGoSideAction==2)
            dirToGo = transform.right * 0.6f * (largeGrounded ? 1f : 0.5f);
        if (jumpAction == 1)
            if ((jumpingTime <= 0f) && smallGrounded) {
                Jump();
            }

        transform.Rotate(rotateDir, Time.fixedDeltaTime * 300f);
        agentRB.AddForce(dirToGo * academy.agentRunSpeed,
                         ForceMode.VelocityChange);

        if (jumpingTime > 0f) {
            jumpTargetPos =
            new Vector3(agentRB.position.x,
                        jumpStartingPos.y + academy.agentJumpHeight,
                        agentRB.position.z) + dirToGo;
            MoveTowards(jumpTargetPos, agentRB, academy.agentJumpVelocity,
                        academy.agentJumpVelocityMaxChange);

        }

        if (!(jumpingTime > 0f) && !largeGrounded) {
            agentRB.AddForce(
            Vector3.down * fallingForce, ForceMode.Acceleration);
        }
        jumpingTime -= Time.fixedDeltaTime;
    }

    public override void AgentAction(float[] vectorAction, string textAction) {
        MoveAgent(vectorAction);
        if ((!Physics.Raycast(agentRB.position, Vector3.down, 20))
            || (!Physics.Raycast(shortBlockRB.position, Vector3.down, 20))) {
            Done();
            SetReward(-1f);
            ResetBlock(shortBlockRB);
            ResetEnemy(enemy);
            StartCoroutine(
                GoalScoredSwapGroundMaterial(academy.failMaterial, .5f));
        }
        //if (Mathf.Abs(shortBlock.transform.position.z - wall.transform.position.z) <= 3.0
        //    && !together) { // help the player use the box
        //    AddReward(0.5f);
        //    together = true;
        //}
    }

    // Detect when the agent hits the enemy
    void OnCollisionEnter(Collision col) {
        if (col.gameObject.CompareTag("stone")) {
            Done();
            SetReward(-1f);
            ResetBlock(shortBlockRB);
            ResetEnemy(enemy);
            StartCoroutine(
                GoalScoredSwapGroundMaterial(academy.failMaterial, .5f));
            Debug.Log("I have killed the player");
        }
    }

    // Detect when the agent hits the goal
    void OnTriggerStay(Collider col) {
        if (col.gameObject.CompareTag("goal") && DoGroundCheck(true)) {
            SetReward(1f);
            Done();
            StartCoroutine(
                GoalScoredSwapGroundMaterial(academy.goalScoredMaterial, 2));
        }
    }

    //Reset the orange block position
    void ResetBlock(Rigidbody blockRB) {
        blockRB.transform.position = GetRandomSpawnPos();
        blockRB.velocity = Vector3.zero;
        blockRB.angularVelocity = Vector3.zero;
    }
    
    //Reset the enemy position
    void ResetEnemy(GameObject enemy) {
        if (configuration == 2 || configuration == 3) // hide the enemy in 2 of the 4 big-wall configurations
            enemy.transform.localPosition = new Vector3(0.0f, -100.0f, -5.0f);
        else
            enemy.transform.localPosition = new Vector3(0.0f, 1.5f, -5.0f);
    }

    public override void AgentReset() {
        ResetBlock(shortBlockRB);
        transform.localPosition = new Vector3(
            18 * (Random.value - 0.5f), 1, -12);
        configuration = Random.Range(0, 6); // changed from 5 to 6: configurations 2..5 are big wall
        ResetEnemy(enemy); // Reset Enemy
        agentRB.velocity = default(Vector3);
        //together = false;
    }

    private void FixedUpdate() {
        if (configuration != -1) {
            ConfigureAgent(configuration);
            configuration = -1;
        }
    }

    /// <summary>
    /// Configures the agent. Given an integer config, the wall will have
    /// different height and a different brain will be assigned to the agent.
    /// </summary>
    /// <param name="config">Config. 
    /// If 0 : No wall and noWallBrain.
    /// If 1:  Small wall and smallWallBrain.
    /// Other : Tall wall and BigWallBrain. </param>
    void ConfigureAgent(int config) {
        if (config == 0) {
            wall.transform.localScale = new Vector3(
                wall.transform.localScale.x,
                academy.resetParameters["no_wall_height"],
                wall.transform.localScale.z);
            GiveBrain(noWallBrain);
        }
        else if (config == 1) {
            wall.transform.localScale = new Vector3(
                wall.transform.localScale.x,
                academy.resetParameters["small_wall_height"],
                wall.transform.localScale.z);
            GiveBrain(smallWallBrain);
        }
        else {
            float height =
                academy.resetParameters["big_wall_min_height"] +
                Random.value * (academy.resetParameters["big_wall_max_height"] -
                academy.resetParameters["big_wall_min_height"]);
            wall.transform.localScale = new Vector3(
                wall.transform.localScale.x,
                height,
                wall.transform.localScale.z);
            GiveBrain(bigWallBrain);
        }
    }
}