# Learning Algorithm
To solve the Collaboration and Competition version of the Tennis environment, I used the MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithm. The implementation makes use of:

 1. Prioritized Experience Replay (replay buffer)
 2. Actor-Critic models
 3. Soft updates of the target networks from the local networks, controlled by the hyperparameter TAU
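
As a rough illustration of the soft update in item 3, the sketch below blends toy parameter lists instead of real network weights. The function name `soft_update` and the toy values are assumptions made for this example and are not taken from the project code.

```python
def soft_update(local_params, target_params, tau):
    """Blend local parameters into the target parameters.

    Applies target <- tau * local + (1 - tau) * target element-wise.
    """
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]


TAU = 0.1  # value listed in the Hyperparameters section

# Toy stand-ins for network weights (hypothetical values)
local = [1.0, -2.0, 0.5]
target = [0.0, 0.0, 0.0]

# After one soft update the target has moved 10% of the way towards the local weights
target = soft_update(local, target, TAU)
print(target)
```

With TAU = 1 the target would be overwritten by the local weights on every update, while TAU = 0 would freeze it, so TAU sets how quickly the target networks track the local ones.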

# Hyperparameters
I experimented with the following hyperparameters over several runs and solved the environment using the values shown below:

 1. BATCH_SIZE = 1024 (number of experiences sampled per learning step)
 2. BUFFER_SIZE = 1e5 (replay buffer size, i.e. memory size)
 3. TAU = 0.1 (controls how much the target network is updated towards the local network)
 4. GAMMA = 0.99 (discount factor)
 5. ACTOR_LR = 1e-3 (learning rate for the actor model)
 6. CRITIC_LR = 1e-3 (learning rate for the critic model)
 7. UPDATE_EVERY = 100 (how often learning happens, i.e. once every UPDATE_EVERY steps)
 8. LEARN_TIMES = 1 (number of learning passes performed each time learning is triggered)
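
To show how BUFFER_SIZE, BATCH_SIZE, UPDATE_EVERY and LEARN_TIMES might interact, here is a minimal sketch. It assumes a plain uniform replay buffer rather than the prioritized one used in the project, and the class `UniformReplayBuffer` and the function `count_learning_passes` are hypothetical names introduced only for this example.

```python
import random
from collections import deque

BATCH_SIZE = 1024
BUFFER_SIZE = int(1e5)  # 1e5 is a float; deque expects an integer maxlen
UPDATE_EVERY = 100
LEARN_TIMES = 1


class UniformReplayBuffer:
    """Hypothetical uniform replay buffer (a simplification, not prioritized)."""

    def __init__(self, buffer_size, batch_size, seed=0):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are dropped first
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, experience):
        self.memory.append(experience)

    def ready(self):
        # Sampling only makes sense once a full batch is available
        return len(self.memory) >= self.batch_size

    def sample(self):
        return self.rng.sample(list(self.memory), self.batch_size)


def count_learning_passes(total_steps, buffer):
    """Simulate the schedule: learn LEARN_TIMES times every UPDATE_EVERY steps."""
    passes = 0
    for step in range(1, total_steps + 1):
        buffer.add((step,))  # placeholder for (state, action, reward, next_state, done)
        if step % UPDATE_EVERY == 0 and buffer.ready():
            for _ in range(LEARN_TIMES):
                batch = buffer.sample()  # a real agent would update its networks here
                passes += 1
    return passes


buffer = UniformReplayBuffer(BUFFER_SIZE, BATCH_SIZE)
passes = count_learning_passes(2000, buffer)
print(passes)
```

In this sketch no learning happens until the buffer holds at least BATCH_SIZE experiences, so with a batch of 1024 the first learning pass only happens at step 1100.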

# Model Architecture
My solution uses Actor-Critic models to implement MADDPG. As in any MADDPG implementation, each local actor and critic model has a corresponding copy, the target model.
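
The role of the target copies can be illustrated with the one-step TD target commonly used in DDPG-style methods. This is a sketch under assumptions: the function `td_target` and the numbers are hypothetical, and the project code may compute its target differently.

```python
GAMMA = 0.99  # discount factor from the Hyperparameters section


def td_target(reward, next_q_target, done, gamma=GAMMA):
    """One-step TD target: r + gamma * Q_target(s', a'), with no bootstrap at episode end."""
    return reward + gamma * next_q_target * (1.0 - float(done))


# Hypothetical values: the target critic estimates 0.5 for the next state-action pair
y_running = td_target(reward=0.1, next_q_target=0.5, done=False)
y_terminal = td_target(reward=0.1, next_q_target=0.5, done=True)

# The local critic's estimate is regressed towards the target with a squared error
local_q = 0.4
critic_loss = (local_q - y_running) ** 2
print(y_running, y_terminal, critic_loss)
```

Because the bootstrap value comes from the slowly changing target critic rather than the local one, the regression target moves less from one update to the next.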

1. Actor model - The actor is made up of 3 fully connected layers (48x256, 256x512, 512 x action_size). The final output layer uses a tanh activation, since the action values must lie between -1 and 1. The other two layers use ReLU activations.

2. Critic model - The critic is also made up of 3 fully connected layers (48x256, (256 + action_size*2)x512, 512x1). The last layer has no activation function. The other two layers use ReLU activations.
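
The layer shapes above can be checked with a small NumPy forward pass. This is only a sketch with random weights: it assumes that the input size 48 is the two agents' 24-value observations concatenated, and that the critic joins both agents' actions to the output of its first layer. The helper names are hypothetical, and the project itself builds its models differently.

```python
import numpy as np

STATE_SIZE = 48   # first-layer input from the text (assumed: 2 agents x 24 observations)
ACTION_SIZE = 2   # per-agent action size reported by the environment
NUM_AGENTS = 2

rng = np.random.default_rng(0)


def linear(in_features, out_features):
    """Return randomly initialised (weights, bias) for one fully connected layer."""
    return rng.normal(0.0, 0.1, (in_features, out_features)), np.zeros(out_features)


def relu(x):
    return np.maximum(x, 0.0)


actor_layers = [linear(STATE_SIZE, 256), linear(256, 512), linear(512, ACTION_SIZE)]
critic_layers = [
    linear(STATE_SIZE, 256),
    linear(256 + ACTION_SIZE * NUM_AGENTS, 512),
    linear(512, 1),
]


def actor_forward(states):
    (w1, b1), (w2, b2), (w3, b3) = actor_layers
    x = relu(states @ w1 + b1)
    x = relu(x @ w2 + b2)
    return np.tanh(x @ w3 + b3)  # tanh keeps actions in [-1, 1]


def critic_forward(states, joint_actions):
    (w1, b1), (w2, b2), (w3, b3) = critic_layers
    x = relu(states @ w1 + b1)
    x = np.concatenate([x, joint_actions], axis=1)  # assumed: actions joined after layer 1
    x = relu(x @ w2 + b2)
    return x @ w3 + b3  # no activation on the Q-value output


states = rng.normal(size=(4, STATE_SIZE))                   # batch of 4 random states
actions = actor_forward(states)
joint_actions = np.concatenate([actions, actions], axis=1)  # stand-in for both agents' actions
q_values = critic_forward(states, joint_actions)
print(actions.shape, q_values.shape)
```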

# Implementation

In [1]:
from solution import CollaborationSolution
from unityagents import UnityEnvironment
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

#env = UnityEnvironment(file_name="./Tennis_Windows_x86_64/Tennis.exe")
env = UnityEnvironment(file_name="./Tennis_Linux_NoVis/Tennis.x86_64")

try:
    sol = CollaborationSolution(env,enable_wandb=True)
    scores = sol.train(num_episodes=5000)
    if scores:
        # plot the scores
        fig = plt.figure()
        ax = fig.add_subplot(111)
        plt.plot(np.arange(len(scores)), scores)
        plt.ylabel('Score')
        plt.xlabel('Episode #')
        plt.show()
finally:
    env.close()


INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


ERROR:wandb.jupyter:Failed to query for notebook name, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable
[34m[1mwandb[0m: Wandb version 0.10.12 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Number of agents: 2
Size of each action: 2
There are 2 agents. Each observes a state with length: 24
The state for the first agent looks like: [ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.         -6.65278625 -1.5
 -0.          0.          6.83172083  6.         -0.          0.        ]




Episode:0, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.10000000149011612
Best avg_score_100_episodes seen in Episode:0
Episode:1, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.10000000149011612
Episode:2, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.06666666766007741
Episode:3, Total score (averaged over agents) this episode: 0.20000000298023224, Avg over 100 episodes: 0.10000000149011612
Episode:4, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.0800000011920929
Episode:5, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.08333333457509677
Episode:6, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.0857142869915281
Episode:7, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.07500000111758709
Episode:8, T

Episode:72, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.034246575852779494
Episode:73, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03378378428720139
Episode:74, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.033333333830038704
Episode:75, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.034210526825566044
Episode:76, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03376623426938986
Episode:77, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.03461538513119404
Episode:78, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.035443038502825965
Episode:79, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03500000052154064
Episode:80, Total score (averaged over agents) this episode: 0.0, Avg over 1

Episode:145, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.02500000037252903
Episode:146, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02500000037252903
Episode:147, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02500000037252903
Episode:148, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02500000037252903
Episode:149, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02500000037252903
Episode:150, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02500000037252903
Episode:151, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.02600000038743019
Episode:152, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.027000000402331352
Episode:153, Total score (averaged over agents) this episode: 0.0, Avg 

Episode:221, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.018000000268220902
Episode:222, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.018000000268220902
Episode:223, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.01700000025331974
Episode:224, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.018000000268220902
Episode:225, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.018000000268220902
Episode:226, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.018000000268220902
Episode:227, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.018000000268220902
Episode:228, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.018000000268220902
Episode:229, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 e

Episode:293, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.023000000342726708
Episode:294, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.02400000035762787
Episode:295, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02400000035762787
Episode:296, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.02500000037252903
Episode:297, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.02600000038743019
Episode:298, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02600000038743019
Episode:299, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02600000038743019
Episode:300, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02600000038743019
Episode:301, Total score (averaged over agents) this ep

Episode:367, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.020000000298023225
Episode:368, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.020000000298023225
Episode:369, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.020000000298023225
Episode:370, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.020000000298023225
Episode:371, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.020000000298023225
Episode:372, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.01700000025331974
Episode:373, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.01700000025331974
Episode:374, Total score (averaged over agents) this episode: 0.20000000298023224, Avg over 100 episodes: 0.019000000283122064
Episode:375, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.019000

Episode:440, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.030800000503659247
Episode:441, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.030800000503659247
Episode:442, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.030800000503659247
Episode:443, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.030800000503659247
Episode:444, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.029800000488758086
Episode:445, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.030800000503659247
Episode:446, Total score (averaged over agents) this episode: 0.20000000298023224, Avg over 100 episodes: 0.03280000053346157
Episode:447, Total score (averaged over agents) this episode: 0.20000000298023224, Avg over 100 episodes: 0.03480000056326389
Episode:448, Total score (averaged over agents) th

  Qloss = F.mse_loss(Qval,Qnext)


Episode:501, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03690000057220459
Episode:502, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03590000055730343
Episode:503, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03490000054240227
Episode:504, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03490000054240227
Episode:505, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03490000054240227
Episode:506, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03490000054240227
Episode:507, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.033900000527501105
Episode:508, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.033900000527501105
Episode:509, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.033900000527501105
Episode

Episode:573, Total score (averaged over agents) this episode: 0.20000000298023224, Avg over 100 episodes: 0.031000000461935996
Episode:574, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.030000000447034835
Episode:575, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.030000000447034835
Episode:576, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.029000000432133674
Episode:577, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.029000000432133674
Episode:578, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.029000000432133674
Episode:579, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.029000000432133674
Episode:580, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.02600000038743019
Episode:581, Total score (averaged over agents) this episode: 0.0, Avg over 100 e

Episode:644, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.03500000052154064
Episode:645, Total score (averaged over agents) this episode: 0.30000000447034836, Avg over 100 episodes: 0.03800000056624413
Episode:646, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03800000056624413
Episode:647, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03800000056624413
Episode:648, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03800000056624413
Episode:649, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.03700000055134296
Episode:650, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.03700000055134296
Episode:651, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.036000000536441805
Episode:652, Total score (averaged over agents) this episode: 0.0, Avg 

Episode:718, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.023000000342726708
Episode:719, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.022000000327825547
Episode:720, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.022000000327825547
Episode:721, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.022000000327825547
Episode:722, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.022000000327825547
Episode:723, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.022000000327825547
Episode:724, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.021000000312924386
Episode:725, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.021000000312924386
Episode:726, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.020000000298023225
E

Episode:790, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.012000000178813934
Episode:791, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.013000000193715095
Episode:792, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.014000000208616257
Episode:793, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.015000000223517418
Episode:794, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.01600000023841858
Episode:795, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.01600000023841858
Episode:796, Total score (averaged over agents) this episode: 0.10000000149011612, Avg over 100 episodes: 0.01700000025331974
Episode:797, Total score (averaged over agents) this episode: 0.0, Avg over 100 episodes: 0.01700000025331974
Episode:798, Total score (averaged o

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

# Graphs
Here are a few more graphs showing the score per episode and the average score over 100 episodes during learning:

![Learning graph](./images/learning-graph.jpg)
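
The "Avg over 100 episodes" value in the training log is consistent with a running mean over the most recent episodes (up to 100). A minimal sketch of that calculation follows; the function `rolling_average` is a hypothetical helper and not part of the project code.

```python
from collections import deque


def rolling_average(scores, window=100):
    """Mean of the last `window` scores after each episode (fewer at the start)."""
    recent = deque(maxlen=window)  # automatically discards scores older than the window
    averages = []
    for score in scores:
        recent.append(score)
        averages.append(sum(recent) / len(recent))
    return averages


# Scores of the first five episodes, rounded from the training log
episode_scores = [0.1, 0.1, 0.0, 0.2, 0.0]
averages = rolling_average(episode_scores)
print(averages)
```

Plotting `averages` next to the raw scores gives a smoother curve that is easier to read than the noisy per-episode values.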



# Try the trained model

In [None]:
from solution import CollaborationSolution
from unityagents import UnityEnvironment
import numpy as np

env = UnityEnvironment(file_name="./Tennis_Windows_x86_64/Tennis.exe")

try:
    sol = CollaborationSolution(env,enable_wandb=False)
    sol.watch_trained('./checkpoints/checkpoint-0.093000001385808')
finally:
    env.close()


# Video of the trained model
%%HTML
<div align="middle">
      <video width="80%" controls>
            <source src="./video.mp4" type="video/mp4">
      </video>
</div>

# Ideas for future Work
This project was a tricky one. It took me many tries to figure out the multi-agent setup for DDPG. The learning process also currently requires 5+ learning passes per episode, which slows down training.

Learning was also not stable at times. A few things to improve next:

1. The MADDPG paper discusses a few improvements to MADDPG, such as policy ensembles and inferring the policies of other agents. I would like to explore these further.
2. I would like to try other algorithms, such as TRPO in a multi-agent setting, to solve this environment.