# Collaboration and Competition

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages. If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/). The cell also imports PyTorch, Matplotlib, and the local `multiAgent` module, so those must be available as well.

In [1]:
import os

import torch
from unityagents import UnityEnvironment
import numpy as np
from multiAgent import MultiAgent
import matplotlib.pyplot as plt

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Tennis.app"`
- **Windows** (x86): `"path/to/Tennis_Windows_x86/Tennis.exe"`
- **Windows** (x86_64): `"path/to/Tennis_Windows_x86_64/Tennis.exe"`
- **Linux** (x86): `"path/to/Tennis_Linux/Tennis.x86"`
- **Linux** (x86_64): `"path/to/Tennis_Linux/Tennis.x86_64"`
- **Linux** (x86, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86"`
- **Linux** (x86_64, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86_64"`

For instance, if you are using a Mac, then you downloaded `Tennis.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Tennis.app")
```
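If the notebook is shared across machines, the path choice listed above can be sketched as a small lookup. This is only an illustration: `tennis_env_path` and its parameters are hypothetical names, not part of `unityagents`, and `base` stands in for wherever the environment was actually unpacked.

```python
import platform


def tennis_env_path(system, is_64bit=True, headless=False, base="path/to"):
    """Hypothetical helper: return the Tennis build path for a given platform.

    `system` is expected to look like the value of platform.system()
    ("Darwin", "Windows" or "Linux"). `base` is a placeholder directory.
    """
    if system == "Darwin":
        return f"{base}/Tennis.app"
    if system == "Windows":
        folder = "Tennis_Windows_x86_64" if is_64bit else "Tennis_Windows_x86"
        return f"{base}/{folder}/Tennis.exe"
    if system == "Linux":
        folder = "Tennis_Linux_NoVis" if headless else "Tennis_Linux"
        suffix = "x86_64" if is_64bit else "x86"
        return f"{base}/{folder}/Tennis.{suffix}"
    raise ValueError(f"Unsupported platform: {system}")


if __name__ == "__main__":
    # Guess the word size from the machine string (e.g. "x86_64", "AMD64").
    guess_64bit = platform.machine().endswith("64")
    print(tennis_env_path(platform.system(), is_64bit=guess_64bit))
```

The returned string would then be passed as `file_name` to `UnityEnvironment`, exactly as in the example above.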

In [2]:
env = UnityEnvironment(file_name=os.getcwd() + "/Tennis_Windows_x86_64/Tennis.exe")

# ## Setting
state = 'Train'
# state = 'Test'
mode = 'slow'
# mode = 'fast'

# ## Train & Test
multi_agent = MultiAgent(env=env, state_size=8, action_size=2, random_seed=27)
if state == 'Train':
    scores = multi_agent.train(n_episodes=3000, max_t=3000)
    plt.plot(scores)
else:  # Test
    # Load the saved TD3 actor and both critics for each of the two agents.
    model_dir = os.path.join(os.getcwd(), 'saved_model')
    for i, agent in enumerate(multi_agent.agents[:2], start=1):
        prefix = os.path.join(model_dir, 'saved_agent_{}_TD3_'.format(i))
        agent.actor_local.load_state_dict(torch.load(prefix + 'actor.pth'))
        agent.critic_local_1.load_state_dict(torch.load(prefix + 'critic_1.pth'))
        agent.critic_local_2.load_state_dict(torch.load(prefix + 'critic_2.pth'))
    print('====================================')
    print('Successfully loaded')
    print('====================================')

    num_agents = 2
    score_test = []                                                  # best-agent score of each test episode
    train_mode = (mode == 'fast')                                    # train_mode=True runs the simulation faster than real time
    for test_episode in range(1, 100+1):
        env_info = multi_agent.env.reset(train_mode=train_mode)[multi_agent.brain_name]      # reset the environment
        obs = env_info.vector_observations[:, -8:]                   # get the current state (last 8 values per agent)
        scores = np.zeros(num_agents)                                # initialize the score
        while True:
            action = multi_agent.act(obs, add_noise=False)           # select an action
            env_info = env.step(action)[multi_agent.brain_name]      # send the action to the environment
            next_obs = env_info.vector_observations[:, -8:]          # get the next state
            reward = env_info.rewards                                # get the reward
            done = env_info.local_done                               # see if episode has finished
            scores += reward                                         # update the score
            obs = next_obs                                           # roll over the state to next time step
            if any(done):                                            # exit loop if episode finished
                score_test.append(np.max(scores))                    # keep the better agent's score
                print('\rEpisode {}\tAverage Score: {:.2f}'.format(test_episode, np.mean(score_test)), end="")
                break

    print("\nFinal Score: {}".format(np.mean(score_test)))
# Close the environment exactly once; multi_agent.env is assumed to be the same object as env.
env.close()

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 
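
The brain summary above reports 8 values per observation and 3 stacked observations, so each agent receives 24 values per step, while the notebook keeps only the last 8 with `[:, -8:]`. The sketch below reproduces that slice on a stand-in array. It assumes the most recent frame is stored in the last 8 columns; the array contents are placeholders, not real observations.

```python
import numpy as np

# Values taken from the brain summary above.
num_agents, obs_size, num_stacked = 2, 8, 3

# Stand-in for env_info.vector_observations: one row per agent,
# num_stacked frames of obs_size values laid out side by side.
vector_observations = np.arange(
    num_agents * obs_size * num_stacked, dtype=float
).reshape(num_agents, obs_size * num_stacked)

# Assumption: the newest frame occupies the last obs_size columns.
latest = vector_observations[:, -obs_size:]

print(vector_observations.shape)  # (2, 24)
print(latest.shape)               # (2, 8)
```

This is why `MultiAgent` is constructed with `state_size=8` rather than 24.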


Episode 50	Average Score : 0.0020 	 eps : 0.989
Episode 100	Average Score : 0.0040 	 eps : 0.972
Episode 150	Average Score : 0.0172 	 eps : 0.952
Episode 200	Average Score : 0.0197 	 eps : 0.934
Episode 250	Average Score : 0.0304 	 eps : 0.910
Episode 300	Average Score : 0.0562 	 eps : 0.885
Episode 350	Average Score : 0.0729 	 eps : 0.858
Episode 400	Average Score : 0.0859 	 eps : 0.831
Episode 450	Average Score : 0.0917 	 eps : 0.802
Episode 500	Average Score : 0.0935 	 eps : 0.774
Episode 550	Average Score : 0.0906 	 eps : 0.746
Episode 600	Average Score : 0.0921 	 eps : 0.718
Episode 650	Average Score : 0.0938 	 eps : 0.690
Episode 700	Average Score : 0.0928 	 eps : 0.662
Episode 750	Average Score : 0.0932 	 eps : 0.632
Episode 800	Average Score : 0.1139 	 eps : 0.592
Episode 850	Average Score : 0.1227 	 eps : 0.556
Episode 900	Average Score : 0.1573 	 eps : 0.494
Episode 950	Average Score : 0.3757 	 eps : 0.326
Episode 984	Average Score : 0.5201 	 eps : 0.208
Environment solved in

FileNotFoundError: [Errno 2] No such file or directory: '../saved_model/saved_agent_1_TD3_actor.pth'

Set `state = 'Train'` to train the agents yourself, or `state = 'Test'` to load the saved agents and watch them play. In test mode the agents run for 100 consecutive episodes and the average score is reported; set `mode = 'slow'` to watch them at normal speed, or `mode = 'fast'` to run the episodes at accelerated speed.
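
The score reported in test mode can be sketched as follows: each agent's rewards are summed over an episode, the larger of the two totals is kept (as `np.max(scores)` does in the cell above), and the result is averaged over the most recent episodes. The function names and the reward values here are illustrative only, not results from the notebook.

```python
import numpy as np


def episode_score(rewards_per_step):
    """Sum each agent's rewards over the episode and keep the better total."""
    totals = np.sum(np.asarray(rewards_per_step, dtype=float), axis=0)
    return float(np.max(totals))


def average_score(episode_scores, window=100):
    """Mean of the most recent `window` episode scores."""
    return float(np.mean(episode_scores[-window:]))


# Made-up rewards for a 3-step episode with two agents (not real data).
rewards = [[0.0, 0.0], [0.1, 0.0], [0.0, -0.01]]
print(episode_score(rewards))
```

With `window=100`, `average_score` corresponds to the running average printed during the 100 test episodes.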