# Mathias Babin - P3 Collaboration and Competition Testing

This is my implementation for solving the P3 Collaboration and Competition project for [Udacity's Deep Reinforcement Learning course](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893). Details on the project are provided in the **README** for this repository. The purpose of this notebook is to watch a **finished** agent perform in this environment. If you wish to **train** an agent for yourself, please go to the **Continuous_Control** notebook included in this repository.


### 1. Setting up the Environment

The following cells import various packages and set up the environment. The first cell verifies that both [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/) have been installed correctly.

In [1]:
from unityagents import UnityEnvironment
from agent import Agent

from collections import deque
import numpy as np
import torch
import matplotlib.pyplot as plt
%matplotlib inline

The next cell simply sets up the environment. **_IMPORTANT:_** If the following cell opens a Unity window that crashes, this is because the rest of the cells in the project are not being executed fast enough. To avoid this, please select **Restart & Run All** under **Kernel**. This will execute all the cells in the project.

In [2]:
env = UnityEnvironment(file_name="Tennis.app")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 
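
The log above reports 8 observation values per agent with 3 stacked observations, which suggests that each agent receives a flattened vector of 8 × 3 = 24 values per step. The sketch below only reproduces that shape arithmetic with a placeholder array; the array layout and the variable names are illustrative assumptions, not values read from the environment.

```python
import numpy as np

obs_size = 8      # "Vector Observation space size (per agent)" from the log
num_stacked = 3   # "Number of stacked Vector Observation" from the log
num_agents = 2    # assumption: two agents, as in the test loop of this notebook

# Placeholder standing in for env_info.vector_observations (assumed layout:
# one row per agent, stacked observations flattened into a single vector).
states = np.zeros((num_agents, obs_size * num_stacked))

state_size = states.shape[1]
print(state_size)
```

If the assumption holds, this is the value that `states.shape[1]` yields when the agent is initialized in the next section.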


### 2. Testing the Agent

Start by initializing values for testing the agent, and loading the trained weights from the *checkpoint_actor.pth* and *checkpoint_critic.pth* files created by the **Continuous Control** notebook.

In [3]:
# Get brains from Unity ML
brain_name = env.brain_names[0] 
brain = env.brains[brain_name]

env_info = env.reset(train_mode=False)[brain_name] # reset the environment

num_agents = len(env_info.agents) # get number of agents

action_size = brain.vector_action_space_size # get action size

states = env_info.vector_observations
state_size = states.shape[1] # get state space size

# Initialize agents
agents = Agent(state_size=state_size, action_size=action_size, seed=10)

# load trained weights
agents.actor_local.load_state_dict(torch.load('checkpoint_actor.pth'))
agents.critic_local.load_state_dict(torch.load('checkpoint_critic.pth'))
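
Loading the weights fails if either checkpoint file is absent from the working directory. As a minimal sketch, a check like the following could be run before the cell above; `missing_checkpoints` is a hypothetical helper introduced here for illustration and is not part of this repository.

```python
from pathlib import Path

def missing_checkpoints(paths):
    """Return the checkpoint paths that do not exist on disk (hypothetical helper)."""
    return [p for p in paths if not Path(p).is_file()]

# File names taken from the loading cell above.
missing = missing_checkpoints(['checkpoint_actor.pth', 'checkpoint_critic.pth'])
if missing:
    print("Missing checkpoint files:", missing)
```

An empty list means both files were found and the weights can be loaded.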

Run the trained agent for one episode and display its final score.

In [4]:
env_info = env.reset(train_mode=False)[brain_name]         
states = env_info.vector_observations # Get initial state                
score = 0 # score for each episode
scores = np.zeros(num_agents) # scores that each agent receives

while True:
    action1 = agents.act(states[0]) # action for agent 1
    action2 = agents.act(states[1]) # action for agent 2
    actions = np.random.randn(num_agents, action_size) # placeholder array, overwritten below
    actions[0] = action1 # replace random action with agent 1 action
    actions[1] = action2 # replace random action with agent 2 action
        
    env_info = env.step(actions)[brain_name] # step in the environment
    next_states = env_info.vector_observations # get next state
    rewards = env_info.rewards # get rewards from taking actions in current state
    dones = env_info.local_done # check if done
    scores += env_info.rewards # sum rewards as score
    states = next_states # advance to the next state for the following step
    if np.any(dones): # exit if done episode
        break

score = np.max(scores) # score is largest of two agents
print("Final score: ", score)

Final score:  1.5000000223517418
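
The test loop above can be summarized as: pick one action per agent, step the environment, accumulate each agent's rewards, and report the larger total once any agent is done. The following is a minimal sketch of that pattern that runs without Unity. `run_episode`, `ToyEnv`, and the `max_steps` cap are hypothetical names introduced for illustration; `act_fn` and `step_fn` stand in for `agents.act` and `env.step`, whose real return types differ.

```python
import numpy as np

def run_episode(act_fn, step_fn, initial_states, max_steps=1000):
    """Roll out one episode and return the larger of the per-agent returns.

    act_fn(state) -> action vector for one agent (stand-in for agents.act)
    step_fn(actions) -> (next_states, rewards, dones) (stand-in for env.step)
    max_steps is an assumed safety cap, not something the notebook uses.
    """
    states = initial_states
    scores = np.zeros(len(states))
    for _ in range(max_steps):
        # One action per agent, stacked into shape (num_agents, action_size)
        actions = np.vstack([act_fn(s) for s in states])
        next_states, rewards, dones = step_fn(actions)
        scores += np.asarray(rewards, dtype=float)
        states = next_states
        if np.any(dones):
            break
    return float(np.max(scores))

class ToyEnv:
    """Toy stand-in: agent 0 earns 0.1 per step, episode ends after 3 steps."""
    def __init__(self):
        self.t = 0

    def step(self, actions):
        self.t += 1
        done = self.t >= 3
        return np.zeros((2, 24)), [0.1, 0.0], [done, done]

toy = ToyEnv()
final = run_episode(lambda s: np.zeros(2), toy.step, np.zeros((2, 24)))
print(round(final, 2))
```

Building the action array with `np.vstack` avoids creating a random array only to overwrite every row, which is what the cell above does.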


Finally, close the environment.

In [5]:
env.close()

### 3. Implementation Details

If you have any questions about the implementation details of this project, please refer to the **Report.pdf** file included with this repository for a full explanation of both the algorithms and the design decisions made.