# Collaboration and Competition

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [2]:
from unityagents import UnityEnvironment
import numpy as np
import random
import torch
from collections import deque
import time

import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

from maddpg_agent import Agent

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Tennis.app"`
- **Windows** (x86): `"path/to/Tennis_Windows_x86/Tennis.exe"`
- **Windows** (x86_64): `"path/to/Tennis_Windows_x86_64/Tennis.exe"`
- **Linux** (x86): `"path/to/Tennis_Linux/Tennis.x86"`
- **Linux** (x86_64): `"path/to/Tennis_Linux/Tennis.x86_64"`
- **Linux** (x86, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86"`
- **Linux** (x86_64, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86_64"`

For instance, if you are using a Mac, then you downloaded `Tennis.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Tennis.app")
```

In [3]:
env = UnityEnvironment(file_name="Tennis_Linux/Tennis.x86_64")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [4]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1.  If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01.  Thus, the goal of each agent is to keep the ball in play.

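Each agent receives its own local observation of 8 variables corresponding to the position and velocity of the ball and racket. The environment stacks 3 consecutive observations, so each agent's state vector has length 24. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.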

Run the code cell below to print some information about the environment.

In [5]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents 
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 2
Size of each action: 2
There are 2 agents. Each observes a state with length: 24
The state for the first agent looks like: [ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.         -6.65278625 -1.5
 -0.          0.          6.83172083  6.         -0.          0.        ]
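
The printed state has length 24 rather than 8 because the brain stacks 3 vector observations of 8 variables each (see the brain summary printed in section 1). The sketch below uses only NumPy and made-up numbers to illustrate that layout and how two per-agent states can be flattened into one joint vector. The names `frames`, `stacked`, and `joint` are illustrative, and the assumption that the newest observation sits in the last 8 entries is inferred from the printout above, where only the last 8 values are non-zero right after a reset.

```python
import numpy as np

OBS_SIZE = 8        # variables per observation (from the brain summary)
NUM_STACKED = 3     # stacked observations per agent (from the brain summary)
NUM_AGENTS = 2

# Made-up observations: one row per time step, oldest first.
frames = np.arange(NUM_STACKED * OBS_SIZE, dtype=float).reshape(NUM_STACKED, OBS_SIZE)

# Stacking concatenates the frames into a single per-agent state vector.
stacked = frames.reshape(-1)
print(stacked.shape)            # (24,)

# Assumed layout: the most recent frame occupies the last 8 entries.
latest = stacked[-OBS_SIZE:]

# Two per-agent states, shaped like env_info.vector_observations: (2, 24).
per_agent = np.stack([stacked, stacked + 100.0])

# Flatten both agents' states into one joint row vector: (1, 48).
joint = per_agent.reshape(1, NUM_AGENTS * stacked.size)
print(joint.shape)              # (1, 48)
```

In the joint vector, the first 24 entries belong to the first agent and the remaining 24 to the second.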


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agents and receive feedback from the environment.

Once this cell is executed, you can watch how the agents perform when they select actions at random at each time step.  A window should pop up that allows you to observe the agents.

Of course, as part of the project, you'll have to change the code so that the agents are able to use their experiences to gradually choose better actions when interacting with the environment!

In [None]:
for i in range(1, 6):                                      # play game for 5 episodes
    env_info = env.reset(train_mode=False)[brain_name]     # reset the environment    
    states = env_info.vector_observations                  # get the current state (for each agent)
    scores = np.zeros(num_agents)                          # initialize the score (for each agent)
    while True:
        actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
        actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
        env_info = env.step(actions)[brain_name]           # send all actions to the environment
        next_states = env_info.vector_observations         # get next state (for each agent)
        rewards = env_info.rewards                         # get reward (for each agent)
        dones = env_info.local_done                        # see if episode finished
        scores += env_info.rewards                         # update the score (for each agent)
        states = next_states                               # roll over states to next time step
        if np.any(dones):                                  # exit loop if episode finished
            break
    print('Score (max over agents) from episode {}: {}'.format(i, np.max(scores)))
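
The action-selection step in the loop above can be tried on its own, without a running environment. This minimal sketch draws standard-normal actions and clips them to the range the loop uses. The fixed seed and the variable names are illustrative choices, not part of the project code.

```python
import numpy as np

num_agents, action_size = 2, 2     # values printed in section 2

rng = np.random.default_rng(0)     # seeded only to make the sketch repeatable

# One row of actions per agent, drawn from a standard normal distribution.
raw_actions = rng.standard_normal((num_agents, action_size))

# Clip to [-1, 1], matching the np.clip call in the loop above.
actions = np.clip(raw_actions, -1, 1)

print(actions.shape)               # (2, 2)
```

Because roughly a third of standard-normal draws lie outside [-1, 1], clipping pushes a sizeable share of the random actions to the extremes of the range.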

When finished, you can close the environment with `env.close()`. This notebook does so in section 6, after training and testing.

### 4. Train the agent

Now it's your turn to train your own agents to solve the environment!  When training the agents, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

In [8]:
# MADDPG function

N_EPISODES = 2000
CONSEC_EPISODES = 100
PRINT_EVERY = 1
ADD_NOISE = True
SOLVED_SCORE = 0.5

# initialize agents
agent_0 = Agent(state_size, action_size, num_agents=1, random_seed=0)
agent_1 = Agent(state_size, action_size, num_agents=1, random_seed=0)

def maddpg(n_episodes=N_EPISODES, max_t=1000, train_mode=True):
    """Multi-Agent Deep Deterministic Policy Gradient (MADDPG)
    
    Params
    ======
        n_episodes (int)      : maximum number of training episodes
        max_t (int)           : maximum number of timesteps per episode
        train_mode (bool)     : if 'True' set environment to training mode

    """
    scores_window = deque(maxlen=CONSEC_EPISODES)
    scores_all = []
    moving_average = []
    best_score = -np.inf
    best_episode = 0
    already_solved = False    

    for i_episode in range(1, n_episodes+1):
        env_info = env.reset(train_mode=train_mode)[brain_name]         # reset the environment
        states = np.reshape(env_info.vector_observations, (1, num_agents * state_size)) # combine both agents' states into one joint row
        agent_0.reset()
        agent_1.reset()
        scores = np.zeros(num_agents)
        for t in range(max_t):
            actions = get_actions(states, ADD_NOISE)           # choose agent actions and combine them
            env_info = env.step(actions)[brain_name]           # send both agents' actions together to the environment
            next_states = np.reshape(env_info.vector_observations, (1, num_agents * state_size)) # combine the agents' next states
            rewards = env_info.rewards                         # get reward
            done = env_info.local_done                         # see if episode finished
            agent_0.step(states, actions, rewards[0], next_states, done, 0, t) # agent 0 learns
            agent_1.step(states, actions, rewards[1], next_states, done, 1, t) # agent 1 learns
            scores += np.max(rewards)                          # add this step's best reward to every agent's score
            states = next_states                               # roll over states to next time step
            if np.any(done):                                   # exit loop if episode finished
                break

        ep_best_score = np.max(scores)
        scores_window.append(ep_best_score)
        scores_all.append(ep_best_score)
        moving_average.append(np.mean(scores_window))

        # save best score                        
        if ep_best_score > best_score:
            best_score = ep_best_score
            best_episode = i_episode
        
        # print results
        if i_episode % PRINT_EVERY == 0:
            print('Episodes {:0>4d}-{:0>4d}\tMax Reward: {:.3f}\tMoving Average: {:.3f}'.format(
                i_episode-PRINT_EVERY, i_episode, np.max(scores_all[-PRINT_EVERY:]), moving_average[-1]))

        # determine if environment is solved and keep best performing models
        if moving_average[-1] >= SOLVED_SCORE:
            if not already_solved:
                print('<-- Environment solved in {:d} episodes!'
                      '\n<-- Moving Average: {:.3f} over past {:d} episodes'.format(
                    i_episode-CONSEC_EPISODES, moving_average[-1], CONSEC_EPISODES))
                already_solved = True
                # save weights
                torch.save(agent_0.actor_local.state_dict(), 'checkpoint_actor_1.pth')
                torch.save(agent_0.critic_local.state_dict(), 'checkpoint_critic_1.pth')
                torch.save(agent_1.actor_local.state_dict(), 'checkpoint_actor_2.pth')
                torch.save(agent_1.critic_local.state_dict(), 'checkpoint_critic_2.pth')
            elif ep_best_score >= best_score:
                print('<-- Best episode so far!'
                      '\nEpisode {:0>4d}\tMax Reward: {:.3f}\tMoving Average: {:.3f}'.format(
                    i_episode, ep_best_score, moving_average[-1]))
                # save weights
                torch.save(agent_0.actor_local.state_dict(), 'checkpoint_actor_1.pth')
                torch.save(agent_0.critic_local.state_dict(), 'checkpoint_critic_1.pth')
                torch.save(agent_1.actor_local.state_dict(), 'checkpoint_actor_2.pth')
                torch.save(agent_1.critic_local.state_dict(), 'checkpoint_critic_2.pth')
            elif (i_episode-best_episode) >= 200:
                # stop training if model stops converging
                print('<-- Training stopped. Best score not matched or exceeded for 200 episodes')
                break
            else:
                continue
            
    return scores_all, moving_average

def get_actions(states, add_noise):
    '''gets actions for each agent and then combines them into one array'''
    action_0 = agent_0.act(states, add_noise)    # agent 0 chooses an action
    action_1 = agent_1.act(states, add_noise)    # agent 1 chooses an action
    return np.concatenate((action_0, action_1), axis=0).flatten()
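
To see the shape that `get_actions` produces, the sketch below replaces the real agents with a hypothetical `StubAgent` whose `act` method returns a `(1, action_size)` array. That return shape is an assumption about `maddpg_agent.Agent`, which is not shown in this notebook. The concatenation itself mirrors the function above.

```python
import numpy as np

class StubAgent:
    """Hypothetical stand-in for maddpg_agent.Agent (not the real class)."""

    def __init__(self, action_size, fill):
        self.action_size = action_size
        self.fill = fill

    def act(self, states, add_noise):
        # Assumed return shape: one row holding this agent's action vector.
        return np.full((1, self.action_size), self.fill)

stub_0 = StubAgent(action_size=2, fill=0.5)
stub_1 = StubAgent(action_size=2, fill=-0.5)

joint_state = np.zeros((1, 48))    # joint state, as built in the training loop

# Same combination as get_actions: stack the rows, then flatten.
combined = np.concatenate(
    (stub_0.act(joint_state, False), stub_1.act(joint_state, False)), axis=0
).flatten()

print(combined.shape)              # (4,)
```

Under this assumption, the first `action_size` entries of the flattened array belong to agent 0 and the rest to agent 1.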

In [None]:
scores, avgs = maddpg()

# plot the scores
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(len(scores)), scores, label='MADDPG')
plt.plot(np.arange(len(scores)), avgs, c='r', label='moving avg')
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.legend(loc='upper left');
plt.show()

Episodes 0000-0001	Max Reward: 0.000	Moving Average: 0.000
Episodes 0001-0002	Max Reward: 0.200	Moving Average: 0.100
Episodes 0002-0003	Max Reward: 0.000	Moving Average: 0.067
Episodes 0003-0004	Max Reward: 0.000	Moving Average: 0.050
Episodes 0004-0005	Max Reward: 0.000	Moving Average: 0.040
Episodes 0005-0006	Max Reward: 0.300	Moving Average: 0.083
Episodes 0006-0007	Max Reward: 0.000	Moving Average: 0.071
Episodes 0007-0008	Max Reward: 0.000	Moving Average: 0.063
Episodes 0008-0009	Max Reward: 0.000	Moving Average: 0.056
Episodes 0009-0010	Max Reward: 0.000	Moving Average: 0.050
Episodes 0010-0011	Max Reward: 0.000	Moving Average: 0.045
Episodes 0011-0012	Max Reward: 0.000	Moving Average: 0.042
Episodes 0012-0013	Max Reward: 0.000	Moving Average: 0.038
Episodes 0013-0014	Max Reward: 0.000	Moving Average: 0.036
Episodes 0014-0015	Max Reward: 0.000	Moving Average: 0.033
Episodes 0015-0016	Max Reward: 0.000	Moving Average: 0.031
Episodes 0016-0017	Max Reward: 0.000	Moving Average: 0.0

Episodes 0139-0140	Max Reward: 0.000	Moving Average: 0.009
Episodes 0140-0141	Max Reward: 0.000	Moving Average: 0.009
Episodes 0141-0142	Max Reward: 0.000	Moving Average: 0.009
Episodes 0142-0143	Max Reward: 0.000	Moving Average: 0.009
Episodes 0143-0144	Max Reward: 0.000	Moving Average: 0.009
Episodes 0144-0145	Max Reward: 0.100	Moving Average: 0.010
Episodes 0145-0146	Max Reward: 0.000	Moving Average: 0.010
Episodes 0146-0147	Max Reward: 0.100	Moving Average: 0.011
Episodes 0147-0148	Max Reward: 0.000	Moving Average: 0.011
Episodes 0148-0149	Max Reward: 0.000	Moving Average: 0.011
Episodes 0149-0150	Max Reward: 0.000	Moving Average: 0.011
Episodes 0150-0151	Max Reward: 0.100	Moving Average: 0.012
Episodes 0151-0152	Max Reward: 0.000	Moving Average: 0.012
Episodes 0152-0153	Max Reward: 0.000	Moving Average: 0.012
Episodes 0153-0154	Max Reward: 0.000	Moving Average: 0.012
Episodes 0154-0155	Max Reward: 0.100	Moving Average: 0.013
Episodes 0155-0156	Max Reward: 0.000	Moving Average: 0.0

Episodes 0278-0279	Max Reward: 0.200	Moving Average: 0.054
Episodes 0279-0280	Max Reward: 0.100	Moving Average: 0.055
Episodes 0280-0281	Max Reward: 0.300	Moving Average: 0.058
Episodes 0281-0282	Max Reward: 0.300	Moving Average: 0.061
Episodes 0282-0283	Max Reward: 0.100	Moving Average: 0.060
Episodes 0283-0284	Max Reward: 0.000	Moving Average: 0.059
Episodes 0284-0285	Max Reward: 0.000	Moving Average: 0.058
Episodes 0285-0286	Max Reward: 0.000	Moving Average: 0.058
Episodes 0286-0287	Max Reward: 0.000	Moving Average: 0.057
Episodes 0287-0288	Max Reward: 0.000	Moving Average: 0.057
Episodes 0288-0289	Max Reward: 0.000	Moving Average: 0.055
Episodes 0289-0290	Max Reward: 0.000	Moving Average: 0.055
Episodes 0290-0291	Max Reward: 0.000	Moving Average: 0.054
Episodes 0291-0292	Max Reward: 0.100	Moving Average: 0.054
Episodes 0292-0293	Max Reward: 0.000	Moving Average: 0.053
Episodes 0293-0294	Max Reward: 0.000	Moving Average: 0.053
Episodes 0294-0295	Max Reward: 0.000	Moving Average: 0.0

Episodes 0417-0418	Max Reward: 0.000	Moving Average: 0.065
Episodes 0418-0419	Max Reward: 0.000	Moving Average: 0.064
Episodes 0419-0420	Max Reward: 0.000	Moving Average: 0.062
Episodes 0420-0421	Max Reward: 0.000	Moving Average: 0.062
Episodes 0421-0422	Max Reward: 0.000	Moving Average: 0.062
Episodes 0422-0423	Max Reward: 0.000	Moving Average: 0.060
Episodes 0423-0424	Max Reward: 0.000	Moving Average: 0.060
Episodes 0424-0425	Max Reward: 0.000	Moving Average: 0.059
Episodes 0425-0426	Max Reward: 0.000	Moving Average: 0.057
Episodes 0426-0427	Max Reward: 0.000	Moving Average: 0.055
Episodes 0427-0428	Max Reward: 0.000	Moving Average: 0.053
Episodes 0428-0429	Max Reward: 0.000	Moving Average: 0.053
Episodes 0429-0430	Max Reward: 0.000	Moving Average: 0.052
Episodes 0430-0431	Max Reward: 0.100	Moving Average: 0.049
Episodes 0431-0432	Max Reward: 0.000	Moving Average: 0.048
Episodes 0432-0433	Max Reward: 0.000	Moving Average: 0.047
Episodes 0433-0434	Max Reward: 0.100	Moving Average: 0.0

Episodes 0556-0557	Max Reward: 0.100	Moving Average: 0.044
Episodes 0557-0558	Max Reward: 0.000	Moving Average: 0.044
Episodes 0558-0559	Max Reward: 0.100	Moving Average: 0.045
Episodes 0559-0560	Max Reward: 0.400	Moving Average: 0.049
Episodes 0560-0561	Max Reward: 0.100	Moving Average: 0.050
Episodes 0561-0562	Max Reward: 0.000	Moving Average: 0.050
Episodes 0562-0563	Max Reward: 0.000	Moving Average: 0.050
Episodes 0563-0564	Max Reward: 0.000	Moving Average: 0.050
Episodes 0564-0565	Max Reward: 0.000	Moving Average: 0.049
Episodes 0565-0566	Max Reward: 0.200	Moving Average: 0.051
Episodes 0566-0567	Max Reward: 0.400	Moving Average: 0.055
Episodes 0567-0568	Max Reward: 0.100	Moving Average: 0.056
Episodes 0568-0569	Max Reward: 0.200	Moving Average: 0.058
Episodes 0569-0570	Max Reward: 0.000	Moving Average: 0.058
Episodes 0570-0571	Max Reward: 0.100	Moving Average: 0.059
Episodes 0571-0572	Max Reward: 0.000	Moving Average: 0.059
Episodes 0572-0573	Max Reward: 0.100	Moving Average: 0.0

Episodes 0695-0696	Max Reward: 0.100	Moving Average: 0.095
Episodes 0696-0697	Max Reward: 0.000	Moving Average: 0.095
Episodes 0697-0698	Max Reward: 0.000	Moving Average: 0.095
Episodes 0698-0699	Max Reward: 0.000	Moving Average: 0.095
Episodes 0699-0700	Max Reward: 0.100	Moving Average: 0.096
Episodes 0700-0701	Max Reward: 0.100	Moving Average: 0.095
Episodes 0701-0702	Max Reward: 0.100	Moving Average: 0.095
Episodes 0702-0703	Max Reward: 0.100	Moving Average: 0.096
Episodes 0703-0704	Max Reward: 0.100	Moving Average: 0.096
Episodes 0704-0705	Max Reward: 0.100	Moving Average: 0.095
Episodes 0705-0706	Max Reward: 0.200	Moving Average: 0.096
Episodes 0706-0707	Max Reward: 0.100	Moving Average: 0.097
Episodes 0707-0708	Max Reward: 0.100	Moving Average: 0.097
Episodes 0708-0709	Max Reward: 0.200	Moving Average: 0.098
Episodes 0709-0710	Max Reward: 0.100	Moving Average: 0.099
Episodes 0710-0711	Max Reward: 0.100	Moving Average: 0.100
Episodes 0711-0712	Max Reward: 0.200	Moving Average: 0.1

Episodes 0834-0835	Max Reward: 0.100	Moving Average: 0.073
Episodes 0835-0836	Max Reward: 0.200	Moving Average: 0.074
Episodes 0836-0837	Max Reward: 0.000	Moving Average: 0.073
Episodes 0837-0838	Max Reward: 0.000	Moving Average: 0.071
Episodes 0838-0839	Max Reward: 0.100	Moving Average: 0.071
Episodes 0839-0840	Max Reward: 0.100	Moving Average: 0.072
Episodes 0840-0841	Max Reward: 0.100	Moving Average: 0.071
Episodes 0841-0842	Max Reward: 0.000	Moving Average: 0.070
Episodes 0842-0843	Max Reward: 0.000	Moving Average: 0.070
Episodes 0843-0844	Max Reward: 0.000	Moving Average: 0.070
Episodes 0844-0845	Max Reward: 0.100	Moving Average: 0.070
Episodes 0845-0846	Max Reward: 0.000	Moving Average: 0.069
Episodes 0846-0847	Max Reward: 0.200	Moving Average: 0.070
Episodes 0847-0848	Max Reward: 0.100	Moving Average: 0.070
Episodes 0848-0849	Max Reward: 0.000	Moving Average: 0.069
Episodes 0849-0850	Max Reward: 0.300	Moving Average: 0.071
Episodes 0850-0851	Max Reward: 0.100	Moving Average: 0.0

Episodes 0973-0974	Max Reward: 0.300	Moving Average: 0.114
Episodes 0974-0975	Max Reward: 0.200	Moving Average: 0.115
Episodes 0975-0976	Max Reward: 0.100	Moving Average: 0.114
Episodes 0976-0977	Max Reward: 0.100	Moving Average: 0.114
Episodes 0977-0978	Max Reward: 0.100	Moving Average: 0.114
Episodes 0978-0979	Max Reward: 0.100	Moving Average: 0.114
Episodes 0979-0980	Max Reward: 0.300	Moving Average: 0.117
Episodes 0980-0981	Max Reward: 0.100	Moving Average: 0.117
Episodes 0981-0982	Max Reward: 0.300	Moving Average: 0.118
Episodes 0982-0983	Max Reward: 0.100	Moving Average: 0.119
Episodes 0983-0984	Max Reward: 0.100	Moving Average: 0.120
Episodes 0984-0985	Max Reward: 0.100	Moving Average: 0.121
Episodes 0985-0986	Max Reward: 0.100	Moving Average: 0.120
Episodes 0986-0987	Max Reward: 0.000	Moving Average: 0.119
Episodes 0987-0988	Max Reward: 0.100	Moving Average: 0.120
Episodes 0988-0989	Max Reward: 0.200	Moving Average: 0.122
Episodes 0989-0990	Max Reward: 0.100	Moving Average: 0.1

Episodes 1112-1113	Max Reward: 0.400	Moving Average: 0.259
Episodes 1113-1114	Max Reward: 0.000	Moving Average: 0.255
Episodes 1114-1115	Max Reward: 1.200	Moving Average: 0.266
Episodes 1115-1116	Max Reward: 0.100	Moving Average: 0.267
Episodes 1116-1117	Max Reward: 0.300	Moving Average: 0.268
Episodes 1117-1118	Max Reward: 2.700	Moving Average: 0.293
Episodes 1118-1119	Max Reward: 0.500	Moving Average: 0.297
Episodes 1119-1120	Max Reward: 0.300	Moving Average: 0.296
Episodes 1120-1121	Max Reward: 1.000	Moving Average: 0.306
Episodes 1121-1122	Max Reward: 2.500	Moving Average: 0.330
Episodes 1122-1123	Max Reward: 0.100	Moving Average: 0.327
Episodes 1123-1124	Max Reward: 0.000	Moving Average: 0.327
Episodes 1124-1125	Max Reward: 0.000	Moving Average: 0.327
Episodes 1125-1126	Max Reward: 0.000	Moving Average: 0.323
Episodes 1126-1127	Max Reward: 0.000	Moving Average: 0.321
Episodes 1127-1128	Max Reward: 0.100	Moving Average: 0.322
Episodes 1128-1129	Max Reward: 0.000	Moving Average: 0.3

Episodes 1251-1252	Max Reward: 0.100	Moving Average: 0.375
Episodes 1252-1253	Max Reward: 0.200	Moving Average: 0.374
Episodes 1253-1254	Max Reward: 0.100	Moving Average: 0.373
Episodes 1254-1255	Max Reward: 0.100	Moving Average: 0.373
Episodes 1255-1256	Max Reward: 0.100	Moving Average: 0.371
Episodes 1256-1257	Max Reward: 0.100	Moving Average: 0.367
Episodes 1257-1258	Max Reward: 1.300	Moving Average: 0.371
Episodes 1258-1259	Max Reward: 0.500	Moving Average: 0.374
Episodes 1259-1260	Max Reward: 0.100	Moving Average: 0.372
Episodes 1260-1261	Max Reward: 0.700	Moving Average: 0.376
Episodes 1261-1262	Max Reward: 0.000	Moving Average: 0.375
Episodes 1262-1263	Max Reward: 0.500	Moving Average: 0.379
Episodes 1263-1264	Max Reward: 0.300	Moving Average: 0.379
Episodes 1264-1265	Max Reward: 1.700	Moving Average: 0.396
Episodes 1265-1266	Max Reward: 0.100	Moving Average: 0.395
Episodes 1266-1267	Max Reward: 0.000	Moving Average: 0.395
Episodes 1267-1268	Max Reward: 0.300	Moving Average: 0.3

Episodes 1384-1385	Max Reward: 0.200	Moving Average: 0.930
Episodes 1385-1386	Max Reward: 0.100	Moving Average: 0.930
Episodes 1386-1387	Max Reward: 0.200	Moving Average: 0.931
Episodes 1387-1388	Max Reward: 0.700	Moving Average: 0.937
Episodes 1388-1389	Max Reward: 0.500	Moving Average: 0.941
Episodes 1389-1390	Max Reward: 2.500	Moving Average: 0.965
Episodes 1390-1391	Max Reward: 0.100	Moving Average: 0.964
Episodes 1391-1392	Max Reward: 0.300	Moving Average: 0.964
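
The `Moving Average` column in the log is the mean of the most recent episode scores, up to `CONSEC_EPISODES` of them, and the environment counts as solved once that mean reaches `SOLVED_SCORE`. The sketch below reproduces that bookkeeping on made-up scores with a short window. `first_solved_episode` is a hypothetical helper, not part of the notebook.

```python
from collections import deque

import numpy as np

def first_solved_episode(episode_scores, window, solved_score):
    """Return the 1-based episode at which the windowed mean first reaches
    solved_score, or None if it never does."""
    scores_window = deque(maxlen=window)    # keeps only the newest `window` scores
    for i_episode, score in enumerate(episode_scores, start=1):
        scores_window.append(score)
        if np.mean(scores_window) >= solved_score:
            return i_episode
    return None

# Made-up scores for illustration only.
toy_scores = [0.0, 0.2, 0.0, 0.9, 1.0, 0.1]
print(first_solved_episode(toy_scores, window=3, solved_score=0.5))   # 5
```

As in the notebook, the mean is taken over fewer than `window` scores during the first episodes, which is why the second line of the log reports 0.100 after scores of 0.0 and 0.2.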


### 5. Watch the trained agents

In [None]:
N_EPISODES = 10
PRINT_EVERY = 1
CONSEC_EPISODES = 10
ADD_NOISE = False

In [None]:
## reinitialize the agents (if needed)
agent_0 = Agent(state_size, action_size, num_agents=1, random_seed=0)
agent_1 = Agent(state_size, action_size, num_agents=1, random_seed=0)

# load the weights from file
agent_0_weights = 'checkpoint_actor_1.pth'
agent_1_weights = 'checkpoint_actor_2.pth'
agent_0.actor_local.load_state_dict(torch.load(agent_0_weights))
agent_1.actor_local.load_state_dict(torch.load(agent_1_weights))   

In [None]:
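def test(n_episodes=N_EPISODES, max_t=1000, train_mode=False):
    """Run the trained agents and record their scores.

    Params
    ======
        n_episodes (int)      : number of test episodes
        max_t (int)           : unused; each episode runs until the environment reports done
        train_mode (bool)     : if 'True' set environment to training mode

    """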
    scores_window = deque(maxlen=CONSEC_EPISODES)
    scores_all = []
    moving_average = []  

    for i_episode in range(1, n_episodes+1):
        env_info = env.reset(train_mode=train_mode)[brain_name]         # reset the environment
        states = np.reshape(env_info.vector_observations, (1, num_agents * state_size)) # combine both agents' states into one joint row
        scores = np.zeros(num_agents)
        while True:
            actions = get_actions(states, ADD_NOISE)           # choose agent actions and combine them
            env_info = env.step(actions)[brain_name]           # send both agents' actions together to the environment
            next_states = np.reshape(env_info.vector_observations, (1, num_agents * state_size)) # combine the agents' next states
            rewards = env_info.rewards                         # get reward
            done = env_info.local_done                         # see if episode finished
            scores += np.max(rewards)                          # add this step's best reward to every agent's score
            states = next_states                               # roll over states to next time step
            if np.any(done):                                   # exit loop if episode finished
                break

        ep_best_score = np.max(scores)
        scores_window.append(ep_best_score)
        scores_all.append(ep_best_score)
        moving_average.append(np.mean(scores_window))

        # print results
        if i_episode % PRINT_EVERY == 0:
            print('Episodes {:0>4d}-{:0>4d}\tMax Reward: {:.3f}\tMoving Average: {:.3f}'.format(
                i_episode-PRINT_EVERY, i_episode, np.max(scores_all[-PRINT_EVERY:]), moving_average[-1]))
            
    return scores_all, moving_average  
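
Both `maddpg` and `test` add `np.max(rewards)` to the running score at every step. This can differ from summing each agent's rewards over the episode and then taking the maximum, which is what the random-action loop in section 3 does. The sketch below compares the two rules on made-up rewards; the function names are hypothetical.

```python
import numpy as np

def score_stepwise_max(step_rewards):
    """Add the larger of the two rewards at every step (rule used in maddpg/test)."""
    return float(sum(np.max(r) for r in step_rewards))

def score_max_of_sums(step_rewards):
    """Sum rewards per agent over the episode, then take the max (rule in section 3)."""
    return float(np.max(np.sum(step_rewards, axis=0)))

# Made-up rewards: one row per step, one column per agent.
toy_rewards = np.array([
    [0.1, 0.0],
    [0.0, 0.1],
    [0.1, 0.0],
    [0.0, -0.01],
])

print(round(score_stepwise_max(toy_rewards), 2))   # 0.3
print(round(score_max_of_sums(toy_rewards), 2))    # 0.2
```

The stepwise rule is never smaller than the max-of-sums rule, so scores logged by `maddpg` and `test` may be higher than scores computed the section 3 way for the same episode.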

In [None]:
scores, avgs = test()

# plot the scores
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(np.arange(len(scores)), scores, label='MADDPG')
plt.plot(np.arange(len(scores)), avgs, c='r', label='moving avg')
plt.ylabel('Score')
plt.xlabel('Episode #')
plt.legend(loc='upper left');
plt.show()

### 6. Close the environment

In [None]:
env.close()