# Collaboration and Competition

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Tennis.app"`
- **Windows** (x86): `"path/to/Tennis_Windows_x86/Tennis.exe"`
- **Windows** (x86_64): `"path/to/Tennis_Windows_x86_64/Tennis.exe"`
- **Linux** (x86): `"path/to/Tennis_Linux/Tennis.x86"`
- **Linux** (x86_64): `"path/to/Tennis_Linux/Tennis.x86_64"`
- **Linux** (x86, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86"`
- **Linux** (x86_64, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86_64"`

For instance, if you are using a Mac, then you downloaded `Tennis.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Tennis.app")
```
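If the same notebook is run on several machines, the path can be chosen by platform instead of edited by hand. The following is a minimal sketch, not part of the project code: `default_env_path` is a hypothetical helper, and the returned strings are the placeholder paths from the list above, so `path/to/` still has to be replaced with the real download location.

```python
import platform
import sys


def default_env_path(system=None, is_64bit=None, headless=False):
    """Return the placeholder Tennis build path for a platform (hypothetical helper)."""
    system = system or platform.system()        # 'Darwin', 'Windows' or 'Linux'
    if is_64bit is None:
        is_64bit = sys.maxsize > 2**32          # True when running a 64-bit Python
    if system == "Darwin":
        return "path/to/Tennis.app"
    if system == "Windows":
        folder = "Tennis_Windows_x86_64" if is_64bit else "Tennis_Windows_x86"
        return "path/to/{}/Tennis.exe".format(folder)
    if system == "Linux":
        folder = "Tennis_Linux_NoVis" if headless else "Tennis_Linux"
        suffix = "x86_64" if is_64bit else "x86"
        return "path/to/{}/Tennis.{}".format(folder, suffix)
    raise ValueError("No Tennis build listed for platform: {}".format(system))


# Explicit arguments make the lookup reproducible on any machine.
print(default_env_path("Windows", is_64bit=True))
print(default_env_path("Linux", is_64bit=True, headless=True))
```

The result would then be passed as `file_name` to `UnityEnvironment`.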

In [2]:
env_file_name = "Tennis_Windows_x86_64/Tennis.exe"
# env = UnityEnvironment(file_name=env_file_name)
env = UnityEnvironment(file_name=env_file_name,no_graphics=True)

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1.  If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01.  Thus, the goal of each agent is to keep the ball in play.
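To make the reward scheme concrete, here is a small worked example. The per-step rewards are invented for illustration (they do not come from a real episode), and taking the maximum over agents as the episode score follows the print statement used later in this notebook.

```python
import numpy as np

HIT_REWARD = 0.1      # ball hit over the net
MISS_REWARD = -0.01   # ball dropped or hit out of bounds

# Hypothetical per-step rewards for (agent 0, agent 1) in one short episode.
step_rewards = [
    [0.0, HIT_REWARD],
    [HIT_REWARD, 0.0],
    [0.0, HIT_REWARD],
    [0.0, MISS_REWARD],
]

scores = np.zeros(2)              # running score for each agent
for rewards in step_rewards:
    scores += rewards             # accumulate this step's rewards

episode_score = np.max(scores)    # score reported for the episode (max over agents)
print(scores, episode_score)
```

Here agent 1 ends with two hits and one miss, so its score is roughly 0.19, while agent 0 ends with a single hit.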

The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. The environment stacks 3 consecutive observations, so each agent receives a state vector of length 24. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping. 

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents 
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])
print('states shape : ',states.shape)
print('Both states look like : ',states)
print(2*states)

Number of agents: 2
Size of each action: 2
There are 2 agents. Each observes a state with length: 24
The state for the first agent looks like: [ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.         -6.65278625 -1.5
 -0.          0.          6.83172083  6.         -0.          0.        ]
states shape :  (2, 24)
Both states look like :  [[ 0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.         -6.65278625 -1.5
  -0.          0.          6.83172083  6.         -0.          0.        ]
 [ 0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.         -6.4669857  -1.5
   0.          0.         -6.83172083  6.          0.          0.
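The printed states have length 24 because the brain stacks 3 observations of 8 variables each. The sketch below splits one agent's state back into its frames. It assumes the frames are stored contiguously with the most recent one last, which is consistent with the initial state above (16 zeros followed by 8 values) but is not confirmed by the environment documentation quoted here. `split_frames` is a hypothetical helper.

```python
import numpy as np

OBS_SIZE = 8       # variables per observation (from the brain summary)
NUM_STACKED = 3    # stacked observations per agent


def split_frames(state):
    """Reshape one agent's flat state into (NUM_STACKED, OBS_SIZE) frames."""
    return np.asarray(state).reshape(NUM_STACKED, OBS_SIZE)


# First agent's initial state as printed above: two empty frames, then the current one.
state = np.zeros(OBS_SIZE * NUM_STACKED)
state[16:] = [-6.65278625, -1.5, -0.0, 0.0, 6.83172083, 6.0, -0.0, 0.0]

frames = split_frames(state)
print(frames.shape)    # (3, 8)
print(frames[-1])      # assumed to be the most recent observation
```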

### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agents and receive feedback from the environment.

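Once this cell is executed, you can watch how the agents perform when they select actions at random at each time step.  A window should pop up that allows you to observe the agents, unless the environment was started with `no_graphics=True`, as in the cell above.  Note that the loop below is wrapped in `if False:`, so it only runs if you change that guard.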

Of course, as part of the project, you'll have to change the code so that the agents are able to use their experiences to gradually choose better actions when interacting with the environment!
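One common way to let agents learn from their experiences is to store transitions in a replay buffer and train on random mini-batches. The following is a minimal sketch under stated assumptions: `ReplayBuffer` and `Experience` are hypothetical names, this is not the implementation inside the `maddpg` module used later, and the transitions here are random placeholder data shaped like this environment (2 agents, state size 24, action size 2).

```python
import random
from collections import deque, namedtuple

import numpy as np

Experience = namedtuple(
    "Experience", ["states", "actions", "rewards", "next_states", "dones"])


class ReplayBuffer:
    """Fixed-size store of joint transitions with uniform random sampling."""

    def __init__(self, buffer_size, batch_size, seed=None):
        self.memory = deque(maxlen=buffer_size)   # oldest entries are dropped first
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, states, actions, rewards, next_states, dones):
        self.memory.append(Experience(states, actions, rewards, next_states, dones))

    def sample(self):
        """Return one array per field, each with a leading batch dimension."""
        batch = self.rng.sample(list(self.memory), self.batch_size)
        return tuple(np.stack([getattr(e, field) for e in batch])
                     for field in Experience._fields)

    def __len__(self):
        return len(self.memory)


# Fill the buffer with placeholder transitions for 2 agents.
buffer = ReplayBuffer(buffer_size=100, batch_size=4, seed=0)
for _ in range(150):
    buffer.add(states=np.random.randn(2, 24),
               actions=np.clip(np.random.randn(2, 2), -1, 1),
               rewards=np.zeros(2),
               next_states=np.random.randn(2, 24),
               dones=np.array([False, False]))

states, actions, rewards, next_states, dones = buffer.sample()
print(len(buffer), states.shape, actions.shape)
```

Storing the joint transition of both agents in a single entry keeps their experiences aligned in time, which is convenient when a critic needs to see both agents' states and actions.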

In [5]:
if False:
    total_scores = []
    for i in range(100):                                        # play game for 100 episodes
        env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
        states = env_info.vector_observations                  # get the current state (for each agent)
        scores = np.zeros(num_agents)                          # initialize the score (for each agent)
        t = 0
        while True:
            actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
            actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
            # print('actions : ',actions)
            env_info = env.step(actions)[brain_name]           # send all actions to the environment
            t += 1
            next_states = env_info.vector_observations         # get next state (for each agent)
            rewards = env_info.rewards                         # get reward (for each agent)
            dones = env_info.local_done                        # see if episode finished
            scores += env_info.rewards                         # update the score (for each agent)
            states = next_states                               # roll over states to next time step
            if np.any(dones):                                  # exit loop if episode finished
                break
        print('Score (max over agents) from episode {}: {}, and {} steps taken'.format(i, np.max(scores),t))
        print(scores)
        total_scores.append(scores)
    print('Average Random Score : ', np.mean(total_scores))
        
def plot_results(results):
    """Plot rewards, critic loss and actor loss from a training run."""
    import matplotlib.pyplot as plt
    plt.ion()

    # total reward per episode (summed over agents) and the logged average reward
    plt.figure()
    plt.plot(np.arange(len(results.all_rewards)), [np.sum(ar) for ar in results.all_rewards])
    plt.plot(np.arange(len(results.avg_rewards)), results.avg_rewards)
    plt.ylabel('Rewards')
    plt.xlabel('Episode #')
    plt.show()

    # critic loss per learning step
    plt.figure()
    plt.plot(np.arange(len(results.critic_loss)), results.critic_loss)
    plt.ylabel('critic_losses')
    plt.xlabel('Learn Step #')
    plt.show()

    # actor loss per learning step
    plt.figure()
    plt.plot(np.arange(len(results.actor_loss)), results.actor_loss)
    plt.ylabel('actor_losses')
    plt.xlabel('Learn Step #')
    plt.show()

# env.close()

When finished, you can close the environment with `env.close()`. The training cell below does this as its last step.

In [None]:
from maddpg import maddpg
import cProfile
DoProfile = False

config = {
    'gamma'               : 0.99,
    'tau'                 : 0.01,
    'action_size'         : action_size,
    'state_size'          : state_size,
    'hidden_size'         : 256,
    'buffer_size'         : 50000,
    'batch_size'          : 256,
    'seed'                : 40,
    'max_episodes'        : 1000,
    'dropout'             : 0.01,      # currently not active
    'learn_every'         : 1,
    'learn_num'           : 2,
    'critic_learning_rate': 1e-3,
    'actor_learning_rate' : 1e-3,
    'noise_decay'         : 0.999,
    'sigma'               : 0.2,
    'num_agents'          : num_agents,
    'env_file_name'       : env_file_name,
    'load_model'          : True,
    'save_model'          : True,
    'train_mode'          : True,
    'brain_name'          : brain_name}

def print_config(config):
    """Print each configuration parameter as an aligned `name : value` line."""
    print('Config Parameters    : ')
    for name, value in config.items():
        print('{:20s} : {}'.format(name, value))

config_list = []
result_list = []
var_range = []
# batch = [512,1024]
# nd = [0.999, 0.998]
# for l in learn:
    # for b in batch:
        # var_range.append([l,b])
        # for h in hidden:
            # for n in nd:
var_range = [0.05] #, 0.1, 0.15, 0.2, 0.4, 0.6] # , 0.999975] # , 0.6, 0.7, 0.8] #, 0.45, 0.5]
# var_range = [0.9998, 0.9999, 0.99995] # , 0.0003, 0.0005, 0.001]# [0.2, 0.25, 0.3]
selected_seeds = [1,2,3,4,5] # [31,36,43,44] # 24,26, 33] # [8,16] # [7,9,13,15]
# [8,16] for learn2,sig6, hidden256, batch 256, ind
# [41,43,48,50,54,55,57,62,64,67,77,86]
# num_runs = 50
for param in range(len(var_range)):
    alt_config = config.copy()
    # alt_config['sigma'] = var_range[param]
    # alt_config['noise_decay'] = var_range[param]
    # alt_config['noise_scale_trigger'] = var_range[param]
    # alt_config['actor_learning_rate'] = var_range[param]
    # alt_config['learn_every_low'] = var_range[param][0]
    num_runs = len(selected_seeds)
    for main in range(num_runs):#len(tau_range)):
        print('-------------------------------------')
        print('New Run :')
        print('-------------------------------------')
        # env = UnityEnvironment(file_name=env_file_name,no_graphics=True)
        # brain_name = env.brain_names[0]
        # brain = env.brains[brain_name]
        # alt_config['seed'] += 1
        alt_config['seed'] = selected_seeds[main]
        print_config(alt_config)
        config_list.append(alt_config.copy())
        alt_config['brain_name'] = brain_name
        agent = maddpg(env, alt_config)
        if DoProfile:
            cProfile.run("results = agent.train()", 'PerfStats')
        else:
            results = agent.train()
        result_list.append(results)
        # all_rewards,avg_rewards,critic_losses,actor_losses = agent.train()
        print_config(alt_config)
        plot_results(results)
        
print('-------------------------------------')
print('-------------------------------------')
print('Summary :')
print('-------------------------------------')
print('-------------------------------------')
for param in range(len(var_range)):
    for main in range(num_runs):
        print_config(config_list[param*num_runs+main])
        plot_results(result_list[param*num_runs+main])
    
env.close()

-------------------------------------
New Run :
-------------------------------------
Config Parameters    : 
gamma                : 0.99
tau                  : 0.01
action_size          : 2
state_size           : 24
hidden_size          : 256
buffer_size          : 50000
batch_size           : 256
seed                 : 1
max_episodes         : 1000
dropout              : 0.01
learn_every          : 1
learn_num            : 2
critic_learning_rate : 0.001
actor_learning_rate  : 0.001
noise_decay          : 0.999
sigma                : 0.2
num_agents           : 2
env_file_name        : Tennis_Windows_x86_64/Tennis.exe
load_model           : True
save_model           : True
train_mode           : True
brain_name           : TennisBrain


episode: 0/1000   0% ETA:  --:--:-- |                                        | 

Memory loaded with length :  46779
Running on device :  cpu




Episode 0 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.999 || 0.384 seconds, mem : 46794

episode: 3/1000   0% ETA:  0:06:59 |                                         | 

Double Hit
Episode 2 with 61 steps || Reward : [0.1  0.19] || avg reward :  0.063 || Noise  0.997 || 0.344 seconds, mem : 46893
Episode 4 with 29 steps || Reward : [0.   0.09] || avg reward :  0.056 || Noise  0.995 || 0.249 seconds, mem : 46945

episode: 7/1000   0% ETA:  0:05:37 |                                         | 

Double Hit
Episode 5 with 80 steps || Reward : [0.2  0.09] || avg reward :  0.080 || Noise  0.994 || 0.481 seconds, mem : 47025
Episode 7 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.073 || Noise  0.992 || 0.279 seconds, mem : 47072
Episode 8 with 30 steps || Reward : [0.   0.09] || avg reward :  0.074 || Noise  0.991 || 0.305 seconds, mem : 47102
Episode 9 with 29 steps || Reward : [0.   0.09] || avg reward :  0.076 || Noise  0.990 || 0.235 seconds, mem : 47131

episode: 11/1000   1% ETA:  0:05:10 |                                        | 

Episode 10 with 31 steps || Reward : [0.   0.09] || avg reward :  0.077 || Noise  0.989 || 0.250 seconds, mem : 47162
Episode 11 with 30 steps || Reward : [0.   0.09] || avg reward :  0.078 || Noise  0.988 || 0.281 seconds, mem : 47192
Episode 12 with 53 steps || Reward : [0.1  0.09] || avg reward :  0.080 || Noise  0.987 || 0.350 seconds, mem : 47245
Episode 13 with 30 steps || Reward : [-0.01  0.1 ] || avg reward :  0.081 || Noise  0.986 || 0.248 seconds, mem : 47275

episode: 15/1000   1% ETA:  0:04:59 |                                        | 

Episode 14 with 25 steps || Reward : [-0.01  0.1 ] || avg reward :  0.083 || Noise  0.985 || 0.218 seconds, mem : 47300
Double Hit
Episode 16 with 88 steps || Reward : [0.2  0.19] || avg reward :  0.085 || Noise  0.983 || 0.478 seconds, mem : 47402

episode: 18/1000   1% ETA:  0:05:12 |                                        | 

Double Hit
Episode 17 with 89 steps || Reward : [0.2  0.19] || avg reward :  0.091 || Noise  0.982 || 0.488 seconds, mem : 47491
Episode 18 with 52 steps || Reward : [0.09 0.1 ] || avg reward :  0.092 || Noise  0.981 || 0.342 seconds, mem : 47543
Double Hit
Episode 19 with 75 steps || Reward : [0.2  0.09] || avg reward :  0.097 || Noise  0.980 || 0.397 seconds, mem : 47618
Episode 20 with 24 steps || Reward : [-0.01  0.1 ] || avg reward :  0.097 || Noise  0.979 || 0.218 seconds, mem : 47642

episode: 22/1000   2% ETA:  0:05:08 |                                        | 

Episode 21 with 27 steps || Reward : [-0.01  0.1 ] || avg reward :  0.097 || Noise  0.978 || 0.252 seconds, mem : 47669
Episode 22 with 80 steps || Reward : [0.1  0.09] || avg reward :  0.097 || Noise  0.977 || 0.425 seconds, mem : 47749
Episode 23 with 30 steps || Reward : [-0.01  0.1 ] || avg reward :  0.098 || Noise  0.976 || 0.224 seconds, mem : 47779
Episode 24 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.098 || Noise  0.975 || 0.332 seconds, mem : 47812

episode: 26/1000   2% ETA:  0:05:05 |\                                       | 

Episode 25 with 30 steps || Reward : [0.   0.09] || avg reward :  0.097 || Noise  0.974 || 0.231 seconds, mem : 47842
Episode 27 with 30 steps || Reward : [0.   0.09] || avg reward :  0.094 || Noise  0.972 || 0.223 seconds, mem : 47890
Episode 28 with 30 steps || Reward : [0.   0.09] || avg reward :  0.093 || Noise  0.971 || 0.192 seconds, mem : 47920
Episode 29 with 30 steps || Reward : [0.   0.09] || avg reward :  0.093 || Noise  0.970 || 0.178 seconds, mem : 47950

episode: 32/1000   3% ETA:  0:04:39 ||                                       | 

Episode 31 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.091 || Noise  0.968 || 0.174 seconds, mem : 47995
Episode 32 with 31 steps || Reward : [0.   0.09] || avg reward :  0.091 || Noise  0.968 || 0.222 seconds, mem : 48026
Episode 33 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.091 || Noise  0.967 || 0.257 seconds, mem : 48059
Episode 34 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.091 || Noise  0.966 || 0.276 seconds, mem : 48092

episode: 37/1000   3% ETA:  0:04:31 |/                                       | 

Episode 36 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.089 || Noise  0.964 || 0.234 seconds, mem : 48138
Episode 38 with 30 steps || Reward : [0.   0.09] || avg reward :  0.087 || Noise  0.962 || 0.303 seconds, mem : 48182
Episode 39 with 30 steps || Reward : [0.   0.09] || avg reward :  0.087 || Noise  0.961 || 0.232 seconds, mem : 48212
Episode 40 with 34 steps || Reward : [ 0.1  -0.01] || avg reward :  0.087 || Noise  0.960 || 0.250 seconds, mem : 48246

episode: 42/1000   4% ETA:  0:04:24 |-                                       | 

Episode 41 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.087 || Noise  0.959 || 0.235 seconds, mem : 48278
Episode 42 with 30 steps || Reward : [0.   0.09] || avg reward :  0.087 || Noise  0.958 || 0.292 seconds, mem : 48308
Episode 43 with 30 steps || Reward : [0.   0.09] || avg reward :  0.088 || Noise  0.957 || 0.233 seconds, mem : 48338
Episode 45 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.086 || Noise  0.955 || 0.238 seconds, mem : 48392

episode: 47/1000   4% ETA:  0:04:20 |\                                       | 

Episode 46 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.086 || Noise  0.954 || 0.280 seconds, mem : 48424
Episode 47 with 31 steps || Reward : [0.   0.09] || avg reward :  0.086 || Noise  0.953 || 0.252 seconds, mem : 48455
Episode 48 with 49 steps || Reward : [-0.01  0.1 ] || avg reward :  0.087 || Noise  0.952 || 0.297 seconds, mem : 48504
Episode 49 with 34 steps || Reward : [ 0.1  -0.01] || avg reward :  0.087 || Noise  0.951 || 0.237 seconds, mem : 48538

episode: 51/1000   5% ETA:  0:04:19 |||                                      | 

Episode 50 with 30 steps || Reward : [0.   0.09] || avg reward :  0.087 || Noise  0.950 || 0.273 seconds, mem : 48568
Episode 52 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.085 || Noise  0.948 || 0.241 seconds, mem : 48615
Episode 53 with 30 steps || Reward : [0.   0.09] || avg reward :  0.086 || Noise  0.947 || 0.225 seconds, mem : 48645
Episode 54 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.086 || Noise  0.946 || 0.275 seconds, mem : 48676

episode: 56/1000   5% ETA:  0:04:14 |//                                      | 

Episode 55 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.086 || Noise  0.946 || 0.247 seconds, mem : 48707
Episode 56 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.086 || Noise  0.945 || 0.281 seconds, mem : 48740
Episode 57 with 31 steps || Reward : [0.   0.09] || avg reward :  0.086 || Noise  0.944 || 0.234 seconds, mem : 48771
Episode 58 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.087 || Noise  0.943 || 0.284 seconds, mem : 48802

episode: 60/1000   6% ETA:  0:04:12 |--                                      | 

Episode 59 with 30 steps || Reward : [0.   0.09] || avg reward :  0.087 || Noise  0.942 || 0.234 seconds, mem : 48832
Episode 60 with 33 steps || Reward : [0.09 0.  ] || avg reward :  0.087 || Noise  0.941 || 0.253 seconds, mem : 48865
Episode 61 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.087 || Noise  0.940 || 0.240 seconds, mem : 48898
Episode 62 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.087 || Noise  0.939 || 0.275 seconds, mem : 48929

episode: 64/1000   6% ETA:  0:04:10 |\\                                      | 

Episode 63 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.087 || Noise  0.938 || 0.242 seconds, mem : 48962
Episode 64 with 30 steps || Reward : [0.   0.09] || avg reward :  0.087 || Noise  0.937 || 0.238 seconds, mem : 48992
Episode 65 with 34 steps || Reward : [ 0.1  -0.01] || avg reward :  0.088 || Noise  0.936 || 0.239 seconds, mem : 49026
Episode 66 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.088 || Noise  0.935 || 0.281 seconds, mem : 49057

episode: 68/1000   6% ETA:  0:04:10 |||                                      | 

Episode 67 with 70 steps || Reward : [0.1  0.09] || avg reward :  0.088 || Noise  0.934 || 0.385 seconds, mem : 49127
Episode 68 with 34 steps || Reward : [ 0.1  -0.01] || avg reward :  0.088 || Noise  0.933 || 0.249 seconds, mem : 49161
Episode 69 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.088 || Noise  0.932 || 0.278 seconds, mem : 49194
Episode 70 with 30 steps || Reward : [0.   0.09] || avg reward :  0.088 || Noise  0.931 || 0.232 seconds, mem : 49224

episode: 72/1000   7% ETA:  0:04:08 |//                                      | 

Episode 71 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.088 || Noise  0.930 || 0.247 seconds, mem : 49257
Episode 72 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.089 || Noise  0.930 || 0.251 seconds, mem : 49290
Episode 73 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.089 || Noise  0.929 || 0.270 seconds, mem : 49321
Episode 74 with 41 steps || Reward : [-0.01  0.1 ] || avg reward :  0.089 || Noise  0.928 || 0.279 seconds, mem : 49362

episode: 76/1000   7% ETA:  0:04:07 |---                                     | 

Episode 75 with 32 steps || Reward : [0.   0.09] || avg reward :  0.089 || Noise  0.927 || 0.247 seconds, mem : 49394
Episode 76 with 31 steps || Reward : [0.   0.09] || avg reward :  0.089 || Noise  0.926 || 0.249 seconds, mem : 49425
Episode 78 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.088 || Noise  0.924 || 0.252 seconds, mem : 49470

episode: 80/1000   8% ETA:  0:04:05 |\\\                                     | 

Episode 79 with 52 steps || Reward : [0.09 0.1 ] || avg reward :  0.088 || Noise  0.923 || 0.317 seconds, mem : 49522
Episode 80 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.088 || Noise  0.922 || 0.249 seconds, mem : 49554
Episode 81 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.088 || Noise  0.921 || 0.274 seconds, mem : 49586
Episode 82 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.089 || Noise  0.920 || 0.242 seconds, mem : 49619

episode: 84/1000   8% ETA:  0:04:04 ||||                                     | 

Episode 83 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.089 || Noise  0.919 || 0.262 seconds, mem : 49651
Episode 84 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.089 || Noise  0.918 || 0.268 seconds, mem : 49683

episode: 90/1000   9% ETA:  0:03:58 |///                                     | 

Episode 92 with 31 steps || Reward : [0.   0.09] || avg reward :  0.082 || Noise  0.911 || 0.232 seconds, mem : 49811

episode: 95/1000   9% ETA:  0:03:54 |---                                     | 

Episode 93 with 32 steps || Reward : [0.   0.09] || avg reward :  0.082 || Noise  0.910 || 0.248 seconds, mem : 49843
Episode 96 with 31 steps || Reward : [0.   0.09] || avg reward :  0.081 || Noise  0.908 || 0.250 seconds, mem : 49903
Episode 97 with 32 steps || Reward : [0.   0.09] || avg reward :  0.081 || Noise  0.907 || 0.247 seconds, mem : 49935

episode: 100/1000  10% ETA:  0:03:50 |\\\                                    | 
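
The `Noise` column in the log above is consistent with the exploration-noise scale being multiplied by `noise_decay` (0.999 in the config) once per episode. This is inferred from the printed values only, not from the `maddpg` source, so treat the sketch below as an assumption. `noise_scale` is a hypothetical helper.

```python
def noise_scale(episode, noise_decay=0.999):
    """Noise scale after a 0-based episode, assuming one multiplicative decay per episode."""
    return noise_decay ** (episode + 1)


# Compare against the Noise values logged for these episodes.
for episode in (0, 2, 50, 97):
    print("Episode {:2d}: noise {:.3f}".format(episode, noise_scale(episode)))
```

Under this assumption the computed values for episodes 0, 2, 50 and 97 are 0.999, 0.997, 0.950 and 0.907, matching the log.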

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```