# Collaboration and Competition

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Tennis.app"`
- **Windows** (x86): `"path/to/Tennis_Windows_x86/Tennis.exe"`
- **Windows** (x86_64): `"path/to/Tennis_Windows_x86_64/Tennis.exe"`
- **Linux** (x86): `"path/to/Tennis_Linux/Tennis.x86"`
- **Linux** (x86_64): `"path/to/Tennis_Linux/Tennis.x86_64"`
- **Linux** (x86, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86"`
- **Linux** (x86_64, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86_64"`

For instance, if you are using a Mac, then you downloaded `Tennis.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Tennis.app")
```

In [2]:
env_file_name = "Tennis_Windows_x86_64/Tennis.exe"
# env = UnityEnvironment(file_name=env_file_name)
env = UnityEnvironment(file_name=env_file_name,no_graphics=True)

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


Environments contain **_brains_**, which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1.  If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01.  Thus, the goal of each agent is to keep the ball in play.

A single observation consists of 8 variables corresponding to the position and velocity of the ball and racket. The environment stacks 3 such observations per agent, so each agent's state vector has length 24. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
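
The relationship between the 8-variable observation and the 24-value state can be sketched with a reshape. This is a minimal illustration, assuming the three stacked observations are laid out one after another with the most recent one last (the initial state printed later in this section, which is zero except for its last 8 entries, is consistent with that, but the layout is not documented here). The names `NUM_STACKED`, `OBS_SIZE`, and `split_frames` are hypothetical.

```python
import numpy as np

NUM_STACKED = 3  # "Number of stacked Vector Observation" reported by the environment
OBS_SIZE = 8     # "Vector Observation space size (per agent)"


def split_frames(state):
    """Reshape one agent's flat 24-value state into 3 frames of 8 values.

    Assumption: frames are concatenated in time order, most recent last.
    """
    state = np.asarray(state, dtype=np.float64)
    return state.reshape(NUM_STACKED, OBS_SIZE)


# Initial state of the first agent, copied from the notebook output.
example_state = np.array(
    [0.0] * 16 + [-6.65278625, -1.5, -0.0, 0.0, 6.83172083, 6.0, -0.0, 0.0]
)

frames = split_frames(example_state)
print(frames.shape)  # (3, 8)
print(frames[-1])    # the last 8 values of the state vector
```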

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents 
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])
print('states shape : ',states.shape)
print('Both states look like : ',states)
print(2*states)

Number of agents: 2
Size of each action: 2
There are 2 agents. Each observes a state with length: 24
The state for the first agent looks like: [ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.         -6.65278625 -1.5
 -0.          0.          6.83172083  6.         -0.          0.        ]
states shape :  (2, 24)
Both states look like :  [[ 0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.         -6.65278625 -1.5
  -0.          0.          6.83172083  6.         -0.          0.        ]
 [ 0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.         -6.4669857  -1.5
   0.          0.         -6.83172083  6.          0.          0.

### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agents and receive feedback from the environment.

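Once this cell is executed, you can watch how the agents perform when they select actions at random at each time step.  A window should pop up that allows you to observe the agents. Note that no window appears if the environment was started with `no_graphics=True`, and that the loop in the cell below is disabled by `if False:`; change it to `if True:` to run it.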

Of course, as part of the project, you'll have to change the code so that the agents are able to use their experiences to gradually choose better actions when interacting with the environment!
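
To judge whether learned actions are better than random ones, it helps to track one score per episode. The sketch below follows the "max over agents" convention that the random-action loop prints, and keeps a running average over recent episodes. The window length of 100 and the helper names `episode_score` and `running_average` are assumptions made for illustration; they are not taken from this notebook.

```python
from collections import deque

import numpy as np

WINDOW = 100  # assumed averaging window (hypothetical)


def episode_score(agent_returns):
    """Episode score: the larger of the agents' summed rewards."""
    return float(np.max(agent_returns))


def running_average(recent_scores, agent_returns):
    """Append this episode's score and return the mean over the window."""
    recent_scores.append(episode_score(agent_returns))
    return float(np.mean(recent_scores))


recent_scores = deque(maxlen=WINDOW)  # old scores drop out automatically
for returns in ([0.0, -0.01], [0.1, -0.01], [0.0, 0.09]):
    avg = running_average(recent_scores, np.array(returns))
    print('score {:.2f}, running average {:.3f}'.format(recent_scores[-1], avg))
```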

In [5]:
if False:
    total_scores = []
    for i in range(100):                                        # play game for 100 episodes
        env_info = env.reset(train_mode=True)[brain_name]     # reset the environment    
        states = env_info.vector_observations                  # get the current state (for each agent)
        scores = np.zeros(num_agents)                          # initialize the score (for each agent)
        t = 0
        while True:
            actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
            actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
            # print('actions : ',actions)
            env_info = env.step(actions)[brain_name]           # send all actions to the environment
            t += 1
            next_states = env_info.vector_observations         # get next state (for each agent)
            rewards = env_info.rewards                         # get reward (for each agent)
            dones = env_info.local_done                        # see if episode finished
            scores += env_info.rewards                         # update the score (for each agent)
            states = next_states                               # roll over states to next time step
            if np.any(dones):                                  # exit loop if episode finished
                break
        print('Score (max over agents) from episode {}: {}, and {} steps taken'.format(i, np.max(scores),t))
        print(scores)
        total_scores.append(scores)
    print('Average Random Score : ', np.mean(total_scores))
        
def plot_results(results):
    # Plot per-episode rewards, then critic and actor losses, from a training result.
    import matplotlib.pyplot as plt
    plt.ion()

    fig = plt.figure()
    ax = fig.add_subplot(111)
    plt.plot(np.arange(len(results.all_rewards)), [np.sum(ar) for ar in results.all_rewards])
    plt.plot(np.arange(len(results.avg_rewards)), results.avg_rewards)
    plt.ylabel('Rewards')
    plt.xlabel('Episode #')
    plt.show()

    fig = plt.figure()
    ax = fig.add_subplot(111)
    plt.plot(np.arange(len(results.critic_loss)), results.critic_loss)
    plt.ylabel('critic_losses')
    plt.xlabel('Learn Step #')
    plt.show()

    fig = plt.figure()
    ax = fig.add_subplot(111)
    plt.plot(np.arange(len(results.actor_loss)), results.actor_loss)
    plt.ylabel('actor_losses')
    plt.xlabel('Learn Step #')
    plt.show()


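The cell below trains the agents with the `maddpg` module using the hyperparameters in `config`, plots the results of each run, and closes the environment when finished.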
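
The `maddpg` module itself is not shown in this notebook. Assuming `tau_low` and `tau_high` are soft-update rates of the kind used in DDPG-style methods, a soft update moves each target-network weight a small step toward the corresponding local-network weight. The sketch below uses plain NumPy arrays in place of network parameters; `soft_update` is a hypothetical helper, and how the module chooses between `tau_low` and `tau_high` is not modeled.

```python
import numpy as np


def soft_update(target_params, local_params, tau):
    """Return blended weights: tau * local + (1 - tau) * target, per array."""
    return [
        tau * local + (1.0 - tau) * target
        for target, local in zip(target_params, local_params)
    ]


# Stand-ins for the weight arrays of a target and a local network.
target = [np.zeros((2, 2)), np.zeros(2)]
local = [np.ones((2, 2)), np.ones(2)]

target = soft_update(target, local, tau=0.01)
print(target[0])  # every entry moved 1% of the way from 0 toward 1
```

With a small `tau`, the target networks change slowly, which is the usual reason for using a soft update.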

In [None]:
from maddpg import maddpg
import cProfile
DoProfile = False

config = {
    'gamma'               : 0.99,
    'tau_low'             : 0.01, # 0.005
    'tau_high'            : 0.01,
    'action_size'         : action_size,
    'state_size'          : state_size,
    'hidden_size'         : 512,
    'buffer_size'         : 50000,
    'batch_size_low'      : 512,
    'batch_size_high'     : 512,
    'alpha_low'           : 0.5,
    'alpha_high'          : 0.7, #0.6,
    'beta_low'            : 0.5,
    'beta_high'           : 0.6, #0.5,
    'dropout'             : 0.01,
    'seed'                : 289,
    'max_episodes'        : 4000,
    'learn_every_low'     : 2,
    'learn_every_high'    : 10,
    'joined_states'       : True,
    'critic_learning_rate': 1e-3,
    'actor_learning_rate' : 1e-3,
    'noise_decay_fast'    : 0.998,
    'noise_decay_slow'    : 0.99995,
    'noise_scale_trigger' : 0.45,
    'sigma'               : 0.3,
    'num_agents'          : num_agents,
    'env_file_name'       : env_file_name,
    'train_mode'          : True,
    'brain_name'          : brain_name}

def print_config(config):
    print('Config Parameters    : ')
    for key, value in config.items():
        print('{:20s} : {}'.format(key, value))

config_list = []
result_list = []
var_range = []
# learn = [2, 5, 10]
# batch = [512,1024]
# nd = [0.999, 0.998]
# for l in learn:
    # for b in batch:
        # var_range.append([l,b])
        # for h in hidden:
            # for n in nd:
var_range = [0.4] #, 0.45, 0.5]
# var_range = [0.9998, 0.9999, 0.99995] # , 0.0003, 0.0005, 0.001]# [0.2, 0.25, 0.3]
selected_seeds = [303] # 239,286,303,305, 306] # agent specific rewards  [228,233] [230,233,239,241,276, 286,287,303,305, 306]
# num_runs = 20
for param in range(len(var_range)):
    alt_config = config.copy()
    # alt_config['tau_low'] = var_range[param]
    # alt_config['alpha_high'] = var_range[param][0]
    # alt_config['beta_high'] = var_range[param][1]
    # alt_config['batch_size_high'] = var_range[param][1]
    # alt_config['hidden_size'] = var_range[param][2]
    # alt_config['noise_scale_trigger'] = var_range[param]
    # alt_config['actor_learning_rate'] = var_range[param]
    # alt_config['learn_every_low'] = var_range[param][0]
    # alt_config['tau'] = config['tau']*curmult
    # alt_config['critic_learning_rate'] = config['critic_learning_rate']*curmult
    # alt_config['actor_learning_rate'] = config['actor_learning_rate']*curmult
    num_runs = len(selected_seeds)
    for main in range(num_runs):#len(tau_range)):
        print('-------------------------------------')
        print('New Run :')
        print('-------------------------------------')
        # alt_config['seed'] += 1
        alt_config['seed'] = selected_seeds[main]
        print_config(alt_config)
        config_list.append(alt_config.copy())
        agent = maddpg(env, alt_config)
        if DoProfile:
            cProfile.run("results = agent.train()", 'PerfStats')
        else:
            results = agent.train()
        result_list.append(results)
        # all_rewards,avg_rewards,critic_losses,actor_losses = agent.train()
        print_config(alt_config)
        plot_results(results)
print('-------------------------------------')
print('-------------------------------------')
print('Summary :')
print('-------------------------------------')
print('-------------------------------------')
for param in range(len(var_range)):
    for main in range(num_runs):
        print_config(config_list[param*num_runs+main])
        plot_results(result_list[param*num_runs+main])
    
env.close()
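
The `Noise` column and the `Noisetrigger` messages in the training output suggest that the exploration-noise scale decays quickly at first and much more slowly once it reaches `noise_scale_trigger`. The sketch below is one possible two-phase schedule built from the `noise_decay_fast`, `noise_decay_slow`, and `noise_scale_trigger` values in `config`. It is an assumption about the behavior, not the code in the `maddpg` module: `next_noise_scale` is a hypothetical helper, the decay is applied once per episode, and the "Bridging" step reported in the log is not modeled.

```python
def next_noise_scale(scale, decay_fast=0.998, decay_slow=0.99995, trigger=0.45):
    """Decay quickly while above the trigger, slowly once at or below it."""
    decay = decay_fast if scale > trigger else decay_slow
    return scale * decay


scale = 1.0
history = []
for episode in range(500):
    scale = next_noise_scale(scale)  # assumed: one decay step per episode
    history.append(scale)

print('after episode 20 : {:.3f}'.format(history[20]))
print('after episode 499: {:.3f}'.format(history[-1]))
```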

-------------------------------------
New Run :
-------------------------------------
Config Parameters    : 
gamma                : 0.99
tau_low              : 0.01
tau_high             : 0.01
action_size          : 2
state_size           : 24
hidden_size          : 512
buffer_size          : 50000
batch_size_low       : 512
batch_size_high      : 512
alpha_low            : 0.5
alpha_high           : 0.7
beta_low             : 0.5
beta_high            : 0.6
dropout              : 0.01
seed                 : 303
max_episodes         : 4000
learn_every_low      : 2
learn_every_high     : 10
joined_states        : True
critic_learning_rate : 0.001
actor_learning_rate  : 0.001
noise_decay_fast     : 0.998
noise_decay_slow     : 0.99995
noise_scale_trigger  : 0.45
sigma                : 0.3
num_agents           : 2
env_file_name        : Tennis_Windows_x86_64/Tennis.exe
train_mode           : True
brain_name           : TennisBrain
Running on device :  cpu
Episode 0 with 15 steps || Reward

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)
Until buffer filled batches are smaller (256 vs. later 512)
Episode 20 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.959 || 0.422 seconds, mem : 299
Until buffer filled batches are smaller (256 vs. later 512)
Episode 40 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.921 || 0.797 seconds, mem : 583
Episode 60 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.885 || 0.512 seconds, mem : 867
Episode 80 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.850 || 0.489 seconds, mem : 1151
Episode 100 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.817 || 0.465 seconds, mem : 1453
Episode 120 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.785 || 0.466 seconds, mem : 1737

Episode 134 with 29 steps || Reward : [-0.01  0.1 ] || avg reward :  0.001 || Noise  0.763 || 0.118 seconds, mem : 1951
Episode 140 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.002 || Noise  0.754 || 0.505 seconds, mem : 2054
Episode 148 with 30 steps || Reward : [ 0.1  -0.01] || avg reward :  0.003 || Noise  0.742 || 0.117 seconds, mem : 2183
Episode 160 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.003 || Noise  0.724 || 0.432 seconds, mem : 2354
Episode 167 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.004 || Noise  0.714 || 0.140 seconds, mem : 2471
Episode 177 with 31 steps || Reward : [0.1  0.09] || avg reward :  0.005 || Noise  0.700 || 0.135 seconds, mem : 2630
Episode 179 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.006 || Noise  0.697 || 0.148 seconds, mem : 2676
Episode 180 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.006 || Noise  0.696 || 0.433 seconds, mem : 2690
Episode 200 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.006 || Noise  0.669 || 0.417 seconds, mem : 2975
Episode 209 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.007 || Noise  0.657 || 0.139 seconds, mem : 3120
Episode 213 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.008 || Noise  0.652 || 0.130 seconds, mem : 3196
Episode 215 with 31 steps || Reward : [0.   0.09] || avg reward :  0.009 || Noise  0.649 || 0.124 seconds, mem : 3241
Episode 216 with 33 steps || Reward : [-0.02  0.1 ] || avg reward :  0.010 || Noise  0.648 || 0.148 seconds, mem : 3274
Episode 218 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.011 || Noise  0.645 || 0.140 seconds, mem : 3321
Episode 220 with 31 steps || Reward : [0.   0.09] || avg reward :  0.012 || Noise  0.642 || 0.487 seconds, mem : 3366
Episode 229 with 31 steps || Reward : [ 0.1  -0.01] || avg reward :  0.013 || Noise  0.631 || 0.120 seconds, mem : 3511
Episode 232 with 31 steps || Reward : [0.   0.09] || avg reward :  0.014 || Noise  0.627 || 0.140 seconds, mem : 3570
Episode 237 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.014 || Noise  0.621 || 0.122 seconds, mem : 3663
Episode 238 with 30 steps || Reward : [0.   0.09] || avg reward :  0.015 || Noise  0.620 || 0.116 seconds, mem : 3693
Episode 240 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.014 || Noise  0.617 || 0.456 seconds, mem : 3721
Episode 241 with 31 steps || Reward : [ 0.1  -0.01] || avg reward :  0.015 || Noise  0.616 || 0.132 seconds, mem : 3752
Episode 242 with 31 steps || Reward : [0.   0.09] || avg reward :  0.016 || Noise  0.615 || 0.129 seconds, mem : 3783
Episode 247 with 31 steps || Reward : [0.   0.09] || avg reward :  0.016 || Noise  0.609 || 0.134 seconds, mem : 3890

Episode 253 with 32 steps || Reward : [0.   0.09] || avg reward :  0.016 || Noise  0.601 || 0.131 seconds, mem : 4012
Episode 257 with 33 steps || Reward : [0.   0.09] || avg reward :  0.017 || Noise  0.597 || 0.136 seconds, mem : 4106
Episode 260 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.017 || Noise  0.593 || 0.408 seconds, mem : 4149
Episode 266 with 32 steps || Reward : [0.   0.09] || avg reward :  0.018 || Noise  0.586 || 0.142 seconds, mem : 4253
Episode 267 with 44 steps || Reward : [-0.01  0.2 ] || avg reward :  0.019 || Noise  0.585 || 0.197 seconds, mem : 4297
Episode 272 with 31 steps || Reward : [0.   0.09] || avg reward :  0.020 || Noise  0.579 || 0.132 seconds, mem : 4395
Episode 275 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.021 || Noise  0.575 || 0.128 seconds, mem : 4456
Episode 278 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.021 || Noise  0.572 || 0.135 seconds, mem : 4517
Episode 280 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.020 || Noise  0.570 || 0.425 seconds, mem : 4545
Episode 281 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.021 || Noise  0.569 || 0.158 seconds, mem : 4578
Episode 286 with 33 steps || Reward : [-0.02  0.1 ] || avg reward :  0.022 || Noise  0.563 || 0.133 seconds, mem : 4668
Episode 289 with 32 steps || Reward : [0.   0.09] || avg reward :  0.023 || Noise  0.560 || 0.140 seconds, mem : 4728
Episode 297 with 31 steps || Reward : [0.   0.09] || avg reward :  0.024 || Noise  0.551 || 0.120 seconds, mem : 4859
Episode 300 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.024 || Noise  0.547 || 0.438 seconds, mem : 4906
Episode 307 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.025 || Noise  0.540 || 0.146 seconds, mem : 5025
Episode 316 with 29 steps || Reward : [-0.01  0.1 ] || avg reward :  0.022 || Noise  0.530 || 0.129 seconds, mem : 5167
Episode 320 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.020 || Noise  0.526 || 0.423 seconds, mem : 5223
Episode 328 with 31 steps || Reward : [0.   0.09] || avg reward :  0.021 || Noise  0.518 || 0.121 seconds, mem : 5372
Episode 331 with 32 steps || Reward : [0.   0.09] || avg reward :  0.021 || Noise  0.514 || 0.131 seconds, mem : 5432
Episode 333 with 31 steps || Reward : [ 0.1  -0.01] || avg reward :  0.021 || Noise  0.512 || 0.130 seconds, mem : 5477
Episode 340 with 32 steps || Reward : [0.   0.09] || avg reward :  0.020 || Noise  0.505 || 0.465 seconds, mem : 5594
Episode 343 with 31 steps || Reward : [0.   0.09] || avg reward :  0.019 || Noise  0.502 || 0.135 seconds, mem : 5653
Episode 344 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.020 || Noise  0.501 || 0.129 seconds, mem : 5686
Episode 347 with 33 steps || Reward : [ 0.1  -0.02] || avg reward :  0.020 || Noise  0.498 || 0.133 seconds, mem : 5751

Episode 352 with 31 steps || Reward : [ 0.1  -0.01] || avg reward :  0.021 || Noise  0.493 || 0.128 seconds, mem : 5839
Episode 353 with 32 steps || Reward : [0.   0.09] || avg reward :  0.021 || Noise  0.492 || 0.150 seconds, mem : 5871
Episode 355 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.022 || Noise  0.490 || 0.126 seconds, mem : 5918
Episode 357 with 34 steps || Reward : [ 0.1  -0.01] || avg reward :  0.022 || Noise  0.488 || 0.130 seconds, mem : 5967
Episode 359 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.023 || Noise  0.486 || 0.139 seconds, mem : 6014
Episode 360 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.023 || Noise  0.485 || 0.458 seconds, mem : 6028
Episode 362 with 31 steps || Reward : [0.   0.09] || avg reward :  0.024 || Noise  0.483 || 0.123 seconds, mem : 6073
Episode 363 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.025 || Noise  0.483 || 0.133 seconds, mem : 6106
Episode 371 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.023 || Noise  0.475 || 0.138 seconds, mem : 6239
Episode 372 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.023 || Noise  0.474 || 0.131 seconds, mem : 6271
Episode 380 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.021 || Noise  0.466 || 0.419 seconds, mem : 6385
Episode 381 with 34 steps || Reward : [ 0.1  -0.01] || avg reward :  0.021 || Noise  0.465 || 0.147 seconds, mem : 6419
Episode 384 with 46 steps || Reward : [0.   0.09] || avg reward :  0.022 || Noise  0.463 || 0.178 seconds, mem : 6496
Noisetrigger : Bridging
Episode 399 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.020 || Noise  0.450 || 0.136 seconds, mem : 6727
Episode 400 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.020 || Noise  0.450 || 0.400 seconds, mem : 6742
Noisetrigger : Changing noise decay
Episode 401 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.021 || Noise  0.450 || 0.140 seconds, mem : 6774
Episode 402 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.022 || Noise  0.450 || 0.499 seconds, mem : 6807
Episode 405 with 33 steps || Reward : [0.   0.09] || avg reward :  0.023 || Noise  0.450 || 0.135 seconds, mem : 6868
Episode 411 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.023 || Noise  0.450 || 0.135 seconds, mem : 6972
Episode 416 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.023 || Noise  0.450 || 0.592 seconds, mem : 7068
Episode 420 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.023 || Noise  0.449 || 0.529 seconds, mem : 7127

Episode 425 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.024 || Noise  0.449 || 0.149 seconds, mem : 7217
Episode 426 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.025 || Noise  0.449 || 0.603 seconds, mem : 7250
Episode 433 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.023 || Noise  0.449 || 0.145 seconds, mem : 7368
Episode 435 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.024 || Noise  0.449 || 0.139 seconds, mem : 7415
Episode 437 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.025 || Noise  0.449 || 0.143 seconds, mem : 7462
Episode 440 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.025 || Noise  0.449 || 0.578 seconds, mem : 7505
Episode 446 with 32 steps || Reward : [0.   0.09] || avg reward :  0.024 || Noise  0.449 || 0.724 seconds, mem : 7615
Episode 448 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.024 || Noise  0.449 || 0.668 seconds, mem : 7661
Episode 455 with 35 steps || Reward : [0.   0.09] || avg reward :  0.022 || Noise  0.449 || 0.149 seconds, mem : 7783
Episode 459 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.021 || Noise  0.449 || 0.137 seconds, mem : 7859
Episode 460 with 16 steps || Reward : [ 0.   -0.01] || avg reward :  0.021 || Noise  0.449 || 0.636 seconds, mem : 7875
Episode 479 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.018 || Noise  0.448 || 0.131 seconds, mem : 8169
Episode 480 with 32 steps || Reward : [0.   0.09] || avg reward :  0.019 || Noise  0.448 || 0.748 seconds, mem : 8201
Episode 482 with 32 steps || Reward : [0.   0.09] || avg reward :  0.018 || Noise  0.448 || 0.688 seconds, mem : 8247
Episode 483 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.019 || Noise  0.448 || 0.135 seconds, mem : 8279
Episode 485 with 32 steps || Reward : [0.   0.09] || avg reward :  0.019 || Noise  0.448 || 0.149 seconds, mem : 8326
Episode 487 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.020 || Noise  0.448 || 0.141 seconds, mem : 8373
Episode 494 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.021 || Noise  0.448 || 0.710 seconds, mem : 8491
Episode 500 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.020 || Noise  0.448 || 0.601 seconds, mem : 8579
Episode 503 with 31 steps || Reward : [ 0.1  -0.01] || avg reward :  0.019 || Noise  0.448 || 0.122 seconds, mem : 8638

episode: 509/4000  12% ETA:  0:09:10 |\\\\                                   | 

[41mEpisode 511 with 32 steps || Reward : [0.   0.09] || avg reward :  0.018 || Noise  0.447 || 0.135 seconds, mem : 8770
[0m

episode: 513/4000  12% ETA:  0:09:15 ||||||                                  | 

[41mEpisode 513 with 32 steps || Reward : [ 0.1  -0.02] || avg reward :  0.019 || Noise  0.447 || 0.142 seconds, mem : 8817
[0m

episode: 517/4000  12% ETA:  0:09:20 |/////                                  | 

[41mEpisode 516 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.019 || Noise  0.447 || 0.730 seconds, mem : 8878
[0m[41mEpisode 519 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.020 || Noise  0.447 || 0.140 seconds, mem : 8939
[0m

episode: 521/4000  13% ETA:  0:09:25 |-----                                  | 

Episode 520 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.020 || Noise  0.447 || 0.618 seconds, mem : 8953
[0m[41mEpisode 521 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.021 || Noise  0.447 || 0.149 seconds, mem : 8986
[0m

episode: 524/4000  13% ETA:  0:09:28 |\\\\\                                  | 

[41mEpisode 522 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.022 || Noise  0.447 || 0.696 seconds, mem : 9019
[0m[41mEpisode 523 with 38 steps || Reward : [0.   0.09] || avg reward :  0.023 || Noise  0.447 || 0.158 seconds, mem : 9057
[0m

episode: 527/4000  13% ETA:  0:09:32 ||||||                                  | 

[41mEpisode 527 with 31 steps || Reward : [0.   0.09] || avg reward :  0.022 || Noise  0.447 || 0.136 seconds, mem : 9131
[0m[41mEpisode 528 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.023 || Noise  0.447 || 0.700 seconds, mem : 9164
[0m

episode: 539/4000  13% ETA:  0:09:46 |\\\\\                                  | 

[41mEpisode 538 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.021 || Noise  0.447 || 0.700 seconds, mem : 9339
[0m[44mEpisode 540 with 43 steps || Reward : [0.1  0.09] || avg reward :  0.022 || Noise  0.447 || 0.772 seconds, mem : 9397
[0m[41mEpisode 541 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.023 || Noise  0.447 || 0.134 seconds, mem : 9430
[0m

episode: 543/4000  13% ETA:  0:09:51 ||||||                                  | 

[41mEpisode 542 with 30 steps || Reward : [ 0.1  -0.01] || avg reward :  0.024 || Noise  0.447 || 0.664 seconds, mem : 9460
[0m[41mEpisode 543 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.025 || Noise  0.447 || 0.146 seconds, mem : 9492
[0m[41mEpisode 545 with 30 steps || Reward : [-0.01  0.1 ] || avg reward :  0.026 || Noise  0.447 || 0.139 seconds, mem : 9536
[0m

episode: 547/4000  13% ETA:  0:09:56 |/////                                  | 

Episode 546 with 32 steps || Reward : [0.   0.09] || avg reward :  0.026 || Noise  0.447 || 0.657 seconds, mem : 9568
Episode 547 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.027 || Noise  0.447 || 0.141 seconds, mem : 9601

episode: 551/4000  13% ETA:  0:10:00 |-----                                  | 

Episode 552 with 33 steps || Reward : [-0.02  0.1 ] || avg reward :  0.027 || Noise  0.446 || 0.701 seconds, mem : 9691

episode: 555/4000  13% ETA:  0:10:04 |\\\\\                                  | 

Episode 554 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.028 || Noise  0.446 || 0.687 seconds, mem : 9737
Episode 555 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.028 || Noise  0.446 || 0.138 seconds, mem : 9770
Episode 557 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.029 || Noise  0.446 || 0.132 seconds, mem : 9817

episode: 559/4000  13% ETA:  0:10:09 ||||||                                  | 

Episode 559 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.029 || Noise  0.446 || 0.152 seconds, mem : 9864
Episode 560 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.029 || Noise  0.446 || 0.563 seconds, mem : 9878

episode: 563/4000  14% ETA:  0:10:13 |/////                                  | 

Episode 562 with 32 steps || Reward : [0.   0.09] || avg reward :  0.030 || Noise  0.446 || 0.683 seconds, mem : 9924
Episode 564 with 33 steps || Reward : [ 0.1  -0.02] || avg reward :  0.031 || Noise  0.446 || 0.717 seconds, mem : 9971

episode: 567/4000  14% ETA:  0:10:17 |-----                                  | 

Episode 567 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.032 || Noise  0.446 || 0.148 seconds, mem : 10048
Episode 569 with 32 steps || Reward : [0.   0.09] || avg reward :  0.033 || Noise  0.446 || 0.150 seconds, mem : 10095

episode: 571/4000  14% ETA:  0:10:21 |\\\\\                                  | 

Episode 572 with 26 steps || Reward : [ 0.1  -0.01] || avg reward :  0.034 || Noise  0.446 || 0.675 seconds, mem : 10150

episode: 575/4000  14% ETA:  0:10:24 ||||||                                  | 

Episode 577 with 34 steps || Reward : [-0.01  0.1 ] || avg reward :  0.035 || Noise  0.446 || 0.138 seconds, mem : 10240

episode: 579/4000  14% ETA:  0:10:27 |/////                                  | 

Episode 580 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.033 || Noise  0.446 || 0.587 seconds, mem : 10282

episode: 583/4000  14% ETA:  0:10:30 |-----                                  | 

Episode 582 with 33 steps || Reward : [-0.02  0.1 ] || avg reward :  0.033 || Noise  0.446 || 0.656 seconds, mem : 10329

episode: 591/4000  14% ETA:  0:10:36 ||||||                                  | 

Episode 591 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.031 || Noise  0.446 || 0.135 seconds, mem : 10485
Episode 592 with 33 steps || Reward : [-0.02  0.1 ] || avg reward :  0.032 || Noise  0.446 || 0.728 seconds, mem : 10518

episode: 595/4000  14% ETA:  0:10:39 |/////                                  | 

Episode 596 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.032 || Noise  0.445 || 0.680 seconds, mem : 10593

episode: 599/4000  14% ETA:  0:10:43 |-----                                  | 

Episode 600 with 42 steps || Reward : [0.1  0.09] || avg reward :  0.033 || Noise  0.445 || 0.708 seconds, mem : 10678

episode: 607/4000  15% ETA:  0:10:48 ||||||                                  | 

Episode 606 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.033 || Noise  0.445 || 0.684 seconds, mem : 10782
Episode 609 with 33 steps || Reward : [0.   0.09] || avg reward :  0.034 || Noise  0.445 || 0.142 seconds, mem : 10843

episode: 619/4000  15% ETA:  0:10:56 |\\\\\\                                 | 

Episode 619 with 33 steps || Reward : [0.   0.09] || avg reward :  0.031 || Noise  0.445 || 0.134 seconds, mem : 11040
Episode 620 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.031 || Noise  0.445 || 0.598 seconds, mem : 11054
Episode 621 with 29 steps || Reward : [-0.01  0.1 ] || avg reward :  0.031 || Noise  0.445 || 0.153 seconds, mem : 11083

episode: 623/4000  15% ETA:  0:10:59 |||||||                                 | 

Episode 624 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.030 || Noise  0.445 || 0.678 seconds, mem : 11144

episode: 627/4000  15% ETA:  0:11:02 |//////                                 | 

Episode 627 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.031 || Noise  0.445 || 0.132 seconds, mem : 11205

episode: 630/4000  15% ETA:  0:11:04 |------                                 | 

Episode 628 with 46 steps || Reward : [0.1  0.09] || avg reward :  0.031 || Noise  0.445 || 0.745 seconds, mem : 11251
Episode 629 with 32 steps || Reward : [0.   0.09] || avg reward :  0.031 || Noise  0.445 || 0.151 seconds, mem : 11283

episode: 633/4000  15% ETA:  0:11:07 |\\\\\\                                 | 

Episode 634 with 38 steps || Reward : [-0.01  0.1 ] || avg reward :  0.032 || Noise  0.445 || 0.718 seconds, mem : 11378

episode: 641/4000  16% ETA:  0:11:11 |//////                                 | 

Episode 640 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.030 || Noise  0.445 || 0.625 seconds, mem : 11463
[0m[41mEpisode 642 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.029 || Noise  0.444 || 0.659 seconds, mem : 11517
[0m

episode: 645/4000  16% ETA:  0:11:14 |------                                 | 

Episode 644 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.444 || 0.658 seconds, mem : 11564
Episode 645 with 27 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.444 || 0.121 seconds, mem : 11591
Episode 646 with 31 steps || Reward : [0.   0.09] || avg reward :  0.029 || Noise  0.444 || 0.663 seconds, mem : 11622

episode: 649/4000  16% ETA:  0:11:16 |\\\\\\                                 | 

Episode 649 with 28 steps || Reward : [-0.01  0.1 ] || avg reward :  0.029 || Noise  0.444 || 0.125 seconds, mem : 11679
Episode 650 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.030 || Noise  0.444 || 0.656 seconds, mem : 11712

episode: 661/4000  16% ETA:  0:11:23 |------                                 | 

Episode 660 with 35 steps || Reward : [ 0.1  -0.01] || avg reward :  0.026 || Noise  0.444 || 0.699 seconds, mem : 11875
Episode 663 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.027 || Noise  0.444 || 0.137 seconds, mem : 11936

episode: 665/4000  16% ETA:  0:11:25 |\\\\\\                                 | 

Episode 664 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.027 || Noise  0.444 || 0.667 seconds, mem : 11969
Episode 665 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.028 || Noise  0.444 || 0.140 seconds, mem : 12001

episode: 669/4000  16% ETA:  0:11:28 |||||||                                 | 

Episode 668 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.028 || Noise  0.444 || 0.668 seconds, mem : 12063
Episode 670 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.028 || Noise  0.444 || 0.671 seconds, mem : 12110
Episode 671 with 34 steps || Reward : [0.   0.09] || avg reward :  0.029 || Noise  0.444 || 0.142 seconds, mem : 12144

episode: 677/4000  16% ETA:  0:11:32 |------                                 | 

Episode 676 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.444 || 0.647 seconds, mem : 12236

episode: 681/4000  17% ETA:  0:11:33 |\\\\\\                                 | 

Episode 680 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.028 || Noise  0.444 || 0.577 seconds, mem : 12293
[0m[41mEpisode 681 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.444 || 0.128 seconds, mem : 12325
[0m[41mEpisode 683 with 33 steps || Reward : [0.   0.09] || avg reward :  0.028 || Noise  0.444 || 0.133 seconds, mem : 12372
[0m

episode: 685/4000  17% ETA:  0:11:35 |||||||                                 | 

Episode 685 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.029 || Noise  0.444 || 0.143 seconds, mem : 12419

episode: 693/4000  17% ETA:  0:11:38 |------                                 | 

Episode 693 with 31 steps || Reward : [ 0.1  -0.01] || avg reward :  0.028 || Noise  0.443 || 0.131 seconds, mem : 12549
Episode 695 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.443 || 0.137 seconds, mem : 12596

episode: 697/4000  17% ETA:  0:11:41 |\\\\\\                                 | 

Episode 696 with 33 steps || Reward : [-0.02  0.1 ] || avg reward :  0.029 || Noise  0.443 || 0.650 seconds, mem : 12629
Episode 698 with 40 steps || Reward : [0.1  0.09] || avg reward :  0.030 || Noise  0.443 || 0.687 seconds, mem : 12683

episode: 701/4000  17% ETA:  0:11:42 |||||||                                 | 

Episode 700 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.029 || Noise  0.443 || 0.560 seconds, mem : 12711

episode: 705/4000  17% ETA:  0:11:44 |//////                                 | 

Episode 705 with 35 steps || Reward : [ 0.1  -0.01] || avg reward :  0.030 || Noise  0.443 || 0.157 seconds, mem : 12803
Episode 706 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.030 || Noise  0.443 || 0.647 seconds, mem : 12835

episode: 709/4000  17% ETA:  0:11:46 |------                                 | 

Episode 709 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.031 || Noise  0.443 || 0.141 seconds, mem : 12899

episode: 717/4000  17% ETA:  0:11:49 |||||||                                 | 

Episode 716 with 20 steps || Reward : [0.09 0.  ] || avg reward :  0.031 || Noise  0.443 || 0.599 seconds, mem : 13004
Episode 718 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.032 || Noise  0.443 || 0.669 seconds, mem : 13050

episode: 721/4000  18% ETA:  0:11:50 |///////                                | 

Episode 720 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.032 || Noise  0.443 || 0.619 seconds, mem : 13079
[0m[41mEpisode 723 with 21 steps || Reward : [ 0.1  -0.01] || avg reward :  0.032 || Noise  0.443 || 0.090 seconds, mem : 13128
[0m

episode: 725/4000  18% ETA:  0:11:51 |-------                                | 

Episode 727 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.031 || Noise  0.443 || 0.138 seconds, mem : 13203

episode: 729/4000  18% ETA:  0:11:53 |\\\\\\\                                | 

Episode 730 with 34 steps || Reward : [ 0.1  -0.01] || avg reward :  0.030 || Noise  0.443 || 0.692 seconds, mem : 13266

episode: 733/4000  18% ETA:  0:11:55 ||||||||                                | 

Episode 734 with 30 steps || Reward : [ 0.1  -0.01] || avg reward :  0.030 || Noise  0.442 || 0.741 seconds, mem : 13376

episode: 737/4000  18% ETA:  0:11:57 |///////                                | 

Episode 738 with 33 steps || Reward : [0.   0.09] || avg reward :  0.031 || Noise  0.442 || 0.672 seconds, mem : 13452
Episode 739 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.032 || Noise  0.442 || 0.132 seconds, mem : 13485

episode: 741/4000  18% ETA:  0:11:59 |-------                                | 

Episode 740 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.032 || Noise  0.442 || 0.586 seconds, mem : 13499

episode: 745/4000  18% ETA:  0:12:00 |\\\\\\\                                | 

Episode 747 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.442 || 0.127 seconds, mem : 13634

episode: 749/4000  18% ETA:  0:12:01 ||||||||                                | 

Episode 749 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.442 || 0.139 seconds, mem : 13681

episode: 753/4000  18% ETA:  0:12:03 |///////                                | 

Episode 752 with 32 steps || Reward : [0.   0.09] || avg reward :  0.029 || Noise  0.442 || 0.615 seconds, mem : 13742

episode: 757/4000  18% ETA:  0:12:03 |-------                                | 

Episode 759 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.030 || Noise  0.442 || 0.130 seconds, mem : 13860

episode: 761/4000  19% ETA:  0:12:04 |\\\\\\\                                | 

Episode 760 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.029 || Noise  0.442 || 0.576 seconds, mem : 13875

episode: 765/4000  19% ETA:  0:12:05 ||||||||                                | 

Episode 767 with 36 steps || Reward : [0.   0.09] || avg reward :  0.026 || Noise  0.442 || 0.144 seconds, mem : 13996

episode: 769/4000  19% ETA:  0:12:07 |///////                                | 

Episode 768 with 35 steps || Reward : [ 0.1  -0.01] || avg reward :  0.026 || Noise  0.442 || 0.724 seconds, mem : 14031
Episode 769 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.027 || Noise  0.442 || 0.130 seconds, mem : 14063

episode: 773/4000  19% ETA:  0:12:08 |-------                                | 

Episode 772 with 21 steps || Reward : [ 0.1  -0.01] || avg reward :  0.027 || Noise  0.442 || 0.568 seconds, mem : 14113
Episode 775 with 35 steps || Reward : [ 0.1  -0.01] || avg reward :  0.028 || Noise  0.442 || 0.136 seconds, mem : 14177

episode: 777/4000  19% ETA:  0:12:09 |\\\\\\\                                | 

Episode 778 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.028 || Noise  0.441 || 0.614 seconds, mem : 14237

episode: 781/4000  19% ETA:  0:12:10 ||||||||                                | 

Episode 780 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.441 || 0.742 seconds, mem : 14283
Episode 782 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.441 || 0.655 seconds, mem : 14330

episode: 785/4000  19% ETA:  0:12:12 |///////                                | 

Episode 784 with 21 steps || Reward : [0.09 0.  ] || avg reward :  0.029 || Noise  0.441 || 0.653 seconds, mem : 14366

episode: 789/4000  19% ETA:  0:12:13 |-------                                | 

Episode 789 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.029 || Noise  0.441 || 0.140 seconds, mem : 14460
Episode 790 with 27 steps || Reward : [ 0.1  -0.01] || avg reward :  0.030 || Noise  0.441 || 0.613 seconds, mem : 14487
Episode 791 with 30 steps || Reward : [-0.01  0.1 ] || avg reward :  0.031 || Noise  0.441 || 0.118 seconds, mem : 14517

episode: 793/4000  19% ETA:  0:12:14 |\\\\\\\                                | 

Episode 794 with 31 steps || Reward : [ 0.1  -0.01] || avg reward :  0.031 || Noise  0.441 || 0.634 seconds, mem : 14583

episode: 797/4000  19% ETA:  0:12:15 ||||||||                                | 

Episode 796 with 37 steps || Reward : [-0.01  0.1 ] || avg reward :  0.030 || Noise  0.441 || 0.665 seconds, mem : 14634
Episode 797 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.031 || Noise  0.441 || 0.162 seconds, mem : 14667

episode: 801/4000  20% ETA:  0:12:16 |///////                                | 

Episode 800 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.030 || Noise  0.441 || 0.565 seconds, mem : 14709

episode: 805/4000  20% ETA:  0:12:16 |-------                                | 

Episode 804 with 31 steps || Reward : [ 0.1  -0.01] || avg reward :  0.031 || Noise  0.441 || 0.612 seconds, mem : 14783
Episode 806 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.030 || Noise  0.441 || 0.635 seconds, mem : 14830

episode: 809/4000  20% ETA:  0:12:17 |\\\\\\\                                | 

Episode 809 with 33 steps || Reward : [0.   0.09] || avg reward :  0.029 || Noise  0.441 || 0.155 seconds, mem : 14893
Episode 811 with 28 steps || Reward : [ 0.1  -0.01] || avg reward :  0.030 || Noise  0.441 || 0.146 seconds, mem : 14935

episode: 813/4000  20% ETA:  0:12:19 ||||||||                                | 

Episode 812 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.031 || Noise  0.441 || 0.656 seconds, mem : 14968
Episode 814 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.032 || Noise  0.441 || 0.675 seconds, mem : 15015
Episode 815 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.033 || Noise  0.441 || 0.128 seconds, mem : 15048

episode: 817/4000  20% ETA:  0:12:20 |///////                                | 

Episode 816 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.034 || Noise  0.441 || 0.659 seconds, mem : 15080

episode: 821/4000  20% ETA:  0:12:20 |--------                               | 

Episode 820 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.033 || Noise  0.441 || 0.559 seconds, mem : 15139

episode: 825/4000  20% ETA:  0:12:21 |\\\\\\\\                               | 

Episode 825 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.033 || Noise  0.440 || 0.156 seconds, mem : 15235
Episode 826 with 30 steps || Reward : [-0.01  0.1 ] || avg reward :  0.034 || Noise  0.440 || 0.625 seconds, mem : 15265

episode: 829/4000  20% ETA:  0:12:21 |||||||||                               | 

Episode 831 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.033 || Noise  0.440 || 0.132 seconds, mem : 15355

episode: 841/4000  21% ETA:  0:12:22 |\\\\\\\\                               | 

Episode 840 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.030 || Noise  0.440 || 0.559 seconds, mem : 15483
[0m[42mEpisode 841 with 58 steps || Reward : [0.1  0.19] || avg reward :  0.032 || Noise  0.440 || 0.275 seconds, mem : 15541
[0m

episode: 845/4000  21% ETA:  0:12:23 |||||||||                               | 

Episode 845 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.033 || Noise  0.440 || 0.136 seconds, mem : 15616
Episode 847 with 33 steps || Reward : [0.   0.09] || avg reward :  0.032 || Noise  0.440 || 0.134 seconds, mem : 15663

episode: 853/4000  21% ETA:  0:12:24 |--------                               | 

Episode 852 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.032 || Noise  0.440 || 0.614 seconds, mem : 15753
Episode 853 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.033 || Noise  0.440 || 0.146 seconds, mem : 15786
Episode 855 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.034 || Noise  0.440 || 0.133 seconds, mem : 15833

episode: 857/4000  21% ETA:  0:12:25 |\\\\\\\\                               | 

Episode 858 with 33 steps || Reward : [0.   0.09] || avg reward :  0.034 || Noise  0.440 || 0.617 seconds, mem : 15894
Episode 859 with 32 steps || Reward : [0.   0.09] || avg reward :  0.034 || Noise  0.440 || 0.152 seconds, mem : 15926

episode: 861/4000  21% ETA:  0:12:26 |||||||||                               | 

Episode 860 with 24 steps || Reward : [-0.01  0.  ] || avg reward :  0.034 || Noise  0.440 || 0.621 seconds, mem : 15950

episode: 865/4000  21% ETA:  0:12:26 |////////                               | 

Episode 867 with 31 steps || Reward : [ 0.1  -0.01] || avg reward :  0.034 || Noise  0.439 || 0.131 seconds, mem : 16078

episode: 873/4000  21% ETA:  0:12:27 |\\\\\\\\                               | 

Episode 873 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.032 || Noise  0.439 || 0.146 seconds, mem : 16185
Episode 875 with 33 steps || Reward : [-0.02  0.1 ] || avg reward :  0.032 || Noise  0.439 || 0.133 seconds, mem : 16232

episode: 877/4000  21% ETA:  0:12:28 |||||||||                               | 

Episode 876 with 33 steps || Reward : [0.   0.09] || avg reward :  0.033 || Noise  0.439 || 0.666 seconds, mem : 16265
Episode 879 with 32 steps || Reward : [ 0.1  -0.01] || avg reward :  0.033 || Noise  0.439 || 0.141 seconds, mem : 16326

episode: 881/4000  22% ETA:  0:12:29 |////////                               | 

Episode 880 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.033 || Noise  0.439 || 0.636 seconds, mem : 16359
Episode 883 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.033 || Noise  0.439 || 0.130 seconds, mem : 16420

episode: 885/4000  22% ETA:  0:12:29 |--------                               | 

Episode 884 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.033 || Noise  0.439 || 0.603 seconds, mem : 16453

episode: 901/4000  22% ETA:  0:12:29 |--------                               | 

Episode 900 with 26 steps || Reward : [-0.01  0.1 ] || avg reward :  0.028 || Noise  0.439 || 0.570 seconds, mem : 16712
Episode 901 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.029 || Noise  0.439 || 0.155 seconds, mem : 16745
Episode 903 with 37 steps || Reward : [0.1  0.09] || avg reward :  0.030 || Noise  0.439 || 0.162 seconds, mem : 16796

episode: 909/4000  22% ETA:  0:12:30 |||||||||                               | 

Episode 908 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.029 || Noise  0.439 || 0.611 seconds, mem : 16885
Episode 911 with 33 steps || Reward : [ 0.1  -0.01] || avg reward :  0.029 || Noise  0.439 || 0.129 seconds, mem : 16947

episode: 913/4000  22% ETA:  0:12:30 |////////                               | 

Episode 912 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.029 || Noise  0.438 || 0.665 seconds, mem : 16979

episode: 917/4000  22% ETA:  0:12:30 |--------                               | 

Episode 916 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.027 || Noise  0.438 || 0.617 seconds, mem : 17055

episode: 921/4000  23% ETA:  0:12:30 |\\\\\\\\                               | 

Episode 920 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.027 || Noise  0.438 || 0.562 seconds, mem : 17112

episode: 929/4000  23% ETA:  0:12:30 |/////////                              | 

Episode 931 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.025 || Noise  0.438 || 0.142 seconds, mem : 17290

episode: 941/4000  23% ETA:  0:12:30 ||||||||||                              | 

Episode 940 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.025 || Noise  0.438 || 0.548 seconds, mem : 17417
[0m[41mEpisode 941 with 35 steps || Reward : [0.   0.09] || avg reward :  0.024 || Noise  0.438 || 0.148 seconds, mem : 17452
[0m[41mEpisode 942 with 33 steps || Reward : [-0.02  0.1 ] || avg reward :  0.025 || Noise  0.438 || 0.633 seconds, mem : 17485
[0m[41mEpisode 943 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.026 || Noise  0.438 || 0.135 seconds, mem : 17518
[0m

episode: 945/4000  23% ETA:  0:12:31 |/////////                              | 

Episode 944 with 32 steps || Reward : [0.   0.09] || avg reward :  0.026 || Noise  0.438 || 0.616 seconds, mem : 17550

episode: 949/4000  23% ETA:  0:12:31 |---------                              | 

Episode 948 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.026 || Noise  0.438 || 0.620 seconds, mem : 17625

episode: 961/4000  24% ETA:  0:12:30 |/////////                              | 

Episode 960 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.021 || Noise  0.437 || 0.544 seconds, mem : 17795

episode: 973/4000  24% ETA:  0:12:30 ||||||||||                              | 

Episode 974 with 22 steps || Reward : [0.   0.09] || avg reward :  0.020 || Noise  0.437 || 0.572 seconds, mem : 18002

episode: 977/4000  24% ETA:  0:12:30 |/////////                              | 

Episode 978 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.019 || Noise  0.437 || 0.634 seconds, mem : 18076

episode: 981/4000  24% ETA:  0:12:30 |---------                              | 

Episode 980 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.018 || Noise  0.437 || 0.609 seconds, mem : 18123
Episode 983 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.018 || Noise  0.437 || 0.131 seconds, mem : 18185

episode: 993/4000  24% ETA:  0:12:30 |/////////                              | 

Episode 992 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.018 || Noise  0.437 || 0.687 seconds, mem : 18331
Episode 993 with 34 steps || Reward : [0.   0.09] || avg reward :  0.019 || Noise  0.437 || 0.145 seconds, mem : 18365

episode: 1001/4000  25% ETA:  0:12:29 |\\\\\\\\\                             | 

Episode 1000 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.018 || Noise  0.437 || 0.544 seconds, mem : 18465

episode: 1021/4000  25% ETA:  0:12:28 ||||||||||                             | 

Episode 1020 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.012 || Noise  0.436 || 0.576 seconds, mem : 18749

episode: 1033/4000  25% ETA:  0:12:29 |\\\\\\\\\                             | 

Episode 1032 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.012 || Noise  0.436 || 0.705 seconds, mem : 18938

episode: 1037/4000  25% ETA:  0:12:28 ||||||||||                             | 

Episode 1037 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.013 || Noise  0.436 || 0.138 seconds, mem : 19027
Episode 1039 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.014 || Noise  0.436 || 0.133 seconds, mem : 19074

episode: 1041/4000  26% ETA:  0:12:29 |/////////                             | 

Episode 1040 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.014 || Noise  0.436 || 0.617 seconds, mem : 19088
[0m[41mEpisode 1041 with 33 steps || Reward : [0.   0.09] || avg reward :  0.014 || Noise  0.436 || 0.135 seconds, mem : 19121
[0m

episode: 1045/4000  26% ETA:  0:12:29 |---------                             | 

Episode 1046 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.012 || Noise  0.436 || 0.626 seconds, mem : 19211

episode: 1049/4000  26% ETA:  0:12:28 |\\\\\\\\\                             | 

Episode 1049 with 34 steps || Reward : [0.   0.09] || avg reward :  0.012 || Noise  0.436 || 0.154 seconds, mem : 19273
Episode 1051 with 34 steps || Reward : [ 0.1  -0.01] || avg reward :  0.013 || Noise  0.435 || 0.141 seconds, mem : 19323

episode: 1053/4000  26% ETA:  0:12:28 |||||||||||                            | 

Episode 1055 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.014 || Noise  0.435 || 0.153 seconds, mem : 19399

episode: 1057/4000  26% ETA:  0:12:29 |//////////                            | 

Episode 1058 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.015 || Noise  0.435 || 0.661 seconds, mem : 19460

episode: 1061/4000  26% ETA:  0:12:29 |----------                            | 

Episode 1060 with 33 steps || Reward : [0.   0.09] || avg reward :  0.016 || Noise  0.435 || 0.647 seconds, mem : 19507

episode: 1069/4000  26% ETA:  0:12:29 |||||||||||                            | 

Episode 1068 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.017 || Noise  0.435 || 0.659 seconds, mem : 19639
Episode 1070 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.018 || Noise  0.435 || 0.717 seconds, mem : 19686
Episode 1071 with 32 steps || Reward : [0.   0.09] || avg reward :  0.018 || Noise  0.435 || 0.159 seconds, mem : 19718

episode: 1073/4000  26% ETA:  0:12:29 |//////////                            | 

Episode 1075 with 33 steps || Reward : [0.   0.09] || avg reward :  0.018 || Noise  0.435 || 0.133 seconds, mem : 19812

episode: 1081/4000  27% ETA:  0:12:29 |\\\\\\\\\\                            | 

Episode 1080 with 34 steps || Reward : [0.   0.09] || avg reward :  0.017 || Noise  0.435 || 0.651 seconds, mem : 19902
Episode 1082 with 31 steps || Reward : [-0.01  0.1 ] || avg reward :  0.018 || Noise  0.435 || 0.662 seconds, mem : 19948

episode: 1085/4000  27% ETA:  0:12:29 |||||||||||                            | 

Episode 1084 with 34 steps || Reward : [0.   0.09] || avg reward :  0.018 || Noise  0.435 || 0.611 seconds, mem : 19996

episode: 1089/4000  27% ETA:  0:12:29 |//////////                            | 

Episode 1089 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.019 || Noise  0.435 || 0.136 seconds, mem : 20086

episode: 1097/4000  27% ETA:  0:12:28 |\\\\\\\\\\                            | 

Episode 1098 with 34 steps || Reward : [-0.01  0.1 ] || avg reward :  0.018 || Noise  0.434 || 0.601 seconds, mem : 20234

episode: 1101/4000  27% ETA:  0:12:27 |||||||||||                            | 

Episode 1100 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.018 || Noise  0.434 || 0.550 seconds, mem : 20262
[0m[41mEpisode 1101 with 33 steps || Reward : [0.   0.09] || avg reward :  0.019 || Noise  0.434 || 0.155 seconds, mem : 20295
[0m

episode: 1105/4000  27% ETA:  0:12:27 |//////////                            | 

Episode 1105 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.020 || Noise  0.434 || 0.146 seconds, mem : 20370
Episode 1106 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.021 || Noise  0.434 || 0.698 seconds, mem : 20403
Episode 1107 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.022 || Noise  0.434 || 0.132 seconds, mem : 20436

episode: 1109/4000  27% ETA:  0:12:27 |----------                            | 

Episode 1110 with 37 steps || Reward : [0.1  0.09] || avg reward :  0.023 || Noise  0.434 || 0.622 seconds, mem : 20501

episode: 1121/4000  28% ETA:  0:12:26 |//////////                            | 

Episode 1120 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.024 || Noise  0.434 || 0.580 seconds, mem : 20663

episode: 1133/4000  28% ETA:  0:12:25 |||||||||||                            | 

Episode 1135 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.024 || Noise  0.434 || 0.112 seconds, mem : 20895

episode: 1141/4000  28% ETA:  0:12:25 |----------                            | 

Episode 1140 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.022 || Noise  0.434 || 0.666 seconds, mem : 20966
[0m[41mEpisode 1141 with 33 steps || Reward : [0.   0.09] || avg reward :  0.022 || Noise  0.434 || 0.179 seconds, mem : 20999
[0m

episode: 1145/4000  28% ETA:  0:12:26 |\\\\\\\\\\                            | 

Episode 1145 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.023 || Noise  0.433 || 0.165 seconds, mem : 21074

episode: 1149/4000  28% ETA:  0:12:26 |||||||||||                            | 

Episode 1148 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.023 || Noise  0.433 || 0.724 seconds, mem : 21135

episode: 1153/4000  28% ETA:  0:12:26 |//////////                            | 

Episode 1154 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.022 || Noise  0.433 || 0.795 seconds, mem : 21239

episode: 1157/4000  28% ETA:  0:12:27 |----------                            | 

Episode 1159 with 35 steps || Reward : [0.   0.09] || avg reward :  0.021 || Noise  0.433 || 0.175 seconds, mem : 21331

episode: 1161/4000  29% ETA:  0:12:27 |\\\\\\\\\\\                           | 

Episode 1160 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.020 || Noise  0.433 || 0.633 seconds, mem : 21345

episode: 1167/4000  29% ETA:  0:12:28 |///////////                           | 

Episode 1166 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.021 || Noise  0.433 || 0.765 seconds, mem : 21487

episode: 1175/4000  29% ETA:  0:12:28 |\\\\\\\\\\\                           | 

Episode 1174 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.019 || Noise  0.433 || 0.754 seconds, mem : 21619
Episode 1176 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.020 || Noise  0.433 || 0.703 seconds, mem : 21666

episode: 1179/4000  29% ETA:  0:12:28 ||||||||||||                           | 

Episode 1180 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.019 || Noise  0.433 || 0.605 seconds, mem : 21723

episode: 1183/4000  29% ETA:  0:12:28 |///////////                           | 

Episode 1182 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.019 || Noise  0.433 || 0.738 seconds, mem : 21770

episode: 1201/4000  30% ETA:  0:12:28 |-----------                           | 

Episode 1200 with 31 steps || Reward : [0.   0.09] || avg reward :  0.017 || Noise  0.432 || 0.608 seconds, mem : 22043

episode: 1216/4000  30% ETA:  0:12:30 |\\\\\\\\\\\                           | 

Episode 1215 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.013 || Noise  0.432 || 0.168 seconds, mem : 22275

episode: 1219/4000  30% ETA:  0:12:33 ||||||||||||                           | 

Episode 1218 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.014 || Noise  0.432 || 1.570 seconds, mem : 22336

episode: 1221/4000  30% ETA:  0:12:34 |///////////                           | 

Episode 1220 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.013 || Noise  0.432 || 1.006 seconds, mem : 22364
Episode 1222 with 37 steps || Reward : [0.1  0.09] || avg reward :  0.014 || Noise  0.432 || 0.641 seconds, mem : 22415

episode: 1229/4000  30% ETA:  0:12:32 |\\\\\\\\\\\                           | 

Episode 1229 with 32 steps || Reward : [0.   0.09] || avg reward :  0.015 || Noise  0.432 || 0.140 seconds, mem : 22532

episode: 1237/4000  30% ETA:  0:12:31 |///////////                           | 

Episode 1236 with 33 steps || Reward : [-0.02  0.1 ] || avg reward :  0.015 || Noise  0.431 || 0.633 seconds, mem : 22650
Episode 1238 with 33 steps || Reward : [0.   0.09] || avg reward :  0.016 || Noise  0.431 || 0.661 seconds, mem : 22697

episode: 1241/4000  31% ETA:  0:12:31 |-----------                           | 

Episode 1240 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.016 || Noise  0.431 || 0.546 seconds, mem : 22725

episode: 1249/4000  31% ETA:  0:12:29 ||||||||||||                           | 

Episode 1250 with 19 steps || Reward : [0.   0.09] || avg reward :  0.014 || Noise  0.431 || 0.552 seconds, mem : 22872

episode: 1261/4000  31% ETA:  0:12:27 |\\\\\\\\\\\                           | 

Episode 1260 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.012 || Noise  0.431 || 0.568 seconds, mem : 23033
Episode 1261 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.013 || Noise  0.431 || 0.144 seconds, mem : 23066

episode: 1277/4000  31% ETA:  0:12:25 |\\\\\\\\\\\\                          | 

Episode 1276 with 32 steps || Reward : [0.   0.09] || avg reward :  0.011 || Noise  0.431 || 0.612 seconds, mem : 23297

episode: 1281/4000  32% ETA:  0:12:24 |||||||||||||                          | 

Episode 1280 with 14 steps || Reward : [ 0.   -0.01] || avg reward :  0.011 || Noise  0.431 || 0.553 seconds, mem : 23354

episode: 1301/4000  32% ETA:  0:12:22 |////////////                          | 

Episode 1300 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.009 || Noise  0.430 || 0.553 seconds, mem : 23638

episode: 1309/4000  32% ETA:  0:12:20 |\\\\\\\\\\\\                          | 

Episode 1311 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.010 || Noise  0.430 || 0.139 seconds, mem : 23813

episode: 1313/4000  32% ETA:  0:12:19 |||||||||||||                          | 

Episode 1315 with 32 steps || Reward : [0.   0.09] || avg reward :  0.010 || Noise  0.430 || 0.129 seconds, mem : 23888

episode: 1321/4000  33% ETA:  0:12:19 |------------                          | 

Episode 1320 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.009 || Noise  0.430 || 0.778 seconds, mem : 23965

episode: 1325/4000  33% ETA:  0:12:19 |\\\\\\\\\\\\                          | 

Episode 1324 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.009 || Noise  0.430 || 0.825 seconds, mem : 24040
Episode 1325 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.010 || Noise  0.430 || 0.176 seconds, mem : 24073

episode: 1329/4000  33% ETA:  0:12:20 |////////////                          | 

Episode 1328 with 35 steps || Reward : [ 0.1  -0.01] || avg reward :  0.011 || Noise  0.429 || 1.005 seconds, mem : 24136

episode: 1333/4000  33% ETA:  0:12:20 |------------                          | 

Episode 1334 with 32 steps || Reward : [0.   0.09] || avg reward :  0.011 || Noise  0.429 || 0.609 seconds, mem : 24258

episode: 1341/4000  33% ETA:  0:12:18 |||||||||||||                          | 

Episode 1340 with 14 steps || Reward : [-0.01  0.  ] || avg reward :  0.009 || Noise  0.429 || 0.635 seconds, mem : 24343

episode: 1345/4000  33% ETA:  0:12:18 |////////////                          | 

Episode 1346 with 33 steps || Reward : [-0.01  0.1 ] || avg reward :  0.010 || Noise  0.429 || 0.728 seconds, mem : 24447

episode: 1361/4000  34% ETA:  0:12:15 |////////////                          | 

Episode 1360 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.009 || Noise  0.429 || 0.570 seconds, mem : 24646
Episode 1362 with 20 steps || Reward : [0.   0.09] || avg reward :  0.009 || Noise  0.429 || 0.535 seconds, mem : 24680

episode: 1365/4000  34% ETA:  0:12:14 |------------                          | 

Episode 1364 with 32 steps || Reward : [-0.01  0.1 ] || avg reward :  0.010 || Noise  0.429 || 0.597 seconds, mem : 24727

episode: 1381/4000  34% ETA:  0:12:10 |-------------                         | 

Episode 1380 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.009 || Noise  0.428 || 0.573 seconds, mem : 24955

episode: 1401/4000  35% ETA:  0:12:06 |\\\\\\\\\\\\\                         | 

Episode 1400 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.009 || Noise  0.428 || 0.523 seconds, mem : 25239

episode: 1421/4000  35% ETA:  0:12:06 |-------------                         | 

Episode 1420 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.007 || Noise  0.428 || 0.943 seconds, mem : 25523

episode: 1441/4000  36% ETA:  0:12:02 |\\\\\\\\\\\\\                         | 

Episode 1440 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.003 || Noise  0.427 || 0.533 seconds, mem : 25807

episode: 1461/4000  36% ETA:  0:11:56 ||||||||||||||                         | 

Episode 1460 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.002 || Noise  0.427 || 0.511 seconds, mem : 26091

episode: 1481/4000  37% ETA:  0:11:52 |//////////////                        | 

Episode 1480 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.426 || 0.607 seconds, mem : 26375

episode: 1499/4000  37% ETA:  0:11:50 |--------------                        | 

Episode 1500 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.426 || 0.526 seconds, mem : 26659

episode: 1519/4000  37% ETA:  0:11:46 |\\\\\\\\\\\\\\                        | 

Episode 1520 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.425 || 0.544 seconds, mem : 26943

episode: 1539/4000  38% ETA:  0:11:42 |||||||||||||||                        | 

Episode 1540 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.425 || 0.662 seconds, mem : 27227

episode: 1559/4000  38% ETA:  0:11:40 |//////////////                        | 

Episode 1560 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.425 || 0.688 seconds, mem : 27511

episode: 1579/4000  39% ETA:  0:11:35 |---------------                       | 

Episode 1580 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.424 || 0.519 seconds, mem : 27795

episode: 1599/4000  39% ETA:  0:11:30 |\\\\\\\\\\\\\\\                       | 

Episode 1600 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.424 || 0.523 seconds, mem : 28079

episode: 1619/4000  40% ETA:  0:11:25 ||||||||||||||||                       | 

Episode 1620 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.423 || 0.524 seconds, mem : 28363

episode: 1639/4000  40% ETA:  0:11:19 |///////////////                       | 

Episode 1640 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.423 || 0.524 seconds, mem : 28647

episode: 1659/4000  41% ETA:  0:11:14 |---------------                       | 

Episode 1660 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.422 || 0.503 seconds, mem : 28931

episode: 1679/4000  41% ETA:  0:11:08 |\\\\\\\\\\\\\\\                       | 

Episode 1680 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.422 || 0.541 seconds, mem : 29215

episode: 1699/4000  42% ETA:  0:11:03 |||||||||||||||||                      | 

Episode 1700 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.422 || 0.563 seconds, mem : 29499

episode: 1719/4000  42% ETA:  0:10:57 |////////////////                      | 

Episode 1720 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.421 || 0.530 seconds, mem : 29783

episode: 1739/4000  43% ETA:  0:10:51 |----------------                      | 

Episode 1740 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.421 || 0.519 seconds, mem : 30067

episode: 1759/4000  43% ETA:  0:10:45 |\\\\\\\\\\\\\\\\                      | 

Episode 1760 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.420 || 0.508 seconds, mem : 30351

episode: 1779/4000  44% ETA:  0:10:40 |||||||||||||||||                      | 

Episode 1780 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.420 || 0.492 seconds, mem : 30635

episode: 1799/4000  44% ETA:  0:10:34 |/////////////////                     | 

Episode 1800 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.419 || 0.518 seconds, mem : 30919

episode: 1819/4000  45% ETA:  0:10:28 |-----------------                     | 

Episode 1820 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.419 || 0.498 seconds, mem : 31203

episode: 1839/4000  45% ETA:  0:10:23 |\\\\\\\\\\\\\\\\\                     | 

Episode 1840 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.419 || 0.546 seconds, mem : 31487

episode: 1859/4000  46% ETA:  0:10:17 ||||||||||||||||||                     | 

Episode 1860 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.418 || 0.500 seconds, mem : 31771

episode: 1879/4000  46% ETA:  0:10:12 |/////////////////                     | 

Episode 1880 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.418 || 0.568 seconds, mem : 32055

episode: 1899/4000  47% ETA:  0:10:07 |------------------                    | 

Episode 1900 with 15 steps || Reward : [-0.01  0.  ] || avg reward :  0.000 || Noise  0.417 || 0.542 seconds, mem : 32339

episode: 1919/4000  47% ETA:  0:10:02 |\\\\\\\\\\\\\\\\\\                    | 

Episode 1920 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.417 || 0.594 seconds, mem : 32623

episode: 1939/4000  48% ETA:  0:09:57 |||||||||||||||||||                    | 

Episode 1940 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.417 || 0.532 seconds, mem : 32907

episode: 1959/4000  48% ETA:  0:09:52 |//////////////////                    | 

Episode 1960 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.416 || 0.571 seconds, mem : 33191

episode: 1979/4000  49% ETA:  0:09:47 |------------------                    | 

Episode 1980 with 15 steps || Reward : [ 0.   -0.01] || avg reward :  0.000 || Noise  0.416 || 0.555 seconds, mem : 33475

episode: 1983/4000  49% ETA:  0:09:45 |\\\\\\\\\\\\\\\\\\                    | 

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the agent, set `train_mode=True` so that the line that resets the environment reads:
```python
env_info = env.reset(train_mode=True)[brain_name]
```
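
As a rough illustration of where that reset call sits, the sketch below wraps one episode in a helper. The names `run_episode`, `select_actions`, and `max_steps` are hypothetical and not part of `unityagents`; the helper only assumes that `env.reset` and `env.step` return a dictionary keyed by brain name whose entries expose `vector_observations`, `rewards`, `local_done`, and `agents`. Clipping actions to [-1, 1] is also an assumption about this environment's action range.

```python
import numpy as np


def run_episode(env, brain_name, select_actions, train_mode=True, max_steps=1000):
    """Play one episode and return the undiscounted score of each agent.

    `select_actions` is a hypothetical callable that maps the stacked
    observations (one row per agent) to an array of actions.
    """
    # Reset the environment; train_mode=True is the setting used for training.
    env_info = env.reset(train_mode=train_mode)[brain_name]
    states = env_info.vector_observations
    scores = np.zeros(len(env_info.agents))

    for _ in range(max_steps):
        # Assumption: valid actions lie in [-1, 1], so clip before stepping.
        actions = np.clip(select_actions(states), -1, 1)
        env_info = env.step(actions)[brain_name]
        scores += env_info.rewards               # one reward per agent
        states = env_info.vector_observations    # observations for the next step
        if np.any(env_info.local_done):          # stop once any agent is done
            break
    return scores
```

With the environment from this notebook, a call might look like `run_episode(env, brain_name, my_policy)`, where `my_policy` stands in for whatever action-selection function your agent provides.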