# Continuous Control

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the second project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages. If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md), [NumPy](http://www.numpy.org/), and PyTorch. The cell also seeds Python's `random`, NumPy, and PyTorch so that runs are easier to reproduce.

In [1]:
from unityagents import UnityEnvironment
import numpy as np
import torch
import random

# Seed every source of randomness we control so runs are repeatable
seed = 1337
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Reacher.app"`
- **Windows** (x86): `"path/to/Reacher_Windows_x86/Reacher.exe"`
- **Windows** (x86_64): `"path/to/Reacher_Windows_x86_64/Reacher.exe"`
- **Linux** (x86): `"path/to/Reacher_Linux/Reacher.x86"`
- **Linux** (x86_64): `"path/to/Reacher_Linux/Reacher.x86_64"`
- **Linux** (x86, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86"`
- **Linux** (x86_64, headless): `"path/to/Reacher_Linux_NoVis/Reacher.x86_64"`

For instance, if you are using a Mac, then you downloaded `Reacher.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Reacher.app")
```
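If you move the notebook between machines, a small helper can pick the build from the list above. The sketch below is a hypothetical convenience function (`default_reacher_path` is not part of `unityagents`). It assumes every build sits under a single `base_dir` using the folder names shown above, and it uses the standard-library `platform` module to detect the OS and whether the machine is 64-bit.

```python
import platform


def default_reacher_path(base_dir=".", headless=False):
    """Guess the Reacher build path for the current OS (hypothetical helper).

    Assumes the folder names from the list above, all placed under base_dir.
    """
    system = platform.system()
    # "x86_64" on Linux/macOS, "AMD64" on 64-bit Windows
    is_64bit = platform.machine().endswith("64")
    if system == "Darwin":
        return f"{base_dir}/Reacher.app"
    if system == "Windows":
        folder = "Reacher_Windows_x86_64" if is_64bit else "Reacher_Windows_x86"
        return f"{base_dir}/{folder}/Reacher.exe"
    if system == "Linux":
        # The headless (NoVis) build is only listed for Linux
        folder = "Reacher_Linux_NoVis" if headless else "Reacher_Linux"
        suffix = "x86_64" if is_64bit else "x86"
        return f"{base_dir}/{folder}/Reacher.{suffix}"
    raise RuntimeError(f"Unsupported platform: {system}")
```

With it, the next cell could read `env = UnityEnvironment(file_name=default_reacher_path(), seed=seed)`. Hard-coding the path, as the notebook does, is simpler when you only ever run on one machine.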

In [2]:
env = UnityEnvironment(file_name='Reacher.app', seed=seed)

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		goal_speed -> 1.0
		goal_size -> 5.0
Unity brain name: ReacherBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 33
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): 4
        Vector Action descriptions: , , , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, a double-jointed arm can move to target locations. A reward of `+0.1` is provided for each step that the agent's hand is in the goal location. Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of `33` variables corresponding to position, rotation, velocity, and angular velocities of the arm.  Each action is a vector with four numbers, corresponding to torque applicable to two joints.  Every entry in the action vector must be a number between `-1` and `1`.
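The two constraints above, one 4-number action per agent with every entry in `[-1, 1]` and a reward of `+0.1` per step in the goal, can be expressed directly. This is a minimal sketch: `random_actions` and `episode_score` are hypothetical helper names, and sampling from a standard normal and then clipping mirrors the random-action cell further down rather than any requirement of the environment.

```python
import numpy as np


def random_actions(num_agents, action_size=4, rng=None):
    """Sample one Gaussian action per agent and clip every entry into [-1, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.standard_normal((num_agents, action_size))
    return np.clip(actions, -1.0, 1.0)


def episode_score(steps_in_goal, reward_per_step=0.1):
    """Score for an episode in which the hand is in the goal for steps_in_goal steps."""
    return steps_in_goal * reward_per_step
```

For example, an agent whose hand stays in the goal for 300 steps of an episode scores about `300 * 0.1 = 30`.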

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 1
Size of each action: 4
There are 1 agents. Each observes a state with length: 33
The state for the first agent looks like: [ 0.00000000e+00 -4.00000000e+00  0.00000000e+00  1.00000000e+00
 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00 -1.00000000e+01  0.00000000e+00
  1.00000000e+00 -0.00000000e+00 -0.00000000e+00 -4.37113883e-08
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  4.81451988e+00 -1.00000000e+00
  6.38908386e+00  0.00000000e+00  1.00000000e+00  0.00000000e+00
  8.53890657e-01]


### 3. Take Random Actions in the Environment

In the next code cell, you will learn how to use the Python API to control the agent and receive feedback from the environment.

Once this cell is executed, you can watch how the agent performs when it selects a random action at each time step. A window should pop up that allows you to observe the agent as it moves through the environment. (In this notebook the cell is commented out; uncomment it to run it.)

Of course, as part of the project, you'll have to change the code so that the agent is able to use its experience to gradually choose better actions when interacting with the environment!

In [5]:
# env_info = env.reset(train_mode=False)[brain_name]     # reset the environment    
# states = env_info.vector_observations                  # get the current state (for each agent)
# scores = np.zeros(num_agents)                          # initialize the score (for each agent)
# while True:
#     actions = np.random.randn(num_agents, action_size) # select an action (for each agent)
#     actions = np.clip(actions, -1, 1)                  # all actions between -1 and 1
#     env_info = env.step(actions)[brain_name]           # send all actions to the environment
#     next_states = env_info.vector_observations         # get next state (for each agent)
#     rewards = env_info.rewards                         # get reward (for each agent)
#     dones = env_info.local_done                        # see if episode finished
#     scores += env_info.rewards                         # update the score (for each agent)
#     states = next_states                               # roll over states to next time step
#     if np.any(dones):                                  # exit loop if episode finished
#         break
# print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))
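The random-action loop above only touches `reset`, `step`, and the `vector_observations`, `rewards`, and `local_done` fields of the returned info, so its control flow can be checked without launching Unity. The sketch below uses a hypothetical `StubReacherEnv` (not part of `unityagents`) that mimics that interface with fixed-length episodes and random `0` / `0.1` rewards; it is a testing aid under those assumptions, not a model of the real Reacher dynamics.

```python
import numpy as np


class StubBrainInfo:
    """Hypothetical stand-in exposing only the fields the loop reads."""
    def __init__(self, obs, rewards, dones):
        self.vector_observations = obs
        self.rewards = rewards
        self.local_done = dones
        self.agents = list(range(len(rewards)))


class StubReacherEnv:
    """Hypothetical env: episodes end after max_steps, rewards are 0 or 0.1."""
    def __init__(self, num_agents=1, state_size=33, action_size=4,
                 max_steps=20, brain_name="ReacherBrain", seed=0):
        self.num_agents, self.state_size = num_agents, state_size
        self.action_size, self.max_steps = action_size, max_steps
        self.brain_name = brain_name
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def _info(self, rewards, dones):
        obs = self.rng.standard_normal((self.num_agents, self.state_size))
        return {self.brain_name: StubBrainInfo(obs, rewards, dones)}

    def reset(self, train_mode=True):
        self.t = 0
        return self._info([0.0] * self.num_agents, [False] * self.num_agents)

    def step(self, actions):
        assert actions.shape == (self.num_agents, self.action_size)
        self.t += 1
        rewards = list(self.rng.choice([0.0, 0.1], size=self.num_agents))
        done = self.t >= self.max_steps
        return self._info(rewards, [done] * self.num_agents)


def run_random_episode(env, brain_name, num_agents, action_size, rng):
    """Same structure as the notebook's random-action loop."""
    env_info = env.reset(train_mode=True)[brain_name]
    scores = np.zeros(num_agents)
    while True:
        actions = np.clip(rng.standard_normal((num_agents, action_size)), -1, 1)
        env_info = env.step(actions)[brain_name]
        scores += env_info.rewards
        if np.any(env_info.local_done):
            break
    return scores
```

Swapping the stub for the real `env` (and `"ReacherBrain"` for `brain_name`) runs the same loop against Unity.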

When finished, you can close the environment.

In [6]:
# env.close()
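In a standalone script (as opposed to a notebook, where the environment is usually kept open across cells), `contextlib.closing` guarantees that `close()` runs even if training is interrupted, for example by a `KeyboardInterrupt` like the one in the training run below. The sketch uses a hypothetical `DummyEnv` so it runs without Unity; with the real environment the pattern would be `with closing(UnityEnvironment(file_name='Reacher.app', seed=seed)) as env: ...`.

```python
from contextlib import closing


class DummyEnv:
    """Hypothetical stand-in for UnityEnvironment; only close() matters here."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


env_like = DummyEnv()
with closing(env_like) as e:
    # run episodes with `e` here; close() is called even if this block raises
    pass
```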

In [7]:
# Run agent

def run_agent(model_path):
    """Run one evaluation episode with a trained PPO agent loaded from model_path."""
    from ppo_agent import Agent

    brain_name = env.brain_names[0]
    brain = env.brains[brain_name]
    env_info = env.reset(train_mode=False)[brain_name]
    n_observations = env_info.vector_observations.shape[1]
    n_actions = brain.vector_action_space_size

    agent = Agent(n_observations, n_actions)
    # map_location="cpu" lets a checkpoint saved on a GPU load on a CPU-only machine;
    # the agent and the observations below stay on the CPU
    agent.load_state_dict(torch.load(model_path, map_location="cpu"))

    scores = np.zeros(1)                                   # initialize the score (single agent)
    while True:
        obs = torch.Tensor(np.expand_dims(env_info.vector_observations[0], 0))
        with torch.no_grad():
            action, _, _, _ = agent.get_action_and_value(obs)
        action = torch.clamp(action, -1, 1)                # clamp returns a new tensor, so reassign
        action = action.numpy()
        env_info = env.step(action)[brain_name]            # send the action to the environment
        dones = env_info.local_done                        # see if the episode finished
        scores += env_info.rewards                         # update the score
        if np.any(dones):                                  # exit loop if episode finished
            break
    print('Total score (averaged over agents) this episode: {}'.format(np.mean(scores)))
# run_agent('checkpoints/model_step_976.pickle')

### 4. It's Your Turn!

Now it's your turn to train your own agent to solve the environment! When training, set `train_mode=True`, so that the line that resets the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```

In [8]:
from ppo import run_ppo

run_ppo(env)

update 1/1953. Last update in 3.0994415283203125e-06s
last 100 returns: 0.0
update 2/1953. Last update in 6.53510308265686s
last 100 returns: 0.0
update 3/1953. Last update in 6.0561909675598145s
last 100 returns: 0.0414285705025707
update 4/1953. Last update in 5.958120822906494s
last 100 returns: 0.05333333214124044
update 5/1953. Last update in 5.983843803405762s
last 100 returns: 0.09090908887711438
update 6/1953. Last update in 6.02529501914978s
last 100 returns: 0.12846153559019932
update 7/1953. Last update in 6.023164987564087s
last 100 returns: 0.13066666374603908
update 8/1953. Last update in 6.488737106323242s
last 100 returns: 0.13882352630881703
update 9/1953. Last update in 7.174682140350342s
last 100 returns: 0.1547368386466252
update 10/1953. Last update in 6.8416359424591064s
last 100 returns: 0.1933333290119966
update 11/1953. Last update in 6.2476677894592285s
last 100 returns: 0.20521738671738168
update 12/1953. Last update in 6.2190101146698s
last 100 returns: 0.20

last 100 returns: 4.006399910449982
update 96/1953. Last update in 6.025965929031372s
last 100 returns: 4.152499907184392
update 97/1953. Last update in 6.102081060409546s
last 100 returns: 4.2500999050028625
update 98/1953. Last update in 6.049624919891357s
last 100 returns: 4.331399903185666
update 99/1953. Last update in 6.037513971328735s
last 100 returns: 4.392899901811034
update 100/1953. Last update in 6.048241853713989s
last 100 returns: 4.423199901133776
update 101/1953. Last update in 6.015877962112427s
last 100 returns: 4.466899900157005
update 102/1953. Last update in 6.126468896865845s
last 100 returns: 4.582799897566438
update 103/1953. Last update in 6.085202932357788s
last 100 returns: 4.626799896582961
update 104/1953. Last update in 6.401291847229004s
last 100 returns: 4.680299895387143
update 105/1953. Last update in 6.151695966720581s
last 100 returns: 4.750999893806875
update 106/1953. Last update in 6.1721031665802s
last 100 returns: 4.802699892651289
update 107/1

last 100 returns: 5.94509986711666
update 191/1953. Last update in 6.029892206192017s
last 100 returns: 5.954799866899848
update 192/1953. Last update in 6.062159061431885s
last 100 returns: 5.969099866580218
update 193/1953. Last update in 6.100198268890381s
last 100 returns: 5.983199866265059
update 194/1953. Last update in 6.013911008834839s
last 100 returns: 6.038599865026772
update 195/1953. Last update in 6.03206992149353s
last 100 returns: 6.191799861602485
update 196/1953. Last update in 6.058341979980469s
last 100 returns: 6.15459986243397
update 197/1953. Last update in 6.0831458568573s
last 100 returns: 6.217899861019105
update 198/1953. Last update in 6.055453062057495s
last 100 returns: 6.2922998593561354
update 199/1953. Last update in 6.0471580028533936s
last 100 returns: 6.287799859456718
update 200/1953. Last update in 6.089251756668091s
last 100 returns: 6.363999857753515
update 201/1953. Last update in 6.0117456912994385s
last 100 returns: 6.352099858019501
update 20

last 100 returns: 8.984299799185246
update 286/1953. Last update in 6.077677249908447s
last 100 returns: 8.955699799824506
update 287/1953. Last update in 6.369926929473877s
last 100 returns: 9.170799795016647
update 288/1953. Last update in 6.331512212753296s
last 100 returns: 9.175599794909358
update 289/1953. Last update in 6.8175950050354s
last 100 returns: 9.181699794773012
update 290/1953. Last update in 7.692864894866943s
last 100 returns: 9.165999795123934
update 291/1953. Last update in 6.139258146286011s
last 100 returns: 9.268899792823941
update 292/1953. Last update in 6.607767105102539s
last 100 returns: 9.379799790345132
update 293/1953. Last update in 6.199220180511475s
last 100 returns: 9.520399787202477
update 294/1953. Last update in 6.07744574546814s
last 100 returns: 9.434499789122492
update 295/1953. Last update in 6.0758280754089355s
last 100 returns: 9.507099787499756
update 296/1953. Last update in 6.038191080093384s
last 100 returns: 9.497699787709863
update 29

last 100 returns: 8.061199819818139
update 381/1953. Last update in 6.499131917953491s
last 100 returns: 8.247999815642833
update 382/1953. Last update in 6.13294792175293s
last 100 returns: 8.227099816109986
update 383/1953. Last update in 6.137362003326416s
last 100 returns: 8.037299820352345
update 384/1953. Last update in 6.1246278285980225s
last 100 returns: 7.898799823448062
update 385/1953. Last update in 6.317251920700073s
last 100 returns: 8.00939982097596
update 386/1953. Last update in 6.822179079055786s
last 100 returns: 8.128799818307161
update 387/1953. Last update in 6.917719125747681s
last 100 returns: 8.212799816429616
update 388/1953. Last update in 7.3798508644104s
last 100 returns: 8.163399817533792
update 389/1953. Last update in 6.725473880767822s
last 100 returns: 8.228899816069752
update 390/1953. Last update in 6.290724039077759s
last 100 returns: 8.19319981686771
update 391/1953. Last update in 6.853965997695923s
last 100 returns: 8.349299813378602
update 392/

last 100 returns: 8.292699814643711
update 476/1953. Last update in 8.055217027664185s
last 100 returns: 8.44519981123507
update 477/1953. Last update in 7.779220104217529s
last 100 returns: 8.592599807940424
update 478/1953. Last update in 7.666984796524048s
last 100 returns: 8.537499809172004
update 479/1953. Last update in 6.5980730056762695s
last 100 returns: 8.33619981367141
update 480/1953. Last update in 6.44555401802063s
last 100 returns: 8.402099812198431
update 481/1953. Last update in 6.169730186462402s
last 100 returns: 8.434099811483174
update 482/1953. Last update in 6.625265121459961s
last 100 returns: 8.369799812920391
update 483/1953. Last update in 6.4800190925598145s
last 100 returns: 8.399499812256545
update 484/1953. Last update in 8.731419086456299s
last 100 returns: 8.480399810448288
update 485/1953. Last update in 7.022885084152222s
last 100 returns: 8.521499809529633
update 486/1953. Last update in 7.560588121414185s
last 100 returns: 8.424399811699987
update 4

last 100 returns: 8.009499820973724
update 571/1953. Last update in 6.471693992614746s
last 100 returns: 8.011299820933491
update 572/1953. Last update in 6.25848913192749s
last 100 returns: 7.948099822346121
update 573/1953. Last update in 6.311612844467163s
last 100 returns: 8.102999818883836
update 574/1953. Last update in 6.343396902084351s
last 100 returns: 8.207299816552549
update 575/1953. Last update in 6.384558200836182s
last 100 returns: 8.28499981481582
update 576/1953. Last update in 6.240672588348389s
last 100 returns: 8.447599811181426
update 577/1953. Last update in 6.292611122131348s
last 100 returns: 8.517799809612335
update 578/1953. Last update in 6.505997896194458s
last 100 returns: 8.472299810629337
update 579/1953. Last update in 6.325314044952393s
last 100 returns: 8.434499811474234
update 580/1953. Last update in 6.336679220199585s
last 100 returns: 8.306999814324081
update 581/1953. Last update in 6.280102014541626s
last 100 returns: 8.228299816083164
update 58

last 100 returns: 10.191899772193283
update 666/1953. Last update in 6.375926971435547s
last 100 returns: 10.271499770414085
update 667/1953. Last update in 6.649173021316528s
last 100 returns: 10.237699771169574
update 668/1953. Last update in 6.187244653701782s
last 100 returns: 10.349399768672884
update 669/1953. Last update in 6.283159971237183s
last 100 returns: 10.418999767117203
update 670/1953. Last update in 6.23813009262085s
last 100 returns: 10.44319976657629
update 671/1953. Last update in 6.127094984054565s
last 100 returns: 10.349599768668414
update 672/1953. Last update in 6.219496011734009s
last 100 returns: 10.40089976752177
update 673/1953. Last update in 6.252566814422607s
last 100 returns: 10.641599762141704
update 674/1953. Last update in 6.118491172790527s
last 100 returns: 10.482499765697867
update 675/1953. Last update in 6.12975001335144s
last 100 returns: 10.298599769808352
update 676/1953. Last update in 6.165234804153442s
last 100 returns: 10.33359976902604


last 100 returns: 10.456199766285717
update 760/1953. Last update in 6.389383792877197s
last 100 returns: 10.540099764410407
update 761/1953. Last update in 6.272727012634277s
last 100 returns: 10.613799762763083
update 762/1953. Last update in 6.428269147872925s
last 100 returns: 10.479099765773862
update 763/1953. Last update in 6.35971999168396s
last 100 returns: 10.430099766869098
update 764/1953. Last update in 6.206599950790405s
last 100 returns: 10.195899772103877
update 765/1953. Last update in 6.229691982269287s
last 100 returns: 10.060899775121362
update 766/1953. Last update in 6.254639148712158s
last 100 returns: 10.07559977479279
update 767/1953. Last update in 6.250277757644653s
last 100 returns: 10.071399774886668
update 768/1953. Last update in 6.2418129444122314s
last 100 returns: 10.110499774012714
update 769/1953. Last update in 6.48319411277771s
last 100 returns: 10.085799774564803
update 770/1953. Last update in 7.121348857879639s
last 100 returns: 9.94879977762699

last 100 returns: 9.59629978550598
update 854/1953. Last update in 6.1559669971466064s
last 100 returns: 9.480299788098783
update 855/1953. Last update in 6.170027017593384s
last 100 returns: 9.330099791456014
update 856/1953. Last update in 6.171463966369629s
last 100 returns: 9.462299788501113
update 857/1953. Last update in 6.182687997817993s
last 100 returns: 9.457599788606167
update 858/1953. Last update in 6.160186052322388s
last 100 returns: 9.47859978813678
update 859/1953. Last update in 6.153270959854126s
last 100 returns: 9.499099787678569
update 860/1953. Last update in 6.153846979141235s
last 100 returns: 9.514899787325412
update 861/1953. Last update in 6.205162048339844s
last 100 returns: 9.445899788867683
update 862/1953. Last update in 6.183558940887451s
last 100 returns: 9.425799789316953
update 863/1953. Last update in 6.172865152359009s
last 100 returns: 9.497499787714332
update 864/1953. Last update in 6.159487009048462s
last 100 returns: 9.481499788071961
update 8

last 100 returns: 10.47759976580739
update 949/1953. Last update in 6.183500051498413s
last 100 returns: 10.550699764173478
update 950/1953. Last update in 6.210147857666016s
last 100 returns: 10.629999762400985
update 951/1953. Last update in 6.193317174911499s
last 100 returns: 10.74989975972101
update 952/1953. Last update in 6.2198240756988525s
last 100 returns: 10.746799759790301
update 953/1953. Last update in 6.203282833099365s
last 100 returns: 10.731399760134519
update 954/1953. Last update in 6.2083611488342285s
last 100 returns: 10.582899763453751
update 955/1953. Last update in 6.214582920074463s
last 100 returns: 10.54139976438135
update 956/1953. Last update in 6.184293031692505s
last 100 returns: 10.577999763563275
update 957/1953. Last update in 6.176366806030273s
last 100 returns: 10.52379976477474
update 958/1953. Last update in 6.195585012435913s
last 100 returns: 10.410299767311663
update 959/1953. Last update in 6.211228847503662s
last 100 returns: 10.3298997691087

last 100 returns: 8.930999800376593
update 1043/1953. Last update in 6.171483039855957s
last 100 returns: 8.943999800086022
update 1044/1953. Last update in 6.181201934814453s
last 100 returns: 9.014599798507989
update 1045/1953. Last update in 6.219739198684692s
last 100 returns: 8.971999799460173
update 1046/1953. Last update in 6.173736810684204s
last 100 returns: 8.767899804022163
update 1047/1953. Last update in 6.212549924850464s
last 100 returns: 8.773799803890288
update 1048/1953. Last update in 6.1858580112457275s
last 100 returns: 8.776099803838878
update 1049/1953. Last update in 6.2010979652404785s
last 100 returns: 8.874599801637233
update 1050/1953. Last update in 6.178964853286743s
last 100 returns: 8.941199800148606
update 1051/1953. Last update in 6.201229095458984s
last 100 returns: 8.993199798986316
update 1052/1953. Last update in 6.166290044784546s
last 100 returns: 9.077099797111005
update 1053/1953. Last update in 6.187491178512573s
last 100 returns: 9.0737997971

last 100 returns: 10.45979976620525
update 1136/1953. Last update in 6.405255079269409s
last 100 returns: 10.401399767510593
update 1137/1953. Last update in 6.313764810562134s
last 100 returns: 10.436899766717106
update 1138/1953. Last update in 6.365162134170532s
last 100 returns: 10.348399768695236
update 1139/1953. Last update in 6.450442790985107s
last 100 returns: 10.2759997703135
update 1140/1953. Last update in 7.051335096359253s
last 100 returns: 10.326499769184739
update 1141/1953. Last update in 6.997830152511597s
last 100 returns: 10.362499768380076
update 1142/1953. Last update in 6.646235942840576s
last 100 returns: 10.466599766053259
update 1143/1953. Last update in 6.610998153686523s
last 100 returns: 10.55569976406172
update 1144/1953. Last update in 6.5123279094696045s
last 100 returns: 10.63589976226911
update 1145/1953. Last update in 6.499166965484619s
last 100 returns: 10.581999763473869
update 1146/1953. Last update in 6.75308084487915s
last 100 returns: 10.64249

last 100 returns: 10.719899760391563
update 1229/1953. Last update in 6.405190944671631s
last 100 returns: 10.745599759817123
update 1230/1953. Last update in 6.541152000427246s
last 100 returns: 10.716799760460853
update 1231/1953. Last update in 7.241338014602661s
last 100 returns: 10.65869976175949
update 1232/1953. Last update in 6.4674859046936035s
last 100 returns: 10.628999762423337
update 1233/1953. Last update in 6.287728786468506s
last 100 returns: 10.63569976227358
update 1234/1953. Last update in 6.261089086532593s
last 100 returns: 10.638399762213231
update 1235/1953. Last update in 6.360606908798218s
last 100 returns: 10.678899761307985
update 1236/1953. Last update in 6.3446009159088135s
last 100 returns: 10.746599759794771
update 1237/1953. Last update in 6.212431192398071s
last 100 returns: 10.81369975829497
update 1238/1953. Last update in 6.430752992630005s
last 100 returns: 10.778399759083987
update 1239/1953. Last update in 6.214839935302734s
last 100 returns: 10.7

last 100 returns: 10.437099766712636
update 1322/1953. Last update in 6.636926174163818s
last 100 returns: 10.510099765080959
update 1323/1953. Last update in 6.917179822921753s
last 100 returns: 10.607299762908369
update 1324/1953. Last update in 6.517070055007935s
last 100 returns: 10.62139976259321
update 1325/1953. Last update in 6.407634973526001s
last 100 returns: 10.49419976543635
update 1326/1953. Last update in 6.3356568813323975s
last 100 returns: 10.548899764213711
update 1327/1953. Last update in 7.175986051559448s
last 100 returns: 10.517699764911086
update 1328/1953. Last update in 6.88037896156311s
last 100 returns: 10.478099765796214
update 1329/1953. Last update in 6.88757586479187s
last 100 returns: 10.499199765324592
update 1330/1953. Last update in 6.602360963821411s
last 100 returns: 10.523199764788151
update 1331/1953. Last update in 6.818899154663086s
last 100 returns: 10.5349997645244
update 1332/1953. Last update in 6.895195960998535s
last 100 returns: 10.53309

last 100 returns: 11.296899747494608
update 1415/1953. Last update in 6.2802910804748535s
last 100 returns: 11.120099751446396
update 1416/1953. Last update in 6.87137508392334s
last 100 returns: 11.016399753764272
update 1417/1953. Last update in 8.15005898475647s
last 100 returns: 11.151799750737846
update 1418/1953. Last update in 7.349257946014404s
last 100 returns: 11.184799750000238
update 1419/1953. Last update in 6.960151195526123s
last 100 returns: 11.289299747664481
update 1420/1953. Last update in 7.281775236129761s
last 100 returns: 11.247099748607726
update 1421/1953. Last update in 7.990580081939697s
last 100 returns: 11.37919974565506
update 1422/1953. Last update in 6.681030035018921s
last 100 returns: 11.435199744403363
update 1423/1953. Last update in 6.261751890182495s
last 100 returns: 11.49439974308014
update 1424/1953. Last update in 6.304850816726685s
last 100 returns: 11.65169973956421
update 1425/1953. Last update in 6.411242961883545s
last 100 returns: 11.6187

last 100 returns: 11.406299745049328
update 1508/1953. Last update in 6.528751850128174s
last 100 returns: 11.363999745994807
update 1509/1953. Last update in 6.875740051269531s
last 100 returns: 11.394999745301902
update 1510/1953. Last update in 7.717491865158081s
last 100 returns: 11.422399744689464
update 1511/1953. Last update in 6.7748801708221436s
last 100 returns: 11.615899740364403
update 1512/1953. Last update in 6.730683088302612s
last 100 returns: 11.499099742975085
update 1513/1953. Last update in 7.380641937255859s
last 100 returns: 11.478399743437768
update 1514/1953. Last update in 6.76689600944519s
last 100 returns: 11.651799739561975
update 1515/1953. Last update in 6.495920896530151s
last 100 returns: 11.53129974225536
update 1516/1953. Last update in 6.816948890686035s
last 100 returns: 11.467799743674696
update 1517/1953. Last update in 6.49940824508667s
last 100 returns: 11.541299742031843
update 1518/1953. Last update in 6.606777906417847s
last 100 returns: 11.56

KeyboardInterrupt: 
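The log above reports `last 100 returns` after each update. The internals of `run_ppo` are not shown here, but the early values (0.0 before any episode finishes, then small averages over only a handful of episodes) are consistent with the mean return of the most recently finished episodes, capped at 100. Under that assumption, a bounded `deque` is enough to track it; `RecentReturns` is a hypothetical helper name.

```python
from collections import deque


class RecentReturns:
    """Mean of the most recent `window` episode returns (hypothetical helper)."""
    def __init__(self, window=100):
        # Oldest returns are dropped automatically once the deque is full
        self.buffer = deque(maxlen=window)

    def add(self, episode_return):
        self.buffer.append(episode_return)

    def mean(self):
        # Report 0.0 until at least one episode has finished
        return sum(self.buffer) / len(self.buffer) if self.buffer else 0.0
```

A fixed-size window keeps the statistic responsive to recent progress, which is why it can fall as well as rise during training, as it does around updates 380 and 1045 above.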

In [None]:
def copy_model_and_plot_learning_curve():
    """Archive the training log and model under a timestamped folder and plot returns."""
    import pickle
    import matplotlib.pyplot as plt
    from collections import deque
    import os
    import datetime
    import shutil

    datetime_stamp = datetime.datetime.now().strftime('%y%m%d_%H%M')
    plot_path = f'checkpoints/{datetime_stamp}'

    # Refuse to overwrite an archive created in the same minute
    if not os.path.exists(plot_path):
        os.makedirs(plot_path)
    else:
        print(f'directory {plot_path} already exists')
        return

    shutil.copyfile('checkpoints/eplen_and_returns.pickle', f'{plot_path}/eplen_and_returns.pickle')
    shutil.copyfile('checkpoints/model_step_976.pickle', f'{plot_path}/final_model.pickle')

    # Assumed format: a sequence of (episode_length, episode_return) pairs
    with open(f'{plot_path}/eplen_and_returns.pickle', 'rb') as f:
        _, total_rewards = zip(*pickle.load(f))

    # Trailing mean over (up to) the last 10 episodes
    smoothed = []
    queue = deque([], maxlen=10)
    for r in total_rewards:
        queue.append(r)
        smoothed.append(sum(queue) / len(queue))
    fig, ax = plt.subplots()
    ax.plot(smoothed)
    ax.set_xlabel('episodes')
    ax.set_ylabel('return (mean of last 10 episodes)')
    plt.savefig(f'{plot_path}/learning_curve.png')
    plt.show()
copy_model_and_plot_learning_curve()
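The smoothing loop in `copy_model_and_plot_learning_curve` computes, at each episode, the mean of up to the 10 most recent returns. For long logs the same curve can be computed without a Python loop using a cumulative sum. This is a sketch with a hypothetical `trailing_mean` helper; it reproduces the deque version, including the shorter windows for the first few episodes.

```python
import numpy as np


def trailing_mean(values, window=10):
    """Mean of up to `window` most recent values at each index.

    The first window-1 entries average over however many values exist so far,
    matching the deque-based loop.
    """
    x = np.asarray(values, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))  # csum[k] = sum of first k values
    idx = np.arange(1, len(x) + 1)                # end (exclusive) of each window
    start = np.maximum(idx - window, 0)           # start of each window
    return (csum[idx] - csum[start]) / (idx - start)
```

Inside the function, `ax.plot(trailing_mean(total_rewards, window=10))` would replace the loop. The deque version is easier to read; the vectorized one mainly helps when there are many thousands of episodes.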

In [None]:
import os

path = 'checkpoints/03/eplen_and_returns_976.pickle'
print(os.path.dirname(path))

In [None]:
# import matplotlib.pyplot as plt
# from ddpg_agent import Agent

# agent = Agent(state_size=33, action_size=4, random_seed=2)
# scores = agent.run_unity_ddpg(env)
# env.close()

# fig = plt.figure()
# ax = fig.add_subplot(111)
# plt.plot(np.arange(1, len(scores)+1), scores)
# plt.ylabel('Score')
# plt.xlabel('Episode #')
# plt.show()