# Collaboration and Competition

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages. If the code cell below raises an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Tennis.app"`
- **Windows** (x86): `"path/to/Tennis_Windows_x86/Tennis.exe"`
- **Windows** (x86_64): `"path/to/Tennis_Windows_x86_64/Tennis.exe"`
- **Linux** (x86): `"path/to/Tennis_Linux/Tennis.x86"`
- **Linux** (x86_64): `"path/to/Tennis_Linux/Tennis.x86_64"`
- **Linux** (x86, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86"`
- **Linux** (x86_64, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86_64"`

For instance, if you are using a Mac, then you downloaded `Tennis.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Tennis.app")
```
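
If you switch between machines, the choice in the list above can be automated. The following is only a sketch: `tennis_file_name` and its parameters (`base_dir`, `headless`, `system`, `machine`) are hypothetical names introduced here, and the architecture check assumes that `platform.machine()` reports `x86_64` or `AMD64` on 64-bit systems.

```python
import platform


def tennis_file_name(base_dir="path/to", headless=False, system=None, machine=None):
    """Return the Tennis build path from the list above for the given platform.

    `system` and `machine` default to the values reported by the `platform`
    module; they are parameters only so the function can be checked without
    changing machines.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    # Assumption: anything that is not reported as x86_64/amd64 uses the x86 build.
    is_64bit = machine in ("x86_64", "amd64")

    if system == "Darwin":  # macOS
        return f"{base_dir}/Tennis.app"
    if system == "Windows":
        folder = "Tennis_Windows_x86_64" if is_64bit else "Tennis_Windows_x86"
        return f"{base_dir}/{folder}/Tennis.exe"
    if system == "Linux":
        folder = "Tennis_Linux_NoVis" if headless else "Tennis_Linux"
        suffix = "x86_64" if is_64bit else "x86"
        return f"{base_dir}/{folder}/Tennis.{suffix}"
    raise ValueError(f"no Tennis build listed for platform {system!r}")


print(tennis_file_name(system="Linux", machine="x86_64", headless=True))
```

The result can then be passed as `file_name` to `UnityEnvironment`.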

In [2]:
env = UnityEnvironment(file_name="Tennis.app")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


Environments contain **_brains_**, which are responsible for deciding the actions of their associated agents. Here we check for the first brain available and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

### 2. Examine the State and Action Spaces

In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets the ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
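
To make the reward arithmetic concrete, here is a small sketch that totals an agent's undiscounted return from the two reward values quoted above. The function name `episode_return` and the hit/miss counts are made up for illustration; they are not produced by the environment.

```python
HIT_REWARD = 0.1      # ball hit over the net
MISS_PENALTY = -0.01  # ball dropped or hit out of bounds


def episode_return(hits, misses):
    """Undiscounted return of one agent, given how often each event occurred."""
    return hits * HIT_REWARD + misses * MISS_PENALTY


# Illustrative counts only: a longer rally yields a higher return.
for hits in (0, 1, 5):
    print(hits, "hits, 1 miss ->", round(episode_return(hits, 1), 2))
```

An agent that never returns the ball ends the episode at -0.01, while every successful hit adds 0.1, which is why keeping the ball in play is the better strategy for both agents.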

Each observation consists of 8 variables corresponding to the position and velocity of the ball and racket. The environment stacks 3 consecutive observations, so each agent receives a state vector of length 24. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents 
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 2
Size of each action: 2
There are 2 agents. Each observes a state with length: 24
The state for the first agent looks like: [ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.         -6.65278625 -1.5
 -0.          0.          6.83172083  6.         -0.          0.        ]
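
The length of 24 follows from the startup log: 8 variables per observation, with 3 observations stacked. The sketch below splits the printed state of the first agent into its 3 frames. `split_frames` is a hypothetical helper, and the interpretation that the last frame is the most recent one is an assumption based on the leading zeros in the printed state.

```python
FRAME_SIZE = 8  # "Vector Observation space size (per agent): 8"


def split_frames(state, frame_size=FRAME_SIZE):
    """Split a stacked state vector into consecutive frames of `frame_size` values."""
    if len(state) % frame_size != 0:
        raise ValueError("state length is not a multiple of the frame size")
    return [list(state[i:i + frame_size]) for i in range(0, len(state), frame_size)]


# State of the first agent, copied from the output above.
first_agent_state = [0.0] * 16 + [-6.65278625, -1.5, -0.0, 0.0,
                                  6.83172083, 6.0, -0.0, 0.0]

frames = split_frames(first_agent_state)
for index, frame in enumerate(frames):
    print(f"frame {index}: {frame}")
```

In the notebook, `split_frames(states[0])` works the same way on the NumPy array returned by the environment.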


### 3. It's Your Turn!

Now it's your turn to train your own agent to solve the environment! When training the agent, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```
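
Any training routine is built around the same reset/step loop. The sketch below shows that loop for a single episode; it is not the code inside `ppo.run_ppo`, which is not shown in this notebook. `run_episode`, `random_policy`, and `max_steps` are hypothetical names, and sampling actions from [-1, 1] is an assumption about the valid action range.

```python
import random


def random_policy(states, action_size=2):
    """Return one random action vector per agent (assumed range: [-1, 1])."""
    return [[random.uniform(-1.0, 1.0) for _ in range(action_size)] for _ in states]


def run_episode(env, brain_name, policy, train_mode=True, max_steps=1000):
    """Play one episode and return the undiscounted return of each agent."""
    env_info = env.reset(train_mode=train_mode)[brain_name]
    states = env_info.vector_observations
    returns = [0.0] * len(env_info.agents)

    for _ in range(max_steps):
        # Nested lists are passed here; wrap the result in np.array(...) if
        # your unityagents version expects an array.
        env_info = env.step(policy(states))[brain_name]
        states = env_info.vector_observations
        for agent, reward in enumerate(env_info.rewards):
            returns[agent] += reward
        if any(env_info.local_done):  # stop as soon as any agent is done
            break
    return returns


# In the notebook: returns = run_episode(env, brain_name, random_policy)
```

A learned policy can be plugged in by passing any callable that maps the batch of states to one action vector per agent.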

In [5]:
from ppo import run_ppo

run_ppo(env)

update 1/10000. finished 0 episodes. Last update in 2.1457672119140625e-06s


  print(f'last 100 returns: {np.array(scores[-100:]).mean()}')
  ret = ret.dtype.type(ret / rcount)


last 100 returns: nan
update 2/10000. finished 0 episodes. Last update in 0.22732281684875488s
last 100 returns: -0.004999999888241291
update 3/10000. finished 1 episodes. Last update in 0.12699103355407715s
last 100 returns: 0.02000000048428774
update 4/10000. finished 2 episodes. Last update in 0.12059497833251953s
last 100 returns: 0.028333333941797417
update 5/10000. finished 3 episodes. Last update in 0.07661914825439453s
last 100 returns: 0.032500000670552254
update 6/10000. finished 4 episodes. Last update in 0.1070408821105957s
last 100 returns: 0.028333333941797417
update 7/10000. finished 6 episodes. Last update in 0.11444091796875s
last 100 returns: 0.02000000048428774
update 8/10000. finished 8 episodes. Last update in 0.11344289779663086s
last 100 returns: 0.022777778303457633
update 9/10000. finished 9 episodes. Last update in 0.12096786499023438s
last 100 returns: 0.025000000558793544
update 10/10000. finished 10 episodes. Last update in 0.10813307762145996s
last 100 ret

last 100 returns: 0.010000000335276127
update 76/10000. finished 101 episodes. Last update in 0.11198997497558594s
last 100 returns: 0.009000000320374965
update 77/10000. finished 102 episodes. Last update in 0.14980196952819824s
last 100 returns: 0.009000000320374965
update 78/10000. finished 104 episodes. Last update in 0.11316418647766113s
last 100 returns: 0.008000000305473804
update 79/10000. finished 106 episodes. Last update in 0.11603403091430664s
last 100 returns: 0.008000000305473804
update 80/10000. finished 108 episodes. Last update in 0.10200285911560059s
last 100 returns: 0.008000000305473804
update 81/10000. finished 109 episodes. Last update in 0.10888099670410156s
last 100 returns: 0.008000000305473804
update 82/10000. finished 111 episodes. Last update in 0.11965107917785645s
last 100 returns: 0.008000000305473804
update 83/10000. finished 112 episodes. Last update in 0.1094200611114502s
last 100 returns: 0.007000000290572643
update 84/10000. finished 114 episodes. La

last 100 returns: 0.03500000070780516
update 148/10000. finished 188 episodes. Last update in 0.11128020286560059s
last 100 returns: 0.036000000722706316
update 149/10000. finished 189 episodes. Last update in 0.11966276168823242s
last 100 returns: 0.03700000073760748
update 150/10000. finished 190 episodes. Last update in 0.10870099067687988s
last 100 returns: 0.03800000075250864
update 151/10000. finished 191 episodes. Last update in 0.11397004127502441s
last 100 returns: 0.03700000073760748
update 152/10000. finished 192 episodes. Last update in 0.11377620697021484s
last 100 returns: 0.03800000075250864
update 153/10000. finished 193 episodes. Last update in 0.11744904518127441s
last 100 returns: 0.0390000007674098
update 154/10000. finished 194 episodes. Last update in 0.11037182807922363s
last 100 returns: 0.04000000078231096
update 155/10000. finished 196 episodes. Last update in 0.07801294326782227s
last 100 returns: 0.0390000007674098
update 156/10000. finished 198 episodes. La

last 100 returns: 0.016000000424683095
update 221/10000. finished 283 episodes. Last update in 0.11811399459838867s
last 100 returns: 0.016000000424683095
update 222/10000. finished 283 episodes. Last update in 0.10709404945373535s
last 100 returns: 0.016000000424683095
update 223/10000. finished 283 episodes. Last update in 0.07580995559692383s
last 100 returns: 0.016000000424683095
update 224/10000. finished 283 episodes. Last update in 0.08261227607727051s
last 100 returns: 0.024000000543892383
update 225/10000. finished 285 episodes. Last update in 0.08368992805480957s
last 100 returns: 0.023000000528991222
update 226/10000. finished 286 episodes. Last update in 0.07987380027770996s
last 100 returns: 0.024000000543892383
update 227/10000. finished 287 episodes. Last update in 0.08319211006164551s
last 100 returns: 0.02200000051409006
update 228/10000. finished 289 episodes. Last update in 0.08040595054626465s
last 100 returns: 0.02200000051409006
update 229/10000. finished 291 epis

last 100 returns: 0.02900000061839819
update 292/10000. finished 362 episodes. Last update in 0.11281323432922363s
last 100 returns: 0.02900000061839819
update 293/10000. finished 363 episodes. Last update in 0.1141500473022461s
last 100 returns: 0.027000000588595866
update 294/10000. finished 365 episodes. Last update in 0.10545492172241211s
last 100 returns: 0.027000000588595866
update 295/10000. finished 367 episodes. Last update in 0.10635805130004883s
last 100 returns: 0.027000000588595866
update 296/10000. finished 368 episodes. Last update in 0.10629105567932129s
last 100 returns: 0.026000000573694705
update 297/10000. finished 369 episodes. Last update in 0.10808229446411133s
last 100 returns: 0.025000000558793544
update 298/10000. finished 371 episodes. Last update in 0.1201331615447998s
last 100 returns: 0.026000000573694705
update 299/10000. finished 372 episodes. Last update in 0.11649894714355469s
last 100 returns: 0.023000000528991222
update 300/10000. finished 374 episod

last 100 returns: 0.021900000516325235
update 364/10000. finished 442 episodes. Last update in 0.12103891372680664s
last 100 returns: 0.021900000516325235
update 365/10000. finished 443 episodes. Last update in 0.10969185829162598s
last 100 returns: 0.02200000051409006
update 366/10000. finished 444 episodes. Last update in 0.11738324165344238s
last 100 returns: 0.02200000051409006
update 367/10000. finished 446 episodes. Last update in 0.10993504524230957s
last 100 returns: 0.02200000051409006
update 368/10000. finished 447 episodes. Last update in 0.11565709114074707s
last 100 returns: 0.02200000051409006
update 369/10000. finished 448 episodes. Last update in 0.1023707389831543s
last 100 returns: 0.02200000051409006
update 370/10000. finished 448 episodes. Last update in 0.10780811309814453s
last 100 returns: 0.02200000051409006
update 371/10000. finished 448 episodes. Last update in 0.11918306350708008s
last 100 returns: 0.025000000558793544
update 372/10000. finished 449 episodes.

last 100 returns: 0.019900000486522913
update 435/10000. finished 531 episodes. Last update in 0.11969208717346191s
last 100 returns: 0.019900000486522913
update 436/10000. finished 533 episodes. Last update in 0.11280584335327148s
last 100 returns: 0.019900000486522913
update 437/10000. finished 534 episodes. Last update in 0.1154170036315918s
last 100 returns: 0.018900000471621752
update 438/10000. finished 535 episodes. Last update in 0.11187505722045898s
last 100 returns: 0.01790000045672059
update 439/10000. finished 537 episodes. Last update in 0.11549592018127441s
last 100 returns: 0.01690000044181943
update 440/10000. finished 539 episodes. Last update in 0.11411881446838379s
last 100 returns: 0.014900000412017106
update 441/10000. finished 541 episodes. Last update in 0.11239314079284668s
last 100 returns: 0.014900000412017106
update 442/10000. finished 542 episodes. Last update in 0.11481595039367676s
last 100 returns: 0.01590000042691827
update 443/10000. finished 543 episod

last 100 returns: 0.012000000365078449
update 507/10000. finished 634 episodes. Last update in 0.11736822128295898s
last 100 returns: 0.011000000350177288
update 508/10000. finished 635 episodes. Last update in 0.11198115348815918s
last 100 returns: 0.010000000335276127
update 509/10000. finished 637 episodes. Last update in 0.11571693420410156s
last 100 returns: 0.010000000335276127
update 510/10000. finished 639 episodes. Last update in 0.09905600547790527s
last 100 returns: 0.010000000335276127
update 511/10000. finished 640 episodes. Last update in 0.11233091354370117s
last 100 returns: 0.010000000335276127
update 512/10000. finished 641 episodes. Last update in 0.11690521240234375s
last 100 returns: 0.010000000335276127
update 513/10000. finished 642 episodes. Last update in 0.11518311500549316s
last 100 returns: 0.011000000350177288
update 514/10000. finished 644 episodes. Last update in 0.11521410942077637s
last 100 returns: 0.011000000350177288
update 515/10000. finished 645 ep

last 100 returns: 0.018800000473856927
update 579/10000. finished 727 episodes. Last update in 0.11344408988952637s
last 100 returns: 0.018800000473856927
update 580/10000. finished 728 episodes. Last update in 0.11577200889587402s
last 100 returns: 0.018800000473856927
update 581/10000. finished 729 episodes. Last update in 0.11510610580444336s
last 100 returns: 0.018800000473856927
update 582/10000. finished 729 episodes. Last update in 0.11111092567443848s
last 100 returns: 0.02180000051856041
update 583/10000. finished 730 episodes. Last update in 0.1165001392364502s
last 100 returns: 0.021700000520795584
update 584/10000. finished 731 episodes. Last update in 0.11446690559387207s
last 100 returns: 0.021700000520795584
update 585/10000. finished 732 episodes. Last update in 0.10942292213439941s
last 100 returns: 0.021700000520795584
update 586/10000. finished 734 episodes. Last update in 0.11691594123840332s
last 100 returns: 0.021700000520795584
update 587/10000. finished 737 epis

last 100 returns: 0.014900000412017106
update 651/10000. finished 830 episodes. Last update in 0.09387421607971191s
last 100 returns: 0.014000000394880772
update 652/10000. finished 832 episodes. Last update in 0.11550402641296387s
last 100 returns: 0.014000000394880772
update 653/10000. finished 833 episodes. Last update in 0.11708617210388184s
last 100 returns: 0.014000000394880772
update 654/10000. finished 835 episodes. Last update in 0.11033082008361816s
last 100 returns: 0.014000000394880772
update 655/10000. finished 837 episodes. Last update in 0.1403813362121582s
last 100 returns: 0.015000000409781934
update 656/10000. finished 838 episodes. Last update in 0.10890483856201172s
last 100 returns: 0.015000000409781934
update 657/10000. finished 840 episodes. Last update in 0.12076020240783691s
last 100 returns: 0.015000000409781934
update 658/10000. finished 842 episodes. Last update in 0.08464598655700684s
last 100 returns: 0.014000000394880772
update 659/10000. finished 844 epi

last 100 returns: 0.012000000365078449
update 722/10000. finished 935 episodes. Last update in 0.08747696876525879s
last 100 returns: 0.01300000037997961
update 723/10000. finished 938 episodes. Last update in 0.12200498580932617s
last 100 returns: 0.01300000037997961
update 724/10000. finished 940 episodes. Last update in 0.11374688148498535s
last 100 returns: 0.01300000037997961
update 725/10000. finished 941 episodes. Last update in 0.11054301261901855s
last 100 returns: 0.01300000037997961
update 726/10000. finished 943 episodes. Last update in 0.11476016044616699s
last 100 returns: 0.011000000350177288
update 727/10000. finished 945 episodes. Last update in 0.11567902565002441s
last 100 returns: 0.010000000335276127
update 728/10000. finished 947 episodes. Last update in 0.11492729187011719s
last 100 returns: 0.009000000320374965
update 729/10000. finished 949 episodes. Last update in 0.1169888973236084s
last 100 returns: 0.009000000320374965
update 730/10000. finished 951 episode

last 100 returns: 0.010000000335276127
update 794/10000. finished 1040 episodes. Last update in 0.11400389671325684s
last 100 returns: 0.010000000335276127
update 795/10000. finished 1041 episodes. Last update in 0.09285092353820801s
last 100 returns: 0.011000000350177288
update 796/10000. finished 1043 episodes. Last update in 0.12084794044494629s
last 100 returns: 0.011000000350177288
update 797/10000. finished 1044 episodes. Last update in 0.11096811294555664s
last 100 returns: 0.011000000350177288
update 798/10000. finished 1045 episodes. Last update in 0.1188969612121582s
last 100 returns: 0.012000000365078449
update 799/10000. finished 1047 episodes. Last update in 0.11295819282531738s
last 100 returns: 0.012000000365078449
update 800/10000. finished 1048 episodes. Last update in 0.11731410026550293s
last 100 returns: 0.01300000037997961
update 801/10000. finished 1050 episodes. Last update in 0.08981132507324219s
last 100 returns: 0.012000000365078449
update 802/10000. finished 

last 100 returns: 0.014000000394880772
update 865/10000. finished 1142 episodes. Last update in 0.11328577995300293s
last 100 returns: 0.015000000409781934
update 866/10000. finished 1144 episodes. Last update in 0.11764073371887207s
last 100 returns: 0.01300000037997961
update 867/10000. finished 1146 episodes. Last update in 0.11108708381652832s
last 100 returns: 0.014000000394880772
update 868/10000. finished 1147 episodes. Last update in 0.11633419990539551s
last 100 returns: 0.01300000037997961
update 869/10000. finished 1149 episodes. Last update in 0.11557197570800781s
last 100 returns: 0.012000000365078449
update 870/10000. finished 1150 episodes. Last update in 0.11448097229003906s
last 100 returns: 0.012000000365078449
update 871/10000. finished 1151 episodes. Last update in 0.11462521553039551s
last 100 returns: 0.01300000037997961
update 872/10000. finished 1152 episodes. Last update in 0.11545610427856445s
last 100 returns: 0.01300000037997961
update 873/10000. finished 11

last 100 returns: 0.005000000260770321
update 936/10000. finished 1252 episodes. Last update in 0.1155998706817627s
last 100 returns: 0.004000000245869159
update 937/10000. finished 1253 episodes. Last update in 0.11314797401428223s
last 100 returns: 0.005000000260770321
update 938/10000. finished 1254 episodes. Last update in 0.11378097534179688s
last 100 returns: 0.006000000275671482
update 939/10000. finished 1255 episodes. Last update in 0.11558914184570312s
last 100 returns: 0.006000000275671482
update 940/10000. finished 1256 episodes. Last update in 0.09471392631530762s
last 100 returns: 0.007000000290572643
update 941/10000. finished 1258 episodes. Last update in 0.11637306213378906s
last 100 returns: 0.007000000290572643
update 942/10000. finished 1260 episodes. Last update in 0.11142206192016602s
last 100 returns: 0.006000000275671482
update 943/10000. finished 1262 episodes. Last update in 0.1206667423248291s
last 100 returns: 0.006000000275671482
update 944/10000. finished 

last 100 returns: -0.0029999998584389685
update 1007/10000. finished 1386 episodes. Last update in 0.1156620979309082s
last 100 returns: -0.00399999987334013
update 1008/10000. finished 1388 episodes. Last update in 0.11300086975097656s
last 100 returns: -0.00399999987334013
update 1009/10000. finished 1391 episodes. Last update in 0.11549782752990723s
last 100 returns: -0.00399999987334013
update 1010/10000. finished 1392 episodes. Last update in 0.09426403045654297s
last 100 returns: -0.0029999998584389685
update 1011/10000. finished 1394 episodes. Last update in 0.11642074584960938s
last 100 returns: -0.0029999998584389685
update 1012/10000. finished 1396 episodes. Last update in 0.11270880699157715s
last 100 returns: -0.0029999998584389685
update 1013/10000. finished 1398 episodes. Last update in 0.12335515022277832s
last 100 returns: -0.0029999998584389685
update 1014/10000. finished 1399 episodes. Last update in 0.08450603485107422s
last 100 returns: -0.0029999998584389685
update

last 100 returns: -0.00399999987334013
update 1077/10000. finished 1526 episodes. Last update in 0.11803889274597168s
last 100 returns: -0.00399999987334013
update 1078/10000. finished 1528 episodes. Last update in 0.16234207153320312s
last 100 returns: -0.00399999987334013
update 1079/10000. finished 1530 episodes. Last update in 0.11707496643066406s
last 100 returns: -0.00399999987334013
update 1080/10000. finished 1532 episodes. Last update in 0.08919215202331543s
last 100 returns: -0.00399999987334013
update 1081/10000. finished 1535 episodes. Last update in 0.11958789825439453s
last 100 returns: -0.00399999987334013
update 1082/10000. finished 1537 episodes. Last update in 0.09761714935302734s
last 100 returns: -0.00399999987334013
update 1083/10000. finished 1539 episodes. Last update in 0.12000799179077148s
last 100 returns: -0.00399999987334013
update 1084/10000. finished 1542 episodes. Last update in 0.1141970157623291s
last 100 returns: -0.00399999987334013
update 1085/10000.

last 100 returns: -0.004999999888241291
update 1147/10000. finished 1683 episodes. Last update in 0.08782601356506348s
last 100 returns: -0.004999999888241291
update 1148/10000. finished 1686 episodes. Last update in 0.12203407287597656s
last 100 returns: -0.004999999888241291
update 1149/10000. finished 1688 episodes. Last update in 0.11965107917785645s
last 100 returns: -0.004999999888241291
update 1150/10000. finished 1690 episodes. Last update in 0.11431431770324707s
last 100 returns: -0.004999999888241291
update 1151/10000. finished 1692 episodes. Last update in 0.11837387084960938s
last 100 returns: -0.004999999888241291
update 1152/10000. finished 1695 episodes. Last update in 0.12945294380187988s
last 100 returns: -0.004999999888241291
update 1153/10000. finished 1697 episodes. Last update in 0.12952208518981934s
last 100 returns: -0.004999999888241291
update 1154/10000. finished 1699 episodes. Last update in 0.08827900886535645s
last 100 returns: -0.004999999888241291
update 1

last 100 returns: -0.004999999888241291
update 1216/10000. finished 1839 episodes. Last update in 0.11234831809997559s
last 100 returns: -0.004999999888241291
update 1217/10000. finished 1841 episodes. Last update in 0.11634302139282227s
last 100 returns: -0.004999999888241291
update 1218/10000. finished 1843 episodes. Last update in 0.11544990539550781s
last 100 returns: -0.004999999888241291
update 1219/10000. finished 1845 episodes. Last update in 0.11643862724304199s
last 100 returns: -0.004999999888241291
update 1220/10000. finished 1848 episodes. Last update in 0.11470723152160645s
last 100 returns: -0.004999999888241291
update 1221/10000. finished 1850 episodes. Last update in 0.11011672019958496s
last 100 returns: -0.004999999888241291
update 1222/10000. finished 1852 episodes. Last update in 0.12398624420166016s
last 100 returns: -0.004999999888241291
update 1223/10000. finished 1854 episodes. Last update in 0.08825325965881348s
last 100 returns: -0.004999999888241291
update 1

last 100 returns: 0.0078000003099441525
update 1285/10000. finished 1965 episodes. Last update in 0.12648415565490723s
last 100 returns: 0.006800000295042991
update 1286/10000. finished 1966 episodes. Last update in 0.129119873046875s
last 100 returns: 0.006800000295042991
update 1287/10000. finished 1968 episodes. Last update in 0.12952613830566406s
last 100 returns: 0.006800000295042991
update 1288/10000. finished 1970 episodes. Last update in 0.08753108978271484s
last 100 returns: 0.00580000028014183
update 1289/10000. finished 1971 episodes. Last update in 0.1285991668701172s
last 100 returns: 0.00580000028014183
update 1290/10000. finished 1973 episodes. Last update in 0.12976598739624023s
last 100 returns: 0.006800000295042991
update 1291/10000. finished 1974 episodes. Last update in 0.08825087547302246s
last 100 returns: 0.006800000295042991
update 1292/10000. finished 1976 episodes. Last update in 0.12205004692077637s
last 100 returns: 0.0078000003099441525
update 1293/10000. f

last 100 returns: -0.00399999987334013
update 1356/10000. finished 2088 episodes. Last update in 0.12095379829406738s
last 100 returns: -0.00399999987334013
update 1357/10000. finished 2091 episodes. Last update in 0.08766293525695801s
last 100 returns: -0.00399999987334013
update 1358/10000. finished 2093 episodes. Last update in 0.1650092601776123s
last 100 returns: -0.00399999987334013
update 1359/10000. finished 2094 episodes. Last update in 0.12180399894714355s
last 100 returns: -0.0029999998584389685
update 1360/10000. finished 2096 episodes. Last update in 0.1252748966217041s
last 100 returns: -0.0029999998584389685
update 1361/10000. finished 2098 episodes. Last update in 0.17327880859375s
last 100 returns: -0.0019999998435378074
update 1362/10000. finished 2099 episodes. Last update in 0.11703705787658691s
last 100 returns: -0.0019999998435378074
update 1363/10000. finished 2101 episodes. Last update in 0.08894801139831543s
last 100 returns: -0.0019999998435378074
update 1364/

last 100 returns: -0.004999999888241291
update 1425/10000. finished 2241 episodes. Last update in 0.09227180480957031s
last 100 returns: -0.004999999888241291
update 1426/10000. finished 2243 episodes. Last update in 0.1222381591796875s
last 100 returns: -0.004999999888241291
update 1427/10000. finished 2245 episodes. Last update in 0.11890196800231934s
last 100 returns: -0.004999999888241291
update 1428/10000. finished 2247 episodes. Last update in 0.12444019317626953s
last 100 returns: -0.004999999888241291
update 1429/10000. finished 2250 episodes. Last update in 0.1182410717010498s
last 100 returns: -0.004999999888241291
update 1430/10000. finished 2252 episodes. Last update in 0.11485910415649414s
last 100 returns: -0.004999999888241291
update 1431/10000. finished 2254 episodes. Last update in 0.11218786239624023s
last 100 returns: -0.004999999888241291
update 1432/10000. finished 2256 episodes. Last update in 0.12018084526062012s
last 100 returns: -0.004999999888241291
update 143

last 100 returns: -0.004999999888241291
update 1495/10000. finished 2398 episodes. Last update in 0.1243131160736084s
last 100 returns: -0.004999999888241291
update 1496/10000. finished 2401 episodes. Last update in 0.11442875862121582s
last 100 returns: -0.004999999888241291
update 1497/10000. finished 2403 episodes. Last update in 0.13141989707946777s
last 100 returns: -0.004999999888241291
update 1498/10000. finished 2405 episodes. Last update in 0.11213517189025879s
last 100 returns: -0.004999999888241291
update 1499/10000. finished 2407 episodes. Last update in 0.11775708198547363s
last 100 returns: -0.004999999888241291
update 1500/10000. finished 2410 episodes. Last update in 0.12918710708618164s
last 100 returns: -0.004999999888241291
update 1501/10000. finished 2412 episodes. Last update in 0.09278106689453125s
last 100 returns: -0.004999999888241291
update 1502/10000. finished 2414 episodes. Last update in 0.1193079948425293s
last 100 returns: -0.004999999888241291
update 150

last 100 returns: -0.004999999888241291
update 1565/10000. finished 2556 episodes. Last update in 0.15805482864379883s
last 100 returns: -0.004999999888241291
update 1566/10000. finished 2558 episodes. Last update in 0.11721920967102051s
last 100 returns: -0.004999999888241291
update 1567/10000. finished 2561 episodes. Last update in 0.1254117488861084s
last 100 returns: -0.004999999888241291
update 1568/10000. finished 2563 episodes. Last update in 0.12211894989013672s
last 100 returns: -0.004999999888241291
update 1569/10000. finished 2565 episodes. Last update in 0.11919999122619629s
last 100 returns: -0.004999999888241291
update 1570/10000. finished 2567 episodes. Last update in 0.08946895599365234s
last 100 returns: -0.004999999888241291
update 1571/10000. finished 2570 episodes. Last update in 0.12602019309997559s
last 100 returns: -0.004999999888241291
update 1572/10000. finished 2572 episodes. Last update in 0.12614130973815918s
last 100 returns: -0.004999999888241291
update 15

last 100 returns: -0.004999999888241291
update 1634/10000. finished 2712 episodes. Last update in 0.09290599822998047s
last 100 returns: -0.004999999888241291
update 1635/10000. finished 2714 episodes. Last update in 0.11720967292785645s
last 100 returns: -0.004999999888241291
update 1636/10000. finished 2716 episodes. Last update in 0.13230395317077637s
last 100 returns: -0.004999999888241291
update 1637/10000. finished 2718 episodes. Last update in 0.0908668041229248s
last 100 returns: -0.004999999888241291
update 1638/10000. finished 2721 episodes. Last update in 0.13334989547729492s
last 100 returns: -0.004999999888241291
update 1639/10000. finished 2723 episodes. Last update in 0.14002108573913574s
last 100 returns: -0.004999999888241291
update 1640/10000. finished 2725 episodes. Last update in 0.09123396873474121s
last 100 returns: -0.004999999888241291
update 1641/10000. finished 2727 episodes. Last update in 0.11968374252319336s
last 100 returns: -0.004999999888241291
update 16

last 100 returns: -0.004999999888241291
update 1703/10000. finished 2867 episodes. Last update in 0.12179875373840332s
last 100 returns: -0.004999999888241291
update 1704/10000. finished 2869 episodes. Last update in 0.12890005111694336s
last 100 returns: -0.004999999888241291
update 1705/10000. finished 2872 episodes. Last update in 0.11787271499633789s
last 100 returns: -0.004999999888241291
update 1706/10000. finished 2874 episodes. Last update in 0.09168696403503418s
last 100 returns: -0.004999999888241291
update 1707/10000. finished 2876 episodes. Last update in 0.11703705787658691s
last 100 returns: -0.004999999888241291
update 1708/10000. finished 2878 episodes. Last update in 0.10207891464233398s
last 100 returns: -0.004999999888241291
update 1709/10000. finished 2881 episodes. Last update in 0.09136605262756348s
last 100 returns: -0.004999999888241291
update 1710/10000. finished 2883 episodes. Last update in 0.12116813659667969s
last 100 returns: -0.004999999888241291
update 1

last 100 returns: -0.004999999888241291
update 1772/10000. finished 3023 episodes. Last update in 0.14780902862548828s
last 100 returns: -0.004999999888241291
update 1773/10000. finished 3025 episodes. Last update in 0.13604497909545898s
last 100 returns: -0.004999999888241291
update 1774/10000. finished 3027 episodes. Last update in 0.13280701637268066s
last 100 returns: -0.004999999888241291
update 1775/10000. finished 3029 episodes. Last update in 0.13106608390808105s
last 100 returns: -0.004999999888241291
update 1776/10000. finished 3032 episodes. Last update in 0.13393306732177734s
last 100 returns: -0.004999999888241291
update 1777/10000. finished 3034 episodes. Last update in 0.12688422203063965s
last 100 returns: -0.004999999888241291
update 1778/10000. finished 3036 episodes. Last update in 0.11388492584228516s
last 100 returns: -0.004999999888241291
update 1779/10000. finished 3038 episodes. Last update in 0.1442420482635498s
last 100 returns: -0.004999999888241291
update 17

KeyboardInterrupt: 
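
The `nan` reported at update 1 comes from averaging an empty score list: no episode had finished yet, and the mean of an empty NumPy array is `nan` (NumPy also emits the warnings whose source lines appear in the log). The `ppo` module is not shown here, so the following is only a sketch of how the reporting line could guard against that case. `mean_of_last` is a hypothetical helper, not part of `ppo`.

```python
import math


def mean_of_last(scores, n=100):
    """Mean of the last `n` scores, or nan (without a warning) if there are none."""
    recent = scores[-n:]
    if not recent:
        return math.nan
    return sum(recent) / len(recent)


print(f"last 100 returns: {mean_of_last([])}")
print(f"last 100 returns: {mean_of_last([0.0, -0.01])}")
```

Returning `nan` keeps the log format unchanged; skipping the print until the first episode finishes would be an equally reasonable choice.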

In [None]:
def copy_model_and_plot_learning_curve():
    """Archive the latest scores and model checkpoint, then plot a smoothed learning curve."""
    import pickle
    import matplotlib.pyplot as plt
    from collections import deque
    import os
    import datetime
    import shutil

    # Timestamped output directory (format: YYMMDD_HHMM)
    datetime_stamp = datetime.datetime.now().strftime('%y%m%d_%H%M')
    plot_path = f'checkpoints/{datetime_stamp}'

    # Do not overwrite an archive created within the same minute
    if not os.path.exists(plot_path):
        os.makedirs(plot_path)
    else:
        print(f'directory {plot_path} already exists')
        return

    # Copy the score and checkpoint files, which are expected in the working
    # directory and named after the global `brain_name`
    shutil.copyfile(f'{brain_name}_scores.pickle', f'{plot_path}/scores.pickle')
    shutil.copyfile(f'{brain_name}_model_checkpoint.pickle', f'{plot_path}/model.pickle')

    with open(f'{plot_path}/scores.pickle', 'rb') as f:
        total_rewards = pickle.load(f)

    # Moving average over the last 10 entries
    smoothed = []
    queue = deque([], maxlen=10)
    for r in total_rewards:
        queue.append(r)
        smoothed.append(sum(queue) / len(queue))

    fig, ax = plt.subplots()
    ax.plot(smoothed)
    ax.set_xlabel('total episodes (across all agents)')
    ax.set_ylabel('return (moving average, window of 10)')
    plt.savefig(f'{plot_path}/learning_curve.png')
    plt.show()


copy_model_and_plot_learning_curve()