# Collaboration and Competition

---

In this notebook, you will learn how to use the Unity ML-Agents environment for the third project of the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

### 1. Start the Environment

We begin by importing the necessary packages.  If the code cell below returns an error, please revisit the project instructions to double-check that you have installed [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) and [NumPy](http://www.numpy.org/).

In [1]:
from unityagents import UnityEnvironment
import numpy as np

Next, we will start the environment!  **_Before running the code cell below_**, change the `file_name` parameter to match the location of the Unity environment that you downloaded.

- **Mac**: `"path/to/Tennis.app"`
- **Windows** (x86): `"path/to/Tennis_Windows_x86/Tennis.exe"`
- **Windows** (x86_64): `"path/to/Tennis_Windows_x86_64/Tennis.exe"`
- **Linux** (x86): `"path/to/Tennis_Linux/Tennis.x86"`
- **Linux** (x86_64): `"path/to/Tennis_Linux/Tennis.x86_64"`
- **Linux** (x86, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86"`
- **Linux** (x86_64, headless): `"path/to/Tennis_Linux_NoVis/Tennis.x86_64"`

For instance, if you are using a Mac, then you downloaded `Tennis.app`.  If this file is in the same folder as the notebook, then the line below should appear as follows:
```
env = UnityEnvironment(file_name="Tennis.app")
```
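If the same notebook runs on several machines, a small helper can pick one of the build paths listed above from the operating system and CPU architecture. `tennis_file_name` below is a hypothetical helper, not part of Unity ML-Agents. It assumes `platform.machine()` reports `x86_64` or `AMD64` on 64-bit x86 systems and treats anything else as 32-bit x86. The `base_dir` parameter stands in for the `path/to` prefix.

```python
import os
import platform


def tennis_file_name(system=None, machine=None, headless=False, base_dir="."):
    """Hypothetical helper: choose a Tennis build path from the list above.

    system/machine default to platform.system() / platform.machine().
    Assumes 64-bit x86 reports "x86_64" or "AMD64"; anything else is treated
    as 32-bit x86. `headless` only affects Linux (the NoVis builds).
    """
    system = system or platform.system()               # "Darwin", "Windows", "Linux", ...
    machine = (machine or platform.machine()).lower()  # e.g. "x86_64", "amd64", "i686"
    is_64bit = machine in ("x86_64", "amd64")

    if system == "Darwin":
        relative = "Tennis.app"
    elif system == "Windows":
        folder = "Tennis_Windows_x86_64" if is_64bit else "Tennis_Windows_x86"
        relative = os.path.join(folder, "Tennis.exe")
    elif system == "Linux":
        folder = "Tennis_Linux_NoVis" if headless else "Tennis_Linux"
        binary = "Tennis.x86_64" if is_64bit else "Tennis.x86"
        relative = os.path.join(folder, binary)
    else:
        raise ValueError(f"no Tennis build is listed for {system!r}")
    return os.path.join(base_dir, relative)
```

You could then start the environment with `UnityEnvironment(file_name=tennis_file_name(base_dir="path/to"))`. The helper does not check that the file exists, so a wrong `base_dir` still surfaces as a Unity launch error.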

In [2]:
env = UnityEnvironment(file_name="Tennis.app")

INFO:unityagents:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 1
        Number of External Brains : 1
        Lesson number : 0
        Reset Parameters :
		
Unity brain name: TennisBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space type: continuous
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): 2
        Vector Action descriptions: , 


Environments contain **_brains_** which are responsible for deciding the actions of their associated agents. Here we check for the first brain available, and set it as the default brain we will be controlling from Python.

In [3]:
# get the default brain
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
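An environment can expose more than one brain. The sketch below summarizes each brain's observation and action sizes. `describe_brains` is a hypothetical helper. The attribute names (`vector_observation_space_size`, `num_stacked_vector_observations`, `vector_action_space_size`) are assumed to match `unityagents`' `BrainParameters`, whose fields appear in the log above, so check them against your installed version.

```python
def describe_brains(env):
    """Summarize every brain in a unityagents environment (hypothetical helper).

    Attribute names are assumed to follow unityagents' BrainParameters.
    """
    summary = {}
    for name in env.brain_names:
        params = env.brains[name]
        obs_size = params.vector_observation_space_size
        num_stacked = params.num_stacked_vector_observations
        summary[name] = {
            "obs_size": obs_size,                   # variables in one observation
            "num_stacked": num_stacked,             # observations concatenated together
            "state_size": obs_size * num_stacked,   # expected length of each state vector
            "action_size": params.vector_action_space_size,
        }
    return summary
```

Going by the log above, `describe_brains(env)` should report a single `TennisBrain` with 8 variables per observation, 3 stacked observations, a 24-entry state and 2 actions.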

### 2. Examine the State and Action Spaces

In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets the ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
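Rewards arrive per step and per agent, so an episode's score has to be aggregated. The sketch below sums each agent's rewards over an episode and takes the maximum of the two totals, which is the convention the Udacity project README uses for this environment. `episode_score` and the sample rewards are hypothetical.

```python
import numpy as np


def episode_score(step_rewards):
    """Aggregate one episode of rewards.

    step_rewards: per-step reward lists, shape (num_steps, num_agents).
    Returns (per-agent totals, episode score = max over agents).
    """
    per_agent = np.asarray(step_rewards, dtype=float).sum(axis=0)
    return per_agent, float(per_agent.max())


# Hypothetical rally: agent 0 hits over the net (+0.1), agent 1 returns it (+0.1),
# agent 0 hits it back again (+0.1), and agent 1 lets the ball drop (-0.01).
rewards = [[0.1, 0.0],
           [0.0, 0.1],
           [0.1, 0.0],
           [0.0, -0.01]]
per_agent, score = episode_score(rewards)
print(per_agent, score)
```

Taking the maximum rather than the sum means a long rally rewards both agents' skill, but the reported score reflects only the better of the two.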

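Each agent receives its own local observation: 8 variables corresponding to the position and velocity of the ball and racket. Because the brain stacks 3 consecutive observations (`Number of stacked Vector Observation: 3` in the log above), the state vector each agent actually receives has 8 × 3 = 24 entries. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.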

Run the code cell below to print some information about the environment.

In [4]:
# reset the environment
env_info = env.reset(train_mode=True)[brain_name]

# number of agents 
num_agents = len(env_info.agents)
print('Number of agents:', num_agents)

# size of each action
action_size = brain.vector_action_space_size
print('Size of each action:', action_size)

# examine the state space 
states = env_info.vector_observations
state_size = states.shape[1]
print('There are {} agents. Each observes a state with length: {}'.format(states.shape[0], state_size))
print('The state for the first agent looks like:', states[0])

Number of agents: 2
Size of each action: 2
There are 2 agents. Each observes a state with length: 24
The state for the first agent looks like: [ 0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.         -6.65278625 -1.5
 -0.          0.          6.83172083  6.         -0.          0.        ]
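The 24-entry state is three 8-variable observations concatenated. The sketch below reshapes the printed reset state into a (3, 8) array. It assumes the observations are laid out one after another in the flat vector. In the printed reset state only the last 8 entries are non-zero, which suggests the newest observation sits at the end. `split_frames` is a hypothetical helper.

```python
import numpy as np

OBS_SIZE = 8     # variables per observation (from the environment description)
NUM_STACKED = 3  # "Number of stacked Vector Observation" in the Unity log


def split_frames(state, obs_size=OBS_SIZE, num_stacked=NUM_STACKED):
    """Reshape one agent's flat stacked state into (num_stacked, obs_size)."""
    state = np.asarray(state, dtype=float)
    if state.shape != (obs_size * num_stacked,):
        raise ValueError(f"expected {obs_size * num_stacked} values, got shape {state.shape}")
    return state.reshape(num_stacked, obs_size)


# The first agent's reset state, copied from the output above.
first_agent_state = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                     0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                     -6.65278625, -1.5, -0.0, 0.0, 6.83172083, 6.0, -0.0, 0.0]
frames = split_frames(first_agent_state)
print(frames)
```

Inside the notebook, `split_frames(states[0])` would do the same for the live observation.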


### 3. It's Your Turn!

Now it's your turn to train your own agent to solve the environment!  When training the environment, set `train_mode=True`, so that the line for resetting the environment looks like the following:
```python
env_info = env.reset(train_mode=True)[brain_name]
```
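Before a long training run, it can help to check the reset/step loop with a random policy and get a baseline score. `run_random_episode` is a hypothetical helper. It assumes the `unityagents` interface used above, where `env.reset(train_mode=...)[brain_name]` and `env.step(actions)[brain_name]` return an info object with `rewards` and `local_done` fields. It also assumes valid actions lie in [-1, 1].

```python
import numpy as np


def run_random_episode(env, brain_name, num_agents, action_size, rng=None, train_mode=True):
    """Play one episode with random actions; return each agent's total reward.

    Assumes the unityagents reset/step interface and an action range of [-1, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    env.reset(train_mode=train_mode)
    scores = np.zeros(num_agents)
    while True:
        # Gaussian noise clipped to the assumed action range [-1, 1].
        actions = np.clip(rng.standard_normal((num_agents, action_size)), -1, 1)
        env_info = env.step(actions)[brain_name]
        scores += env_info.rewards
        if np.any(env_info.local_done):  # stop when any agent's episode ends
            break
    return scores
```

In this notebook it would be called as `run_random_episode(env, brain_name, num_agents, action_size)`. Taking `np.max` of the result gives the better of the two agents' totals. Passing `train_mode=False` runs the episode at normal speed so you can watch it in the Unity window.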

In [None]:
from ppo import run_ppo

run_ppo(env)

update 1/2000. Last update in 1.9073486328125e-06s
last 100 returns: 0.04500000085681677
update 2/2000. Last update in 0.360882043838501s
last 100 returns: 0.032500000670552254
update 3/2000. Last update in 0.20538592338562012s
last 100 returns: 0.036666667399307094
update 4/2000. Last update in 0.2036130428314209s
last 100 returns: 0.0338888895801372
update 5/2000. Last update in 0.20710015296936035s
last 100 returns: 0.028333333941797417
update 6/2000. Last update in 0.21252989768981934s
last 100 returns: 0.01852941222708015
update 7/2000. Last update in 0.21431612968444824s
last 100 returns: 0.014047619443209399
update 8/2000. Last update in 0.20713591575622559s
last 100 returns: 0.010384615725622727
update 9/2000. Last update in 0.216691255569458s
last 100 returns: 0.00879310376556783
update 10/2000. Last update in 0.19997119903564453s
last 100 returns: 0.01015151548904903
update 11/2000. Last update in 0.16592788696289062s
last 100 returns: 0.010277778117193116
...
last 100 returns: 0.026000000573694705
update 93/2000. Last update in 0.21378779411315918s
last 100 returns: 0.027000000588595866
update 94/2000. Last update in 0.18922090530395508s
last 100 returns: 0.027000000588595866
update 95/2000. Last update in 0.17388486862182617s
last 100 returns: 0.026000000573694705
update 96/2000. Last update in 0.2051100730895996s
last 100 returns: 0.02900000061839819
update 97/2000. Last update in 0.2146899700164795s
last 100 returns: 0.027000000588595866
update 98/2000. Last update in 0.21314096450805664s
last 100 returns: 0.026000000573694705
update 99/2000. Last update in 0.21217012405395508s
last 100 returns: 0.023000000528991222
update 100/2000. Last update in 0.188978910446167s
last 100 returns: 0.02000000048428774
update 101/2000. Last update in 0.18559718132019043s
last 100 returns: 0.02000000048428774
update 102/2000. Last update in 0.19971203804016113s
last 100 returns: 0.02000000048428774
update 103/2000. Last update in 0.20154190063476562s
...
last 100 returns: 0.048000000901520255
update 185/2000. Last update in 0.21482181549072266s
last 100 returns: 0.04900000091642141
update 186/2000. Last update in 0.20876812934875488s
last 100 returns: 0.04900000091642141
update 187/2000. Last update in 0.21415400505065918s
last 100 returns: 0.05000000093132258
update 188/2000. Last update in 0.21271133422851562s
last 100 returns: 0.05000000093132258
update 189/2000. Last update in 0.2138669490814209s
last 100 returns: 0.051000000946223735
update 190/2000. Last update in 0.208082914352417s
last 100 returns: 0.051000000946223735
update 191/2000. Last update in 0.19140100479125977s
last 100 returns: 0.051000000946223735
update 192/2000. Last update in 0.2019062042236328s
last 100 returns: 0.05000000093132258
update 193/2000. Last update in 0.20686006546020508s
last 100 returns: 0.05000000093132258
update 194/2000. Last update in 0.19661378860473633s
last 100 returns: 0.0520000009611249
update 195/2000. Last update in 0.21713685989379883s


last 100 returns: 0.04000000078231096
update 276/2000. Last update in 0.21421003341674805s
last 100 returns: 0.0390000007674098
update 277/2000. Last update in 0.21150732040405273s
last 100 returns: 0.04000000078231096
update 278/2000. Last update in 0.21101593971252441s
last 100 returns: 0.044000000841915604
update 279/2000. Last update in 0.2142322063446045s
last 100 returns: 0.04300000082701445
update 280/2000. Last update in 0.19704794883728027s
last 100 returns: 0.04600000087171793
update 281/2000. Last update in 0.21329617500305176s
last 100 returns: 0.04500000085681677
update 282/2000. Last update in 0.20276618003845215s
last 100 returns: 0.04500000085681677
update 283/2000. Last update in 0.15818333625793457s
last 100 returns: 0.048000000901520255
update 284/2000. Last update in 0.18929004669189453s
last 100 returns: 0.04200000081211328
update 285/2000. Last update in 0.2059011459350586s
last 100 returns: 0.041000000797212124
update 286/2000. Last update in 0.17667293548583984s
...
last 100 returns: 0.02000000048428774
update 366/2000. Last update in 0.20968198776245117s
last 100 returns: 0.0210000004991889
update 367/2000. Last update in 0.13994193077087402s
last 100 returns: 0.02000000048428774
update 368/2000. Last update in 0.18954992294311523s
last 100 returns: 0.019000000469386578
update 369/2000. Last update in 0.21116995811462402s
last 100 returns: 0.0210000004991889
update 370/2000. Last update in 0.21216487884521484s
last 100 returns: 0.0210000004991889
update 371/2000. Last update in 0.2125082015991211s
last 100 returns: 0.02000000048428774
update 372/2000. Last update in 0.2124497890472412s
last 100 returns: 0.018000000454485417
update 373/2000. Last update in 0.1997377872467041s
last 100 returns: 0.018000000454485417
update 374/2000. Last update in 0.20682001113891602s
last 100 returns: 0.019000000469386578
update 375/2000. Last update in 0.21431398391723633s
last 100 returns: 0.016000000424683095
update 376/2000. Last update in 0.21632003784179688s
...
last 100 returns: 0.005000000260770321
update 457/2000. Last update in 0.20776104927062988s
last 100 returns: 0.006000000275671482
update 458/2000. Last update in 0.21040797233581543s
last 100 returns: 0.006000000275671482
update 459/2000. Last update in 0.215925931930542s
last 100 returns: 0.006000000275671482
update 460/2000. Last update in 0.21092915534973145s
last 100 returns: 0.006000000275671482
update 461/2000. Last update in 0.21545124053955078s
last 100 returns: 0.006000000275671482
update 462/2000. Last update in 0.21242499351501465s
last 100 returns: 0.007000000290572643
update 463/2000. Last update in 0.1871051788330078s
last 100 returns: 0.006000000275671482
update 464/2000. Last update in 0.16712093353271484s
last 100 returns: 0.005000000260770321
update 465/2000. Last update in 0.21488595008850098s
last 100 returns: 0.004000000245869159
update 466/2000. Last update in 0.20998311042785645s
last 100 returns: 0.002000000216066837
...
last 100 returns: 0.011000000350177288
update 547/2000. Last update in 0.21094608306884766s
last 100 returns: 0.011000000350177288
update 548/2000. Last update in 0.15669679641723633s
last 100 returns: 0.011000000350177288
update 549/2000. Last update in 0.18851113319396973s
last 100 returns: 0.011000000350177288
update 550/2000. Last update in 0.20215892791748047s
last 100 returns: 0.012000000365078449
update 551/2000. Last update in 0.19894909858703613s
last 100 returns: 0.010000000335276127
update 552/2000. Last update in 0.21700215339660645s
last 100 returns: 0.010000000335276127
update 553/2000. Last update in 0.17370986938476562s
last 100 returns: 0.009000000320374965
update 554/2000. Last update in 0.21733713150024414s
last 100 returns: 0.006000000275671482
update 555/2000. Last update in 0.21501493453979492s
last 100 returns: 0.005000000260770321
update 556/2000. Last update in 0.17811799049377441s
last 100 returns: 0.0030000002309679987
...
last 100 returns: 0.010000000335276127
update 637/2000. Last update in 0.21316289901733398s
last 100 returns: 0.011000000350177288
update 638/2000. Last update in 0.21415996551513672s
last 100 returns: 0.011000000350177288
update 639/2000. Last update in 0.21018505096435547s
last 100 returns: 0.012000000365078449
update 640/2000. Last update in 0.19364619255065918s
last 100 returns: 0.01300000037997961
update 641/2000. Last update in 0.21317791938781738s
last 100 returns: 0.01300000037997961
update 642/2000. Last update in 0.21762800216674805s
last 100 returns: 0.015000000409781934
update 643/2000. Last update in 0.22905492782592773s
last 100 returns: 0.015000000409781934
update 644/2000. Last update in 0.21550893783569336s
last 100 returns: 0.016000000424683095
update 645/2000. Last update in 0.20740795135498047s
last 100 returns: 0.015000000409781934
update 646/2000. Last update in 0.23173999786376953s
last 100 returns: 0.016000000424683095
...
last 100 returns: 0.015000000409781934
update 727/2000. Last update in 0.21547698974609375s
last 100 returns: 0.017000000439584256
update 728/2000. Last update in 0.1565990447998047s
last 100 returns: 0.017000000439584256
update 729/2000. Last update in 0.20336484909057617s
last 100 returns: 0.018000000454485417
update 730/2000. Last update in 0.21202731132507324s
last 100 returns: 0.018000000454485417
update 731/2000. Last update in 0.2036740779876709s
last 100 returns: 0.018000000454485417
update 732/2000. Last update in 0.21889901161193848s
last 100 returns: 0.018000000454485417
update 733/2000. Last update in 0.21503686904907227s
last 100 returns: 0.019000000469386578
update 734/2000. Last update in 0.21232295036315918s
last 100 returns: 0.016000000424683095
update 735/2000. Last update in 0.15359807014465332s
last 100 returns: 0.015000000409781934
update 736/2000. Last update in 0.22013092041015625s
last 100 returns: 0.015000000409781934
...
last 100 returns: 0.019000000469386578
update 817/2000. Last update in 0.21392369270324707s
last 100 returns: 0.02000000048428774
update 818/2000. Last update in 0.20548009872436523s
last 100 returns: 0.0210000004991889
update 819/2000. Last update in 0.2177417278289795s
last 100 returns: 0.0210000004991889
update 820/2000. Last update in 0.21524310111999512s
last 100 returns: 0.0210000004991889
update 821/2000. Last update in 0.21431803703308105s
last 100 returns: 0.023000000528991222
update 822/2000. Last update in 0.15239810943603516s
last 100 returns: 0.024000000543892383
update 823/2000. Last update in 0.14680790901184082s
last 100 returns: 0.025000000558793544
update 824/2000. Last update in 0.1415407657623291s
last 100 returns: 0.023000000528991222
update 825/2000. Last update in 0.15109705924987793s
last 100 returns: 0.024000000543892383
update 826/2000. Last update in 0.22027111053466797s
last 100 returns: 0.024000000543892383
...
last 100 returns: 0.02200000051409006
update 907/2000. Last update in 0.2115330696105957s
last 100 returns: 0.02200000051409006
update 908/2000. Last update in 0.2130429744720459s
last 100 returns: 0.02200000051409006
update 909/2000. Last update in 0.2191781997680664s
last 100 returns: 0.023000000528991222
update 910/2000. Last update in 0.21454095840454102s
last 100 returns: 0.023000000528991222
update 911/2000. Last update in 0.17617392539978027s
last 100 returns: 0.024000000543892383
update 912/2000. Last update in 0.22103667259216309s
last 100 returns: 0.025000000558793544
update 913/2000. Last update in 0.2126140594482422s
last 100 returns: 0.026000000573694705
update 914/2000. Last update in 0.21333980560302734s
last 100 returns: 0.026000000573694705
update 915/2000. Last update in 0.18520498275756836s
last 100 returns: 0.027000000588595866
update 916/2000. Last update in 0.2204761505126953s
last 100 returns: 0.027000000588595866
...
last 100 returns: 0.0210000004991889
update 998/2000. Last update in 0.22175192832946777s
last 100 returns: 0.018000000454485417
update 999/2000. Last update in 0.21243691444396973s
last 100 returns: 0.015000000409781934
update 1000/2000. Last update in 0.2281968593597412s
last 100 returns: 0.014000000394880772
update 1001/2000. Last update in 0.2084343433380127s
last 100 returns: 0.016000000424683095
update 1002/2000. Last update in 0.21280789375305176s
last 100 returns: 0.017000000439584256
update 1003/2000. Last update in 0.2221989631652832s
last 100 returns: 0.017000000439584256
update 1004/2000. Last update in 0.22380518913269043s
last 100 returns: 0.016000000424683095
update 1005/2000. Last update in 0.15201306343078613s
last 100 returns: 0.017000000439584256
update 1006/2000. Last update in 0.224290132522583s
last 100 returns: 0.018000000454485417
update 1007/2000. Last update in 0.20948290824890137s
last 100 returns: 0.02000000048428774
...
last 100 returns: 0.03100000064820051
update 1087/2000. Last update in 0.20059800148010254s
last 100 returns: 0.03100000064820051
update 1088/2000. Last update in 0.15159893035888672s
last 100 returns: 0.03200000066310167
update 1089/2000. Last update in 0.22196412086486816s
last 100 returns: 0.03200000066310167
update 1090/2000. Last update in 0.17793703079223633s
last 100 returns: 0.033000000678002836
update 1091/2000. Last update in 0.2117171287536621s
last 100 returns: 0.034000000692903994
update 1092/2000. Last update in 0.23396825790405273s
last 100 returns: 0.034000000692903994
update 1093/2000. Last update in 0.2658817768096924s
last 100 returns: 0.036000000722706316
update 1094/2000. Last update in 0.20110297203063965s
last 100 returns: 0.036000000722706316
update 1095/2000. Last update in 0.22302889823913574s
last 100 returns: 0.036000000722706316
update 1096/2000. Last update in 0.23077392578125s
last 100 returns: 0.03700000073760748
...
last 100 returns: 0.0390000007674098
update 1177/2000. Last update in 0.21296095848083496s
last 100 returns: 0.0390000007674098
update 1178/2000. Last update in 0.22790193557739258s
last 100 returns: 0.03800000075250864
update 1179/2000. Last update in 0.2283039093017578s
last 100 returns: 0.03700000073760748
update 1180/2000. Last update in 0.22936010360717773s
last 100 returns: 0.03700000073760748
update 1181/2000. Last update in 0.15519428253173828s
last 100 returns: 0.036000000722706316
update 1182/2000. Last update in 0.21961498260498047s
last 100 returns: 0.03700000073760748
update 1183/2000. Last update in 0.15485405921936035s
last 100 returns: 0.03700000073760748
update 1184/2000. Last update in 0.20428800582885742s
last 100 returns: 0.036000000722706316
update 1185/2000. Last update in 0.21182584762573242s
last 100 returns: 0.036000000722706316
update 1186/2000. Last update in 0.21230220794677734s
last 100 returns: 0.03500000070780516
...
last 100 returns: 0.033000000678002836
update 1267/2000. Last update in 0.17903804779052734s
last 100 returns: 0.03200000066310167
update 1268/2000. Last update in 0.15843510627746582s
last 100 returns: 0.03100000064820051
update 1269/2000. Last update in 0.2148139476776123s
last 100 returns: 0.03000000063329935
update 1270/2000. Last update in 0.21852517127990723s
last 100 returns: 0.027000000588595866
update 1271/2000. Last update in 0.15517592430114746s
last 100 returns: 0.025000000558793544
update 1272/2000. Last update in 0.1638810634613037s
last 100 returns: 0.024000000543892383
update 1273/2000. Last update in 0.2239370346069336s
last 100 returns: 0.02200000051409006
update 1274/2000. Last update in 0.22625207901000977s
last 100 returns: 0.02200000051409006
update 1275/2000. Last update in 0.15787124633789062s
last 100 returns: 0.023000000528991222
update 1276/2000. Last update in 0.1907792091369629s
last 100 returns: 0.02200000051409006
...
last 100 returns: 0.034000000692903994
update 1357/2000. Last update in 0.15332770347595215s
last 100 returns: 0.03200000066310167
update 1358/2000. Last update in 0.22337794303894043s
last 100 returns: 0.03000000063329935
update 1359/2000. Last update in 0.1534130573272705s
last 100 returns: 0.03100000064820051
update 1360/2000. Last update in 0.22067999839782715s
last 100 returns: 0.028000000603497027
update 1361/2000. Last update in 0.15192413330078125s
last 100 returns: 0.03100000064820051
update 1362/2000. Last update in 0.21625804901123047s
last 100 returns: 0.03100000064820051
update 1363/2000. Last update in 0.21935009956359863s
last 100 returns: 0.03100000064820051
update 1364/2000. Last update in 0.15007710456848145s
last 100 returns: 0.03200000066310167
update 1365/2000. Last update in 0.22806692123413086s
last 100 returns: 0.03200000066310167
update 1366/2000. Last update in 0.1498711109161377s
last 100 returns: 0.03000000063329935
...
last 100 returns: 0.04600000087171793
update 1447/2000. Last update in 0.17572712898254395s
last 100 returns: 0.04500000085681677
update 1448/2000. Last update in 0.22403407096862793s
last 100 returns: 0.044000000841915604
update 1449/2000. Last update in 0.1543738842010498s
last 100 returns: 0.0390000007674098
update 1450/2000. Last update in 0.2200021743774414s
last 100 returns: 0.03800000075250864
update 1451/2000. Last update in 0.2270829677581787s
last 100 returns: 0.0390000007674098
update 1452/2000. Last update in 0.1947948932647705s
last 100 returns: 0.03800000075250864
update 1453/2000. Last update in 0.19658923149108887s
last 100 returns: 0.03800000075250864
update 1454/2000. Last update in 0.15698623657226562s
last 100 returns: 0.034000000692903994
update 1455/2000. Last update in 0.21999001502990723s
last 100 returns: 0.033000000678002836
update 1456/2000. Last update in 0.15560293197631836s
last 100 returns: 0.03100000064820051
...
last 100 returns: 0.034000000692903994
update 1537/2000. Last update in 0.22978615760803223s
last 100 returns: 0.03500000070780516
update 1538/2000. Last update in 0.15075397491455078s
last 100 returns: 0.03700000073760748
update 1539/2000. Last update in 0.22681498527526855s
last 100 returns: 0.03500000070780516
update 1540/2000. Last update in 0.22522282600402832s
last 100 returns: 0.03000000063329935
update 1541/2000. Last update in 0.23350095748901367s
last 100 returns: 0.03100000064820051
update 1542/2000. Last update in 0.14568614959716797s
last 100 returns: 0.03000000063329935
update 1543/2000. Last update in 0.22054290771484375s
last 100 returns: 0.03100000064820051
update 1544/2000. Last update in 0.1564009189605713s
last 100 returns: 0.03200000066310167
update 1545/2000. Last update in 0.17997193336486816s
last 100 returns: 0.033000000678002836
update 1546/2000. Last update in 0.18033194541931152s
last 100 returns: 0.03200000066310167
...
last 100 returns: 0.03500000070780516
update 1627/2000. Last update in 0.21814584732055664s
last 100 returns: 0.03500000070780516
update 1628/2000. Last update in 0.15871810913085938s
last 100 returns: 0.034000000692903994
update 1629/2000. Last update in 0.1800670623779297s
last 100 returns: 0.034000000692903994
update 1630/2000. Last update in 0.22014808654785156s
last 100 returns: 0.03200000066310167
update 1631/2000. Last update in 0.2319660186767578s
last 100 returns: 0.03200000066310167
update 1632/2000. Last update in 0.15076303482055664s
last 100 returns: 0.03100000064820051
update 1633/2000. Last update in 0.1981642246246338s
last 100 returns: 0.03100000064820051
update 1634/2000. Last update in 0.15459871292114258s
last 100 returns: 0.03200000066310167
update 1635/2000. Last update in 0.22350811958312988s
last 100 returns: 0.033000000678002836
update 1636/2000. Last update in 0.1590731143951416s
last 100 returns: 0.03500000070780516
...
last 100 returns: 0.03500000070780516
update 1716/2000. Last update in 0.23117589950561523s
last 100 returns: 0.03700000073760748
update 1717/2000. Last update in 0.20870208740234375s
last 100 returns: 0.03700000073760748
update 1718/2000. Last update in 0.1793808937072754s
last 100 returns: 0.03800000075250864
update 1719/2000. Last update in 0.23074984550476074s
last 100 returns: 0.0390000007674098
update 1720/2000. Last update in 0.22510218620300293s
last 100 returns: 0.036000000722706316
update 1721/2000. Last update in 0.16100692749023438s
last 100 returns: 0.03500000070780516
update 1722/2000. Last update in 0.21822881698608398s
last 100 returns: 0.03500000070780516
update 1723/2000. Last update in 0.2328031063079834s
last 100 returns: 0.03500000070780516
update 1724/2000. Last update in 0.19907093048095703s
last 100 returns: 0.03200000066310167
update 1725/2000. Last update in 0.19980216026306152s
last 100 returns: 0.03100000064820051
...
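The `ppo` module imported above is not included in this notebook, so its internals are unknown here. As background only, the defining piece of Proximal Policy Optimization (Schulman et al., 2017) is the clipped surrogate objective. The probability ratio between the new and old policy is clipped to [1 - ε, 1 + ε] before it multiplies the advantage, and the smaller of the clipped and unclipped terms is kept. The NumPy sketch below is generic, and `ppo_clip_loss` is a hypothetical name. A real implementation would compute this in an autodiff framework so gradients can flow. ε = 0.2 is assumed as a common default.

```python
import numpy as np


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective (a quantity to minimize).

    ratio = exp(new_log_prob - old_log_prob)
    objective = mean(min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A))
    """
    ratio = np.exp(np.asarray(new_log_probs, dtype=float) - np.asarray(old_log_probs, dtype=float))
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return -np.mean(np.minimum(unclipped, clipped))
```

With a positive advantage, raising an action's probability beyond 1 + ε earns nothing extra. With a negative advantage, the unclipped, more pessimistic term is kept.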

In [None]:
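import datetime
import os
import pickle
import shutil
from collections import deque

import matplotlib.pyplot as plt


def copy_model_and_plot_learning_curve(window=10):
    """Snapshot the latest scores/model pickles into a timestamped folder under
    checkpoints/ and save a smoothed learning curve next to them."""
    datetime_stamp = datetime.datetime.now().strftime('%y%m%d_%H%M')
    plot_path = f'checkpoints/{datetime_stamp}'

    # Refuse to overwrite a snapshot taken within the same minute.
    if os.path.exists(plot_path):
        print(f'directory {plot_path} already exists')
        return
    os.makedirs(plot_path)

    # Pickles assumed to be written by run_ppo during training,
    # named after the brain (e.g. TennisBrain_scores.pickle).
    shutil.copyfile(f'{brain_name}_scores.pickle', f'{plot_path}/scores.pickle')
    shutil.copyfile(f'{brain_name}_model_checkpoint.pickle', f'{plot_path}/model.pickle')

    with open(f'{plot_path}/scores.pickle', 'rb') as f:
        total_rewards = pickle.load(f)

    # Trailing moving average over the last `window` episodes
    # (shorter windows at the very start).
    smoothed = []
    queue = deque([], maxlen=window)
    for r in total_rewards:
        queue.append(r)
        smoothed.append(sum(queue) / len(queue))

    fig, ax = plt.subplots()
    ax.plot(smoothed)
    ax.set_xlabel('total episodes (across all agents)')
    ax.set_ylabel(f'total reward ({window}-episode moving average)')
    plt.savefig(f'{plot_path}/learning_curve.png')
    plt.show()


copy_model_and_plot_learning_curve()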
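Two related calculations are sketched below. The first computes the same trailing moving average as the `deque` loop above, using cumulative sums. The second checks the Udacity solve criterion for this environment: a mean episode score of at least +0.5 over 100 consecutive episodes, where each episode's score is the maximum over the two agents. In the portion of the training log shown above, the reported 100-episode average stays at or below roughly 0.05. This assumes the log's "returns" are comparable to those episode scores. Both function names are hypothetical.

```python
import numpy as np


def trailing_mean(values, window):
    """Mean of the last `window` values at each position (shorter windows at the
    start), as the deque(maxlen=window) loop computes, but via cumulative sums."""
    x = np.asarray(values, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))  # csum[k] = sum of x[:k]
    end = np.arange(1, len(x) + 1)                # number of values seen so far
    start = np.maximum(end - window, 0)
    return (csum[end] - csum[start]) / (end - start)


def first_solved_episode(episode_scores, window=100, target=0.5):
    """1-based index of the first episode whose trailing `window`-episode mean
    reaches `target` (only full windows count), or None if none does."""
    means = trailing_mean(episode_scores, window)
    full_window = np.arange(1, len(means) + 1) >= window
    hits = np.flatnonzero(full_window & (means >= target))
    return int(hits[0]) + 1 if hits.size else None
```

Cumulative sums make the average O(n) instead of O(n · window), at the cost of slowly accumulating floating-point error on very long runs. With `total_rewards` loaded from the scores pickle, `first_solved_episode(total_rewards)` would apply the check. However, the plot's x-axis label suggests the pickle holds per-agent returns rather than max-over-agents scores.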