Merged
Changes from all commits
Commits
109 commits
906a47e
Init Branch
Oct 26, 2019
8d65bd8
Added:
Oct 26, 2019
fc6af3c
Added:
Oct 27, 2019
326328d
Added:
Oct 27, 2019
694d76e
Added:
Oct 28, 2019
2c6dd98
Added:
Oct 28, 2019
9e2336c
Fixed:
Oct 28, 2019
7f77303
Fixed:
Oct 28, 2019
2ec8bb7
Added:
Oct 28, 2019
55a7ed5
Added:
Oct 28, 2019
331da03
Added:
Oct 28, 2019
54cfd20
Fixed:
Oct 29, 2019
8400d9b
Added:
Oct 30, 2019
7a62beb
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Oct 30, 2019
ac88969
Fixed:
Oct 30, 2019
49f7ddb
Fixed:
Oct 30, 2019
55c78e9
Fixed:
Oct 31, 2019
15ca480
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Oct 31, 2019
0edddf9
Fixed:
Oct 31, 2019
78b1da2
Fixed:
Nov 1, 2019
df0d729
Fixed:
Nov 2, 2019
801f3b2
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Nov 2, 2019
5c1542c
Fixed:
Nov 2, 2019
d645716
More test fixing
Nov 3, 2019
3d3acff
More test fixing
Nov 3, 2019
5509344
Fixed:
Nov 8, 2019
d9582be
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Nov 8, 2019
1a3e56c
work
Nov 10, 2019
f91f381
work
Nov 12, 2019
1ce8118
work
Nov 13, 2019
e27ad9f
More test fixing
Nov 13, 2019
07ffd4f
More test fixing
Nov 13, 2019
030c147
More test fixing
Nov 13, 2019
fc19b53
Added:
Nov 14, 2019
ca40a05
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Nov 14, 2019
5c61af7
Added:
Nov 14, 2019
270a586
Added:
Nov 14, 2019
649528c
Added:
Nov 16, 2019
2313c65
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Nov 16, 2019
9ff1c40
Added:
Nov 18, 2019
797f8da
Fixed:
Nov 19, 2019
4307f3e
Fixed:
Nov 19, 2019
1946eca
Fixed:
Nov 20, 2019
9664cba
Fixed:
Nov 22, 2019
fb81513
Fixed:
Nov 30, 2019
65e814e
Added:
Nov 30, 2019
0124b98
Added:
Nov 30, 2019
7ae6ba4
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Nov 30, 2019
0a60464
Added:
Nov 30, 2019
af31ccd
Added:
Dec 3, 2019
56c46d7
Added:
Dec 3, 2019
f8cc97d
Added:
Dec 3, 2019
25d17a0
Added:
Dec 4, 2019
0e2fa01
Added:
Dec 4, 2019
fa23bd6
Added:
Dec 4, 2019
c02255f
Added:
Dec 4, 2019
2a50168
Added:
Dec 4, 2019
98cbc2c
Fixed:
Dec 4, 2019
03ec6f8
Fixed:
Dec 5, 2019
ab04399
Added:
Dec 7, 2019
4e8a300
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Dec 7, 2019
b78bee2
Added:
Dec 7, 2019
e1c9b64
Version 0.9.5 mass refactor (#12)
josiahls Dec 16, 2019
dd2d691
Updated Version
Dec 16, 2019
b1e8341
Changed:
Dec 16, 2019
01c58ba
Changed:
Dec 16, 2019
25a69b3
Removed:
Dec 16, 2019
d4dd34c
Removed:
Dec 16, 2019
69e7bad
Merge branch 'master' into version_0_9_0
josiahls Dec 16, 2019
2268aa3
Removed:
Dec 16, 2019
61b43f3
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Dec 16, 2019
c74be06
Removed:
Dec 16, 2019
b22eb5c
Removed:
Dec 16, 2019
893a333
Removed:
Dec 16, 2019
1352717
Removed:
Dec 16, 2019
bfae98e
Removed:
Dec 16, 2019
cfa95ff
Removed:
Dec 16, 2019
3c0ed95
Merge branch 'master' into version_0_9_0
josiahls Dec 16, 2019
3d99e65
Removed:
Dec 16, 2019
1d6f60d
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Dec 16, 2019
2b0b265
Removed:
Dec 16, 2019
afa2f4e
Added:
Dec 19, 2019
a8fc416
Added:
Dec 19, 2019
c3136a3
Added:
Dec 19, 2019
9cb5014
Added:
Dec 19, 2019
c33279b
Added:
Dec 19, 2019
4a14965
Added:
Dec 19, 2019
0902506
Version 0 9 5 mass refactor (#13)
josiahls Dec 21, 2019
11b210b
Added:
Dec 21, 2019
458dcf8
Added:
Dec 21, 2019
a280187
Added:
Dec 21, 2019
0e50fe6
Added:
Dec 21, 2019
e547ba4
Added:
Dec 21, 2019
7de393b
Added:
Dec 21, 2019
19f5be9
Added:
Dec 21, 2019
b3a6fb4
Added:
Dec 21, 2019
2f3cfd1
Added:
Dec 21, 2019
a84f5ad
Added:
Dec 21, 2019
5a747b5
Added:
Dec 21, 2019
151de34
Added:
Dec 21, 2019
83c85dc
Fixed:
Dec 21, 2019
19826f4
Merge remote-tracking branch 'origin/version_0_9_0' into version_0_9_0
Dec 21, 2019
8563e1c
Added:
Dec 21, 2019
412a831
Added:
Dec 21, 2019
9d5d722
Added:
Dec 21, 2019
4be21a6
Fixed:
Dec 21, 2019
e1dfb10
Fixed:
Dec 21, 2019
1f2f497
Updated:
Dec 22, 2019
3b3a683
Updated:
Dec 22, 2019
9 changes: 7 additions & 2 deletions .gitignore
@@ -6,7 +6,12 @@ gen
.gitignore

# Jupyter Notebook
/fast_rl/notebooks/.ipynb_checkpoints/
*/.ipynb_checkpoints/*

# Data Files
/docs_src/data/*
#/docs_src/data/*

# Build Files / Directories
build/*
dist/*
fast_rl.egg-info/*
150 changes: 17 additions & 133 deletions README.md
@@ -20,9 +20,6 @@ However, there are also frameworks in PyTorch, most notably Facebook's Horizon:
- [Horizon](https://github.com/facebookresearch/Horizon)
- [DeepRL](https://github.com/ShangtongZhang/DeepRL)

Our motivation is that existing frameworks commonly use TensorFlow. We have nothing against TensorFlow, but we have
accomplished more in shorter periods of time using PyTorch.

Fastai for computer vision and tabular learning has been amazing. One would wish the same were true for RL.
The purpose of this repo is to have a framework that is as easy as possible to start with, but is also designed for
testing new agents.
@@ -72,141 +69,28 @@ working at their best. Post 1.0.0 will be more formal feature development with C
**Critical**
Testable code:
```python
from fast_rl.agents.DQN import DQN
from fast_rl.core.basic_train import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')
model = DQN(data)
learn = AgentLearner(data, model)
learn.fit(450)
```
Result:

| ![](res/pre_interpretation_maze_dqn.gif) |
|:---:|
| *Fig 1: We are now able to train an agent using some of the Fastai API* |


We believe that the agent explodes after the first episode. Not to worry! We will make an RL interpreter to see what's
going on!

- [X] 0.2.0 AgentInterpretation: The first method will be heatmapping the image / state space of the
environment with the expected rewards, which is extremely useful for debugging. In the code above, we are testing with a maze for a
good reason: heatmapping rewards over a maze is much easier than doing so for other environments.

Usage example:
```python
from fast_rl.agents.DQN import DQN
from fast_rl.core.Interpreter import AgentInterpretationAlpha
from fast_rl.core.basic_train import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch

data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')
model = DQN(data)
learn = AgentLearner(data, model)
learn.fit(10)

# Note that the Interpretation is broken, will be fixed with documentation in 0.9
interp = AgentInterpretationAlpha(learn)
interp.plot_heatmapped_episode(5)
```

| ![](res/heat_map_1.png) |
|:---:|
| *Fig 2: Cumulative rewards calculated over states during episode 0* |
| ![](res/heat_map_2.png) |
| *Fig 3: Episode 7* |
| ![](res/heat_map_3.png) |
| *Fig 4: Unimportant parts are excluded via reward penalization* |
| ![](res/heat_map_4.png) |
| *Fig 5: Finally, state space is fully explored, and the highest rewards are near the goal state* |
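
For intuition, the following is a minimal, library-independent sketch of the idea behind these heatmaps: accumulate the reward observed at each visited grid cell and render the result. The trajectory format and function name here are illustrative assumptions, not the `AgentInterpretationAlpha` internals.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_reward_heatmap(episodes, grid_shape=(5, 5)):
    """Accumulate reward per visited (row, col) cell and display it as a heatmap.

    `episodes` is assumed to be a list of trajectories, each a list of
    ((row, col), reward) pairs -- an illustrative format, not fast_rl's.
    """
    heat = np.zeros(grid_shape)
    for trajectory in episodes:
        for (row, col), reward in trajectory:
            heat[row, col] += reward  # cumulative reward observed at this cell

    plt.imshow(heat, cmap='hot', interpolation='nearest')
    plt.colorbar(label='cumulative reward')
    plt.title('Reward heatmap over the maze state space')
    plt.show()

# Example: a single 3-step trajectory on a 5x5 maze
plot_reward_heatmap([[((0, 0), -0.04), ((0, 1), -0.04), ((4, 4), 1.0)]])
```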

If we change:
```python
interp = AgentInterpretationAlpha(learn)
interp.plot_heatmapped_episode(epoch)
```
to:
```python
interp = AgentInterpretationAlpha(learn)
interp.plot_episode(epoch)
```
We can get the following plots for specific episodes:

| ![](res/reward_plot_1.png) |
|:----:|
| *Fig 6: Rewards estimated by the agent during episode 0* |

As determined by our AgentInterpretation object, we need to either debug or improve our agent.
We will do this in parallel with creating our Learner fit function.

- [X] 0.3.0 Add DQNs: DQN, Dueling DQN, Double DQN, Fixed Target DQN, DDDQN.
- [X] 0.4.0 Learner Basic: We need to convert this into a suitable object, hopefully similar to the basic fastai learner. Possibly add prioritized replay?
  - Added PER.
- [X] 0.5.0 DDPG Agent: We need to have at least one agent able to perform continuous environment execution. As a note, we
could give discrete agents the ability to operate in a continuous domain via binning (a minimal binning sketch follows this list).
- [X] 0.5.0 DDPG added. Let us move on.
- [X] 0.5.0 The DDPG paper contains a visualization for Q learning that might prove useful. Add it to the interpreter.
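
Below is a minimal sketch of the binning idea mentioned in the 0.5.0 DDPG item above: discretize a 1-D continuous action range into a fixed number of bins so that a discrete agent such as a DQN can act in a continuous environment. The bin count and action range are illustrative assumptions, not part of the fast_rl API.

```python
import numpy as np

def make_action_bins(low, high, n_bins):
    """Midpoints of n_bins equal-width bins covering the range [low, high]."""
    edges = np.linspace(low, high, n_bins + 1)
    return (edges[:-1] + edges[1:]) / 2.0

def discrete_to_continuous(action_index, bins):
    """Map a discrete action index chosen by the agent to a continuous action value."""
    return float(bins[action_index])

# Example: a Pendulum-style torque range of [-2, 2] split into 11 discrete actions
bins = make_action_bins(-2.0, 2.0, 11)
print(discrete_to_continuous(5, bins))  # -> 0.0, the middle bin
```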

| ![](res/ddpg_balancing.gif) |
|:----:|
| *Fig 7: DDPG trains stably now.* |


Added Q value interpretation per the explanation in Lillicrap et al., 2016. Currently both models (DQN and DDPG) have
unstable Q value approximations. Below is an example from DQN.
```python
interp = AgentInterpretationAlpha(learn, ds_type=DatasetType.Train)
interp.plot_q_density(epoch)
```
Usage can be referenced in `fast_rl/tests/test_interpretation`. A good agent will produce mostly a diagonal line;
a failing one will look globular or horizontal.
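
As a rough, framework-agnostic illustration of that diagonal-versus-globular diagnostic (this is not the `plot_q_density` implementation), one can scatter the Q values predicted at each step of an episode against the discounted returns actually observed afterwards:

```python
import matplotlib.pyplot as plt

def discounted_returns(rewards, gamma=0.99):
    """Observed return G_t for every step of a single episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def plot_q_diagnostic(predicted_q, rewards, gamma=0.99):
    """Scatter predicted Q values against observed discounted returns.

    A well-trained agent should hug the diagonal; a globular or horizontal
    cloud suggests the Q estimates carry little information.
    """
    targets = discounted_returns(rewards, gamma)
    plt.scatter(targets, predicted_q, s=8)
    lo, hi = min(targets), max(targets)
    plt.plot([lo, hi], [lo, hi], 'r--', label='ideal (Q == return)')
    plt.xlabel('observed discounted return')
    plt.ylabel('predicted Q value')
    plt.legend()
    plt.show()
```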

| ![](res/dqn_q_estimate_1.jpg) |
|:----:|
| *Fig 8: Initial Q value estimate. Seems globular, which is expected for an initial model.* |

| ![](res/dqn_q_estimate_2.jpg) |
|:----:|
| *Fig 9: Seems like the DQN is not learning...* |

| ![](res/dqn_q_estimate_3.jpg) |
|:----:|
| *Fig 10: Alarming later epoch results. It seems that the DQN converges to predicting a single Q value.* |

- [X] 0.6.0 Single global fit function like Fastai's. Think about the missing batch step. Noted some of the changes
needed to the existing Fastai fit function (a rough sketch of such a loop follows).
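
The sketch below shows the general shape such a fit loop takes for RL, assuming the classic `gym` reset/step signature; the `pick_action`, `observe`, and `learn` method names are hypothetical and are not the fast_rl API.

```python
def fit(env, agent, episodes, max_steps=1000):
    """Rough sketch of a single global RL fit loop, not fast_rl's implementation.

    Unlike a supervised fit, the 'batch' is produced by interacting with the
    environment, so the inner loop runs until the episode terminates.
    """
    for episode in range(episodes):
        state, total_reward = env.reset(), 0.0
        for step in range(max_steps):
            action = agent.pick_action(state)                        # exploration policy, e.g. epsilon-greedy
            next_state, reward, done, info = env.step(action)
            agent.observe(state, action, reward, next_state, done)   # e.g. push to replay memory
            agent.learn()                                            # one optimization step, if enough samples
            state, total_reward = next_state, total_reward + reward
            if done:
                break
        print(f'episode {episode}: total reward {total_reward:.2f}')
```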

| ![](res/fit_func_out.jpg) |
|:----:|
| *Fig 11: Resulting output of a typical fit function using the reference code below.* |

```python
from fast_rl.agents.DQN import DuelingDQN
from fast_rl.core.Learner import AgentLearner
from fast_rl.core.MarkovDecisionProcess import MDPDataBunch


data = MDPDataBunch.from_env('maze-random-5x5-v0', render='human', max_steps=1000)
model = DuelingDQN(data)
# model = DQN(data)
learn = AgentLearner(data, model)

learn.fit(5)
import torch
from fast_rl.agents.dqn import *
from fast_rl.agents.dqn_models import *
from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
from fast_rl.core.data_block import MDPDataBunch
from fast_rl.core.metrics import *

data = MDPDataBunch.from_env('CartPole-v1', render='rgb_array', bs=32, add_valid=False)
model = create_dqn_model(data, FixedTargetDQNModule, opt=torch.optim.RMSprop, lr=0.00025)
memory = ExperienceReplay(memory_size=1000, reduce_ram=True)
exploration_method = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
learner = dqn_learner(data=data, model=model, memory=memory, exploration_method=exploration_method)
learner.fit(10)
```
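
For reference, the exploration schedule above is driven by `GreedyEpsilon`. The exact formula is defined by the library; an exponential decay from `epsilon_start` toward `epsilon_end`, which is one plausible reading of those parameters (an assumption, not a confirmed fast_rl detail), would look roughly like this:

```python
import math

def epsilon_at(step, epsilon_start=1.0, epsilon_end=0.1, decay=0.001):
    """Plausible exponential epsilon schedule; an assumption, not fast_rl's exact formula."""
    return epsilon_end + (epsilon_start - epsilon_end) * math.exp(-decay * step)

# Early steps explore almost uniformly at random; after a few thousand steps the agent is mostly greedy.
print(round(epsilon_at(0), 3))     # 1.0
print(round(epsilon_at(5000), 3))  # ~0.106
```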


- [X] 0.7.0 Full test suite using multi-processing. Connect to CI.
- [X] 0.8.0 Comprehensive model eval **debug/verify**. Each model should succeed in at least a few known environments. Also, massive refactoring will be needed.
- [ ] **Working on** 0.9.0 Notebook demonstrations of basic model usage.
- [ ] **1.0.0** Base version is completed with working model visualizations proving performance / expected failure. At
- [X] 0.9.0 Notebook demonstrations of basic model usage.
- [ ] **Working on** **1.0.0** Base version is completed with working model visualizations proving performance / expected failure. At
this point, all models should have guaranteed environments they should succeed in.
- [ ] 1.2.0 Add PyBullet Fetch Environments
- [ ] 1.2.0 Not part of this repo; however, the envs need to subclass the OpenAI `gym.GoalEnv`.
- [ ] 1.2.0 Add HER
- [ ] 1.8.0 Add PyBullet Fetch Environments
- [ ] 1.8.0 Not part of this repo; however, the envs need to subclass the OpenAI `gym.GoalEnv`.
- [ ] 1.8.0 Add HER


## Code
73 changes: 36 additions & 37 deletions azure-pipelines.yml
@@ -3,43 +3,42 @@
# Add steps that build, run tests, deploy, and more:
# https://aka.ms/yaml

# - bash: "sudo apt-get install -y xvfb freeglut3-dev python-opengl --fix-missing"
# displayName: 'Install ffmpeg, freeglut3-dev, and xvfb'

trigger:
- master

pool:
  vmImage: 'ubuntu-18.04'

steps:

#- bash: "sudo apt-get install -y ffmpeg xvfb freeglut3-dev python-opengl"
#  displayName: 'Install ffmpeg, freeglut3-dev, and xvfb'

- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.7'

# - script: sh ./build/azure_pipeline_helper.sh
#   displayName: 'Complex Installs'

- script: |
    # pip install Bottleneck
    # python setup.py install
    pip install pytest
    pip install pytest-cov
  displayName: 'Install Python Packages'

- script: |
    xvfb-run -s "-screen 0 1400x900x24" pytest tests --doctest-modules --junitxml=junit/test-results.xml --cov=./ --cov-report=xml --cov-report=html
  displayName: 'Test with pytest'

- task: PublishTestResults@2
  condition: succeededOrFailed()
  inputs:
    testResultsFiles: '**/test-*.xml'
    testRunTitle: 'Publish test results for Python $(python.version)'

- task: PublishCodeCoverageResults@1
  inputs:
    codeCoverageTool: Cobertura
    summaryFileLocation: '$(System.DefaultWorkingDirectory)/**/coverage.xml'
    reportDirectory: '$(System.DefaultWorkingDirectory)/**/htmlcov'
jobs:
- job: 'Test'
  pool:
    vmImage: 'ubuntu-16.04' # other options: 'macOS-10.13', 'vs2017-win2016'
  strategy:
    matrix:
      Python36:
        python.version: '3.6'
  steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '$(python.version)'

  - bash: "sudo apt-get install -y freeglut3-dev python-opengl"
    displayName: 'Install freeglut3-dev'

  - script: |
      python -m pip install --upgrade pip setuptools wheel pytest pytest-cov -e .
      python setup.py install
    displayName: 'Install dependencies'

  - script: sh ./build/azure_pipeline_helper.sh
    displayName: 'Complex Installs'

  - script: |
      xvfb-run -s "-screen 0 1400x900x24" py.test tests --cov fast_rl --cov-report html --doctest-modules --junitxml=junit/test-results.xml --cov=./ --cov-report=xml --cov-report=html
    displayName: 'Test with pytest'

  - task: PublishTestResults@2
    condition: succeededOrFailed()
    inputs:
      testResultsFiles: '**/test-*.xml'
      testRunTitle: 'Publish test results for Python $(python.version)'
20 changes: 10 additions & 10 deletions build/azure_pipeline_helper.sh
@@ -1,14 +1,14 @@
#!/usr/bin/env bash

# Install pybullet
git clone https://github.com/benelot/pybullet-gym.git
cd pybullet-gym
pip install -e .
cd ../
## Install pybullet
#git clone https://github.com/benelot/pybullet-gym.git
#cd pybullet-gym
#pip install -e .
#cd ../

# Install gym_maze
git clone https://github.com/MattChanTK/gym-maze.git
cd gym-maze
python setup.py install
cd ../
## Install gym_maze
#git clone https://github.com/MattChanTK/gym-maze.git
#cd gym-maze
#python setup.py install
#cd ../

Binary file added docs_src/data/cartpole_dddqn/dddqn_er_rms.pickle
Binary file not shown.
Binary file added docs_src/data/cartpole_dddqn/dddqn_per_rms.pickle
Binary file not shown.
Binary file added docs_src/data/cartpole_ddqn/ddqn_er_rms.pickle
Binary file not shown.
Binary file added docs_src/data/cartpole_ddqn/ddqn_per_rms.pickle
Binary file not shown.
Binary files not shown. (26 additional binary file entries; file names are not rendered in this view)
636 changes: 636 additions & 0 deletions docs_src/rl.agents.dddqn.ipynb

Large diffs are not rendered by default.
