
Fix JS
erdnaxe committed Jul 17, 2020
1 parent 8440583 commit a0426fe
Showing 4 changed files with 34 additions and 11 deletions.
20 changes: 17 additions & 3 deletions docs/gym_environments.md
@@ -14,7 +14,7 @@ The sum of all these rewards when the episode ends is **the return** of this episode.

The **OpenAI Gym** library defines an interface to reinforcement learning
environments, making them easier to share and use.
-Gym also provides a large collection of environments to benchmark different learning algorithms[^OpenAIGym].
+Gym also provides a large collection of environments to benchmark different learning algorithms \[[Brockman et al., 2016](references.md#brockman2016openai)].

A Gym environment is a Python class implementing a set of methods:

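In sketch form, such a class exposes the classic `reset`/`step`/`render`/`close` methods. The class below is a placeholder that only illustrates the interface; its name and method bodies are hypothetical, not Kraby's actual implementation:

```python
class OneLegBulletEnvSketch:
    """Placeholder sketch of the classic Gym interface (not Kraby's real code)."""

    def __init__(self):
        # Real environments describe valid inputs and outputs with gym.spaces,
        # e.g. gym.spaces.Box for continuous joint commands and observations.
        self.action_space = None
        self.observation_space = None

    def reset(self):
        """Reset the simulation and return the initial observation."""
        return [0.0, 0.0]

    def step(self, action):
        """Apply an action, advance one timestep, return (obs, reward, done, info)."""
        observation, reward, done, info = [0.0, 0.0], 0.0, False, {}
        return observation, reward, done, info

    def render(self, mode="human"):
        """Display the current state of the environment."""

    def close(self):
        """Release simulation resources."""
```

An agent only ever interacts with an environment through these methods, which is what makes Gym environments easy to swap and share.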
@@ -80,6 +80,10 @@ to test each environment.

![Demo script in action](img/env_demo.jpg)

!!! Warning "Not tested"

    `gym_kraby:HexapodBulletEnv-v0` and `gym_kraby:HexapodRealEnv-v0` are quite similar to the "OneLeg" variants, but they have not been tested in learning.

## Using Kraby Gym environments

You may use these environments like any other OpenAI Gym environment.
@@ -92,13 +96,23 @@ import gym
env = gym.make('gym_kraby:HexapodBulletEnv-v0', render=True)
observation = env.reset()
score_return = 0
done = False

-for _ in range(10000):
+while not done:
    a = env.action_space.sample()  # take a random action
    observation, reward, done, _ = env.step(a)  # step the environment
    score_return += reward

print("Environment episode is done, your total return was", score_return)
env.close()
```

-[^OpenAIGym]: "[Gym](http://gym.openai.com/docs/)." OpenAI documentation.
As environments with only one leg cannot end prematurely (`done` will always be returned as `False`),
you may use a time limit wrapper:

```python
from gym.wrappers import TimeLimit

env_no_time_limit = gym.make('gym_kraby:HexapodBulletEnv-v0', render=True)
env = TimeLimit(env_no_time_limit, max_episode_steps=1000)  # 1000 is an example value
```
8 changes: 2 additions & 6 deletions docs/implementations_ppo.md
@@ -45,7 +45,7 @@ and optimization (GPU).
## Proximal Policy Optimization

Proximal Policy Optimization is a policy gradient method for reinforcement
-learning developed by OpenAI[^PPO_OpenAI].
+learning developed by OpenAI [[Schulman et al., 2017](references.md#schulman2017ppo)].
The following video explains clearly how Proximal Policy Optimization works.

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/5P7I-xPq8u8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
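As a quick reference, the clipped surrogate objective at the heart of PPO can be sketched per sample as follows (the function name is illustrative and this is not Kraby's code; `eps=0.2` is the clipping range suggested in the paper):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate objective L^CLIP (Schulman et al., 2017).

    ratio: probability ratio pi_new(a|s) / pi_old(a|s).
    advantage: estimated advantage of the sampled action.
    eps: clipping range around 1.
    """
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Taking the minimum keeps the pessimistic bound: the objective gains
    # nothing from pushing the ratio outside [1 - eps, 1 + eps].
    return min(ratio * advantage, clipped_ratio * advantage)
```

In practice this is averaged over a batch and maximized by gradient ascent; the clipping is what keeps each policy update close to the previous policy.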
@@ -61,7 +61,7 @@ The following implementations were tested:
- [StableBaselines3](https://github.com/DLR-RM/stable-baselines3) (PyTorch),
- [OpenAI SpinningUp](https://github.com/openai/spinningup) (PyTorch and Tensorflow 1).

-OpenAI developed internally a new training system called OpenAI Rapid
+OpenAI internally developed a new training system called [OpenAI Rapid](https://openai.com/blog/openai-five/#rapid)
implementing PPO at large scale. It can train a policy on large cloud
platforms (such as Kubernetes) using CPU workers for rollout and evaluation and GPU
workers for optimization.
@@ -164,7 +164,3 @@ docker run -it -u $(id -u):$(id -g) --gpus all --ipc=host --rm \
```

Some notebooks are available in `kraby/notebooks/spinningup/`.

-[^PPO_OpenAI]: "Proximal Policy Optimization." OpenAI Blog. <https://openai.com/blog/openai-baselines-ppo/>.
-[^OpenAI_Rapid]: "Rapid, OpenAI Five." OpenAI Blog. <https://openai.com/blog/openai-five/#rapid>
11 changes: 11 additions & 0 deletions docs/js/extra.js
@@ -0,0 +1,11 @@
// Render KaTeX
document.addEventListener("DOMContentLoaded", function (event) {
    renderMathInElement(document.body, {
        delimiters: [
            { left: "$$", right: "$$", display: true },
            { left: "$", right: "$", display: false },
            { left: "\\(", right: "\\)", display: false },
            { left: "\\[", right: "\\]", display: true }
        ]
    });
});
6 changes: 4 additions & 2 deletions mkdocs.yml
@@ -27,13 +27,15 @@ theme:
  name: readthedocs

extra_css:
+  - https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.css
  - css/extra.css

extra_javascript:
-  - https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-MML-AM_CHTML
+  - https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.js
+  - https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/contrib/auto-render.min.js
+  - js/extra.js

markdown_extensions:
-  - pymdownx.arithmatex
  - attr_list
  - smarty
  - footnotes
