
Fix JS
erdnaxe committed Jul 17, 2020
1 parent 8440583 commit a0426fe
Showing 4 changed files with 34 additions and 11 deletions.
20 changes: 17 additions & 3 deletions docs/gym_environments.md
@@ -14,7 +14,7 @@ The sum of all these rewards when the episode ends is **the return** of this episode.

The **OpenAI Gym** library defines an interface to reinforcement learning
environments, making them easier to share and use.
-Gym also provides a large collection of environments to benchmark different learning algorithms[^OpenAIGym].
+Gym also provides a large collection of environments to benchmark different learning algorithms \[[Brockman et al., 2016](references.md#brockman2016openai)].

A Gym environment is a Python class implementing a set of methods:

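In sketch form, such a class exposes the classic `reset`/`step`/`render`/`close` methods. The class below is a placeholder that only illustrates the interface; its name and method bodies are hypothetical, not Kraby's actual implementation:

```python
class OneLegBulletEnvSketch:
    """Placeholder sketch of the classic Gym interface (not Kraby's real code)."""

    def __init__(self):
        # Real environments describe valid inputs and outputs with gym.spaces,
        # e.g. gym.spaces.Box for continuous joint commands and observations.
        self.action_space = None
        self.observation_space = None

    def reset(self):
        """Reset the simulation and return the initial observation."""
        return [0.0, 0.0]

    def step(self, action):
        """Apply an action, advance one timestep, return (obs, reward, done, info)."""
        observation, reward, done, info = [0.0, 0.0], 0.0, False, {}
        return observation, reward, done, info

    def render(self, mode="human"):
        """Display the current state of the environment."""

    def close(self):
        """Release simulation resources."""
```

An agent only ever interacts with an environment through these methods, which is what makes Gym environments easy to swap and share.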
@@ -80,6 +80,10 @@ to test each environment.

![Demo script in action](img/env_demo.jpg)

!!! Warning "Not tested"

    `gym_kraby:HexapodBulletEnv-v0` and `gym_kraby:HexapodRealEnv-v0` are quite similar to the "OneLeg" variants, but they have not been tested in learning.

## Using Kraby Gym environments

You may use these environments like any other OpenAI Gym environment.
@@ -92,13 +96,23 @@ import gym
env = gym.make('gym_kraby:HexapodBulletEnv-v0', render=True)
observation = env.reset()
score_return = 0
done = False

-for _ in range(10000):
+while not done:
    a = env.action_space.sample()  # take a random action
    observation, reward, done, _ = env.step(a)  # step the environment
    score_return += reward

print("Environment episode is done, your total return was", score_return)
env.close()
```

-[^OpenAIGym]: "[Gym](http://gym.openai.com/docs/)." OpenAI documentation.
As environments with only one leg cannot end prematurely (`done` will always be returned as `False`),
you may use a time limit wrapper:

```python
from gym.wrappers import TimeLimit

env_no_time_limit = gym.make('gym_kraby:HexapodBulletEnv-v0', render=True)
env = TimeLimit(env_no_time_limit, max_episode_steps=1000)  # 1000 is an example value
```
8 changes: 2 additions & 6 deletions docs/implementations_ppo.md
@@ -45,7 +45,7 @@ and optimization (GPU).
## Proximal Policy Optimization

Proximal Policy Optimization is a policy gradient method for reinforcement
-learning developed by OpenAI[^PPO_OpenAI].
+learning developed by OpenAI [[Schulman et al., 2017](references.md#schulman2017ppo)].
The following video explains clearly how Proximal Policy Optimization works.

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/5P7I-xPq8u8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
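As a quick reference, the clipped surrogate objective at the heart of PPO can be sketched per sample as follows (the function name is illustrative and this is not Kraby's code; `eps=0.2` is the clipping range suggested in the paper):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate objective L^CLIP (Schulman et al., 2017).

    ratio: probability ratio pi_new(a|s) / pi_old(a|s).
    advantage: estimated advantage of the sampled action.
    eps: clipping range around 1.
    """
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Taking the minimum keeps the pessimistic bound: the objective gains
    # nothing from pushing the ratio outside [1 - eps, 1 + eps].
    return min(ratio * advantage, clipped_ratio * advantage)
```

In practice this is averaged over a batch and maximized by gradient ascent; the clipping is what keeps each policy update close to the previous policy.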
@@ -61,7 +61,7 @@ The following implementations were tested:
- [StableBaselines3](https://github.com/DLR-RM/stable-baselines3) (PyTorch),
- [OpenAI SpinningUp](https://github.com/openai/spinningup) (PyTorch and Tensorflow 1).

-OpenAI developed internally a new training system called OpenAI Rapid
+OpenAI internally developed a new training system called [OpenAI Rapid](https://openai.com/blog/openai-five/#rapid)
implementing PPO at large scale. It can train a policy on large cloud
platforms (such as Kubernetes) using CPU workers for rollout and evaluation and GPU
workers for optimization.
@@ -164,7 +164,3 @@ docker run -it -u $(id -u):$(id -g) --gpus all --ipc=host --rm \
```

Some notebooks are available in `kraby/notebooks/spinningup/`.

-[^PPO_OpenAI]: "Proximal Policy Optimization." OpenAI Blog. <https://openai.com/blog/openai-baselines-ppo/>.
-[^OpenAI_Rapid]: "Rapid, OpenAI Five." OpenAI Blog. <https://openai.com/blog/openai-five/#rapid>
11 changes: 11 additions & 0 deletions docs/js/extra.js
@@ -0,0 +1,11 @@
// Render KaTeX
document.addEventListener("DOMContentLoaded", function (event) {
    renderMathInElement(document.body, {
        delimiters: [
            { left: "$$", right: "$$", display: true },
            { left: "$", right: "$", display: false },
            { left: "\\(", right: "\\)", display: false },
            { left: "\\[", right: "\\]", display: true }
        ]
    });
});
6 changes: 4 additions & 2 deletions mkdocs.yml
@@ -27,13 +27,15 @@ theme:
  name: readthedocs

extra_css:
+  - https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.css
  - css/extra.css

extra_javascript:
-  - https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-MML-AM_CHTML
+  - https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.js
+  - https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/contrib/auto-render.min.js
+  - js/extra.js

markdown_extensions:
-  - pymdownx.arithmatex
  - attr_list
  - smarty
  - footnotes
