
Commit

Updated documentation.
sharif1093 committed Mar 12, 2020
1 parent 826e2c1 commit ccca5c2
Showing 9 changed files with 172 additions and 45 deletions.
79 changes: 54 additions & 25 deletions README.md
@@ -8,40 +8,69 @@

## Introduction

Developers who want to implement a new deep RL algorithm usually have to write a great deal of boilerplate code, or alternatively use 3rd-party packages that aim to provide the basics. However, understanding and modifying these 3rd-party packages is usually not a trivial task, due to a lack of documentation or of code readability/structure.
Digideep provides a framework for deep reinforcement learning research. Digideep's focus is on code **MODULARITY** and **REUSABILITY**.

Digideep tries to provide a well-documented, complete pipeline for deep reinforcement learning problems, so that developers can jump directly to implementing their methods. Special attention has been paid to **decoupling** different components as well as making them **modular**.
**Specifications**:

In Digideep, [OpenAI's Gym](https://github.com/openai/gym) and [Deepmind's dm_control](https://github.com/deepmind/dm_control) co-exist and can be used through the same interface. Thanks to the decoupled simulation and training parts, both [TensorFlow](https://www.tensorflow.org/) and [PyTorch](https://github.com/pytorch/pytorch) can be used to train the agents (the example methods in this repository, however, are implemented using PyTorch).
* Compatible with [OpenAI Gym](https://github.com/openai/gym) and [Deepmind dm_control](https://github.com/deepmind/dm_control).
* Uses PyTorch for the neural networks (TensorFlow can be used quite easily thanks to the modular design).
* Implementation of three RL methods: [DDPG](https://arxiv.org/abs/1509.02971), [SAC](https://arxiv.org/abs/1801.01290), and [PPO](https://arxiv.org/abs/1707.06347).

Currently, the following methods are implemented in Digideep:
See documentation at https://digideep.readthedocs.io/en/latest/.

* [DDPG](https://arxiv.org/abs/1509.02971) - Deep Deterministic Policy Gradient
* [SAC](https://arxiv.org/abs/1801.01290) - Soft Actor Critic
* [PPO](https://arxiv.org/abs/1707.06347) - Proximal Policy Optimization
## Usage

Digideep is written to be developer-friendly, with self-descriptive code and extensive documentation. It also provides some debugging tools and guidelines for implementing new methods.
### Installation

## Features
Follow the [installation instructions](https://digideep.readthedocs.io/en/latest/notes/01%20Installation.html).

* Developer-friendly code:
  * The code is highly readable and fairly easy to understand and modify.
  * Extensive documentation to support the above.
  * Written for _modularity_ and code _decoupling_.
  * Provides _debugging tools_ to assist in implementing new methods.
* Supports single-node multi-CPU multi-GPU architectures.
* Supports _dictionary observation/action spaces_ for neat communication with environments (see the sketch after this list).
* Can be used with both `dm_control`/`gym` through the same interface:
  * Uses `dm_control`'s native viewer for viewing.
  * Provides batch environments for both `dm_control` and `gym`.
* Provides session-as-a-module (SaaM) functionality to easily load saved sessions as a Python module for post-processing.
* Controls all parameters from a _single `parameter` file_ for transparency and easy tuning from one place.
* Supports structural _(de-)serialization_.
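
As a rough illustration of the dictionary-space idea, an environment can expose several named observation streams at once. The snippet below is only a sketch using `gym.spaces.Dict`; the key names are made up and are not Digideep's actual conventions.

```python
# Hypothetical dictionary observation space; key names are illustrative only.
import numpy as np
from gym import spaces

observation_space = spaces.Dict({
    "camera": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),  # image stream
    "joints": spaces.Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32),    # proprioception
})

sample = observation_space.sample()
print(sample["camera"].shape, sample["joints"].shape)  # (64, 64, 3) (7,)
```
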
### Running

## Documentation
* Start a training session based on a parameter file. Default parameter files are stored in `digideep/params`. Example:

```bash
# Running PPO on the 'PongNoFrameskip-v4' environment
python -m digideep.main --params digideep.params.atari_ppo
```

* Override a parameter in the parameter file from the command line (see the sketch after this list):

```bash
# Starting PPO training on 'DMCBenchCheetahRun-v0', instead.
python -m digideep.main --params digideep.params.mujoco_ppo --cpanel '{"model_name":"DMCBenchCheetahRun-v0"}'
```

* Play a trained policy from a checkpoint. Example:

```bash
python -m digideep.main --play --load-checkpoint "<path_to_checkpoint>"
```

* Visualize an environment:

```bash
python -m digideep.environment.play --model "Pendulum-v0"
```

See the [usage notes](https://digideep.readthedocs.io/en/latest/notes/02%20Usage.html) for more detailed usage information.
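
As a rough mental model for the `--cpanel` flag used above: it overrides entries of the `cpanel` dictionary defined in the chosen parameter module before the full parameter tree is generated. The sketch below is not the actual Digideep implementation; the `generator(cpanel)` convention is an assumption for illustration.

```python
# Minimal sketch of how a --cpanel JSON string could override a parameter module's cpanel dict.
import importlib
import json

def build_params(module_name, cpanel_json="{}"):
    mod = importlib.import_module(module_name)   # e.g. "digideep.params.mujoco_ppo"
    mod.cpanel.update(json.loads(cpanel_json))   # e.g. '{"model_name": "DMCBenchCheetahRun-v0"}'
    return mod.generator(mod.cpanel)             # assumed: expands the control panel into full params
```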


### Sample Results

Sample results of running SAC on the toy environment Pendulum-v0:

```bash
python -m digideep.main --params digideep.params.sac_params
```

<p align="center">
<img src="./doc/media/sac_pendulum_v0.gif" width="640">
</p>

<p align="center">
<img src="./doc/media/sac_pendulum_v0.svg" width="640">
</p>

Please visit https://digideep.readthedocs.io/en/latest/ for documentation.

## Changelog

4 changes: 2 additions & 2 deletions digideep/agent/ddpg/agent.py
@@ -187,8 +187,8 @@ def step(self):
monitor("/update/loss_actor", loss_actor.item())
monitor("/update/loss_critic", loss_critic.item())

self.session.writer.add_scalar('loss/actor', loss_actor.item())
self.session.writer.add_scalar('loss/critic', loss_critic.item())
self.session.writer.add_scalar('loss/actor', loss_actor.item(), self.state["i_step"])
self.session.writer.add_scalar('loss/critic', loss_critic.item(), self.state["i_step"])

self.state["i_step"] += 1

8 changes: 4 additions & 4 deletions digideep/agent/sac/agent.py
@@ -165,7 +165,7 @@ def step(self):
value_loss = self.criterion["value"](expected_value, next_value.detach())

log_prob_target = expected_new_q_value - expected_value
# TODO: Apperantly the calculation of actor_loss is problematic: none of its ingredients have gradients! So backprop does nothing.
# TODO: Apparently the calculation of actor_loss is problematic: none of its ingredients have gradients! So backprop does nothing.
actor_loss = (log_prob * (log_prob - log_prob_target).detach()).mean()

mean_loss = float(self.params["methodargs"]["mean_lambda"]) * mean.pow(2).mean()
@@ -192,9 +192,9 @@ def step(self):
monitor("/update/loss/softq", softq_loss.item())
monitor("/update/loss/value", value_loss.item())

self.session.writer.add_scalar('loss/actor', actor_loss.item())
self.session.writer.add_scalar('loss/softq', softq_loss.item())
self.session.writer.add_scalar('loss/value', value_loss.item())
self.session.writer.add_scalar('loss/actor', actor_loss.item(), self.state["i_step"])
self.session.writer.add_scalar('loss/softq', softq_loss.item(), self.state["i_step"])
self.session.writer.add_scalar('loss/value', value_loss.item(), self.state["i_step"])

# for key,item in locals().items():
# if isinstance(item, torch.Tensor):
2 changes: 1 addition & 1 deletion digideep/environment/explorer.py
@@ -127,7 +127,7 @@ def report_rewards(self, infos):
self.monitor_n_episode()

monitor("/reward/"+self.params["mode"]+"/episodic", rew, window=self.params["win_size"])
self.session.writer.add_scalar('reward/'+self.params["mode"], rew)
self.session.writer.add_scalar('reward/'+self.params["mode"], rew, self.state["n_episode"])

def close(self):
"""It closes all environments.
2 changes: 1 addition & 1 deletion digideep/params/classic_ddpg.py
@@ -83,7 +83,7 @@
#####################
### Agents Parameters
cpanel["agent_type"] = "digideep.agent.ddpg.Agent"
cpanel["lr_actor"] = 0.001 # 0.0001
cpanel["lr_actor"] = 0.0001 # 0.0001
cpanel["lr_critic"] = 0.001 # 0.001
cpanel["eps"] = 1e-5 # Epsilon parameter used in the optimizer(s) (ADAM/RMSProp/...)

Binary file added doc/media/sac_pendulum_v0.gif
1 change: 1 addition & 0 deletions doc/media/sac_pendulum_v0.svg
44 changes: 32 additions & 12 deletions doc/notes/01 Installation.rst
@@ -2,25 +2,31 @@
Installation
============

Prerequisites
-------------
Requirements
------------

* Python 3
* `PyTorch <https://pytorch.org/>`_
* [OPTIONAL] `Tensorboard <https://github.com/tensorflow/tensorboard>`_.
* `MuJoCo <https://www.roboti.us/index.html>`_ ``v200``.

* Install `PyTorch <https://pytorch.org/>`_ and `Visdom <https://github.com/facebookresearch/visdom>`_.
* Install `MuJoCo <https://www.roboti.us/index.html>`_ ``v150`` and ``v200``.
* Install `mujoco_py <https://github.com/openai/mujoco-py>`_ and `Gym <https://github.com/openai/gym>`_.
* Install `dm_control <https://github.com/deepmind/dm_control>`_.

.. code-block:: bash

   pip install --user --upgrade tb-nightly

* `mujoco_py <https://github.com/openai/mujoco-py>`_ and `Gym <https://github.com/openai/gym>`_.
* `dm_control <https://github.com/deepmind/dm_control>`_.

.. note::
If you are a student, you can use the free Student License for MuJoCo.
If you are a student, you can get a free student license for MuJoCo.

Installation
------------

Simply clone the package with the following commands and install it in editable mode:


.. code-block:: bash

   cd
   git clone https://github.com/sharif1093/digideep.git
   cd digideep
   pip install -e .
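
To confirm that the editable install is picked up by your Python environment, a quick sanity check (nothing beyond the package name is assumed here):

.. code-block:: python

   # The editable install should resolve to the cloned repository.
   import digideep
   print(digideep.__file__)  # expected to point into the cloned 'digideep' directory
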
@@ -33,7 +39,21 @@ Add the following to your ``.bashrc`` or ``.zshrc``:

.. code-block:: bash

   # Assuming that you have installed mujoco in '$HOME/.mujoco'
   export LD_LIBRARY_PATH=$HOME/.mujoco/mjpro150/bin:$LD_LIBRARY_PATH
   export MUJOCO_GL=glfw

   # Assuming you have installed mujoco in '$HOME/.mujoco'
   export LD_LIBRARY_PATH=$HOME/.mujoco/mujoco200_linux/bin:$LD_LIBRARY_PATH
   export MUJOCO_GL=glfw

.. _FixGLFW:

Patch ``dm_control`` initialization issue
-----------------------------------------

If you hit an error regarding GLFW initialization, try the following patch:

Go to the ``digideep`` installation path and run:

.. code-block:: bash

   cd <digideep_path>
   cp patch/glfw_renderer.py `pip show dm_control | grep -Po 'Location: (\K.*)'`/dm_control/_render
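
If the shell one-liner is inconvenient, the following sketch does the same thing from Python: it locates the installed ``dm_control`` package and copies the patch over it (run it from the ``digideep`` repository root).

.. code-block:: python

   # Locate the installed dm_control package and overwrite its GLFW renderer with the patch.
   import os
   import shutil

   import dm_control

   dst = os.path.join(os.path.dirname(dm_control.__file__), "_render", "glfw_renderer.py")
   shutil.copy("patch/glfw_renderer.py", dst)
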
77 changes: 77 additions & 0 deletions patch/glfw_renderer.py
@@ -0,0 +1,77 @@
# Copyright 2017 The dm_control Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""An OpenGL renderer backed by GLFW."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import sys
from dm_control._render import base
from dm_control._render import executor
import six

# Re-raise any exceptions that occur during module import as `ImportError`s.
# This simplifies the conditional imports in `render/__init__.py`.
try:
  import glfw  # pylint: disable=g-import-not-at-top
except (ImportError, IOError, OSError) as exc:
  _, exc, tb = sys.exc_info()
  six.reraise(ImportError, ImportError(str(exc)), tb)


class GLFWContext(base.ContextBase):
  """An OpenGL context backed by GLFW."""

  def __init__(self, max_width, max_height):
    # GLFWContext always uses `PassthroughRenderExecutor` rather than offloading
    # rendering calls to a separate thread because GLFW can only be safely used
    # from the main thread.
    super(GLFWContext, self).__init__(max_width, max_height,
                                      executor.PassthroughRenderExecutor)

  def _platform_init(self, max_width, max_height):
    """Initializes this context.

    Args:
      max_width: Integer specifying the maximum framebuffer width in pixels.
      max_height: Integer specifying the maximum framebuffer height in pixels.
    """
    try:
      glfw.init()
    except glfw.GLFWError as exc:
      _, exc, tb = sys.exc_info()
      six.reraise(ImportError, ImportError(str(exc)), tb)
    glfw.window_hint(glfw.VISIBLE, 0)
    glfw.window_hint(glfw.DOUBLEBUFFER, 0)
    self._context = glfw.create_window(width=max_width, height=max_height,
                                       title='Invisible window', monitor=None,
                                       share=None)
    # This reference prevents `glfw.destroy_window` from being garbage-collected
    # before the last window is destroyed, otherwise we may get
    # `AttributeError`s when the `__del__` method is later called.
    self._destroy_window = glfw.destroy_window

  def _platform_make_current(self):
    glfw.make_context_current(self._context)

  def _platform_free(self):
    """Frees resources associated with this context."""
    if self._context:
      if glfw.get_current_context() == self._context:
        glfw.make_context_current(None)
      self._destroy_window(self._context)
      self._context = None
