
Commit

Updated documentation.
sharif1093 committed Mar 12, 2020
1 parent 826e2c1 commit ccca5c2
Showing 9 changed files with 172 additions and 45 deletions.
79 changes: 54 additions & 25 deletions README.md
@@ -8,40 +8,69 @@

## Introduction

Developers who want to implement a new deep RL algorithm usually have to write a great deal of boilerplate code, or alternatively use 3rd-party packages that aim to provide the basics. However, understanding and modifying these 3rd-party packages is usually not a trivial task, due to a lack of documentation or of code readability/structure.
Digideep provides a framework for deep reinforcement learning research. Digideep's focus is on code **MODULARITY** and **REUSABILITY**.

Digideep tries to provide a well-documented, complete pipeline for deep reinforcement learning problems, so that developers can jump directly to implementing their methods. Special attention has been paid to **decoupling** different components as well as making them **modular**.
**Specifications**:

In Digideep, [OpenAI's Gym](https://github.com/openai/gym) and [Deepmind's dm_control](https://github.com/deepmind/dm_control) co-exist and can be used through the same interface. Thanks to the decoupled simulation and training parts, both [TensorFlow](https://www.tensorflow.org/) and [PyTorch](https://github.com/pytorch/pytorch) can be used to train the agents (the example methods in this repository, however, are implemented using PyTorch).
* Compatible with [OpenAI Gym](https://github.com/openai/gym) and [Deepmind dm_control](https://github.com/deepmind/dm_control).
* Uses PyTorch for the neural networks (TensorFlow can be used quite easily thanks to the modular design).
* Implementation of three RL methods: [DDPG](https://arxiv.org/abs/1509.02971), [SAC](https://arxiv.org/abs/1801.01290), and [PPO](https://arxiv.org/abs/1707.06347).

Currently, the following methods are implemented in Digideep:
See documentation at https://digideep.readthedocs.io/en/latest/.

* [DDPG](https://arxiv.org/abs/1509.02971) - Deep Deterministic Policy Gradient
* [SAC](https://arxiv.org/abs/1801.01290) - Soft Actor Critic
* [PPO](https://arxiv.org/abs/1707.06347) - Proximal Policy Optimization
## Usage

Digideep is written to be developer-friendly, with self-descriptive code and extensive documentation. It also provides some debugging tools and guidelines for implementing new methods.
### Installation

## Features
Follow the [installation instructions](https://digideep.readthedocs.io/en/latest/notes/01%20Installation.html).

* Developer-friendly code:
  * The code is highly readable and fairly easy to understand and modify.
  * Extensive documentation to support the above.
  * Written for _modularity_ and code _decoupling_.
  * Provides _debugging tools_ to assist in implementing new methods.
* Supports single-node multi-CPU multi-GPU architectures.
* Supports _dictionary observation/action spaces_ for neat communication with environments (see the sketch after this list).
* Can be used with both `dm_control`/`gym` through the same interface:
  * Uses `dm_control`'s native viewer for viewing.
  * Provides batch environments for both `dm_control` and `gym`.
* Provides session-as-a-module (SaaM) functionality to easily load saved sessions as a Python module for post-processing.
* Controls all parameters from a _single `parameter` file_ for transparency and easy tuning from one place.
* Supports structural _(de-)serialization_.
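
As a rough illustration of the dictionary-space idea, an environment can expose several named observation streams at once. The snippet below is only a sketch using `gym.spaces.Dict`; the key names are made up and are not Digideep's actual conventions.

```python
# Hypothetical dictionary observation space; key names are illustrative only.
import numpy as np
from gym import spaces

observation_space = spaces.Dict({
    "camera": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),  # image stream
    "joints": spaces.Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32),    # proprioception
})

sample = observation_space.sample()
print(sample["camera"].shape, sample["joints"].shape)  # (64, 64, 3) (7,)
```
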
### Running

## Documentation
* Start a training session based on a parameter file. Default parameter files are stored in `digideep/params`. Example:

```bash
# Running PPO on the 'PongNoFrameskip-v4' environment
python -m digideep.main --params digideep.params.atari_ppo
```

* Override a parameter in the parameter file from the command line (see the sketch after this list):

```bash
# Starting PPO training on 'DMCBenchCheetahRun-v0', instead.
python -m digideep.main --params digideep.params.mujoco_ppo --cpanel '{"model_name":"DMCBenchCheetahRun-v0"}'
```

* Play a trained policy from a checkpoint. Example:

```bash
python -m digideep.main --play --load-checkpoint "<path_to_checkpoint>"
```

* Visualize an environment:

```bash
python -m digideep.environment.play --model "Pendulum-v0"
```

See the [usage notes](https://digideep.readthedocs.io/en/latest/notes/02%20Usage.html) for more detailed usage information.
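
As a rough mental model for the `--cpanel` flag used above: it overrides entries of the `cpanel` dictionary defined in the chosen parameter module before the full parameter tree is generated. The sketch below is not the actual Digideep implementation; the `generator(cpanel)` convention is an assumption for illustration.

```python
# Minimal sketch of how a --cpanel JSON string could override a parameter module's cpanel dict.
import importlib
import json

def build_params(module_name, cpanel_json="{}"):
    mod = importlib.import_module(module_name)   # e.g. "digideep.params.mujoco_ppo"
    mod.cpanel.update(json.loads(cpanel_json))   # e.g. '{"model_name": "DMCBenchCheetahRun-v0"}'
    return mod.generator(mod.cpanel)             # assumed: expands the control panel into full params
```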


### Sample Results

Sample results of running SAC on the toy environment Pendulum-v0:

```bash
python -m digideep.main --params digideep.params.sac_params
```

<p align="center">
<img src="./doc/media/sac_pendulum_v0.gif" width="640">
</p>

<p align="center">
<img src="./doc/media/sac_pendulum_v0.svg" width="640">
</p>

Please visit https://digideep.readthedocs.io/en/latest/ for documentation.

## Changelog

4 changes: 2 additions & 2 deletions digideep/agent/ddpg/agent.py
@@ -187,8 +187,8 @@ def step(self):
monitor("/update/loss_actor", loss_actor.item())
monitor("/update/loss_critic", loss_critic.item())

self.session.writer.add_scalar('loss/actor', loss_actor.item())
self.session.writer.add_scalar('loss/critic', loss_critic.item())
self.session.writer.add_scalar('loss/actor', loss_actor.item(), self.state["i_step"])
self.session.writer.add_scalar('loss/critic', loss_critic.item(), self.state["i_step"])

self.state["i_step"] += 1

8 changes: 4 additions & 4 deletions digideep/agent/sac/agent.py
@@ -165,7 +165,7 @@ def step(self):
value_loss = self.criterion["value"](expected_value, next_value.detach())

log_prob_target = expected_new_q_value - expected_value
# TODO: Apperantly the calculation of actor_loss is problematic: none of its ingredients have gradients! So backprop does nothing.
# TODO: Apparently the calculation of actor_loss is problematic: none of its ingredients have gradients! So backprop does nothing.
actor_loss = (log_prob * (log_prob - log_prob_target).detach()).mean()

mean_loss = float(self.params["methodargs"]["mean_lambda"]) * mean.pow(2).mean()
@@ -192,9 +192,9 @@ def step(self):
monitor("/update/loss/softq", softq_loss.item())
monitor("/update/loss/value", value_loss.item())

self.session.writer.add_scalar('loss/actor', actor_loss.item())
self.session.writer.add_scalar('loss/softq', softq_loss.item())
self.session.writer.add_scalar('loss/value', value_loss.item())
self.session.writer.add_scalar('loss/actor', actor_loss.item(), self.state["i_step"])
self.session.writer.add_scalar('loss/softq', softq_loss.item(), self.state["i_step"])
self.session.writer.add_scalar('loss/value', value_loss.item(), self.state["i_step"])

# for key,item in locals().items():
# if isinstance(item, torch.Tensor):
2 changes: 1 addition & 1 deletion digideep/environment/explorer.py
@@ -127,7 +127,7 @@ def report_rewards(self, infos):
self.monitor_n_episode()

monitor("/reward/"+self.params["mode"]+"/episodic", rew, window=self.params["win_size"])
self.session.writer.add_scalar('reward/'+self.params["mode"], rew)
self.session.writer.add_scalar('reward/'+self.params["mode"], rew, self.state["n_episode"])

def close(self):
"""It closes all environments.
2 changes: 1 addition & 1 deletion digideep/params/classic_ddpg.py
@@ -83,7 +83,7 @@
#####################
### Agents Parameters
cpanel["agent_type"] = "digideep.agent.ddpg.Agent"
cpanel["lr_actor"] = 0.001 # 0.0001
cpanel["lr_actor"] = 0.0001 # 0.0001
cpanel["lr_critic"] = 0.001 # 0.001
cpanel["eps"] = 1e-5 # Epsilon parameter used in the optimizer(s) (ADAM/RMSProp/...)

Binary file added doc/media/sac_pendulum_v0.gif
1 change: 1 addition & 0 deletions doc/media/sac_pendulum_v0.svg
44 changes: 32 additions & 12 deletions doc/notes/01 Installation.rst
@@ -2,25 +2,31 @@
Installation
============

Prerequisites
-------------
Requirements
------------

* Python 3
* `PyTorch <https://pytorch.org/>`_
* [OPTIONAL] `Tensorboard <https://github.com/tensorflow/tensorboard>`_.
* `MuJoCo <https://www.roboti.us/index.html>`_ ``v200``.

* Install `PyTorch <https://pytorch.org/>`_ and `Visdom <https://github.com/facebookresearch/visdom>`_.
* Install `MuJoCo <https://www.roboti.us/index.html>`_ ``v150`` and ``v200``.
* Install `mujoco_py <https://github.com/openai/mujoco-py>`_ and `Gym <https://github.com/openai/gym>`_.
* Install `dm_control <https://github.com/deepmind/dm_control>`_.

.. code-block:: bash

   pip install --user --upgrade tb-nightly

* `mujoco_py <https://github.com/openai/mujoco-py>`_ and `Gym <https://github.com/openai/gym>`_.
* `dm_control <https://github.com/deepmind/dm_control>`_.

.. note::
If you are a student, you can use the free Student License for MuJoCo.
If you are a student, you can get a free student license for MuJoCo.

Installation
------------

Simply clone the package with the following commands and install it in editable mode:


.. code-block:: bash

   cd
   git clone https://github.com/sharif1093/digideep.git
   cd digideep
   pip install -e .
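
To confirm that the editable install is picked up by your Python environment, a quick sanity check (nothing beyond the package name is assumed here):

.. code-block:: python

   # The editable install should resolve to the cloned repository.
   import digideep
   print(digideep.__file__)  # expected to point into the cloned 'digideep' directory
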
@@ -33,7 +39,21 @@ Add the following to your ``.bashrc`` or ``.zshrc``:

.. code-block:: bash

   # Assuming that you have installed mujoco in '$HOME/.mujoco'
   export LD_LIBRARY_PATH=$HOME/.mujoco/mjpro150/bin:$LD_LIBRARY_PATH
   export MUJOCO_GL=glfw

   # Assuming you have installed mujoco in '$HOME/.mujoco'
   export LD_LIBRARY_PATH=$HOME/.mujoco/mujoco200_linux/bin:$LD_LIBRARY_PATH
   export MUJOCO_GL=glfw

.. _FixGLFW:

Patch ``dm_control`` initialization issue
-----------------------------------------

If you hit an error regarding GLFW initialization, try the following patch:

Go to the ``digideep`` installation path and run:

.. code-block:: bash

   cd <digideep_path>
   cp patch/glfw_renderer.py `pip show dm_control | grep -Po 'Location: (\K.*)'`/dm_control/_render
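
If the shell one-liner is inconvenient, the following sketch does the same thing from Python: it locates the installed ``dm_control`` package and copies the patch over it (run it from the ``digideep`` repository root).

.. code-block:: python

   # Locate the installed dm_control package and overwrite its GLFW renderer with the patch.
   import os
   import shutil

   import dm_control

   dst = os.path.join(os.path.dirname(dm_control.__file__), "_render", "glfw_renderer.py")
   shutil.copy("patch/glfw_renderer.py", dst)
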
77 changes: 77 additions & 0 deletions patch/glfw_renderer.py
@@ -0,0 +1,77 @@
# Copyright 2017 The dm_control Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

"""An OpenGL renderer backed by GLFW."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import sys
from dm_control._render import base
from dm_control._render import executor
import six

# Re-raise any exceptions that occur during module import as `ImportError`s.
# This simplifies the conditional imports in `render/__init__.py`.
try:
  import glfw  # pylint: disable=g-import-not-at-top
except (ImportError, IOError, OSError) as exc:
  _, exc, tb = sys.exc_info()
  six.reraise(ImportError, ImportError(str(exc)), tb)


class GLFWContext(base.ContextBase):
  """An OpenGL context backed by GLFW."""

  def __init__(self, max_width, max_height):
    # GLFWContext always uses `PassthroughRenderExecutor` rather than offloading
    # rendering calls to a separate thread because GLFW can only be safely used
    # from the main thread.
    super(GLFWContext, self).__init__(max_width, max_height,
                                      executor.PassthroughRenderExecutor)

  def _platform_init(self, max_width, max_height):
    """Initializes this context.

    Args:
      max_width: Integer specifying the maximum framebuffer width in pixels.
      max_height: Integer specifying the maximum framebuffer height in pixels.
    """
    try:
      glfw.init()
    except glfw.GLFWError as exc:
      _, exc, tb = sys.exc_info()
      six.reraise(ImportError, ImportError(str(exc)), tb)
    glfw.window_hint(glfw.VISIBLE, 0)
    glfw.window_hint(glfw.DOUBLEBUFFER, 0)
    self._context = glfw.create_window(width=max_width, height=max_height,
                                       title='Invisible window', monitor=None,
                                       share=None)
    # This reference prevents `glfw.destroy_window` from being garbage-collected
    # before the last window is destroyed, otherwise we may get
    # `AttributeError`s when the `__del__` method is later called.
    self._destroy_window = glfw.destroy_window

  def _platform_make_current(self):
    glfw.make_context_current(self._context)

  def _platform_free(self):
    """Frees resources associated with this context."""
    if self._context:
      if glfw.get_current_context() == self._context:
        glfw.make_context_current(None)
      self._destroy_window(self._context)
      self._context = None
