diff --git a/README_old.md b/README_old.md
deleted file mode 100755
index 4aaf166..0000000
--- a/README_old.md
+++ /dev/null
@@ -1,118 +0,0 @@
-# Panda Grasp Simulator
-
-Grasp simulator for the Panda Emika Franka robot that can be used to test and train DL/RL based grasping algorithms.
-
-## Dependencies
-
-- [ROS Melodic - Desktop full](https://wiki.ros.org/melodic/Installation/Ubuntu)
-- [Franka_ros (Build from source)](https://frankaemika.github.io/docs/installation_linux.html)
-- Several system dependencies
-
-### System dependencies
-
-The system dependencies can be installed using the following command:
-
-```bash
-sudo apt-get install ros-melodic-moveit-ros-move-group ros-melodic-controller-manager* ros-melodic-moveit* ros-melodic-effort-controllers ros-melodic-joint-trajectory-controller ros-melodic-gazebo-ros* ros-melodic-rviz* libboost-filesystem-dev libjsoncpp-dev python3-pycryptodome python3-gnupg python3-tk python-future python3-pyqt5 python3-sip python3-sip-dev python3-empy
-```
-
-## How to build
-
-Since ROS does not yet fully support python3 (see #17), we need to separate the training script (python3) and the ROS gazebo simulation (python2). To do this first create a virtual environment:
-
-```bash
-sudo apt install virtualenv
-virtualenv ~/.catkin_ws_python3/openai_venv --python=python3
-```
-
-This virtual environment can the be activated using the `source ~/.catkin_ws_python3/openai_venv/bin/activate` command. After the environment is activate you you have to install the following python packages in it:
-
-```bash
-pip install tensorflow-gpu
-pip install gym
-pip install pyyaml
-pip install netifaces
-pip install rospkg
-```
-
-After this is done you can then compile the [geometry2](https://github.com/ros/geometry2), [ros_comm](https://github.com/ros/ros_comm) and [geometry_msgs](https://github.com/ros/common_msgs) for python3. To do so first install some prerequisites to use Python3 with ROS:
-
-```bash
-sudo apt update
-sudo apt install python3-catkin-pkg-modules python3-rospkg-modules python3-pyqt5 python3-sip python3-sip-dev python3-empy python3-pycryptodome python3-gnupg
-```
-
-Then prepare catkin workspace:
-
-```bash
-cd ~/.catkin_ws_python3
-mkdir src
-ROS_PYTHON_VERSION=3
-wstool init src
-rosinstall_generator ros_comm common_msgs  geometry2 --rosdistro melodic --deps | wstool merge -t src -
-wstool update -t src -j8
-rosdep install --from-paths src --ignore-src -y -r
-```
-
-Finally compile for Python 3:
-
-```
-catkin build --cmake-args \
-            -DCMAKE_BUILD_TYPE=Release \
-            -DPYTHON_EXECUTABLE=/usr/bin/python3 \
-            -DPYTHON_INCLUDE_DIR=/usr/include/python3.6m \
-            -DPYTHON_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.6m.so
-```
-
-After this is done you can clone and build the `panda_openai_sim` package by running:
-
-```
-mkdir ~/panda_openai_sim_ws
-cd ~/panda_openai_sim_ws
-git clone --recursive https://github.com/rickstaa/panda_openai_sim.git src
-rosdep install --from-paths src --ignore-src --rosdistro melodic -y --skip-keys libfranka
-catkin build -j4 -DCMAKE_BUILD_TYPE=Release -DFranka_DIR:PATH=~/libfranka/build
-```
-
-## How to train
-
-1. Open two terminals or two panels in [tmux](https://github.com/tmux/tmux/wiki).
-2. Source the ROS setup.bash file.
-3. Source the panda_openai_sim setup.bash file `source ~/panda_openai_sim/devel/setup.bash`
-4. In one panel start the training simulation environment using the `roslaunch panda_openai_sim start.launch` or `roslaunch panda_openai_sim train_ddpg.launch`.
-5. In the other terminal/panel activate the virtual environment `source ~/.catkin_ws_python3/openai_venv/bin/activate`.
-6. Source the required python3 ROS packages `source ~/.catkin_ws_python3/devel/setup.bash`.
-7. Start the training `python3 ~/panda_openai_sim_ws/src/panda_openai_sim/scripts/stable_baselines_her_pandareach_train.py`
-
-## How to configure the algorithms
-
-The algorithm parameters can be found in the `./panda_openai_sim/cfg/algorithms` folder.
-
-## How to see the progress
-
-You can use [tensorboard](https://www.tensorflow.org/tensorboard/) to visualize the model training in progress:
-
-```bash
-cd ~/panda_openai_sim_ws/src
-tensorboard --logdir ./logs
-```
-
-The file name will be displayed when starting the training.
-
-## How to use the trained model
-
-You can use the trained model by running one of the inference scripts.
-
-**Example:**
-
-```
-cd ~/panda_openai_sim_ws/src/panda_openai_sim/scripts
-source ~/.catkin_ws_python3/openai_venv/bin/activate
-python stable_baselines_her_pandareach_inference.py model:=<MODEL_NAME>
-```
-
-## Troubleshooting
-
-### The subfolders are empty
-
-Likely the repository was cloned without the `--recurse-submodules` flag. Please use the `git submodule update --recursive` and `git submodule update --init --recursive` commands to fetch the submodules.
diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 0000000..d1a4a41
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,52 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    =
+SPHINXBUILD   = sphinx-build
+SPHINXPROJ    = panda_openai_sim
+SOURCEDIR     = source
+BUILDDIR      = build
+GH_PAGES_SOURCES = docs panda_openai_sim nodes scripts
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Publish gh-pages command
+.PHONY: gh-pages
+gh-pages:
+	cd .. && \
+	git submodule update --recursive && \
+	git checkout gh-pages && \
+	git rm -rf . && git clean -fxd && \
+	git checkout melodic-devel $(GH_PAGES_SOURCES) && \
+	git reset HEAD && \
+	cd docs && \
+	make clean && \
+	make html && \
+	cd .. && \
+	git add docs/build/html && \
+	git clean -ffxd && \
+	mv -fv docs/build/html/* ./ && \
+	touch .nojekyll && \
+	rm -rf $(GH_PAGES_SOURCES) && \
+	git add -A && \
+	git commit -m ":pencil: Generated gh-pages for `git log melodic-devel -1 --pretty=short --abbrev-commit`" && \
+	git push origin gh-pages && \
+	git checkout melodic-devel && \
+	git submodule update --recursive
+
+# Create make clean command
+.PHONY: clean
+clean:
+	@rm -rf $(BUILDDIR)/*
+	@echo "Build folder cleaned"
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/make.bat b/docs/make.bat
new file mode 100644
index 0000000..9534b01
--- /dev/null
+++ b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/docs/manuals/Franka_manual.pdf b/docs/manuals/Franka_manual.pdf
deleted file mode 100644
index ff79562..0000000
Binary files a/docs/manuals/Franka_manual.pdf and /dev/null differ
diff --git a/docs/source/_static/theme_overrides.css b/docs/source/_static/theme_overrides.css
new file mode 100644
index 0000000..8b115bf
--- /dev/null
+++ b/docs/source/_static/theme_overrides.css
@@ -0,0 +1,13 @@
+/* Css code that is used to overwrite parts of the RTD theme css*/
+
+/* override table width restrictions */
+@media screen and (min-width: 767px) {
+  .wy-table-responsive table td {
+    /* !important prevents the common CSS stylesheets from
+       overriding this as on RTD they are loaded after this stylesheet */
+    white-space: normal !important;
+  }
+  .wy-table-responsive {
+    overflow: visible !important;
+  }
+}
diff --git a/docs/source/api/_autosummary/morvan_ddpg_panda_train_and_inference.rst b/docs/source/api/_autosummary/morvan_ddpg_panda_train_and_inference.rst
new file mode 100644
index 0000000..e5c6262
--- /dev/null
+++ b/docs/source/api/_autosummary/morvan_ddpg_panda_train_and_inference.rst
@@ -0,0 +1,38 @@
+morvan\_ddpg\_panda\_train\_and\_inference
+==========================================
+
+.. automodule:: morvan_ddpg_panda_train_and_inference
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Functions
+
+   .. autosummary::
+   
+      eval
+      train
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      Actor
+      Critic
+      Memory
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_control_server.rst b/docs/source/api/_autosummary/panda_control_server.rst
new file mode 100644
index 0000000..bffeb4a
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_control_server.rst
@@ -0,0 +1,23 @@
+panda\_control\_server
+======================
+
+.. automodule:: panda_control_server
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_moveit_server.rst b/docs/source/api/_autosummary/panda_moveit_server.rst
new file mode 100644
index 0000000..ecf4383
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_moveit_server.rst
@@ -0,0 +1,23 @@
+panda\_moveit\_server
+=====================
+
+.. automodule:: panda_moveit_server
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.core.control_server.rst b/docs/source/api/_autosummary/panda_openai_sim.core.control_server.rst
new file mode 100644
index 0000000..00db16a
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.core.control_server.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.core.control\_server
+=======================================
+
+.. automodule:: panda_openai_sim.core.control_server
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      PandaControlServer
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.core.control_switcher.rst b/docs/source/api/_autosummary/panda_openai_sim.core.control_switcher.rst
new file mode 100644
index 0000000..54b7c74
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.core.control_switcher.rst
@@ -0,0 +1,30 @@
+panda\_openai\_sim.core.control\_switcher
+=========================================
+
+.. automodule:: panda_openai_sim.core.control_switcher
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      ControllerSwitcherResponse
+      PandaControlSwitcher
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.core.group_publisher.rst b/docs/source/api/_autosummary/panda_openai_sim.core.group_publisher.rst
new file mode 100644
index 0000000..c87cddd
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.core.group_publisher.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.core.group\_publisher
+========================================
+
+.. automodule:: panda_openai_sim.core.group_publisher
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      GroupPublisher
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.core.moveit_server.rst b/docs/source/api/_autosummary/panda_openai_sim.core.moveit_server.rst
new file mode 100644
index 0000000..48289f0
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.core.moveit_server.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.core.moveit\_server
+======================================
+
+.. automodule:: panda_openai_sim.core.moveit_server
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      PandaMoveitPlannerServer
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.core.rst b/docs/source/api/_autosummary/panda_openai_sim.core.rst
new file mode 100644
index 0000000..be25626
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.core.rst
@@ -0,0 +1,34 @@
+panda\_openai\_sim.core
+=======================
+
+.. automodule:: panda_openai_sim.core
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
+.. rubric:: Modules
+
+.. autosummary::
+   :toctree:
+   :recursive:
+
+   panda_openai_sim.core.control_server
+   panda_openai_sim.core.control_switcher
+   panda_openai_sim.core.group_publisher
+   panda_openai_sim.core.moveit_server
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.robot_envs.panda_robot_env.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.robot_envs.panda_robot_env.rst
new file mode 100644
index 0000000..f1e9ed1
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.robot_envs.panda_robot_env.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.envs.robot\_envs.panda\_robot\_env
+=====================================================
+
+.. automodule:: panda_openai_sim.envs.robot_envs.panda_robot_env
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      PandaRobotEnv
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.robot_envs.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.robot_envs.rst
new file mode 100644
index 0000000..4d1cb05
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.robot_envs.rst
@@ -0,0 +1,31 @@
+panda\_openai\_sim.envs.robot\_envs
+===================================
+
+.. automodule:: panda_openai_sim.envs.robot_envs
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
+.. rubric:: Modules
+
+.. autosummary::
+   :toctree:
+   :recursive:
+
+   panda_openai_sim.envs.robot_envs.panda_robot_env
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.robot_gazebo_goal_env.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.robot_gazebo_goal_env.rst
new file mode 100644
index 0000000..046933b
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.robot_gazebo_goal_env.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.envs.robot\_gazebo\_goal\_env
+================================================
+
+.. automodule:: panda_openai_sim.envs.robot_gazebo_goal_env
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      RobotGazeboGoalEnv
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.rst
new file mode 100644
index 0000000..d6dbb9e
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.rst
@@ -0,0 +1,33 @@
+panda\_openai\_sim.envs
+=======================
+
+.. automodule:: panda_openai_sim.envs
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
+.. rubric:: Modules
+
+.. autosummary::
+   :toctree:
+   :recursive:
+
+   panda_openai_sim.envs.robot_envs
+   panda_openai_sim.envs.robot_gazebo_goal_env
+   panda_openai_sim.envs.task_envs
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_pick_and_place_env.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_pick_and_place_env.rst
new file mode 100644
index 0000000..008822d
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_pick_and_place_env.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.envs.task\_envs.panda\_pick\_and\_place\_env
+===============================================================
+
+.. automodule:: panda_openai_sim.envs.task_envs.panda_pick_and_place_env
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      PandaPickAndPlaceEnv
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_push_env.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_push_env.rst
new file mode 100644
index 0000000..99f6828
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_push_env.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.envs.task\_envs.panda\_push\_env
+===================================================
+
+.. automodule:: panda_openai_sim.envs.task_envs.panda_push_env
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      PandaPushEnv
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_reach_env.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_reach_env.rst
new file mode 100644
index 0000000..fef6608
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_reach_env.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.envs.task\_envs.panda\_reach\_env
+====================================================
+
+.. automodule:: panda_openai_sim.envs.task_envs.panda_reach_env
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      PandaReachEnv
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_slide_env.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_slide_env.rst
new file mode 100644
index 0000000..847c958
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_slide_env.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.envs.task\_envs.panda\_slide\_env
+====================================================
+
+.. automodule:: panda_openai_sim.envs.task_envs.panda_slide_env
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      PandaSlideEnv
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_task_env.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_task_env.rst
new file mode 100644
index 0000000..dc1b702
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.panda_task_env.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.envs.task\_envs.panda\_task\_env
+===================================================
+
+.. automodule:: panda_openai_sim.envs.task_envs.panda_task_env
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      PandaTaskEnv
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.rst b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.rst
new file mode 100644
index 0000000..efacaa4
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.envs.task_envs.rst
@@ -0,0 +1,35 @@
+panda\_openai\_sim.envs.task\_envs
+==================================
+
+.. automodule:: panda_openai_sim.envs.task_envs
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
+.. rubric:: Modules
+
+.. autosummary::
+   :toctree:
+   :recursive:
+
+   panda_openai_sim.envs.task_envs.panda_pick_and_place_env
+   panda_openai_sim.envs.task_envs.panda_push_env
+   panda_openai_sim.envs.task_envs.panda_reach_env
+   panda_openai_sim.envs.task_envs.panda_slide_env
+   panda_openai_sim.envs.task_envs.panda_task_env
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.errors.rst b/docs/source/api/_autosummary/panda_openai_sim.errors.rst
new file mode 100644
index 0000000..f23cb26
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.errors.rst
@@ -0,0 +1,31 @@
+panda\_openai\_sim.errors
+=========================
+
+.. automodule:: panda_openai_sim.errors
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Functions
+
+   .. autosummary::
+   
+      arg_keys_error
+      arg_type_error
+      arg_value_error
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.exceptions.rst b/docs/source/api/_autosummary/panda_openai_sim.exceptions.rst
new file mode 100644
index 0000000..aa0f384
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.exceptions.rst
@@ -0,0 +1,35 @@
+panda\_openai\_sim.exceptions
+=============================
+
+.. automodule:: panda_openai_sim.exceptions
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Exceptions
+
+   .. autosummary::
+   
+      EePoseLookupError
+      EeRpyLookupError
+      InputMessageInvalidError
+      RandomEePoseError
+      RandomJointPositionsError
+      SetModelStateError
+      SpawnModelError
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.extras.action_client_state.rst b/docs/source/api/_autosummary/panda_openai_sim.extras.action_client_state.rst
new file mode 100644
index 0000000..58d4cd4
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.extras.action_client_state.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.extras.action\_client\_state
+===============================================
+
+.. automodule:: panda_openai_sim.extras.action_client_state
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      ActionClientState
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.extras.controller_info_dict.rst b/docs/source/api/_autosummary/panda_openai_sim.extras.controller_info_dict.rst
new file mode 100644
index 0000000..20d7aa7
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.extras.controller_info_dict.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.extras.controller\_info\_dict
+================================================
+
+.. automodule:: panda_openai_sim.extras.controller_info_dict
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      ControllerInfoDict
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.extras.euler_angles.rst b/docs/source/api/_autosummary/panda_openai_sim.extras.euler_angles.rst
new file mode 100644
index 0000000..93d43f6
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.extras.euler_angles.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.extras.euler\_angles
+=======================================
+
+.. automodule:: panda_openai_sim.extras.euler_angles
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      EulerAngles
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.extras.nested_dict.rst b/docs/source/api/_autosummary/panda_openai_sim.extras.nested_dict.rst
new file mode 100644
index 0000000..87b0ef0
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.extras.nested_dict.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.extras.nested\_dict
+======================================
+
+.. automodule:: panda_openai_sim.extras.nested_dict
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      NestedDict
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.extras.quaternion.rst b/docs/source/api/_autosummary/panda_openai_sim.extras.quaternion.rst
new file mode 100644
index 0000000..82989b1
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.extras.quaternion.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.extras.quaternion
+====================================
+
+.. automodule:: panda_openai_sim.extras.quaternion
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      Quaternion
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.extras.rst b/docs/source/api/_autosummary/panda_openai_sim.extras.rst
new file mode 100644
index 0000000..c2c1b41
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.extras.rst
@@ -0,0 +1,37 @@
+panda\_openai\_sim.extras
+=========================
+
+.. automodule:: panda_openai_sim.extras
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
+.. rubric:: Modules
+
+.. autosummary::
+   :toctree:
+   :recursive:
+
+   panda_openai_sim.extras.action_client_state
+   panda_openai_sim.extras.controller_info_dict
+   panda_openai_sim.extras.euler_angles
+   panda_openai_sim.extras.nested_dict
+   panda_openai_sim.extras.quaternion
+   panda_openai_sim.extras.sample_region_marker
+   panda_openai_sim.extras.target_marker
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.extras.sample_region_marker.rst b/docs/source/api/_autosummary/panda_openai_sim.extras.sample_region_marker.rst
new file mode 100644
index 0000000..15f53f9
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.extras.sample_region_marker.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.extras.sample\_region\_marker
+================================================
+
+.. automodule:: panda_openai_sim.extras.sample_region_marker
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      SampleRegionMarker
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.extras.target_marker.rst b/docs/source/api/_autosummary/panda_openai_sim.extras.target_marker.rst
new file mode 100644
index 0000000..9c75653
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.extras.target_marker.rst
@@ -0,0 +1,29 @@
+panda\_openai\_sim.extras.target\_marker
+========================================
+
+.. automodule:: panda_openai_sim.extras.target_marker
+
+   
+   
+   
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Classes
+
+   .. autosummary::
+   
+      TargetMarker
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_openai_sim.functions.rst b/docs/source/api/_autosummary/panda_openai_sim.functions.rst
new file mode 100644
index 0000000..7035538
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_openai_sim.functions.rst
@@ -0,0 +1,55 @@
+panda\_openai\_sim.functions
+============================
+
+.. automodule:: panda_openai_sim.functions
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Functions
+
+   .. autosummary::
+   
+      action_dict_2_joint_trajectory_msg
+      action_server_exists
+      contains_keys
+      controller_list_array_2_dict
+      dict_clean
+      find_gazebo_model_path
+      flatten_list
+      get_duplicate_list
+      get_orientation_euler
+      get_unique_list
+      has_invalid_type
+      has_invalid_value
+      joint_positions_2_follow_joint_trajectory_goal
+      list_2_human_text
+      log_pose_dict
+      lower_first_char
+      merge_two_dicts
+      model_state_msg_2_link_state_dict
+      panda_action_msg_2_control_msgs_action_msg
+      pose_dict_2_pose_msg
+      pose_msg_2_pose_dict
+      split_bounds_dict
+      split_dict
+      split_pose_dict
+      translate_actionclient_result_error_code
+      translate_gripper_width_2_finger_joint_commands
+      wrap_space_around
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/panda_training.functions.rst b/docs/source/api/_autosummary/panda_training.functions.rst
new file mode 100644
index 0000000..0ca49fd
--- /dev/null
+++ b/docs/source/api/_autosummary/panda_training.functions.rst
@@ -0,0 +1,32 @@
+panda\_training.functions
+=========================
+
+.. automodule:: panda_training.functions
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Functions
+
+   .. autosummary::
+   
+      backup_model
+      find_lowest_positive_missing_no
+      get_unique_file_suffix
+      move_all_files_in_dir
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/stable_baselines_ddpg_panda_train_and_inference.rst b/docs/source/api/_autosummary/stable_baselines_ddpg_panda_train_and_inference.rst
new file mode 100644
index 0000000..0d35ec5
--- /dev/null
+++ b/docs/source/api/_autosummary/stable_baselines_ddpg_panda_train_and_inference.rst
@@ -0,0 +1,30 @@
+stable\_baselines\_ddpg\_panda\_train\_and\_inference
+=====================================================
+
+.. automodule:: stable_baselines_ddpg_panda_train_and_inference
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Functions
+
+   .. autosummary::
+   
+      eval
+      train
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/stable_baselines_her_ddpg_panda_train_and_inference.rst b/docs/source/api/_autosummary/stable_baselines_her_ddpg_panda_train_and_inference.rst
new file mode 100644
index 0000000..1f884dd
--- /dev/null
+++ b/docs/source/api/_autosummary/stable_baselines_her_ddpg_panda_train_and_inference.rst
@@ -0,0 +1,30 @@
+stable\_baselines\_her\_ddpg\_panda\_train\_and\_inference
+==========================================================
+
+.. automodule:: stable_baselines_her_ddpg_panda_train_and_inference
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Functions
+
+   .. autosummary::
+   
+      eval
+      train
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/_autosummary/stable_baselines_sac_panda_train_and_inference.rst b/docs/source/api/_autosummary/stable_baselines_sac_panda_train_and_inference.rst
new file mode 100644
index 0000000..b730b99
--- /dev/null
+++ b/docs/source/api/_autosummary/stable_baselines_sac_panda_train_and_inference.rst
@@ -0,0 +1,30 @@
+stable\_baselines\_sac\_panda\_train\_and\_inference
+====================================================
+
+.. automodule:: stable_baselines_sac_panda_train_and_inference
+
+   
+   
+   
+
+   
+   
+   .. rubric:: Functions
+
+   .. autosummary::
+   
+      eval
+      train
+   
+   
+
+   
+   
+   
+
+   
+   
+   
+
+
+
diff --git a/docs/source/api/api.rst b/docs/source/api/api.rst
new file mode 100644
index 0000000..7607f3b
--- /dev/null
+++ b/docs/source/api/api.rst
@@ -0,0 +1,62 @@
+.. _api:
+
+Python Code API
+======================
+
+The :panda_openai_sim_ws:`panda_openai_sim <>` package consists of two sub-packages:
+the :panda_openai_sim:`panda_openai_sim <>` ROS package and the
+:panda_training:`panda_training <>` python package. Each of the modules, classes and
+scripts in these two packages will be documented here.
+
+Panda_openai_sim package
+-------------------------------
+
+Modules
+^^^^^^^^
+
+.. autosummary::
+   :toctree: _autosummary
+   :recursive:
+
+    panda_openai_sim.core
+    panda_openai_sim.envs
+    panda_openai_sim.extras
+    panda_openai_sim.exceptions
+    panda_openai_sim.errors
+    panda_openai_sim.functions
+
+
+ROS nodes
+^^^^^^^^^^^^^^^^^^^^^^
+
+.. autosummary::
+   :toctree: _autosummary
+   :recursive:
+
+    panda_control_server
+    panda_moveit_server
+
+Panda_training package
+-------------------------------
+
+Modules
+^^^^^^^^
+
+.. autosummary::
+   :toctree: _autosummary
+   :recursive:
+
+    panda_training.functions
+
+
+Scripts
+^^^^^^^^^^^^^^^^^^^^
+
+.. autosummary::
+   :toctree: _autosummary
+   :recursive:
+
+    morvan_ddpg_panda_train_and_inference
+    stable_baselines_ddpg_panda_train_and_inference
+    stable_baselines_her_ddpg_panda_train_and_inference
+    stable_baselines_sac_panda_train_and_inference
diff --git a/docs/source/conf.py b/docs/source/conf.py
new file mode 100644
index 0000000..9b8fd14
--- /dev/null
+++ b/docs/source/conf.py
@@ -0,0 +1,154 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+import os
+import sys
+import sphinx_rtd_theme
+
+# -- Add module, and scripts paths to the system path --
+# sys.path.insert(0, os.path.join(os.path.abspath(os.path.dirname(__name__)), "../../"))
+# sys.path.insert(0, os.path.join(os.path.abspath(os.path.dirname(__name__)), ".."))
+sys.path.insert(
+    0,
+    os.path.join(os.path.abspath(os.path.dirname(__name__)), "../panda_openai_sim/src"),
+)
+sys.path.insert(
+    0,
+    os.path.abspath(
+        os.path.join(os.path.dirname(__name__), "../../panda_openai_sim/nodes")
+    ),
+)
+sys.path.insert(
+    0,
+    os.path.abspath(
+        os.path.join(os.path.dirname(__name__), "../../panda_training/scripts")
+    ),
+)
+sys.path.insert(
+    0, os.path.join(os.path.abspath(os.path.dirname(__name__)), "../panda_training"),
+)
+
+# -- Project information -----------------------------------------------------
+project = "panda_openai_sim"
+copyright = "2020, Rick Staa"
+author = "Rick Staa"
+
+# The full version, including alpha/beta/rc tags
+release = "0.0.0"
+
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+
+# Extensions
+extensions = [
+    "sphinx.ext.napoleon",
+    "sphinx.ext.autodoc",
+    "sphinx.ext.autosummary",
+    "sphinx.ext.intersphinx",
+    "sphinx.ext.viewcode",
+    "sphinx.ext.githubpages",
+    "sphinx.ext.extlinks",
+]
+
+# Extension settings
+autoclass_content = "class"
+autodoc_member_order = "bysource"
+autodoc_default_options = {
+    "show-inheritance": True,
+    "private-members": True,
+}
+napoleon_include_special_with_doc = True
+napoleon_include_init_with_doc = True
+autosummary_generate = True
+
+# Add mappings
+intersphinx_mapping = {
+    "python": ("https://docs.python.org/3", None),
+    "numpy": ("https://numpy.org/doc/stable/", None),
+}
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ["_templates"]
+
+# The suffix(es) of source filenames.
+# You can specify multiple suffix as a list of string:
+# source_suffix = [".rst", ".md"]
+source_suffix = ".rst"
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = []
+
+# The master toctree document.
+master_doc = "index"
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = "sphinx"
+
+# If true, `todo` and `todoList` produce output, else they produce nothing.
+todo_include_todos = False
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+html_theme = "sphinx_rtd_theme"
+html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
+
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ["_static"]
+html_context = {
+    "css_files": [
+        "_static/theme_overrides.css"  # overrides for wide tables in RTD theme
+    ]
+}
+
+# -- External links dictionary -----------------------------------------------
+extlinks = {
+    "panda_openai_sim_ws": ("https://github.com/rickstaa/panda_openai_sim/%s", None),
+    "panda_openai_sim": (
+        "https://github.com/rickstaa/panda_openai_sim/tree/melodic-devel/panda_openai_sim/%s",
+        None,
+    ),
+    "panda_training": (
+        "https://github.com/rickstaa/panda_openai_sim/tree/melodic-devel/panda_training/%s",
+        None,
+    ),
+    "env_config": (
+        (
+            "https://github.com/rickstaa/panda_openai_sim/blob/melodic-devel/"
+            "panda_openai_sim/cfg/env_config.yaml/%s"
+        ),
+        None,
+    ),
+    "stable-baselines": ("https://stable-baselines.readthedocs.io/en/master/%s", None),
+    "sphinx": ("http://www.sphinx-doc.org/en/master/%s", None),
+    "ros": ("https://wiki.ros.org/%s", None),
+    "gym": ("https://gym.openai.com/%s", None),
+    "openai_ros": ("https://wiki.ros.org/openai_ros/%s", None),
+    "franka": ("https://www.franka.de/%s", None),
+    "geometry_msgs": ("https://docs.ros.org/api/geometry_msgs/%s", None),
+    "visualization_msgs": ("https://docs.ros.org/api/visualization_msgs/%s", None),
+    "gazebo_msgs": ("https://docs.ros.org/api/gazebo_msgs/%s", None),
+    "control_msgs": ("https://docs.ros.org/api/control_msgs/%s", None,),
+    "controller_manager_msgs": (
+        "https://docs.ros.org/api/controller_manager_msgs/%s",
+        None,
+    ),
+}
diff --git a/docs/source/dev/doc_dev.rst b/docs/source/dev/doc_dev.rst
new file mode 100644
index 0000000..140ca46
--- /dev/null
+++ b/docs/source/dev/doc_dev.rst
@@ -0,0 +1,75 @@
+.. _doc_dev:
+
+Documentation development
+===================================
+
+Install requirements
+--------------------------
+
+To build the :panda_openai_sim_ws:`panda_openai_sim <>` documentation the
+:sphinx:`sphinx <>` tool together with some other python packages is needed. These
+python requirements can be installed by running the following pip command inside your
+python 3 environment:
+
+.. code-block:: bash
+
+    cd ~/panda_openai_sim_ws/panda_openai_sim
+    pip install .[docs]
+
+Build the documentation
+--------------------------
+
+To be able to build the code API, you need first to make sure the catkin_ws has
+been built. Additionally you need to make sure all the python dependencies for
+the :panda_openai_sim:`panda_openai_sim <>` and
+:panda_training:`panda_training <>` modules have been installed
+inside your python 3 environment. This can be done by running the following pip command
+from within the ``panda_training`` and ``panda_openai_sim`` folders:
+
+.. code-block:: bash
+
+    pip install .
+
+After all the dependencies are installed you can build the documentation using the
+following steps:
+
+    #. Source the :panda_openai_sim_ws:`panda_openai_sim <>` catkin workspace ``source ../devel/setup.bash``.
+    #. Source the devel workspace of the python 3 :ros:`ROS <>` packages ``source ~./.catkin_ws_python3/devel/setup.bash``.
+    #. Run the ``make clean`` command inside the ``docs`` folder.
+    #. Run the ``make html`` command inside the ``docs`` folder.
+
+
+.. warning::
+
+    Please make sure you use a (virtual) python 3 environment when building the
+    documentation as most of the scripts are written in python 3.
+
+Deploy the documentation
+---------------------------
+
+To deploy documentation to the Github Pages site for the repository, push the
+documentation to the ``melodic-devel`` branch and run the ``make gh-pages`` command
+inside the ``panda_autograsp/docs`` directory. This will create the documentation and
+move the HTML files to the ``gh-pages`` branch.
+
+.. warning::
+
+    Please make sure you are on the ``melodic-devel`` branch while deploying the
+    documentation. Otherwise, you will be greeted by errors.
+
+Known issues
+--------------------
+
+ROS python 3 incompatibility issues
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The :panda_openai_sim_ws:`panda_openai_sim <>` package is not yet fully ported to
+python 3 as Moveit is not yet released for :ros:`ROS Noetic <noetic>`. As a result one
+might encounter one of the following errors trying to build the documentation:
+
+- ``dynamic module does not define module export function (PyInit__tf2)``
+- ``dynamic module does not define module export function (PyInit__moveit_roscpp_initializer)``
+
+These errors are solved by building several :ros:`ROS <>` packages for python 3 from
+source and sourcing the ``devel/setup.bash``. How this is done is explained in the
+:ref:`installation instructions <catkin_ws_python>`.
diff --git a/docs/source/dev/release_dev.rst b/docs/source/dev/release_dev.rst
new file mode 100644
index 0000000..5244eba
--- /dev/null
+++ b/docs/source/dev/release_dev.rst
@@ -0,0 +1,26 @@
+.. _release_dev:
+
+Package development
+===================================
+
+Release the Package
+----------------------
+
+Before releasing the package, make sure the following steps are performed:
+
+    #. Squash and merge your branch with the desired branch.
+    #. Update the documentation according to the :ref:`documentation release guidelines <doc_dev>` if needed.
+    #. Bump the version using the `bumpversion tool <https://github.com/peritus/bumpversion>`_.
+    #. Check the version of the current branch using the ``bumpversion --list`` command.
+    #. Add a tag equal to the version specified in the last step (Check versioning guidelines below).
+    #. Update the changelog using the `auto-changelog <https://github.com/CookPete/auto-changelog>`_ tool.
+    #. Commit and push the changes to the remote.
+    #. Create a release using the github draft release tool.
+
+Versioning guidelines
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Please use the ROS version of the branch you are working on in the version tag
+(Example: ``v1.0.0-kinetic`` for the kinetic branch and ``v1.0.0-melodic`` for the
+melodic branch). Additionally please use the
+`versioning guidelines specified at semver.org <https://semver.org/>`_.
diff --git a/docs/source/general/about.rst b/docs/source/general/about.rst
new file mode 100644
index 0000000..568baed
--- /dev/null
+++ b/docs/source/general/about.rst
@@ -0,0 +1,42 @@
+.. _about:
+
+Introduction
+================================
+
+The :panda_openai_sim_ws:`panda_openai_sim <>` package contains several
+:gym:`openai gym <>` environments for the :franka:`Panda Emika franka <>` robot that can
+be used to train RL algorithms. It was based on the :gym:`openai gym <>` and
+:openai_ros:`openai_ros <>` packages and can be used similarly as the original
+:gym:`openai gym Robotics <envs/#robotics>` environments. The
+:panda_openai_sim_ws:`panda_openai_sim <>` package consists of two main sub-packages:
+
+* **The panda_openai_sim package:** This package contains all of of the available :gym:`openai gym <>` environments and the :franka:`Panda Emika Franka <>` simulation.
+* **The panda_training package:** This package includes several examples of how the :panda_openai_sim:`panda_openai_sim <>` can be used to train RL algorithms.
+
+Panda_openai_sim package
+------------------------------
+
+Environments
+^^^^^^^^^^^^^^^^^^
+
+The :panda_openai_sim:`panda_openai_sim package <>` currently contains the following task
+environments:
+
+* **PandaPickAndPlace-v0:** Lift a block into the air.
+* **PandaPush-v0:** Push a block to a goal position.
+* **PandaReach-v0:** Move fetch to a goal position.
+* **PandaSlide-v0:** Slide a puck to a goal position.
+
+Panda_training package
+----------------------------------
+
+The :panda_training:`panda_training <>` package contains the following example training
+scripts:
+
+* **morvan_ddpg_panda_train_and_inference:** Trains a DDPG_ RL algorithm using the classes of MorvanZhou_.
+* **stable_baselines_ddpg_panda_train_and_inference:** Trains a :stable-baselines:`DDPG <modules/ddpg.html>` RL algorithm using the :stable-baselines:`stable-baselines package <>`.
+* **stable_baselines_her_ddpg_panda_train_and_inference:** Trains a :stable-baselines:`DDPG <modules/ddpg.html>` RL algorithm using the :stable-baselines:`stable-baselines <>` :stable-baselines:`HER wrapper<modules/her.html>`.
+* **stable_baselines_sac_panda_train_and_inference:** Trains a :stable-baselines:`SAC <modules/sac.html>` RL algorithm using the :stable-baselines:`stable-baselines package <>`.
+
+.. _DDPG: https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/
+.. _MorvanZhou: https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/9_Deep_Deterministic_Policy_Gradient_DDPG/
diff --git a/docs/source/get_started/install.rst b/docs/source/get_started/install.rst
new file mode 100644
index 0000000..40ad709
--- /dev/null
+++ b/docs/source/get_started/install.rst
@@ -0,0 +1,140 @@
+.. _install:
+
+.. _`issue #17`: https://github.com/rickstaa/panda_openai_sim/issues/17/
+
+Installation
+===================================
+
+Install dependencies
+----------------------
+
+The following dependencies are required to run the
+:panda_openai_sim_ws:`panda_openai_sim <>` package:
+
+* `ROS Melodic - Desktop full <https://wiki.ros.org/melodic/Installation/Ubuntu/>`_
+* `Franka_ros (Build from source) <https://frankaemika.github.io/docs/>`_
+* Several system dependencies.
+
+System dependencies
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To use the scripts inside the :panda_training:`panda_training <>` package, the following
+system dependencies are needed:
+
+.. code-block:: bash
+
+    sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev
+
+ROS melodic
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A guide on how to install ROS melodic can be found
+`here <https://wiki.ros.org/melodic/Installation/Ubuntu/>`_.
+
+Franka_ros
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The steps for building the `franka_ros <https://github.com/frankaemika/franka_ros/>`_
+package from source can be found in the
+`Franka documentation <https://frankaemika.github.io/docs/installation_linux.html#building-from-source/>`_.
+
+
+Build the package
+------------------------
+
+Compile several ROS packages for python 3
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _catkin_ws_python:
+
+Since the Panda gym environments and RL training scripts are written in python 3 and ROS
+does not fully support python 3
+(see `issue #17`_) we need to
+build the following ROS packages for python 3 from source:
+
+* `geometry2 <https://github.com/ros/geometry2/>`_
+* `ros_comm <https://github.com/ros/ros_comm/>`_
+* `geometry_msgs <https://github.com/ros/common_msgs/>`_
+
+The following dependencies are required to build these packages:
+
+.. code-block:: bash
+
+    sudo apt update
+    sudo apt install python-catkin-tools python3-catkin-pkg-modules python3-rospkg-modules python3-empy python3-sip python3-sip-dev
+
+Then prepare catkin workspace:
+
+.. code-block:: bash
+
+    mkdir -p ~/.catkin_ws_python3/src
+    cd ~/.catkin_ws_python3
+    catkin_make
+    source devel/setup.bash
+    wstool init src
+    rosinstall_generator ros_comm common_msgs  geometry2 --rosdistro melodic --deps | wstool merge -t src -
+    wstool update -t src -j8
+    rosdep install --from-paths src --ignore-src -y -r
+
+Finally compile the ROS packages for Python 3 using the following catkin build command:
+
+.. code-block:: bash
+
+    catkin build --cmake-args \
+                -DCMAKE_BUILD_TYPE=Release \
+                -DPYTHON_EXECUTABLE=/usr/bin/python3 \
+                -DPYTHON_INCLUDE_DIR=/usr/include/python3.6m \
+                -DPYTHON_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.6m.so
+
+Build the main catkin package
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+After the required ROS packages have been build for python 3 you can build
+the :panda_openai_sim_ws:`panda_openai_sim <>` package. To do this first clone the
+:panda_openai_sim_ws:`panda_openai_sim <>` repository:
+
+.. code-block:: bash
+
+    mkdir ~/panda_openai_sim_ws
+    cd ~/panda_openai_sim_ws
+    git clone --recursive https://github.com/rickstaa/panda_openai_sim.git src
+
+
+Then install the required dependencies using the following command:
+
+.. code-block:: bash
+
+    rosdep install --from-paths src --ignore-src --rosdistro melodic -y --skip-keys libfranka
+
+
+Finally, the catkin package can be build by executing the following command from within
+the catkin workspace:
+
+.. code-block:: bash
+
+    catkin build -j4 -DCMAKE_BUILD_TYPE=Release -DFranka_DIR:PATH=~/libfranka/build
+
+Create a python 3 virtual environments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. _py3_virtual_env:
+
+Since ROS does not yet fully support python 3
+(see `issue #17`_), we need
+to separate the training script (python 3) and the ROS gazebo simulation (python 2). To
+do this, please create a virtual environment:
+
+.. code-block:: bash
+
+    sudo apt install virtualenv
+    virtualenv ~/.catkin_ws_python3/openai_venv --python=python3
+
+After this environment is created, you can activate it using the
+``source ~/.catkin_ws_python3/openai_venv/bin/activate`` command. Following you need
+to make sure all the python dependencies for the :panda_openai_sim:`panda_openai_sim <>`
+and :panda_training:`panda_training <>` packages have been installed inside this
+python 3 environment. This can be done by running the following pip command from within
+the ``panda_training`` and ``panda_openai_sim`` folders:
+
+.. code-block:: bash
+
+    pip install .
diff --git a/docs/source/get_started/use.rst b/docs/source/get_started/use.rst
new file mode 100644
index 0000000..38fcad1
--- /dev/null
+++ b/docs/source/get_started/use.rst
@@ -0,0 +1,63 @@
+.. _use:
+
+.. _`issue #17`: https://github.com/rickstaa/panda_openai_sim/issues/17/
+
+Tutorial
+========================
+
+Using the panda_openai_sim package
+--------------------------------------
+
+Here is a bare minimum example of getting something running. This will run an instance
+of the PandaReach-v0 environment for 1000 timesteps.
+
+.. code-block:: python
+
+    import gym
+    import panda_openai_sim.envs
+    env = gym.make('PandaReach-v0')
+    env = gym.wrappers.FlattenObservation(env)
+    env.reset()
+    for _ in range(1000):
+        env.step(env.action_space.sample()) # take a random action
+    env.close()
+
+For this to work, you must run the simulated Panda robot. After you have
+successfully build the :panda_openai_sim:`panda_openai_sim <>` ROS package and
+installed all the dependencies you can run the simulation using the following
+ROS command:
+
+.. code-block:: bash
+
+    roslaunch panda_openai_sim start.launch
+
+.. warning::
+
+    Since ROS does not yet fully support python 3
+    (see `issue #17`_), we need
+    to run the Panda gym environments and the :panda_training:`panda_training <>`
+    scripts inside a separate python 3 environment. You therefore, have to make sure
+    the training scripts are started inside the virtual environment that was created
+    during the :ref:`the installation <py3_virtual_env>`.
+
+
+Changing parameters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Most of the Panda Robot environment parameters can be changed using the environment
+constructor (see the :ref:`code api <api>` for more information). Some additional global parameters
+can be found in the :env_config:`env_config.yaml <>` file.
+
+Using the panda_training package
+------------------------------------
+
+To run the example training scripts of the :panda_training:`panda_training package <>` you
+first have to make sure the :panda_openai_sim:`panda_openai_sim <>` Panda_simulation is
+running. Following you have to make sure that you activated the python 3 virtual
+environment using the ``source  ~/.catkin_ws_python3/openai_venv/bin/activate`` command.
+After you are inside the python environment, you can run each of the example scripts.
+
+Changing parameters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+All of the parameters for the example scripts are found at the top of the scripts.
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
new file mode 100644
index 0000000..3041ede
--- /dev/null
+++ b/docs/source/index.rst
@@ -0,0 +1,51 @@
+.. panda_openai_sim documentation master file, created by
+   sphinx-quickstart on Tue Jun  9 11:12:09 2020.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Welcome to panda_openai_sim's documentation!
+============================================
+
+Welcome to the documentation for the :panda_openai_sim_ws:`panda_openai_sim <>` package.
+This package contains a :gym:`openai gym <>` environment for the :franka:`Panda Emika
+Franka <>` robot which can be used to train RL algorithms.
+
+Contents
+===============
+
+.. toctree::
+   :maxdepth: 2
+   :caption: General
+   :glob:
+
+   general/about.rst
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Getting started
+   :glob:
+
+   get_started/install.rst
+   get_started/use.rst
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Development
+
+   dev/release_dev.rst
+   dev/doc_dev.rst
+
+.. toctree::
+   :maxdepth: 2
+   :caption: API Documentation
+   :glob:
+
+   api/api.rst
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
diff --git a/panda_openai_sim/CMakeLists.txt b/panda_openai_sim/CMakeLists.txt
index 1f990f5..c425134 100755
--- a/panda_openai_sim/CMakeLists.txt
+++ b/panda_openai_sim/CMakeLists.txt
@@ -123,7 +123,7 @@ generate_messages(
 catkin_package(
 #  INCLUDE_DIRS include
 #  LIBRARIES panda_openai_sim
-  CATKIN_DEPENDS rospy message_runtime actionlib actionlib_msgs std_msgs geometry_msgs trajectory_msgs sensor_msgs control_msgs openai_ros
+  CATKIN_DEPENDS rospy message_runtime actionlib actionlib_msgs std_msgs geometry_msgs trajectory_msgs sensor_msgs control_msgs openai_ros rviz
 #  DEPENDS system_lib
 )
 
diff --git a/panda_openai_sim/README.md b/panda_openai_sim/README.md
index 2c71250..7ea6eca 100755
--- a/panda_openai_sim/README.md
+++ b/panda_openai_sim/README.md
@@ -28,9 +28,9 @@ for _ in range(1000):
 env.close()
 ```
 
-For this to work you must run the simulated Panda robot. After you have
+For this to work, you must run the simulated Panda robot. After you have
 successfully build the `panda_openai_sim` ROS package and installed all the dependencies
-you can run the simulation using the following ros command:
+you can run the simulation using the following ROS command:
 
 ```bash
 roslaunch panda_openai_sim start.launch
diff --git a/panda_openai_sim/launch/panda_openai_sim.launch b/panda_openai_sim/launch/panda_openai_sim.launch
index a0f1a31..db0e8f3 100755
--- a/panda_openai_sim/launch/panda_openai_sim.launch
+++ b/panda_openai_sim/launch/panda_openai_sim.launch
@@ -16,7 +16,7 @@
     <!-- Control arguments -->
     <arg name="use_group_controller" default="false" />
     <!-- Wheter to use group controllers for the panda control server -->
-    <arg name="preload_controllers" default="false" />
+    <arg name="preload_controllers" default="true" />
     <!-- Wheter to preload all non essential controllers -->
 
     <!-- ## LOAD ROBOT DESCRIPTION ## -->
diff --git a/panda_openai_sim/nodes/panda_control_server.py b/panda_openai_sim/nodes/panda_control_server.py
index 7e11a65..a694c72 100755
--- a/panda_openai_sim/nodes/panda_control_server.py
+++ b/panda_openai_sim/nodes/panda_control_server.py
@@ -1,6 +1,13 @@
 #!/usr/bin/env python
 """This node sets up a number of services that can be used to control
-the Panda Emika Franka robot using the Moveit framework.
+the Panda Emika Franka robot.
+
+Source code
+----------------------------
+.. literalinclude:: ../../../../panda_openai_sim/nodes/panda_control_server.py
+   :language: python
+   :linenos:
+   :lines: 13-
 """
 
 # Import ROS packages
diff --git a/panda_openai_sim/nodes/panda_moveit_server.py b/panda_openai_sim/nodes/panda_moveit_server.py
index a273dd6..c6b68ac 100755
--- a/panda_openai_sim/nodes/panda_moveit_server.py
+++ b/panda_openai_sim/nodes/panda_moveit_server.py
@@ -1,6 +1,13 @@
 #!/usr/bin/env python
 """This node sets up a number of services that can be used to control
-the Panda Emika Franka robot.
+the Panda Emika Franka robot using the Moveit framework.
+
+Source code
+----------------------------
+.. literalinclude:: ../../../../panda_openai_sim/nodes/panda_moveit_server.py
+   :language: python
+   :linenos:
+   :lines: 13-
 """
 
 # Import ROS packages
diff --git a/panda_openai_sim/package.xml b/panda_openai_sim/package.xml
index c75d75f..3a5ff63 100755
--- a/panda_openai_sim/package.xml
+++ b/panda_openai_sim/package.xml
@@ -82,7 +82,7 @@
   <exec_depend>actionlib_msgs</exec_depend>
   <exec_depend>control_msgs</exec_depend>
   <exec_depend>openai_ros</exec_depend>
-
+  <exec_depend>rviz</exec_depend>
 
   <!-- The export tag contains other, unspecified, tags (Used for metapackages and
   Plugins -->
diff --git a/docs/manuals/endeffector-config.json b/panda_openai_sim/resources/documents/endeffector-config.json
similarity index 100%
rename from docs/manuals/endeffector-config.json
rename to panda_openai_sim/resources/documents/endeffector-config.json
diff --git a/panda_openai_sim/setup.py b/panda_openai_sim/setup.py
index 8817ed5..27243fd 100755
--- a/panda_openai_sim/setup.py
+++ b/panda_openai_sim/setup.py
@@ -17,7 +17,7 @@
 date_files_relative_path = os.path.join(relative_site_packages, "panda_openai_sim")
 
 # Additional python requirements that could not be specified in the package.xml
-requirements = []
+requirements = ["rospkg"]
 
 # Set up logger
 logger = logging.getLogger(__name__)
diff --git a/panda_openai_sim/src/panda_openai_sim.egg-info/PKG-INFO b/panda_openai_sim/src/panda_openai_sim.egg-info/PKG-INFO
index b4afc09..c7d085f 100644
--- a/panda_openai_sim/src/panda_openai_sim.egg-info/PKG-INFO
+++ b/panda_openai_sim/src/panda_openai_sim.egg-info/PKG-INFO
@@ -8,7 +8,6 @@ Author-email: rick.staa@outlook.com
 License: Rick Staa copyright
 Description: ﻿# Panda_openai_sim
         
-        
         A ROS package that creates an Openai gym environment for the Panda Emika Franka robot.
         It contains a simulated version of the Panda robot together with a Panda gym environment
         that can be used to train RL algorithms as is done with the original
@@ -16,19 +15,20 @@ Description: ﻿# Panda_openai_sim
         
         ## Environments
         
-        The panda_openai_sim package currently contains the following environments:
-         - PandaPickAndPlace-v0: Lift a block into the air.
-         - PandaPush-v0: Push a block to a goal position.
-         - PandaReach-v0: Move fetch to a goal position.
-         - PandaSlide-v0: Slide a puck to a goal position.
+        The panda_openai_sim package currently contains the following task environments:
+        
+        - **PandaPickAndPlace-v0:** Lift a block into the air.
+        - **PandaPush-v0:** Push a block to a goal position.
+        - **PandaReach-v0:** Move fetch to a goal position.
+        - **PandaSlide-v0:** Slide a puck to a goal position.
         
         Here is a bare minimum example of getting something running. This will run an instance
         of the PandaReach-v0 environment for 1000 timesteps.
         
-        ```
+        ```python
         import gym
-        from panda_openai_sim.envs.task_envs import PandaReachEnv
-        env = gym.make('CartPole-v0')
+        import panda_openai_sim.envs
+        env = gym.make('PandaReach-v0')
         env = gym.wrappers.FlattenObservation(env)
         env.reset()
         for _ in range(1000):
@@ -40,7 +40,7 @@ Description: ﻿# Panda_openai_sim
         successfully build the `panda_openai_sim` ROS package and installed all the dependencies
         you can run the simulation using the following ros command:
         
-        ```
+        ```bash
         roslaunch panda_openai_sim start.launch
         ```
         
@@ -48,6 +48,7 @@ Description: ﻿# Panda_openai_sim
         
         Please see the [docs](https://rickstaa.github.io/panda_openai_sim/) for installation
         and usage instructions.
+        
 Keywords: ros,rl,panda,openai gym
 Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3.5
diff --git a/panda_openai_sim/src/panda_openai_sim/core/__init__.py b/panda_openai_sim/src/panda_openai_sim/core/__init__.py
index 18f04e8..1668b3b 100644
--- a/panda_openai_sim/src/panda_openai_sim/core/__init__.py
+++ b/panda_openai_sim/src/panda_openai_sim/core/__init__.py
@@ -1,4 +1,8 @@
-﻿# Python 2 and 3:
+﻿"""Module that contains the core components (classes and functions) that are needed for
+creating the Panda openai :gym:`gym <>` environment.
+"""
+
+# Python 2 and 3:
 # To make Py2 code safer (more like Py3) by preventing
 # implicit relative imports, you can also add this to the top:
 from __future__ import absolute_import
@@ -10,5 +14,5 @@
 
 # NOTE: We can not yet import Moveit when using both python 2 and python 3 as Moveit is
 # not yet python3 compatible.
-# IMPROVE: Fix if moveit becomes python 3 compatible (ROS NOETIC)
+# TODO: Fix if moveit becomes python 3 compatible (ROS NOETIC)
 # from .moveit_server import PandaMoveitPlannerServer
diff --git a/panda_openai_sim/src/panda_openai_sim/core/control_server.py b/panda_openai_sim/src/panda_openai_sim/core/control_server.py
index 03b3104..f5f2e30 100644
--- a/panda_openai_sim/src/panda_openai_sim/core/control_server.py
+++ b/panda_openai_sim/src/panda_openai_sim/core/control_server.py
@@ -23,6 +23,7 @@
     get_duplicate_list,
     translate_actionclient_result_error_code,
     list_2_human_text,
+    panda_action_msg_2_control_msgs_action_msg,
 )
 from panda_openai_sim.exceptions import InputMessageInvalidError
 from panda_openai_sim.core import GroupPublisher
@@ -139,7 +140,7 @@ def __init__(
         autofill_traj_positions=False,
         connection_timeout=10,
     ):
-        """Initializes the PandaControlServer object
+        """Initializes the PandaControlServer object.
 
         Parameters
         ----------
@@ -602,35 +603,10 @@ def __init__(
     ###############################################
     # Panda control member functions ##############
     ###############################################
-    def _panda_action_msg_2_control_msgs_action_msg(self, panda_action_msg):
-        """Converts a panda_openai_sim FollowJointTrajectoryActionGoal action message
-        into a control_msgs FollowJointTrajectoryGoal action message.
-
-        Parameters
-        ----------
-        panda_action_msg : panda_openai_sim.msg.FollowJointTrajectoryGoal
-            panda_openai_sim follow joint trajectory goal message.
-
-        Returns
-        -------
-        control_msgs.msg.FollowJointTrajectoryGoal
-            Control_msgs follow joint trajectory goal message
-        """
-
-        # Fill new control_msgs.msg.FollowJointTrajectoryGoal message
-        control_msgs_action_msg = control_msgs.FollowJointTrajectoryGoal()
-        control_msgs_action_msg.trajectory = panda_action_msg.trajectory
-        control_msgs_action_msg.goal_time_tolerance = (
-            panda_action_msg.goal_time_tolerance
-        )
-        control_msgs_action_msg.goal_tolerance = panda_action_msg.goal_tolerance
-        control_msgs_action_msg.path_tolerance = panda_action_msg.path_tolerance
-        return control_msgs_action_msg
-
     def _init_action_servers_fb_msgs(self):
-        """Initiate the 'panda_control_server/panda_arm/follow_joint_trajectory' and
-        'panda_control_server/panda_hand/follow_joint_trajectory' feedback messages with
-        the current robot states.
+        """Initiate the ``panda_control_server/panda_arm/follow_joint_trajectory`` and
+        ``panda_control_server/panda_hand/follow_joint_trajectory`` feedback messages
+        with the current robot states.
         """
 
         # Create header
@@ -669,10 +645,10 @@ def _init_action_servers_fb_msgs(self):
 
     def _get_combined_action_server_fb_msg(self):
         """Combine the action server feedback messages from the original
-        'panda_arm_controller/follow_joint_trajectory' and
-        'panda_hand_controller/follow_joint_trajectory' action servers so that they can
-        be used as a feedback message to the
-        'panda_control_server/follow_joint_trajectory' wrapper action server.
+        ``panda_arm_controller/follow_joint_trajectory`` and
+        ``panda_hand_controller/follow_joint_trajectory`` action servers so that they
+        can be used as a feedback message to the
+        ``panda_control_server/follow_joint_trajectory`` wrapper action server.
 
         Returns
         -------
@@ -1018,7 +994,7 @@ def _wait_till_done(
         ----------
         control_type : str
             The type of control that is being executed and on which we should wait.
-            Options are 'joint_effort_control' and 'joint_position_control'.
+            Options are ``joint_effort_control`` and ``joint_position_control``.
         control_group : str, optional
             The control group  for which the control is being performed, defaults to
             "both".
@@ -1208,10 +1184,12 @@ def _wait_till_done(
         return True
 
     def _create_traj_action_server_msg(self, input_msg, control_group, verbose=False):
-        """Converts the panda_openai_sim.msg.FollowJointTrajectoryGoal message that is
-        received by the 'panda_control_server' follow joint trajectory wrapper action
-        servers into the right format for the origina `panda_arm_controller` and
-        'panda_hand_controller'  follow joint trajectory action servers.
+        """Converts the ``panda_openai_sim.msg.FollowJointTrajectoryGoal`` message that
+        is received by the ``panda_control_server`` follow joint trajectory wrapper
+        action servers into the right format for the original ``panda_arm_controller``
+        and ``panda_hand_controller``
+        `follow_joint_trajectory <https://wiki.ros.org/joint_trajectory_action/>`_
+        action servers.
 
         Parameters
         ----------
@@ -1219,16 +1197,17 @@ def _create_traj_action_server_msg(self, input_msg, control_group, verbose=False
             The service input message we want to validate.
         control_type : str
             The type of control that is being executed and on which we should wait.
-            Options are 'joint_effort_control' and 'joint_position_control'.
+            Options are ``joint_effort_control`` and ``joint_position_control``.
         verbose : bool
-            Bool specifying whether you want to send a warning message to the ROS
+            Boolean specifying whether you want to send a warning message to the ROS
             logger.
 
         Returns
         -------
         dict
             A dictionary containing the arm and hand panda_arm_controller
-            'control_msgs.msg.FollowJointTrajectoryGoal' messages.
+            :control_msgs:`control_msgs.msg.FollowJointTrajectoryGoal
+            <html/action/FollowJointTrajectory.html>` messages.
 
         Raises
         ----------
@@ -1266,8 +1245,8 @@ def _create_traj_action_server_msg(self, input_msg, control_group, verbose=False
 
             # Throw error if controlled joints could not be retrieved
             logwarn_message = (
-                "The joint trajectory publisher messages could not be created as the "
-                "'%' are not initialized." % (self._traj_controllers)
+                "The joint trajectory publisher message could not be created as the "
+                "'%s' are not initialized." % (self._traj_controllers)
             )
             if verbose:
                 rospy.logwarn(logwarn_message)
@@ -1835,12 +1814,8 @@ def _create_traj_action_server_msg(self, input_msg, control_group, verbose=False
 
             # Return new action server messages
             return {
-                "arm": self._panda_action_msg_2_control_msgs_action_msg(
-                    arm_control_msg
-                ),
-                "hand": self._panda_action_msg_2_control_msgs_action_msg(
-                    hand_control_msg
-                ),
+                "arm": panda_action_msg_2_control_msgs_action_msg(arm_control_msg),
+                "hand": panda_action_msg_2_control_msgs_action_msg(hand_control_msg),
             }
 
         else:  # If joints were specified
@@ -2365,10 +2340,10 @@ def _create_traj_action_server_msg(self, input_msg, control_group, verbose=False
 
                     # Return new action server messages
                     return {
-                        "arm": self._panda_action_msg_2_control_msgs_action_msg(
+                        "arm": panda_action_msg_2_control_msgs_action_msg(
                             arm_control_msg
                         ),
-                        "hand": self._panda_action_msg_2_control_msgs_action_msg(
+                        "hand": panda_action_msg_2_control_msgs_action_msg(
                             hand_control_msg
                         ),
                     }
@@ -2386,17 +2361,17 @@ def _create_control_publisher_msg(
             The service input message we want to validate.
         control_type : str
             The type of control that is being executed and on which we should wait.
-            Options are 'joint_effort_control' and 'joint_position_control'.
+            Options are ``joint_effort_control`` and ``joint_position_control``.
         control_group : str
-            The robot control group which is being controlled. Options are "arm",
-            "hand" or "both".
+            The robot control group which is being controlled. Options are ``arm``,
+            ``hand`` or ``both``.
         verbose : bool
-            Bool specifying whether you want to send a warning message to the ROS
+            Boolean specifying whether you want to send a warning message to the ROS
             logger.
 
         Returns
         -------
-        (Dict, Dict)
+        (dict, dict)
             Two dictionaries. The first dictionary contains the Panda arm and hand
             control commands in the order which is are required by the publishers.
             The second dictionary contains the joints that are being controlled.
@@ -2404,7 +2379,7 @@ def _create_control_publisher_msg(
         Raises
         ----------
         panda_openai_sim.exceptions.InputMessageInvalidError
-            Raised when the input_msg could not be converted into 'moveit_commander'
+            Raised when the input_msg could not be converted into ``moveit_commander``
             arm hand joint position/effort commands.
         """
 
@@ -2459,7 +2434,7 @@ def _create_control_publisher_msg(
 
             # Throw error if controlled joints could not be retrieved
             logwarn_message = (
-                "The %s publisher messages could not be created as the '%s' '%s' "
+                "The '%s' publisher messages could not be created as the '%s' %s "
                 "are not initialized."
                 % (
                     "effort control"
@@ -2468,6 +2443,9 @@ def _create_control_publisher_msg(
                     self._effort_controllers
                     if control_type == "joint_effort_control"
                     else self._position_controllers,
+                    "joint effort controllers"
+                    if control_type == "joint_effort_control"
+                    else "joint position controllers",
                 )
             )
             if verbose:
@@ -2783,21 +2761,21 @@ def _get_controlled_joints(self, control_type, verbose=False):
         ----------
         control_type : str
             The type of control that is being executed and on which we should wait.
-            Options are 'joint_effort_control' and 'joint_position_control'.
+            Options are ``joint_effort_control`` and ``joint_position_control``.
         verbose : bool
-            Bool specifying whether you want to send a warning message to the ROS
+            Boolean specifying whether you want to send a warning message to the ROS
             logger.
 
         Returns
         -------
         dict
             A dictionary containing the joints that are controlled when using a given
-            control type, grouped by control group ("arm" and "hand).
+            control type, grouped by control group (``arm`` and ``hand``).
 
         Raises
         ----------
         panda_openai_sim.exceptions.InputMessageInvalidError
-            Raised when the input_msg could not be converted into 'moveit_commander'
+            Raised when the input_msg could not be converted into ``moveit_commander``
             arm hand joint position commands.
         """
 
@@ -4033,7 +4011,7 @@ def _list_control_type_cb(self, list_control_type_req):
         Returns
         -------
         panda_openai_sim.srv.ListControlTypeResponse
-            Service response. Options: 'joint_group_control' and 'joint_control'.
+            Service response. Options: ``joint_group_control`` and ``joint_control``.
         """
 
         # Check if verbose was set to True
@@ -4072,7 +4050,7 @@ def _get_controlled_joints_cb(self, get_controlled_joints_req):
         Returns
         -------
         panda_openai_sim.srv.GetControlledJointsResponse
-            The response message that contains the 'controlled_joints' list that
+            The response message that contains the ``controlled_joints`` list that
             specifies the joints that are controlled.
         """
 
@@ -4429,8 +4407,9 @@ def _joint_traj_execute_cb(self, goal):
 
     def _arm_joint_traj_feedback_cb(self, feedback):
         """Relays the feedback messages from the original
-        'panda_arm_controller/follow_joint_trajectory' server to our to our
-        'panda_control_server/panda_arm/follow_joint_trajectory' wrapper action server.
+        ``panda_arm_controller/follow_joint_trajectory`` server to our to our
+        ``panda_control_server/panda_arm/follow_joint_trajectory`` wrapper action
+        server.
 
         Parameters
         ----------
@@ -4457,8 +4436,9 @@ def _arm_joint_traj_feedback_cb(self, feedback):
 
     def _hand_joint_traj_feedback_cb(self, feedback):
         """Relays the feedback messages from the original
-        'panda_hand_controller/follow_joint_trajectory' server to our to our
-        'panda_control_server/panda_hand/follow_joint_trajectory' wrapper action server.
+        ``panda_hand_controller/follow_joint_trajectory`` server to our to our
+        ``panda_control_server/panda_hand/follow_joint_trajectory`` wrapper action
+        server.
 
         Parameters
         ----------
@@ -4485,8 +4465,8 @@ def _hand_joint_traj_feedback_cb(self, feedback):
 
     def _arm_joint_traj_preempt_cb(self):
         """Relays the preempt request made to the
-        'panda_control_server/panda_arm/follow_joint_trajectory' action server wrapper
-        to the original 'panda_arm_controller/follow_joint_trajectory' action server.
+        ``panda_control_server/panda_arm/follow_joint_trajectory`` action server wrapper
+        to the original ``panda_arm_controller/follow_joint_trajectory`` action server.
         """
 
         # Stop panda_arm_controller action server
@@ -4496,8 +4476,9 @@ def _arm_joint_traj_preempt_cb(self):
 
     def _hand_joint_traj_preempt_cb(self):
         """Relays the preempt request made to the
-        'panda_control_server/panda_hand/follow_joint_trajectory' action server wrapper
-        to the original 'panda_hand_controller/follow_joint_trajectory' action server.
+        ``panda_control_server/panda_hand/follow_joint_trajectory`` action server
+        wrapper to the original ``panda_hand_controller/follow_joint_trajectory``
+        action server.
         """
 
         # Stop panda_hand_controller action server
@@ -4507,9 +4488,9 @@ def _hand_joint_traj_preempt_cb(self):
 
     def _joint_traj_preempt_cb(self):
         """Relays the preempt request made to the
-        'panda_control_server/follow_joint_trajectory' action server wrapper
-        to the original 'panda_arm_controller/follow_joint_trajectory' and
-        'panda_hand_controller/follow_joint_trajectory' action servers.
+        ``panda_control_server/follow_joint_trajectory`` action server wrapper
+        to the original ``panda_arm_controller/follow_joint_trajectory`` and
+        ``panda_hand_controller/follow_joint_trajectory`` action servers.
         """
 
         # Stop panda_arm_controller and panda_hand_controller action servers
@@ -4517,29 +4498,3 @@ def _joint_traj_preempt_cb(self):
         self._hand_joint_traj_client.cancel_goal()
         self._joint_traj_as.set_preempted()
         self._full_joint_traj_control = False
-
-
-#################################################
-# Main script ###################################
-#################################################
-if __name__ == "__main__":
-
-    # Initiate ROS nodetypes
-    rospy.init_node("panda_control_server")
-
-    # Get ROS parameters
-    try:  # Check end effector
-        use_group_controller = rospy.get_param("~use_group_controller")
-    except KeyError:
-        use_group_controller = False
-    try:  # Auto fill joint traj position field if left empty
-        autofill_traj_positions = rospy.get_param("~autofill_traj_positions")
-    except KeyError:
-        autofill_traj_positions = False
-
-    # Start control server
-    control_server = PandaControlServer(
-        use_group_controller=use_group_controller,
-        autofill_traj_positions=autofill_traj_positions,
-    )
-    rospy.spin()  # Maintain the service open.
diff --git a/panda_openai_sim/src/panda_openai_sim/core/control_switcher.py b/panda_openai_sim/src/panda_openai_sim/core/control_switcher.py
index 7e1b09f..47af372 100644
--- a/panda_openai_sim/src/panda_openai_sim/core/control_switcher.py
+++ b/panda_openai_sim/src/panda_openai_sim/core/control_switcher.py
@@ -1,12 +1,18 @@
 """This class is responsible for switching the control type that is used for
-controlling the Panda Robot robot 'arm' and 'hand'. It serves as a wrapper around the
-services created by the ROS 'controller_manager' class. It allows the user to switch
-between the following control types:
-    - joint_trajectory_control
-    - joint_position_control
-    - joint_effort_control
-    - joint_group_position_control
-    - joint_group_effort_control
+controlling the Panda Robot robot ``arm`` and ``hand``. It serves as a wrapper around
+the services created by the ROS
+`controller_manager <https://wiki.ros.org/controller_manager>`_ class. It allows the
+user to switch between the following control types:
+
+* `joint_trajectory_control <https://wiki.ros.org/joint_trajectory_controller/>`_
+
+* `joint_position_control <https://wiki.ros.org/position_controllers/>`_
+
+* `joint_effort_control <https://wiki.ros.org/effort_controllers/>`_
+
+* `joint_group_position_control <https://wiki.ros.org/ros_controllers/>`_
+
+* `joint_group_effort_control <https://wiki.ros.org/ros_controllers/>`_
 """
 
 # Main python imports
@@ -86,7 +92,7 @@ class PandaControlSwitcher(object):
     Attributes
     ----------
     verbose : bool
-        Bool specifying whether we want to display log messages during switching.
+        Boolean specifying whether we want to display log messages during switching.
 
     Methods
     -------
@@ -311,15 +317,15 @@ def switch(
             are 'hand' or 'arm'.
         control_type : str
             The robot control type you want to switch to for the given 'control_group'.
-            Options are: 'joint_trajectory_control', 'joint_position_control',
-            'joint_effort_control', 'joint_group_position_control' and
-            'joint_group_effort_control'.
+            Options are: ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control`` and
+            ``joint_group_effort_control``.
         load_controllers : bool
             Try to load the required controllers for a given control_type if they are
             not yet loaded.
         timeout : int, optional
             The timout for switching to a given controller, by default
-            self._controller_switch_timeout.
+            :py:attr:`self._controller_switch_timeout`.
         verbose : bool, optional
             Whether to display debug log messages, defaults to verbose value set during
             the class initiation.
@@ -572,7 +578,7 @@ def __init__(self, success=True, prev_control_type=""):
         success : bool, optional
             Whether the switch operation was successfull, by default True.
         prev_control_type : str, optional
-            The previous used control type, by default "".
+            The previous used control type, by default ``""``.
         """
 
         # Set class attributes
diff --git a/panda_openai_sim/src/panda_openai_sim/core/group_publisher.py b/panda_openai_sim/src/panda_openai_sim/core/group_publisher.py
index 4caae0c..9454d12 100644
--- a/panda_openai_sim/src/panda_openai_sim/core/group_publisher.py
+++ b/panda_openai_sim/src/panda_openai_sim/core/group_publisher.py
@@ -22,8 +22,8 @@ def __init__(self, iterable=[]):
         """
         Parameters
         ----------
-        iterable : iterable, optional
-            New list initialized from iterable items, by default [].
+        iterable : list, optional
+            New list initialized from iterable items, by default ``[]``.
 
         Raises
         ------
@@ -58,6 +58,7 @@ def publish(self, messages):
         Raises
         ------
         ValueError
+            If something went wrong during publishing.
         """
 
         # Validate input messages
diff --git a/panda_openai_sim/src/panda_openai_sim/core/moveit_server.py b/panda_openai_sim/src/panda_openai_sim/core/moveit_server.py
index 3fa125b..900c8ed 100644
--- a/panda_openai_sim/src/panda_openai_sim/core/moveit_server.py
+++ b/panda_openai_sim/src/panda_openai_sim/core/moveit_server.py
@@ -21,8 +21,15 @@
 
 # ROS python imports
 import rospy
-import moveit_commander
-from moveit_commander.exception import MoveItCommanderException
+
+try:  # TODO: Remove when moveit is released for python 3 (ROS Noetic)
+    import moveit_commander
+except ImportError:
+    pass
+try:  # TODO: Remove when moveit is released for python 3 (ROS Noetic)
+    from moveit_commander.exception import MoveItCommanderException
+except ImportError:
+    pass
 from rospy.exceptions import ROSException
 
 # ROS msgs and srvs
@@ -90,16 +97,17 @@ def __init__(
         ----------
         arm_move_group : str, optional
             The name of the move group you want to use for controlling the Panda arm,
-            by default "panda_arm".
+            by default panda_arm.
         arm_ee_link : str, optional
             The end effector you want moveit to use when controlling
-            the Panda arm by default "panda_link8".
+            the Panda arm by default panda_link8.
         hand_move_group : str, optional
             The name of the move group you want to use for controlling the Panda
-            hand, by default "hand".
+            hand, by default hand.
         create_all_services : bool, optional
             Specifies whether we want to create all the available services or only the
-            ones that are crucial for the panda_openai_sim package, by default False.
+            ones that are crucial for the panda_openai_sim package, by default
+            False.
         """
 
         # Create class attributes
@@ -296,7 +304,7 @@ def _link_exists(self, link_name):
 
         Returns
         -------
-        Bool
+        bool
             Boolean specifying whether the link exists.
         """
         return link_name in self.robot.get_link_names()
@@ -308,12 +316,12 @@ def _execute(self, control_group="both"):
         ----------
         control_group : str, optional
             The robot control group for which you want to execute the control. Options
-            are "arm" or "hand" or "both, by default "both".
+            are ``arm`` or ``hand`` or ``both``, by default both.
         Returns
         -------
         list
             List specifying whether the arm and/or hand execution was successfull. If
-            control_group == "both" then ["arm_success", "hand_success"].
+            ``control_group == "both"`` then ``["arm_success", "hand_success"]``.
         """
 
         # Plan and execute
@@ -356,14 +364,14 @@ def _create_joint_positions_commands(
             The service input message we want to validate.
         control_group : str, optional
             The robot control group for which you want to execute the control. Options
-            are "arm" or "hand" or "both, by default "both".
+            are ``arm`` or ``hand`` or ``both``, by default both.
         verbose : bool
-            Bool specifying whether you want to send a warning message to the ROS
+            Boolean specifying whether you want to send a warning message to the ROS
             logger.
 
         Returns
         -------
-        Dict
+        dict
             Dictionary that contains the 'moveit_commander' arm and hand joint
             position commands. Grouped by control group.
 
@@ -1095,8 +1103,8 @@ def _get_random_joint_positions_callback(self, get_random_position_req):
                     rospy.logwarn(
                         "Joint limits ignored as the the "
                         "'panda_openai_sim.msg.JointLimits' field of the "
-                        "'panda_openai_sim.srv.GetRandomJointPositionsRequest' contains "
-                        "%s %s. Valid values are %s."
+                        "'panda_openai_sim.srv.GetRandomJointPositionsRequest' "
+                        "contains %s %s. Valid values are %s."
                         % (
                             "an invalid joint name"
                             if len(invalid_names) == 1
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/__init__.py b/panda_openai_sim/src/panda_openai_sim/envs/__init__.py
index c4fd834..980216f 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/__init__.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/__init__.py
@@ -1,4 +1,5 @@
-﻿"""Register Panda openai_ros gym environments.
+﻿"""Module that contains the Panda openai :gym:`gym <>` task, robot and Goal
+environments.
 """
 
 # Python 2 and 3:
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/robot_envs/__init__.py b/panda_openai_sim/src/panda_openai_sim/envs/robot_envs/__init__.py
index 806a94e..594a3c7 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/robot_envs/__init__.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/robot_envs/__init__.py
@@ -1,4 +1,8 @@
-﻿# Python 2 and 3:
+﻿"""This module contains the Robot Environment classes that are responsible for the
+interaction with the Panda Robot (Control and Sensors).
+"""
+
+# Python 2 and 3:
 # To make Py2 code safer (more like Py3) by preventing
 # implicit relative imports, you can also add this to the top:
 from __future__ import absolute_import
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/robot_envs/panda_robot_env.py b/panda_openai_sim/src/panda_openai_sim/envs/robot_envs/panda_robot_env.py
index f65cab6..25f1f0e 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/robot_envs/panda_robot_env.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/robot_envs/panda_robot_env.py
@@ -1,5 +1,5 @@
-"""This robot environment contains all the functions which are responsible
-for interaction with the robot (Control and Sensors).
+"""This module contains the Robot environment class which is responsible
+for interaction with the Panda robot (Control and Sensors).
 """
 
 # Main python imports
@@ -136,19 +136,21 @@ def __init__(
         Parameters
         ----------
         robot_EE_link : str, optional
-            Robot end effector link name, by default "panda_hand".
+            Robot end effector link name, by default panda_hand.
         robot_arm_control_type : str, optional
             The type of control you want to use for the robot arm. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_position_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            joint_position_control.
         robot_hand_control_type : str, optional
             The type of control you want to use for the robot hand. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_position_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            joint_position_control.
         robot_name_space : str, optional
-            Robot namespace, by default "".
+            Robot namespace, by default ``""``.
         reset_robot_pose : bool
             Boolean specifying whether to reset the robot pose when the simulation is
             reset.
@@ -157,7 +159,7 @@ def __init__(
             is reset.
         reset_control_list : list, optional
             List containing the robot controllers you want to reset each time
-            the simulation is reset, by default [].
+            the simulation is reset, by default ``[]``.
         """
 
         # Environment initiation message
@@ -578,7 +580,7 @@ def _check_all_systems_ready(self):
 
         Returns
         -------
-        Boolean
+        bool
             Boolean specifying whether reset was successful.
         """
         self._check_all_sensors_ready()
@@ -694,7 +696,8 @@ def set_ee_pose(self, ee_pose):
 
         Parameters
         ----------
-        ee_pose :  geometry_msgs.msg.Pose(Stamped), panda_openai_sim.msg.SetEePoseRequest,
+        ee_pose :  geometry_msgs.msg.Pose, panda_openai_sim.msg.SetEePoseRequest
+            The end effector pose.
         dict, list, tuple, int, float or numpy.ndarray
             A list or pose message containing the end effector position (x, y, z)
             and orientation (x, y, z, w).
@@ -889,8 +892,7 @@ def set_joint_positions(self, joint_positions, wait=None):
 
         Parameters
         ----------
-        joint_positions : panda_openai_sim.msg.SetJointPositionsRequest, dict, list,
-        tuple, int, float or numpy.ndarray
+        joint_positions : SetJointPositionsRequest, dict, list, float or numpy.ndarray
             The joint positions of each of the robot joints.
         wait : bool, optional
             Wait till the control has finished, by default False
@@ -1229,9 +1231,8 @@ def set_joint_efforts(self, joint_efforts, wait=None):
 
         Parameters
         ----------
-        joint_efforts : panda_openai_sim.SetJointEffortsRequest, dict, list, tuple, int,
-        float or numpy.ndarray
-            The joint efforts of each of the robot joints.
+        joint_efforts : SetJointEffortsRequest, dict, list, float or numpy.ndarray
+            The joint efforts of each of the robot joints. Can also be supplied as
         wait : bool, optional
             Wait till the control has finished, by default False
 
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/robot_gazebo_goal_env.py b/panda_openai_sim/src/panda_openai_sim/envs/robot_gazebo_goal_env.py
index 5e3d857..4dc051c 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/robot_gazebo_goal_env.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/robot_gazebo_goal_env.py
@@ -1,7 +1,5 @@
-"""This Robot Gazebo Goal environment class is responsible for creating the link
-between the Gazebo simulator and the openAI package. It is different from the
-RobotGazebo env as here the gym.GoalEnv is used instead of the gym.Env. This is done
-since the goal of the robot task now is changing with each episode.
+"""This module contains the Robot Gazebo Goal environment class that is responsible
+for creating the link between the Gazebo simulator and the openAI package.
 """
 
 # Main python imports
@@ -102,7 +100,7 @@ def __init__(
         reset_controls : bool
             Boolean specifying whether to reset the controllers when the simulation
             is reset.
-        reset_control_list : np.array
+        reset_control_list : numpy.ndarray
             Names of the controllers of the robot.
         """
 
@@ -302,7 +300,7 @@ def _spawn_object(self, object_name, model_name, pose=None):
         model_name : str
             The model type (The name of the xml file you want to use).
         pose : geometry_msgs.msg.Pose, optional
-            The pose of the model, by default :class:`geometry_msgs.msg.Pose`.
+            The pose of the model, by default :py:class:`geometry_msgs.msg.Pose`.
 
         Returns
         -------
@@ -426,7 +424,8 @@ def _set_model_state(self, object_name, pose, twist=Twist()):
         pose : geometry_msgs.Pose
             The pose you want the object to have.
         twist : geometry_msgs.Twist, optional
-            The twist you want the object to have, by default Twist().
+            The twist you want the object to have, by default
+            :geometry_msgs:`Twist()<html/msg/Twist.html>`.
 
         Returns
         -------
@@ -476,7 +475,7 @@ def _publish_reward_topic(self, reward, episode_number=1):
 
         Parameters
         ----------
-        reward : np.float32
+        reward : :obj:`numpy.float32`
             The episode reward.
         episode_number : int, optional
             The episode number, by default 1.
@@ -607,10 +606,11 @@ def _is_success(self, achieved_goal, desired_goal):
         Returns
         -------
         bool
-            Bool specifying whether the episode is done (e.i. distance to the goal is
+            Boolean specifying whether the episode is done (e.i. distance to the goal is
             within the distance threshold, robot has fallen etc.).
 
-        .. note::
+        Note
+        -----------
             The reason the name was changed to `_is_done` is twofold. First, with the
             use of an additional wrapper function, this makes the environment compatible
             with both packages that expect the openai :py:mod:`gym` package structure as
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/__init__.py b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/__init__.py
index c66dd9b..e5ea468 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/__init__.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/__init__.py
@@ -1,4 +1,8 @@
-﻿# Python 2 and 3:
+﻿"""This module contains several different Task environments on which a RL algorithms can
+be trained.
+"""
+
+# Python 2 and 3:
 # To make Py2 code safer (more like Py3) by preventing
 # implicit relative imports, you can also add this to the top:
 from __future__ import absolute_import
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_pick_and_place_env.py b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_pick_and_place_env.py
index 0e13108..71620d7 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_pick_and_place_env.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_pick_and_place_env.py
@@ -34,30 +34,30 @@ def __init__(
         Parameters
         ----------
         reward_type : str, optional
-            The reward type, i.e. sparse or dense, by default "sparse".
+            The reward type, i.e. ``sparse`` or ``dense``, by default sparse.
         distance_threshold : float, optional
             The threshold after which a goal is considered achieved, by default 0.05.
         target_bounds : dict, optional
             A dictionary with the bounds from which the target is sampled i.e.
-            {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         target_sampling_strategy : str, optional
             Whether the target bounds from which we sample the target goal position are
             relative to the global frame 'global' or relative to the current
             end-effector pose 'local', by default None.
         init_pose : dict, optional
             A dictionary of names and values that define the initial configuration i.e.
-            {x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}, by
+            ``{x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}``, by
             default None.
         init_pose_bounds : dict, optional
             A dictionary with the bounds from which the initial robot pose is sampled
-            i.e. {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            i.e. ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         init_obj_pose : dict, optional
             A dictionary that contains the initial object pose i.e.
-            {x, y, z, rx, ry, rz, rw}, by default None. The object will be spawned
+            ``{x, y, z, rx, ry, rz, rw}``, by default None. The object will be spawned
             relative to this pose in a region defined by the obj_bounds.
         obj_bounds : dict, optional
             A dictionary in which the bounds for sampling the object
-            positions is defined {x_min, y_min, x_max, y_max}, by default None. This
+            positions is defined ``{x_min, y_min, x_max, y_max}``, by default None. This
             bounds are relative to the init_obj_pose.
         use_gripper_width : bool, optional
             Whether you want to use the gripper_width instead of the individual panda
@@ -65,19 +65,23 @@ def __init__(
             using the gripper width reduces the action space by 1.
         robot_arm_control_type : str, optional
             The type of control you want to use for the robot arm. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_trajectory_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            joint_trajectory_control.
         robot_hand_control_type : str, optional
             The type of control you want to use for the robot hand. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_trajectory_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            joint_trajectory_control.
 
-        .. note::
+        Note
+        -----------
             If the default value for a argument is set to None this means the default
             values of the parent class
-            :py:class:`panda_openai_sim.envs.task_envs.PandaTaskEnv` are used.
+            :class:`panda_openai_sim.envs.task_envs.panda_task_env.PandaTaskEnv` are
+            used.
         """
 
         # Initiate main Panda Task environment
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_push_env.py b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_push_env.py
index 5002931..034751d 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_push_env.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_push_env.py
@@ -34,46 +34,50 @@ def __init__(
         Parameters
         ----------
         reward_type : str, optional
-            The reward type, i.e. sparse or dense, by default "sparse".
+            The reward type, i.e. ``sparse`` or ``dense``, by default sparse.
         distance_threshold : float, optional
             The threshold after which a goal is considered achieved, by default 0.05.
         target_bounds : dict, optional
             A dictionary with the bounds from which the target is sampled i.e.
-            {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         target_sampling_strategy : str, optional
             Whether the target bounds from which we sample the target goal position are
             relative to the global frame 'global' or relative to the current
             end-effector pose 'local', by default None.
         init_pose : dict, optional
             A dictionary of names and values that define the initial configuration i.e.
-            {x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}, by
+            ``{x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}``, by
             default None.
         init_pose_bounds : dict, optional
             A dictionary with the bounds from which the initial robot pose is sampled
-            i.e. {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            i.e. ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         init_obj_pose : dict, optional
             A dictionary that contains the initial object pose i.e.
-            {x, y, z, rx, ry, rz, rw}, by default None. The object will be spawned
+            ``{x, y, z, rx, ry, rz, rw}``, by default None. The object will be spawned
             relative to this pose in a region defined by the obj_bounds.
         obj_bounds : dict, optional
             A dictionary in which the bounds for sampling the object
-            positions is defined {x_min, y_min, x_max, y_max}, by default None. This
+            positions is defined ``{x_min, y_min, x_max, y_max}``, by default None. This
             bounds are relative to the init_obj_pose.
         robot_arm_control_type : str, optional
             The type of control you want to use for the robot arm. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_trajectory_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            ``joint_trajectory_control``.
         robot_hand_control_type : str, optional
             The type of control you want to use for the robot hand. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_trajectory_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            ``joint_trajectory_control``.
 
-        .. note::
+        Note
+        -----------
             If the default value for a argument is set to None this means the default
             values of the parent class
-            :py:class:`panda_openai_sim.envs.task_envs.PandaTaskEnv` are used.
+            :class:`panda_openai_sim.envs.task_envs.panda_task_env.PandaTaskEnv` are
+            used.
         """
         # Initiate main Panda Task environment
         super(PandaPushEnv, self).__init__(
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_reach_env.py b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_reach_env.py
index 4674c98..8bd7e72 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_reach_env.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_reach_env.py
@@ -34,30 +34,30 @@ def __init__(
         Parameters
         ----------
         reward_type : str, optional
-            The reward type, i.e. sparse or dense, by default "sparse".
+            The reward type, i.e. ``sparse`` or ``dense``, by default sparse.
         distance_threshold : float, optional
             The threshold after which a goal is considered achieved, by default 0.05.
         target_bounds : dict, optional
             A dictionary with the bounds from which the target is sampled i.e.
-            {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         target_sampling_strategy : str, optional
             Whether the target bounds from which we sample the target goal position are
             relative to the global frame 'global' or relative to the current
             end-effector pose 'local', by default None.
         init_pose : dict, optional
             A dictionary of names and values that define the initial configuration i.e.
-            {x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}, by
+            ``{x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}``, by
             default None.
         init_pose_bounds : dict, optional
             A dictionary with the bounds from which the initial robot pose is sampled
-            i.e. {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            i.e. ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         init_obj_pose : dict, optional
             A dictionary that contains the initial object pose i.e.
             {x, y, z, rx, ry, rz, rw}, by default None. The object will be spawned
             relative to this pose in a region defined by the obj_bounds.
         obj_bounds : dict, optional
             A dictionary in which the bounds for sampling the object
-            positions is defined {x_min, y_min, x_max, y_max}, by default None. This
+            positions is defined ``{x_min, y_min, x_max, y_max}``, by default None. This
             bounds are relative to the init_obj_pose.
         use_gripper_width : bool, optional
             Whether you want to use the gripper_width instead of the individual panda
@@ -65,19 +65,23 @@ def __init__(
             using the gripper width reduces the action space by 1.
         robot_arm_control_type : str, optional
             The type of control you want to use for the robot arm. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_trajectory_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            joint_trajectory_control.
         robot_hand_control_type : str, optional
             The type of control you want to use for the robot hand. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_trajectory_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            joint_trajectory_control.
 
-        .. note::
+        Note
+        -----------
             If the default value for a argument is set to None this means the default
             values of the parent class
-            :py:class:`panda_openai_sim.envs.task_envs.PandaTaskEnv` are used.
+            :class:`panda_openai_sim.envs.task_envs.panda_task_env.PandaTaskEnv` are
+            used.
         """
 
         # Initiate main Panda Task environment
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_slide_env.py b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_slide_env.py
index 3dec11d..5501a3f 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_slide_env.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_slide_env.py
@@ -33,46 +33,50 @@ def __init__(
         Parameters
         ----------
         reward_type : str, optional
-            The reward type, i.e. sparse or dense, by default "sparse".
+            The reward type, i.e. ``sparse`` or ``dense``, by default sparse.
         distance_threshold : float, optional
             The threshold after which a goal is considered achieved, by default 0.05.
         target_bounds : dict, optional
             A dictionary with the bounds from which the target is sampled i.e.
-            {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         target_sampling_strategy : str, optional
             Whether the target bounds from which we sample the target goal position are
             relative to the global frame 'global' or relative to the current
             end-effector pose 'local', by default None.
         init_pose : dict, optional
             A dictionary of names and values that define the initial configuration i.e.
-            {x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}, by
+            ``{x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}``, by
             default None.
         init_pose_bounds : dict, optional
             A dictionary with the bounds from which the initial robot pose is sampled
-            i.e. {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            i.e. ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         init_obj_pose : dict, optional
             A dictionary that contains the initial object pose i.e.
-            {x, y, z, rx, ry, rz, rw}, by default None. The object will be spawned
+            ``{x, y, z, rx, ry, rz, rw}``, by default None. The object will be spawned
             relative to this pose in a region defined by the obj_bounds.
         obj_bounds : dict, optional
             A dictionary in which the bounds for sampling the object
-            positions is defined {x_min, y_min, x_max, y_max}, by default None. This
+            positions is defined ``{x_min, y_min, x_max, y_max}``, by default None. This
             bounds are relative to the init_obj_pose.
         robot_arm_control_type : str, optional
             The type of control you want to use for the robot arm. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_trajectory_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            joint_trajectory_control.
         robot_hand_control_type : str, optional
             The type of control you want to use for the robot hand. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default 'joint_trajectory_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default
+            joint_trajectory_control.
 
-        .. note::
+        Note
+        -----------
             If the default value for a argument is set to None this means the default
             values of the parent class
-            :py:class:`panda_openai_sim.envs.task_envs.PandaTaskEnv` are used.
+            :class:`panda_openai_sim.envs.task_envs.panda_task_env.PandaTaskEnv` are
+            used.
         """
 
         # Initiate main Panda Task environment
diff --git a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_task_env.py b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_task_env.py
index 9baf8db..817c457 100644
--- a/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_task_env.py
+++ b/panda_openai_sim/src/panda_openai_sim/envs/task_envs/panda_task_env.py
@@ -152,7 +152,7 @@ def __init__(
         Parameters
         ----------
         reward_type : str, optional
-            The reward type, i.e. sparse or dense, by default None.
+            The reward type, i.e. ``sparse`` or ``dense``, by default None.
         distance_threshold : float, optional
             The threshold after which a goal is considered achieved, by default None.
         has_object : bool, optional
@@ -170,26 +170,26 @@ def __init__(
             default None.
         target_bounds : dict, optional
             A dictionary with the bounds from which the target is sampled i.e.
-            {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         target_sampling_strategy : str, optional
             Whether the target bounds from which we sample the target goal position are
             relative to the global frame 'global' or relative to the current
             end-effector pose 'local', by default None.
         init_pose : dict, optional
             A dictionary of names and values that define the initial configuration i.e.
-            {x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}, by
+            ``{x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}``, by
             default None.
         init_pose_bounds : dict, optional
             A dictionary with the bounds from which the initial robot pose is sampled
-            i.e. {x_min, y_min, z_min, x_max, y_max, x_max}, by default None.
+            i.e. ``{x_min, y_min, z_min, x_max, y_max, x_max}``, by default None.
         init_obj_pose : dict, optional
             A dictionary that contains the initial object pose i.e.
-            {x, y, z, rx, ry, rz, rw}, by default None. The object will be spawned
+            ``{x, y, z, rx, ry, rz, rw}``, by default None. The object will be spawned
             relative to this pose in a region defined by the obj_bounds.
         obj_bounds : dict, optional
             A dictionary in which the bounds for sampling the object
-            positions is defined {x_min, y_min, x_max, y_max}, by default None. These
-            bounds are relative to the init_obj_pose.
+            positions is defined ``{x_min, y_min, x_max, y_max}``, by default None.
+            These bounds are relative to the init_obj_pose.
         n_actions : int, optional
             The size of the action space you want to use. When the 'action_space_joints'
             variable is supplied this variable is ignored and the length of the
@@ -206,14 +206,20 @@ def __init__(
             None.
         robot_arm_control_type : str, optional
             The type of control you want to use for the robot arm. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default None.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default None.
         robot_hand_control_type : str, optional
             The type of control you want to use for the robot hand. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or 'ee_control'
-            , by default None.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``, by default None.
+
+        Note
+        -------
+            If the default value for a argument is set to None this means the default
+            values are used. These values can be set in the
+            :env_config:`env_config.yaml <>` file.
         """
 
         # Log message
@@ -546,11 +552,11 @@ def _compute_reward(self, observations, done):
         observations : dict
             Dictionary containing the observations
         done : bool
-            Bool specifying whether an episode is terminated.
+            Boolean specifying whether an episode is terminated.
 
         Returns
         -------
-        np.float32
+        :obj:`numpy.float32`
             Reward that is received by the agent.
         """
 
@@ -864,7 +870,7 @@ def _is_done(self, observations):
         Returns
         -------
         bool
-            Bool specifying whether the episode is done (e.i. distance to the goal is
+            Boolean specifying whether the episode is done (e.i. distance to the goal is
             within the distance threshold, robot has fallen etc.).
         """
 
@@ -1058,7 +1064,7 @@ def _set_init_pose(self):
 
         Returns
         -------
-        Boolean
+        bool
             Boolean specifying whether reset was successful.
         """
 
@@ -1270,7 +1276,7 @@ def _robot_get_obs(self, data):
 
         Returns
         -------
-        np.array
+        numpy.array
            Robot Positions, Robot Velocities
         """
 
@@ -1289,14 +1295,14 @@ def _goal_distance(self, goal_a, goal_b):
 
         Parameters
         ----------
-        goal_a : np.array
+        goal_a : numpy.ndarray
             List containing a gripper and object pose.
-        goal_b : np.array
+        goal_b : numpy.ndarray
             List containing a gripper and object pose.
 
         Returns
         -------
-        np.float32
+        numpyp.float32
             Perpendicular distance to the goal.
         """
         assert goal_a.shape == goal_b.shape
@@ -1308,9 +1314,9 @@ def _sample_achieved_goal(self, ee_pos, object_pos):
 
         Parameters
         ----------
-        ee_pos : np.array
+        ee_pos : numpy.ndarray
             Gripper position.
-        object_pos : np.array
+        object_pos : numpy.ndarray
             Object position.
 
         Returns
@@ -1802,21 +1808,21 @@ def _process_input_arguments(
             default None.
         target_bounds : dict
             A dictionary with the bounds from which the target is sampled i.e.
-            {x_min, y_min, z_min, x_max, y_max, x_max}.
+            ``{x_min, y_min, z_min, x_max, y_max, x_max}``.
         target_sampling_strategy : str
             Whether the target bounds from which we sample the target goal position are
             relative to the global frame 'global' or relative to the current
             end-effector pose 'local'.
         init_pose : dict
             A dictionary of names and values that define the initial configuration i.e.
-            {x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}, by
+            ``{x, y, z, rx, ry, rz, rw, panda_finger_joint1, panda_fingerjoint_2}``, by
             default None.
         init_pose_bounds : dict
             A dictionary with the bounds from which the initial robot pose is sampled
-            i.e. {x_min, y_min, z_min, x_max, y_max, x_max}.
+            i.e. ``{x_min, y_min, z_min, x_max, y_max, x_max}``.
         init_obj_pose : dict
             A dictionary that contains the initial object pose i.e.
-            {x, y, z, rx, ry, rz, rw}. The object will be spawned
+            ``{x, y, z, rx, ry, rz, rw}``. The object will be spawned
             relative to this pose in a region defined by the obj_bounds.
         obj_bounds : dict
             A dictionary in which the bounds for sampling the object
@@ -1837,14 +1843,14 @@ def _process_input_arguments(
             to be equal to the length of the 'action_space_joints' list.
         robot_arm_control_type : str
             The type of control you want to use for the robot arm. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or
-            'ee_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``.
         robot_hand_control_type : str
             The type of control you want to use for the robot hand. Options are
-            'joint_trajectory_control', 'joint_position_control', 'joint_effort_control'
-            'joint_group_position_control', 'joint_group_effort_control' or
-            'ee_control'.
+            ``joint_trajectory_control``, ``joint_position_control``,
+            ``joint_effort_control``, ``joint_group_position_control``,
+            ``joint_group_effort_control`` or ``ee_control``.
         """
 
         # Overload configuration with input argument if supplied
@@ -1959,8 +1965,9 @@ def _process_input_arguments(
             )
 
     def _validate_action_space_joints(self):
-        """Checks whether the joints in the 'action_space_joints' arguments are valid. If
-        this is not the case a ROS error will be thrown and the node will be shutdown.
+        """Checks whether the joints in the ``action_space_joints`` arguments are valid.
+        If this is not the case a ROS error will be thrown and the node will be
+        shutdown.
         """
 
         # Validate if the number of action_space_joints is equal to the action space
diff --git a/panda_openai_sim/src/panda_openai_sim/errors.py b/panda_openai_sim/src/panda_openai_sim/errors.py
index c881ad2..8df80eb 100644
--- a/panda_openai_sim/src/panda_openai_sim/errors.py
+++ b/panda_openai_sim/src/panda_openai_sim/errors.py
@@ -1,4 +1,4 @@
-"""A number of custom ROS errors that are used in the `panda_openai_sim` package."""
+"""Module containing several custom ROS errors."""
 
 
 # Main python imports
@@ -118,9 +118,10 @@ def arg_keys_error(arg_name, missing_keys=[], extra_keys=[], shutdown=True):
     arg_name : str
         The name of the argument.
     missing_keys : list, optional
-        The dictionary keys that were missing from the input argument, by default [].
+        The dictionary keys that were missing from the input argument, by default
+        ``[]``.
     extra_keys : list
-        The dictionary keys that were present but should not be, by default [].
+        The dictionary keys that were present but should not be, by default ``[]``.
     shutdown : bool, optional
         Whether to shutdown the ROS node after the error has been thrown, by default
         True.
diff --git a/panda_openai_sim/src/panda_openai_sim/exceptions.py b/panda_openai_sim/src/panda_openai_sim/exceptions.py
index 1d685e8..a31dc4e 100644
--- a/panda_openai_sim/src/panda_openai_sim/exceptions.py
+++ b/panda_openai_sim/src/panda_openai_sim/exceptions.py
@@ -1,4 +1,4 @@
-﻿"""A number of custom exceptions that are used in the `panda_openai_sim` package."""
+﻿"""Module containing several custom exceptions."""
 
 # Main python imports
 from __future__ import print_function
@@ -26,9 +26,10 @@ def __init__(self, message="", log_message="", **details):
         Parameters
         ----------
         message : str, optional
-            Exception message specifying whether the exception occurred, by default "".
+            Exception message specifying whether the exception occurred, by default
+            ``""``.
         log_message : str, optional
-            Full log message, by default "".
+            Full log message, by default ``""``.
         details : dict
             Additional dictionary that can be used to supply the user with more details
             about why the exception occurred.
@@ -60,9 +61,10 @@ def __init__(self, message="", log_message="", **details):
         Parameters
         ----------
         message : str, optional
-            Exception message specifying whether the exception occurred, by default "".
+            Exception message specifying whether the exception occurred, by default
+            ``""``.
         log_message : str, optional
-            Full log message, by default "".
+            Full log message, by default ``""``.
         details : dict
             Additional dictionary that can be used to supply the user with more details
             about why the exception occurred.
@@ -94,9 +96,10 @@ def __init__(self, message="", log_message="", **details):
         Parameters
         ----------
         message : str, optional
-            Exception message specifying whether the exception occurred, by default "".
+            Exception message specifying whether the exception occurred, by default
+            ``""``.
         log_message : str, optional
-            Full log message, by default "".
+            Full log message, by default ``""``.
         details : dict
             Additional dictionary that can be used to supply the user with more details
             about why the exception occurred.
@@ -128,9 +131,10 @@ def __init__(self, message="", log_message="", **details):
         Parameters
         ----------
         message : str, optional
-            Exception message specifying whether the exception occurred, by default "".
+            Exception message specifying whether the exception occurred, by default
+            ``""``.
         log_message : str, optional
-            Full log message, by default "".
+            Full log message, by default ``""``.
         details : dict
             Additional dictionary that can be used to supply the user with more details
             about why the exception occurred.
@@ -162,9 +166,10 @@ def __init__(self, message="", log_message="", **details):
         Parameters
         ----------
         message : str, optional
-            Exception message specifying whether the exception occurred, by default "".
+            Exception message specifying whether the exception occurred, by default
+            ``""``.
         log_message : str, optional
-            Full log message, by default "".
+            Full log message, by default ``""``.
         details : dict
             Additional dictionary that can be used to supply the user with more details
             about why the exception occurred.
@@ -196,9 +201,10 @@ def __init__(self, message="", log_message="", **details):
         Parameters
         ----------
         message : str, optional
-            Exception message specifying whether the exception occurred, by default "".
+            Exception message specifying whether the exception occurred, by default
+            ``""``.
         log_message : str, optional
-            Full log message, by default "".
+            Full log message, by default ``""``.
         details : dict
             Additional dictionary that can be used to supply the user with more details
             about why the exception occurred.
@@ -230,9 +236,10 @@ def __init__(self, message="", log_message="", **details):
         Parameters
         ----------
         message : str, optional
-            Exception message specifying whether the exception occurred, by default "".
+            Exception message specifying whether the exception occurred, by default
+            ``""``.
         log_message : str, optional
-            Full log message, by default "".
+            Full log message, by default ``""``.
         details : dict
             Additional dictionary that can be used to supply the user with more details
             about why the exception occurred.
diff --git a/panda_openai_sim/src/panda_openai_sim/extras/__init__.py b/panda_openai_sim/src/panda_openai_sim/extras/__init__.py
index ee25b00..25a7625 100644
--- a/panda_openai_sim/src/panda_openai_sim/extras/__init__.py
+++ b/panda_openai_sim/src/panda_openai_sim/extras/__init__.py
@@ -1,4 +1,8 @@
-﻿# Python 2 and 3:
+﻿"""Module that contains some extra components (classes and functions) that are used for
+creating the Panda openai :gym:`gym <>` environment.
+"""
+
+# Python 2 and 3:
 # To make Py2 code safer (more like Py3) by preventing
 # implicit relative imports, you can also add this to the top:
 from __future__ import absolute_import
diff --git a/panda_openai_sim/src/panda_openai_sim/extras/quaternion.py b/panda_openai_sim/src/panda_openai_sim/extras/quaternion.py
index bae65d0..0de65a0 100644
--- a/panda_openai_sim/src/panda_openai_sim/extras/quaternion.py
+++ b/panda_openai_sim/src/panda_openai_sim/extras/quaternion.py
@@ -34,15 +34,12 @@ class Quaternion(QuaternionParent):
     norm : float
         The quaternion norm.
 
-    Class Methods
-    -------------
-    normalize_quaternion(cls, quaternion):
-        Normalizes a given quaternion.
-    quaternion_norm(cls, Quaternion):
-        Calculates the norm of a quaternion.
-
     Methods
     -------
+    quaternion_norm(cls, Quaternion):
+        Calculates the norm of a quaternion.
+    normalize_quaternion(cls, quaternion):
+        Normalizes a given quaternion.
     normalize():
         Normalize the quaternion.
     """
diff --git a/panda_openai_sim/src/panda_openai_sim/extras/sample_region_marker.py b/panda_openai_sim/src/panda_openai_sim/extras/sample_region_marker.py
index 3265f20..1b49100 100644
--- a/panda_openai_sim/src/panda_openai_sim/extras/sample_region_marker.py
+++ b/panda_openai_sim/src/panda_openai_sim/extras/sample_region_marker.py
@@ -1,8 +1,8 @@
-"""Class used for displaying the a sampling region in rviz. This class overloads "
-the visualization_msgs.msgs.Marker class in order to pre-initialize some of its "
-attributes. It further also adds the ability to specify the marker scale using x, y, z
-max and min values. To visualize the marker you can publish it on the
- 'panda_openai_sim/goal_region' topic.
+"""Class used for displaying the a sampling region in rviz. This class overloads
+the :visualization_msgs:`visualization_msgs.msgs.Marker <html/msg/Marker.html>`
+class in order to pre-initialize some of its attributes. It further also adds the
+ability to specify the marker scale using ``x``, ``y``, ``z`` max and min values. To
+visualize the marker you can publish it on the ``panda_openai_sim/goal_region`` topic.
 """
 
 # Main python 2/3 compatibility imports
@@ -46,7 +46,7 @@ class SampleRegionMarker(Marker):
         The max z position of the marker.
     id : int
         The marker object id.
-    type : string
+    type : str
         The marker type.
     action : float
         The marker message action (add or remove).
@@ -59,15 +59,15 @@ class SampleRegionMarker(Marker):
     lifetime : duration
         The lifetime duration.
     frame_locked : bool
-        Bool specifying whether the marker frame is locked to the world.
+        Boolean specifying whether the marker frame is locked to the world.
     point : geometry_msgs.Point
         The marker points.
-    string : string
+    string : str
         A description.
-    mesh_resource : string
+    mesh_resource : str
         Marker mess location.
     mesh_use_embedded_materials : bool
-        Bool specifying whether we want to use a mesh.
+        Boolean specifying whether we want to use a mesh.
     """
 
     def __init__(
diff --git a/panda_openai_sim/src/panda_openai_sim/extras/target_marker.py b/panda_openai_sim/src/panda_openai_sim/extras/target_marker.py
index 6d02705..18c7a85 100644
--- a/panda_openai_sim/src/panda_openai_sim/extras/target_marker.py
+++ b/panda_openai_sim/src/panda_openai_sim/extras/target_marker.py
@@ -1,6 +1,7 @@
-﻿"""Class used for displaying a gasp target goal marker in rviz. This class overloads the "
-visualization_msgs.msgs.Marker class in order to pre-initialize some of its attributes.
-To visualize the marker you can publish it on the 'panda_openai_sim/current_goal' topic.
+﻿"""Class used for displaying a gasp target goal marker in rviz. This class overloads
+the :visualization_msgs:`visualization_msgs.msgs.Marker <html/msg/Marker.html>`
+class in order to pre-initialize some of its attributes. To visualize the marker you
+can publish it on the ``panda_openai_sim/current_goal`` topic.
 """
 
 # Main python 2/3 compatibility imports
@@ -32,7 +33,7 @@ class TargetMarker(Marker):
     ----------
     id : int
         The marker object id.
-    type : string
+    type : str
         The marker type.
     action : float
         The marker message action (add or remove).
@@ -45,15 +46,15 @@ class TargetMarker(Marker):
     lifetime : duration
         The lifetime duration.
     frame_locked : bool
-        Bool specifying whether the marker frame is locked to the world.
+        Boolean specifying whether the marker frame is locked to the world.
     point : geometry_msgs.Point
         The marker points.
-    string : string
+    string : str
         A description.
-    mesh_resource : string
+    mesh_resource : str
         Marker mess location.
     mesh_use_embedded_materials : bool
-        Bool specifying whether we want to use a mesh.
+        Boolean specifying whether we want to use a mesh.
     """
 
     def __init__(self, *args, **kwds):
diff --git a/panda_openai_sim/src/panda_openai_sim/functions.py b/panda_openai_sim/src/panda_openai_sim/functions.py
index 4812a64..20650f5 100644
--- a/panda_openai_sim/src/panda_openai_sim/functions.py
+++ b/panda_openai_sim/src/panda_openai_sim/functions.py
@@ -1,4 +1,5 @@
-"""Some additional helpfull helper functions for the panda_openai_sim package.
+"""Module containing some additional functions used in the
+:panda_openai_sim:`panda_openai_sim <>` package.
 """
 
 # Main python imports
@@ -17,6 +18,7 @@
 from geometry_msgs.msg import Pose
 from actionlib_msgs.msg import GoalStatusArray
 from trajectory_msgs.msg import JointTrajectoryPoint
+import control_msgs.msg as control_msgs
 
 from panda_openai_sim.msg import FollowJointTrajectoryGoal
 from panda_openai_sim.srv import SetJointPositionsRequest
@@ -57,6 +59,32 @@ def action_dict_2_joint_trajectory_msg(action_dict):
     return goal_msg
 
 
+def panda_action_msg_2_control_msgs_action_msg(panda_action_msg):
+    """Converts a panda_openai_sim FollowJointTrajectoryActionGoal action message
+    into a
+    :control_msgs:`control_msgs/FollowJointTrajectoryGoal
+    <html/action/FollowJointTrajectory.html>` action message.
+
+    Parameters
+    ----------
+    panda_action_msg : panda_openai_sim.msg.FollowJointTrajectoryGoal
+        panda_openai_sim follow joint trajectory goal message.
+
+    Returns
+    -------
+    control_msgs.msg.FollowJointTrajectoryGoal
+        Control_msgs follow joint trajectory goal message
+    """
+
+    # Fill new control_msgs.msg.FollowJointTrajectoryGoal message
+    control_msgs_action_msg = control_msgs.FollowJointTrajectoryGoal()
+    control_msgs_action_msg.trajectory = panda_action_msg.trajectory
+    control_msgs_action_msg.goal_time_tolerance = panda_action_msg.goal_time_tolerance
+    control_msgs_action_msg.goal_tolerance = panda_action_msg.goal_tolerance
+    control_msgs_action_msg.path_tolerance = panda_action_msg.path_tolerance
+    return control_msgs_action_msg
+
+
 def joint_positions_2_follow_joint_trajectory_goal(joint_positions, time_from_start=1):
     """Converts a dictionary of joint_positions into a FollowJointTrajectoryGoal
     msgs.
@@ -104,9 +132,9 @@ def joint_positions_2_follow_joint_trajectory_goal(joint_positions, time_from_st
 
 
 def model_state_msg_2_link_state_dict(link_state_msgs):
-    """Converts the a gazebo_msgs/ModelState message into a panda_state dictionary.
-    Contrary to the original ModelState message, in the model_state dictionary the
-    poses and twists are grouped per link/model.
+    """Converts the a :gazebo_msgs:`gazebo_msgs/ModelState <html/msg/ModelState.html>`
+    message into a panda_state dictionary. Contrary to the original ModelState message,
+    in the model_state dictionary the poses and twists are grouped per link/model.
 
     Parameters
     ----------
@@ -133,8 +161,8 @@ def model_state_msg_2_link_state_dict(link_state_msgs):
 
 
 def controller_list_array_2_dict(controller_list_msgs):
-    """Converts a controller_manager list_controllers message into a
-    controller information dictionary.
+    """Converts a :controller_manager_msgs:`Controller_manager/list_controllers
+    <html/srv/ListControllers.html>` message into a controller information dictionary.
 
     Parameters
     ----------
@@ -158,13 +186,14 @@ def controller_list_array_2_dict(controller_list_msgs):
 
 
 def pose_dict_2_pose_msg(pose_dict):
-    """Create a 'geometry_msgs.msg.Pose' message out of a panda_openai_sim pose dictionary
-    {x, y, z, rx, ry, rz, rw}.
+    """Create a :geometry_msgs:`geometry_msgs.msg.Pose<html/msg/Pose.html>` message out
+    of a panda_openai_sim pose dictionary ``{x, y, z, rx, ry, rz, rw}``.
 
     Parameters
     ----------
     pose_dict : dict
-        Dict containing the object position {x, y, z} and orientation {rx, ry, rz, rw}
+        Dict containing the object position ``{x, y, z}`` and orientation
+        ``{rx, ry, rz, rw}``
     """
 
     # Create pose message out of a panda_openai_sim pose dict
@@ -182,9 +211,8 @@ def pose_dict_2_pose_msg(pose_dict):
 
 
 def pose_msg_2_pose_dict(pose_msg):
-    """Create a panda_openai_sim pose dictionary {x, y, z, rx, ry, rz, rw} out of a
-    'geometry_msgs.msg.Pose' message.
-    .
+    """Create a panda_openai_sim pose dictionary ``{x, y, z, rx, ry, rz, rw}`` out of a
+    :geometry_msgs:`geometry_msgs.msg.Pose<html/msg/Pose.html>` message.
 
     Parameters
     ----------
@@ -240,7 +268,7 @@ def translate_actionclient_result_error_code(actionclient_retval):
 
 
 def translate_gripper_width_2_finger_joint_commands(input_dict):
-    """Translate any 'gripper_width' keys that are present in the action dictionary
+    """Translate any ``gripper_width`` keys that are present in the action dictionary
     into the corresponding finger joint control commands which are used by the
     controllers.
 
@@ -309,7 +337,8 @@ def lower_first_char(string):
     string
         The de-capitalized string.
 
-    .. note::
+    Note
+    ---------
         This function is not the exact opposite of the capitalize function of the
         standard library. For example, capitalize('abC') returns Abc rather than AbC.
     """
@@ -347,8 +376,8 @@ def list_2_human_text(input_list, seperator=",", end_seperator="&"):
     """Function converts a list of values into human readable sentence.
 
     Example:
-        Using this function a list of 4 items '[item1, item2, item3, item4]' becomes
-        'item2, item3 and item4'.
+        Using this function a list of 4 items ``[item1, item2, item3, item4]`` becomes
+        ``item2, item3 and item4``.
 
     Parameters
     ----------
@@ -361,7 +390,7 @@ def list_2_human_text(input_list, seperator=",", end_seperator="&"):
         A human readable string that can be printed.
     """
 
-    # Add spaces around seperators if not present
+    # Add spaces around separators if not present
     seperator = wrap_space_around(seperator)[1:]
     end_seperator = wrap_space_around(end_seperator)
 
@@ -437,15 +466,18 @@ def log_pose_dict(qpose, header="q_pose", level="DEBUG"):
 def split_dict(input_dict, *args):
     """Split a dictionary into smaller dictionaries based on the keys.
 
-    example:
+    Example
+    -----------
+
     .. code-block:: python
+
         split_dict_list = split_dict(input_dict,["first_dict_key1","first_dict_key2"],
         ["second_dict_key1", "second_dict_key2"])
 
     Parameters
     ----------
-    input_dict : [type]
-        [description]
+    input_dict : dict
+        Input dictionary.
     *args : list
         Lists containing the keys you want to have in the successive dictionaries.
 
@@ -465,7 +497,7 @@ def split_dict(input_dict, *args):
 
 def split_bounds_dict(bounds_dict):
     """Splits the bounding region dictionary into two separate bounding dictionaries,
-    one for the ee_pose and one fore the joint_pose.
+    one for the ``ee_pose`` and one fore the ``joint_pose``.
 
     Parameters
     ----------
@@ -504,8 +536,8 @@ def split_bounds_dict(bounds_dict):
 
 
 def split_pose_dict(pose_dict):
-    """Splits a pose dictionary into two separate pose dictionaries, one for the ee_pose
-     and one fore the joint_pose.
+    """Splits a pose dictionary into two separate pose dictionaries, one for the
+    ``ee_pose`` and one fore the ``joint_pose``.
 
     Parameters
     ----------
@@ -574,7 +606,7 @@ def dict_clean(input_dict):
 
 
 def get_unique_list(input_list):
-    """Removes non-unique items from a list
+    """Removes non-unique items from a list.
 
     Parameters
     ----------
@@ -722,7 +754,7 @@ def has_invalid_type(variable, variable_types, items_types=None, depth=0):
 
 def contains_keys(input_dict, required_keys, exclusive=True):
     """Function used to check if a dictionary contains the required keys. If the
-    'required_keys' argument contains a nested list it checks whether at least one of
+    ``required_keys`` argument contains a nested list it checks whether at least one of
     the nested_list elements is present.
 
     Parameters
@@ -814,8 +846,8 @@ def has_invalid_value(variable, valid_values):
 # Other functions ###############################
 #################################################
 def find_gazebo_model_path(model_name, model_folder_path, extension=""):
-    """Finds the path of the sdf or urdf file that belongs to a given 'model_name'.
-    This is done by searching in the 'panda_openai_sim' models folder.
+    """Finds the path of the ``sdf`` or ``urdf`` file that belongs to a given
+    ``model_name``. This is done by searching in the ``model_folder_path`` folder.
 
     Parameters
     ----------
@@ -824,13 +856,14 @@ def find_gazebo_model_path(model_name, model_folder_path, extension=""):
     model_folder_path : str
         The path of the folder that contains the gazebo models.
     extension : str, optional
-        The model path extension, by default "".
+        The model path extension, by default ``""``.
 
     Returns
     -------
     str, str
-        The path where the 'sdf' or 'urdf' model file can be found and the extension of
-        the model file. If not file was found the model file path is returned empty.
+        The path where the ``sdf`` or ``urdf`` model file can be found and the
+        extension of the model file. If not file was found the model file path is
+        returned empty.
     """
 
     # Add dot to extension if needed
@@ -900,7 +933,7 @@ def action_server_exists(topic_name):
     Returns
     -------
     bool
-        Bool specifying whether the action service exists.
+        Boolean specifying whether the action service exists.
     """
 
     # Strip action server specific topics from topic name
diff --git a/panda_openai_sim/test/manual/panda_control_server_test.py b/panda_openai_sim/test/manual/panda_control_server_test.py
index 0fbaf82..1fda982 100644
--- a/panda_openai_sim/test/manual/panda_control_server_test.py
+++ b/panda_openai_sim/test/manual/panda_control_server_test.py
@@ -147,266 +147,267 @@
     # retval = set_hand_joint_effort_srv.call(set_hand_joint_efforts_msg)
     # print(retval.message)
 
-    ######## - TEST SET JOINT POSITIONS - #########
-    #%% /panda_control_server/set_joint_positions test
-
-    # Connect to /panda_control_server/set_joint_positions
-    rospy.logdebug("Connecting to '/panda_control_server/set_joint_positions' service.")
-    rospy.wait_for_service("/panda_control_server/set_joint_positions", timeout=10)
-    set_joint_positions_srv = rospy.ServiceProxy(
-        "/panda_control_server/set_joint_positions", SetJointPositions
-    )
-    rospy.logdebug("Connected to 'panda_control_server/set_joint_positions' service!")
-
-    # Generate joint_efforts msg
-    set_joint_positions_msg = SetJointPositionsRequest()
-    set_joint_positions_msg.joint_names = [
-        "panda_finger_joint1",
-        "panda_finger_joint2",
-        "panda_joint1",
-        "panda_joint2",
-        "panda_joint3",
-        "panda_joint4",
-        "panda_joint5",
-        "panda_joint6",
-        "panda_joint7",
-    ]
-    # set_joint_positions_msg.joint_positions = (1,)
-    # set_joint_positions_msg.joint_positions = [1.5, 2, 4, 5]
-    # set_joint_positions_msg.joint_positions = [
-    #     0.0,
-    #     0.0,
-    #     0.0,
-    #     1.5,
-    #     1.5,
-    #     0.0,
-    #     0.0,
-    #     0.0,
+    # ######## - TEST SET JOINT POSITIONS - #########
+    # #%% /panda_control_server/set_joint_positions test
+
+    # # Connect to /panda_control_server/set_joint_positions
+    # rospy.logdebug("Connecting to '/panda_control_server/set_joint_positions' service.")
+    # rospy.wait_for_service("/panda_control_server/set_joint_positions", timeout=10)
+    # set_joint_positions_srv = rospy.ServiceProxy(
+    #     "/panda_control_server/set_joint_positions", SetJointPositions
+    # )
+    # rospy.logdebug("Connected to 'panda_control_server/set_joint_positions' service!")
+
+    # # Generate joint_efforts msg
+    # set_joint_positions_msg = SetJointPositionsRequest()
+    # set_joint_positions_msg.joint_names = [
+    #     "panda_finger_joint1",
+    #     "panda_finger_joint2",
+    #     "panda_joint1",
+    #     "panda_joint2",
+    #     "panda_joint3",
+    #     "panda_joint4",
+    #     "panda_joint5",
+    #     "panda_joint6",
+    #     "panda_joint7",
     # ]
+    # # set_joint_positions_msg.joint_positions = (1,)
+    # # set_joint_positions_msg.joint_positions = [1.5, 2, 4, 5]
+    # # set_joint_positions_msg.joint_positions = [
+    # #     0.0,
+    # #     0.0,
+    # #     0.0,
+    # #     1.5,
+    # #     1.5,
+    # #     0.0,
+    # #     0.0,
+    # #     0.0,
+    # # ]
+    # # set_joint_positions_msg.joint_positions = [
+    # #     1.5,
+    # #     0.0,
+    # #     1.0,
+    # #     1.5,
+    # #     1.5,
+    # #     1.0,
+    # #     1.0,
+    # #     0.02,
+    # # ]
     # set_joint_positions_msg.joint_positions = [
-    #     1.5,
     #     0.0,
-    #     1.0,
-    #     1.5,
-    #     1.5,
-    #     1.0,
-    #     1.0,
-    #     0.02,
+    #     0.0,
+    #     0.2,
+    #     0.2,
+    #     0.2,
+    #     0.2,
+    #     0.2,
+    #     0.2,
+    #     0.2,
     # ]
-    set_joint_positions_msg.joint_positions = [
-        0.0,
-        0.0,
-        0.2,
-        0.2,
-        0.2,
-        0.2,
-        0.2,
-        0.2,
-        0.2,
-    ]
-    # set_arm_joint_positions_msg.joint_positions = [0.0, 1.5]
-    set_joint_positions_msg.wait = True
-    # set_joint_positions_msg.joint_names = ["panda_finger_joint1", "panda_joint2"]
-    retval = set_joint_positions_srv.call(set_joint_positions_msg)
-    print(retval.message)
-
-    # ######## - TEST SET ARM JOINT POSITIONS - #########
-    # #%% /panda_control_server/panda_arm/set_joint_positions test
-
-    # Connect to /panda_control_server/set_joint_positions
-    rospy.logdebug(
-        "Connecting to '/panda_control_server/panda_arm/set_joint_positions' service."
-    )
-    rospy.wait_for_service(
-        "/panda_control_server/panda_arm/set_joint_positions", timeout=10
-    )
-    set_arm_joint_positions_srv = rospy.ServiceProxy(
-        "/panda_control_server/panda_arm/set_joint_positions", SetJointPositions
-    )
-    rospy.logdebug(
-        "Connected to 'panda_control_server/panda_arm/set_joint_positions' service!"
-    )
+    # # set_arm_joint_positions_msg.joint_positions = [0.0, 1.5]
+    # set_joint_positions_msg.wait = True
+    # # set_joint_positions_msg.joint_names = ["panda_finger_joint1", "panda_joint2"]
+    # retval = set_joint_positions_srv.call(set_joint_positions_msg)
+    # print(retval.message)
 
-    # Generate set_arm_joint_positions_msg
-    set_arm_joint_positions_msg = SetJointPositionsRequest()
-    set_arm_joint_positions_msg.joint_names = ["panda_joint5", "panda_joint6"]
-    set_arm_joint_positions_msg.joint_positions = [1.5, 2]
-    # set_arm_joint_positions_msg.joint_positions = [
-    # 0.0,
-    # 0.0,
-    # 0.0,
-    # 1.5,
-    # 1.5,
-    # 0.0,
-    # 0.0,
-    # ]
-    # set_arm_joint_positions_msg.joint_positions = [
-    #     1.5,
-    #     1.0,
-    #     1.0,
-    #     1.5,
-    #     1.5,
-    #     1.0,
-    #     1.0,
-    # ]
-    # set_arm_joint_positions_msg.joint_positions = [0.0, 1.5]
-    set_arm_joint_positions_msg.wait = True
-    retval = set_arm_joint_positions_srv.call(set_arm_joint_positions_msg)
-    print(retval.message)
-
-    # ######## - TEST SET HAND JOINT POSITIONS - #########
-    # %% /panda_control_server/panda_hand/set_joint_positions test
-    # Connect to /panda_control_server/set_joint_positions
-    rospy.logdebug(
-        "Connecting to '/panda_control_server/panda_hand/set_joint_positions' service."
-    )
-    rospy.wait_for_service(
-        "/panda_control_server/panda_hand/set_joint_positions", timeout=10
-    )
-    set_hand_joint_positions_srv = rospy.ServiceProxy(
-        "/panda_control_server/panda_hand/set_joint_positions", SetJointPositions
-    )
-    rospy.logdebug(
-        "Connected to 'panda_control_server/panda_hand/set_joint_positions' service!"
-    )
+    # # ######## - TEST SET ARM JOINT POSITIONS - #########
+    # # #%% /panda_control_server/panda_arm/set_joint_positions test
 
-    # Generate joint_efforts msg
-    set_hand_joint_positions_msg = SetJointPositionsRequest()
-    set_hand_joint_positions_msg.joint_names = [
-        "panda_finger_joint1",
-        "panda_finger_joint2",
-    ]
-    set_hand_joint_positions_msg.joint_positions = [0.04, 0.04]
-    set_hand_joint_positions_msg.wait = True
-    retval = set_hand_joint_positions_srv.call(set_hand_joint_positions_msg)
-    print(retval.message)
+    # # Connect to /panda_control_server/set_joint_positions
+    # rospy.logdebug(
+    #     "Connecting to '/panda_control_server/panda_arm/set_joint_positions' service."
+    # )
+    # rospy.wait_for_service(
+    #     "/panda_control_server/panda_arm/set_joint_positions", timeout=10
+    # )
+    # set_arm_joint_positions_srv = rospy.ServiceProxy(
+    #     "/panda_control_server/panda_arm/set_joint_positions", SetJointPositions
+    # )
+    # rospy.logdebug(
+    #     "Connected to 'panda_control_server/panda_arm/set_joint_positions' service!"
+    # )
 
-    # # # ######## - TEST SET JOINT TRAJ SERVICE - #########
+    # # Generate set_arm_joint_positions_msg
+    # set_arm_joint_positions_msg = SetJointPositionsRequest()
+    # set_arm_joint_positions_msg.joint_names = ["panda_joint5", "panda_joint6"]
+    # set_arm_joint_positions_msg.joint_positions = [1.5, 2]
+    # # set_arm_joint_positions_msg.joint_positions = [
+    # # 0.0,
+    # # 0.0,
+    # # 0.0,
+    # # 1.5,
+    # # 1.5,
+    # # 0.0,
+    # # 0.0,
+    # # ]
+    # # set_arm_joint_positions_msg.joint_positions = [
+    # #     1.5,
+    # #     1.0,
+    # #     1.0,
+    # #     1.5,
+    # #     1.5,
+    # #     1.0,
+    # #     1.0,
+    # # ]
+    # # set_arm_joint_positions_msg.joint_positions = [0.0, 1.5]
+    # set_arm_joint_positions_msg.wait = True
+    # retval = set_arm_joint_positions_srv.call(set_arm_joint_positions_msg)
+    # print(retval.message)
 
-    # # Create action client
-    # follow_joint_traj_client = actionlib.SimpleActionClient(
-    #     "/panda_control_server/follow_joint_trajectory", FollowJointTrajectoryAction,
+    # # ######## - TEST SET HAND JOINT POSITIONS - #########
+    # # %% /panda_control_server/panda_hand/set_joint_positions test
+    # # Connect to /panda_control_server/set_joint_positions
+    # rospy.logdebug(
+    #     "Connecting to '/panda_control_server/panda_hand/set_joint_positions' service."
+    # )
+    # rospy.wait_for_service(
+    #     "/panda_control_server/panda_hand/set_joint_positions", timeout=10
+    # )
+    # set_hand_joint_positions_srv = rospy.ServiceProxy(
+    #     "/panda_control_server/panda_hand/set_joint_positions", SetJointPositions
+    # )
+    # rospy.logdebug(
+    #     "Connected to 'panda_control_server/panda_hand/set_joint_positions' service!"
     # )
 
-    # # Waits until the action server has started up and started
-    # # listening for goals.
-    # retval = follow_joint_traj_client.wait_for_server(timeout=rospy.Duration(5))
-    # if not retval:
-    #     rospy.logerr("Shutting down")
-    #     sys.exit(0)
-
-    # # Create action client goal
-    # header = Header()
-    # # header.stamp = rospy.get_rostime()
-    # goal = FollowJointTrajectoryGoal()
-    # # goal.trajectory.joint_names = [
-    # #     "panda_joint1",
-    # #     "panda_joint2",
-    # #     "panda_joint3",
-    # #     "panda_joint4",
-    # #     "panda_joint5",
-    # #     "panda_joint6",
-    # #     "panda_joint7",
-    # # ]
-    # point = JointTrajectoryPoint()
-    # point.positions = [
-    #     0.02,
-    #     0.02,
+    # # Generate joint_efforts msg
+    # set_hand_joint_positions_msg = SetJointPositionsRequest()
+    # set_hand_joint_positions_msg.joint_names = [
+    #     "panda_finger_joint1",
+    #     "panda_finger_joint2",
     # ]
-    # point.time_from_start.secs = 1.0
-    # goal.trajectory.points.append(point)
-    # # goal.trajectory.header = header
-    # # goal.goal_time_tolerance.secs = 5
-
-    # # Send goal
-    # follow_joint_traj_client.send_goal(goal)
-    # follow_joint_traj_client.wait_for_result()
-    # result = follow_joint_traj_client.get_result()
-    # print(result)
-
-    # # # ######## - TEST SET ARM JOINT TRAJ SERVICE - #########
-
-    # # Create action client
-    # follow_joint_traj_client = actionlib.SimpleActionClient(
-    #     "/panda_control_server/panda_arm/follow_joint_trajectory",
-    #     FollowJointTrajectoryAction,
-    # )
+    # set_hand_joint_positions_msg.joint_positions = [0.04, 0.04]
+    # set_hand_joint_positions_msg.wait = True
+    # retval = set_hand_joint_positions_srv.call(set_hand_joint_positions_msg)
+    # print(retval.message)
 
-    # # Waits until the action server has started up and started
-    # # listening for goals.
-    # retval = follow_joint_traj_client.wait_for_server(timeout=rospy.Duration(5))
-    # if not retval:
-    #     rospy.logerr("Shutting down")
-    #     sys.exit(0)
-
-    # # Create action client goal
-    # header = Header()
-    # # header.stamp = rospy.get_rostime()
-    # goal = FollowJointTrajectoryGoal()
-    # # goal.trajectory.joint_names = [
-    # #     "panda_joint1",
-    # #     "panda_joint2",
-    # #     "panda_joint3",
-    # #     "panda_joint4",
-    # #     "panda_joint5",
-    # #     "panda_joint6",
-    # #     "panda_joint7",
-    # # ]
-    # point = JointTrajectoryPoint()
-    # point.positions = [
-    #     0.02,
-    #     0.02,
+    # # ######## - TEST SET JOINT TRAJ SERVICE - #########
+
+    # Create action client
+    follow_joint_traj_client = actionlib.SimpleActionClient(
+        "/panda_control_server/follow_joint_trajectory", FollowJointTrajectoryAction,
+    )
+
+    # Waits until the action server has started up and started
+    # listening for goals.
+    retval = follow_joint_traj_client.wait_for_server(timeout=rospy.Duration(5))
+    if not retval:
+        rospy.logerr("Shutting down")
+        sys.exit(0)
+
+    # Create action client goal
+    header = Header()
+    # header.stamp = rospy.get_rostime()
+    goal = FollowJointTrajectoryGoal()
+    # goal.trajectory.joint_names = [
+    #     "panda_joint1",
+    #     "panda_joint2",
+    #     "panda_joint3",
+    #     "panda_joint4",
+    #     "panda_joint5",
+    #     "panda_joint6",
+    #     "panda_joint7",
     # ]
-    # point.time_from_start.secs = 1.0
-    # goal.trajectory.points.append(point)
-    # # goal.trajectory.header = header
-    # # goal.goal_time_tolerance.secs = 5
-
-    # # Send goal
-    # follow_joint_traj_client.send_goal(goal)
-    # follow_joint_traj_client.wait_for_result()
-    # result = follow_joint_traj_client.get_result()
-    # print(result)
-
-    # # # ######## - TEST SET HAND JOINT TRAJ SERVICE - #########
-
-    # # Create action client
-    # follow_joint_traj_client = actionlib.SimpleActionClient(
-    #     "/panda_control_server/panda_hand/follow_joint_trajectory", FollowJointTrajectoryAction,
-    # )
+    point = JointTrajectoryPoint()
+    point.positions = [
+        0.02,
+        0.02,
+    ]
+    point.time_from_start.secs = 1.0
+    goal.trajectory.points.append(point)
+    # goal.trajectory.header = header
+    # goal.goal_time_tolerance.secs = 5
+
+    # Send goal
+    follow_joint_traj_client.send_goal(goal)
+    follow_joint_traj_client.wait_for_result()
+    result = follow_joint_traj_client.get_result()
+    print(result)
+
+    # # ######## - TEST SET ARM JOINT TRAJ SERVICE - #########
+
+    # Create action client
+    follow_joint_traj_client = actionlib.SimpleActionClient(
+        "/panda_control_server/panda_arm/follow_joint_trajectory",
+        FollowJointTrajectoryAction,
+    )
 
-    # # Waits until the action server has started up and started
-    # # listening for goals.
-    # retval = follow_joint_traj_client.wait_for_server(timeout=rospy.Duration(5))
-    # if not retval:
-    #     rospy.logerr("Shutting down")
-    #     sys.exit(0)
-
-    # # Create action client goal
-    # header = Header()
-    # # header.stamp = rospy.get_rostime()
-    # goal = FollowJointTrajectoryGoal()
-    # # goal.trajectory.joint_names = [
-    # #     "panda_joint1",
-    # #     "panda_joint2",
-    # #     "panda_joint3",
-    # #     "panda_joint4",
-    # #     "panda_joint5",
-    # #     "panda_joint6",
-    # #     "panda_joint7",
-    # # ]
-    # point = JointTrajectoryPoint()
-    # point.positions = [
-    #     0.02,
-    #     0.02,
+    # Waits until the action server has started up and started
+    # listening for goals.
+    retval = follow_joint_traj_client.wait_for_server(timeout=rospy.Duration(5))
+    if not retval:
+        rospy.logerr("Shutting down")
+        sys.exit(0)
+
+    # Create action client goal
+    header = Header()
+    # header.stamp = rospy.get_rostime()
+    goal = FollowJointTrajectoryGoal()
+    # goal.trajectory.joint_names = [
+    #     "panda_joint1",
+    #     "panda_joint2",
+    #     "panda_joint3",
+    #     "panda_joint4",
+    #     "panda_joint5",
+    #     "panda_joint6",
+    #     "panda_joint7",
     # ]
-    # point.time_from_start.secs = 1.0
-    # goal.trajectory.points.append(point)
-    # # goal.trajectory.header = header
-    # # goal.goal_time_tolerance.secs = 5
-
-    # # Send goal
-    # follow_joint_traj_client.send_goal(goal)
-    # follow_joint_traj_client.wait_for_result()
-    # result = follow_joint_traj_client.get_result()
-    # print(result)
+    point = JointTrajectoryPoint()
+    point.positions = [
+        0.02,
+        0.02,
+    ]
+    point.time_from_start.secs = 1.0
+    goal.trajectory.points.append(point)
+    # goal.trajectory.header = header
+    # goal.goal_time_tolerance.secs = 5
+
+    # Send goal
+    follow_joint_traj_client.send_goal(goal)
+    follow_joint_traj_client.wait_for_result()
+    result = follow_joint_traj_client.get_result()
+    print(result)
+
+    # # ######## - TEST SET HAND JOINT TRAJ SERVICE - #########
+
+    # Create action client
+    follow_joint_traj_client = actionlib.SimpleActionClient(
+        "/panda_control_server/panda_hand/follow_joint_trajectory",
+        FollowJointTrajectoryAction,
+    )
+
+    # Waits until the action server has started up and started
+    # listening for goals.
+    retval = follow_joint_traj_client.wait_for_server(timeout=rospy.Duration(5))
+    if not retval:
+        rospy.logerr("Shutting down")
+        sys.exit(0)
+
+    # Create action client goal
+    header = Header()
+    # header.stamp = rospy.get_rostime()
+    goal = FollowJointTrajectoryGoal()
+    # goal.trajectory.joint_names = [
+    #     "panda_joint1",
+    #     "panda_joint2",
+    #     "panda_joint3",
+    #     "panda_joint4",
+    #     "panda_joint5",
+    #     "panda_joint6",
+    #     "panda_joint7",
+    # ]
+    point = JointTrajectoryPoint()
+    point.positions = [
+        0.02,
+        0.02,
+    ]
+    point.time_from_start.secs = 1.0
+    goal.trajectory.points.append(point)
+    # goal.trajectory.header = header
+    # goal.goal_time_tolerance.secs = 5
+
+    # Send goal
+    follow_joint_traj_client.send_goal(goal)
+    follow_joint_traj_client.wait_for_result()
+    result = follow_joint_traj_client.get_result()
+    print(result)
diff --git a/panda_training.egg-info/PKG-INFO b/panda_training.egg-info/PKG-INFO
new file mode 100644
index 0000000..cc9fc1b
--- /dev/null
+++ b/panda_training.egg-info/PKG-INFO
@@ -0,0 +1,50 @@
+Metadata-Version: 2.1
+Name: panda-training
+Version: 0.0.0
+Summary: A python package with several example RL training scripts for thepanda_openai_sim package.
+Home-page: https://github.com/rickstaa/panda_openai_sim
+Author: Rick Staa
+Author-email: rick.staa@outlook.com
+License: Rick Staa copyright
+Description: # panda_openai_sim workspace
+        
+        [![GitHub release (latest by date)](https://img.shields.io/github/v/release/rickstaa/panda_openai_sim)](https://github.com/rickstaa/panda_openai_sim/releases)
+        [![Python 3](https://img.shields.io/badge/python%203-3.7%20%7C%203.6%20%7C%203.5-yellow.svg)](https://www.python.org/)
+        [![Python 2](https://img.shields.io/badge/python%202-2.7%20%7C%202.6%20%7C%202.5-brightgreen.svg)](https://www.python.org/)
+        [![ROS versions](https://img.shields.io/badge/ROS%20versions-Melodic-brightgreen)](https://wiki.ros.org)
+        
+        This repository contains the workspace for the `panda_openai_sim` ROS package. It
+        includes all the components (submodules and code) to create a create a
+        Openai gym environment for the Panda Emika Franka robot. This workspace consists of two
+        main components the `panda_openai_sim` package and the `panda_training` package. The
+        the first package (`panda_openai_sim`) contains a simulated version of the Panda robot
+        together with a Panda gym environment that can be used to train RL algorithms as is done
+        with the original [openai_gym robotics environments](https://gym.openai.com/envs/#robotics).
+        The second package (`panda_training`) contains several examples of RL training scripts
+        that can be used together with the simulation and Openai gym environments of the
+        `panda_openai_sim` package.
+        
+        ## Environments
+        
+        The `panda_openai_sim` package currently contains the following task environments:
+        
+        - **PandaPickAndPlace-v0:** Lift a block into the air.
+        - **PandaPush-v0:** Push a block to a goal position.
+        - **PandaReach-v0:** Move fetch to a goal position.
+        - **PandaSlide-v0:** Slide a puck to a goal position.
+        
+        These environments were based on the original [openai_gym robotics environments](https://gym.openai.com/envs/#robotics).
+        
+        ## Installation and Usage
+        
+        Please see the [docs](https://rickstaa.github.io/panda_openai_sim/) for installation
+        and usage instructions.
+        
+Keywords: ros,rl,panda,openai gym
+Platform: UNKNOWN
+Classifier: Programming Language :: Python :: 3.5
+Classifier: Programming Language :: Python :: 3.6
+Classifier: Programming Language :: Python :: 3.7
+Classifier: Natural Language :: English
+Classifier: Topic :: Scientific/Engineering
+Description-Content-Type: text/markdown
diff --git a/panda_training.egg-info/SOURCES.txt b/panda_training.egg-info/SOURCES.txt
new file mode 100644
index 0000000..6f4144a
--- /dev/null
+++ b/panda_training.egg-info/SOURCES.txt
@@ -0,0 +1,9 @@
+README.md
+setup.cfg
+panda_training/setup.py
+panda_training/version.py
+panda_training.egg-info/PKG-INFO
+panda_training.egg-info/SOURCES.txt
+panda_training.egg-info/dependency_links.txt
+panda_training.egg-info/requires.txt
+panda_training.egg-info/top_level.txt
\ No newline at end of file
diff --git a/panda_training.egg-info/dependency_links.txt b/panda_training.egg-info/dependency_links.txt
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/panda_training.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
diff --git a/panda_training.egg-info/requires.txt b/panda_training.egg-info/requires.txt
new file mode 100644
index 0000000..e2cc5b6
--- /dev/null
+++ b/panda_training.egg-info/requires.txt
@@ -0,0 +1,4 @@
+tensorflow-gpu==1.15
+numpy
+stable_baselines
+gym
diff --git a/panda_training.egg-info/top_level.txt b/panda_training.egg-info/top_level.txt
new file mode 100644
index 0000000..bdb5a12
--- /dev/null
+++ b/panda_training.egg-info/top_level.txt
@@ -0,0 +1 @@
+panda_training
diff --git a/panda_training/README.md b/panda_training/README.md
index b458a23..42a2617 100644
--- a/panda_training/README.md
+++ b/panda_training/README.md
@@ -7,7 +7,7 @@ with the simulation and Openai gym environments of the `panda_openai_sim` packag
 
 To use the scripts in this package, you first must build the `panda_openai_sim`
 package. Please see the [online docs](https://rickstaa.github.io/panda_openai_sim/) for
-installation and usage instructions. After you have successfully built the `panda_openai_sim` package you can start the Panda Openai simulation using the following ROS command:
+installation and usage instructions. After you have successfully built the `panda_openai_sim` package, you can start the Panda Openai simulation using the following ROS command:
 
 ```bash
 roslaunch panda_openai_sim start.launch
@@ -16,7 +16,7 @@ roslaunch panda_openai_sim start.launch
 After the simulation is running, please install the 'panda_training' python package using
 the `pip install .` command inside the `panda_training` folder. After the python package
 is installed, you can then run any of the example training scripts that are
-contained in the `scripts` folder.
+contained in the `scripts` folder inside a [python3 environment](https://rickstaa.github.io/panda_openai_sim/get_started/install.html#py3-virtual-env).
 
 ## References
 
diff --git a/panda_training/panda_training.egg-info/PKG-INFO b/panda_training/panda_training.egg-info/PKG-INFO
new file mode 100644
index 0000000..5cd6359
--- /dev/null
+++ b/panda_training/panda_training.egg-info/PKG-INFO
@@ -0,0 +1,43 @@
+Metadata-Version: 2.1
+Name: panda-training
+Version: 0.0.0
+Summary: A python package with several example RL training scripts for thepanda_openai_sim package.
+Home-page: https://github.com/rickstaa/panda_openai_sim
+Author: Rick Staa
+Author-email: rick.staa@outlook.com
+License: Rick Staa copyright
+Description: # Panda_training
+        
+        This package contains several examples of RL training scripts that can be used together
+        with the simulation and Openai gym environments of the `panda_openai_sim` package.
+        
+        ## How to use
+        
+        To use the scripts in this package, you first must build the `panda_openai_sim`
+        package. Please see the [online docs](https://rickstaa.github.io/panda_openai_sim/) for
+        installation and usage instructions. After you have successfully built the `panda_openai_sim` package you can start the Panda Openai simulation using the following ROS command:
+        
+        ```bash
+        roslaunch panda_openai_sim start.launch
+        ```
+        
+        After the simulation is running, please install the 'panda_training' python package using
+        the `pip install .` command inside the `panda_training` folder. After the python package
+        is installed, you can then run any of the example training scripts that are
+        contained in the `scripts` folder.
+        
+        ## References
+        
+        The following repositories were used for creating the example training scripts:
+        
+        - [Reinforcement-learning-with-tensorflow](https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow)
+        - [Stable-baselines](https://stable-baselines.readthedocs.io/en/master/)
+        
+Keywords: ros,rl,panda,openai gym
+Platform: UNKNOWN
+Classifier: Programming Language :: Python :: 3.5
+Classifier: Programming Language :: Python :: 3.6
+Classifier: Programming Language :: Python :: 3.7
+Classifier: Natural Language :: English
+Classifier: Topic :: Scientific/Engineering
+Description-Content-Type: text/markdown
diff --git a/panda_training/panda_training.egg-info/SOURCES.txt b/panda_training/panda_training.egg-info/SOURCES.txt
new file mode 100644
index 0000000..681fb5f
--- /dev/null
+++ b/panda_training/panda_training.egg-info/SOURCES.txt
@@ -0,0 +1,8 @@
+README.md
+setup.py
+panda_training/functions.py
+panda_training.egg-info/PKG-INFO
+panda_training.egg-info/SOURCES.txt
+panda_training.egg-info/dependency_links.txt
+panda_training.egg-info/requires.txt
+panda_training.egg-info/top_level.txt
\ No newline at end of file
diff --git a/panda_training/panda_training.egg-info/dependency_links.txt b/panda_training/panda_training.egg-info/dependency_links.txt
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/panda_training/panda_training.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
diff --git a/panda_training/panda_training.egg-info/requires.txt b/panda_training/panda_training.egg-info/requires.txt
new file mode 100644
index 0000000..741716e
--- /dev/null
+++ b/panda_training/panda_training.egg-info/requires.txt
@@ -0,0 +1,5 @@
+tensorflow<=1.15
+numpy
+stable_baselines
+cloudpickle<1.4.0
+gym
diff --git a/panda_training/panda_training.egg-info/top_level.txt b/panda_training/panda_training.egg-info/top_level.txt
new file mode 100644
index 0000000..bdb5a12
--- /dev/null
+++ b/panda_training/panda_training.egg-info/top_level.txt
@@ -0,0 +1 @@
+panda_training
diff --git a/panda_training/panda_training/functions.py b/panda_training/panda_training/functions.py
index 06ee74e..030140b 100644
--- a/panda_training/panda_training/functions.py
+++ b/panda_training/panda_training/functions.py
@@ -1,4 +1,4 @@
-"""Several functions that are used in the 'panda_training' scripts."""
+"""Several functions that are used in the :panda_training:`panda_training <>` scripts."""
 
 # Main python imports
 import time
@@ -79,8 +79,8 @@ def get_unique_file_suffix(folder_path=None, prefix_type="timestamp"):
     folder_path : str, optional
         The folder in which you want to save the model, by default None.
     prefix_type : str, optional
-        The type of suffix you want to use i.e. 'timestamp' or 'number', by default
-        "timestamp".
+        The type of suffix you want to use i.e. ``timestamp`` or ``number``, by default
+        timestamp.
 
     Raises
     ------
@@ -133,7 +133,7 @@ def move_all_files_in_dir(srcDir, dstDir):
 
     Parameters
     ----------
-    srcDir : st
+    srcDir : str
         The path of the source directory.
     dstDir : str
         The path of the destination directory.
diff --git a/panda_training/scripts/morvan_ddpg_panda_train_and_inference.py b/panda_training/scripts/morvan_ddpg_panda_train_and_inference.py
index 33e675c..ec0ba58 100644
--- a/panda_training/scripts/morvan_ddpg_panda_train_and_inference.py
+++ b/panda_training/scripts/morvan_ddpg_panda_train_and_inference.py
@@ -1,4 +1,4 @@
-"""Train a RL algorithm on the Panda robot of the 'panda_openai_sim' package using the
+"""Trains a RL algorithm on the Panda robot of the 'panda_openai_sim' package using the
 DDPG classes o
 `Morvanzhou <https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/>`_.
 
diff --git a/panda_training/scripts/stable_baselines_ddpg_panda_train_and_inference.py b/panda_training/scripts/stable_baselines_ddpg_panda_train_and_inference.py
index 83e753c..9f8d790 100755
--- a/panda_training/scripts/stable_baselines_ddpg_panda_train_and_inference.py
+++ b/panda_training/scripts/stable_baselines_ddpg_panda_train_and_inference.py
@@ -1,4 +1,4 @@
-"""Train a RL algorithm on the Panda robot of the 'panda_openai_sim' package using the
+"""Trains a RL algorithm on the Panda robot of the 'panda_openai_sim' package using the
 `stable_baselines <https://stable-baselines.readthedocs.io/en/master/index.html>`_
 `DDPG <https://stable-baselines.readthedocs.io/en/master/modules/ddpg.html>`_ model
 class.
@@ -10,6 +10,15 @@
 import gym
 import numpy as np
 
+from panda_training.functions import (
+    get_unique_file_suffix,
+    backup_model,
+)
+
+# ROS python imports
+import rospy
+
+# Import stable-baselines packages
 # from stable_baselines.ddpg.policies import MlpPolicy
 from stable_baselines.common.noise import (
     # NormalActionNoise,
@@ -17,15 +26,17 @@
     # AdaptiveParamNoiseSpec,
 )
 from stable_baselines.common.callbacks import CheckpointCallback
-from stable_baselines import DDPG  # HER, TD3, SAC
 
-from panda_training.functions import (
-    get_unique_file_suffix,
-    backup_model,
-)
-
-# ROS python imports
-import rospy
+try:
+    from stable_baselines import DDPG  # HER, TD3, SAC
+except ImportError:
+    if __name__ == "__main__":
+        rospy.logwarn(
+            "The stable-baslines DDPG example can not be run the mpi requirements "
+            "are not installed on the system. Please check teh installation "
+            "documentation for more info."
+        )
+        sys.exit(0)
 
 # Import panda openai sim task environments
 import panda_openai_sim.envs
diff --git a/panda_training/scripts/stable_baselines_her_ddpg_panda_train_and_inference.py b/panda_training/scripts/stable_baselines_her_ddpg_panda_train_and_inference.py
index 3ff3a81..2bd5ed0 100755
--- a/panda_training/scripts/stable_baselines_her_ddpg_panda_train_and_inference.py
+++ b/panda_training/scripts/stable_baselines_her_ddpg_panda_train_and_inference.py
@@ -1,4 +1,4 @@
-"""Train a RL algorithm on the Panda robot of the 'panda_openai_sim' package using the
+"""Trains a RL algorithm on the Panda robot of the 'panda_openai_sim' package using the
 `stable_baselines <https://stable-baselines.readthedocs.io/en/master/index.html>`_
 `DDPG <https://stable-baselines.readthedocs.io/en/master/modules/ddpg.html>`_ model
 class in combination with the
@@ -10,10 +10,6 @@ class in combination with the
 import sys
 import gym
 
-# from stable_baselines.ddpg.policies import MlpPolicy
-from stable_baselines.common.callbacks import CheckpointCallback
-from stable_baselines import DDPG, HER  # SAC, TD3, HER
-
 from panda_training.functions import (
     get_unique_file_suffix,
     backup_model,
@@ -22,6 +18,21 @@ class in combination with the
 # ROS python imports
 import rospy
 
+# Import stable-baselines packages
+# from stable_baselines.ddpg.policies import MlpPolicy
+from stable_baselines.common.callbacks import CheckpointCallback
+
+try:
+    from stable_baselines import DDPG, HER  # HER, TD3, SAC
+except ImportError:
+    if __name__ == "__main__":
+        rospy.logwarn(
+            "The stable-baslines HER example can not be run the mpi requirements "
+            "are not installed on the system. Please check teh installation "
+            "documentation for more info."
+        )
+        sys.exit(0)
+
 # Import panda openai sim task environments
 import panda_openai_sim.envs
 
diff --git a/panda_training/scripts/stable_baselines_sac_panda_train_and_inference.py b/panda_training/scripts/stable_baselines_sac_panda_train_and_inference.py
index 77fcec8..eaddb51 100755
--- a/panda_training/scripts/stable_baselines_sac_panda_train_and_inference.py
+++ b/panda_training/scripts/stable_baselines_sac_panda_train_and_inference.py
@@ -1,4 +1,4 @@
-"""Train a RL algorithm on the Panda robot of the 'panda_openai_sim' package using the
+"""Trains a RL algorithm on the Panda robot of the 'panda_openai_sim' package using the
 `stable_baselines <https://stable-baselines.readthedocs.io/en/master/index.html>`_
 `SAC <https://stable-baselines.readthedocs.io/en/master/modules/sac.html>`_ model
 class.
diff --git a/panda_training/setup.py b/panda_training/setup.py
index e413e8e..a006e3f 100644
--- a/panda_training/setup.py
+++ b/panda_training/setup.py
@@ -23,11 +23,16 @@
 relative_site_packages = get_python_lib().split(sys.prefix + os.sep)[1]
 date_files_relative_path = os.path.join(relative_site_packages, "panda_openai_sim")
 
-# General setup.py parameters
-TF_MAX_VERSION = "1.15"  # NOTE: Currently the examples used tensorflow 1.
-
 # Package requirements
-requirements = ["numpy", "stable_baselines", "gym"]
+requirements = [
+    "tensorflow<=1.15",
+    "numpy",
+    "mpi4py",
+    "stable_baselines",
+    "cloudpickle<1.4.0",
+    "gym",
+    "rospkg",
+]
 
 # Set up logger
 logger = logging.getLogger(__name__)
@@ -37,12 +42,28 @@
 #################################################
 # Setup functions ###############################
 #################################################
-def get_tf_dep():
+def get_tf_dep(tf_entry):
     """Check whether or not the Nvidia driver and GPUs are available and add the
-    corresponding Tensorflow dependency."""
+    corresponding Tensorflow dependency.
+
+    Parameters
+    ----------
+    tf_entry : str
+        The original tensorflow package entry.
+
+    Returns
+    -------
+    str
+        The right tensorflow package (GPU or normal)
+    """
 
     # Find the right tensorflow version to install
-    tf_dep = "tensorflow<={}".format(TF_MAX_VERSION)
+    tf_dep = tf_entry
+
+    # Retrieve requested tensorflow version
+    tf_dep_version = re.sub(r"[^=<>.0-9]", "", tf_dep)
+
+    # Check which tensorflow version is required
     try:
         gpus = (
             subprocess.check_output(
@@ -53,7 +74,7 @@ def get_tf_dep():
             .split("\n")[1:]
         )
         if len(gpus) > 0:
-            tf_dep = "tensorflow-gpu<={}".format(TF_MAX_VERSION)
+            tf_dep = "tensorflow-gpu{}".format(tf_dep_version)
         else:
             no_device_msg = (
                 "Found Nvidia device driver but no"
@@ -135,15 +156,27 @@ def finalize_options(self):
         install.finalize_options(self)
 
     def run(self):
-        """Overload the :py:meth:`setuptools.command.install.install.run` method."""
-
-        # Install Tensorflow dependency.
-        if any(["tensorflow" in item for item in self.distribution.install_requires]):
+        """Overload the :py:meth:`setuptools.command.install.install.run` method.
+        To make sure the right version of tensorflow is installed GPU or CPU.
+        """
+
+        # Install the right tensorflow dependency (GPU or CPU)
+        tensorflow_entry = [
+            item
+            for item in self.distribution.install_requires
+            if (re.sub(r"[=<>.0-9]", "", item) in ["tensorflow", "tensorflow-gpu"])
+        ]
+        if tensorflow_entry:  # If tensorflow or tensorflow-gpu is in requirements
             if not self.docker and not self.sing:
-                tf_dep = get_tf_dep()
-                subprocess.Popen(
-                    [sys.executable, "-m", "pip", "install", tf_dep]
-                ).wait()
+
+                # Check if tensorflow GPU or CPU needs to be installed
+                tf_dep = get_tf_dep(tensorflow_entry[0])
+
+                # Change the tensorflow entry to the right version
+                self.distribution.install_requires = [
+                    item if "tensorflow" not in item else tf_dep
+                    for item in self.distribution.install_requires
+                ]
             else:
 
                 # If we're using a Docker or singularity container, the right
@@ -165,17 +198,12 @@ def run(self):
 
 # Get current package version
 # NOTE: We use the version of the panda_openai_sim package.
+# Get current package version
 __version__ = re.sub(
     r"[^\d.]",
     "",
     open(
-        os.path.abspath(
-            os.path.join(
-                os.path.dirname(os.path.realpath(__file__)),
-                "..",
-                "panda_openai_sim/version.py",
-            )
-        )
+        os.path.join(os.path.dirname(os.path.realpath(__file__)), "version.py")
     ).read(),
 )
 
diff --git a/panda_training/version.py b/panda_training/version.py
new file mode 100755
index 0000000..4a89279
--- /dev/null
+++ b/panda_training/version.py
@@ -0,0 +1 @@
+﻿__version__ = "0.0.0"
\ No newline at end of file