## Applied Reinforcement Learning - Tutorial - Panda-Gym
### [Armin Niedermueller](https://github.com/nerovalerius)
### Salzburg University of Applied Sciences

### Covered Topics
* Panda-Gym Introduction
    * Franka Panda Robot
    * Setup
* Example Environment - Panda Reach
* Create a custom robot
* Create a custom environment
* Create a custom task
    * Avoid Obstacles while reaching a target
    * Stack 3 instead of 2 blocks

### Panda-Gym Introduction
Panda-Gym is a reinforcement learning environment for the Franka Emika Panda robot. It is based on the OpenAI Gym framework and provides a set of tasks that can be used to train reinforcement learning agents. The tasks are based on the PyBullet physics engine and can be used to train agents for real-world applications. The environment is designed to be easy to use and extend. It is also possible to create custom robots, environments and tasks.
Detailed documentation can be found [here](https://panda-gym.readthedocs.io/en/latest/), and the paper can be cited as follows:
```
@article{gallouedec2021pandagym,
  title        = {{panda-gym: Open-Source Goal-Conditioned Environments for Robotic Learning}},
  author       = {Gallou{\'e}dec, Quentin and Cazin, Nicolas and Dellandr{\'e}a, Emmanuel and Chen, Liming},
  year         = 2021,
  journal      = {4th Robot Learning Workshop: Self-Supervised and Lifelong Learning at NeurIPS},
}
```

#### Franka Panda Robot
Panda is a collaborative robot with 7 degrees of freedom developed by [FRANKA EMIKA](https://www.franka.de/).
It can be programmed directly with a graphical user interface or with the Robot Operating System 1 & 2 (C++, MoveIt!, Rviz and so on).
The torque sensors in its seven axes make the arm so sensitive that it stops even on contact with a balloon.
Its high precision and stability make it well suited to research and development.

<img src="images/franka_panda.png"  width="35%"> \
Image source: [LINK](https://github.com/nerovalerius/collision_avoidance/blob/master/BAC2_niedermueller.pdf)

I worked with the Panda robot for my bachelor thesis, where I used two 3D stereo cameras to enable collision avoidance for the robot arm. The robot was able to avoid obstacles while reaching a target. The code and results can be found [on GitHub](https://github.com/nerovalerius/collision_avoidance) and [on YouTube](https://www.youtube.com/watch?v=LQPS--bnvQY).

#### Setup
Before we can start programming, we need to prepare our programming environment.
First, we create a virtual environment for this tutorial in order to avoid conflicts with other projects.
We use the conda package manager for this. If you are not familiar with conda, you can find a tutorial [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

First, download and install miniconda.
```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash Miniconda3-latest-Linux-x86_64.sh -b \
    && rm Miniconda3-latest-Linux-x86_64.sh
```
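The `-b` flag installs Miniconda non-interactively into `~/miniconda3` without modifying your shell configuration, so initialize conda for your shell and then open a new terminal:
```
~/miniconda3/bin/conda init bash
```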
Now, create a new environment and activate it.
```
conda create -n panda-gym-tutorial python=3.9
conda activate panda-gym-tutorial
```
For panda-gym, there is currently no conda package available. Therefore, we need to install it with pip.
Furthermore, we install numpngw to store the rendered images as animated PNG files and tqdm to display a progress bar during the simulation.

In [None]:
%pip install panda-gym
%pip install numpngw tqdm
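
To confirm that the installation worked, a small check like the following can help. It is only a sketch: the helper name `installed_version` is made up for this tutorial, and it relies solely on Python's standard `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for this tutorial, not part of panda-gym.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this tutorial depends on
for name in ("panda-gym", "numpngw", "tqdm"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If one of the packages is reported as not installed, re-run the corresponding `%pip install` line inside the activated environment.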

### Example Environment - Panda Reach
Panda-Gym defines a separate environment for each task. Let's take a look at the **Panda Reach** environment, where the robot has to reach a target position with its end-effector:

```
import numpy as np
from panda_gym.envs.core import RobotTaskEnv
from panda_gym.envs.robots.panda import Panda
from panda_gym.envs.tasks.reach import Reach
from panda_gym.pybullet import PyBullet

class PandaReachEnv(RobotTaskEnv):
    """Reach task with Panda robot (simplified, panda-gym v3 style; camera and renderer options omitted).
    Args:
        render_mode (str, optional): "human" to open a GUI window or "rgb_array" to render off-screen.
            Defaults to "rgb_array".
        reward_type (str, optional): "sparse" or "dense". Defaults to "sparse".
        control_type (str, optional): "ee" to control end-effector position or "joints" to control joint values.
            Defaults to "ee".
    """

    def __init__(self, render_mode: str = "rgb_array", reward_type: str = "sparse", control_type: str = "ee") -> None:
        # use PyBullet as simulation backend
        sim = PyBullet(render_mode=render_mode)
        # use Panda robot, define its control type and base position
        robot = Panda(sim, block_gripper=True, base_position=np.array([-0.6, 0.0, 0.0]), control_type=control_type)
        # use Reach task, define its reward type and give it access to the end-effector position
        task = Reach(sim, reward_type=reward_type, get_ee_position=robot.get_ee_position)
        super().__init__(robot, task)
```

We define PyBullet as our physics engine for robotics. It is used to simulate and render the robot and the environment.

The robot is defined in [/panda_gym/envs/robots/panda](https://github.com/qgallouedec/panda-gym/blob/master/panda_gym/envs/robots/panda.py). \
Here, you can find all necessary physics parameters (such as friction) and functions (such as set_action).

Furthermore, the task is defined as Reach task, defined in [panda_gym/envs/tasks/reach.py](https://github.com/qgallouedec/panda-gym/blob/master/panda_gym/envs/tasks/reach.py).\
Inside this file, the 3D environment (such as the table) is defined and the reward is computed. 

Three parameters can be set when creating the environment:
* **render / render_mode:** controls rendering. panda-gym v2 takes `render=True` or `render=False`; v3 takes `render_mode="human"` (GUI window) or `render_mode="rgb_array"` (off-screen images)
* **reward_type:**
    * sparse: reward is 0 if the target is reached (within a distance threshold) and -1 otherwise
    * dense: reward is the negative distance between the target and the end-effector
* **control_type:** actions control either the robot's:
    * end-effector position (`"ee"`)
    * or joint values (`"joints"`)
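
The reward and control options above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not panda-gym's own code: the function names are made up for this tutorial, the threshold of 0.05 m is assumed to match the library default, and panda-gym itself works on NumPy arrays.

```python
import math

DISTANCE_THRESHOLD = 0.05  # assumed success radius in metres


def sparse_reward(ee_position, target_position, threshold=DISTANCE_THRESHOLD):
    """Return 0 when the end-effector is within the threshold, -1 otherwise."""
    distance = math.dist(ee_position, target_position)
    return -1.0 if distance > threshold else 0.0


def dense_reward(ee_position, target_position):
    """Return the negative Euclidean distance to the target."""
    return -math.dist(ee_position, target_position)


def action_size(control_type, block_gripper=True):
    """Number of action components: 3 for "ee", 7 for "joints", plus 1 for an unblocked gripper."""
    size = 3 if control_type == "ee" else 7
    return size if block_gripper else size + 1


ee = (0.0, 0.0, 0.0)
target = (0.3, 0.4, 0.0)
print(sparse_reward(ee, target), dense_reward(ee, target))
print(action_size("ee"), action_size("joints"))
```

The dense reward gives the agent a learning signal at every step, while the sparse reward only distinguishes success from failure, which is harder to learn from but closer to how many real tasks are specified.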

#### Code and Animation
<img src="images/reach.png"  width="35%"> 

First, we create the "PandaReach-v3" environment with rendering enabled so that we can see what the robot is doing. Then we reset the environment, define the maximum number of steps, and let the robot take actions provided by our policy. In this example the policy simply samples random actions, so no learning takes place yet. After each step, we check whether the episode has ended and, if so, reset the environment. Alongside the simulation loop, we also create an animation of the robot's movements by storing the rendered frames in an animated PNG file.

In [None]:
###################################
# Task: Panda Reach               #
###################################

import gymnasium as gym
import panda_gym
from tqdm import tqdm
from numpngw import write_apng

# create environment; "rgb_array" rendering returns frames that we can store
# (use render_mode="human" instead to watch the simulation in a window)
env = gym.make("PandaReach-v3", render_mode="rgb_array")

# define low frame rate for rendering to reduce computational load
env.metadata['render_fps'] = 24

# array to store images
images = []

# reset environment and get initial observation (a dictionary with the robot state, the achieved goal and the desired goal)
observation, info = env.reset()

# define maximum number of steps
max_steps = 1000

# run simulation
for step in tqdm(range(max_steps)):
    # take a random action (placeholder for a learned policy)
    action = env.action_space.sample()

    # execute action and get new observation, reward, termination flags and additional info
    observation, reward, terminated, truncated, info = env.step(action)

    # store the rendered frame of each step to create an animation afterwards
    images.append(env.render())

    # when the episode is terminated or truncated, reset the environment
    if terminated or truncated:
        observation, info = env.reset()

env.close()

# save animation
print("Saving animation...")
write_apng('images/reach.png', images, delay = 10)
print("finished")
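
Random actions rarely reach the target. As a hedged illustration of what a simple hand-written policy could look like, the sketch below moves the end-effector straight toward the goal. It assumes the goal-conditioned observation dictionary with the keys `achieved_goal` and `desired_goal`, end-effector control, and an arbitrarily chosen gain of 5.0; `proportional_action` is a hypothetical helper, not a panda-gym function.

```python
def proportional_action(observation, gain=5.0):
    """Return an action that pushes the end-effector toward the goal.

    Each component is gain * (desired - achieved), clipped to [-1, 1].
    """
    achieved = observation["achieved_goal"]
    desired = observation["desired_goal"]
    return [max(-1.0, min(1.0, gain * (d - a))) for a, d in zip(achieved, desired)]


# Example with a hand-written observation dictionary
example_observation = {
    "achieved_goal": [0.0, 0.0, 0.2],
    "desired_goal": [0.1, -0.05, 0.2],
}
print(proportional_action(example_observation))
```

In the simulation loop, the random sample could then be replaced by `action = np.array(proportional_action(observation))`, after importing NumPy as `np`.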

### Create a custom robot
This section follows the [panda-gym tutorial](https://panda-gym.readthedocs.io/en/latest/custom/custom_robot.html) for creating a custom robot. 
#### Create URDF
First, we need to create a URDF file for our robot. We can use the [URDF Modeler](https://mymodelrobot.appspot.com/5629499534213120) to create a URDF file while seeing the robot in 3D.

Video tutorials:
* https://www.theconstructsim.com/ros-projects-my-robotic-manipulator-02-urdf-xacro/
* https://www.theconstructsim.com/my-robotic-manipulator-1-basic-urdf-rviz/

Alternatively, adapt this one: https://automaticaddison.com/how-to-build-a-simulated-robot-arm-using-ros/
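
A URDF file is plain XML, so it can also be written by hand or generated by a script. The following sketch builds a deliberately minimal description with two links and one revolute joint using only the Python standard library. All names and numeric values (`my_robot`, `base_link`, `arm_link`, `base_to_arm`, the joint limits) are placeholders chosen for illustration, and the links lack the visual, collision and inertial elements a usable simulation model would need.

```python
import xml.etree.ElementTree as ET


def build_minimal_urdf(robot_name="my_robot"):
    """Return a minimal URDF string: two links connected by one revolute joint."""
    robot = ET.Element("robot", name=robot_name)

    # Links are the rigid bodies of the robot (placeholder names)
    ET.SubElement(robot, "link", name="base_link")
    ET.SubElement(robot, "link", name="arm_link")

    # A revolute joint rotates the child link about one axis within limits
    joint = ET.SubElement(robot, "joint", name="base_to_arm", type="revolute")
    ET.SubElement(joint, "parent", link="base_link")
    ET.SubElement(joint, "child", link="arm_link")
    ET.SubElement(joint, "origin", xyz="0 0 0.1", rpy="0 0 0")
    ET.SubElement(joint, "axis", xyz="0 0 1")
    ET.SubElement(joint, "limit", lower="-1.57", upper="1.57", effort="10", velocity="1")

    return ET.tostring(robot, encoding="unicode")


urdf_text = build_minimal_urdf()
print(urdf_text)
```

The resulting string can be saved to a `.urdf` file and extended with geometry and inertia before loading it into a simulator.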

### Create a custom task

This section follows the [panda-gym tutorial](https://panda-gym.readthedocs.io/en/latest/custom/custom_task.html) for creating a custom task.

#### Original Panda Stack task

#### New task: Stack 3 blocks 
#### New task: avoid obstacles

### Create a custom environment
This section follows the [panda-gym tutorial](https://panda-gym.readthedocs.io/en/latest/custom/custom_env.html) for creating a custom environment.

#### New environment: add obstacles to the environment