## Applied Reinforcement Learning - Tutorial - Panda-Gym
### [Armin Niedermueller](https://github.com/nerovalerius)
### Salzburg University of Applied Sciences

### Covered Topics
* Panda-Gym Introduction
    * Franka Panda Robot
    * Setup
* Panda-Gym Theory
    * Panda-Gym Environments
    * Panda-Gym Task
    * Panda-Gym Robot
    * Panda-Gym States
    * Panda-Gym Action-Space
    * Panda-Gym Reward
* Panda-Gym Examples
    * Task 1: Panda Reach
    * Task 2: Panda Pick and Place
    * Task 3: Panda Stack
* Create a custom robot
* Create a custom environment
* Create a custom task
    * Avoid Obstacles while reaching a target
    * Stack 3 instead of 2 blocks

### Panda-Gym Introduction
Panda-Gym is a set of reinforcement learning environments for the Franka Emika Panda robot. It follows the OpenAI Gym interface (the v3 environments used in this tutorial are imported through Gymnasium) and provides a set of tasks that can be used to train reinforcement learning agents. The tasks are simulated with the PyBullet physics engine, so agents can be trained in simulation before they are applied to real-world problems. The environment is designed to be easy to use and extend, and it is possible to create custom robots, environments and tasks.
Detailed documentation can be found [here](https://panda-gym.readthedocs.io/en/latest/), and the paper can be cited as follows:
```
@article{gallouedec2021pandagym,
  title        = {{panda-gym: Open-Source Goal-Conditioned Environments for Robotic Learning}},
  author       = {Gallou{\'e}dec, Quentin and Cazin, Nicolas and Dellandr{\'e}a, Emmanuel and Chen, Liming},
  year         = 2021,
  journal      = {4th Robot Learning Workshop: Self-Supervised and Lifelong Learning at NeurIPS},
}
```

#### Franka Panda Robot
Panda is a collaborative robot with 7 degrees of freedom developed by [FRANKA EMIKA](https://www.franka.de/).
It can be programmed directly with a graphical user interface or with the Robot Operating System 1 & 2 (C++, MoveIt!, Rviz and so on).
The torque sensors in its seven axes make the arm sensitive enough to stop when it touches a balloon.
Its high precision and stability make it well suited for research and development.

<img src="images/franka_panda.png"  width="35%"> \
Image source: [LINK](https://github.com/nerovalerius/collision_avoidance/blob/master/BAC2_niedermueller.pdf)

I worked with the Panda robot for my bachelor thesis, where I used two 3D stereo cameras to enable collision avoidance for the robot arm. The robot was able to avoid obstacles while reaching a target. The code and results can be found [here](https://github.com/nerovalerius/collision_avoidance) and [here](https://www.youtube.com/watch?v=LQPS--bnvQY).

#### Setup
Before we can start programming, we need to prepare our programming environment.
To avoid conflicts with other projects, we create a dedicated virtual environment with the conda package manager.
If you are not familiar with conda, you can find a tutorial [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

First, download and install Miniconda.
```
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash Miniconda3-latest-Linux-x86_64.sh -b \
    && rm Miniconda3-latest-Linux-x86_64.sh
```
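The `-b` flag runs the installer in batch mode without prompts, so it does not modify your shell configuration. Initialize conda for your shell (the path assumes the default install location `~/miniconda3`), then restart the terminal.
```
~/miniconda3/bin/conda init bash
```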
Now, create a new environment and activate it.
```
conda create -n panda-gym-tutorial python=3.9
conda activate panda-gym-tutorial
```
There is currently no conda package for panda-gym, so we install it with pip.
We also install numpngw to store the rendered images as animated PNG files, and tqdm for the progress bars used in the examples.

In [None]:
%pip install panda-gym
%pip install numpngw tqdm
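
To confirm that the installation worked, the following minimal sketch checks whether the modules imported by the examples can be found in the active environment. The helper name `missing_packages` is hypothetical and only used for illustration; it relies solely on the standard library.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Modules imported by the example cells in this tutorial
required = ["gymnasium", "panda_gym", "numpngw", "tqdm"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

If the list is not empty, install the missing packages with `%pip install` inside the activated environment and run the check again.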

### Panda-Gym Examples
In this section, we will take a look at three tasks that are already implemented in panda-gym.
1. **Panda Reach:** The robot has to reach a target position.
2. **Panda Pick and Place:** The robot has to pick up an object and place it on a target position.
3. **Panda Stack:** The robot has to stack two blocks on top of each other.
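
To make "reaching a target position" concrete, here is a minimal sketch of a distance-based success check with a sparse reward. It is an illustration under stated assumptions, not panda-gym's actual implementation: the 5 cm threshold, the function names and the example coordinates are all hypothetical.

```python
import math

# Assumed tolerance in metres (hypothetical value for this sketch)
DISTANCE_THRESHOLD = 0.05


def is_success(achieved, desired, threshold=DISTANCE_THRESHOLD):
    """True if the achieved position lies within the threshold of the target."""
    return math.dist(achieved, desired) < threshold


def sparse_reward(achieved, desired, threshold=DISTANCE_THRESHOLD):
    """0.0 when the target is reached, -1.0 otherwise."""
    return 0.0 if is_success(achieved, desired, threshold) else -1.0


# Hypothetical gripper and target positions (x, y, z) in metres
gripper = (0.10, 0.00, 0.20)
target = (0.12, 0.01, 0.20)
print(is_success(gripper, target), sparse_reward(gripper, target))
```

With a sparse reward like this, an agent only receives a learning signal once it gets close to the target, which is why random actions rarely solve even the reach task.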

#### Task 1: Panda Reach
<img src="images/reach.png"  width="35%"> \

In [1]:
###################################
# Task 1: Panda Reach             #
###################################

import gymnasium as gym
import panda_gym
from tqdm import tqdm
from numpngw import write_apng

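# Current panda-gym v3 releases select the renderer via render_mode;
# "rgb_array" renders off-screen and env.render() returns the frame
env = gym.make("PandaReach-v3", render_mode="rgb_array")
env.metadata['render_fps'] = 24
images = []
observation, info = env.reset()
max_steps = 1000

for step in tqdm(range(max_steps)):
    # Sample a random action; no learning takes place yet
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    images.append(env.render())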

    if terminated or truncated:
        observation, info = env.reset()

env.close()

print("Saving animation...")
write_apng('images/reach.png', images, delay = 10)

100%|██████████| 1000/1000 [00:13<00:00, 72.01it/s]


Saving animation...
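
The cell above sets `render_fps` to 24 but writes the animation with `delay = 10`. Assuming numpngw interprets `delay` as the display time per frame in milliseconds, as its documentation describes, 10 ms corresponds to roughly 100 frames per second. The following sketch converts a desired frame rate into a delay; `apng_delay_ms` is a hypothetical helper, not part of numpngw.

```python
def apng_delay_ms(fps):
    """Convert a frame rate into a per-frame delay in whole milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(1000 / fps)


for fps in (10, 24, 100):
    print(f"{fps} fps -> delay = {apng_delay_ms(fps)} ms")
```

To play the animation back at the configured 24 frames per second, one could pass `delay=apng_delay_ms(24)` to `write_apng`.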


#### Task 2: Panda Pick and Place
<img src="images/pickandplace.png"  width="35%"> \

In [2]:
###################################
# Task 2: Panda Pick and Place    #
###################################

import gymnasium as gym
import panda_gym
from tqdm import tqdm
from numpngw import write_apng

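# Current panda-gym v3 releases select the renderer via render_mode;
# "rgb_array" renders off-screen and env.render() returns the frame
env = gym.make("PandaPickAndPlace-v3", render_mode="rgb_array")
env.metadata['render_fps'] = 24
images = []
observation, info = env.reset()
max_steps = 1000

for step in tqdm(range(max_steps)):
    # Sample a random action; no learning takes place yet
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    images.append(env.render())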

    if terminated or truncated:
        observation, info = env.reset()

env.close()

print("Saving animation...")
write_apng('images/pickandplace.png', images, delay = 10)

100%|██████████| 1000/1000 [00:13<00:00, 75.00it/s]


Saving animation...


#### Task 3: Panda Stack
<img src="images/stack.png"  width="35%"> \

In [3]:
###################################
# Task 3: Panda Stack             #
###################################

import gymnasium as gym
import panda_gym
from tqdm import tqdm
from numpngw import write_apng

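# Current panda-gym v3 releases select the renderer via render_mode;
# "rgb_array" renders off-screen and env.render() returns the frame
env = gym.make("PandaStack-v3", render_mode="rgb_array")
env.metadata['render_fps'] = 24
images = []
observation, info = env.reset()
max_steps = 1000

for step in tqdm(range(max_steps)):
    # Sample a random action; no learning takes place yet
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    images.append(env.render())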

    if terminated or truncated:
        observation, info = env.reset()

env.close()

print("Saving animation...")
write_apng('images/stack.png', images, delay = 10)

100%|██████████| 1000/1000 [00:15<00:00, 64.85it/s]


Saving animation...
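
The three cells above differ only in the environment id and the output file. The rollout loop could be shared through a small helper, sketched below. It assumes that the environment follows the Gymnasium `reset`/`step` interface and that `render()` returns an RGB frame, for example when the environment is created with `render_mode="rgb_array"`. The name `collect_frames` is hypothetical and not part of panda-gym.

```python
def collect_frames(env, max_steps):
    """Run a random policy for max_steps steps and return the rendered frames."""
    frames = []
    observation, info = env.reset()
    for _ in range(max_steps):
        # Random action, exactly as in the example cells
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        frames.append(env.render())
        # Start a new episode once the current one has ended
        if terminated or truncated:
            observation, info = env.reset()
    return frames


# Usage sketch (requires panda-gym and numpngw, so it is left commented out):
# import gymnasium as gym
# import panda_gym
# from numpngw import write_apng
#
# env = gym.make("PandaStack-v3", render_mode="rgb_array")
# frames = collect_frames(env, max_steps=1000)
# env.close()
# write_apng("images/stack.png", frames, delay=40)
```

Keeping the environment creation and the file writing outside the helper means the same function can be reused for all three tasks.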
