# GymGodot CartPole Example

## Godot Environment

Turn a Godot program into a Gym environment with the following steps:


   - **1)** create a Godot node (called `Env` in this CartPole example) that implements the following methods :

        - **`func apply_action(action : Array) -> void`** : this method receives the action to execute in an array. For instance, it can be a single Int `[2]` if the actions belong to a `Discrete` space or it can be Floats `[2.3, 3.4]` if they belong to a continuous `Box` space. In this CartPole example, we can get two possible discrete actions : `[0]` (go left) or `[1]` (go right)

        - **`func get_observation() -> Array`** : this method should return the current observable states in an array. In this CartPole example, we observe : Cart z-Position, Cart z-Velocity, Pole Angle and Pole Angle Velocity. Thus, we return an array of four floats.

        - **`func get_reward() -> float`** : this method should return the current reward as a float. In this CartPole example, the reward is simple: we earn +1 at each step (we want to encourage the agent to survive as long as possible). In other environments, the reward could be more complex.

        - **`func reset() -> void`** : this function resets the environment and makes it ready for a new episode. This often involves reinitializing the agent position, speed, etc. In this CartPole example, we reset the cart position to the middle of the screen and the pendulum to a vertical position.
        
        - **`func is_done() -> bool`** : this function should return `true` when the episode ends and `false` otherwise. Episode termination can be due to various events: the agent 'dies', leaves a defined area, accumulates too much error, loses the game, etc. In this CartPole example, the episode ends if the pendulum angle is more than $\pi/8$ rad from the vertical (beyond that point balance is hard to recover, so it is better to stop and retry) or if the cart leaves the screen. It can also be used to impose a time limit, i.e. end the episode after `n` steps, although this can also be done on the Python side instead.
        
        
  - **2)** add the `GymGodot` node (`GymGodot.tscn`) : add `GymGodot.tscn`, `GymGodot.gd` and `WebSocketClient.gd` from `/gym-godot` to your Godot project folder. Then drag & drop the `GymGodot.tscn` node in your scene. The `Environment Node` property of the `GymGodot` node must point to your `Env` node described above.
  
![nodes](./notebook_images/screenshot_godot.png)

The `Step Length` property indicates how many Godot frames are run at each step. The action of the current step will be applied during all those frames. The minimum value for this property is 1, in which case one Godot frame = one step.
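
To make the interface and the `Step Length` behaviour concrete, here is a minimal sketch in Python rather than GDScript. It is not the code of the demo project: the class name `CartPoleEnvSketch`, the helper `run_step`, the `advance_frame` method, the toy dynamics and the `MAX_POSITION` value are all assumptions made for illustration. It also assumes that the observation, reward and done flag are read once, after the last frame of the step.

```python
import math


class CartPoleEnvSketch:
    """Python stand-in for the GDScript `Env` node (illustrative only)."""

    MAX_ANGLE = math.pi / 8  # termination angle given in the text
    MAX_POSITION = 40.0      # placeholder limit (assumption)

    def __init__(self):
        self.reset()

    def apply_action(self, action):
        # `action` is a one-element list: [0] = go left, [1] = go right.
        self.push = -1.0 if action[0] == 0 else 1.0

    def get_observation(self):
        return [self.position, self.velocity, self.angle, self.angular_velocity]

    def get_reward(self):
        return 1.0  # +1 for every step survived

    def reset(self):
        self.position = 0.0
        self.velocity = 0.0
        self.angle = 0.0
        self.angular_velocity = 0.0
        self.push = 0.0

    def is_done(self):
        return (abs(self.angle) > self.MAX_ANGLE
                or abs(self.position) > self.MAX_POSITION)

    def advance_frame(self, dt=1.0 / 60.0):
        # Toy dynamics standing in for Godot's physics engine (not real physics).
        self.velocity += self.push * dt
        self.position += self.velocity * dt
        self.angular_velocity += -0.5 * self.push * dt
        self.angle += self.angular_velocity * dt


def run_step(env, action, step_length=1):
    """Hold `action` for `step_length` frames, then read the results."""
    env.apply_action(action)
    for _ in range(step_length):
        env.advance_frame()
    return env.get_observation(), env.get_reward(), env.is_done()


env = CartPoleEnvSketch()
obs, reward, done = run_step(env, [0], step_length=4)
print(obs, reward, done)
```

With `step_length=1`, one frame corresponds to one step; larger values hold the same action for several frames before the agent gets to choose again.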

When the scene is launched, it looks for the Python server. If the server is not found, the program closes. To launch the scene without looking for the server, either delete the `GymGodot` node or disable its `Enabled` property.

In this notebook, we'll use the demo Godot CartPole environment, which has all of the above already set up.

## Python / Gym Environment

Once we have a Godot environment ready, we can start writing the Python learning script. First, we import Gym (which can be installed with `pip install gym`) and NumPy:

In [1]:
import gym
from gym import spaces
import numpy as np

The Godot environment will be exposed through the `server-v0` Gym environment provided in the GymGodot repo. This environment communicates over websocket with the `GymGodot` node. 

To install this Gym environment, open a terminal _**inside the GymGodot repo**_ and execute `pip install -e gym-server`. You should then be able to import it :

In [2]:
import gym_server

Now we need to configure this GymGodot server environment :


- **`serverIP`**, **`serverPort`** : the (websocket) IP/port to use for communication.


- **`exeCmd`** : the command to start the Godot environment. There are two possible ways :
    - **Through Godot Editor**. Depending on the OS and how Godot was installed, the command can take different forms. For instance, on Linux with Godot installed through Flatpak, we can start Godot with `flatpak run org.godotengine.Godot`. If you downloaded Godot directly without any package/app manager, it would be `<path_to_godot_folder>/bin/godot.x11.tools.64`. The command must be executed from the project folder (i.e. the folder containing the `project.godot` file) and must be given the path to the scene to run (`./examples/cartpole/Root.tscn` for our CartPole example here).
    - Or, **export as an executable** (`Project -> Export` in Godot Editor). Then simply indicate the path to that executable in `exeCmd` (it must be re-exported if the scene is modified).


- **`action_space`** : A Gym space for the action space of your environment, can be a continuous `Box` space or a discrete `Discrete` space. In this CartPole example, our action space is "go left" or "go right" i.e. a `Discrete` space which can take two values `0` or `1`.


- **`observation_space`** : A Gym space for the observation space of your environment. In this CartPole example, we observe four float values: Cart Position, Cart Velocity, Pole Angle and Pole Angle Velocity.


- **`window_render`** : If `True` the environment will be rendered in the Godot window which can be useful for debugging. If `False`, rendering will be skipped which considerably speeds up the training.


- **`renderPath`** : Path where rendered frames will be stored. A frame is saved each time `env.render()` is called. **The path must exist** (the folder is not created automatically if it is missing).
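
As a small illustration of the `exeCmd` option described above, the sketch below assembles the "through Godot Editor" form of the command. The helper name `build_exe_cmd` is hypothetical and not part of GymGodot. It assumes the command string is handed to a shell, so the two paths are wrapped in double quotes to survive spaces, while the Godot launch command is left untouched because it may itself consist of several words (as in the Flatpak case).

```python
def build_exe_cmd(project_path, godot_cmd, scene_path):
    """Return 'cd <project> && <godot> <scene>' with the two paths quoted.

    `godot_cmd` is inserted as-is: quote it yourself if it is a single
    executable path that contains spaces.
    """
    return 'cd "{}" && {} "{}"'.format(project_path, godot_cmd, scene_path)


exe_cmd = build_exe_cmd('/home/user/my project',
                        'flatpak run org.godotengine.Godot',
                        './examples/cartpole/Root.tscn')
print(exe_cmd)
```

For an exported executable, none of this is needed: `exeCmd` is simply the path to that executable.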

In [3]:
import os

# Server
serverIP = '127.0.0.1'
serverPort = '8000'

# Godot game exe command
projectPath = 'C:/Users/Shehroze/source/repos/GymGodot/gym-godot'  # project.godot folder
godotPath = 'E:/godot/Godot/Godot_v3.4-stable_mono_win64.exe'  # godot editor executable
scenePath = './examples/cartpole/Root.tscn'
exeCmd = 'cd {} && {} {}'.format(projectPath, godotPath, scenePath)

# Action Space ('go left' (0) or 'go right' (1))
action_space = spaces.Discrete(2)

# Observation Space (Cart Position, Cart Velocity, Pole Angle, Pole Angle Velocity)
observation_space = spaces.Box(low=np.array([-40, -np.inf, -np.pi/8, -np.inf], dtype=np.float32), 
                               high=np.array([40, np.inf, np.pi/8, np.inf], dtype=np.float32),
                               dtype=np.float32)

# Create folder to store renders
renderPath = os.getcwd() + '/render_frames/' # '/home/user/.../GymGodot/gym-godot/examples/cartpole/render_frames'
if not os.path.exists(renderPath):
    os.makedirs(renderPath)

In [9]:
# Set up gym-server with those parameters
env = gym.make('server-v0', serverIP=serverIP, serverPort=serverPort, exeCmd=exeCmd, 
               action_space=action_space, observation_space=observation_space, 
               window_render=True, renderPath=renderPath)

- starting Gym server
- starting Godot env with command : cd C:/Users/Shehroze/source/repos/GymGodot/gym-godot && E:/godot/Godot/Godot_v3.4-stable_mono_win64.exe ./examples/cartpole/Root.tscn --fixed-fps 60 --serverIP=127.0.0.1 --serverPort=8000 --renderPath=c:\Users\Shehroze\source\repos\GymGodot\gym-godot\examples\cartpole/render_frames/


A window should open with the cartpole. Now we can control this environment from Python. Let's run 5 steps with the action "go left":

In [10]:
for i in range(0,5):
    print(env.step(0)) # "go left"
    # or env.step(1) for "go right"

(array([0., 0., 0., 0.], dtype=float32), 1, False, {})
(array([-0.088484, -0.088484,  0.010983,  0.010983], dtype=float32), 1, False, {})
(array([-0.294668, -0.206184,  0.036597,  0.025614], dtype=float32), 1, False, {})
(array([-0.618337, -0.323669,  0.076923,  0.040325], dtype=float32), 1, False, {})
(array([-1.059165, -0.440828,  0.132096,  0.055173], dtype=float32), 1, False, {})


The cart should start drifting to the left. `env.step()` returns the tuple (`next_state`, `reward`, `done`, `info`), where `next_state` is our observation (i.e. `[Cart z-Position, Cart z-Velocity, Pole Angle, Pole Angle Velocity]` in this example).
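
The tuple returned by `env.step()` is usually consumed in an episode loop. The sketch below shows one way to do that, assuming the four-element return shown above. `StubEnv` is a hypothetical stand-in so the snippet runs without Godot, and `run_episode` is an illustrative helper, not part of GymGodot or Gym.

```python
class StubEnv:
    """Hypothetical environment that ends every episode after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.t += 1
        obs = [float(self.t)] * 4
        return obs, 1, self.t >= self.length, {}


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return the sum of the rewards collected."""
    obs = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        # Unpack the (next_state, reward, done, info) tuple.
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
        if done:
            break
    return total_reward


# Always choose action 0 ("go left"); the stub ends after 3 steps.
print(run_episode(StubEnv(length=3), policy=lambda obs: 0))  # prints 3
```

The same `run_episode` function would accept the real `env` object in place of `StubEnv`, since it only relies on `reset()` and `step()`.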

We can re-initialize the environment :

In [11]:
print(env.reset())

[0. 0. 0. 0.]


Now the cart should be back in the middle of the screen with the pendulum vertical. `env.reset()` returns the initial state (initial observation).

We can also make a render of the environment :

In [None]:
env.render()

The render is saved in the `renderPath` folder we configured above. We can display it here:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
img = mpimg.imread(renderPath + '0.png')
plt.imshow(img)
plt.axis('off')
plt.show()

Finally, we can close our environment when we are done (the Godot window should close):

In [12]:
env.close()
del env

- server closed


## Training

We can implement the training algorithm on our own (using e.g. PyTorch, TensorFlow, Keras, etc.) or we can use an existing RL library such as [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3).

We disable window rendering in our environment (`window_render=False`) to speed up computations, but we can still render episodes manually using `env.render()`. This CartPole environment does not have a time limit (i.e. a maximum number of steps per episode) defined on the Godot side, but we can add such a limit from Python using Gym's `TimeLimit` wrapper. We set the limit to 250 steps and, as we get +1 reward per step, 250 is also the maximum total reward per episode.
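
To illustrate what a step limit does, here is a minimal sketch of the idea. It is not Gym's `TimeLimit` implementation: the class names `StepLimit` and `EndlessEnv` are hypothetical, and the sketch only covers the four-element `step()` return used in this notebook.

```python
class EndlessEnv:
    """Hypothetical environment that never terminates on its own."""

    def reset(self):
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        return [0.0, 0.0, 0.0, 0.0], 1, False, {}


class StepLimit:
    """Force `done = True` once `max_episode_steps` steps have elapsed."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0  # a new episode restarts the counter
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            done = True
        return obs, reward, done, info


env_sketch = StepLimit(EndlessEnv(), max_episode_steps=250)
env_sketch.reset()
total, done = 0, False
while not done:
    _, reward, done, _ = env_sketch.step(0)
    total += reward
print(total)  # prints 250: +1 per step, capped by the limit
```

This is why the maximum return equals the step limit here: with a constant +1 reward, an episode that survives until the cap collects exactly 250.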

For this example we'll use Stable-Baselines3 (`pip install stable-baselines3`). We can follow training progress in TensorBoard: the `Monitor` wrapper records episode rewards and lengths, and the `tensorboard_log` argument tells PPO where to write the logs.

In [4]:
from gym.wrappers import TimeLimit
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
from stable_baselines3.common.monitor import Monitor

env = gym.make('server-v0', serverIP=serverIP, serverPort=serverPort, exeCmd=exeCmd, 
               action_space=action_space, observation_space=observation_space, 
               window_render=False, renderPath=renderPath)

env = Monitor(TimeLimit(env, max_episode_steps=250))

model = PPO(MlpPolicy, env, verbose=0, learning_rate=0.0004, seed=0,
            tensorboard_log="./tensorboard_logs/", device='cpu')
model.learn(total_timesteps=100000)



- starting Gym server
- starting Godot env with command : cd C:/Users/Shehroze/source/repos/GymGodot/gym-godot && E:/godot/Godot/Godot_v3.4-stable_mono_win64.exe ./examples/cartpole/Root.tscn --fixed-fps 60 --disable-render-loop --serverIP=127.0.0.1 --serverPort=8000 --renderPath=c:\Users\Shehroze\source\repos\GymGodot\gym-godot\examples\cartpole/render_frames/


In TensorBoard we can see the agent is converging towards the maximum reward as the training is progressing :
    
<img src="./notebook_images/tensorboard_plot.png" width="900">

We can save the learned model to disk :

In [None]:
model.save('cartpole_model')

Load the model from disk :

In [None]:
model = PPO.load('cartpole_model', device='cpu')

We can also render one episode using the learned model :

In [None]:
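obs = env.reset()
for i in range(250):  # 250 = the TimeLimit set above
    # Query the trained policy for the next action
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    env.render()  # saves one frame to renderPath
    if done:
        break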
        
env.close()

We can convert the rendered frames into a video, for instance with `ffmpeg`:

In [None]:
os.system('cd {} && ffmpeg -framerate 30 -y -i %01d.png -vcodec libvpx video.webm'.format(renderPath))
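
As an alternative to the `os.system` call above, the same `ffmpeg` invocation can be expressed with `subprocess.run`, which avoids the `cd ... &&` prefix and raises an error if `ffmpeg` fails. This is a sketch using the same flags as above; the helper names `ffmpeg_command` and `encode_frames` are hypothetical, and `ffmpeg` is assumed to be on the `PATH`.

```python
import subprocess


def ffmpeg_command(framerate=30, pattern='%01d.png', output='video.webm'):
    """Build the argument list for the ffmpeg call used in this notebook."""
    return ['ffmpeg', '-framerate', str(framerate), '-y',
            '-i', pattern, '-vcodec', 'libvpx', output]


def encode_frames(render_dir, **kwargs):
    """Run ffmpeg inside `render_dir`; raises CalledProcessError on failure."""
    return subprocess.run(ffmpeg_command(**kwargs), cwd=render_dir, check=True)


print(ffmpeg_command())
```

Calling `encode_frames(renderPath)` would then write `video.webm` next to the rendered frames.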

In [None]:
%%html
    <video width='256' height='256' controls>
        <source src='./render_frames/video.webm'>
    </video>

You should obtain an animation of the cart balancing the pole, like this:

![cart_balancing_gif](./notebook_images/output.gif)