# Deep Reinforcement Learning with Gymnasium

Reinforcement Learning (RL) is one of the three main machine learning paradigms, alongside supervised and unsupervised learning. Unlike the other two, RL focuses on training an agent to interact with its environment by making decisions that maximize cumulative rewards. Through trial and error, the agent learns the optimal actions to take in different situations.

A powerful extension of this approach is Reinforcement Learning from Human Feedback (RLHF), where human preference judgments are used to shape the reward signal, steering the agent toward more aligned and effective decision-making.

RL has a wide range of applications, from self-driving cars and automated trading to game-playing AI and robotic control. When combined with deep neural networks, it becomes Deep Reinforcement Learning, enabling breakthroughs in complex problem-solving.

In this code-along, we’ll dive into Gymnasium, an open-source Python library for developing and benchmarking RL algorithms. I’ll guide you through setting it up, exploring different RL environments, and implementing a simple agent to apply an RL algorithm in Python.

Let’s get started! 🚀

## What is Gymnasium?

[Gymnasium](https://gymnasium.farama.org/) is an open-source Python library designed to support the development and evaluation of reinforcement learning (RL) algorithms. It provides a robust framework that simplifies RL research and experimentation by offering:

- A diverse range of environments, from simple games to complex real-world simulations.
- Intuitive APIs and wrappers for seamless interaction with environments.
- Flexibility to create custom environments while leveraging the standardized API framework.

With Gymnasium, developers can easily build and test RL algorithms using API calls to:

- Send the agent’s chosen actions to the environment.
- Retrieve the environment’s state and reward after each action.
- Train the RL model efficiently.
- Evaluate the model’s performance in different scenarios.

This structured approach makes Gymnasium a powerful tool for both beginners and experienced researchers in RL.

Since this code-along was recorded at a specific point in time, we'll pin the versions of the required dependencies so the code keeps working as shown.

In [2]:
!pip install torch==2.3.1 gymnasium==1.1.1

Defaulting to user installation because normal site-packages is not writeable
Collecting gymnasium==1.1.1
  Downloading gymnasium-1.1.1-py3-none-any.whl.metadata (9.4 kB)
Downloading gymnasium-1.1.1-py3-none-any.whl (965 kB)
Installing collected packages: gymnasium
Successfully installed gymnasium-1.1.1


In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.distributions as distributions
import numpy as np
import gymnasium as gym  

## Task 1: Setting up a Gymnasium Environment

A [Gymnasium Environment](https://gymnasium.farama.org/api/env/) is a controlled setting where an RL agent interacts, learns, and makes decisions to achieve a goal. Environments provide a structured way to model various real-world and simulated scenarios, making them essential for developing and testing reinforcement learning (RL) algorithms.

For this code-along we'll use the [CartPole-v1](https://gymnasium.farama.org/environments/classic_control/cart_pole/) environment. Our goal is to develop a simple neural network that is able to keep the inverted pendulum upright by controlling the left-to-right motion of the cart on which it stands.

An episode ends if one of the following conditions occurs:

1. Termination: Pole angle goes beyond ±12°
2. Termination: Cart position goes beyond ±2.4 (the center of the cart reaches the edge of the display)
3. Truncation: Episode length exceeds 500 steps (200 for v0)

We'll specify `render_mode="rgb_array"` to be able to visualize the state using matplotlib later on. 

![Cartpole](cartpole.png)


## Task 2: Create a Neural Network
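
A minimal sketch of what such a network could look like is shown below. The class name `PolicyNetwork`, the single hidden layer, and its size of 64 are illustrative assumptions, not the architecture used in the recording. The network maps the 4-dimensional CartPole observation to one logit per action.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class PolicyNetwork(nn.Module):
    """Hypothetical policy: observation in, action logits out."""

    def __init__(self, obs_dim=4, hidden_dim=64, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def forward(self, obs):
        # Returns unnormalized scores (logits), one per action.
        return self.net(obs)


policy = PolicyNetwork()

# A dummy batch with one all-zero observation, just to check shapes.
dummy_obs = torch.zeros(1, 4)
logits = policy(dummy_obs)

# Turn logits into a distribution and sample an action from it.
dist = Categorical(logits=logits)
action = dist.sample()
print("Logits shape:", tuple(logits.shape))
print("Sampled action:", action.item())
```

Returning logits rather than probabilities keeps the network compatible with both `Categorical(logits=...)` and losses that expect raw scores.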

## Task 3: Train and validate policy

## Additional tasks:

- Tweak the discount factor and evaluate the effects on training
- Create an agent for other Gymnasium Environments
- Try out different loss functions like the [CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)
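
To get a feel for the first task before touching the training code, the sketch below computes discounted returns for a short, made-up reward sequence under several discount factors. The helper name `discounted_returns` and the example rewards are assumptions for illustration only.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step t."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Made-up episode: a reward of 1 on each of five steps.
rewards = [1.0] * 5

for gamma in (0.5, 0.9, 1.0):
    rounded = [round(g, 4) for g in discounted_returns(rewards, gamma)]
    print(f"gamma={gamma}: {rounded}")
```

A smaller discount factor shrinks the weight of distant rewards, so early actions receive less credit for keeping the pole up later in the episode.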