## Gymnasium

Gymnasium is a project that provides an API (application programming interface) for various single-agent reinforcement learning environments. It includes implementations of common environments such as CartPole, Pendulum, MountainCar, MuJoCo, Atari, and more. This page outlines the basics of how to use Gymnasium, focusing on its four key functions: `make()`, `Env.reset()`, `Env.step()`, and `Env.render()`.

### Core Concepts

At the core of Gymnasium is the `Env` class, a high-level Python class representing a Markov Decision Process (MDP) from reinforcement learning theory. Note that this is not a perfect reconstruction and is missing several components of MDPs. The `Env` class provides users with the ability to:

- Generate an initial state
- Transition/move to new states given an action
- Visualize the environment
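
To make these three capabilities concrete, here is a minimal sketch of a toy environment written in plain Python. `CorridorEnv` is a hypothetical stand-in defined only for illustration; it is not part of Gymnasium and does not inherit from `Env`. It only mirrors the shape of the `reset()`, `step()`, and `render()` calls described on this page.

```python
class CorridorEnv:
    """Toy stand-in (hypothetical, not a Gymnasium class): an agent walks along a corridor."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Generate the initial state: the agent starts in the leftmost cell.
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action):
        # Transition to a new state: action 1 moves right, anything else moves left.
        move = 1 if action == 1 else -1
        self.position = min(self.length - 1, max(0, self.position + move))
        self.steps += 1
        terminated = self.position == self.length - 1  # reached the goal cell
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}

    def render(self):
        # Visualize the environment as a string, with "A" marking the agent.
        cells = ["."] * self.length
        cells[self.position] = "A"
        return "".join(cells)


env = CorridorEnv(length=5)
observation, info = env.reset()
frame_start = env.render()

for _ in range(4):
    observation, reward, terminated, truncated, info = env.step(1)
frame_end = env.render()

print(frame_start, "->", frame_end)
```

Real Gymnasium environments follow the same call pattern, but they also declare action and observation spaces and handle seeding, which this sketch leaves out.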

Additionally, Gymnasium provides `Wrapper` classes to help augment or modify the environment, particularly the agent's observations, rewards, and actions.

### Initialization

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
```

### Core Functions

#### `make()`
This function returns an `Env` for users to interact with. To list every environment you can create, call `pprint_registry()`. `make()` also accepts additional arguments for passing keyword arguments to the environment, adding or removing wrappers, and so on. See the `make()` API documentation for details.

#### `Env.reset()`
Resets the environment to an initial state and returns the first observation along with additional info. Call it at the start of every new episode.

#### `Env.step()`
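Transitions the environment to a new state given an action. This function takes an action as input and returns five values: the next observation, the reward, `terminated` (a boolean indicating that the episode reached a terminal state), `truncated` (a boolean indicating that the episode was cut short, for example by a time limit), and additional info.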

#### `Env.render()`
Visualizes the environment. This function is used to render the current state of the environment, which can be useful for debugging and understanding the agent's behavior.

### Interacting with the Environment

In reinforcement learning, the classic “agent-environment loop” pictured below is a simplified representation of how an agent and environment interact with each other. The agent receives an observation about the environment and selects an action, which the environment uses to determine the reward and the next observation. The cycle repeats until the environment ends (terminates).

![Agent-Environment Loop](../Assets//AE_loop_dark.png)

For Gymnasium, the “agent-environment loop” is implemented below for a single episode (until the environment ends). See the next section for a line-by-line explanation. Note that running this code requires installing swig (via `pip install swig` or a manual download) along with `pip install "gymnasium[box2d]"`.

```python
import gymnasium as gym

env = gym.make("LunarLander-v3", render_mode="human")
observation, info = env.reset()

episode_over = False
while not episode_over:
    # Random action; replace with an agent policy that uses the observation and info
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    episode_over = terminated or truncated

env.close()
```

### Explaining the Code

First, an environment is created using `make()` with the `render_mode` keyword to specify visualization. In this example, we use the "LunarLander" environment where the agent controls a spaceship that needs to land safely.

After initializing the environment, we call `Env.reset()` to get the first observation and additional info. To initialize with a specific random seed or options, use the `seed` or `options` parameters of `reset()`.

We define `episode_over` to know when to stop interacting with the environment and use a while loop that checks this variable.

Next, the agent selects an action (here chosen at random with `env.action_space.sample()`), and `Env.step()` executes it in the environment. This updates the environment and returns a new observation and a reward. One such action-observation exchange is called a timestep.

The environment may end after some timesteps, reaching a terminal state. If the environment has terminated, `step()` returns `terminated` as `True`. Similarly, the environment may issue a `truncated` signal after a fixed number of timesteps. If either `terminated` or `truncated` is `True`, the episode ends. To restart, use `env.reset()`.

### Action and Observation Spaces

Every environment specifies the format of valid actions and observations with the `action_space` and `observation_space` attributes. This helps in understanding the expected input and output of the environment. In the example above, we sampled random actions via `env.action_space.sample()` instead of using an agent policy.

`Env.action_space` and `Env.observation_space` are instances of `Space`, a high-level Python class with key functions: `Space.contains()` and `Space.sample()`. Gymnasium supports various spaces:

- **Box**: Bounded space with upper and lower limits of any n-dimensional shape.
- **Discrete**: Discrete space where {0, 1, ..., n-1} are the possible values.
- **MultiBinary**: Binary space of any n-dimensional shape.
- **MultiDiscrete**: Series of Discrete action spaces with different numbers of actions.
- **Text**: String space with a minimum and maximum length.
- **Dict**: Dictionary of simpler spaces.
- **Tuple**: Tuple of simpler spaces.
- **Graph**: Mathematical graph with interlinking nodes and edges.
- **Sequence**: Variable length of simpler space elements.

For example usage, see the documentation for each space along with the space utility functions. Graph, Sequence, and Text are more specialized and are needed less often.


### Modifying the Environment

Wrappers are a convenient way to modify an existing environment without altering the underlying code directly. Using wrappers helps avoid boilerplate code and makes your environment more modular. Wrappers can also be chained to combine their effects. Most environments generated via `gymnasium.make()` are already wrapped by default using `TimeLimit`, `OrderEnforcing`, and `PassiveEnvChecker`.

To wrap an environment, first initialize a base environment. Then pass this environment along with optional parameters to the wrapper’s constructor:

