# API for reinforcement learning - Basic Usage

Original source: https://www.gymlibrary.dev/content/basic_usage/

## Interacting with the Environment
Gym implements the classic “agent-environment loop”:

The **agent** performs some actions in the **environment** (usually by passing some control inputs to the environment, e.g. torque inputs of motors) and **observes** how the environment’s **state** changes. One such action-observation exchange is referred to as a **timestep**.

The goal in RL is to **manipulate the environment** in some specific way. For instance, we want the agent to navigate a robot to a specific point in space. If it succeeds (or makes some progress towards that goal), it receives a **positive reward** alongside the observation for this timestep. The reward may also be **negative or 0** if the agent has not yet succeeded (or has made no progress). The agent is then trained to **maximize** the reward it accumulates over many timesteps.
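To make the idea of accumulated reward concrete, here is a minimal sketch of a one-dimensional navigation task. It is not part of Gym: the target position, the reward values (+1, -1, 0) and the fixed action sequence are all assumptions chosen for illustration.

```python
# Hypothetical 1-D navigation task: the agent starts at position 0 and
# should reach TARGET. All names and reward values are illustrative.
TARGET = 5


def reward_for(old_position, new_position):
    """Return +1 for moving closer to the target, -1 for moving away, 0 otherwise."""
    old_distance = abs(TARGET - old_position)
    new_distance = abs(TARGET - new_position)
    if new_distance < old_distance:
        return 1.0
    if new_distance > old_distance:
        return -1.0
    return 0.0


position = 0
total_reward = 0.0
# A fixed action sequence, one action per timestep (a trained agent would choose these).
actions = [+1, +1, -1, 0, +1, +1, +1, +1]

for action in actions:
    new_position = position + action
    total_reward += reward_for(position, new_position)  # accumulate over timesteps
    position = new_position

print(position, total_reward)
```

The step backwards and the idle step both lower the total compared with a sequence that moves towards the target at every timestep, which is the quantity training tries to maximize.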

After some timesteps, the environment may enter a **terminal state**. For instance, the robot may have crashed! In that case, we want to **reset the environment** to a new initial state. The environment issues a done signal to the agent if it enters such a terminal state. Not every done signal is triggered by a “catastrophic failure”: sometimes we also want to issue a done signal after a fixed number of timesteps, or once the agent has succeeded in completing some task in the environment. In the code below, the done signal is reported as two flags returned by env.step(): terminated (the environment reached a terminal state, whether by failure or success) and truncated (the episode was cut off from outside, e.g. by a time limit).
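The different ways an episode can end can be sketched with a toy environment. `ToyRobotEnv` and `run_episode` below are hypothetical and not part of Gym; they only imitate the `reset()`/`step()` signatures used in the Gym code later in this document, and the goal, cliff and time-limit values are arbitrary.

```python
class ToyRobotEnv:
    """Hypothetical 1-D environment (not part of Gym) with a Gym-style interface."""

    def __init__(self, goal=3, cliff=-3, max_steps=20):
        self.goal = goal            # reaching this position counts as success
        self.cliff = cliff          # reaching this position counts as a crash
        self.max_steps = max_steps  # time limit per episode
        self.position = 0
        self.steps = 0

    def reset(self, seed=None):
        # The toy environment is deterministic, so the seed is accepted but unused.
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action):
        self.position += action
        self.steps += 1
        succeeded = self.position >= self.goal
        crashed = self.position <= self.cliff
        terminated = succeeded or crashed
        # Assumption of this sketch: the time limit only counts if the
        # episode did not already reach a terminal state on this step.
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if succeeded else (-1.0 if crashed else 0.0)
        return self.position, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode; return (steps taken, terminated, truncated, final reward)."""
    observation, info = env.reset()
    step_index = 0
    while True:
        action = policy(step_index)
        observation, reward, terminated, truncated, info = env.step(action)
        step_index += 1
        if terminated or truncated:
            return step_index, terminated, truncated, reward


success = run_episode(ToyRobotEnv(), lambda t: 1)    # always move towards the goal
crash = run_episode(ToyRobotEnv(), lambda t: -1)     # always move towards the cliff
timeout = run_episode(ToyRobotEnv(), lambda t: 1 if t % 2 == 0 else -1)  # oscillate
print(success, crash, timeout)
```

In this sketch, success and crash both end the episode through the terminal-state flag, while the oscillating policy never reaches either boundary and is stopped by the time limit instead.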

Let’s see what the agent-environment loop looks like in Gym. The first example runs an instance of the LunarLander-v2 environment for 1000 timesteps, and the second runs the same loop on CartPole-v1. Since we pass render_mode="human", you should see a window pop up rendering the environment.

In [1]:
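import gym

# Create the environment; render_mode="human" opens a window that shows each step.
env = gym.make("LunarLander-v2", render_mode="human")
# Seed the first reset so the initial state is reproducible.
observation, info = env.reset(seed=42)

for _ in range(1000):
    # action = policy(observation)  # User-defined policy function
    action = env.action_space.sample()  # Random action as a stand-in for a policy
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the current one has ended.
    if terminated or truncated:
        observation, info = env.reset()

env.close()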

In [3]:
import gym

# Same loop as above, this time on the CartPole-v1 environment.
env = gym.make("CartPole-v1", render_mode="human")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()
env.close()
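The loops above discard the reward returned by each step. As a rough sketch, the helper below records the total reward of every completed episode, assuming only the `reset()`/`step()` interface used above. `collect_episode_returns` and `FixedLengthEnv` are hypothetical names; the stand-in environment exists so that the snippet runs without Gym installed.

```python
def collect_episode_returns(env, policy, num_steps):
    """Step through env for num_steps and return the total reward of each finished episode."""
    episode_returns = []
    current_return = 0.0
    observation, info = env.reset()
    for _ in range(num_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        current_return += reward
        if terminated or truncated:
            # The episode is over: store its total reward and start a new one.
            episode_returns.append(current_return)
            current_return = 0.0
            observation, info = env.reset()
    # An episode still in progress after num_steps is not recorded.
    return episode_returns


class FixedLengthEnv:
    """Hypothetical stand-in: every episode lasts `length` steps and pays 1.0 per step."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.length  # ends by time limit, never by a terminal state
        return self.t, 1.0, False, truncated, {}


returns = collect_episode_returns(FixedLengthEnv(length=4), policy=lambda obs: 0, num_steps=10)
print(returns)  # [4.0, 4.0]
```

With Gym installed, the same helper should work on an environment created by `gym.make`, for example with `policy=lambda obs: env.action_space.sample()` to reproduce the random agent used above.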