In [7]:
# Install the required packages
!pip install gymnasium
!pip install numpy
!pip install matplotlib

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


In [3]:
import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt
import time
from IPython import display

## Environment Creation

In [4]:
env = gym.make("MountainCar-v0", render_mode=None)

# Let's see what the observation and action spaces look like
print("Observation Space:", env.observation_space)
print("Action Space:", env.action_space)

# Get the high and low bounds of the state space
print("State space bounds:")
print("Low:", env.observation_space.low)
print("High:", env.observation_space.high)


Observation Space: Box([-1.2  -0.07], [0.6  0.07], (2,), float32)
Action Space: Discrete(3)
State space bounds:
Low: [-1.2  -0.07]
High: [0.6  0.07]


gym.make(...): This is the command from the Gymnasium library to create an environment.

'MountainCar-v0': This is the specific game we want to load.

    The Goal of MountainCar: Imagine a car stuck between two hills. It doesn't have enough engine power to drive straight up the right hill to reach the goal (a flag at the top). The only way to succeed is to drive back and forth, building up momentum, like swinging on a swing set, to eventually climb the hill.

render_mode=None: This tells Gymnasium not to show us a window with the game graphics while it runs. If you wanted to see the car moving, you might change this to render_mode='human'. For just running calculations, None is faster.

env = ...: We store the created game environment in a variable named env. We'll use this variable to interact with the game.

Observation Space (env.observation_space): This tells us what information the game gives our character (the "agent") at each step. It describes what the agent "sees" or "observes".

    For MountainCar, the output (e.g., Box([-1.2 -0.07], [0.6 0.07], (2,), float32)) means:

        Box: The observations are continuous numbers within a range.

        (2,): Each observation holds two numbers, [position, velocity]: the car's horizontal position and its current velocity. float32 is the data type of both numbers.

        Low: [-1.2 -0.07]: The lowest possible position is -1.2, and the lowest velocity is -0.07.

        High: [0.6 0.07]: The highest possible position is 0.6 (the flag is at 0.5), and the highest velocity is 0.07.

Action Space (env.action_space): This tells us what actions the character is allowed to take.

    For MountainCar, the output (e.g., Discrete(3)) means:

        Discrete: The actions are distinct choices, numbered starting from 0.

        (3): There are 3 possible actions:

        Action 0: Push the car left.

        Action 1: Do nothing.

        Action 2: Push the car right.
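One way to keep the action numbers straight is a small lookup table. The names ACTION_NAMES and push_direction below are hypothetical helpers, not part of Gymnasium; subtracting 1 simply maps the actions 0, 1, 2 to the directions -1, 0, +1.

```python
# Hypothetical lookup table for the three MountainCar actions
ACTION_NAMES = {0: "push left", 1: "do nothing", 2: "push right"}


def push_direction(action):
    """Return -1 (left), 0 (no push) or +1 (right) for a valid action."""
    if action not in ACTION_NAMES:
        raise ValueError(f"Invalid action: {action}")
    return action - 1


for action, name in ACTION_NAMES.items():
    print(f"Action {action}: {name} (direction {push_direction(action):+d})")
```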

In [5]:
# Let's run a random episode to see how the environment behaves
state, _ = env.reset()
done = False
steps = 0

state, _ = env.reset():

    env.reset(): This command starts a new episode. It puts the car back at a random starting position near the bottom of the valley (between -0.6 and -0.4) with zero velocity, and resets the environment's internal step count.

    It returns two things: the initial state (the car's starting position and velocity) and some extra info (which we are ignoring here with _).

    We store the starting state in the state variable.

done = False: This is a flag variable. We set it to False at the beginning because the episode is not finished yet.

steps = 0: We initialize a counter to keep track of how many steps we've taken in this episode.

In [6]:
while not done and steps < 200:  # MountainCar-v0 truncates episodes at 200 steps
    action = env.action_space.sample()  # Random action
    next_state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

    print(f"Step {steps}: State {next_state}, Action {action}, Reward {reward}")
    state = next_state
    steps += 1

print(f"Episode finished after {steps} steps")
env.close()

Step 0: State [-0.44740105 -0.00057069], Action 1, Reward -1.0
Step 1: State [-4.4753826e-01 -1.3720577e-04], Action 2, Reward -1.0
Step 2: State [-0.44924095 -0.00170272], Action 0, Reward -1.0
Step 3: State [-0.45149675 -0.00225579], Action 1, Reward -1.0
Step 4: State [-0.4542891  -0.00279235], Action 1, Reward -1.0
Step 5: State [-0.45859754 -0.00430844], Action 0, Reward -1.0
Step 6: State [-0.4643904  -0.00579286], Action 0, Reward -1.0
Step 7: State [-0.47062498 -0.00623459], Action 1, Reward -1.0
Step 8: State [-0.47725523 -0.00663023], Action 1, Reward -1.0
Step 9: State [-0.4852319  -0.00797668], Action 0, Reward -1.0
Step 10: State [-0.49249572 -0.0072638 ], Action 2, Reward -1.0
Step 11: State [-0.49899244 -0.00649673], Action 2, Reward -1.0
Step 12: State [-0.50467354 -0.00568111], Action 2, Reward -1.0
Step 13: State [-0.51149654 -0.00682298], Action 0, Reward -1.0
Step 14: State [-0.51941025 -0.00791372], Action 0, Reward -1.0
Step 15: State [-0.5283554  -0.00894514], Ac

while not done and steps < 200: This loop continues as long as the episode is not done AND fewer than 200 steps have been taken.

    MountainCar-v0 has a time limit of 200 steps. If the car doesn't reach the flag within 200 steps, the environment ends the episode itself (truncated becomes True), so the steps < 200 check acts only as a safety net.

action = env.action_space.sample(): This is where we choose an action. env.action_space.sample() randomly picks one of the valid actions (0, 1, or 2). Crucially, this is NOT intelligent decision-making; it's just random.

next_state, reward, terminated, truncated, info = env.step(action): This is the MOST IMPORTANT command. We tell the environment (env) to take the chosen action. The game then calculates what happens next and gives us back five pieces of information:

    next_state: The new observation (the car's new position and velocity) after taking the action.

    reward: A score signal. In MountainCar-v0, you get a reward of -1 for every single step you take, including the step that reaches the flag. There is no bonus for reaching the goal; the episode simply ends, so the penalty stops accumulating. This encourages the agent to finish as quickly as possible.

    terminated: A boolean (True or False). It becomes True if the episode ended because the agent reached the goal state (car reached the flag).

    truncated: A boolean (True or False). It becomes True if the episode ended for another reason, like hitting the step limit (200 steps).

    info: A dictionary with extra debugging information (usually empty in basic environments).

done = terminated or truncated: We update our done flag. The episode is considered "done" if it either terminated (goal reached) OR truncated (time ran out).

print(...): We display the information for the step we just took.

state = next_state: We update our current state to be the next_state we just received. This is important for the next loop iteration (though in this random example, we don't use the state to decide the next action).

steps += 1: We increment our step counter.
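The random loop never looks at state when choosing an action. As a rough illustration of what using the state could look like, the sketch below pushes in whichever direction the car is already moving, which is the "swing" idea described earlier. It does not call Gymnasium at all: step() is a simplified, hand-written stand-in for env.step(), and its constants (FORCE, GRAVITY and the bounds) are assumptions intended to mirror the MountainCar-v0 physics. The names momentum_policy and run_episode are hypothetical.

```python
import math

# Assumed constants, intended to mirror MountainCar-v0
FORCE = 0.001
GRAVITY = 0.0025
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS = 0.5


def step(position, velocity, action):
    """Simplified stand-in for env.step(): returns (position, velocity, reward, terminated)."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POS, min(MAX_POS, position))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the car stops when it hits the left wall
    terminated = position >= GOAL_POS
    return position, velocity, -1.0, terminated


def momentum_policy(state):
    """Push right when moving right (or standing still), otherwise push left."""
    position, velocity = state
    return 2 if velocity >= 0 else 0


def run_episode(start_position=-0.5, max_steps=200):
    """Run one episode with the momentum policy; return (steps taken, reached goal)."""
    state = (start_position, 0.0)
    for steps in range(1, max_steps + 1):
        action = momentum_policy(state)
        position, velocity, reward, terminated = step(state[0], state[1], action)
        state = (position, velocity)
        if terminated:
            return steps, True
    return max_steps, False


steps_taken, reached_goal = run_episode()
print(f"Steps taken: {steps_taken}, reached goal: {reached_goal}")
```

The same momentum_policy function could be tried in the real loop by replacing env.action_space.sample() with momentum_policy(state).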

Run the cells above first. They will:

Create the MountainCar environment

Show you the observation space (position and velocity ranges)

Show you the action space (the three possible actions)

Run a random agent to demonstrate the environment's behavior