In [1]:
import gymnasium as gym
import numpy as np

# Task 1: Environment Testing

### Environment 1: MountainCar-v0

In [2]:
env = gym.make("MountainCar-v0", render_mode="human")

observation, info = env.reset(seed=42)
total_steps = 0
episodes_completed = 0

for _ in range(500):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    total_steps += 1

    if terminated or truncated:
        episodes_completed += 1
        observation, info = env.reset()

env.close()
print(f"Episodes completed: {episodes_completed}")
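# Note: total_steps also counts the unfinished final episode, so this ratio
# overstates the true episode length (MountainCar-v0 truncates at 200 steps).
print(f"Average episode length: {total_steps / max(episodes_completed, 1):.1f}")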

Episodes completed: 2
Average episode length: 250.0
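The loop above divides all 500 steps, including those of the unfinished final episode, by the 2 completed episodes. Below is a minimal sketch that records each completed episode separately. `rollout_random`, `StubEnv` and `StubSpace` are hypothetical names; the stub only imitates the `reset`/`step`/`action_space` interface of a Gymnasium env so the example runs without Gymnasium or a display.

```python
from typing import Any, List, Tuple


def rollout_random(env: Any, num_steps: int, seed: int = 42) -> Tuple[List[int], List[float], int]:
    """Run random actions for num_steps and record stats per completed episode.

    Returns (lengths, returns, leftover): lengths and returns cover only the
    episodes that finished; leftover is the step count of the unfinished one.
    """
    env.reset(seed=seed)
    env.action_space.seed(seed)  # also make the sampled actions reproducible
    lengths: List[int] = []
    returns: List[float] = []
    ep_len, ep_ret = 0, 0.0
    for _ in range(num_steps):
        action = env.action_space.sample()
        _, reward, terminated, truncated, _ = env.step(action)
        ep_len += 1
        ep_ret += float(reward)
        if terminated or truncated:
            lengths.append(ep_len)
            returns.append(ep_ret)
            ep_len, ep_ret = 0, 0.0
            env.reset()
    return lengths, returns, ep_len


# Hypothetical stand-in env: every episode is truncated after `limit` steps
# with reward -1 per step (the shape of a MountainCar run that never succeeds).
class StubSpace:
    def sample(self) -> int:
        return 0

    def seed(self, seed=None):
        return [seed]


class StubEnv:
    def __init__(self, limit: int):
        self.limit = limit
        self.t = 0
        self.action_space = StubSpace()

    def reset(self, seed=None):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        return 0, -1.0, False, self.t >= self.limit, {}


lengths, returns, leftover = rollout_random(StubEnv(limit=200), num_steps=500)
print(lengths, returns, leftover)  # → [200, 200] [-200.0, -200.0] 100
print(500 / len(lengths))          # the ratio used above → 250.0
```

With Gymnasium installed, the same function can be pointed at a real environment, e.g. `rollout_random(gym.make("MountainCar-v0"), 500)`; the returned `lengths` list then gives per-episode lengths directly, without the partial-episode bias.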


### Environment 2: Acrobot-v1

In [3]:
env = gym.make("Acrobot-v1", render_mode="human")

observation, info = env.reset(seed=42)
total_reward = 0
total_steps = 0
episodes_completed = 0

for _ in range(500):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    total_steps += 1

    if terminated or truncated:
        episodes_completed += 1
        observation, info = env.reset()

env.close()
print(f"Episodes completed: {episodes_completed}")
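# total_steps and total_reward also include any unfinished final episode;
# here the run ends exactly at the 500-step time limit, so the averages
# below describe one complete (truncated) episode.
print(f"Average episode length: {total_steps / max(episodes_completed, 1):.1f}")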
print(f"Average reward per episode: {total_reward / max(episodes_completed, 1):.1f}")

Episodes completed: 1
Average episode length: 500.0
Average reward per episode: -500.0


# Task 2: Environment Analysis

In [4]:
envs = ["MountainCar-v0", "Acrobot-v1"]

for env_name in envs:
    env = gym.make(env_name)
    print(f"\n=== {env_name} ===")
    print(f"Observation space: {env.observation_space}")
    print(f"Action space: {env.action_space}")
    env.close()


=== MountainCar-v0 ===
Observation space: Box([-1.2  -0.07], [0.6  0.07], (2,), float32)
Action space: Discrete(3)

=== Acrobot-v1 ===
Observation space: Box([ -1.        -1.        -1.        -1.       -12.566371 -28.274334], [ 1.        1.        1.        1.       12.566371 28.274334], (6,), float32)
Action space: Discrete(3)


## **MountainCar-v0 Analysis**
**Observation Space:** Box(2,) - Two continuous values:
- Position of car along x-axis (-1.2 to 0.6)
- Velocity of car (-0.07 to 0.07)

**Action Space:** Discrete(3) - Three actions:
- 0: Push left
- 1: No push (neutral)
- 2: Push right

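**Goal:** Get the car to the flag on the right hill; the episode terminates as soon as position ≥ 0.5

**Episode Length:** Truncated at 200 steps (the MountainCar-v0 time limit); random actions almost never reach the flag sooner, so random episodes typically run the full 200 steps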

**Challenges:**
- Car lacks power to drive up the hill directly
- Must build momentum by rocking back and forth
- Uninformative reward: -1 on every step and no bonus at the goal, so the agent gets no signal about progress until it first reaches the flag
- Random actions typically result in getting stuck in the valley

#### MountainCar Observation Space
```
Box([-1.2 -0.07], [0.6 0.07], (2,), float32)
```
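- **Position**: Range [-1.2, 0.6]. The track follows sin(3x), so the valley floor sits near x ≈ -0.52, the left hill rises toward -1.2, and the goal hill rises toward 0.6 (flag at 0.5)
- **Velocity**: Range [-0.07, 0.07]; the dynamics clip the speed to this bound on every step
- **Insight**: The velocity clip is not what prevents direct climbing. Each push changes the velocity by only 0.001 per step, while gravity contributes -0.0025·cos(3x) (up to 0.0025 in magnitude), so on the steep part of the right hill gravity outweighs the engine and the car must gain energy by rocking back and forth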
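The force-versus-gravity argument can be checked with a small re-implementation of the transition rule. This sketch assumes the constants and update order of Gymnasium's `MountainCarEnv` source (force 0.001, gravity 0.0025, speed clip 0.07); the start position -0.5 and the two policies are illustrative choices, not part of the environment.

```python
import math
from typing import Optional, Tuple

# Constants assumed to mirror Gymnasium's mountain_car.py
FORCE = 0.001
GRAVITY = 0.0025
MAX_SPEED = 0.07
MIN_POS, MAX_POS = -1.2, 0.6
GOAL_POS = 0.5


def step(position: float, velocity: float, action: int) -> Tuple[float, float]:
    """One transition: action 0 = push left, 1 = no push, 2 = push right."""
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # inelastic stop at the left wall
    return position, velocity


def steps_to_goal(policy, start: float = -0.5, max_steps: int = 200) -> Optional[int]:
    """Return the step at which the car reaches the flag, or None if it never does."""
    position, velocity = start, 0.0
    for t in range(1, max_steps + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        if position >= GOAL_POS:
            return t
    return None


def always_right(position: float, velocity: float) -> int:
    return 2


def follow_velocity(position: float, velocity: float) -> int:
    # Push in the direction of motion, so the engine adds energy every step.
    return 2 if velocity >= 0 else 0


print("always push right:", steps_to_goal(always_right))  # → None
print("push with velocity:", steps_to_goal(follow_velocity))
```

Pushing right from rest only rocks the car inside the valley, while pushing in the direction of motion adds energy on every swing and reaches the flag within the 200-step limit.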

## **Acrobot-v1 Analysis**

**Observation Space:** Box(6,) - Six continuous values representing joint angles and velocities

**Action Space:** Discrete(3) - Three torque actions:
- 0: Apply -1 torque
- 1: Apply 0 torque  
- 2: Apply +1 torque

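**Goal:** Swing the free end of the two-link chain above a target height: the episode terminates when -cos(θ1) - cos(θ1 + θ2) > 1.0, i.e. the tip is more than one link length above the pivot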
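Gymnasium's termination test, -cos(θ1) - cos(θ1 + θ2) > 1.0, can be evaluated directly from an observation. A minimal sketch, assuming the observation layout [cos θ1, sin θ1, cos θ2, sin θ2, θ̇1, θ̇2], unit link lengths, and θ = 0 meaning a link hangs straight down; `acrobot_tip_height`, `reached_goal` and `obs_from_angles` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def acrobot_tip_height(obs: Sequence[float]) -> float:
    """Height of the free end above the pivot, in link lengths."""
    cos1, sin1, cos2, sin2 = obs[0], obs[1], obs[2], obs[3]
    # cos(θ1 + θ2) via the angle-addition identity, so no atan2 is needed
    cos12 = cos1 * cos2 - sin1 * sin2
    return -cos1 - cos12


def reached_goal(obs: Sequence[float]) -> bool:
    return acrobot_tip_height(obs) > 1.0


def obs_from_angles(theta1: float, theta2: float) -> List[float]:
    # θ2 is relative to link 1; velocities set to zero for the demo
    return [math.cos(theta1), math.sin(theta1), math.cos(theta2), math.sin(theta2), 0.0, 0.0]


hanging = obs_from_angles(0.0, 0.0)      # both links pointing straight down
upright = obs_from_angles(math.pi, 0.0)  # both links pointing straight up
print(acrobot_tip_height(hanging), reached_goal(hanging))  # → -2.0 False
print(acrobot_tip_height(upright), reached_goal(upright))  # → 2.0 True
```

Working from the cosines and sines avoids recovering the angles first; the threshold of 1.0 means the tip must rise a full link length above the pivot, not merely above horizontal.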

**Episode Length:** Up to 500 steps (the Acrobot-v1 time limit); an episode ends earlier only if the tip reaches the target height. The random run above was truncated at 500 steps

**Challenges:**
- Underactuated system: torque acts only on the joint between the two links; the joint at the pivot is passive
- Must use momentum and gravity to swing up
- Precise timing needed for successful swings

#### Acrobot Observation Space  
```
Box([-1. -1. -1. -1. -12.566371 -28.274334], [1. 1. 1. 1. 12.566371 28.274334], (6,), float32)
```
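- **Joint Angles** (first 4 values): cos θ1, sin θ1, cos θ2, sin θ2, each in [-1, 1], with θ2 measured relative to the first link. Encoding each angle by its cosine and sine avoids the jump at ±π
- **Angular Velocities** (last 2 values): θ̇1 and θ̇2, bounded by ±4π ≈ ±12.57 and ±9π ≈ ±28.27 rad/s
- **Insight**: These bounds are hard limits; the environment clips each angular velocity to them, so the second link may spin more than twice as fast as the first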
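Both the sine/cosine encoding and the velocity bounds can be unpacked in a few lines. This sketch assumes the same observation order as Gymnasium's Acrobot; `decode_acrobot_obs` is a hypothetical helper, and `math.atan2` recovers each angle in [-π, π] from its sine and cosine.

```python
import math
from typing import Sequence, Tuple

# The printed Box bounds 12.566371 and 28.274334 match 4π and 9π
MAX_VEL_1 = 4 * math.pi
MAX_VEL_2 = 9 * math.pi


def decode_acrobot_obs(obs: Sequence[float]) -> Tuple[float, float, float, float]:
    """Turn [cos θ1, sin θ1, cos θ2, sin θ2, θ̇1, θ̇2] into (θ1, θ2, θ̇1, θ̇2)."""
    theta1 = math.atan2(obs[1], obs[0])
    theta2 = math.atan2(obs[3], obs[2])
    return theta1, theta2, obs[4], obs[5]


obs = [math.cos(0.5), math.sin(0.5), math.cos(-2.0), math.sin(-2.0), 1.0, -3.0]
print(decode_acrobot_obs(obs))
print(round(MAX_VEL_1, 6), round(MAX_VEL_2, 6))  # → 12.566371 28.274334
```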