## 🧊 Lab 1: FrozenLake Intro

In this lab, we will explore **FrozenLake**, a classic reinforcement learning environment provided by [Gymnasium](https://gymnasium.farama.org/environments/toy_text/frozen_lake/).  
FrozenLake is a simple **grid world** where an agent must navigate from the **start tile (S)** to the **goal tile (G)** without falling into any **holes (H)**.  
Each step moves the agent **up, down, left, or right**, but the surface can be slippery, so the chosen action does not always move the agent in the intended direction.

- **State Space:** Each grid cell is a discrete state (4×4 = 16 states by default).  
- **Action Space:** 4 actions – `LEFT (0)`, `DOWN (1)`, `RIGHT (2)`, `UP (3)`.  
- **Reward Function:** +1 for reaching the goal, 0 otherwise.  
- **Episode Termination:** Episode ends when the agent falls into a hole or reaches the goal.

FrozenLake is a great starting point for RL experiments because it is:
- **Simple to visualize**, helping you understand the interaction loop (`reset → step → render`).
- **Small and discrete**, perfect for testing value iteration, policy iteration, and basic RL algorithms.
- **Customizable**, letting you adjust map size and stochasticity (`is_slippery=True/False`).

Today, we will run a **random agent** to get familiar with the API and visualize how the environment behaves before we move on to smarter policies.


In [None]:
import gymnasium as gym
import numpy as np
from gymnasium.envs.toy_text.frozen_lake import generate_random_map

### 🔧 Setting Up FrozenLake (4×4)

Let's create the **4×4 FrozenLake environment** in deterministic mode (`is_slippery=False`) so we can step through it without random slips.

We'll:
1. Initialize the environment.
2. Inspect the **state space** and **action space** (number of discrete states and available actions).
3. Render the **initial grid** to see where the start, holes, and goal are.
4. Print the **reward map** to see which state yields a reward.


In [None]:
# 1. Create FrozenLake environment (deterministic so it's easy to follow)
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False, render_mode="ansi")

In [None]:
# 2. Show state and action space
print(f"State space: {env.observation_space}  -> {env.observation_space.n} states")
print(f"Action space: {env.action_space}  -> {env.action_space.n} actions (0=LEFT, 1=DOWN, 2=RIGHT, 3=UP)")

In [None]:
# 3. Render initial state
obs, info = env.reset(seed=42)
print("\nInitial Grid:")
print(env.render())

In [None]:
# 4. Show reward map using env.unwrapped.P
reward_map = np.zeros(env.observation_space.n)
P = env.unwrapped.P  # <-- Access underlying FrozenLakeEnv

for state, transitions in P.items():
    for action, outcomes in transitions.items():
        for prob, next_state, reward, done in outcomes:
            reward_map[next_state] = max(reward_map[next_state], reward)

print("Reward map (reshaped to 4x4):")
print(reward_map.reshape(4, 4))

### 🔑 Core Gym API: `reset()` and `step()`

Every Gymnasium environment follows the same basic pattern:  
you **reset** to start an episode, then **step** through the environment until it ends.

In [None]:
# Start a new episode; the seed makes the run reproducible
obs, info = env.reset(seed=0)
print(obs, info)

- **`obs`** → the initial state (an integer for FrozenLake, `0–15` for a 4×4 grid).
- **`info`** → extra diagnostic info (not used much here).

In [None]:
action = env.action_space.sample()  # pick a random action (integer 0–3)
obs, reward, terminated, truncated, info = env.step(action)
print(action, obs, reward, terminated, truncated, info)

#### 🟡 `env.step(action)`

| Variable       | Meaning                                                                 |
|---------------|-------------------------------------------------------------------------|
| **`action`**      | Integer `0–3`: the action you chose (see table below).                  |
| **`obs`**         | The **next state** (an integer from `0` to `15` for a 4×4 grid).        |
| **`reward`**      | Scalar reward for this step (`+1` at goal, `0` elsewhere).              |
| **`terminated`**  | `True` if the episode ended because you reached a **goal** or fell in a **hole**. |
| **`truncated`**   | `True` if the episode was cut off by a **time limit** (100 steps by default for the 4×4 `FrozenLake-v1`). |
| **`info`**        | Extra information (rarely needed for FrozenLake).                       |

#### 🧭 Action Mapping (for FrozenLake)

| Action Number | Direction | Symbol |
|--------------|-----------|--------|
| `0` | **LEFT**  | ⬅️ |
| `1` | **DOWN**  | ⬇️ |
| `2` | **RIGHT** | ➡️ |
| `3` | **UP**    | ⬆️ |

---

You repeat `env.step(action)` until **`terminated or truncated`** becomes `True`,  
then call **`env.reset()`** to start a new episode.


### 🏃 Exercise: Implement `run_random_trajectory`

Write a function called **`run_random_trajectory`** that:

1. **Resets the environment** to get the initial state.
2. Prints the **initial grid** using `env.render()`.
3. Loops for up to `max_steps`:
   - Samples a **random action** with `env.action_space.sample()`.
   - Calls `env.step(action)` and unpacks the result into  
     `obs, reward, terminated, truncated, info`.
   - Prints the **step number**, chosen action, reward, and the grid.
   - **Breaks the loop** if `terminated` or `truncated` is `True`.
4. At the end, prints the **total reward** collected in this episode.

When you finish, run your function for one or more trajectories with different seeds:

```python
# Example usage (after you implement the function)
for i in range(3):
    print(f"=== Trajectory {i+1} ===")
    run_random_trajectory(env, max_steps=20, seed=100 + i)
```

#### 💡 Tips
- Remember to call `env.reset(seed=...)` at the start of your function.
- Use a loop like `for step in range(max_steps):`.
- Use f-strings to format the output clearly (e.g., `print(f"Step {step+1}: Action={action} -> Reward={reward}")`).
- Stop the loop early when `terminated or truncated` is `True`.

> **Goal:** You should see the agent move step by step and eventually fall into a hole or reach the goal. Your printout should look similar to the instructor’s solution but doesn’t have to match exactly.


In [None]:
# Your turn to work on it