# Week3 - Practicals - PADM
----------------------------




## Agenda for the day
1. Install the Gym Library

    1. Classic Control (e.g., CartPole) --> https://gymnasium.farama.org/environments/classic_control/
    2. Toy Text (e.g., Frozen Lake) --> https://gymnasium.farama.org/environments/toy_text/
    3. Atari (e.g., Breakout) --> https://gymnasium.farama.org/environments/atari/breakout/
2. Demonstration of Gym environments.
3. Let's build one Gym env and understand what action, state, reward, and step are.

In [2]:
import gymnasium as gym

# Classic control

#### pip install "gymnasium[classic_control]"


In [7]:
env = gym.make("CartPole-v1", render_mode="human")

observation, info = env.reset()

for _ in range(100):
    # Simple heuristic policy: push the cart toward the side the pole is leaning
    pole_angle = observation[2]
    action = 0 if pole_angle < 0 else 1
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode once the pole falls or the time limit is hit
    if terminated or truncated:
        observation, info = env.reset()

env.close()

# Toy Text
#### pip install "gymnasium[toy_text]"

In [4]:
env = gym.make('FrozenLake-v1', render_mode="human")

observation, info = env.reset()

for _ in range(100):
    action = env.action_space.sample()  # agent policy that uses the observation and info
    observation, reward, terminated, truncated, info = env.step(action)
    
    if terminated or truncated:
        observation, info = env.reset()

env.close()


# Atari
#### pip install "gymnasium[atari, accept-rom-license]"

In [6]:
from ale_py import ALEInterface
from ale_py.roms import Breakout, Pong

env = gym.make('ALE/Breakout-v5', render_mode='human')

observation, info = env.reset()

for _ in range(100):
    action = env.action_space.sample()  # agent policy that uses the observation and info
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()

We'll explore the basic concepts using the CartPole-v1 environment from Gymnasium (the maintained successor to OpenAI Gym). The key concepts we'll cover include:

* *State*: The current situation of the environment which the agent observes.
* *Action*: The decision or move made by the agent based on the state.
* *Reward*: Feedback from the environment based on the action taken.
* *Step*: A function that moves the environment to the next state based on the action and returns the new state, the reward, the terminated and truncated flags, and additional info.

#### Action Space

* 0: Push cart to the left

* 1: Push cart to the right

#### Observation Space

| Num | Observation | Min | Max |
|---|---|---|---|
| 0 | Cart Position | -4.8 | 4.8 |
| 1 | Cart Velocity | -Inf | Inf |
| 2 | Pole Angle | -0.418 rad (-24°) | 0.418 rad (24°) |
| 3 | Pole Angular Velocity | -Inf | Inf |

#### Rewards
Since the goal is to keep the pole upright for as long as possible, the agent receives a reward of +1 for every step taken, including the termination step. The reward threshold is 500 for v1 and 200 for v0.

#### env.step(action) --> https://gymnasium.farama.org/api/env/

RETURNS:

* observation (ObsType) – An element of the environment’s observation_space as the next observation due to the agent actions. An example is a numpy array containing the positions and velocities of the pole in CartPole.

* reward (SupportsFloat) – The reward as a result of taking the action.

* terminated (bool) – Whether the agent reaches the terminal state (as defined under the MDP of the task) which can be positive or negative. An example is reaching the goal state or moving into the lava in the Sutton and Barto Gridworld. If true, the user needs to call reset().

* truncated (bool) – Whether the truncation condition outside the scope of the MDP is satisfied. Typically, this is a timelimit, but could also be used to indicate an agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached. If true, the user needs to call reset().

* info (dict) – Contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. In OpenAI Gym <v26, it contains “TimeLimit.truncated” to distinguish truncation and termination, however this is deprecated in favour of returning terminated and truncated variables.

* done (bool) – (Deprecated) A boolean value for if the episode has ended, in which case further step() calls will return undefined results. This was removed in OpenAI Gym v26 in favor of terminated and truncated attributes. A done signal may be emitted for different reasons: Maybe the task underlying the environment was solved successfully, a certain timelimit was exceeded, or the physics simulation has entered an invalid state.

# Exercise: Building and Understanding the CartPole Environment
In this exercise, you will get hands-on experience with one of the most iconic environments in reinforcement learning: CartPole. Your task will be to build the environment, interact with it using a simple policy, and observe the effects of actions on the state and overall performance.

### Part 1: Setup and Environment Creation
##### Task 1.1: First, you need to import the necessary library. Use the import statement to import gymnasium under the alias gym.
##### Task 1.2: Now, create an instance of the CartPole-v1 environment and assign it to a variable named env. Remember to set render_mode="human" to visualize the simulation.

* TODO: Import the gymnasium library (as gym)
* TODO: Create the CartPole-v1 environment with render_mode set to "human"

In [7]:
# Placeholder to add your imports


<details>
<summary>Click to see the solution for Part 1</summary>

```python
import gymnasium as gym
env = gym.make("CartPole-v1", render_mode="human")
```
</details>



### Part 2: Interacting with the Environment
Before diving into policy implementation, it's important to understand how to interact with the Gym environment.

##### Task 2.1: Reset the environment to its initial state using the reset method and print the initial observation.

* TODO: Reset the environment and print the initial observation

##### Task 2.2: Implement a loop to simulate the agent taking 100 random actions in the environment. At each step, use the action_space.sample() method to choose a random action, apply it to the environment using the step function, and print the new observation and reward.

* TODO: Simulate 100 steps in the environment using random actions


In [8]:
# Placeholder for implementation

<details>
<summary>Click to see the solution for Part 2</summary>

```python
observation, info = env.reset()
print("Initial Observation:", observation)

for _ in range(100):
    action = env.action_space.sample()  # Take a random action
    observation, reward, terminated, truncated, info = env.step(action)
    print(f"New Observation: {observation}, Reward: {reward}")
    
    if terminated or truncated:
        observation, info = env.reset()
```
</details>


### Part 3: Implementing a Simple Policy
Now that you're familiar with the basics of interacting with the environment, let's implement a simple policy to control the cart.

##### Task 3.1: Create a policy function that takes an observation as input and returns an action. For this simple policy, return action 0 (push left) if the pole's angle (third value of the observation array, index 2) is negative, and action 1 (push right) otherwise.

* TODO: Apply the simple policy in the simulation loop

In [9]:
# Placeholder for implementation
def simple_policy(observation):
    pass


<details>
<summary>Click to see the solution for Part 3</summary>

```python
def simple_policy(observation):
    return 0 if observation[2] < 0 else 1

observation, info = env.reset()
for _ in range(100):
    action = simple_policy(observation)  # Use the simple policy
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
```
</details>


# Building a Custom Environment

* https://github.com/Farama-Foundation/Gymnasium/tree/main
* https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/