# Environment Testing Notebook

This notebook is for testing and debugging the REACH simulation environments.

Use this to:
- Test environment setup
- Visualize observations and actions
- Debug reward functions

## First-time setup

After you've cloned this repo, go to the root directory of the project (`cd PATH/TO/reach/`)

Create a Python virtual environment (venv) in the project root:

`python -m venv .`

VSCode might give you a popup about there being a new environment, asking if you want to set it to default. Select yes.
<hr>
Once you've created the environment, you don't need to do it again. However, you'll need to activate it each time you begin work (VSCode should do this automatically).

If you need to do it manually:

Unix: `source ./bin/activate`

Windows PS: `./Scripts/Activate.ps1`

This virtual environment keeps all the packages you install local to this project.
<hr>

In the top right of this notebook, you'll see a selector for the Python kernel (something like Python 3.12.1). Click on this and select "Select another kernel", then "Python Environments", then "reach".

This ensures the Jupyter notebook runs your code inside the virtual environment.
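
To confirm the kernel selection worked, a quick check along these lines can be run in a code cell. It is a minimal sketch, not part of the REACH codebase; `in_virtual_env` is a hypothetical helper name, and it relies only on the standard `sys.prefix` and `sys.base_prefix` attributes.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_env())
```

If the second line prints `False`, the notebook is probably still attached to the system interpreter and the kernel should be selected again.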
<hr>

Next, you'll need to install the required packages.

`pip install -r requirements.txt`
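
After the install finishes, a short check like the following can report which packages are still missing. This is a sketch: the list of names is taken from the import statements in this notebook and is only an assumption about what `requirements.txt` contains, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Import names used by the cells in this notebook (assumed, not read from requirements.txt).
required = ["gymnasium", "stable_baselines3", "moviepy"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

Any name it prints can be installed with `pip install` from inside the activated virtual environment.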
<hr>

Once you've completed the above steps, you should be able to test your environment. You can click "Run All" at the top or run the blocks one by one.

If something doesn't work, read the error message first and check whether your system is missing a prerequisite package. If that doesn't resolve it, reach out to a REACH developer for assistance.

## 1. Test Gymnasium and SB3

Run the code below to test the Gymnasium and Stable-Baselines3 (SB3) installations.

There will be no window and it should only take about 3 seconds to run.

NOTE: If you have a CUDA-enabled (NVIDIA) GPU, run `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121` to use GPU acceleration. The `cu121` index serves builds for CUDA 12.1.
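
To see which device PyTorch will actually use, something like the following can help. It is a rough sketch; `pick_device` is a hypothetical helper, and it mirrors what passing `device="auto"` to PPO is expected to do (use CUDA when available, otherwise the CPU).

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # normally installed as a dependency of stable-baselines3
    except ImportError:
        # Without PyTorch there is nothing to accelerate; report the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("PyTorch will use:", pick_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is likely CPU-only and the command in the note above applies.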

In [None]:
import gymnasium as gym
from stable_baselines3 import PPO

def test_no_view():
    env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, verbose=1, device="auto")
    # Increase total_timesteps to train longer. PPO collects rollouts of 2048 steps
    # by default, so any value up to 2048 still runs one full rollout.
    model.learn(total_timesteps=500)

    obs, _ = env.reset()
    for _ in range(10):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()

    env.close()
    return model

model = test_no_view()

print("✅ Completed cart pole test successfully!")

## 2. Test visualization

Now test the same code with human rendering (`render_mode="human"`). This should open a pygame window (you might need to look for it!) and will take about 1 minute to run.
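
The pygame window can only open when a graphical display is available, which is not always the case on remote or headless Linux machines. The sketch below is a heuristic for guessing that before running the cell; `display_probably_available` is a hypothetical helper, and the environment variables it checks are the usual X11 and Wayland ones, not anything specific to REACH.

```python
import os
import sys


def display_probably_available(platform=sys.platform, environ=os.environ) -> bool:
    """Heuristic: guess whether a GUI window can be opened on this machine."""
    if platform.startswith("linux"):
        # X11 and Wayland sessions advertise themselves through these variables.
        return bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    # On Windows and macOS a desktop session is assumed to be present.
    return True


print("Display available (best guess):", display_probably_available())
```

If this prints `False`, the `rgb_array` render mode used in section 4 is a way to inspect the environment without a window.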


In [None]:
def test_vis():
    env = gym.make("CartPole-v1", render_mode="human")
    model = PPO("MlpPolicy", env, verbose=0, device="auto")
    model.learn(total_timesteps=100) # Increase this value to train longer, 100 takes about 40 seconds, 500 takes about 1 minute

    obs, _ = env.reset()
    for _ in range(10):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()

    env.close()
    return model

model = test_vis()

## 3. Train the model

Train the model for about 5-10 minutes and save its policy to `ppo_cartpole.zip`.
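
PPO trains in whole rollouts, so the number of environment steps it runs is the requested budget rounded up to a multiple of the rollout size. The sketch below estimates that number, assuming SB3's default `n_steps=2048` and the 8 environments used in the training cell; `effective_timesteps` is a hypothetical helper, not an SB3 function.

```python
import math


def effective_timesteps(total_timesteps: int, n_steps: int = 2048, n_envs: int = 1) -> int:
    """Estimate how many environment steps PPO runs for a requested budget."""
    rollout_size = n_steps * n_envs  # steps collected before each policy update
    return math.ceil(total_timesteps / rollout_size) * rollout_size


print(effective_timesteps(500_000, n_envs=8))  # 507904
print(effective_timesteps(500))                # 2048
```

This also explains why very small budgets, such as the ones in the earlier test cells, take the same time as one full rollout.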

In [7]:
from stable_baselines3.common.vec_env import DummyVecEnv

def train_model():
    # 8 environment copies; DummyVecEnv steps them one after another in a single process
    env = DummyVecEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])
    model = PPO("MlpPolicy", env, verbose=0, device="auto")  # verbose=0 to reduce logging

    model.learn(total_timesteps=500_000) # This will take some time to complete
    model.save("ppo_cartpole")
    env.close()

    return model

model = train_model()

## 4. Load model and record video

This loads the policy saved in the previous step and runs it for up to 10 episodes while recording each rendered frame.

The output video will be named `cartpole_trained.mp4`
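
Loading fails if the training step has not been run yet, so it can be worth checking for the saved file first. The following is a minimal sketch; `saved_model_path` is a hypothetical helper, and it assumes the model was saved in the notebook's working directory under the name used in section 3 (SB3 appends `.zip` when saving).

```python
from pathlib import Path


def saved_model_path(name: str) -> Path:
    """Return the path of a saved model, adding the .zip suffix if it is missing."""
    path = Path(name)
    if path.suffix != ".zip":
        path = path.with_name(path.name + ".zip")
    return path


model_path = saved_model_path("ppo_cartpole")
if model_path.exists():
    print(f"Found {model_path}")
else:
    print(f"{model_path} not found; run the training step first.")
```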

In [None]:
from moviepy.video.io.ffmpeg_writer import FFMPEG_VideoWriter

video_filename = "cartpole_trained.mp4"

def load_and_record_model():
    env = gym.make("CartPole-v1", render_mode="rgb_array")
    model = PPO.load("ppo_cartpole.zip", env=env, device="auto")  # "auto" falls back to the CPU when no GPU is available

    fps = 60

    obs, _ = env.reset()
    first_frame = env.render()
    height, width, _ = first_frame.shape

    writer = FFMPEG_VideoWriter(
        video_filename,
        size=(width, height),
        fps=fps,
        codec="libx264"
    )

    writer.write_frame(first_frame)

    num_episodes = 10
    max_steps_per_episode = 500

    for ep in range(num_episodes):
        obs, _ = env.reset()
        for step in range(max_steps_per_episode):
            action, _ = model.predict(obs, deterministic=True)  # deterministic for video
            obs, reward, terminated, truncated, info = env.step(action)

            frame = env.render()
            if frame is not None:
                writer.write_frame(frame)

            if terminated or truncated:
                break

    env.close()
    writer.close()
    return model

model = load_and_record_model()

print(f"✅ Video saved as {video_filename}")