# Running a Pre-trained PPO Model for 10 Episodes in a Given Environment



In [1]:
import gymnasium as gym
from stable_baselines3 import PPO

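1. **Initialization**: The code first creates the `LunarLander-v2` environment with `render_mode='human'`, so each episode is drawn in a window. It then points `models_dir` at the folder holding the pre-trained Proximal Policy Optimization (PPO) checkpoints and loads one of them into `model` with `PPO.load`, passing `env` as the environment the model will act in.

    ```python
    env = gym.make('LunarLander-v2', render_mode='human')
    models_dir = 'models/PPO'
    models_path = f"{models_dir}/4990000.zip"  # checkpoint file to load
    model = PPO.load(models_path, env=env)
    ```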
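The checkpoint name used in this notebook, `4990000.zip`, appears to encode a training-timestep count. Assuming every checkpoint in `models_dir` follows a `<timesteps>.zip` naming pattern, a small helper could pick the most-trained one instead of hard-coding the number. `latest_checkpoint` is a hypothetical helper, not part of Stable-Baselines3. It compares the numbers as integers, because a plain string comparison would rank `90000.zip` above `4990000.zip`.

```python
def latest_checkpoint(models_dir, filenames):
    """Return the path of the checkpoint with the highest timestep count.

    Hypothetical helper: assumes checkpoints are named '<timesteps>.zip'.
    Files that do not match that pattern are ignored.
    """
    candidates = [
        name for name in filenames
        if name.endswith(".zip") and name[:-4].isdecimal()
    ]
    if not candidates:
        raise FileNotFoundError(f"no '<timesteps>.zip' checkpoints in {models_dir}")
    # Compare numerically: int("4990000") > int("90000"), unlike the strings.
    best = max(candidates, key=lambda name: int(name[:-4]))
    return f"{models_dir}/{best}"


# A literal list stands in for the directory listing here.
names = ["90000.zip", "280000.zip", "4990000.zip", "notes.txt"]
print(latest_checkpoint("models/PPO", names))  # models/PPO/4990000.zip
```

In the notebook the list would come from `os.listdir(models_dir)`, and the returned path would be passed to `PPO.load`.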

In [6]:
env = gym.make('LunarLander-v2', render_mode='human')
env.reset()

models_dir = 'models/PPO'
models_path = f"{models_dir}/4990000.zip"

model = PPO.load(models_path, env=env)


Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.


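2. **Episode Loop**: The code runs the model for 10 episodes. An episode is one run of the environment from `reset()` until it ends, either because it terminates (in LunarLander, the lander crashes, leaves the screen, or comes to rest) or because it is truncated by a time limit.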

    ```python
    episodes = 10
    ```
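The loop in this notebook only watches the episodes; it never records how well each one went. The following is a minimal sketch of an evaluation loop that also sums the reward of every episode. To keep it runnable without Gymnasium or a trained model, `CountdownEnv` is a hypothetical stand-in that follows the Gymnasium `reset()`/`step()` return conventions, and `policy` stands in for `model.predict`.

```python
import random


class CountdownEnv:
    """Hypothetical toy env with Gymnasium-style reset()/step() returns.

    Each episode terminates after a random number of steps (3 to 6),
    and every step yields a reward of 1.0.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.remaining = 0

    def reset(self):
        self.remaining = self.rng.randint(3, 6)
        return self.remaining, {}  # (observation, info)

    def step(self, action):
        self.remaining -= 1
        terminated = self.remaining == 0
        truncated = False
        return self.remaining, 1.0, terminated, truncated, {}


def run_episodes(env, policy, episodes):
    """Run `episodes` full episodes and return the total reward of each."""
    returns = []
    for _ in range(episodes):
        obs, info = env.reset()
        done = False
        total = 0.0
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    return returns


returns = run_episodes(CountdownEnv(seed=42), policy=lambda obs: 0, episodes=10)
print(len(returns), min(returns) >= 3.0, max(returns) <= 6.0)  # 10 True True
```

For a real model, Stable-Baselines3 also provides `evaluate_policy` in `stable_baselines3.common.evaluation`, which returns the mean and standard deviation of the episode rewards over `n_eval_episodes` episodes.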

In [7]:
episodes = 10

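3. **Environment Interaction**: Within each episode, the code repeats the following steps until the episode ends:

    - **Reset Environment**: `env.reset()` returns a tuple `(observation, info)`; the observation is the part the model needs.
    - **Rendering**: `env.render()` draws the current frame. Because the environment was created with `render_mode='human'`, Gymnasium already renders on every `step()`, so this call is redundant.
    - **Action Prediction**: `model.predict(obs)` returns `(action, state)`. The state is only used by recurrent policies, so it is discarded here. By default the action is sampled from the policy; passing `deterministic=True` makes it pick the most likely action instead.
    - **Step**: `env.step(action)` applies the action and returns five values: `(observation, reward, terminated, truncated, info)`. The episode is over when either `terminated` or `truncated` is `True`.
    - **Step Limit** (final cell only): a counter abandons any episode that has not ended after 250 steps and prints a message.

    ```python
    for ep in range(episodes):
        obs, info = env.reset()  # reset() returns (observation, info)
        done = False
        while not done:
            env.render()  # redundant with render_mode='human'
            action, _ = model.predict(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated  # either flag ends the episode
    ```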
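Both flags matter because Gymnasium's `step()` returns `(observation, reward, terminated, truncated, info)`. Unpacking it as `obs, reward, done, info, _` silently puts `truncated` into `info` and the real info dict into `_`, so a time-limit cutoff never ends the loop. The sketch below shows this with `NeverLandsEnv`, a hypothetical toy environment that never terminates but is truncated after `max_steps` steps, roughly what a Gymnasium `TimeLimit` wrapper does.

```python
class NeverLandsEnv:
    """Hypothetical toy env: never terminates, but is truncated after
    `max_steps` steps, mimicking a time-limit wrapper."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        terminated = False                    # the episode never ends on its own
        truncated = self.t >= self.max_steps  # time limit reached
        return self.t, 0.0, terminated, truncated, {"t": self.t}


def steps_until_done(env, safety_cap=100):
    """Count steps until the episode ends, using the 5-tuple correctly."""
    obs, info = env.reset()
    steps = 0
    while steps < safety_cap:
        obs, reward, terminated, truncated, info = env.step(action=0)
        steps += 1
        if terminated or truncated:  # either flag ends the episode
            break
    return steps


print(steps_until_done(NeverLandsEnv(max_steps=5)))  # 5

# The 4-name unpacking misreads the tuple:
env = NeverLandsEnv(max_steps=1)
env.reset()
obs, reward, done, info, _ = env.step(0)
print(done, info)  # False True  ('info' actually holds `truncated`)
```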

In [8]:
episodes = 10

for ep in range(episodes):
    obs, info = env.reset()  # reset() returns (observation, info)
    done = False
    step_count = 0  # steps taken in this episode

    while not done:
        env.render()
        action, _ = model.predict(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated  # episode ends on either flag

        step_count += 1

        if not done and step_count >= 250:  # give up on episodes that run too long
            print(f"Episode {ep + 1} failed to reach a terminal state within 250 steps.")
            break  # leave this episode; the for loop moves on to the next one

env.close()
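With the 250-step cap, an episode in the cell above can end in three ways: it terminates, it is truncated by the environment, or the cap cuts it off. A minimal sketch that reports which one happened, so the outcomes of the 10 episodes could be tallied instead of printed one by one, follows. `ScriptedEnv`, `run_capped_episode`, and `policy` are hypothetical names; `policy` stands in for `model.predict(obs)[0]`. Gymnasium can also enforce such a cap itself: `gym.make` accepts a `max_episode_steps` argument that adds a `TimeLimit` wrapper, which then reports the cutoff through `truncated`.

```python
class ScriptedEnv:
    """Hypothetical toy env with Gymnasium-style returns that terminates
    after `land_at` steps, or never if `land_at` is None."""

    def __init__(self, land_at=None):
        self.land_at = land_at
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.land_at is not None and self.t >= self.land_at
        return self.t, 0.0, terminated, False, {}


def run_capped_episode(env, policy, step_cap=250):
    """Run one episode; return (steps taken, reason it stopped)."""
    obs, info = env.reset()
    for step in range(1, step_cap + 1):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        if terminated:
            return step, "terminated"
        if truncated:
            return step, "truncated"
    return step_cap, "step_cap"  # the cap, not the env, ended the episode


def policy(obs):
    return 0  # placeholder action


print(run_capped_episode(ScriptedEnv(land_at=120), policy))   # (120, 'terminated')
print(run_capped_episode(ScriptedEnv(land_at=None), policy))  # (250, 'step_cap')
```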
