## Evolution + Reinforcement Learning for CarRacing-v3
This notebook explores combining NEAT (neuroevolution) with PPO (reinforcement learning) to train an agent to drive in Gymnasium's CarRacing-v3 environment.
### Pipeline:
 1. Train an agent using NEAT
 2. Extract the best genome
 3. Pretrain a PPO agent on the NEAT agent's behavior (imitation learning)
 4. Fine-tune the PPO agent with reinforcement learning
 5. Evaluate performance




## Imports & Setup

Install the dependencies with `pip install -r requirements.txt` before running the cells below.

In [10]:
import os
import cv2
import gymnasium as gym
import numpy as np
import pickle
import matplotlib.pyplot as plt
import neat
import torch
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv, VecTransposeImage

## Loading the NEAT Agent from Module

Instead of redefining the NEAT agent logic here, we'll load it from `agents/neat_agent.py`. The class handles preprocessing and action selection using the evolved genome.


In [11]:
from agents.neat_agent import NeatAgent
from run_episode import run_agent
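The contents of `agents/neat_agent.py` are not shown in this notebook. As a rough illustration of what "preprocessing and action selection" can involve, here is a minimal sketch. It assumes the agent receives CarRacing's 96x96 RGB frame, reduces it to a small grayscale vector, and maps three network outputs to the environment's `[steer, gas, brake]` ranges. The names `preprocess_frame`, `outputs_to_action`, and `SketchAgent` are hypothetical, and the real `NeatAgent` may work differently.

```python
import numpy as np

def preprocess_frame(frame, stride=8):
    """Turn an RGB frame of shape (H, W, 3) into a flat grayscale vector in [0, 1]."""
    gray = frame.astype(np.float32).mean(axis=2) / 255.0  # average the colour channels
    small = gray[::stride, ::stride]                       # naive downsampling by striding
    return small.flatten()

def outputs_to_action(outputs):
    """Clip three raw outputs to CarRacing's ranges: steer [-1, 1], gas [0, 1], brake [0, 1]."""
    steer = np.clip(outputs[0], -1.0, 1.0)
    gas = np.clip(outputs[1], 0.0, 1.0)
    brake = np.clip(outputs[2], 0.0, 1.0)
    return np.array([steer, gas, brake], dtype=np.float32)

class SketchAgent:
    """Hypothetical stand-in for NeatAgent: wraps any callable mapping inputs to 3 outputs."""

    def __init__(self, activate):
        self.activate = activate

    def act(self, frame):
        return outputs_to_action(self.activate(preprocess_frame(frame)))

# A constant "network" is enough to exercise the plumbing.
agent_sketch = SketchAgent(lambda inputs: [2.0, -0.5, 0.3])
action = agent_sketch.act(np.zeros((96, 96, 3), dtype=np.uint8))
```

With neat-python, the `activate` callable could be the `activate` method of a network built by `neat.nn.FeedForwardNetwork.create(genome, config)`, provided the number of inputs in the NEAT config matches the length of the preprocessed vector (144 for a 96x96 frame with `stride=8`).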

## Loading the Best Evolved Genome

We previously trained the NEAT model and saved the best-performing genome in `best_genome.pkl`. Here we load it and construct the NEAT network.


In [12]:
# Load the NEAT config and genome
with open("best_genome.pkl", "rb") as f:
    genome = pickle.load(f)

config_path = "neat_config.txt"
config = neat.Config(
    neat.DefaultGenome,
    neat.DefaultReproduction,
    neat.DefaultSpeciesSet,
    neat.DefaultStagnation,
    config_path
)

# Create agent from genome
agent = NeatAgent(genome, config)


## Run a Test Episode Using the Evolved NEAT Agent

Now we run the trained NEAT agent in the CarRacing-v3 environment and observe its behavior. The `run_agent()` function also supports rendering.


In [13]:
env = gym.make("CarRacing-v3", render_mode="human")
score = run_agent(env, agent, render=True)
env.close()

print(f"✅ Episode finished with score: {score:.2f}")


✅ Episode finished with score: 313.06
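`run_agent` is imported from `run_episode.py` and its body is not shown here. A loop of roughly the following shape is one way to roll out an agent under Gymnasium's API, where `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`. The sketch assumes a hypothetical `act(obs)` method on the agent and uses a toy stand-in environment so that it runs without the simulator.

```python
def run_episode_sketch(env, agent, max_steps=1000):
    """Run one episode and return the sum of rewards (hypothetical helper)."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:  # the episode ends on either flag
            break
    return total_reward

class CountdownEnv:
    """Toy environment with Gymnasium-style signatures: reward 1 per step, truncates after n steps."""

    def __init__(self, n_steps=5):
        self.n_steps = n_steps
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.n_steps
        return self.t, 1.0, False, truncated, {}

class ConstantAgent:
    """Toy agent that ignores the observation."""

    def act(self, obs):
        return 0

score_sketch = run_episode_sketch(CountdownEnv(5), ConstantAgent())
```

Checking both `terminated` and `truncated` matters because a time-limit cutoff is reported through `truncated` rather than `terminated`.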


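## Pretraining PPO on NEAT Demonstrations

Next we bootstrap a PPO agent by imitation: the NEAT agent generates a dataset of its driving behavior, and the PPO policy is trained on that data with supervised learning before any reinforcement learning takes place.


In [21]:
import importlib
import train_ppo_from_neat
# Reload so that edits to train_ppo_from_neat.py take effect without restarting the kernel
importlib.reload(train_ppo_from_neat)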

from train_ppo_from_neat import train_from_neat_supervised

# Train PPO model using NEAT supervision
model = train_from_neat_supervised(episodes=10, epochs=3, save_path="ppo_bootstrapped_from_neat")

# You can now fine-tune or test `model` as needed


Using cuda device
Wrapping the env in a VecTransposeImage.
🎯 Generating dataset from NEAT agent...
🧠 Pretraining PPO with NEAT data...
Epoch 1 Loss: 0.2005
Epoch 2 Loss: 0.1083
Epoch 3 Loss: 0.0951
✅ PPO model saved to 'ppo_bootstrapped_from_neat.zip'
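`train_from_neat_supervised` is defined in `train_ppo_from_neat.py` and is not reproduced here. The log above suggests a two-stage process: collect observation/action pairs from the NEAT agent, then minimize a supervised loss over a few epochs. The general idea (behavior cloning) can be sketched with a toy example. The sketch below is not the notebook's implementation: it assumes a synthetic linear "teacher" in place of the NEAT agent and a linear "student" in place of the PPO policy, trained by full-batch gradient descent on mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic teacher standing in for the NEAT agent: a fixed linear map from
# 4-dimensional observations to 3-dimensional actions (both sizes are arbitrary).
W_teacher = rng.normal(size=(4, 3))
observations = rng.normal(size=(256, 4))
teacher_actions = observations @ W_teacher  # the "demonstration" dataset

# Student policy: a linear map fitted to imitate the teacher's actions.
W_student = np.zeros((4, 3))
learning_rate = 0.1
losses = []
for epoch in range(200):
    predicted = observations @ W_student
    error = predicted - teacher_actions
    losses.append(float((error ** 2).mean()))        # mean squared error over all entries
    gradient = 2.0 * observations.T @ error / error.size
    W_student -= learning_rate * gradient

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.2e}")
```

In the real pipeline the student would be the PPO policy network and the observations would be image frames, but the structure is the same: the loss measures how far the student's actions are from the teacher's on the recorded observations.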


## Fine-Tuning the PPO Agent with Reinforcement Learning

After initializing the PPO agent with NEAT demonstrations, we now fine-tune it using standard reinforcement learning. This allows the agent to explore and improve beyond what NEAT taught it.
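For reference, PPO's standard clipped surrogate objective is given below. The symbols are not defined elsewhere in this notebook: $r_t(\theta)$ is the probability ratio between the updated policy and the policy that collected the data, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clip range (`clip_range` in Stable-Baselines3, 0.2 by default).

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\left(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
      \right)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

The clipping limits how far a single update can move the policy away from the one that generated the data. When fine-tuning starts from imitation-pretrained weights, this means the policy drifts away from the NEAT-like behavior gradually rather than in one step.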


In [23]:
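def make_env():
    # render_mode="human" draws every step in a window, which slows training considerably;
    # pass render_mode=None for faster headless runs.
    return gym.make("CarRacing-v3", render_mode="human", disable_env_checker=True)

# VecTransposeImage converts image observations from HxWxC to CxHxW,
# matching the wrapper applied during pretraining.
env = DummyVecEnv([make_env])
env = VecTransposeImage(env)

# Load the pretrained weights and attach the new environment
model = PPO.load("ppo_bootstrapped_from_neat", env=env)

# Continue training with PPO's reinforcement learning updates
model.learn(total_timesteps=100_000)

model.save("ppo_finetuned_from_neat")
print("✅ PPO fine-tuned and saved!")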


KeyboardInterrupt: 
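The pipeline's last step, evaluating performance, is not shown in the cells above. One common approach is to run each agent for several episodes and compare the mean and spread of the episode returns. The sketch below shows only the aggregation step using the standard library. `summarize_returns` is a hypothetical helper, and the numbers are placeholders rather than measurements from this notebook.

```python
from statistics import mean, pstdev

def summarize_returns(returns):
    """Return (mean, population standard deviation) of a list of episode returns."""
    if not returns:
        raise ValueError("need at least one episode return")
    return mean(returns), pstdev(returns)

# Placeholder values purely for illustration; they are not results from this notebook.
example_returns = [100.0, 200.0, 300.0]
avg_return, spread = summarize_returns(example_returns)
print(f"mean return: {avg_return:.2f} +/- {spread:.2f}")
```

For a Stable-Baselines3 model, `evaluate_policy` from `stable_baselines3.common.evaluation` performs the rollouts as well and returns the mean and standard deviation of the episode rewards.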