# Building a Self-Driving Agent for Car-Racing

**Student:** Muhammad Mahdi Amirpour

**Student ID:** 40003033

**Course:** Artificial Intelligence

**University:** K. N. Toosi University of Technology

---

## Project Overview

This project aims to train an AI agent to drive a race car in the `CarRacing-v3` environment from the Gymnasium library. The agent will be built from scratch using the **Double Deep Q-Networks (DDQN)** algorithm.

This notebook provides a complete implementation based on the project description, including:
1.  **Environment Setup**: Configuring the `CarRacing-v3` environment.
2.  **DDQN Agent**: A complete class for the agent, with the `build_model`, `choose_action`, and `experience_replay` methods implemented as required.
3.  **Training Loop**: A main function to handle the agent's training over many episodes.
4.  **Monitoring & Saving**: Utilities for logging rewards, saving the model, and plotting progress.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Step 0: Install Dependencies

The `CarRacing-v3` environment requires the `Box2D` physics engine. The following cell installs `swig` (a prerequisite) and all necessary `gymnasium` components to ensure the environment can run correctly.

In [1]:
# Install swig, which is a dependency for Box2D
!pip install swig -q

# Install gymnasium with the Box2D extras, which includes the physics engine
!pip install "gymnasium[box2d]" -q

# Install other packages needed for rendering and processing in a headless environment
!pip install pygame pyvirtualdisplay -q

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.9/1.9 MB[0m [31m67.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m374.4/374.4 kB[0m [31m14.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
  Building wheel for box2d-py (setup.py) ... [?25l[?25hdone


## Step 1: Imports and Setup

Now, we import all the necessary libraries. This includes `gymnasium` for the environment, `tensorflow` for the neural network, `numpy` for numerical operations, and other utilities for plotting and file management.

In [2]:
# --- Environment and System Imports ---
import gymnasium as gym
import numpy as np
import cv2
import random
import os
import datetime
from collections import deque

# --- TensorFlow Imports ---
import tensorflow as tf
from tensorflow.keras.models import Sequential, clone_model
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

# --- Monitoring and Plotting Imports ---
from tqdm import tqdm
import pandas as pd
import matplotlib.pyplot as plt

# --- GPU Configuration (Optional but Recommended) ---
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    print(f"GPU found, setting memory growth for {len(physical_devices)} device(s).")
    for gpu in physical_devices:
        tf.config.experimental.set_memory_growth(gpu, True)
else:
    print("No GPU found, training will run on CPU.")

GPU found, setting memory growth for 1 device(s).


## Step 2: Configuration

Here, we define all the key hyperparameters and settings for the project. You are encouraged to experiment with these values to improve the agent's performance.

In [3]:
# --- Directory Configuration ---
MODEL_TYPE = "DDQN_High_Throughput_Final"
TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
MODEL_DIR = f"./models/{MODEL_TYPE}/"
REWARD_DIR = f"./rewards/{MODEL_TYPE}/"

# --- TRAINING HYPERPARAMETERS (VECTORIZED) ---
TOTAL_TIMESTEPS = 500_000 # A solid target for a sub-1-hour run.
LEARNING_RATE = 0.00025
GAMMA = 0.99

# --- Epsilon-Greedy Strategy Parameters ---
EPSILON_START = 1.0
EPSILON_END = 0.05
# We now use linear decay over a fraction of total steps. It's more stable.
EPSILON_DECAY_FRACTION = 0.8

# --- REPLAY BUFFER AND BATCH PARAMETERS ---
REPLAY_BUFFER_MAX_SIZE = 100000
BATCH_SIZE = 256
REPLAY_BUFFER_START_SIZE = 20000

# --- HIGH-THROUGHPUT ARCHITECTURE PARAMETERS ---
NUM_ENVS = 8         # Number of parallel environments.
N_STEPS = 128        # (Rollout Buffer Size) Collect this many steps per env before training.
# After collecting N_STEPS*NUM_ENVS experiences, train for this many steps.
# This keeps the sample-to-update ratio high.
GRADIENT_STEPS = N_STEPS

# --- DDQN UPDATE AND SAVING FREQUENCIES (STEP-BASED) ---
TARGET_UPDATE_FREQ = 8000
SAVE_FREQUENCY = 50000

# --- Execution Mode ---
TEST_MODE = False
PRETRAINED_MODEL_PATH = f"./models/{MODEL_TYPE}/model_final.weights.h5"

## Step 3: Utility Functions

These helper functions handle state preprocessing and training visualization.

In [4]:
def convert_greyscale(state_rgb):
    """
    Converts an RGB state image to a single-channel grayscale image and crops it.
    The shape is reduced from (96, 96, 3) to (84, 96, 1) for the network.
    """
    # Convert to grayscale
    gray_state = cv2.cvtColor(state_rgb, cv2.COLOR_BGR2GRAY)
    # Crop the bottom part (scores/indicators)
    cropped_state = gray_state[:84, :]
    # Add channel dimension for the CNN
    processed_state = np.expand_dims(cropped_state, axis=-1)
    return processed_state

def plot_training_progress(log_data, file_path):
    """
    Plots and saves the agent's total reward and epsilon decay over episodes.
    """
    df = pd.DataFrame(log_data, columns=['Episode', 'Total Reward', 'Epsilon'])

    fig, ax1 = plt.subplots(figsize=(12, 7))

    # Plotting Total Reward
    color = 'tab:blue'
    ax1.set_xlabel('Episode')
    ax1.set_ylabel('Total Reward', color=color)
    ax1.plot(df['Episode'], df['Total Reward'], color=color, alpha=0.6, label='Total Reward')
    # Add a moving average for rewards
    moving_avg = df['Total Reward'].rolling(window=50).mean()
    ax1.plot(df['Episode'], moving_avg, color='red', label='50-Ep Moving Avg')
    ax1.tick_params(axis='y', labelcolor=color)
    ax1.legend(loc='upper left')
    ax1.grid(True)

    # Creating a second y-axis for Epsilon
    ax2 = ax1.twinx()
    color = 'tab:green'
    ax2.set_ylabel('Epsilon', color=color)
    ax2.plot(df['Episode'], df['Epsilon'], color=color, linestyle='--', label='Epsilon')
    ax2.tick_params(axis='y', labelcolor=color)
    ax2.legend(loc='upper right')

    fig.tight_layout()
    plt.title('Training Progress: Reward and Epsilon Decay')
    plt.savefig(file_path)
    plt.close(fig)

## Step 4: The DDQN Agent Class

This class encapsulates all the logic for our DDQN agent, including the neural network architecture, action selection, experience storage, and the learning algorithm itself.

In [5]:
class DDQN_Agent:
    def __init__(self, num_envs):
        self.action_space = [
            (-1, 1, 0.2), (0, 1, 0.2), (1, 1, 0.2), (-1, 1, 0), (0, 1, 0), (1, 1, 0),
            (-1, 0, 0.2), (0, 0, 0.2), (1, 0, 0.2), (-1, 0, 0), (0, 0, 0), (1, 0, 0)
        ]
        self.n_actions = len(self.action_space)
        self.num_envs = num_envs
        self.gamma = tf.constant(GAMMA, dtype=tf.float32)
        self.epsilon = EPSILON_START
        self.batch_size = BATCH_SIZE

        self.replay_buffer = deque(maxlen=REPLAY_BUFFER_MAX_SIZE)

        self.input_shape = (84, 96, 1)
        self.optimizer = Adam(learning_rate=LEARNING_RATE)
        self.loss_function = tf.keras.losses.MeanSquaredError()
        self.model = self.build_model()
        self.target_model = self.build_model()
        self.update_target_model()

    def build_model(self):
        model = Sequential([
            Input(shape=self.input_shape),
            Conv2D(24, 5, strides=2, activation='relu'), Conv2D(36, 5, strides=2, activation='relu'),
            Conv2D(48, 5, strides=2, activation='relu'), Conv2D(64, 3, activation='relu'),
            Conv2D(64, 3, activation='relu'), Flatten(), Dense(100, activation='relu'),
            Dense(50, activation='relu'), Dense(self.n_actions, activation='linear', dtype='float32')
        ])
        return model

    def store_transition(self, state, action, reward, next_state, done):
        self.replay_buffer.append((state, action, reward, next_state, done))

    def update_target_model(self):
        self.target_model.set_weights(self.model.get_weights())

    def choose_action(self, states):
        if np.random.rand() < self.epsilon:
            return np.random.randint(0, self.n_actions, size=self.num_envs)
        else:
            q_values = self.model(states, training=False)
            return tf.argmax(q_values, axis=1).numpy()

    # The new training function, performs a BURST of updates.
    def train(self, gradient_steps):
        if len(self.replay_buffer) < REPLAY_BUFFER_START_SIZE:
            return

        for _ in range(gradient_steps):
            minibatch = random.sample(self.replay_buffer, self.batch_size)
            states = tf.convert_to_tensor(np.array([t[0] for t in minibatch]), dtype=tf.float32)
            actions = tf.convert_to_tensor([t[1] for t in minibatch], dtype=tf.int32)
            rewards = tf.convert_to_tensor([t[2] for t in minibatch], dtype=tf.float32)
            next_states = tf.convert_to_tensor(np.array([t[3] for t in minibatch]), dtype=tf.float32)
            dones = tf.convert_to_tensor(np.array([t[4] for t in minibatch]).astype(np.float32), dtype=tf.float32)
            self._train_step(states, actions, rewards, next_states, dones)

    @tf.function
    def _train_step(self, states, actions, rewards, next_states, dones):
        q_next_online = self.model(next_states, training=False)
        best_next_actions = tf.argmax(q_next_online, axis=1, output_type=tf.int32)
        q_next_target = self.target_model(next_states, training=False)
        action_indices = tf.stack([tf.range(self.batch_size, dtype=tf.int32), best_next_actions], axis=1)
        q_value_of_best_action = tf.gather_nd(q_next_target, action_indices)
        target_q_values = rewards + self.gamma * q_value_of_best_action * (1.0 - dones)
        with tf.GradientTape() as tape:
            q_values = self.model(states, training=True)
            action_mask = tf.one_hot(actions, self.n_actions)
            q_action = tf.reduce_sum(tf.multiply(q_values, action_mask), axis=1)
            loss = self.loss_function(target_q_values, q_action)
        gradients = tape.gradient(loss, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))

    def save(self, file_path):
        self.model.save_weights(file_path)

    def load(self, file_path):
        self.model.load_weights(file_path)
        self.update_target_model()

## Step 5: The Training Loop

This function contains the main training process. It orchestrates the interaction between the agent and the environment over many episodes.

In [6]:
def train_agent_vectorized(agent, envs):
    log_data = []
    os.makedirs(MODEL_DIR, exist_ok=True)
    os.makedirs(REWARD_DIR, exist_ok=True)

    print(f"--- Warming up replay buffer with {REPLAY_BUFFER_START_SIZE} random steps... ---")
    states, _ = envs.reset()
    for _ in tqdm(range(REPLAY_BUFFER_START_SIZE // NUM_ENVS)):
        action_indices = np.random.randint(0, agent.n_actions, size=NUM_ENVS)
        actions_np = np.array([agent.action_space[i] for i in action_indices], dtype=np.float32)
        next_states, rewards, terminated, truncated, _ = envs.step(actions_np)
        dones = terminated | truncated
        for i in range(NUM_ENVS):
            agent.store_transition(states[i], action_indices[i], rewards[i], next_states[i], dones[i])
        states = next_states

    print("--- Warm-up complete. Starting high-throughput training. ---")

    states, _ = envs.reset()
    episode_count = 0
    total_timesteps = 0
    epsilon_decay_steps = int(TOTAL_TIMESTEPS * EPSILON_DECAY_FRACTION)

    pbar = tqdm(total=TOTAL_TIMESTEPS, unit='steps')
    while total_timesteps < TOTAL_TIMESTEPS:
        # --- PHASE 1: COLLECT ROLLOUTS ---
        for _ in range(N_STEPS):
            fraction = min(1.0, total_timesteps / epsilon_decay_steps)
            agent.epsilon = EPSILON_START + fraction * (EPSILON_END - EPSILON_START)

            actions_indices = agent.choose_action(states)
            actions_np = np.array([agent.action_space[i] for i in actions_indices], dtype=np.float32)

            next_states, rewards, terminated, truncated, infos = envs.step(actions_np)
            dones = terminated | truncated

            for i in range(NUM_ENVS):
                agent.store_transition(states[i], actions_indices[i], rewards[i], next_states[i], dones[i])

            states = next_states
            total_timesteps += NUM_ENVS
            pbar.update(NUM_ENVS)

            if "final_info" in infos:
                for info in infos["final_info"]:
                    if info is not None:
                        episode_count += 1
                        final_reward = info['episode']['r'][0]
                        log_data.append([episode_count, final_reward, agent.epsilon])
                        if episode_count % 10 == 0:
                            tqdm.write(f"Ep: {episode_count}, R: {final_reward:.2f}, Eps: {agent.epsilon:.3f}, Steps: {total_timesteps}")

        # --- PHASE 2: TRAIN ---
        agent.train(gradient_steps=GRADIENT_STEPS)

        # Periodic updates and saving
        if total_timesteps % TARGET_UPDATE_FREQ < (N_STEPS * NUM_ENVS):
             agent.update_target_model()

        if total_timesteps > 0 and (total_timesteps - (N_STEPS * NUM_ENVS)) // SAVE_FREQUENCY < total_timesteps // SAVE_FREQUENCY:
            model_path = os.path.join(MODEL_DIR, f"model_step_{total_timesteps}.weights.h5")
            agent.save(model_path)
            # ... (plotting and logging code)
            tqdm.write(f"\n[INFO] Checkpoint saved at step {total_timesteps}.\n")

    pbar.close()
    final_model_path = os.path.join(MODEL_DIR, "model_final.weights.h5")
    agent.save(final_model_path)
    print(f"\nTraining finished. Final model saved to {final_model_path}")
    envs.close()

## Step 6: Main Execution Block

This is the entry point of our script. It initializes the environment and the agent, and then starts either the training or testing process based on the `TEST_MODE` flag.

In [7]:
def test_agent(agent, env, model_path):
    """Tests a trained agent for a few episodes with rendering."""
    if not os.path.exists(model_path):
        print(f"Error: Model file not found at {model_path}")
        return

    print(f"Loading model from {model_path}...")
    agent.load(model_path)
    agent.epsilon = 0.0  # Set to evaluation mode (no random actions)

    for episode in range(5):  # Test for 5 episodes
        state_rgb, _ = env.reset()
        state_gray = convert_greyscale(state_rgb)
        done = False
        total_reward = 0

        print(f"--- Starting Test Episode {episode + 1} ---")
        while not done:
            action_index = agent.choose_action(state_gray)
            action_tuple = agent.action_space[action_index]

            # >>> FIX: Convert action tuple to a NumPy array for the environment.
            action_np = np.array(action_tuple, dtype=np.float32)
            next_state_rgb, reward, terminated, truncated, _ = env.step(action_np)

            done = terminated or truncated
            state_gray = convert_greyscale(next_state_rgb)
            total_reward += reward

        print(f"Episode {episode + 1} finished with reward: {total_reward:.2f}")

    env.close()

In [8]:
if __name__ == "__main__":
    def make_env():
        """Helper function to create and wrap the environment."""
        env = gym.make('CarRacing-v3', continuous=True, domain_randomize=False)
        new_space = gym.spaces.Box(low=0, high=255, shape=(84, 96, 1), dtype=np.uint8)
        return gym.wrappers.TransformObservation(env, convert_greyscale, observation_space=new_space)

    if TEST_MODE:
        print("--- Test mode selected ---")
    else:
        print("--- Starting in HIGH-SPEED TRAIN mode ---")
        try:
            envs = gym.vector.SyncVectorEnv([make_env for _ in range(NUM_ENVS)])
            agent = DDQN_Agent(num_envs=NUM_ENVS)
            agent.model.summary()
            train_agent_vectorized(agent, envs)
        except Exception as e:
            print(f"\nAn error occurred during setup or training: {e}")

--- Starting in HIGH-SPEED TRAIN mode ---


--- Warming up replay buffer with 20000 random steps... ---


100%|██████████| 2500/2500 [03:22<00:00, 12.35it/s]


--- Warm-up complete. Starting high-throughput training. ---


 10%|█         | 50184/500000 [11:20<9:01:13, 13.85steps/s] 


[INFO] Checkpoint saved at step 50176.



 20%|██        | 100360/500000 [22:54<8:13:48, 13.49steps/s]


[INFO] Checkpoint saved at step 100352.



 30%|███       | 150544/500000 [34:31<6:58:16, 13.92steps/s]


[INFO] Checkpoint saved at step 150528.



 40%|████      | 200720/500000 [46:16<6:07:29, 13.57steps/s]


[INFO] Checkpoint saved at step 200704.



 50%|█████     | 250888/500000 [58:23<5:27:38, 12.67steps/s]


[INFO] Checkpoint saved at step 250880.



 60%|██████    | 300040/500000 [1:10:23<5:51:38,  9.48steps/s]


[INFO] Checkpoint saved at step 300032.



 70%|███████   | 350224/500000 [1:22:46<3:01:41, 13.74steps/s]


[INFO] Checkpoint saved at step 350208.



 80%|████████  | 400392/500000 [1:35:23<4:08:51,  6.67steps/s]


[INFO] Checkpoint saved at step 400384.



 90%|█████████ | 450568/500000 [1:48:22<1:23:52,  9.82steps/s]


[INFO] Checkpoint saved at step 450560.



500736steps [2:01:14, 68.83steps/s]


[INFO] Checkpoint saved at step 500736.


Training finished. Final model saved to ./models/DDQN_High_Throughput_Final/model_final.weights.h5





In [10]:
# ================================================================= #
#      --- VISUAL TESTING CELL (FOR COLAB & JUPYTER NOTEBOOKS) ---  #
# ================================================================= #

import gymnasium as gym
import numpy as np
import tensorflow as tf
import os
import matplotlib.pyplot as plt
from matplotlib import animation
from IPython.display import display, HTML

# --- IMPORTANT: Set the path to the model you want to test ---
YOUR_MODEL_PATH = "./models/DDQN_High_Throughput_Final/model_final.weights.h5" # <<< CHANGE THIS

# --- Configuration for the Test ---
NUM_TEST_EPISODES = 10 # How many races you want to watch

def display_video(frames, framerate=30):
    """Converts a list of frames into an HTML video and displays it."""
    height, width, _ = frames[0].shape
    dpi = 70
    orig_backend = plt.get_backend()
    plt.switch_backend('Agg')
    fig, ax = plt.subplots(1, 1, figsize=(width / dpi, height / dpi), dpi=dpi)
    plt.subplots_adjust(left=0, right=1, top=1, bottom=0)
    ax.set_axis_off()
    im = ax.imshow(frames[0])

    def update(frame):
        im.set_data(frame)
        return [im]

    anim = animation.FuncAnimation(fig=fig, func=update, frames=frames, interval=1000/framerate, blit=True)
    plt.close(fig)
    plt.switch_backend(orig_backend)
    return HTML(anim.to_html5_video())


def test_agent_visual(model_path, num_episodes):
    """
    Loads a trained model and runs it, creating a video for each episode.
    """
    print("--- Starting Visual Test ---")

    if not os.path.exists(model_path):
        print(f"ERROR: Model file not found at '{model_path}'.")
        return

    # 1. Create the environment with 'render_mode="rgb_array"'
    #    This tells the environment to give us image data instead of opening a window.
    env = gym.make('CarRacing-v3', continuous=True, render_mode="rgb_array")

    # 2. Initialize the agent
    agent = DDQN_Agent(num_envs=1)
    print(f"Loading model weights from: {model_path}")
    agent.load(model_path)
    agent.epsilon = 0.0 # Evaluation mode

    for episode in range(num_episodes):
        print(f"\n--- Running Test Episode {episode + 1}/{num_episodes} ---")

        state_rgb, _ = env.reset()
        state_gray = convert_greyscale(state_rgb)

        frames = [state_rgb] # List to store each frame for the video
        total_reward = 0
        terminated, truncated = False, False

        while not (terminated or truncated):
            state_batch = np.expand_dims(state_gray, axis=0)
            action_index = agent.choose_action(state_batch)[0]
            action_np = np.array(agent.action_space[action_index], dtype=np.float32)

            next_state_rgb, reward, terminated, truncated, _ = env.step(action_np)

            # Save the frame
            frames.append(next_state_rgb)

            state_gray = convert_greyscale(next_state_rgb)
            total_reward += reward

        print(f"Episode finished with Total Reward: {total_reward:.2f}")
        print("Displaying video...")

        # 3. Display the collected frames as a video
        video_html = display_video(frames)
        display(video_html)

    env.close()
    print("\n--- Visual Test Finished ---")


# --- Run the Visual Test ---
if __name__ == "__main__":
    test_agent_visual(model_path=YOUR_MODEL_PATH, num_episodes=NUM_TEST_EPISODES)

--- Starting Visual Test ---
Loading model weights from: ./models/DDQN_High_Throughput_Final/model_final.weights.h5

--- Running Test Episode 1/5 ---
Episode finished with Total Reward: -81.42
Displaying video...



--- Running Test Episode 2/5 ---
Episode finished with Total Reward: -2.26
Displaying video...



--- Running Test Episode 3/5 ---
Episode finished with Total Reward: -41.98
Displaying video...



--- Running Test Episode 4/5 ---
Episode finished with Total Reward: 432.61
Displaying video...



--- Running Test Episode 5/5 ---
Episode finished with Total Reward: 443.48
Displaying video...



--- Visual Test Finished ---


In [11]:
# ================================================================= #
#          --- DOWNLOAD YOUR MODELS AND REWARDS ---                 #
# ================================================================= #

from google.colab import files
import os

print("Zipping project files... Please wait.")

# --- Define the names for your zip files ---
MODELS_ZIP_NAME = 'models.zip'
REWARDS_ZIP_NAME = 'rewards.zip'

# --- Define the paths to the folders you want to zip ---
# This should match the MODEL_TYPE in your configuration cell.
MODELS_FOLDER_PATH = './models/DDQN_High_Throughput_Final'
REWARDS_FOLDER_PATH = './rewards/DDQN_High_Throughput_Final'


# --- Create the zip archives using a shell command ---
# The '-r' flag means "recursive", so it includes everything inside the folder.
if os.path.exists(MODELS_FOLDER_PATH):
    get_ipython().system(f"zip -r {MODELS_ZIP_NAME} {MODELS_FOLDER_PATH}")
    print(f"Successfully created '{MODELS_ZIP_NAME}'")
else:
    print(f"WARNING: Models folder not found at '{MODELS_FOLDER_PATH}'")

if os.path.exists(REWARDS_FOLDER_PATH):
    get_ipython().system(f"zip -r {REWARDS_ZIP_NAME} {REWARDS_FOLDER_PATH}")
    print(f"Successfully created '{REWARDS_ZIP_NAME}'")
else:
    print(f"WARNING: Rewards folder not found at '{REWARDS_FOLDER_PATH}'")


# --- Trigger the download for each zip file ---
print("\nStarting download... Your browser will now prompt you to save the files.")
print("If you have a pop-up blocker, you may need to allow downloads from this site.")

if os.path.exists(MODELS_ZIP_NAME):
    files.download(MODELS_ZIP_NAME)
else:
    print(f"Could not start download for '{MODELS_ZIP_NAME}' because it was not found.")

if os.path.exists(REWARDS_ZIP_NAME):
    files.download(REWARDS_ZIP_NAME)
else:
    print(f"Could not start download for '{REWARDS_ZIP_NAME}' because it was not found.")

Zipping project files... Please wait.
  adding: models/DDQN_High_Throughput_Final/ (stored 0%)
  adding: models/DDQN_High_Throughput_Final/model_final.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throughput_Final/model_step_400384.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throughput_Final/model_step_500736.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throughput_Final/model_step_250880.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throughput_Final/model_step_350208.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throughput_Final/model_step_300032.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throughput_Final/model_step_150528.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throughput_Final/model_step_450560.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throughput_Final/model_step_50176.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throughput_Final/model_step_100352.weights.h5 (deflated 10%)
  adding: models/DDQN_High_Throu

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>