# CNN-Based SameGame RL Experiment

This notebook trains a Deep Q-Network (DQN) agent that uses a convolutional neural network (CNN) as its Q-function to learn SameGame strategies. The CNN is meant to capture spatial patterns on the board, such as clusters of same-colored adjacent tiles, which determine how many tiles each move clears.

## Experiment Configuration

Naming the experiment. The name is also passed to the agent as `model_name`, which determines the checkpoint file it saves to and loads from.

In [1]:
experiment_name = "CNN_simple_reward_base"

## Import Dependencies

Loading all necessary modules for the experiment.

In [2]:
from samegamerl.environments.samegame_env import SameGameEnv
from samegamerl.agents.dqn_agent import DqnAgent
from samegamerl.evaluation.plot_helper import plot_evals, plot_result
from samegamerl.evaluation.benchmark import Benchmark
from tqdm import tqdm
from torch import nn
from samegamerl.game.game_config import GameConfig, GameFactory
from samegamerl.training.train import train
from samegamerl.agents.replay_buffer import ReplayBuffer
from samegamerl.evaluation.benchmark_scripts import _compute_stats, benchmark_agent

ModuleNotFoundError: No module named 'asyncpg'

## Neural Network Architecture

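Defining a CNN that reads the game board as a multi-channel image, with one input channel per tile color (`config.num_colors`). The architecture uses:
- Three 3×3 convolutional layers (64, 64 and 128 channels; padding keeps the board size unchanged) to detect local tile patterns
- Global average pooling to reduce each of the 128 feature maps to a single value
- Two fully connected layers that map the pooled features to one Q-value per action (`config.action_space_size`)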
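The model expects the board as a stack of per-color planes. A minimal sketch of such an encoding, assuming the board is a NumPy array of color ids with `0` marking an empty cell and colors numbered `1..num_colors` (the actual `SameGameEnv` observation format is not shown in this notebook, and `encode_board` is a hypothetical helper):

```python
import numpy as np


def encode_board(board: np.ndarray, num_colors: int) -> np.ndarray:
    """One-hot encode a (rows, cols) board of color ids into (num_colors, rows, cols).

    Assumption: 0 marks an empty cell and colors are numbered 1..num_colors,
    so empty cells are all-zero across every channel.
    """
    rows, cols = board.shape
    planes = np.zeros((num_colors, rows, cols), dtype=np.float32)
    for color in range(1, num_colors + 1):
        # Channel color-1 is 1.0 wherever that color sits on the board.
        planes[color - 1] = (board == color).astype(np.float32)
    return planes


board = np.array([
    [1, 1, 2],
    [0, 3, 2],
    [3, 3, 1],
])
obs = encode_board(board, num_colors=3)
print(obs.shape)
print(obs[0])  # plane for color 1
```

PyTorch's `nn.Conv2d` expects input shaped `(batch, channels, rows, cols)`, so a single encoded board needs a leading batch dimension (for example `torch.from_numpy(obs).unsqueeze(0)`) before it is passed to the model.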

In [3]:
class NeuralNetwork(nn.Module):
    def __init__(self, config: GameConfig):
        super().__init__()
        self.config = config
        self.conv_stack = nn.Sequential(
            nn.Conv2d(config.num_colors, 64, 3, padding=1),  # output channels must match the next layer's 64 inputs
            nn.ReLU(),
            # nn.MaxPool2d((2,2), (2,2)),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(),
            # nn.MaxPool2d((2,2), (2,2)),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
        )
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, config.action_space_size)
        )

    def forward(self, x):
        x = self.conv_stack(x)
        x = self.global_pool(x)
        x = self.flatten(x)
        q_values = self.linear_relu_stack(x)  # one Q-value per action, not class logits
        return q_values
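A quick way to sanity-check the architecture is to push a dummy batch through it and inspect the output shape. The sketch below re-declares the same layer stack as a self-contained `CnnQNet` (with the first convolution fixed at 64 output channels) and uses a hypothetical `StubConfig` in place of `GameConfig`; the assumption that `action_space_size` equals one action per board cell is not confirmed by the notebook.

```python
from dataclasses import dataclass

import torch
from torch import nn


@dataclass
class StubConfig:
    # Hypothetical stand-in for GameConfig; field names mirror those used above.
    num_rows: int = 8
    num_cols: int = 8
    num_colors: int = 3
    # Assumption: one action per board cell.
    action_space_size: int = 64


class CnnQNet(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.conv_stack = nn.Sequential(
            nn.Conv2d(config.num_colors, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),  # (B, 128, H, W) -> (B, 128, 1, 1)
            nn.Flatten(),                  # -> (B, 128)
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, config.action_space_size),
        )

    def forward(self, x):
        return self.head(self.conv_stack(x))


config = StubConfig()
net = CnnQNet(config)
batch = torch.zeros(4, config.num_colors, config.num_rows, config.num_cols)
q_values = net(batch)
print(q_values.shape)

# Global pooling removes the spatial dimensions, so other board sizes also work.
small = StubConfig(num_rows=5, num_cols=5, action_space_size=25)
print(CnnQNet(small)(torch.zeros(1, 3, 5, 5)).shape)
```

Because global average pooling discards most information about where each feature occurred, the linear head receives 128 largely position-free values and must still produce a distinct Q-value for every cell. Whether that limits this agent is untested here; one alternative when actions correspond to board positions is a fully convolutional head, for example a 1×1 convolution emitting one value per cell.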

## Hyperparameters

Training parameters, exploration schedule and reporting intervals. `initial_update_done` and `tau` are defined here but not used by any later cell.

In [6]:
# Training specific parameters
batch_size = 128
n_games = 1_000
max_steps = 30  # Maximum steps per episode

# Training intervals
update_target_num = 100    # Target network update frequency
report_num = 500             # Progress reporting interval
visualize_num = 0            # Visualization frequency
initial_update_done = n_games // 2

# Agent hyperparameters
learning_rate = 0.001
start_epsilon = 1.0           # Initial exploration rate
epsilon_decay = start_epsilon / n_games
final_epsilon = 0.1           # Minimum exploration rate
gamma = 0.95                   # Discount factor
tau = 0.05
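With `epsilon_decay = start_epsilon / n_games`, epsilon falls linearly and is clamped at `final_epsilon`. The sketch below, assuming the decay is subtracted once per game (the notebook does not show whether `DqnAgent` decays per game or per environment step), computes that schedule and pairs it with epsilon-greedy selection restricted to valid actions. `epsilon_at` and `epsilon_greedy` are hypothetical helpers, not part of `samegamerl`.

```python
import random


def epsilon_at(step: int, start: float = 1.0, decay: float = 1.0 / 1000, final: float = 0.1) -> float:
    """Linear decay with a floor; assumes `decay` is subtracted once per step (here: per game)."""
    return max(final, start - decay * step)


def epsilon_greedy(q_values, valid_actions, epsilon, rng=random):
    """With probability epsilon pick a random valid action, otherwise the best valid one.

    Restricting the argmax to `valid_actions` masks out moves that cannot be played.
    """
    if rng.random() < epsilon:
        return rng.choice(valid_actions)
    return max(valid_actions, key=lambda a: q_values[a])


for game in (0, 450, 900, 1000):
    print(game, round(epsilon_at(game), 3))

# Action 3 has the highest Q-value but is not valid, so the greedy choice is action 1.
print(epsilon_greedy([0.5, 2.0, 1.0, 3.0], valid_actions=[0, 1, 2], epsilon=0.0))
```

Under the per-game assumption, epsilon reaches the 0.1 floor after (1.0 - 0.1) / 0.001 = 900 of the 1,000 games, so the agent explores heavily for most of the run. If the decay were applied per environment step instead, the floor could be reached within about 30 full-length games (900 steps at up to 30 steps per game).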

## Environment and Agent Setup

Creating the SameGame environment and DQN agent with the CNN model.

In [7]:
# Use medium game configuration (8x8 board with 3 colors)
config = GameFactory.medium()

# Initialize environment and agent
env = SameGameEnv(config, partial_completion_base=5)
agent = DqnAgent(
    model=NeuralNetwork(config),
    config=config,
    model_name=experiment_name,
    learning_rate=learning_rate,
    initial_epsilon=start_epsilon,
    epsilon_decay=epsilon_decay,
    final_epsilon=final_epsilon,
    gamma=gamma,
    batch_size=batch_size,
)

agent.replay_buffer = ReplayBuffer(capacity=50_000)

Using mps device
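The notebook replaces the agent's buffer with `ReplayBuffer(capacity=50_000)`. The library's implementation is not shown; as a rough illustration, a FIFO experience-replay buffer with uniform sampling can be built on `collections.deque` (the `SimpleReplayBuffer` and `Transition` names below are hypothetical):

```python
import random
from collections import deque
from typing import Any, NamedTuple


class Transition(NamedTuple):
    state: Any
    action: int
    reward: float
    next_state: Any
    done: bool


class SimpleReplayBuffer:
    """FIFO experience replay: once full, the oldest transitions are evicted."""

    def __init__(self, capacity: int):
        self.memory = deque(maxlen=capacity)

    def push(self, *args) -> None:
        self.memory.append(Transition(*args))

    def sample(self, batch_size: int) -> list:
        # Uniform sampling without replacement breaks the correlation
        # between consecutive steps of the same game.
        return random.sample(self.memory, batch_size)

    def __len__(self) -> int:
        return len(self.memory)


buf = SimpleReplayBuffer(capacity=3)
for i in range(5):
    buf.push(i, i, 0.0, i + 1, False)
print(len(buf), [t.state for t in buf.memory])
```

If one transition is stored per environment step, a 1,000-game run produces at most 1,000 × 30 = 30,000 transitions, so a 50,000-slot buffer would never evict anything during a single run; old transitions would only start dropping out after roughly 1,667 full-length games.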


## Load Pre-trained Model (Optional)

Loading a previously trained model to continue training from a checkpoint.

In [8]:
# Resume from the saved checkpoint; comment out this line to train from scratch
agent.load()

Model loaded from samegamerl/models/CNN_simple_reward_base.pth
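The exact format written by `agent.save()` and read by `agent.load()` is not shown, but PyTorch checkpoints of this kind are commonly a model `state_dict` written with `torch.save`. A minimal round-trip sketch, assuming that format and using a temporary directory instead of `samegamerl/models/`:

```python
import os
import tempfile

import torch
from torch import nn

# Any nn.Module works; a small layer keeps the example fast.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "CNN_simple_reward_base.pth")
    torch.save(model.state_dict(), path)  # save the weights only

    restored = nn.Linear(4, 2)  # must be constructed with the same architecture
    restored.load_state_dict(torch.load(path, map_location="cpu"))

weights_match = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(weights_match)
```

`load_state_dict` is strict by default and raises a `RuntimeError` when parameter shapes differ, so a `.pth` file can only be reloaded into a model with the same layer sizes. On the 8x8 medium board the first convolution has 64 output channels whether it is written as `num_rows*num_cols` or as a fixed 64, so the existing checkpoint is unaffected by that choice.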


## Training Loop

Run the training loop with the parameters above, then save the model under `experiment_name`.

In [None]:
results = train(
    agent,
    env,
    epochs=n_games,
    max_steps=max_steps,
    report_num=report_num,
    visualize_num=visualize_num,
    update_target_num=update_target_num,
)

# Save the trained model
agent.save()
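`train` hides the update rule, but a standard DQN step with the settings above (`gamma = 0.95`, a separate target network refreshed periodically via `update_target_num`) looks roughly like the sketch below. It illustrates the technique and is not `samegamerl`'s code: the Huber loss (`smooth_l1_loss`) is an assumption, and `soft_update` shows how a coefficient like `tau = 0.05` would be applied if Polyak averaging were used instead of periodic hard copies.

```python
import copy

import torch
import torch.nn.functional as F
from torch import nn


def dqn_loss(online, target, states, actions, rewards, next_states, dones, gamma=0.95):
    # Q(s, a) for the actions that were actually taken.
    q_taken = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # TD target: r + gamma * max_a' Q_target(s', a'); no bootstrap at terminal states.
        next_max = target(next_states).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_max
    return F.smooth_l1_loss(q_taken, td_target)


def hard_update(target, online):
    # Periodic full copy of the online weights into the target network.
    target.load_state_dict(online.state_dict())


@torch.no_grad()
def soft_update(target, online, tau=0.05):
    # Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target
    for t_param, o_param in zip(target.parameters(), online.parameters()):
        t_param.mul_(1.0 - tau).add_(o_param, alpha=tau)


torch.manual_seed(0)
online = nn.Linear(4, 3)  # stand-in for the CNN: 4 features -> 3 actions
target = copy.deepcopy(online)
optimizer = torch.optim.Adam(online.parameters(), lr=0.001)

states, next_states = torch.randn(5, 4), torch.randn(5, 4)
actions = torch.tensor([0, 1, 2, 0, 1])
rewards = torch.ones(5)
dones = torch.tensor([0.0, 0.0, 1.0, 0.0, 1.0])

loss = dqn_loss(online, target, states, actions, rewards, next_states, dones)
optimizer.zero_grad()
loss.backward()
optimizer.step()
soft_update(target, online, tau=0.05)
print(loss.item())
```

With `gamma = 0.95` a reward 20 steps ahead is weighted by 0.95^20 ≈ 0.36, so the effective horizon (on the order of 1 / (1 - 0.95) = 20 steps) is somewhat shorter than the 30-step episode cap.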

## Training Results Visualization

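Plot the training curves returned by `train`. This cell depends on `results` from the training cell, which was not executed in the run shown (`In [None]`); that is why it raises a `NameError`.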

In [9]:
plot_result(results, interval=10)

NameError: name 'results' is not defined
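`plot_result(results, interval=10)` presumably smooths the per-episode curve before plotting; its internals are not shown. One common way to do this, and only a plausible reading of `interval`, is to average non-overlapping blocks of episodes. A hypothetical `block_average` helper:

```python
def block_average(values, interval=10):
    """Average consecutive, non-overlapping windows of `interval` values.

    A trailing partial window is averaged over the values it contains.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return [
        sum(values[i:i + interval]) / len(values[i:i + interval])
        for i in range(0, len(values), interval)
    ]


episode_rewards = [float(i % 4) for i in range(25)]
print(block_average(episode_rewards, interval=10))
```

Block averaging trades resolution for readability: 1,000 episodes with `interval=10` become 100 points. A sliding-window moving average is the usual alternative when one point per episode is wanted.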

## Agent Evaluation

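Benchmark the agent on 1,000 games against three built-in bots (RandomBot, LargestGroupBot, GreedySinglesBot). The bot results are reused from earlier runs; the agent's results are computed fresh and not saved.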

In [10]:
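#benchmarker = Benchmark(config, 1000, storage_type='database')
#benchmark_results = benchmarker.evaluate_agent(agent)
#stats = _compute_stats({agent.model_name: benchmark_results})
# Use a separate name so the training `results` used for plotting are not overwritten
benchmark_results = benchmark_agent(agent, config, 1000)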

Evaluating Custom Agent
Games: 1000

Running built-in bots...


2025-10-01 14:46:10,450	INFO worker.py:1951 -- Started a local Ray instance.


RandomBot: Using existing results for all 1000 games
LargestGroupBot: Using existing results for all 1000 games
GreedySinglesBot: Using existing results for all 1000 games
Running agent...
Evaluating agent: CNN_simple_reward_base_20251001_144612_9743f32a (ephemeral mode)


2025-10-01 14:46:13,590	INFO worker.py:1951 -- Started a local Ray instance.


  Using Ray parallel execution with 1000 tasks


Running CNN_simple_reward_base_20251001_144612_9743f32a (parallel): 100%|██████████| 1000/1000 [00:22<00:00, 44.02it/s]


Results computed but not saved (ephemeral evaluation)

Performance Results:
------------------------------
1. GreedySinglesBot
   Completion rate: 46.2%
   Avg tiles cleared: 62.9
   Avg moves made: 12.7
   Avg singles remaining: 1.1

2. RandomBot
   Completion rate: 15.2%
   Avg tiles cleared: 61.1
   Avg moves made: 13.6
   Avg singles remaining: 2.9

3. LargestGroupBot
   Completion rate: 14.3%
   Avg tiles cleared: 61.0
   Avg moves made: 11.7
   Avg singles remaining: 3.0

4. CNN_simple_reward_base
   Completion rate: 0.1%
   Avg tiles cleared: 17.4
   Avg moves made: 99.9
   Avg singles remaining: 13.2
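The listing reports four per-agent averages. A sketch of how such a summary could be computed from per-game records and ranked by completion rate (the order used above); `GameResult`, `summarize` and `rank` are hypothetical and do not reproduce `_compute_stats`:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GameResult:
    # Hypothetical per-game record; field names are assumptions.
    completed: bool
    tiles_cleared: int
    moves_made: int
    singles_remaining: int


def summarize(results):
    """Aggregate per-game records into the four metrics shown in the listing."""
    return {
        "completion_rate": 100.0 * sum(r.completed for r in results) / len(results),
        "avg_tiles_cleared": mean(r.tiles_cleared for r in results),
        "avg_moves_made": mean(r.moves_made for r in results),
        "avg_singles_remaining": mean(r.singles_remaining for r in results),
    }


def rank(stats_by_agent):
    """Sort agents by completion rate, best first."""
    return sorted(
        stats_by_agent.items(),
        key=lambda item: item[1]["completion_rate"],
        reverse=True,
    )


games = [GameResult(True, 64, 10, 0), GameResult(False, 50, 15, 4)]
stats = summarize(games)
print(stats)
```

Comparing moves with tiles cleared is a quick diagnostic. Assuming each valid move removes a group of at least two tiles, 17.4 cleared tiles correspond to at most about 8.7 effective moves, so roughly 91 of the CNN agent's 99.9 average moves removed nothing. That suggests the policy repeatedly selects cells that cannot be cleared until a move limit ends the game.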



## Interactive Game Visualization (Optional)

Watch the trained agent play the game interactively.