In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## 🎯 Epsilon Decision Function for Exploration vs Exploitation

In reinforcement learning, the **epsilon-greedy strategy** helps balance **exploration** (trying new actions) and **exploitation** (choosing the best-known action). Here's how it works:

In [5]:
import random

#Define epsilon decision function
def epsilonDecision(epsilon):
  action_decision = random.choices(['model','random'], weights = [1 - epsilon, epsilon])[0]
  return action_decision
epsilonDecision(epsilon = 0) # would always give 'model'

'model'
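
To see the weighting in action, the sketch below samples the decision many times and measures how often `'random'` comes back. The function is redefined so the block runs on its own. The helper name `random_fraction`, the seed, and the sample size are illustrative assumptions, not part of the original notebook.

```python
import random

def epsilonDecision(epsilon):
    # Same logic as above: 'random' with probability epsilon, otherwise 'model'
    return random.choices(['model', 'random'], weights=[1 - epsilon, epsilon])[0]

def random_fraction(epsilon, n=10_000, seed=0):
    # Hypothetical helper: fraction of n draws that come back as 'random'
    random.seed(seed)
    draws = [epsilonDecision(epsilon) for _ in range(n)]
    return draws.count('random') / n

for eps in (0.0, 0.3, 1.0):
    print(eps, random_fraction(eps))
```

The measured fraction should sit close to `epsilon` itself, which is the property the training loop relies on when it decays `epsilon`.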

## 🎮 Creating and Interacting with a ConnectX Environment

In this section, we set up a **ConnectX** game environment using `kaggle_environments`, the Kaggle package for simulating its competition games locally.

In [6]:
from kaggle_environments import evaluate, make, utils
import gym
env = make("connectx", debug=True)
env.render()
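#Creates a trainer in which we control player 1 and the built-in 'negamax' agent plays player 2
trainer = env.train([None, 'negamax'])
#Resets the board, shows initial state of all 0
trainer.reset()['board']
#Make a new action: drop a piece in column 4
#(step returns observation, reward, done flag, info; this notebook names the reward `winner` and the done flag `state`)
new_obs, winner, state, info = trainer.step(4)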

ModuleNotFoundError: No module named 'kaggle_environments'

## 🧠 Building a Deep Q-Network (DQN) for ConnectX

In this step, we define a Deep Q-Network (DQN) using TensorFlow/Keras to learn how to play ConnectX.

### 📌 Key Components:

- **Number of Actions**: Set to 7 — one for each column where a move can be made.
- **State Representation**: The board is a 6x7 grid representing the current game state.

### 🧱 Network Architecture:

- **Input Layer**: Accepts the 6x7 board.
- **Flatten Layer**: Converts the 2D grid into a 1D vector.
- **Hidden Layers**: Seven fully connected layers with 50 neurons each and ReLU activation — this allows the model to learn complex patterns and strategies.
- **Output Layer**: A dense layer with 7 units (one per action), using a linear activation function to produce Q-values for each possible move.

This DQN architecture maps a board state to one value per column, so the agent can pick the move with the highest predicted value.


In [7]:
#Number of possible actions
num_actions = 7
#Shape of the board observation (rows, columns), not a count of distinct states
num_states = [6,7]

# Create the DQN
import tensorflow as tf
inputs = tf.keras.layers.Input(shape=(6, 7))
flatten = tf.keras.layers.Flatten()(inputs)
hidden_1 = tf.keras.layers.Dense(50, activation = 'relu')(flatten)
hidden_2 = tf.keras.layers.Dense(50, activation = 'relu')(hidden_1)
hidden_3 = tf.keras.layers.Dense(50, activation = 'relu')(hidden_2)
hidden_4 = tf.keras.layers.Dense(50, activation = 'relu')(hidden_3)
hidden_5 = tf.keras.layers.Dense(50, activation = 'relu')(hidden_4)
hidden_6 = tf.keras.layers.Dense(50, activation = 'relu')(hidden_5)
hidden_7 = tf.keras.layers.Dense(50, activation = 'relu')(hidden_6)
output = tf.keras.layers.Dense(num_actions, activation = "linear")(hidden_7) 
model = tf.keras.models.Model(inputs = [inputs], outputs = [output])
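
As a quick sanity check on the architecture described above, the number of trainable parameters can be worked out by hand. The sketch below does this in plain Python; `dense_params` and `layer_sizes` are names introduced here for illustration only.

```python
def dense_params(n_in, n_out):
    # A fully connected layer has one weight per input-output pair plus one bias per output
    return n_in * n_out + n_out

# Flattened 6x7 board, seven hidden layers of 50 units, seven outputs
layer_sizes = [6 * 7] + [50] * 7 + [7]
total = sum(dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
print(total)
```

If the model builds as written, `model.summary()` should report the same total, which makes this a cheap way to confirm that the layers are wired as intended.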

## 🤖 Agent Decision-Making, Experience Tracking, and Reward Logic

This section includes key functions and classes to control how the agent interacts with the environment, chooses actions, and learns from experience.

---

### 🎯 `getAction(model, observation, epsilon)`

This function uses the **epsilon-greedy policy** to decide whether to:
- **Explore** by choosing a random action, or
- **Exploit** the current model by selecting the best action based on predicted Q-values.

- The game state (observation) is reshaped and passed through the model.
- Predictions are passed through a **softmax** function to obtain probability-like weights.
- Based on the epsilon decision, the agent either selects:
  - The action with the highest weight (`argmax`) or
  - A random action from the available options.
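
One detail worth noting about the steps above: softmax preserves the ordering of its inputs, so taking `argmax` of the softmax weights picks the same column as taking `argmax` of the raw model outputs. The NumPy sketch below illustrates this with made-up values; the `softmax` helper and the numbers are assumptions for the example, not taken from the notebook.

```python
import numpy as np

def softmax(x):
    # Subtract the maximum first for numerical stability; the result is unchanged
    z = np.exp(x - np.max(x))
    return z / z.sum()

q_values = np.array([0.2, -1.0, 3.5, 0.0, 1.1, -0.3, 2.0])  # made-up model outputs
weights = softmax(q_values)

print(weights.sum())
print(int(np.argmax(q_values)), int(np.argmax(weights)))
```

The softmax step therefore matters only if the weights are used for something other than `argmax`, such as sampling an action in proportion to them.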

---

### 🧳 `Experience` Class

This class is a **memory buffer** to store the agent’s gameplay data:

- **`observations`**: Stores board states.
- **`actions`**: Stores actions taken.
- **`rewards`**: Stores rewards received.

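This experience data is what `train_step` consumes. Note that the training loop clears the buffer at the start of every episode, so it holds one game at a time rather than acting as a long-lived experience-replay buffer.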

---

### ✅ `checkValid(obs, action)`

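Checks whether the selected column is **full** (its top cell is non-zero). If it is, the function swaps in a random column other than the rejected one. That replacement is not itself checked, which is why the training loop calls `checkValid` repeatedly until the action stops changing.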

- Helps prevent illegal moves that would break the game logic.
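
A possible alternative, sketched below, is to compute the list of open columns once and sample only from that list, which removes the need for a retry loop. This is not the notebook's implementation; `valid_columns` and `pick_valid` are hypothetical helpers, and the board is assumed to be a 6x7 NumPy array with row 0 at the top.

```python
import random
import numpy as np

def valid_columns(board):
    # A column is playable while its top cell (row 0) is still empty
    return [c for c in range(board.shape[1]) if board[0, c] == 0]

def pick_valid(board, action):
    # Keep a legal action; otherwise sample only from columns that are open
    options = valid_columns(board)
    if not options:
        raise ValueError("board is full")
    return action if action in options else random.choice(options)

board = np.zeros((6, 7), dtype=int)
board[0, 3] = 1  # pretend column 3 is full
print(valid_columns(board))
print(pick_valid(board, 3))
```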

---

### 🏆 `getReward(winner, state)`

Defines the **reward strategy** based on the game outcome:

- **0**: Game is still ongoing.
- **+50**: Player 1 (agent) wins.
- **-50**: Player 2 (opponent) wins or the game is a draw.

This reward structure helps reinforce winning moves and penalize losses or non-optimal outcomes, guiding the learning process.

---

These utilities work together to drive the agent's decision-making, learning, and interaction with the ConnectX environment effectively.


In [8]:
def getAction(model, observation, epsilon):
  #Get the action based on greedy epsilon policy
  action_decision = epsilonDecision(epsilon)
  #Reshape the observation to fit in model
  observation = np.array([observation])
  #Get predictions
  preds = model.predict(observation)
  #Get the softmax activation of the logits
  weights = tf.nn.softmax(preds).numpy()[0]
  if action_decision == 'model':
    action = np.argmax(weights)
  if action_decision == 'random':
    action = random.randint(0,6)
  return int(action), weights

In [9]:
class Experience:
  def __init__(self):
    self.clear() 
  def clear(self):
    self.observations = []
    self.actions = []
    self.rewards = []
  def store_experience(self, new_obs, new_act, new_reward):
    self.observations.append(new_obs)
    self.actions.append(new_act)
    self.rewards.append(new_reward)

In [10]:
# Check if action is valid
# Requires the board in reshape(6,7) format
def checkValid(obs, action):
  valid_actions = set([0,1,2,3,4,5,6])
  if obs[0,action] != 0:
    valid_actions = valid_actions - set([action])
    action = random.choice(list(valid_actions))
  return action

In [11]:
#Function to get reward
def getReward(winner, state):
  #Game not done yet
  if not state:
    return 0
  #Player 1 (the agent) wins
  if winner == 1:
    return 50
  #Player 2 wins, the game is a draw, or any other terminal result:
  #returning here avoids an unassigned `reward` when `winner` is not 1, -1 or 0
  return -50

## 🛠️ Training Step: Updating the DQN

The `train_step` function defines how the model is trained using collected gameplay experiences. It updates the agent’s neural network weights based on the **observed states**, **chosen actions**, and **received rewards**.

---

### 🔁 What Happens in Each Training Step?

- **Forward Pass**: The current observations (states) are fed into the model to compute one raw output (logit) per column.
  
- **Loss Calculation**: 
  - Uses **sparse softmax cross-entropy** to compare predicted logits with actual actions taken.
  - The loss is weighted by the reward received for each action, which encourages the network to favor actions that led to better outcomes.

- **Gradient Calculation**: 
  - A `GradientTape` tracks operations to compute the gradient of the loss with respect to the model's trainable parameters.

- **Backpropagation**: 
  - Gradients are applied using the specified optimizer to adjust model weights and improve future decision-making.
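
To make the loss concrete, the reward-weighted cross-entropy described above can be reproduced in NumPy without TensorFlow. This is a sketch for intuition only; `reward_weighted_loss` and the example arrays are assumptions introduced here.

```python
import numpy as np

def reward_weighted_loss(logits, actions, rewards):
    # Log-softmax per row, computed in a numerically stable way
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy against the action actually taken: -log p(action)
    cross_entropy = -log_probs[np.arange(len(actions)), actions]
    # Weight each step by its reward and average over the episode
    return float(np.mean(cross_entropy * rewards))

logits = np.zeros((3, 7))              # uniform predictions for a three-move game
actions = np.array([3, 2, 3])
rewards = np.array([0.0, 0.0, 50.0])   # only the final move is rewarded
print(reward_weighted_loss(logits, actions, rewards))
```

With a positive reward, lowering the loss raises the probability of the action taken. With a negative reward, lowering the loss means raising the cross-entropy, which lowers that probability.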

---

### 📌 Why It Works:

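This training approach pushes the model to assign **higher probability** to actions that were followed by **positive rewards** and lower probability to actions that were followed by negative rewards. Because the loss is a reward-weighted cross-entropy over the chosen actions, the update is closer to a policy-gradient (REINFORCE-style) step than to a classic Q-learning update, even though the network is described here as a DQN.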

This is a simple yet effective reinforcement learning update mechanism tailored for discrete action environments like ConnectX.
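
In this notebook `getReward` returns 0 for every move made before the game ends, so only the final move of an episode carries a non-zero weight in the loss. A common way to spread the outcome back over earlier moves is to replace raw rewards with discounted returns. The sketch below shows that idea; it is not what the notebook does, and the function name and `gamma` value are assumptions.

```python
def discounted_returns(rewards, gamma=0.95):
    # Walk backwards so each step accumulates the discounted reward that follows it
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]

print(discounted_returns([0, 0, 0, 50], gamma=0.5))
```

Passing these returns to `train_step` in place of the raw rewards would give earlier moves a share of the final result, scaled down the further they are from the end of the game.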


In [12]:
def train_step(model, optimizer, observations, actions, rewards):
    #Rewards arrive as a Python list; cast so they multiply cleanly with the float loss
    rewards = tf.cast(rewards, tf.float32)
    with tf.GradientTape() as tape:
        #Propagate through the agent network
        logits = model(observations)
        softmax_cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=actions)
        loss = tf.reduce_mean(softmax_cross_entropy * rewards)
    #Compute and apply the gradients once the forward pass has been recorded
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

## 🧪 Training the DQN Agent Against a Random Opponent

In this section, we train **Player 1 (the model)** to play ConnectX against a simple **random agent (Player 2)**.

---

### ⚙️ Setup:

- **Environment**: Created using `kaggle_environments.make("connectx")`.
- **Optimizer**: Adam optimizer is used to update the model’s weights.
- **Experience Buffer**: Stores observations, actions, and rewards during each episode.
- **Epsilon-Greedy Policy**: 
  - Starts with full exploration (`epsilon = 1`)
  - Gradually shifts toward exploitation by decaying `epsilon` each episode.
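
It helps to check how far the decay actually gets in the number of episodes used. The sketch below applies the same multiplicative schedule (start at 1, multiply by 0.995 each episode); `epsilon_after` is a helper name introduced here for illustration.

```python
def epsilon_after(episodes, start=1.0, rate=0.995):
    # Multiplicative decay applied once per episode, as in the training loop
    return start * rate ** episodes

for n in (1, 100, 500, 1000):
    print(n, round(epsilon_after(n), 3))
```

After 100 episodes `epsilon` is still roughly 0.61, so most moves in this run are random. A faster decay rate or more episodes would be needed before the agent mostly follows the model.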

---

### 🔁 Training Loop Overview:

1. **Episodes**: Runs 100 training episodes.
2. **Game Initialization**:
   - Sets up a new match against a random agent.
   - Resets the board and clears stored experience.
3. **Agent's Turn**:
   - Chooses an action using the epsilon-greedy strategy.
   - Ensures the action is valid (column is not full).
   - Plays the action and observes the new board state.
   - Calculates the reward based on the outcome.
   - Stores the experience (observation, action, reward).
4. **Episode End**:
   - If the game ends, the model is updated via `train_step`.
   - Tracks the number of wins over episodes.

---

### 📈 Why This Works:

This loop teaches the model to gradually learn winning strategies by playing many games against a random agent. Over time:
- The agent starts exploiting more (making smarter moves).
- The model improves by adjusting to patterns in winning vs. losing moves.
- The win rate should increase, as tracked in the `win_track` list.

This forms the foundation of reinforcement learning training in a discrete environment like ConnectX.
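
Since `win_track` stores the cumulative number of wins after each episode, a rolling win rate is easier to read than the raw list. The sketch below derives one with NumPy; `rolling_win_rate`, the window size, and the example data are assumptions for illustration, not results from this notebook.

```python
import numpy as np

def rolling_win_rate(win_track, window=20):
    # win_track is cumulative, so differences recover per-episode results (1 = win)
    per_episode = np.diff(np.concatenate(([0], win_track)))
    # Average every run of `window` consecutive episodes
    kernel = np.ones(window) / window
    return np.convolve(per_episode, kernel, mode="valid")

example_track = [1, 1, 2, 3, 3, 4, 5, 6]  # made-up cumulative wins for eight episodes
print(rolling_win_rate(example_track, window=4))
```

A rising curve would support the claim that the agent is improving, while a flat one suggests the win rate mostly reflects the strength of the random opponent.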


In [None]:
import numpy as np
import tqdm

#Train P1 (model) against random agent P2
#Create the environment
env = make("connectx", debug=True)
#Optimizer
optimizer = tf.keras.optimizers.Adam()
#Set up the experience storage
exp = Experience()
epsilon = 1
epsilon_rate = 0.995
wins = 0
win_track = []
for episode in tqdm.tqdm(range(100)):
  #Set up random trainer
  trainer = env.train([None, 'random'])
  #First observation
  obs = np.array(trainer.reset()['board']).reshape(6,7)
  #Clear cache
  exp.clear()
  #Decrease epsilon over time if we want
  epsilon = epsilon * epsilon_rate
  #Set initial state
  state = False
  while not state:
    #Get action
    action, w = getAction(model, obs, epsilon)
    #Check if action is valid
    while True:
      temp_action = action
      action = checkValid(obs, temp_action)
      if temp_action == action:
        break
    #Play the action and retrieve info
    new_obs, winner, state, info = trainer.step(action)
    #Get reward
    reward = getReward(winner, state)
    #Store the board the action was chosen on, together with the action and its reward
    exp.store_experience(obs, action, reward)
    #Move on to the board after both players have moved
    obs = np.array(new_obs['board']).reshape(6,7)
    #When the game is over, record the result and train on the finished episode
    if state:
      #Count a win for player 1
      if winner == 1:
        wins += 1
      win_track.append(wins)
      train_step(model, optimizer = optimizer,
                 observations = np.array(exp.observations),
                 actions = np.array(exp.actions),
                 rewards = exp.rewards)
      break

In [None]:
# model.save("/kaggle/input/connect-4-model/keras/default/1/connect4_agent_model.h5")

In [14]:
import pandas as pd
import numpy as np
import tensorflow as tf

In [15]:
test_model = tf.keras.models.load_model("/home/soham/Downloads/connect4_agent_model.h5")
model = test_model

TypeError: Error when deserializing class 'InputLayer' using config={'batch_shape': [None, 6, 7], 'dtype': 'float32', 'sparse': False, 'name': 'input_layer_4'}.

Exception encountered: Unrecognized keyword arguments: ['batch_shape']

## 🎮 Play Connect 4 Against Your Trained AI!

This script sets up a **playable Connect 4 game** where you (the human) face off against the **trained AI agent** using the DQN model.

---

### 🧩 Game Logic Components:

- **Board Setup**:
  - The game board is a 6x7 NumPy array initialized with zeros.
  - Emoji representation (`⚪`, `🔴`, `🟡`) makes the game more visual and user-friendly in the notebook.

- **Game Mechanics**:
  - Validates moves to ensure players only drop tokens in legal columns.
  - Handles token placement and updates the board accordingly.
  - Includes all winning condition checks: horizontal, vertical, and both diagonals.
  - Detects full board for a draw outcome.

---

### 🧠 AI Integration:

- **Model Prediction**:
  - The trained model is loaded and used to predict Q-values for the current board state.
  - Invalid moves are masked with `-inf` so the AI avoids illegal plays.
  - The action with the highest Q-value among valid options is chosen.

---

### 🎮 Gameplay Loop:

- **Turn-Based Play**:
  - Human plays first by selecting a column via input.
  - The AI responds immediately after with its move.
  - The board is printed after every move to visualize progress.
  - Game ends on a win or a draw.
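
One caveat about the turn order: as far as the training cell shows, the agent learned as Player 1 (its pieces marked 1), whereas in this game the AI moves second and its pieces are marked 2. A possible adjustment, sketched below, is to relabel the board before prediction so the network sees its own pieces as 1. `to_agent_view` is a hypothetical helper and is not used by the script in this notebook.

```python
import numpy as np

def to_agent_view(board, agent_piece=2):
    # Relabel the board so the agent's own pieces are 1 and the opponent's are 2,
    # matching the labelling the network saw during training
    view = board.copy()
    if agent_piece == 2:
        view[board == 1] = 2
        view[board == 2] = 1
    return view

board = np.zeros((6, 7), dtype=int)
board[5, 3] = 1  # human piece
board[5, 4] = 2  # AI piece
print(to_agent_view(board)[5])
```

Even with this relabelling the AI still moves second, a situation it never saw in training, so the predictions may remain weak.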

---

### 📁 Model Loading:

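- The script is written to load the model from:  
  `/kaggle/working/connect4_agent_model_2.h5`

In the cell as it stands, that `load_model` line is commented out, so `play_game` uses whatever `model` is already defined in the notebook session. Either run the earlier cells first, or uncomment the line and make sure the file exists.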

---

### ✅ How to Play:

- Run the cell, and you'll be prompted to enter a column (0–6) for your move.
- The AI then replies with the valid column that has the highest predicted value.

This is a fun and interactive way to evaluate how well your DQN agent learned the game of Connect 4.


In [2]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model

ROWS = 6
COLUMNS = 7

def create_board():
    return np.zeros((ROWS, COLUMNS), dtype=int)

def print_board(board):
    emoji_map = {0: "⚪", 1: "🔴", 2: "🟡"}
    print()
    for row in board:
        print(" ".join(emoji_map[val] for val in row))
    print("0️⃣ 1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣")

def is_valid_location(board, col):
    return board[0][col] == 0

def get_next_open_row(board, col):
    for r in reversed(range(ROWS)):
        if board[r][col] == 0:
            return r
    return None

def drop_piece(board, row, col, piece):
    board[row][col] = piece

def winning_move(board, piece):
    # Horizontal
    for r in range(ROWS):
        for c in range(COLUMNS - 3):
            if np.all(board[r, c:c+4] == piece):
                return True
    # Vertical
    for c in range(COLUMNS):
        for r in range(ROWS - 3):
            if np.all(board[r:r+4, c] == piece):
                return True
    # Positive diagonal
    for r in range(ROWS - 3):
        for c in range(COLUMNS - 3):
            if all(board[r+i][c+i] == piece for i in range(4)):
                return True
    # Negative diagonal
    for r in range(3, ROWS):
        for c in range(COLUMNS - 3):
            if all(board[r-i][c+i] == piece for i in range(4)):
                return True
    return False

def is_board_full(board):
    return np.all(board[0] != 0)

def get_valid_actions(board):
    return [c for c in range(COLUMNS) if is_valid_location(board, c)]

def get_ai_move(model, board):
    valid_moves = get_valid_actions(board)
    # Shape (1, ROWS, COLUMNS) matches the Input(shape=(6, 7)) network defined earlier in this notebook
    state = board.copy()
    state = state.reshape(1, ROWS, COLUMNS).astype("float32")

    q_values = model.predict(state, verbose=0)[0]

    # Mask invalid moves
    for c in range(COLUMNS):
        if c not in valid_moves:
            q_values[c] = -np.inf

    return int(np.argmax(q_values))

def play_game(model):
    board = create_board()
    game_over = False
    print_board(board)

    while not game_over:
        # Player 1 (Human)
        valid = False
        while not valid:
            try:
                col = int(input("Your turn (0-6): "))
                if 0 <= col < COLUMNS and is_valid_location(board, col):
                    valid = True
                else:
                    print("Invalid column. Try again.")
            except ValueError:
                print("Enter a number between 0 and 6.")
        
        row = get_next_open_row(board, col)
        drop_piece(board, row, col, 1)
        print_board(board)

        if winning_move(board, 1):
            print("You win!")
            game_over = True
            break
        elif is_board_full(board):
            print("Draw!")
            break

        # Player 2 (AI)
        col = get_ai_move(model, board)
        row = get_next_open_row(board, col)
        drop_piece(board, row, col, 2)
        print(f"AI played column {col}")
        print_board(board)

        if winning_move(board, 2):
            print("AI wins!")
            game_over = True
        elif is_board_full(board):
            print("Draw!")
            game_over = True

if __name__ == "__main__":
    # model = load_model("/kaggle/working/connect4_agent_model_2.h5")
    play_game(model)



NameError: name 'model' is not defined