Creating a reinforcement learning project to simulate the board game Ghosts by Alex Randolph involves several steps. This guide will cover the rules and logic of the game, and then outline a step-by-step approach to implementing a reinforcement learning agent using Gym and Q-Learning.
Game Rules and Logic
Game Setup:
The game is played on a 6x6 board.
Each player has 8 ghosts: 4 good ghosts and 4 evil ghosts. The identity of each ghost is hidden from the opponent.
Players arrange their ghosts in the back two rows on their side of the board.
Game Objective:
A player wins by either:
Capturing all four of the opponent's good ghosts.
Having the opponent capture all four of their evil ghosts.
Moving one of their good ghosts to one of the opponent's corner exits.
Game Play:
Players take turns moving one ghost per turn.
Ghosts can move one square in any direction (up, down, left, or right) but not diagonally.
A ghost can capture an opponent's ghost by moving into its square, revealing the captured ghost's identity.
Reinforcement Learning Project
Step 1: Environment Setup
Define the State Space:
The state can be represented as a 6x6 grid where each cell can be empty, contain a good ghost, or contain an evil ghost.
Include additional information to track which ghosts belong to which player and their identities.
Define the Action Space:
Actions include moving any ghost to an adjacent square (up, down, left, right).
Rewards:
Positive reward for capturing an opponent's good ghost or moving a good ghost to an exit.
Negative reward for capturing an evil ghost or losing a good ghost.
Step 2: Implement the Game Logic
Create a Python class to simulate the game board and enforce the rules.
Implement methods to initialize the board, make moves, check for game termination, and calculate rewards.
Step 3: Create a Gym Environment
Use OpenAI Gym to create a custom environment for the game.
Implement the necessary methods: reset(), step(action), render(), and close().
Step 4: Implement Q-Learning
Initialize Q-Table:
Use a dictionary or a large array to store Q-values for state-action pairs.
Define Hyperparameters:
Learning rate (
α
α), discount factor (
γ
γ), and exploration rate (
ϵ
ϵ).
Training Loop:
For each episode, reset the environment.
For each step in the episode:
Choose an action using an epsilon-greedy policy.
Execute the action and observe the reward and next state.
Update the Q-value using the Q-learning formula:
Q
(
s
,
a
)
=
Q
(
s
,
a
)
+
α
[
r
+
γ
max
⁡
a
′
Q
(
s
′
,
a
′
)
−
Q
(
s
,
a
)
]
Q(s,a)=Q(s,a)+α[r+γmax 
a 
′
 
​
 Q(s 
′
 ,a 
′
 )−Q(s,a)]
Update the state.
Reduce the exploration rate over time.
Step 5: Evaluate the Agent
Test the trained agent against a random or heuristic-based opponent.
Evaluate its performance based on win/loss ratio and the ability to learn optimal strategies.
Step 6: Optimize and Experiment
Experiment with different hyperparameters and strategies.
Consider using more advanced techniques like Deep Q-Learning if the state space is too large.
By following these steps, you can create a reinforcement learning agent capable of playing the game Ghosts. This project will help you understand the intricacies of game AI and the application of reinforcement learning techniques in imperfect information games.

In [125]:
import gym
from gym import spaces
import numpy as np
import random
from collections import deque
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from matplotlib.colors import ListedColormap
import time
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors


In [126]:
# Actions
UP = 0
RIGHT = 1
DOWN = 2
LEFT = 3
PLACE_BLUE = 4
PLACE_RED = 5
ACTIONS = [UP, RIGHT, DOWN, LEFT, PLACE_BLUE, PLACE_RED]

# Space
EMPTY = 0
PLAYER_BLUE = 1
PLAYER_RED = 2
OPPONENT_BLUE = 3
OPPONENT_RED = 4

# TWO PLAYERS GAME

# Define mappings from action indices to names
ACTION_NAMES = {
    UP: "Move Up",
    RIGHT: "Move Right",
    DOWN: "Move Down",
    LEFT: "Move Left",
    PLACE_BLUE: "Place Blue Piece",
    PLACE_RED: "Place Red Piece"
}

# Define mappings from space values to names
SPACE_NAMES = {
    EMPTY: "Empty",
    PLAYER_BLUE: "Player Blue",
    PLAYER_RED: "Player Red",
    OPPONENT_BLUE: "Opponent Blue",
    OPPONENT_RED: "Opponent Red"
}

In [127]:
import gym
import numpy as np
from gym import spaces

# Constants for ghost types and actions
EMPTY = 0
PLAYER_BLUE = 1
PLAYER_RED = 2
OPPONENT_BLUE = 3
OPPONENT_RED = 4

UP = 0
RIGHT = 1
DOWN = 2
LEFT = 3

class GhostsEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(GhostsEnv, self).__init__()

        # Size of the board is size x size
        self.size = 6

        # Action space is tuple: (action, x, y, player)
        # action: 0-3 (move up, right, down, left), 4 (capture), 5 (place)
        self.action_space = spaces.Tuple((spaces.Discrete(6), spaces.Discrete(self.size), spaces.Discrete(self.size), spaces.Discrete(2)))

        # Observation space is a 6x6x4 matrix
        self.observation_space = spaces.Box(low=0, high=6, shape=(self.size, self.size, 4), dtype=int)

        # Reward
        self.reward = 0

        # Initialize the game state
        self.reset()

    def reset(self):
        # Initialize the board
        self.board = np.zeros((self.size, self.size), dtype=int)

        # Reset phase (0: placement, 1: movement)
        self.phase = 0

        # Reset ghost counts
        self.player_blue_count = 0
        self.opponent_blue_count = 0
        self.player_red_count = 0
        self.opponent_red_count = 0

        return self.board
    
    def legal_placement_actions(self, player):
        actions = []
        if player == 0:  # Player's turn
            rows = [4, 5]
            valid_cells = [(x, y) for x in rows for y in range(1, self.size - 1) if self.board[x, y] == 0]
        else:  # Opponent's turn
            rows = [0, 1]
            valid_cells = [(x, y) for x in rows for y in range(1, self.size - 1) if self.board[x, y] == 0]

        if player == 0:
            if self.player_blue_count < 4:
                for x, y in valid_cells:
                    actions.append((4, x, y, player))  # Place blue ghost
            if self.player_red_count < 4:
                for x, y in valid_cells:
                    actions.append((5, x, y, player))  # Place red ghost
        else:
            if self.opponent_blue_count < 4:
                for x, y in valid_cells:
                    actions.append((4, x, y, player))  # Place blue ghost
            if self.opponent_red_count < 4:
                for x, y in valid_cells:
                    actions.append((5, x, y, player))  # Place red ghost

        return actions

    def legal_movement_actions(self, player):
        actions = []
        
        for x in range(self.size):
            for y in range(self.size):
                if (player == 0 and self.board[x, y] in [1, 2]) or (player == 1 and self.board[x, y] in [3, 4]):
                    # Check all four directions
                    if x > 0 and self.board[x-1, y] == EMPTY:  # Move left
                        actions.append((LEFT, x, y, player))
                    if x < self.size - 1 and self.board[x+1, y] == EMPTY:  # Move right
                        actions.append((RIGHT, x, y, player))
                    if y > 0 and self.board[x, y-1] == EMPTY:  # Move up
                        actions.append((UP, x, y, player))
                    if y < self.size - 1 and self.board[x, y+1] == EMPTY:  # Move down
                        actions.append((DOWN, x, y, player))

                    # Capture opponent's ghost
                    if x > 0 and player == 0 and self.board[x-1, y] in [3, 4]:  # Capture left
                        actions.append((LEFT, x, y, player))
                    if x < self.size - 1 and player == 0 and self.board[x+1, y] in [3, 4]:  # Capture right
                        actions.append((RIGHT, x, y, player))
                    if y > 0 and player == 0 and self.board[x, y-1] in [3, 4]:  # Capture up
                        actions.append((UP, x, y, player))
                    if y < self.size - 1 and player == 0 and self.board[x, y+1] in [3, 4]:  # Capture down
                        actions.append((DOWN, x, y, player))
        
        return actions

    def step(self, action):
        self.reward = 0
        if self.phase == 0:
            self._place(action)
        else:
            self._take_action(action)

        return self.board, self._get_reward(), self._is_done(), {}

    def _place(self, action):
        action, x, y, player = action
        if player == 0:
            if action == 4:
                self.board[x, y] = PLAYER_BLUE
                self.player_blue_count += 1
            elif action == 5:
                self.board[x, y] = PLAYER_RED
                self.player_red_count += 1
        else:
            if action == 4:
                self.board[x, y] = OPPONENT_BLUE
                self.opponent_blue_count += 1
            elif action == 5:
                self.board[x, y] = OPPONENT_RED
                self.opponent_red_count += 1

    def _take_action(self, action):
        action, x, y, player = action
        if player == 0:
            if action == UP:
                self._move(x, y, x, y-1)
            elif action == RIGHT:
                self._move(x, y, x+1, y)
            elif action == DOWN:
                self._move(x, y, x, y+1)
            elif action == LEFT:
                self._move(x, y, x-1, y)
        else:
            if action == UP:
                self._move(x, y, x, y-1)
            elif action == RIGHT:
                self._move(x, y, x+1, y)
            elif action == DOWN:
                self._move(x, y, x, y+1)
            elif action == LEFT:
                self._move(x, y, x-1, y)

    def _move(self, x, y, new_x, new_y):
        if new_x >= 0 and new_x < self.size and new_y >= 0 and new_y < self.size:
            if self.board[new_x, new_y] == EMPTY:
                self.board[new_x, new_y] = self.board[x, y]
                self.board[x, y] = EMPTY
            elif self.board[new_x, new_y] == OPPONENT_BLUE:
                self.board[new_x, new_y] = PLAYER_BLUE
                self.board[x, y] = EMPTY
                self.reward = 10
                self.opponent_blue_count -= 1
            elif self.board[new_x, new_y] == OPPONENT_RED:
                self.board[new_x, new_y] = PLAYER_RED
                self.board[x, y] = EMPTY
                self.reward = -10
                self.opponent_red_count -= 1
            elif self.board[new_x, new_y] == PLAYER_BLUE:
                self.board[new_x, new_y] = PLAYER_BLUE
                self.board[x, y] = EMPTY
                self.reward = -10
                self.player_blue_count -= 1
            elif self.board[new_x, new_y] == PLAYER_RED:
                self.board[new_x, new_y] = PLAYER_RED
                self.board[x, y] = EMPTY
                self.reward = 10
                self.player_red_count -= 1

    def _get_reward(self):
        if self.phase == 0:
            return 0  # No reward during placement phase
        
        # Reward for capturing opponent's ghosts
        if self.reward == 10:
            return 1
        elif self.reward == -10:
            return -1
        
        # Small negative reward for each move to encourage efficiency
        return -0.01
    
    def _is_done(self):
        if self.phase == 0:
            if self.player_blue_count == 4 and self.player_red_count == 4 and self.opponent_blue_count == 4 and self.opponent_red_count == 4:
                self.phase = 1
                return False
            return False
        else:
            if self.player_blue_count == 0 or self.opponent_red_count == 0:
                self.reward = -100
                return True
            elif self.player_red_count == 0 or self.opponent_blue_count == 0:
                self.reward = 100
                return True

    def render(self, mode='human', close=False):
        for i in range(self.size):
            print("|", end="")
            for j in range(self.size):
                print(" ", self.board[i, j], end=" ")
            print("|")
        print()
    
    def close(self):
        pass


In [128]:
class QLearningAgent:
    def __init__(self, env, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.env = env
        self.q_table = [
            np.zeros((env.size, env.size, 6, 2)),  # Player 0's Q-table
            np.zeros((env.size, env.size, 6, 2))   # Player 1's Q-table
        ]
        self.alpha = alpha  # Learning rate
        self.gamma = gamma  # Discount factor
        self.epsilon = epsilon  # Exploration rate

    def select_action(self, state, player):
        # Debug: Print the current state
        print(f"\nCurrent state for player {player}:")
        self.env.render()

        # Determine legal actions
        if np.random.rand() < self.epsilon:
            # Random exploration
            legal_actions = self.env.legal_placement_actions(player) if self.env.phase == 0 else self.env.legal_movement_actions(player)
        else:
            # Exploitation
            legal_actions = self.env.legal_placement_actions(player) if self.env.phase == 0 else self.env.legal_movement_actions(player)
            
        # Debug: Print the legal actions available
        print(f"Legal actions for player {player}: {legal_actions}")

        if not legal_actions:
            raise ValueError(f"No legal actions available for player {player} in state: {state}")

        # Check if we need to perform exploitation
        if np.random.rand() >= self.epsilon and legal_actions:
            action_q_values = []
            for a in legal_actions:
                action_index = a[0]  # Assuming action index is the first element of the tuple
                if 0 <= action_index < self.q_table[player].shape[2]:  # Check if action index is within range
                    q_value = self.q_table[player][state[0], state[1], action_index, player]
                    action_q_values.append(np.max(q_value))
                    print(f"Q-value for action {a}: {q_value}")
                else:
                    action_q_values.append(float('-inf'))

            max_q_value_index = np.argmax(action_q_values)
            action = legal_actions[max_q_value_index]
        else:
            action = legal_actions[np.random.choice(len(legal_actions))]

        # Debug: Print the chosen action
        action_name = ACTION_NAMES.get(action[0], "Unknown Action")
        print(f"Chosen action for player {player}: {action_name} (Raw action: {action})")
        
        return action



    def update(self, state, action, reward, next_state, player):
        q_value = self.q_table[player][state[0], state[1], action[0], player]
        max_next_q_value = np.max(self.q_table[player][next_state[0], next_state[1], :, player])

        # Update rule
        self.q_table[player][state[0], state[1], action[0], player] = q_value + self.alpha * (reward + self.gamma * max_next_q_value - q_value)

        # Debug: Print the Q-value update
        print(f"Updated Q-value for player {player}, state {state}, action {action}: {self.q_table[player][state[0], state[1], action[0], player]}")

    def train(self, episodes):
        for episode in range(episodes):
            state = self.env.reset()
            done = False
            player = 0  # Player starts

            print(f"\nStarting Episode {episode + 1}")
            
            while not done:
                action = self.select_action(state, player)
                next_state, reward, done, _ = self.env.step(action)
                self.update(state, action, reward, next_state)

                # Debug: Print state transition and reward
                print(f"Reward {reward}, done: {done}")

                state = next_state

                # Alternate between players
                player = 1 - player

                if done:
                    self.env.render()
                    print(f"Game over. Reward: {reward}")

            if episode % 100 == 0:
                print(f"Episode {episode + 1}: Training in progress...")

        print("Training complete.")

    def play(self):
        state = self.env.reset()
        done = False
        player = 0

        while not done:
            action = self.select_action(state, player)
            next_state, reward, done, _ = self.env.step(action)

            state = next_state
            player = 1 - player

        print("Game over.")

# Initialize the environment
env = GhostsEnv()

# Initialize the Q-learning agent
agent = QLearningAgent(env)

# Train the agent
agent.train(10)


TypeError: 'Box' object is not iterable

In [124]:

# Play the game
agent.play()



Current state for player 0:
|  0   0   0   0   0   0 |
|  0   0   0   0   0   0 |
|  0   0   0   0   0   0 |
|  0   0   0   0   0   0 |
|  0   0   0   0   0   0 |
|  0   0   0   0   0   0 |

Legal actions for player 0: [(4, 4, 1, 0), (4, 4, 2, 0), (4, 4, 3, 0), (4, 4, 4, 0), (4, 5, 1, 0), (4, 5, 2, 0), (4, 5, 3, 0), (4, 5, 4, 0), (5, 4, 1, 0), (5, 4, 2, 0), (5, 4, 3, 0), (5, 4, 4, 0), (5, 5, 1, 0), (5, 5, 2, 0), (5, 5, 3, 0), (5, 5, 4, 0)]
Q-value for action (4, 4, 1, 0): [4.16189042 4.16189042 4.16189042 4.16189042 4.16189042 4.16189042]
Q-value for action (4, 4, 2, 0): [4.16189042 4.16189042 4.16189042 4.16189042 4.16189042 4.16189042]
Q-value for action (4, 4, 3, 0): [4.16189042 4.16189042 4.16189042 4.16189042 4.16189042 4.16189042]
Q-value for action (4, 4, 4, 0): [4.16189042 4.16189042 4.16189042 4.16189042 4.16189042 4.16189042]
Q-value for action (4, 5, 1, 0): [4.16189042 4.16189042 4.16189042 4.16189042 4.16189042 4.16189042]
Q-value for action (4, 5, 2, 0): [4.16189042 4.161