Many thanks to Gordon Henderson for creating [the original notebook](https://www.kaggle.com/code/gordotron85/teaching-an-agent-to-play-connect-4) upon which this workshop is based.

# Step 1: Introduction to Connect 4 on Kaggle

**What is Kaggle?** So far this semester, we've been using Google Colab for the workshops. Kaggle is like Google Colab, but it also features a community of AI enthusiasts who share code and datasets and compete in AI competitions. I highly recommend checking out Kaggle's competitions page if that sounds interesting to you: https://www.kaggle.com/competitions

*P.S. There is an ongoing [competition specifically for Connect 4](https://www.kaggle.com/competitions/connectx) that you can participate in if you'd like!*

**What is Connect 4?** Most people have probably played Connect 4 before. If not, the rules are simple: Players take turns dropping pieces into a 7x6 grid. The first player to make a line of four pieces in a row wins. The line can be vertical, horizontal, or diagonal. (It's like tic-tac-toe but with gravity that pulls all your pieces to the bottom of the board.)

**What is the ConnectX environment?** Kaggle provides a Python package called `kaggle_environments` that offers fun game environments you can use to practice your programming skills. (It's quite similar to [OpenAI Gym](https://gym.openai.com/envs/#classic_control), if you were here for that workshop.)

The environment is called ConnectX rather than Connect 4 because it is completely configurable. You can change the size of the board or the number of pieces in a row required (from 4 to something else). But by default, we'll use the standard Connect 4 rules.

Let's create a ConnectX environment now:

In [None]:
from kaggle_environments import make

# Create a ConnectX environment with the default Connect 4 board and rules:
env = make("connectx", debug=True)

# Display the board on-screen:
env.render(mode="ipython")

Amazing! You should see a 7x6 board without any pieces placed. To actually *play* the game, we need to create an "agent". An "agent" is basically a player. Soon we will create an agent that makes smart choices all on its own. But for now, let's create a simple human agent that just asks you, the user, what move to make.

The agent is just a function that returns an integer from 0-6, where 0 means *place a piece in the far left column* and 6 means *place a piece in the far right column*. This agent will just ask you to type in an integer, and then return that.

Run the following code to define the agent. (It won't actually *do* anything quite yet.)

In [None]:
from IPython.display import clear_output

def agent_human(obs, config):
    # Normally an agent wouldn't do any rendering, but we need the human player to be able to see what's going on
    clear_output(wait=True)
    env.render(mode="ipython")
    
    # Here's the actual agent code. Rather than making a decision automatically, this agent
    # just asks the human (you) which move to make, and returns that number.
    print("Please enter your column number (0-6):")
    return int(input())

Once you've run this, it's time to give the agent a test. Run this code to start a ConnectX game between two copies of `agent_human`:

In [None]:
# Create game environment
env = make("connectx", debug=True, configuration={ "actTimeout": 9999999999 })

# Agents play one game round
env.run([agent_human, agent_human])

# Show the game replay
clear_output(wait=True)
env.render(mode="ipython")

Amazing! Grab a friend, and you'll be able to play Connect 4 against each other. Once you are used to how the column numbers work, let's create our first real agent.
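
One optional aside before moving on: if you type something that isn't a number, `int(input())` in `agent_human` raises an error and the game stops. Here is a minimal sketch of a helper that checks the typed text first. (`parse_column` is a made-up name for this example; it is not part of `kaggle_environments`.)

```python
def parse_column(text, number_of_columns=7):
    """Return the typed column as an int, or None if it is not a valid column."""
    try:
        column = int(text.strip())
    except ValueError:
        # The text was not a whole number (for example "seven" or "")
        return None
    if 0 <= column < number_of_columns:
        return column
    # The number was outside the board
    return None

print(parse_column("3"))      # a valid column
print(parse_column("seven"))  # not a number
print(parse_column("9"))      # off the board
```

If you wanted to use it, you could call it in a loop inside `agent_human` and ask again whenever it returns `None`.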

# Step 2: Random Agent

Our first agent, `agent_human`, just asks the person at the computer what to do. But we want to create agents that think for themselves! Let's begin with an agent that just chooses a random column on every turn.

To do this, we first need to know how to choose a random column. And the first step there is to get a *list* of all the column numbers. Let's use the `range()` function for this. Check out the code below and edit it as specified.

In [None]:
# The following code gets a list of the first 3 numbers, starting from 0.
# TODO: Can you get the first 20 numbers (0-19) instead?
number_of_columns = 3
valid_moves = range(number_of_columns)

print(list(valid_moves))

Awesome. We can use this to get a list of all the valid column numbers. (On the default Connect 4 board, there are seven columns, numbered 0-6.) Then, to choose a random item from that list, we'll use `random.choice()`.

In [None]:
import random
random.seed(None) # Make the randomness behave as expected (don't worry about this line too much)

number_of_columns = 3
valid_moves = range(number_of_columns)

# Choose a random item from the list (0, 1, or 2):
chosen_move = random.choice(valid_moves)

print(chosen_move)

Amazing! Let's use these tools to create an agent called `agent_random` that picks a random column. (Generally there are 7 columns in a Connect 4 board, but that could change, so `config.columns` will tell us the actual number.)

Complete the code below to choose a random column:

In [None]:
import random

def agent_random(obs, config):
    # Get the number of columns in the board
    number_of_columns = config.columns

    # TODO: Create a list of all the valid moves (columns) using `range()`
    valid_moves = # ???
    
    # TODO: Choose a random move (column) from the list
    chosen_move = # ???
    
    # ...and return it
    return chosen_move

Amazing! If you've created the agent correctly, it should now be able to choose random moves. Let's try playing a game against your agent (`agent_human` vs. `agent_random`):

In [None]:
# Create game environment
env = make("connectx", debug=True, configuration={ "actTimeout": 9999999999 })

# Agents play one game round
env.run([agent_human, agent_random])

# Show the game replay
clear_output(wait=True)
env.render(mode="ipython")

This is pretty good! You can even try replacing `agent_human` in the code above with `agent_random` to watch two random agents play against each other.

In fact, playing random agents against each other is pretty interesting. Let's try having two random agents play 100 games against each other.

The following code runs a little 100-game contest between two random agents. (You don't need to change anything; it will just work.) Try running it and we'll look at the results.

In [None]:
from kaggle_environments import evaluate
import numpy as np

def get_win_percentages(agent1, agent2, n_rounds):
    config = {'rows': 6, 'columns': 7, 'inarow': 4}
    outcomes = evaluate("connectx", [agent1, agent2], config, [], n_rounds//2)
    outcomes += [[b,a] for [a,b] in evaluate("connectx", [agent2, agent1], config, [], n_rounds-n_rounds//2)]

    print("Agent 1 Win Percentage:", np.round(outcomes.count([1,-1])/len(outcomes), 2))
    print("Agent 2 Win Percentage:", np.round(outcomes.count([-1,1])/len(outcomes), 2))
    print("Number of invalid plays by Agent 1:", outcomes.count([None, 0]))
    print("Number of invalid plays by Agent 2:", outcomes.count([0, None]))

# Play two random agents against each other 100 times
get_win_percentages(agent_random, agent_random, 100)

Uh oh! There's a problem here. We would expect each agent to win about half the time, but in reality they are winning about a third of the time each. What is happening in the other third of games?

Well, it says that the agents are making invalid plays. What does that mean?

Since `agent_random` just picks a random column from 0-6, it can sometimes try to play in a column that is already full. This is against the rules!
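
To see how invalid plays eat into the win percentages, here is a small sketch that tallies a hand-made list of results the same way `get_win_percentages` does. The ten results below are invented purely for illustration; your real numbers will differ.

```python
# Made-up results for 10 games, each in [agent1_reward, agent2_reward] form.
sample_outcomes = (
    [[1, -1]] * 4      # agent 1 won
    + [[-1, 1]] * 3    # agent 2 won
    + [[None, 0]] * 2  # agent 1 made an invalid play
    + [[0, None]] * 1  # agent 2 made an invalid play
)

total = len(sample_outcomes)
agent1_share = sample_outcomes.count([1, -1]) / total
agent2_share = sample_outcomes.count([-1, 1]) / total
invalid_count = sample_outcomes.count([None, 0]) + sample_outcomes.count([0, None])
invalid_share = invalid_count / total

print("Agent 1 wins:", agent1_share)
print("Agent 2 wins:", agent2_share)
print("Games ended by an invalid play:", invalid_share)
```

Every game that ends with an invalid play is a game that neither agent gets credit for winning, which is why the two win percentages add up to less than 100%.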

To fix this, we need to make sure that `valid_moves` only contains the moves that are *actually* valid (i.e. column numbers for columns that are not full). This means we are going to have to look at the state of the board, which is stored in the variable `obs` (which stands for "observation"). Let's print out an example of what `obs` could be, just to understand what it contains...

In [None]:
# Reset the environment and manually make a few moves to get an interesting board position
data, _ = env.reset()
data, _ = env.step([3, 0])
data, _ = env.step([0, 2])
data, _ = env.step([3, 0])
data, _ = env.step([0, 3])

# Get `obs` so we can see what it looks like
obs = data.observation

print(obs)

Interesting! So `obs` contains `obs.board`, which is a big list of numbers. In a freshly reset game, all the entries are 0. But each number corresponds to a tile on the board, so as the game progresses, those 0s are replaced with 1s (for player 1's pieces) and 2s (for player 2's pieces). Since we made a few moves above, you should already see some 1s and 2s in the output. It also contains `obs.mark`, which is the number of the current player (so in the example above, it is currently player 1's turn).
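
If you're wondering which entry in the list belongs to which tile, here is a minimal sketch of the mapping. It assumes the list is stored one row at a time with row 0 at the top, which matches how the rest of this notebook treats the board. (`to_row_col` and `to_index` are made-up helper names for this example.)

```python
columns = 7  # width of the default Connect 4 board

def to_row_col(index, columns):
    # Whole rows come first, so divide by the row width
    return divmod(index, columns)

def to_index(row, col, columns):
    # Skip `row` full rows, then move `col` steps to the right
    return row * columns + col

print(to_row_col(0, columns))   # top-left tile
print(to_row_col(38, columns))  # a tile in the bottom row
print(to_index(5, 3, columns))  # bottom row, middle column
```

On the default 7-column board, index 38 lands at row 5, column 3, and `to_index(5, 3, columns)` takes you back to 38.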

The board is given as a list, but it would be much nicer to have it as a rectangle. Let's use NumPy's `reshape()` to turn it into one:

In [None]:
import numpy as np

grid = np.asarray(obs.board).reshape(6, 7)

print(grid)

Sweet! Now we can check if a particular column is full or not by just looking at the very top row. The following function should get all the valid moves (i.e. the non-full columns) from a particular board. You just need to make one change to make it correct:

In [None]:
def get_valid_moves(obs, config):
    # Get the board as a rectangle
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)

    first_row = grid[0]
    
    # Check each column to find the ones that are not full
    valid_moves = []
    for column_index in range(config.columns):
        if first_row[column_index] == 0:
            # This `column_index` is not full!
            # TODO: Add `column_index` to the list `valid_moves`
            # (Hint: Use `valid_moves.append()`)
            # ???
    
    return valid_moves

Once you have the above function correct, this new version of the agent should always work:

In [None]:
import random

def agent_random(obs, config):
    valid_moves = get_valid_moves(obs, config)
    return random.choice(valid_moves)

Let's give it a try by pitting two `agent_random`s against each other. Hopefully they will always make valid moves. Watch and see:

In [None]:
# Create game environment
env = make("connectx", debug=True)

# Agents play one game round
env.run([agent_random, agent_random])

# Show the game replay
clear_output(wait=True)
env.render(mode="ipython")

If you've written your code correctly, everything should be working. You can even try playing against the random agent yourself by replacing one of the `agent_random`s in the code above with `agent_human`.

Now let's try the 100-game competition again and see what happens:

In [None]:
get_win_percentages(agent_random, agent_random, 100)

Hopefully you see zero invalid plays and about a 50% win rate for each agent.

# Step 3: Smart Agent

Let's try to make an agent that is better than random.

An obvious first step is to make an agent that makes random choices except when there's something obviously good to do. So if there's a way to win, the agent should take it right away. And if there's a way for the opponent to win on their next turn, the agent should block that move right away. Otherwise, just play randomly.

To help with this, I am going to provide you with two useful functions that are already correct. The first one, `drop_piece`, will simulate dropping a piece so that the agent can see what the board would look like after making a particular move. The second function, `check_winning_grid`, will look at a grid of pieces and decide whether or not a particular player has won the game (i.e. whether a certain player has any 4-in-a-rows).

Run the following code once, and then you will have the functions available when you need them.

In [None]:
import numpy as np
import random

# Gets board at next step if agent drops piece in selected column
def drop_piece(grid, config, column, player):
    next_grid = grid.copy()
    for row in range(config.rows-1, -1, -1):
        if next_grid[row][column] == 0:
            break
    next_grid[row][column] = player
    return next_grid

# Check if a particular `player` has won the game on a particular `grid`
def check_winning_grid(grid, player, config):
    # horizontal
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[row,col:col+config.inarow])
            if window.count(player) == config.inarow:
                return True

    # vertical
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(grid[row:row+config.inarow,col])
            if window.count(player) == config.inarow:
                return True

    # positive diagonal
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if window.count(player) == config.inarow:
                return True

    # negative diagonal
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if window.count(player) == config.inarow:
                return True

    return False
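
The loop bounds in `check_winning_grid`, like `range(config.columns-(config.inarow-1))`, can look odd at first. Here is a minimal sketch of the horizontal case on a single made-up row, using plain lists instead of a NumPy grid. The idea is that a window of 4 can only *start* in the first 4 columns of a 7-column row, otherwise it would run off the edge.

```python
number_of_columns = 7
inarow = 4

# One made-up row: player 1 has four in a row starting at column 1
example_row = [0, 1, 1, 1, 1, 2, 0]

# Every place a window of length `inarow` can start without running off the edge
starts = range(number_of_columns - (inarow - 1))
windows = [example_row[start:start + inarow] for start in starts]

for start, window in zip(starts, windows):
    print(start, window, window.count(1) == inarow)
```

Only the window starting at column 1 is completely filled with player 1's pieces, so that is the one that counts as a win. The vertical and diagonal checks apply the same idea in other directions.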

Excellent. Now we're ready to try building `agent_smart`. Let's define a function called `check_winning_move` that will check to see whether placing a piece in a particular column will cause the player to win. (Then we will use this function in the logic for `agent_smart`.)

The following code is incomplete. Can you finish it?

In [None]:
# Returns True if dropping piece in column results in game win
def check_winning_move(obs, config, column, player):
    # Convert the board to a 2D grid
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)
    
    # Get the new grid after dropping the player's piece in the column
    # TODO: Use `drop_piece` to make this work
    next_grid = # ???

    # TODO: Return whether or not the player has won on `next_grid`
    # (Hint: Use `check_winning_grid` for this.)
    return # ???

Let's test the code to make sure it works. We'll simulate a game and hopefully `check_winning_move` will return `True` when there is a winning move available for player 1 and `False` when there is not a winning move available for player 2. Give it a try:

In [None]:
# Start a game and simulate some moves
env = make("connectx", debug=True)
data, _ = env.step([0, 0])
data, _ = env.step([0, 0])
data, _ = env.step([1, 0])
data, _ = env.step([0, 1])
data, _ = env.step([2, 0])
data, _ = env.step([0, 2])
obs = data.observation

# Draw the board
env.render(mode="ipython")

# Check if player 1 (blue) could win by playing in the middle column (3)
# Hopefully this is true
print("Player 1 can win? (Hopefully True):", check_winning_move(obs, env.configuration, 3, 1))

# Check if player 2 (white) could win by playing in the middle column (3)
# Hopefully this is false
print("Player 2 can win? (Hopefully False):", check_winning_move(obs, env.configuration, 3, 2))

Awesome! Once `check_winning_move` is working, we can use it to create `agent_smart`. Recall that `agent_smart` should do three things:

1. If it can find a winning move (i.e. an instant win opportunity) for itself, it plays there.
2. If it can find a winning move for its opponent, it plays there in order to block.
3. Otherwise, it just plays in a random column.

The following code does most of that. Your job is to finish the code using `check_winning_move`:

In [None]:
def agent_smart(obs, config):
    # Get a list of all the columns that are not full (the valid moves)
    valid_moves = get_valid_moves(obs, config)
    
    # Get the number of the `current_player` and the `other_player`
    current_player = obs.mark
    other_player = 2 if current_player == 1 else 1
    
    # If there's a winning move for the `current_player`, make it
    for column in valid_moves:
        # TODO: Check if this `column` is a winning move for the `current_player`
        # and return this column as our choice if so. (Hint: Use `check_winning_move`.)
        if # ???
            return column
    
    # If there's a winning move for the `other_player`, block it
    # by making the move that the other player wants
    for column in valid_moves:
        # TODO: Check if this `column` is a winning move for the `other_player`
        # and return this column as our choice if so.
        if # ???
            return column
    
    # Otherwise, choose to play in a random valid column
    return random.choice(valid_moves)

Let's see `agent_smart` (blue) play against `agent_random` (white).

In [None]:
# Agents play one game round
env.reset()
env.run([agent_smart, agent_random])

# Show the game
env.render(mode="ipython")

Now let's see how many times `agent_smart` beats `agent_random` in 100 games.

In [None]:
get_win_percentages(agent_smart, agent_random, 100)

Wow! `agent_smart` is a *lot* better than `agent_random`. Try playing some games against `agent_smart` yourself:

In [None]:
# Agents play one game round
env.reset()
env.run([agent_smart, agent_human])

# Show the game
clear_output()
env.render(mode="ipython")

You should notice that this agent is a much smarter opponent, and you'll have to actively trick it in order to win (for example, by creating two possible winning positions for yourself that are directly on top of each other). But it is still obvious that it is playing randomly most of the time; it doesn't intentionally build good positions for itself.

# Step 4: Lookahead Agent

Let's replace the random behavior of `agent_smart` with something a little more intentional. We want the agent to build good board positions for itself, so what if we found a way to *score* particular boards based on whether they are good or bad?

We could count the occurrences of each of the following situations, and assign a score accordingly:

* **A:** The agent has four discs in a row (the agent won),
* **B:** The agent filled three spots, and the remaining spot is empty (the agent wins if it fills in the empty spot).
* **C:** The agent filled two spots, and the remaining two spots are empty (the agent wins if it fills in the empty two spots).
* **D:** The opponent filled two spots, and the remaining two spots are empty (the opponent wins by filling in the empty two spots).
* **E:** The opponent filled three spots, and the remaining spot is empty (the opponent wins by filling in the empty spot).

The board is better when A, B, and C are present and worse when D and E are present, so we can calculate the score accordingly. (Each time situation A appears, it's worth 10,000,000,000 points. B is worth 10,000. C is worth 100. D is worth -1. E is worth -1,000,000.)
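
To get a feel for how those point values interact, here is a small worked example. The two sets of counts are made up for illustration; only the weights come from the list above. (`board_score` is a hypothetical helper name.)

```python
# Points for each situation, as listed above
WEIGHTS = {"A": 1e10, "B": 1e4, "C": 1e2, "D": -1, "E": -1e6}

def board_score(counts):
    # Multiply each situation's count by its weight and add everything up
    return sum(WEIGHTS[name] * counts.get(name, 0) for name in WEIGHTS)

# Made-up board 1: one "three" and two "twos" for the agent
quiet_board = {"B": 1, "C": 2}

# Made-up board 2: more for the agent, but the opponent has an open "three"
risky_board = {"B": 2, "C": 3, "E": 1}

print(board_score(quiet_board))  # 1*10,000 + 2*100
print(board_score(risky_board))  # 2*10,000 + 3*100 - 1,000,000
```

The second board looks better for the agent at first glance, but a single open three for the opponent costs far more than the agent's own threes and twos can earn back. That gap is what pushes the agent to block.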

For the sake of time (and convenience), the `score_move` function (plus the related helper functions) is created for you. You don't need to change anything, but it might be valuable to take a look at `score_move` and see if it makes sense. (If you have questions, feel free to ask!)

Run the following code to define `score_move`. No need to make any changes.

In [None]:
import numpy as np
import random

# Calculates score if agent drops piece in selected column
def score_move(grid, config, column, player):
    next_grid = drop_piece(grid, config, column, player)
    score = get_heuristic(next_grid, player, config)
    return score

# Helper function for score_move: calculates value of heuristic for grid    
def get_heuristic(grid, player, config):
    other_player = 1 if player == 2 else 2

    num_twos = count_windows(grid, 2, player, config)
    num_threes = count_windows(grid, 3, player, config)
    num_fours = count_windows(grid, 4, player, config)
    num_twos_opp = count_windows(grid, 2, other_player, config)
    num_threes_opp = count_windows(grid, 3, other_player, config)

    score = 1e10*num_fours + 1e4*num_threes + 1e2*num_twos + -1*num_twos_opp + -1e6*num_threes_opp
    return score

# Helper function for get_heuristic: checks if window satisfies heuristic conditions
def check_window(window, num_discs, piece, config):
    return (window.count(piece) == num_discs and window.count(0) == config.inarow-num_discs)

# Helper function for get_heuristic: counts number of windows satisfying specified heuristic conditions
def count_windows(grid, num_discs, piece, config):
    num_windows = 0

    # horizontal
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[row, col:col+config.inarow])
            if check_window(window, num_discs, piece, config):
                num_windows += 1

    # vertical
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(grid[row:row+config.inarow, col])
            if check_window(window, num_discs, piece, config):
                num_windows += 1

    # positive diagonal
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1

    # negative diagonal
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1

    return num_windows
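
To see what `check_window` is deciding, here is a minimal sketch of the same test written without `config`. (`window_matches` is a made-up name, and it takes `inarow` directly; both are assumptions made so the example can run on its own.)

```python
def window_matches(window, num_discs, piece, inarow=4):
    # The window counts only if the player has exactly `num_discs` pieces in it
    # AND every other spot is still empty (0), so the line can still be completed.
    return window.count(piece) == num_discs and window.count(0) == inarow - num_discs

print(window_matches([1, 1, 0, 1], 3, 1))  # three pieces plus one empty spot
print(window_matches([1, 1, 2, 1], 3, 1))  # three pieces, but the gap is blocked
print(window_matches([0, 1, 0, 1], 2, 1))  # two pieces plus two empty spots
```

The middle case is the interesting one: three pieces don't count as a threat if the opponent already sits in the fourth spot.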

Now that we have `score_move`, which gives a score based on how good or bad a move is for the given player, we can use it to create `agent_lookahead`. **The agent should simply search all the `valid_moves` for the one that gives the best score, and then play that.** This time, it's your job to do most of the hard work. Try to fill in the `# ???` in the following code with code that does this. If you need help, feel free to ask!

In [None]:
def agent_lookahead(obs, config):
    # Get list of valid moves
    valid_moves = get_valid_moves(obs, config)

    # Convert the board to a 2D grid
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)

    # Get the current player
    player = obs.mark
    
    # TODO: Use `score_move` to choose the best move out of all the `valid_moves`
    # Return the column number that is best!
    # ???

Sweet! Once you have an `agent_lookahead` that you think is correct, try putting `agent_lookahead` and `agent_smart` against each other. Hopefully `agent_lookahead` (blue) will win most of the time:

In [None]:
# Agents play one game round
env.run([agent_lookahead, agent_smart])

# Show the game
env.render(mode="ipython")

Let's run a 100-game competition and see the stats:

In [None]:
get_win_percentages(agent_lookahead, agent_smart, 100)

Amazing! It seems like `agent_lookahead` is genuinely better than `agent_smart`. Finally, take some time to play against `agent_lookahead` yourself. Can you win? Can you win consistently?

In [None]:
# Agents play one game round
env.reset()
env.run([agent_lookahead, agent_human])

# Show the game
clear_output()
env.render(mode="ipython")

# Step 5 (Bonus): Minimax Agent

Lookahead plays pretty well, but it doesn't really look *that* far ahead. It is just scoring each possible move based on the board position it would create. A better strategy is to simulate multiple moves into the future and pick the best possible path forward. This is what the minimax algorithm does.
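
To see the idea in isolation, here is a minimal sketch of minimax on a tiny hand-made game tree instead of a real Connect 4 board. The tree and its scores are invented for illustration: each inner list is a choice point, and each number is the score of a final position from our point of view. (`minimax_toy` is a made-up name.)

```python
def minimax_toy(node, maximizing):
    # A plain number is a leaf: the score of that final position
    if isinstance(node, (int, float)):
        return node
    # Otherwise, score every option, with the other side choosing next
    child_values = [minimax_toy(child, not maximizing) for child in node]
    # We pick the highest score on our turn; the opponent picks the lowest
    return max(child_values) if maximizing else min(child_values)

# We choose one of three branches, then the opponent chooses one of two replies
toy_tree = [[3, 12], [2, 8], [14, 1]]

# What each branch is worth once the opponent has replied as well as possible
branch_values = [minimax_toy(branch, False) for branch in toy_tree]
best_branch = branch_values.index(max(branch_values))

print(branch_values)
print(best_branch)
```

The third branch contains the biggest number (14), but the opponent would never let us have it, so it is really only worth 1. Minimax picks the first branch, where the worst case is still 3.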

I've provided an example minimax agent here. You can try poking around with the code to see how it works; it's generally a bit better than `agent_lookahead`, but it also takes longer to play.

In [None]:
import numpy as np
import random

# Uses minimax to calculate value of dropping piece in selected column
def score_move_minimax(grid, col, mark, config, nsteps):
    next_grid = drop_piece(grid, config, col, mark)
    score = minimax(next_grid, nsteps-1, False, mark, config)
    return score

# Helper function for minimax: checks if game has ended
def is_terminal_node(grid, config):
    # Check for draw 
    if list(grid[0, :]).count(0) == 0:
        return True
    
    # Check for player 1 win
    if check_winning_grid(grid, 1, config):
        return True

    # Check for player 2 win
    if check_winning_grid(grid, 2, config):
        return True
    
    return False

# Minimax implementation
def minimax(node, depth, maximizingPlayer, player, config):
    valid_moves = [c for c in range(config.columns) if node[0][c] == 0]

    if depth == 0 or is_terminal_node(node, config):
        return get_heuristic(node, player, config)
    
    other_player = 1 if player == 2 else 2

    if maximizingPlayer:
        value = -np.inf
        for col in valid_moves:
            child = drop_piece(node, config, col, player)
            value = max(value, minimax(child, depth-1, False, player, config))
        return value
    else:
        value = np.inf
        for col in valid_moves:
            child = drop_piece(node, config, col, other_player)
            value = min(value, minimax(child, depth-1, True, player, config))
        return value

def agent_minimax(obs, config):
    # If agent gets first move, put marker in middle column
    #if sum(obs.board) == 0:
        #return 3

    # Get list of valid moves
    valid_moves = [c for c in range(config.columns) if obs.board[c] == 0]

    # Convert the board to a 2D grid
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)

    # Use the heuristic to assign a score to each possible board in the next step
    scores = dict(zip(valid_moves, [score_move_minimax(grid, col, obs.mark, config, 3) for col in valid_moves]))

    # Get a list of columns (moves) that maximize the heuristic
    max_cols = [key for key in scores.keys() if scores[key] == max(scores.values())]

    # Select at random from the maximizing columns
    return random.choice(max_cols)
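
Why does minimax take longer to play? A rough way to see it is to count how many final positions it might have to score. This is only an upper bound, assuming all 7 columns stay open and no game ends early. (`max_positions` is a made-up helper name.)

```python
def max_positions(number_of_columns, depth):
    # Each extra move multiplies the possibilities by the number of columns
    return number_of_columns ** depth

for depth in range(1, 6):
    print(depth, "moves ahead:", max_positions(7, depth), "positions at most")
```

The `3` passed to `score_move_minimax` in `agent_minimax` means the agent looks three moves ahead (its own move, the opponent's reply, and its own follow-up), so it scores at most 343 positions per turn. Each extra move of lookahead multiplies the work by up to 7.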

Use the following code to play against `agent_minimax`:

In [None]:
# Agents play one game round
env.reset()
env.run([agent_minimax, agent_human])

# Show the game
env.render(mode="ipython")