<h1>ConnectX - Functions Explained</h1>

<p>
The following notebook follows the third step of the tutorial. It explains each function in depth so the reader can understand the function of the minimax agent. This notebook is self-contained and can be submitted but its value is in the explanations as no changes (other than comments) have been made. For similar notebooks, see my other notebook on ConnectX or the Intro to Game AI and Reinforcement Learning course.
</p>

<h3>General issues</h3>
<p>
    Some of you may just skim this to look for quick tips on how to debug your program so I've attached these first. If these do not make sense, I recommend continuing reading.
    <ul>
        <li>The environment variable <i>config</i> is not a dictionary. You can access values through dot notation, e.g. config.rows.</li>
        <li>Some functions are redundant. This is pointed out below and alternatives and adjustments are suggested.
            <ul>
                <li>Replace <i>is_terminal_window()</i> and <i>is_terminal_node()</i></li>
                <li>Include the first node in our <i>minimax()</i> function rather than considering the first node in <i>agent_minimax()</i></li>
            </ul>
        </li>
        <li>Assuming a 6x7 grid and standard Connect 4 rules (four in a row) simplifies your program and makes debugging easier.</li>
    </ul>
</p>
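
<p>
    A minimal sketch of the first tip. The <i>SimpleNamespace</i> object below is only a hypothetical stand-in for the real <i>config</i> (it is not the class the environment uses), assuming the default 6x7 board with four in a row. It can be handy for testing the functions in this notebook without running a game.
</p>

```python
from types import SimpleNamespace

# Hypothetical stand-in for the config object that the environment passes to an agent
config = SimpleNamespace(rows=6, columns=7, inarow=4)

# Dot notation, as used by every function in this notebook
n_slots = config.rows * config.columns
print(n_slots)  # 42

# Number of horizontal windows in a single row (the same expression appears in the loops below)
windows_per_row = config.columns - (config.inarow - 1)
print(windows_per_row)  # 4
```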

<h2>Implement the game environment</h2>
<p>
Let's set up our environment as shown in the course notebook. This allows us to:
    <ol>
        <li>Test and edit our agents</li>
        <li>Submit our solution</li>
    </ol>
To show this is running, we pit two random agents against one another.
</p>

In [None]:
# Here we import standard libraries and our environment
# You must first add the data for the task in the settings column
import random
import numpy as np
from kaggle_environments import make, evaluate

# Create the game environment
env = make("connectx", debug=True)

# Run two random agents against one another
env.run(["random", "random"])

# To render using iPython, we have to use a notebook as the Kaggle editor can't show HTML objects
# Rendering allows us to observe the game
env.render(mode="ipython")

<h2>Functions to show if the game is over: <i>is_terminal_window()</i> and <i>is_terminal_node()</i></h2>

<p>
    <i>is_terminal_node()</i> returns True if the game is over, either because a player has won or because the board is full (a draw), and False otherwise. We can think of this function as telling us whether the game is finished, hence <i>is_finished()</i> may have been a better name. The function has two distinct parts:
    <ol>
        <li>Check whether all moves have been played (no empty slots remain in the top row). If so, return True because no further moves are possible</li>
        <li>Check all possible groupings (called windows) of "4 in a row" by considering groupings with orientations:
            <ul>
                <li>Horizontal</li>
                <li>Vertical</li>
                <li>Positive diagonal (i.e. groupings running from the top-left to the bottom-right)</li>
                <li>Negative diagonal (i.e. groupings running from the top-right to the bottom-left)</li>
            </ul>
        </li>
    </ol>
</p>

<p>
    <i>is_terminal_window()</i> is a function used by the <i>is_terminal_node()</i> function. It returns True if the grouping it is given (called a window in the script) contains only 1 values or only 2 values, which indicates that one of the players has won.
</p>

<p>
    These functions are quite large and can be hard to understand but their use is simple - to determine whether either player has won.
</p>

In [None]:
# Helper function for minimax: checks if agent or opponent has four in a row in the window
# A window is a group of four slots in our game
# If all 4 slots in the window contain a 1, or all 4 contain a 2, then one of the players has won and True is returned
def is_terminal_window(window, config):
    return window.count(1) == config.inarow or window.count(2) == config.inarow


# Helper function for minimax: checks if game has ended
def is_terminal_node(grid, config):

    # Check for draw
    # The following line looks for remaining empty slots in the top row
    # If none are found, there can be no more moves so the game ends and we return True
    if list(grid[0, :]).count(0) == 0:
        return True
    
    # Check for win: horizontal, vertical, or diagonal
    # Horizontal
    # The grid is shown below. We loop through all slots with an x
    # Next, we create a window with our slot and the 3 slots to the right
    # Then we call is_terminal_window() with our window as a parameter
    # If a player has won then is_terminal_window() returns True and we return True
    # [[x, x, x, x, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0]]
    # For all following loops, we will show only the grid as the explanation remains the same
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[row, col:col+config.inarow])
            if is_terminal_window(window, config):
                return True
            
    # Vertical
    # [[x, x, x, x, x, x, x],
    #  [x, x, x, x, x, x, x],
    #  [x, x, x, x, x, x, x],
    #  [0, 0, 0, 0, 0, 0, 0],
    #  [0, 0, 0, 0, 0, 0, 0],
    #  [0, 0, 0, 0, 0, 0, 0]]
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(grid[row:row+config.inarow, col])
            if is_terminal_window(window, config):
                return True
            
    # Positive diagonal
    # [[x, x, x, x, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0],
    #  [0, 0, 0, 0, 0, 0, 0],
    #  [0, 0, 0, 0, 0, 0, 0],
    #  [0, 0, 0, 0, 0, 0, 0]]
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if is_terminal_window(window, config):
                return True
            
    # Negative diagonal
    # [[0, 0, 0, 0, 0, 0, 0],
    #  [0, 0, 0, 0, 0, 0, 0],
    #  [0, 0, 0, 0, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0],
    #  [x, x, x, x, 0, 0, 0]]
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if is_terminal_window(window, config):
                return True
            
    return False
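
<p>
    The diagonal windows are the least obvious part of the function above, so here is a small sketch of how indexing a NumPy grid with two ranges picks out a diagonal. The grid of slot numbers is made up purely for illustration, so that each picked slot can be identified.
</p>

```python
import numpy as np

# Label every slot with a unique number so the picked slots are easy to identify
grid = np.arange(42).reshape(6, 7)
inarow = 4

# Positive diagonal starting at the top-left slot: row and column both increase
row, col = 0, 0
positive = list(grid[range(row, row + inarow), range(col, col + inarow)])

# Negative diagonal starting at the bottom-left slot: row decreases, column increases
row, col = 5, 0
negative = list(grid[range(row, row - inarow, -1), range(col, col + inarow)])

print(positive)  # slots (0,0), (1,1), (2,2), (3,3)
print(negative)  # slots (5,0), (4,1), (3,2), (2,3)
```

<p>
    Pairing the two ranges element by element is what gives one slot per step along the diagonal, rather than a 4x4 block.
</p>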

<h3>Possible improvements:</h3>
<p>
<ol>
    <li>Remove confusion by checking for a draw in a separate function.</li>
    <li>Rename the functions. The functions stand alone and should be understandable without the broader context of the minimax function.</li>
    <li>Consider combining functionality with <i>count_windows()</i>. The two do very similar jobs, so we could reduce the length of our script by combining them.</li>
</ol>

IMPORTANT: Both of these functions could be removed and replaced with a single function that reuses <i>count_windows()</i>:
<code>
def is_terminal_node(grid, config):
    if list(grid[0, :]).count(0) == 0:
        return True
    return bool(count_windows(grid, config.inarow, 1, config)+count_windows(grid, config.inarow, 2, config))
</code>

That was painful and took a long time to figure out. I am less happy than when I started :(
</p>
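
<p>
    As a rough sketch of the first improvement, the draw check could live in its own small function. The name <i>is_board_full()</i> is hypothetical (it is not part of the course code), and the test board below is artificial: only its top row matters for this check.
</p>

```python
import numpy as np

def is_board_full(grid):
    # The top row is the last to fill, so no 0 there means no legal moves remain
    return not (grid[0, :] == 0).any()

empty_grid = np.zeros((6, 7), dtype=int)

# Artificial board: only the top row is filled in, which is all this check looks at
full_top_row = empty_grid.copy()
full_top_row[0, :] = [1, 2, 1, 2, 1, 2, 1]

print(is_board_full(empty_grid))    # False
print(is_board_full(full_top_row))  # True
```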

<h2>The confusing repetition in <i>check_window()</i> and <i>count_windows()</i></h2>
<p>
    The functions <i>check_window()</i> and <i>count_windows()</i> have very similar functionality to <i>is_terminal_window()</i> and <i>is_terminal_node()</i>, respectively. Now, instead of returning True if we find 4-in-a-row and False otherwise, we look in every grouping of 4 adjacent slots (called a window here) and return the number of windows that contain exactly <i>num_discs</i> of the player's pieces with the remaining slots empty. The number we search for is determined by the parameter <i>num_discs</i>. Hence if we set <i>num_discs=4</i> and check both players, we determine whether the game has been won, which makes the previous functions redundant apart from the draw check.
</p>

<table style="width:50%" align="left">
  <tr>
    <th>Parameter</th>
    <th>Values</th>
    <th>Explanation</th>
  </tr>

  <tr>
    <td><i>num_discs</i></td>
    <td>[1,2,3,4]</td>
      <td>Count number of times we find <i>num_discs</i>-in-a-row</td>
  </tr>
    
  <tr>
    <td><i>piece</i></td>
    <td>[1,2]</td>
    <td>The piece of the player being considered</td>
  </tr>
</table>

In [None]:
# For a given grouping of 4 slots (window), check whether it holds exactly num_discs of the given piece (1 or 2)
# and whether all the remaining slots are empty
# Note False is returned if another piece blocks the window, e.g. [1,1,1,2] -> False for num_discs=3
# This lets us only consider groupings that could turn into 4-in-a-row
def check_window(window, num_discs, piece, config):
    return (window.count(piece) == num_discs and window.count(0) == config.inarow-num_discs)

# This function is so similar to is_terminal_node, I believe this should be self-explanatory
# The difference here is that we tot up the total number of windows with x-in-a-row in num_windows
def count_windows(grid, num_discs, piece, config):
    num_windows = 0
    
    # Horizontal
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[row, col:col+config.inarow])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
                
    # Vertical
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(grid[row:row+config.inarow, col])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
                
    # Positive diagonal
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
                
    # Negative diagonal
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if check_window(window, num_discs, piece, config):
                num_windows += 1
                
    return num_windows
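
<p>
    To make the behaviour of <i>check_window()</i> concrete, here is a short sketch that calls it on a few hand-written windows. The function body is copied from the cell above; the <i>config</i> object is a hypothetical stand-in assuming four in a row.
</p>

```python
from types import SimpleNamespace

# Hypothetical stand-in for the environment's config
config = SimpleNamespace(rows=6, columns=7, inarow=4)

def check_window(window, num_discs, piece, config):
    return (window.count(piece) == num_discs and window.count(0) == config.inarow-num_discs)

open_three = check_window([1, 1, 1, 0], 3, 1, config)     # three 1s and one empty slot
split_three = check_window([1, 0, 1, 1], 3, 1, config)    # the pieces need not be adjacent
blocked_three = check_window([1, 1, 1, 2], 3, 1, config)  # the opponent blocks the window
four = check_window([2, 2, 2, 2], 4, 2, config)           # a win for player 2

print(open_three, split_three, blocked_three, four)  # True True False True
```

<p>
    Note that the check only counts pieces within the window, so a gap in the middle still qualifies as long as the gap is empty.
</p>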

<h2>Let's talk about some useful functions now</h2>

<h3><i>drop_piece()</i></h3>
<p>
    Given a column and a piece, we return a new grid with a piece placed in that column. Within our column, we scan up from the bottom to find a free slot and place our piece there. The following function may be a more intuitive version for beginners.
</p>
<code>
def drop_piece(grid, col, piece, config):
    next_grid = grid.copy()
    for row in range(config.rows-1, -1, -1):
        if next_grid[row][col] == 0:
            next_grid[row][col] = piece
            break
    return next_grid
</code>

In [None]:
# Gets board at next step if agent drops piece in selected column
def drop_piece(grid, col, piece, config):
    next_grid = grid.copy()
    for row in range(config.rows-1, -1, -1): # Loop from the bottom to the top
        if next_grid[row][col] == 0: # Stop at the first empty slot we find
            break
    next_grid[row][col] = piece # Place the piece in that slot
    return next_grid
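
<p>
    A quick usage sketch for <i>drop_piece()</i>, assuming an empty 6x7 grid and a hypothetical stand-in for <i>config</i>. It shows that pieces stack from the bottom and that the original grid is left untouched.
</p>

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical stand-in for the environment's config
config = SimpleNamespace(rows=6, columns=7, inarow=4)

def drop_piece(grid, col, piece, config):
    next_grid = grid.copy()
    for row in range(config.rows-1, -1, -1):
        if next_grid[row][col] == 0:
            break
    next_grid[row][col] = piece
    return next_grid

grid = np.zeros((config.rows, config.columns), dtype=int)
after_one = drop_piece(grid, 3, 1, config)       # player 1 plays column 3
after_two = drop_piece(after_one, 3, 2, config)  # player 2 plays the same column

print(after_two[:, 3])  # column 3 from top to bottom: [0 0 0 0 2 1]
print(grid.sum())       # 0, the starting grid was not modified
```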

<h3><i>get_heuristic()</i></h3>
<p>
    Another intuitive function :) Here, we search through all windows (these are the groupings of four slots we mentioned earlier) and count the number of times we find 3 of our pieces, 3 of the opposing player's pieces, 4 of our pieces or 4 of the opposing player's pieces. We use an arbitrary weighting to assign a score. This is an important topic in Reinforcement Learning and the course explains this well. I'll make a new notebook shortly to further explain it.
</p>

<table style="width:50%" align="left">
  <tr>
    <th>Number of windows where</th>
    <th>Weight</th>
  </tr>

  <tr>
    <td><i>Player has 3</i></td>
    <td>1</td>
  </tr>
    
  <tr>
    <td><i>Opposing player has 3</i></td>
    <td>-100</td>
  </tr>
    
  <tr>
    <td><i>Player has 4</i></td>
    <td>1000000</td>
  </tr>
    
  <tr>
    <td><i>Opposing player has 4</i></td>
    <td>-10000</td>
  </tr>
</table>

In [None]:
def get_heuristic(grid, mark, config):
    # Count the windows holding 3 or 4 of our pieces (mark)
    num_threes = count_windows(grid, 3, mark, config)
    num_fours = count_windows(grid, 4, mark, config)
    # mark%2+1 gives the opponent's piece: 1 -> 2 and 2 -> 1
    num_threes_opp = count_windows(grid, 3, mark%2+1, config)
    num_fours_opp = count_windows(grid, 4, mark%2+1, config)
    # Weighted sum using the weights from the table above
    score = num_threes - 1e2*num_threes_opp - 1e4*num_fours_opp + 1e6*num_fours
    return score
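
<p>
    To get a feel for the weights, here is the same arithmetic applied to a few made-up window counts. The helper <i>weighted_score()</i> is hypothetical; it simply isolates the last line of <i>get_heuristic()</i> so we can plug numbers in by hand.
</p>

```python
def weighted_score(num_threes, num_fours, num_threes_opp, num_fours_opp):
    # Same weighting as the score line in get_heuristic()
    return num_threes - 1e2*num_threes_opp - 1e4*num_fours_opp + 1e6*num_fours

# Two open threes for us, nothing for the opponent
quiet = weighted_score(2, 0, 0, 0)
# The same position, but the opponent also has one open three
threatened = weighted_score(2, 0, 1, 0)
# We have four in a row, even though the opponent has three open threes
winning = weighted_score(0, 1, 3, 0)

print(quiet)       # 2.0
print(threatened)  # -98.0
print(winning)     # 999700.0
```

<p>
    A single opposing three outweighs any realistic number of our own threes, so the agent is pushed to block before it builds.
</p>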

<h2>The minimax agent</h2>

<h3>The truth</h3>
<p>
    Ok, so, you've made it this far. Whoop whoop! But here's where it gets trickier. You don't need to understand all of the following so don't be too disheartened if it seems overwhelming.
</p>

<h3>The confusion</h3>

<p>
    I think the confusion here comes from how the function is called, which makes it difficult to analyse as a standalone function. Later, when we call the function, we call it on each possible move for our player. This means that we don't really start at the top of the tree but rather one level down. Looking ahead to the <i>score_move()</i> function, you'll see <i>minimax()</i> is called with the parameter <i>maximizingPlayer=False</i>, which shows us that the first level handled inside <i>minimax()</i> is the opposing player's turn, where the opponent tries to minimise our score.
</p>
<p>
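    If you are willing to change this to simplify your code, note that the <i>valid_moves</i> expressions in <i>minimax()</i> and <i>agent_minimax()</i> select the same columns: both keep the columns whose top slot is empty, one reading from the 2D grid and the other from the flat board.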
</p>

<h3>Three at once</h3>

<p>
    Because these functions are so interdependent, we unfortunately have to tackle them together. First, <i>agent_minimax()</i> finds all valid moves and, through <i>score_move()</i>, calls <i>minimax()</i> on the board that results from each one. Here <i>maximizingPlayer=False</i>, so within our <i>minimax()</i> function we consider all possible opponent moves. The function is recursive: it calls itself with <i>maximizingPlayer=True</i> so that we next consider the player's move. This alternation continues until the tree is <i>N_STEPS=3</i> moves deep (our move in <i>score_move()</i> plus two levels inside <i>minimax()</i>). At this point, <i>minimax()</i> is called with <i>depth=0</i>, so our function returns the heuristic score for the current board, including all the hypothetical moves that our tree has assumed.
</p>
<p>
    At this point, we can return to the explanation in the course. We calculate the value of each branch of our tree by maximising the score (if it is our turn) or minimising the score (if it is our opponent's turn). Once these values have been passed back up to the top of the tree, each of our possible moves has a single score. The function <i>agent_minimax()</i> has a dictionary called <i>scores</i> which stores each possible move (by its column) with the minimax score of the tree below that move. We then pick the best scoring move (or randomly pick among the top scoring ones if there is a tie).
</p>
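
<p>
    Before reading the real function, it may help to see the max/min alternation on a tiny hand-made tree. This is only a sketch: <i>toy_minimax()</i> and the tree of leaf scores are hypothetical, with nested lists standing in for boards and plain numbers standing in for heuristic scores.
</p>

```python
def toy_minimax(node, maximizing):
    # A leaf is a plain number: the heuristic score of that position
    if not isinstance(node, list):
        return node
    # Otherwise evaluate every child with the other player to move
    child_values = [toy_minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# Hypothetical tree: we choose one of three moves, then the opponent replies
tree = [
    [3, 5],   # our move 0: the opponent would pick 3
    [-2, 9],  # our move 1: the opponent would pick -2
    [4, 4],   # our move 2: the opponent would pick 4
]

# As in score_move(), each of our moves is scored with the opponent to move next
move_values = [toy_minimax(subtree, maximizing=False) for subtree in tree]
best_move = move_values.index(max(move_values))

print(move_values)  # [3, -2, 4]
print(best_move)    # 2
```

<p>
    Move 1 contains the largest leaf (9), but the opponent would never allow it, which is why its value is -2 and move 2 is preferred.
</p>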

In [None]:
def minimax(node, depth, maximizingPlayer, mark, config):
    is_terminal = is_terminal_node(node, config) # Is the game finished (a win or a full board)?
    valid_moves = [c for c in range(config.columns) if node[0][c] == 0] # Columns that aren't completely filled
    
    # If we're at the end of our tree (the leaf) or if the game is finished, return just the heuristic score
    if depth == 0 or is_terminal:
        return get_heuristic(node, mark, config)
    
    # Maximise the score on our turn
    if maximizingPlayer:
        value = -np.inf
        for col in valid_moves:
            child = drop_piece(node, col, mark, config) # Returns new grid if valid move played
            value = max(value, minimax(child, depth-1, False, mark, config))
        return value
    
    # Minimise the score on our opponent's turn
    else:
        value = np.inf
        for col in valid_moves:
            child = drop_piece(node, col, mark%2+1, config)
            value = min(value, minimax(child, depth-1, True, mark, config)) # Aww jeez, recursion alert
        return value

In [None]:
# Uses minimax to calculate value of dropping piece in selected column
def score_move(grid, col, mark, config, nsteps):
    next_grid = drop_piece(grid, col, mark, config)
    score = minimax(next_grid, nsteps-1, False, mark, config)
    return score

In [None]:
def agent_minimax(obs, config):
    # How deep to make the game tree: higher values take longer to run!
    N_STEPS = 3
    
    # Get list of valid moves
    valid_moves = [c for c in range(config.columns) if obs.board[c] == 0]
    
    # Convert the board to a 2D grid
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)
    
    # Use the heuristic to assign a score to each possible board in the next step
    scores = dict(zip(valid_moves, [score_move(grid, col, obs.mark, config, N_STEPS) for col in valid_moves]))
    
    # Get a list of columns (moves) that maximize the heuristic
    max_cols = [key for key in scores.keys() if scores[key] == max(scores.values())]
    
    # Select at random from the maximizing columns
    return random.choice(max_cols)
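
<p>
    The agent works with two views of the same board: the flat list <i>obs.board</i> and the 2D <i>grid</i>. The sketch below uses a made-up flat board (column 2 filled to the top) and a hypothetical stand-in for <i>config</i> to show how the reshape lines them up, and that both ways of finding valid moves agree.
</p>

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical stand-in for the environment's config
config = SimpleNamespace(rows=6, columns=7, inarow=4)

# Made-up flat board: empty except that column 2 is filled to the top
board = [0] * (config.rows * config.columns)
for row in range(config.rows):
    board[row * config.columns + 2] = 1 + row % 2

# Same conversion as in agent_minimax(): the first 7 entries become the top row
grid = np.asarray(board).reshape(config.rows, config.columns)

# The expression from agent_minimax() and the one from minimax()
valid_from_board = [c for c in range(config.columns) if board[c] == 0]
valid_from_grid = [c for c in range(config.columns) if grid[0][c] == 0]

print(valid_from_board)  # [0, 1, 3, 4, 5, 6]
print(valid_from_grid)   # [0, 1, 3, 4, 5, 6]
```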

<h2>Test the Agent</h2>

<h3>Determine the winning percentages</h3>
<p>
In contrast to the prior functions, the course explains <i>get_win_percentages()</i> well and there is no need to edit it to improve your agent. As such, I shall just leave this here. Uncomment the last line to evaluate your agent. If it takes too long, lower <i>n_rounds</i>.
</p>

In [None]:
# Run once to observe the procedure is implemented correctly
env.run([agent_minimax, "random"])
env.render(mode="ipython")

def get_win_percentages(agent1, agent2, n_rounds=100):
    # Use default Connect Four setup
    config = {'rows': 6, 'columns': 7, 'inarow': 4}
    # Agent 1 goes first (roughly) half the time          
    outcomes = evaluate("connectx", [agent1, agent2], config, [], n_rounds//2)
    # Agent 2 goes first (roughly) half the time      
    outcomes += [[b,a] for [a,b] in evaluate("connectx", [agent2, agent1], config, [], n_rounds-n_rounds//2)]
    print("Agent 1 Win Percentage:", np.round(outcomes.count([1,-1])/len(outcomes), 2))
    print("Agent 2 Win Percentage:", np.round(outcomes.count([-1,1])/len(outcomes), 2))
    print("Number of Invalid Plays by Agent 1:", outcomes.count([None, 0]))
    print("Number of Invalid Plays by Agent 2:", outcomes.count([0, None]))

# Determine the winning percentages with
# get_win_percentages(agent1=agent_minimax, agent2="random")
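
<p>
    The one slightly subtle line in <i>get_win_percentages()</i> is the list comprehension that swaps each pair of rewards for the games where agent 2 went first. The sketch below runs that bookkeeping on made-up outcome lists; these numbers are for illustration only and are not results from any real games.
</p>

```python
# Made-up outcomes for illustration only; real ones come from evaluate()
first_half = [[1, -1], [1, -1], [-1, 1]]   # rewards listed as [agent1, agent2]
second_half = [[-1, 1], [1, -1], [-1, 1]]  # rewards listed as [agent2, agent1]

# Swap each pair in the second half so every entry reads [agent1, agent2]
outcomes = first_half + [[b, a] for [a, b] in second_half]

agent1_wins = outcomes.count([1, -1])
agent2_wins = outcomes.count([-1, 1])

print(agent1_wins, "of", len(outcomes))  # 4 of 6
print(agent2_wins, "of", len(outcomes))  # 2 of 6
```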

<h3>Generate the <i>submission.py</i> file</h3>

<p>
    As shown in other notebooks, we use an intuitive one-by-one procedure to include our functions into our submission file. Note that we must also include the libraries used in the script for completeness.
</p>

In [None]:
import inspect
import os

# Using "with" closes the file for us, even if one of the writes fails
with open("submission.py", "w") as f:
    f.write("import random\n")
    f.write("import numpy as np\n")
    f.write(inspect.getsource(drop_piece))
    f.write(inspect.getsource(count_windows))
    f.write(inspect.getsource(is_terminal_window))
    f.write(inspect.getsource(is_terminal_node))
    f.write(inspect.getsource(check_window))
    f.write(inspect.getsource(get_heuristic))
    f.write(inspect.getsource(minimax))
    f.write(inspect.getsource(score_move))
    f.write(inspect.getsource(agent_minimax))

print("agent_minimax", "written to", "submission.py")
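
<p>
    One way to sanity-check a generated file is to load it into a fresh namespace and confirm the agent is defined and callable. The sketch below does this with a tiny made-up file in a temporary folder, so it does not touch the real <i>submission.py</i>; the file name and the <i>agent_demo()</i> function are hypothetical.
</p>

```python
import os
import tempfile

# Hypothetical stand-in for the contents of a submission file
source = (
    "import random\n"
    "def agent_demo(obs, config):\n"
    "    return 0\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "submission_demo.py")
    with open(path, "w") as f:
        f.write(source)
    # Run the file in an empty namespace, so nothing from the notebook can leak in
    with open(path) as f:
        namespace = {}
        exec(f.read(), namespace)

# List the functions the file defined
callables = [name for name, obj in namespace.items()
             if callable(obj) and not name.startswith("__")]
print(callables)  # ['agent_demo']
```

<p>
    If a function or import is missing from the file, this kind of check fails here rather than after submitting. Pointing <i>path</i> at your own file is the only change needed to try it on a real submission.
</p>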

<p>
To check this has worked, look in the tab to the right. Under data, find the "output" folder. Find your "submission.py" file and download it to observe the contents. We should have a self-contained script with all necessary functions for our agent. Now, we are ready to submit this solution!
</p>

<h3>Submitting to the competition</h3>

<p>
I want to refer back to the course here as it offers a very good explanation of how to make our final submission.
<ol>
    <li>Begin by clicking on the blue Save Version button in the top right corner of the window. This will generate a pop-up window.</li>
    <li>Ensure that the Save and Run All option is selected, and then click on the blue Save button.</li>
    <li>This generates a window in the bottom left corner of the notebook. After it has finished running, click on the number to the right of the Save Version button. This pulls up a list of versions on the right of the screen. Click on the ellipsis (...) to the right of the most recent version, and select Open in Viewer. This brings you into view mode of the same page. You will need to scroll down to get back to these instructions.</li>
    <li>Click on the Output tab on the right of the screen. Then, click on the blue Submit button to submit your results to the leaderboard.</li>
</ol>
</p>
<p>
And you've submitted!!! :)
</p>
<p>
I hope you all found success with this notebook. If you're still having difficulties submitting, feel free to comment on this notebook and I'll reply as soon as I can.
</p>