# Chapter 4: Recursion and MiniMax Tree Search



***
*“In another thirty years people will laugh at anyone who tries to invent a language without closures, just as they'll laugh now at anyone who tries to invent a language without recursion.”*

-- Mark Jason Dominus, in 2005
***



What you'll learn in this chapter:

* The logic behind MiniMax tree search
* Understanding recursion and applying it in MiniMax tree search
* Implementing MiniMax tree search in the coin game
* Testing the effectiveness of the MiniMax agent

In Chapters 2 and 3, you learned how to use look-ahead search to design
intelligent game strategies in Tic Tac Toe and Connect Four. However,
the search process was hard coded. If we look ahead beyond three steps,
the coding becomes tedious and error-prone. You may wonder if there is a
systematic and more efficient way of conducting look-ahead search. The
answer is yes: MiniMax tree search does exactly that. The MiniMax
algorithm is a decision rule in artificial intelligence and game theory.
The algorithm assumes that each player in the game makes the best
possible decisions at each step. Further, each player knows that other
players make fully rational decisions as well, and so on.

In this chapter, you'll learn to implement MiniMax tree search in the
coin game. Specifically, you'll use recursion to call a function inside
the function itself. This creates a loop: the body of the function is
executed call after call until a stopping condition is met. The
recursive algorithm allows the MiniMax agent to search ahead
until the end of the game.

You'll create a MiniMax agent in the coin game by using the game environment that we developed in
Chapter 1. The algorithm makes hypothetical future moves and exhausts all
possible future game paths. The algorithm then uses backward induction
to calculate the best move in each step of the game. The MiniMax agent
solves the coin game and plays perfectly: it always wins when it plays
second. The MiniMax agent makes moves very quickly as well: each move
takes a fraction of a second.
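
To make the idea of backward induction concrete, here is a minimal sketch that does not use the book's game environment. It assumes the coin game rules are a pile of 21 coins, one or two coins removed per turn, and a win for whoever takes the last coin. The function name `mover_wins` is hypothetical and chosen only for this illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mover_wins(coins):
    """Return True if the player about to move can force a win (assumed rules)."""
    # no coins left: the previous player took the last coin, so the mover has lost
    if coins == 0:
        return False
    # the mover wins if some legal move leaves the opponent in a losing position
    return any(not mover_wins(coins - take) for take in (1, 2) if take <= coins)

# pile sizes at which the player about to move loses against perfect play
losing = [n for n in range(1, 22) if not mover_wins(n)]
print(losing)  # [3, 6, 9, 12, 15, 18, 21]
```

Under these assumed rules, every multiple of three is a losing position for the player about to move. The game starts with 21 coins, which is why a perfect second player can always win.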

After this chapter, you'll understand the logic behind MiniMax tree
search and be able to design game strategies for any game based on the
algorithm. You'll apply the algorithm to Tic Tac Toe and Connect Four as
well in the next few chapters and find ways to overcome or mitigate
drawbacks associated with MiniMax tree search.

# 1. Introducing MiniMax and Recursion
This section introduces MiniMax tree search and explains the concept of recursion in programming languages. 

## 1.1. What is MiniMax Tree Search?

## 1.2. Backward Induction and the Solution to MiniMax

## 1.3. What is Recursion? 
Recursion is the calling of a function inside the function itself. We'll use recursion to implement MiniMax tree search in this book. Below, I'll show you one example of recursion. 

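Suppose you want to create a clock to tell time. The normal approach, shown in the first cell below, prints the current time once. The second cell adds recursion so that the clock prints the time once per second for about ten seconds: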

In [1]:
import time

def clock():
    time_now=time.strftime("%H:%M:%S")
    print(f"The current time is {time_now}") 
clock()

The current time is 19:41:45


In [2]:
start=time.time()
def clock():
    time_now=time.strftime("%H:%M:%S")
    print(f"The current time is {time_now}") 
    time.sleep(1)
    if time.time()-start<=10:
        clock()
clock()    

The current time is 19:45:46
The current time is 19:45:47
The current time is 19:45:48
The current time is 19:45:49
The current time is 19:45:50
The current time is 19:45:51
The current time is 19:45:52
The current time is 19:45:53
The current time is 19:45:54
The current time is 19:45:55


In the above cell, we call the *clock()* function in the function itself, unless more than ten seconds have passed. As a result, the function tells time for ten consecutive seconds. 
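
Every recursive function needs a stopping condition, often called the base case. In the clock example, the stopping condition is the ten-second check. As a further illustration that is not part of the original example, the sketch below computes a factorial recursively; the function name `factorial` is chosen only for this example.

```python
def factorial(n):
    # base case: stop the recursion once n reaches 1 (or 0)
    if n <= 1:
        return 1
    # recursive case: the function calls itself on a smaller problem
    return n * factorial(n - 1)

print(factorial(5))  # 120
```

Each call works on a smaller input, so the recursion is guaranteed to reach the base case and stop.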

# 2. MiniMax Tree Search in the Coin Game

We'll use a simplified version of the coin game environment from Chapter 1 to speed up MiniMax tree search. The module is saved as *coin_simple_env.py* in the folder *utils* in the book's GitHub repository https://github.com/markhliu/AlphaGoSimplified. Download the file and save it in the folder /Desktop/ags/utils/ on your computer. The file *coin_simple_env.py* is the same as *coin_env.py* that we used in Chapter 1, except that the graphical game window functionality has been removed. As a result, you cannot use the render() method in the simplified coin game environment. Without the graphics, the MiniMax agent makes moves faster.

First, let's define a couple of functions that the MiniMax algorithm will use. 

## 2.1. The *minimax()* Function 

In [3]:
from copy import deepcopy
from random import choice

def minimax(env):
    # create a list to store winning moves
    wins=[]
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        new_state, reward, done, info = env_copy.step(m) 
        # if move m leads to a win now, take it
        if done and reward==1:
            return m 
        # see what's the best response from the opponent
        opponent_payoff=maximized_payoff(env_copy,reward,done)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff==1:
            wins.append(m)
    # pick a winning move if there are any
    if len(wins)>0:
        return choice(wins)
    # otherwise randomly pick
    return env.sample()

## 2.2. The *maximized_payoff()* Function 
Next, we'll define the *maximized_payoff()* function in the local module *ch04util*. The function returns the best possible outcome for the player who moves next. Because the function applies to any player at any stage of the game, we don't need to define one function for Player 1 and another function for Player 2.

In [4]:
def maximized_payoff(env, reward, done):
    # if the game has ended after the previous player's move
    if done:
        return -1
    # otherwise, search for action to maximize payoff
    best_payoff=-2
    # iterate through all possible next moves
    for m in env.validinputs:
        env_copy=deepcopy(env)
        new_state,reward,done,info=env_copy.step(m)  
        # what's the opponent's response
        opponent_payoff=maximized_payoff(env_copy, reward, done)
        # opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        # update your best payoff 
        if my_payoff>best_payoff:        
            best_payoff=my_payoff
    return best_payoff
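
To try the recursion without downloading the book's environment, the sketch below runs the same *maximized_payoff()* logic on a tiny stand-in class. `TinyCoinEnv` is hypothetical and is not the book's `coin_game` class. It assumes that each move removes one or two coins, that the game ends when the pile is empty, and that the reward is 1 when Player 1 wins and -1 when Player 2 wins.

```python
from copy import deepcopy

class TinyCoinEnv:
    """Hypothetical stand-in for the coin game environment (an assumption)."""
    def __init__(self, coins=21):
        self.state = coins
        self.validinputs = [1, 2]
        self.player_turn = 1

    def step(self, action):
        # remove coins; the game ends once the pile is empty
        self.state -= action
        done = self.state <= 0
        reward = 0
        if done:
            # assumed convention: +1 if Player 1 just won, -1 if Player 2 did
            reward = 1 if self.player_turn == 1 else -1
        self.player_turn = 3 - self.player_turn
        return self.state, reward, done, {}

def maximized_payoff(env, reward, done):
    # the previous player's move ended the game, so the current player lost
    if done:
        return -1
    best_payoff = -2
    for m in env.validinputs:
        env_copy = deepcopy(env)
        new_state, reward, done, info = env_copy.step(m)
        # the opponent's payoff is the opposite of the current player's payoff
        my_payoff = -maximized_payoff(env_copy, reward, done)
        if my_payoff > best_payoff:
            best_payoff = my_payoff
    return best_payoff

# payoff to the player about to move, for small piles
payoffs = {n: maximized_payoff(TinyCoinEnv(n), 0, False) for n in range(1, 10)}
print(payoffs)
```

With this stand-in, piles of 3, 6, and 9 coins give the player about to move a payoff of -1, and all other pile sizes give 1.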

## 2.3. Human versus MiniMax in the Coin Game
Next, you'll play a game against the MiniMax algorithm. We'll let the MiniMax agent move second and see if it can win the game. 

In [5]:
import time
from utils.coin_simple_env import coin_game
from utils.ch04util import minimax

# Initiate the game environment
env=coin_game()
state=env.reset()   
# Play a full game
while True:
    print(f"there are {state} coins in the pile")   
    action=input("Player 1, what's your move (1 or 2)?")
    print(f"Player 1 has chosen action={action}")    
    state, reward, done, info=env.step(action)
    if done:
        print(f"there are {state} coins in the pile")
        print(f"Player 1 has won!") 
        break
    print(f"there are {state} coins in the pile") 
    start=time.time()
    action=minimax(env)
    print(f"time lapse = {time.time()-start:.5f} seconds") 
    print(f"Player 2 has chosen action={action}")  
    state, reward, done, info=env.step(action)
    if done:
        print(f"there are {state} coins in the pile")
        print(f"Player 2 has won!") 
        break

there are 21 coins in the pile
Player 1, what's your move (1 or 2)?2
Player 1 has chosen action=2
there are 19 coins in the pile
time lapse = 0.31630 seconds
Player 2 has chosen action=1
there are 18 coins in the pile
Player 1, what's your move (1 or 2)?1
Player 1 has chosen action=1
there are 17 coins in the pile
time lapse = 0.12109 seconds
Player 2 has chosen action=2
there are 15 coins in the pile
Player 1, what's your move (1 or 2)?2
Player 1 has chosen action=2
there are 13 coins in the pile
time lapse = 0.02154 seconds
Player 2 has chosen action=1
there are 12 coins in the pile
Player 1, what's your move (1 or 2)?1
Player 1 has chosen action=1
there are 11 coins in the pile
time lapse = 0.00852 seconds
Player 2 has chosen action=2
there are 9 coins in the pile
Player 1, what's your move (1 or 2)?2
Player 1 has chosen action=2
there are 7 coins in the pile
time lapse = 0.00100 seconds
Player 2 has chosen action=1
there are 6 coins in the pile
Player 1, what's your move (1 or 2)?1
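
The "time lapse" values in the transcript above shrink as the pile gets smaller. One plausible reason is that the number of recursive calls grows quickly with the number of coins left. The sketch below is a rough model rather than a measurement of the book's code. The function `count_calls` is hypothetical, and it assumes that both moves (taking one or two coins) are tried at every non-terminal position.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_calls(coins):
    """Rough count of recursive calls needed to search a pile of this size."""
    # terminal position: one call and no further branching
    if coins <= 0:
        return 1
    # one call for this position, plus the subtrees after taking 1 or 2 coins
    return 1 + count_calls(coins - 1) + count_calls(coins - 2)

# pile sizes the MiniMax agent faced in the transcript
for pile in (19, 17, 13, 11, 7):
    print(pile, count_calls(pile))
```

In this model the count grows by a factor of roughly 1.6 for each extra coin, so searching from 19 coins takes far more calls than searching from 7.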

# 3. Effectiveness of MiniMax in the Coin Game
Next, we'll test how effective the MiniMax algorithm is in the coin game. We'll first let the MiniMax agent play against random moves. We'll then test the MiniMax agent against the rule-based AI game strategy that we developed in Chapter 1.

## 3.1. MiniMax versus Random Moves in the Coin Game

In [6]:
from utils.ch01util import random_player, one_coin_game

env=coin_game()
results=[]
for i in range(100):
    # MiniMax moves first 
    result=one_coin_game(minimax,random_player)
    # record game outcome
    results.append(result)
# count how many times MiniMax has won
wins=results.count(1)
print(f"the MiniMax algorithm won {wins} games")
# count how many times MiniMax has lost
losses=results.count(-1)
print(f"the MiniMax algorithm lost {losses} games") 

the MiniMax algorithm won 96 games
the MiniMax algorithm lost 4 games


In [7]:
results=[]
for i in range(100):
    # MiniMax moves second
    result=one_coin_game(random_player,minimax)
    # record negative game outcome
    results.append(-result)
# count how many times MiniMax has won
wins=results.count(1)
print(f"the MiniMax algorithm won {wins} games")
# count how many times MiniMax has lost
losses=results.count(-1)
print(f"the MiniMax algorithm lost {losses} games") 

the MiniMax algorithm won 100 games
the MiniMax algorithm lost 0 games


## 3.2. MiniMax versus Rule-Based AI in the Coin Game

In [8]:
from utils.ch01util import rule_based_AI

env=coin_game()
results=[]
for i in range(100):
    # MiniMax moves first 
    result=one_coin_game(minimax,rule_based_AI)
    # record game outcome
    results.append(result)
# count how many times MiniMax has won
wins=results.count(1)
print(f"the MiniMax algorithm won {wins} games")
# count how many times MiniMax has lost
losses=results.count(-1)
print(f"the MiniMax algorithm lost {losses} games")   

the MiniMax algorithm won 0 games
the MiniMax algorithm lost 100 games


In [9]:
results=[]
for i in range(100):
    # MiniMax moves second
    result=one_coin_game(rule_based_AI,minimax)
    # record negative game outcome
    results.append(-result)
# count how many times MiniMax has won
wins=results.count(1)
print(f"the MiniMax algorithm won {wins} games")
# count how many times MiniMax has lost
losses=results.count(-1)
print(f"the MiniMax algorithm lost {losses} games") 

the MiniMax algorithm won 100 games
the MiniMax algorithm lost 0 games
