# Tic Tac Toe

Given the rules of the game, we want to use a version of a genetic algorithm to train a model to play tic-tac-toe.

## The idea

- Any state of the board is represented by a vector in $\mathbb R^9$. Although this is the natural choice for the case at hand, we may later want to investigate $\mathbb R^n$.
    - What is this representation?
    - Is it a good idea to represent empty slots with 0's?
    - Should one use bitwise operators instead?
- Every player (strategy) is a set of weights and biases (matrices $W$ and $b$). It takes a state as input and decides where to move. The output is also a vector in $\mathbb R^9$.
- We start with just one layer, meaning that we have $$W \in \mathbb R^{9\times 9}, ~~ \text{and} ~~ b \in \mathbb R^{9\times 1}.$$
- At the very beginning (generation 0) we randomly generate a population of $N$ players, i.e. a set of pairs $(W_i,b_i)$, with $i=1,\dots,N$, and let them play against each other (there are $N(N-1)$ games in total).
- We score them according to their performance in this competition, and they reproduce according to their score. The first $N_{best}$ players crossbreed and leave offspring. To complete the new generation there are several options:
    1. We add $N_{rand}$ new (random) players.
    2. We randomly choose several lucky ones among the rest and let them reproduce as well.
    3. We crossbreed all with all and randomly choose a number of lucky offspring.
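
The list above does not say how two players are crossbred. One possible reading, shown here only as a sketch, is uniform crossover: every entry of the child's $W$ and $b$ is copied from one of the two parents with equal probability. The function name `crossover`, the `(W, b)` tuple layout and the 50/50 split are assumptions made for illustration; they are not taken from the implementation below, which only mutates a single parent.

```python
import numpy as np

def crossover(parent_a, parent_b, rng):
    """Uniform crossover (assumed operator): each entry comes from one parent at random.

    parent_a, parent_b: hypothetical (W, b) tuples with matching shapes.
    """
    (w_a, b_a), (w_b, b_b) = parent_a, parent_b
    mask_w = rng.random(w_a.shape) < 0.5   # True -> take the entry from parent_a
    mask_b = rng.random(b_a.shape) < 0.5
    return np.where(mask_w, w_a, w_b), np.where(mask_b, b_a, b_b)

rng = np.random.default_rng(0)
# Two random single-layer players: W is 9x9, b has 9 entries, all in [-1, 1).
parent_a = (2 * rng.random((9, 9)) - 1, 2 * rng.random(9) - 1)
parent_b = (2 * rng.random((9, 9)) - 1, 2 * rng.random(9) - 1)

child_w, child_b = crossover(parent_a, parent_b, rng)
print(child_w.shape, child_b.shape)
```

Other operators (averaging the parents, swapping whole rows of $W$) would fit the same description equally well.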

## Implementation

### Importing necessary packages

In [2]:
%matplotlib inline
%matplotlib nbagg

import numpy as np
from tempfile import TemporaryFile

#import csv
from IPython.display import clear_output
from decimal import *

import matplotlib.pyplot as plt
#from sklearn.neural_network import MLPClassifier
#from sklearn.model_selection import train_test_split

plt.rcParams["figure.figsize"] = (8, 8)
plt.rcParams["font.size"] = 14

### Definitions
All classes and functions that take `field` as an argument treat it as a $(9\times1)$ vector, not a $(3\times3)$ matrix.
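
As a small illustration of this convention, the sketch below flattens a $3\times3$ board into the 9-vector and maps a slot index back to board coordinates. The encoding (+1 for the first player, -1 for the second, 0 for an empty slot) follows what `game` does below; the variable names are only illustrative.

```python
import numpy as np

# +1: first player's marks, -1: second player's marks, 0: empty slot.
board = np.array([[ 1,  0,  0],
                  [ 0, -1, -1],
                  [ 0,  1,  0]])

field = board.reshape(-1)          # row-major flattening: slot = 3*row + col
slot = 6
row, col = divmod(slot, 3)         # back from a slot index to board coordinates

print(field)
print(row, col)
print(np.flatnonzero(field == 0))  # indices of the free slots
```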

In [8]:
def sigmoid(x):
    return 1/(1+np.exp(-x))
       
class strategy: # Players knowing that going in the occupied slot is forbidden
    def __init__(self,weights,biases,mutation_rate=0,name=None): # Is mutation rate just learning rate?
        self.weights=weights
        self.biases=biases
        self.name=name
        self.mutation_rate=mutation_rate
        
    def intensity(self,field): # Given the state of the field, computes a score for every move (the pre-sigmoid output of the last layer) using weights and biases
        n_layers=len(self.biases)
        x_in=field
        for counter in range(n_layers):
            argument=np.matmul(self.weights[counter],x_in)+self.biases[counter]
            x_in=sigmoid(argument)
        return argument

    
    def occupied_q(self,field,slot): # Checks whether a particular slot is occupied (nonzero entry)
        return field[slot]
    
    def whereto(self,field,one_hot=True): # Decides where to make the next move by finding the maximal intensity among unoccupied slots
        sorted_args=np.argsort(self.intensity(field))[::-1]
        number=0
        while self.occupied_q(field,sorted_args[number]):
            number=number+1
            
        if not one_hot:
            return sorted_args[number]
        else:
            return np.eye(9)[sorted_args[number]]

class history: # Used for keeping track of evolution
    def __init__(self,names_all=np.array([]),names_best=np.array([]),scores_all=np.array([]),scores_best=np.array([])):
        #self.weights=weights
        #self.biases=biases
        self.scores_all=scores_all
        self.scores_best=scores_best
        self.names_all=names_all
        self.names_best=names_best
        #self.generation=generation
        
def winner_q(field):
    
    reward=np.array([2,-1]) # 2 points for winning and -1 point for losing
    field=field.reshape((3,3))
    plus=np.array([1,1,1])
    minus=np.array([-1,-1,-1])
    def indicator(vec):
        return ((field[0]==vec).all() or (field[1]==vec).all() or (field[2]==vec).all() 
            or (field.T[0]==vec).all() or (field.T[1]==vec).all() or (field.T[2]==vec).all() 
            or (np.diag(field)==vec).all() or (np.diag(np.fliplr(field))==vec).all())    
    if indicator(plus):
        return reward
    elif indicator(minus):
        return reward[::-1]
    else:
        return np.array([0,0])
    
def game(strat1,strat2,verbose=False):
    # strat1 moves first and places +1; strat2 places -1. Returns the pair of rewards from winner_q.
    field=np.zeros(9)
    counter=0
    while (not (winner_q(field)).any()) and counter<10:
        if verbose:
            print('Step 1: \n')
            print(strat1.intensity(field))
        field=field+strat1.whereto(field)
        counter=counter+1
        if verbose:
            print(field.reshape((3,3)))
        if (winner_q(field)).any() or counter>8: # Stop on a win or a full board
            break
        if verbose:
            print(strat2.intensity(field))
        field=field-strat2.whereto(field)
        counter=counter+1
        if verbose:
            print(field.reshape((3,3)))
#     if winner_q(field)[0]>0:
#         print('Player',strat1.name,'wins against',strat2.name)
#     elif winner_q(field)[0]<0:
#         print('Player',strat1.name,'loses against',strat2.name)
#     else:
#         print('It is a tie in:',strat1.name,'vs.',strat2.name)
    return winner_q(field)


def mutation(strat,rate,name):
    
    w=strat.weights
    b=strat.biases
    mr=rate
    
    noise_w=(2*np.random.random(w.shape)-1)*mr
    noise_b=(2*np.random.random(b.shape)-1)*mr
    
    new_weights=w*(1+noise_w)
    new_biases=b*(1+noise_b)

    new_name=name
    
    return strategy(weights=new_weights,biases=new_biases,name=new_name)




def get_names(players):
    names=np.array([])
    for player in players:
        names=np.append(names,player.name)
    return names


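def tournament(players):
    num_players=len(players) # Number of players in the tournament
    scores=np.zeros(num_players) # Initialize scores with zeros
    for counter1 in range(num_players):
        for counter2 in range(num_players):
            if counter1==counter2: # No self-play, so there are N(N-1) games in total
                continue
            match=game(players[counter1],players[counter2])
            scores[counter1]+=match[0]
            scores[counter2]+=match[1]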
#             clear_output(wait=True)
#             print('Games played: %',int(10**4*(counter1*num_players+counter2+1)/num_players**2)/100)
#             print('Players: ',players[counter1].name,players[counter2].name)
#             print(scores[players[counter1].name])
#             print(scores[players[counter2].name],'\n')
    return scores

def training_sess(trained,partners):
    # Every trained player meets every partner twice: once moving first, once moving second
    scores=np.zeros(len(trained))
    for i in range(len(trained)):
        for j in range(len(partners)):
            match1=game(trained[i],partners[j])
            match2=game(partners[j],trained[i])
            scores[i]+=match1[0]+match2[1]
    return scores


def new_gen(players,partners,learning_rate):
    
    num_players=len(players) # Population size stays constant across generations
    scores=training_sess(players,partners).astype(int)
    #parent=np.random.choice(players[scores==np.max(scores)])
    
    parent=players[np.argmax(scores)] # Only the best player reproduces
    
    chosen_name=parent.name
    parent.name='0'    
    # Next generation: the unchanged parent plus mutated copies of it
    players=np.array([parent]+[mutation(strat=parent,rate=learning_rate,name=str(i))
                               for i in range(1,num_players)])
    return players,scores,chosen_name
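
One property of `mutation` above is worth spelling out: the noise is multiplicative, so each weight is rescaled by a random factor in $[1-\text{rate}, 1+\text{rate})$. For a rate below 1 this means a weight never changes sign and a weight that is exactly zero stays zero. The following sketch checks this on a hypothetical four-element weight vector; the values in `w` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.07                                   # same value as passed to new_gen in the training loop
w = np.array([0.5, -0.5, 0.0, 1.0])           # hypothetical weights

noise = (2 * rng.random(w.shape) - 1) * rate  # uniform in [-rate, rate)
mutated = w * (1 + noise)                     # the multiplicative update used in mutation

print(mutated)
print(np.sign(mutated) == np.sign(w))         # signs are preserved when rate < 1
```

If sign flips or escaping from zero are desired, additive noise (`w + noise`) would be one alternative; that is a design option, not something the notebook does.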

### SG??

#### Initial players and needed defs

In [47]:
np.random.seed(0)
num_trained=100
num_partners=100
threshold=2*2*num_partners

num_layers=2
generation=0

players=np.array([strategy(weights=2*np.random.random((num_layers,9,9))-1,
                           biases=2*np.random.random((num_layers,9))-1,
                           name=str(i)) for i in range(num_trained)])

hist=np.array([])


print(num_trained)
print(num_partners)
print(threshold)

100
100
400
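
The value 400 can be read off the scoring rules: in `training_sess` each trained player plays two games against every partner, and `winner_q` awards 2 points for a win and -1 for a loss. The arithmetic below reproduces the printed value and the corresponding lower bound. Interpreting `threshold` as the maximal achievable score is an assumption, since the variable is not used later in the notebook.

```python
num_partners = 100
games_per_partner = 2   # one game moving first, one moving second (see training_sess)
points_per_win = 2      # reward for a win in winner_q
points_per_loss = -1    # reward for a loss in winner_q

max_score = games_per_partner * points_per_win * num_partners
min_score = games_per_partner * points_per_loss * num_partners
print(max_score, min_score)  # 400 -200
```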


In [None]:
for _ in range(10):
    partners=np.array([strategy(weights=2*np.random.random((num_layers,9,9))-1,
                               biases=2*np.random.random((num_layers,9))-1) for i in range(num_partners)])
    players,scores,parent_name=new_gen(players,partners,0.07)
#     print(scores)
#     print(parent_name,'\n')
    hist=np.append(hist,parent_name)
    generation+=1
print(generation)

In [None]:
print(hist[-10:])
print(len(hist))
scores

In [239]:
# np.savez('best_2', 
#          weights=players[0].weights,
#          biases=players[0].biases,
#          name=players[0].name,
#          mutation_rate=players[0].mutation_rate)
# np.load('file.npz')['weights']

#players=np.append(players[:-1],external_player)

In [240]:
best_1=np.load('best_1.npz') # Open the archive once instead of three times
external_player=strategy(weights=best_1['weights'],
                         biases=best_1['biases'],
                         name=best_1['name'])
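
The save and load cells can be tried end to end with the sketch below, which writes a dummy player to a temporary directory and reads it back. The file name `demo_player.npz` and the zero-filled arrays are placeholders. Note that a string stored with `np.savez` comes back as a 0-dimensional NumPy array, so `.item()` is used to recover a plain Python string.

```python
import os
import tempfile
import numpy as np

weights = np.zeros((2, 9, 9))   # placeholder for a two-layer player's weights
biases = np.zeros((2, 9))       # placeholder biases

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_player.npz')   # hypothetical file name
    np.savez(path, weights=weights, biases=biases, name='0', mutation_rate=0.07)
    with np.load(path) as data:                   # closes the archive on exit
        loaded_w = data['weights']
        loaded_name = data['name']
        keys = data.files

print(keys)
print(loaded_w.shape)
print(type(loaded_name), loaded_name.item())
```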

### Testing

In [232]:
field=np.array([[1,0,0],
                [0,-1,-1],
                [0,1,0]])
print('Current best:')
#print(field+players[0].whereto(field.reshape(-1)).reshape(3,-1),'\n')
for _ in range(num_trained):
    print(field+players[_].whereto(field.reshape(-1)).reshape(3,-1),'\n')

#print('Initial best:')
#print(field+best0.whereto(field.reshape(-1)).reshape(3,-1))

Current best:
[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 

[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 

[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 

[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 

[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 

[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 

[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 

[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 

[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 

[[ 1.  0.  0.]
 [ 0. -1. -1.]
 [ 1.  1.  0.]] 



In [328]:
# np.savez('best_sofar', 
#          weights=best_ever.weights,
#          biases=best_ever.biases,
#          name=best_ever.name,
#          mutation_rate=best_ever.mutation_rate,
#          generation=generation)

In [329]:
np.load('best_sofar.npz').files

['weights', 'biases', 'name', 'mutation_rate', 'generation']

In [27]:
field=np.array([[0,0,0],
                [-1,-1,0],
                [0,1,1]])
print(players[0].intensity(field.reshape(9)))
print(players[3].intensity(field.reshape(9)))
print(np.argmax(players[0].intensity(field.reshape(9))))
print(np.argmax(players[2].intensity(field.reshape(9))))
print(np.argmax(players[8].intensity(field.reshape(9))))

[ 0.84955189 -1.07961474  0.27725177  1.08092424  1.3713455  -0.37533411
  1.40599152 -1.115399   -0.25129812]
[ 0.94643846 -0.8386913   0.29870048  0.80525908  1.73803349 -0.39658534
  1.31436656 -1.56768852 -0.33094924]
6
6
4


In [11]:
field=np.array([[0,0,0],
                [-1,-1,0],
                [0,1,1]])
field+best_strategies[0].whereto(field.reshape(-1)).reshape(3,-1)

array([[ 0.,  0.,  0.],
       [-1., -1.,  1.],
       [ 0.,  1.,  1.]])

In [322]:
alpha=3

field=np.array([[1,0,0],
                [1,-1,-1],
                [-1,0,1]])

strat1=strategy(weights=w,biases=b)
strat2=strategy(weights=w+alpha*diff_w,biases=b+alpha*diff_b)

print(strat1.whereto(field.reshape(9),one_hot=False))
print(strat2.whereto(field.reshape(9),one_hot=False))

1
2
