# AI Practicals

---

# Practical 3: Reinforcement Learning

## Authors: Junjie Li, Manuel Liu Wang

In [None]:
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt
import copy
import itertools

import chess as ch
import aichess
import lib


### Introduction
> ...


1) RL algorithms may operate in both stochastic and deterministic environments. This first part of the practical consists of implementing a Q-Learning algorithm that learns to solve the same problem proposed in practical 1, where only the white pieces may move. This is an example of the simpler case of deterministic environments. Your task is to implement a Q-learning algorithm, updated via Temporal Difference, which finds the path towards check-mate in the least number of moves possible. Use a table to store the corresponding Q-Values. Comment the code accordingly (6p).

Before training, we set the following hyperparameters:
>       learning rate = 0.1
>       gamma = 0.9
>       epsilon = 0.95
>       episode = 100000
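
For reference, the temporal-difference update that these hyperparameters feed into can be sketched as follows. This is a minimal, self-contained illustration and not the code inside `aichess.Q_Learning()`: the function name `td_update`, the dictionary-based Q-table, the state and action labels, and the reward values are all assumptions made for the example.

```python
from collections import defaultdict


def td_update(q_table, state, action, reward, next_state, next_actions,
              learning_rate=0.1, gamma=0.9):
    """Apply one tabular Q-learning (temporal-difference) update in place."""
    # Best value reachable from the next state; 0.0 when it is terminal
    # (no legal actions left), e.g. after check-mate.
    best_next = max((q_table[(next_state, a)] for a in next_actions),
                    default=0.0)
    td_target = reward + gamma * best_next
    td_error = td_target - q_table[(state, action)]
    q_table[(state, action)] += learning_rate * td_error
    return q_table[(state, action)]


# Hypothetical rewards: +100 for delivering check-mate, -1 for any other move.
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
mate_value = td_update(q_table, "s1", "mate", 100.0, "terminal", [])
prev_value = td_update(q_table, "s0", "to_s1", -1.0, "s1", ["mate"])
print(mate_value, prev_value)
```

The second call shows how value flows backwards along the path: the move leading into `s1` is credited with a discounted share of the check-mate reward, minus the per-move penalty. A per-move penalty of this kind is one common way to make shorter paths score higher.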

In [None]:
lib.initLogging()

# initialize the board array that holds the current state
TA = np.zeros((8, 8))

# white pieces
TA[7][5] = 6        # white king
TA[7][0] = 2        # white rook

# black pieces
TA[0][5] = 12       # black king

# initialize board
chess = ch.Chess(TA)
# print board
chess.board.print_board()


WhitePlayerAichess = aichess.Aichess(TA, True, True, 
                     learning_rate= 0.1, gamma = 0.9, epsilon=0.95, episode=100000)
path = WhitePlayerAichess.Q_Learning()


> Now we can run the path generated above:

In [None]:
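# the first entry of the path is the starting state
startState = path[0]
path = path[1:]
for nextState in path:
    # apply the move that takes the board from startState to nextState
    aichess.movePiece(WhitePlayerAichess, startState, nextState)
    startState = nextState
    WhitePlayerAichess.chess.board.print_board()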

2) Using the code of practical 2 as a starting point, program two RL agents (Whites & Blacks), and make them learn to compete against each other until check-mate. Use Q-Learning on either side (4p).



In [None]:
# initialize the board array that holds the current state
TA = np.zeros((8, 8))

# white pieces
TA[7][5] = 6        # white king
TA[7][0] = 2        # white rook

# black pieces
TA[0][5] = 12       # black king
TA[0][0] = 8        # black rook

# initialize board
chess = ch.Chess(TA)
# print board
chess.board.print_board()

WhitePlayerAichess = aichess.Aichess(TA, True, True, 
                     learning_rate= 0.1, gamma = 0.9, epsilon=0.95, episode=10000)

BlackPlayerAichess = aichess.Aichess(TA, False, True, 
                     learning_rate= 0.1, gamma = 0.9, epsilon=0.95, episode=10000)

### Warning 

#### Draws (ties) are not handled and checkmate detection is not perfect, so the game loop below can get stuck in an infinite loop.
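
One way to guard against the infinite loop mentioned in this warning is to cap the number of moves and declare a draw when the cap is reached. The sketch below is independent of the `chess` and `aichess` modules: `play_bounded`, `take_turn`, `is_game_over` and the default cap of 200 moves are hypothetical names and values chosen for illustration.

```python
def play_bounded(take_turn, is_game_over, max_moves=200):
    """Alternate White and Black turns until the game ends or the cap is hit.

    Returns (winner, moves_played), where winner is "white", "black",
    or "draw" when max_moves is reached without a result.
    """
    for move_number in range(max_moves):
        side = "white" if move_number % 2 == 0 else "black"
        take_turn(side)
        if is_game_over():
            # The side that just moved ended the game.
            return side, move_number + 1
    return "draw", max_moves


# Toy usage: a fake game that ends after the fifth move.
moves = []
result = play_bounded(lambda side: moves.append(side),
                      lambda: len(moves) >= 5)

# A game that never ends is stopped by the cap instead of looping forever.
stuck = play_bounded(lambda side: None, lambda: False, max_moves=10)
print(result, stuck)
```

In the real loop, `take_turn` would wrap the retrain-and-move step for each player and `is_game_over` would wrap the game-over check.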

In [None]:
while True:

    # ----------------White Player -------------------------------- #
    print("white turn")
    WhitePlayerCurrentState = copy.deepcopy(chess.board.currentStateW)
    WhitePlayerAichess.chess = copy.deepcopy(chess)

    WhitePlayerAichess.currentStateW = copy.deepcopy(chess.board.currentStateW)
    WhitePlayerAichess.currentStateB = copy.deepcopy(chess.board.currentStateB)
    WhitePlayerAichess.innitialStateW = copy.deepcopy(chess.board.currentStateW)
    WhitePlayerAichess.innitialStateB = copy.deepcopy(chess.board.currentStateB)

    # retrain from the current position and play only the first move of the learned path
    WhitePlayerNextState = WhitePlayerAichess.Q_Learning()
    nextpath = WhitePlayerNextState[1]
    ch.movePiece(chess, WhitePlayerCurrentState, nextpath)

    chess.board.print_board()

    if ch.GameOver(chess.board.board):
        print("White WIN!!")
        break

    # ----------------Black Player -------------------------------- #
    print("black turn")
    BlackPlayerCurrentState = copy.deepcopy(chess.board.currentStateB)
    BlackPlayerAichess.chess = copy.deepcopy(chess)

    BlackPlayerAichess.currentStateW = copy.deepcopy(chess.board.currentStateW)
    BlackPlayerAichess.currentStateB = copy.deepcopy(chess.board.currentStateB)
    BlackPlayerAichess.innitialStateW = copy.deepcopy(chess.board.currentStateW)
    BlackPlayerAichess.innitialStateB = copy.deepcopy(chess.board.currentStateB)

    # retrain from the current position and play only the first move of the learned path
    BlackPlayerNextState = BlackPlayerAichess.Q_Learning()
    nextpath = BlackPlayerNextState[1]
    ch.movePiece(chess, BlackPlayerCurrentState, nextpath)

    chess.board.print_board()

    if ch.GameOver(chess.board.board):
        print("Black WIN!!")
        break


a) What differences can you describe in the behaviour of these agents with respect to that of point 1?

> Here the two aichess agents play against each other. The main difference from point 1 is that each agent only plays one move at a time. It is no longer enough to learn a full path to check-mate once and replay it, because the opponent's reply changes the position before the agent can follow the rest of its path. Instead, each agent trains on the current position and plays only the first move of the best path it found. The board is then updated, and the other agent retrains from the new position.

b) How long does it take to learn?


> The learning time depends on the learning rate, gamma, epsilon, and episode values. If the learning rate is too small, convergence slows down considerably and training takes longer. Gamma and epsilon also affect how quickly the optimal solution is found. The episode value is the number of training episodes, and a reasonable value balances training time against the quality of the solution found. Because each agent calls `Q_Learning()` again before every move, the total training cost also grows with the length of the game.
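
To make the dependence on the hyperparameters concrete, the sketch below counts how many episodes tabular Q-learning needs before its greedy policy is optimal on a toy problem. It does not use the chess environment: the one-dimensional chain of states, the reward values (+100 at the goal, -1 per step), the exploration probability `explore_prob`, and the function name `episodes_to_learn_chain` are all assumptions for illustration, so the numbers it prints say nothing about the actual training time of the chess agents.

```python
import random


def episodes_to_learn_chain(learning_rate, gamma=0.9, explore_prob=0.1,
                            n_states=6, max_episodes=5000, seed=0):
    """Return the first episode after which 'move right' is the greedy
    action in every non-terminal state (or max_episodes if never)."""
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    goal = n_states - 1
    # q[state][action], with action 0 = left and action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for episode in range(1, max_episodes + 1):
        state = 0
        for _ in range(10 * n_states):  # cap the steps per episode
            if rng.random() < explore_prob:
                action = rng.randrange(2)  # explore
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 100.0 if next_state == goal else -1.0
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += learning_rate * (
                reward + gamma * best_next - q[state][action])
            state = next_state
            if state == goal:
                break
        if all(q[s][1] > q[s][0] for s in range(goal)):
            return episode
    return max_episodes


episodes_needed = {lr: episodes_to_learn_chain(lr) for lr in (0.05, 0.1, 0.5)}
print(episodes_needed)
```

The same measurement could be repeated for different `gamma` or `explore_prob` values to see how each one changes the number of episodes required.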

c) What happens if you vary their relative learning rates?

In [None]:
WhitePlayerAichess = aichess.Aichess(TA, True, True, 
                     learning_rate= 0.05, gamma = 0.9, epsilon=0.95, episode=10000)

BlackPlayerAichess = aichess.Aichess(TA, False, True, 
                     learning_rate= 0.2, gamma = 0.9, epsilon=0.95, episode=10000)

> The **learning rate** is an important hyperparameter in both supervised learning and Q-learning. It determines whether the learned values converge and how long convergence takes. In Q-learning it sets how far each temporal-difference update moves a Q-value towards its target.

> When the learning rate is set too small, convergence becomes very slow. When it is set too large, the estimates may oscillate back and forth around the value they should settle on, and may even fail to converge.

> In this example the two agents use different learning rates (0.05 for White and 0.2 for Black). The agent with the smaller learning rate converges more slowly and needs more training to reach the same quality of play. An agent whose learning rate is too large may oscillate around the optimal solution without ever settling on it.
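
The effect of the step size can be illustrated on a single Q-value. The sketch below repeatedly applies the update `q += learning_rate * (target - q)` and counts the updates needed to get within 1% of the target. It assumes an idealized case in which the target stays fixed at a hypothetical value of 100; in the two-agent game the target moves as the opponent changes, so this only shows the basic speed trade-off. The name `updates_to_converge` is made up for the example.

```python
def updates_to_converge(learning_rate, target=100.0, tolerance=0.01):
    """Count the updates needed for one Q-value, starting at 0, to come
    within `tolerance` (relative) of a fixed target."""
    q_value = 0.0
    updates = 0
    while abs(target - q_value) > tolerance * abs(target):
        # Same form as the TD update, with a constant target.
        q_value += learning_rate * (target - q_value)
        updates += 1
    return updates


# The learning rates used for the agents in this practical.
updates = {lr: updates_to_converge(lr) for lr in (0.05, 0.1, 0.2)}
print(updates)
```

With these settings the loop reports 90, 44 and 21 updates for learning rates 0.05, 0.1 and 0.2, because the remaining error shrinks by a factor of `1 - learning_rate` on every update.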

### Conclusion

> ...