## Assignment 3 Question 2

Is it possible to predict ELO of a player based on context of a potential en passant move?

Only investigating the player who is in the position of making the en passant move.

Lila See FDS PCA solutions for 3 good points for what PCA is good for - include in report?

#### Import Libraries

In [1]:
import pandas as pd
import chess
import chess.pgn
import io

# PCA
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

#### Import CSV file

66879 entries in dataframe

In [2]:
# Load CSV into a big dataframe
CHESS_DATA_LOCATION = "data/club_games_data.csv"
chess_data = pd.read_csv(CHESS_DATA_LOCATION)

#### Dataframe info

Using:
print(chess_data.dtypes)
print(chess["rules"].unique())


Chess rules:
['chess' 'chess960' 'threecheck' 'crazyhouse' 'kingofthehill']

Time control:
['1/259200' '1/172800' '1800' '1/86400' '1/432000' '1/604800' '600'
 '120+1' '900+10' '300' '180+2' '3600+5' '2700+45' '3600' '1/1209600'
 '180' '600+10' '60' '480+3' '300+5' '420+3' '600+5' '600+2' '1200' '30'
 '60+1' '120' '1500+3' '900+2' '1500+5' '1500+10' '1/864000' '900' '300+2'
 '1500' '7200' '300+1' '5400' '3600+60' '2700+30' '3480+45' '10' '2700+10'
 '15' '2700' '3600+20' '4500' '4200' '900+5' '1800+10' '2700+5' '480+5'
 '1800+30' '300+3' '600+1' '1800+5' '420+5' '5400+30' '240+10' '420' '303'
 '60+10']

 Time class:
['daily' 'rapid' 'bullet' 'blitz']

#### Clean data

Undeveloped board shouldn't matter if we're filtering games for potential ep

Same for draws

Can filter our time class if are looking at time controls

After making a new move_list column, should we drop the pgn column?


Variables we are considering when predicting ELO (for a player who could potentially make en passant move) are: (Y = DONE, N = NOT DONE)
- Y: Colour who had ep opportunity (boolean)
- Y: Did they take the en passant? (boolean)
- N: Does their choice on taking/not taking support them if gaining an advantage? (numerical value for how much of an advantage it gives)
- N: Time taken to decide to capture/not capture en passant (... whatever can be time, a number in seconds ig)
- Y: Is the game rated? (boolean)
- Y: Game time class


To do:
- Work out which colour is making the potential en passant move, add a column to dataframe detailing this
- Make dataframe columns for other variables
- Apply PCA reduction

In [3]:
# Drops rows if any value is a NaN (data is clean so it doesn't do anything)
chess_data.dropna(axis=0, how='any', inplace=True)

# Filter out alternative rules like chess960 etc
# Important this comes before the drop irrelevant columns line
chess_data = chess_data[chess_data['rules'] == "chess"]

# Save PGN column from dataframe
full_pgn = chess_data['pgn']

def get_moves(entry):
    '''
    Retrive series of moves in a game when given the whole full_pgn entry
    '''
    pgn = entry.splitlines()[-1]
    return pgn

# Add list of moves (string) as a new column to dataframe
chess_data['move_list'] = full_pgn.apply(get_moves)

# Drop irrelevant columns
chess_data = chess_data.drop(['time_control', 'white_username', 'black_username','white_id', 'black_id', 'white_result', 'black_result', 'rules'], axis=1, inplace=True)

print(chess_data.head())


   white_rating  black_rating time_control  rated  \
0          1708          1608     1/259200   True   
1          1726          1577     1/172800   True   
2          1727           842     1/172800   True   
3           819          1727     1/172800   True   
4          1729          1116     1/172800   True   

                                                 fen  \
0  r2r4/p2p1p1p/b6R/n1p1kp2/2P2P2/3BP3/PP5P/4K2R ...   
1       8/5Q1k/4n1pp/8/7P/2N2b2/PP3P2/5K2 b - - 1 33   
2  rn1q1b1r/kb2p1pp/2p5/p1Q5/N1BP2n1/4PN2/1P3PPP/...   
3  r3kb1r/pp3ppp/3p1n2/2pKp3/P3P3/1P6/4qP1P/QNB5 ...   
4  r3b2r/pp6/2pPpR1k/4n3/2P3Q1/3B4/PP4PP/R5K1 b -...   

                                                 pgn  \
0  [Event "Enjoyable games 2 - Round 1"]\n[Site "...   
1  [Event "Rapid Rats - Board 5"]\n[Site "Chess.c...   
2  [Event "CHESS BOARD CLASH - Round 1"]\n[Site "...   
3  [Event "CHESS BOARD CLASH - Round 1"]\n[Site "...   
4  [Event "CHESS BOARD CLASH - Round 1"]\n[Site "...   

       

#### Import chess info
https://python-chess.readthedocs.io/en/latest/core.html#chess.Board.san


#### En Passant functions
- has_legal_en_passant() tests if en passant capturing would actually be possible on the next move.
- has_pseudo_legal_en_passant()
- has_legal_en_passant()
- is_en_passant(move: Move) Checks if the given pseudo-legal move is an en passant capture.




Use StringIO to parse games from a string.

```python
import io
pgn = io.StringIO("1. e4 e5 2. Nf3 *")
game = chess.pgn.read_game(pgn)
```

#### Clean data specifically for ep

Filter out columns that don't have potential ep
Add columns for: whether ep happened, which colour had potential to take ep

In [4]:

def check_pgn_opportunity(pgn_in):
    '''
    Checks PGN for whether opportunity for EP happened in the game.
    '''
    pgn = io.StringIO(pgn_in)       # PGN as a file
    game = chess.pgn.read_game(pgn) # Read PGN and put into game
    board = game.board()            # "board" of a game
    
    precheck = False                # Is en passant possible
    moved = False                   # Was en passant moved?

    # Find only pawn moves in game
    for move in game.mainline_moves():
        # B bishop Q queen K king N knight R rook O castle ELSE pawn
        turn = board.turn
        san = board.san(move)
        move_piece = san[0]

        match move_piece:
            # Ignore if not pawn
            case "K" | "Q" | "B" | "N" | "R" | "O":
                board.push(move)
                continue

            # Pawn
            case _:
                # Push the move before checking the board
                board.push(move)
                precheck = board.has_legal_en_passant()

                # Return True the moment a potential ep move has been found
                if precheck:
                    return True
                
    # If ran through every move without ep opportunity then return False
    return False

def check_pgn_happened(pgn_in):
    '''
    Checks PGN for whether EP actually happened in the game.

    Assumes game passed in already has the opportunity for an EP move.
    Returns True if EP was actually the next move.
    '''
    pgn = io.StringIO(pgn_in)       # PGN as a file
    game = chess.pgn.read_game(pgn) # Read PGN and put into game
    board = game.board()            # "board" of a game
    
    moved = False                   # Was en passant moved?

    # Find only pawn moves in game
    for move in game.mainline_moves():
        # B bishop Q queen K king N knight R rook O castle ELSE pawn
        turn = board.turn
        san = board.san(move)
        move_piece = san[0]

        match move_piece:
            # Ignore if not pawn
            case "K" | "Q" | "B" | "N" | "R" | "O":
                board.push(move)
                continue

            # Pawn
            case _:
                # Check each move to see if it was an ep move
                moved = board.is_en_passant(move)
                board.push(move)

                # If ep actually happened, immediately return True
                if moved:
                    return True
                
    # If ran through every move without ep then return False
    return False

def check_pgn_turn(pgn_in):
    '''
    Checks which player has the opportunity for EP.

    Assumes check_pgn_opportunity is true for this pgn - otherwise None is returned.
    Returns 'White' or 'Black' depending on which player has opportunity for ep
    '''
    pgn = io.StringIO(pgn_in)       # PGN as a file
    game = chess.pgn.read_game(pgn) # Read PGN and put into game
    board = game.board()            # "board" of a game
    
    precheck = False                # Is en passant possible
    moved = False                   # Was en passant moved?

    # Find only pawn moves in game
    for move in game.mainline_moves():
        # B bishop Q queen K king N knight R rook O castle ELSE pawn
        turn = board.turn
        san = board.san(move)
        move_piece = san[0]

        match move_piece:
            # Ignore if not pawn
            case "K" | "Q" | "B" | "N" | "R" | "O":
                board.push(move)
                continue

            # Pawn
            case _:
                # Find the ep opportunity move
                # Push the move before checking the board
                board.push(move)
                precheck = board.has_legal_en_passant()

                if precheck:
                    if turn:
                        return 'Black'
                    else:
                        return 'White'
                
    # If pgn without ep opportunity was passed in, return None
    return None

def rating_of_colour(colour):
    '''
    Returns rating of the player specified
    '''
    if colour == 'White':
        return chess_data['white_rating']
    else:
        return chess_data['black_rating']


# Save PGN column from dataframe
full_pgn = chess_data['pgn']

# Add Boolean value to dataframe for whether an ep opportunity arose
chess_data['ep_opportunity'] = full_pgn.apply(check_pgn_opportunity)

# Filter out games where ep didn't happen
chess_data = chess_data[chess_data['ep_opportunity'] == True]
chess_data = chess_data.drop(['ep_opportunity'], axis=1, inplace=True)

# Re-save PGN column from updated dataframe
full_pgn = chess_data['pgn']

# Add Boolean value to dataframe for whether an ep capture actually happened
chess_data['ep_happened'] = full_pgn.apply(check_pgn_happened)

# Add column indicating the colour of the player who had the ep choice
chess_data['ep_colour'] = full_pgn.apply(check_pgn_turn)

# Add column indicating the rating of the player who had the ep choice
chess_data['ep_rating'] = chess_data['ep_colour'].apply(rating_of_colour)
chess_data = chess_data.drop(['white_rating', 'black_rating'], axis=1, inplace=True)


    white_rating  black_rating time_control  rated  \
40          1569          1546     1/259200   True   
48          1505          1635     1/259200   True   
55          1468          1870     1/259200   True   
56          1530          1459     1/172800   True   
81          1498          1540          600   True   

                                                  fen  \
40  r5k1/2b2pp1/p6p/1p2Q3/8/1P5P/1PPr1P1P/5RK1 b -...   
48  r1b2rk1/1p1n1qbp/p2pp1p1/P1p3Bn/2P1P3/2N2N2/1P...   
55  r2q1rk1/1p1n2pp/p6b/2pPp3/P3N3/2N4P/1PP1QPP1/R...   
56  r3r1k1/pb3p1p/1p4p1/6P1/3pPq2/3P1B1P/P4K2/Q6R ...   
81  r4k2/1p3r2/p2p2n1/2pPp3/2P1P1b1/1P2b1P1/P3NR1P...   

                                                  pgn  \
40  [Event "Enjoyable games 2 - Round 1"]\n[Site "...   
48  [Event "Besiktas J.K - Round 1"]\n[Site "Chess...   
55  [Event "Enjoyable games 2 - Round 1"]\n[Site "...   
56  [Event "Let's Play!"]\n[Site "Chess.com"]\n[Da...   
81  [Event "Live Chess"]\n[Site "Chess.com"]\n[

#### Cleaning data for EP - Advantage for ep

Analysing game - finding if ep move would have given an edvantage


Split into cases: ep happened and ep didn't happen


NOT DONE YET

In [None]:
# def find_fen_ep_opportunity(pgn_in):
#     '''
#     Returns FEN of board when the ep opportunity arises
#     '''
#     pgn = io.StringIO(pgn_in)       # PGN as a file
#     game = chess.pgn.read_game(pgn) # Read PGN and put into game
#     board = game.board()            # "board" of a game
    
#     precheck = False                # Is en passant possible
#     moved = False                   # Did ep happen

#     fen_before_ep = ""
#     fen_after_ep = ""
#     reached_ep_opp = 0

#     # Find only pawn moves in game
#     for move in game.mainline_moves():
#         # B bishop Q queen K king N knight R rook O castle ELSE pawn
#         turn = board.turn
#         san = board.san(move)
#         move_piece = san[0]

#         match move_piece:
#             # Ignore if not pawn
#             case "K" | "Q" | "B" | "N" | "R" | "O":
#                 board.push(move)
#                 continue

#             # Pawn
#             case _:
#                 # Find the ep opportunity move
#                 # Push the move before checking the board
#                 moved = board.is_en_passant(move)
#                 board.push(move)
#                 precheck = board.has_legal_en_passant()

                
#                 if moved:
#                     fen_after_ep = board.fen()
#                     # Break out of loop
#                     return fen_before_ep, fen_after_ep
#                 elif reached_ep_opp > 0:
#                     # If ep wasn't taken, find fen for if it was actually taken

#                     # LILA STILL GOT TO DO THIS
#                     # (Find what the ep move would be and add it to the pgn if ep hadn't actually been chosen)
                    
#                 elif precheck:
#                     fen_before_ep = board.fen()
#                     reached_ep_opp += 1
                
#     # If pgn without ep opportunity was passed in, return None
#     return fen_before_ep, fen_after_ep





# def find_ep_advantage(pgn_in, ep_happened):
#     '''
#     Returns advantage given from ep move
#     '''
#     pgn = io.StringIO(pgn_in)       # PGN as a file
#     game = chess.pgn.read_game(pgn) # Read PGN and put into game
#     board = game.board()            # "board" of a game

#     fen_before_ep = find_fen_ep_opportunity(pgn)


### Number of games with at least 1 possible en-passant move - ???:
## 4750
(maybe 4913 or even 4945)

### Number of games with an actual en-passant move - ???:
## 1566

#### Applying PCA


In [None]:
# Replace Boolean and string variables with numbers
chess_data['ep_happened'] = chess_data['ep_happened'].replace({True:1, False:0})
chess_data['ep_colour'] = chess_data['ep_colour'].replace({True:1, False:0})
chess_data['rated'] = chess_data['rated'].replace({True:1, False:0})
chess_data['time_class'] = chess_data['time_class'].replace({'daily':3, 'rapid':2, 'blitz':1, 'bullet':0})

# Drop irrelevant columns, and save differently - as a dataframe including the rating and one without
chess_data_with_elo = chess_data.drop(['fen', 'pgn', 'move_list'], axis=1, inplace=True)
chess_data_without_elo = chess_data_with_elo.drop(['ep_rating'], axis=1, inplace=True)

print(chess_data_with_elo.head())
print('\n')
print(chess_data_without_elo.head())

##### Data Standardisation (For PCA)