# Chess Move Sequence ML
### Julian Rückerl h120104017

This Jupyter Notebook documents my project. Technical details are explained in code comments, while the broader project context is explained in Markdown cells.

Essentially, I am interested in predicting the outcome of a chess game from the first two moves of the game, that is, White's first move and Black's reply. As is usual in chess, White always moves first. This is a small advantage for White, who can choose how to open the game, but not every opening is a good one, and some even hand the advantage to the opponent, especially between human players rather than chess engines.

This project grew out of my passion for chess and was carried out as a university project. Since my own games would not provide enough data, I decided to use the games of one of the greatest chess players of all time, if not the greatest. However, I think the model would work even better with data from regular players, where the opening is often more decisive: more traps are played and the endgame matters less (one side usually has a large advantage early on, or a player has already been checkmated).
The appendix contains the same notebook, but based on the first two moves of each player.

In [1]:
import bz2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Loading the data
Alongside Chess.com, Lichess is the biggest platform for playing chess online. Through the lichess.org API you can download the games of any player and save them as a PGN file, which can then be converted to CSV. I have already done this, so the step can be skipped.

In [2]:
#from converter.pgn_data import PGNData
#pgn_data = PGNData("lichess_DrNykterstein_2023-11-16.pgn")
#pgn_data.export() 

This creates two CSV files, one with the moves and one with the general game information. Since I need information from both, I merge them before continuing.
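
To make the layout of the moves file easier to follow, here is a minimal sketch of how PGN movetext could be turned into one row per half-move with the same column names as the CSV. It is not the converter's actual implementation; the function name `movetext_to_rows` and the `demo-game` ID are made up for illustration, and the sketch ignores variations and annotation glyphs.

```python
import re

def movetext_to_rows(game_id, movetext):
    """Turn PGN movetext into one row per half-move (hypothetical helper)."""
    # Remove comments in braces (e.g. clock times) and move numbers like "1." or "1...".
    cleaned = re.sub(r"\{[^}]*\}", " ", movetext)
    cleaned = re.sub(r"\d+\.+", " ", cleaned)
    # Drop the result token at the end of the game.
    tokens = [tok for tok in cleaned.split()
              if tok not in {"1-0", "0-1", "1/2-1/2", "*"}]

    rows, played = [], []
    for ply, san in enumerate(tokens, start=1):
        played.append(san)
        rows.append({
            "game_id": game_id,
            "move_no": ply,                     # counts half-moves
            "move_no_pair": (ply + 1) // 2,     # counts full moves
            "notation": san,
            "color": "White" if ply % 2 == 1 else "Black",
            "move_sequence": "|".join(played),  # everything played so far
        })
    return rows

rows = movetext_to_rows("demo-game", "1. d4 g6 2. c4 Bg7 3. Nc3 1-0")
for row in rows:
    print(row["move_no"], row["move_no_pair"], row["color"], row["move_sequence"])
```

The key point is that `move_sequence` is cumulative, so the row with `move_no == 2` already contains both opening moves.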

In [3]:
use_cols = ['game_id', 'move_no', 'move_no_pair', 'move_sequence','notation', 'piece', 'color']
df1 = pd.read_csv('lichess_DrNykterstein_2023-11-16_moves.csv', usecols=use_cols, encoding='utf-8')

#df1 = pd.read_csv('carlsen_games_moves.csv', encoding = 'utf-8') # too large -> would take quite some time
df1.head()

Unnamed: 0,game_id,move_no,move_no_pair,notation,piece,color,move_sequence
0,67512e0d-96dc-4e3e-8d86-c208cdc372c7,1,1,d4,P,White,d4
1,67512e0d-96dc-4e3e-8d86-c208cdc372c7,2,1,g6,P,Black,d4|g6
2,67512e0d-96dc-4e3e-8d86-c208cdc372c7,3,2,c4,P,White,d4|g6|c4
3,67512e0d-96dc-4e3e-8d86-c208cdc372c7,4,2,Bg7,B,Black,d4|g6|c4|Bg7
4,67512e0d-96dc-4e3e-8d86-c208cdc372c7,5,3,Nc3,N,White,d4|g6|c4|Bg7|Nc3


In [4]:
df2 = pd.read_csv('lichess_DrNykterstein_2023-11-16_game_info.csv', encoding = 'utf-8')
df2.head()

Unnamed: 0,game_id,game_order,event,site,date_played,round,white,black,result,white_elo,...,winner_loser_elo_diff,eco,termination,time_control,utc_date,utc_time,variant,ply_count,date_created,file_name
0,67512e0d-96dc-4e3e-8d86-c208cdc372c7,1,Chess party with DJ Assios,https://lichess.org/NOH9m1vV,2023.07.15,?,penguingim1,DrNykterstein,1-0,3073,...,-144,A40,Normal,60+0,2023.07.15,23:28:59,Standard,,2023-11-16T11:34:26+0000,lichess_DrNykterstein_2023-11-16.pgn
1,31889391-a167-4dae-9b75-277239b1f8d0,2,Chess party with DJ Assios,https://lichess.org/DlnKVdID,2023.07.15,?,MatthewG-p4p,DrNykterstein,0-1,3050,...,163,A02,Normal,60+0,2023.07.15,23:27:14,Standard,,2023-11-16T11:34:26+0000,lichess_DrNykterstein_2023-11-16.pgn
2,c3886b21-f98d-424e-9cff-0c405c4a5a11,3,Chess party with DJ Assios,https://lichess.org/R3QuS4LL,2023.07.15,?,DrNykterstein,penguingim1,1-0,3209,...,134,C60,Normal,60+0,2023.07.15,23:26:18,Standard,,2023-11-16T11:34:26+0000,lichess_DrNykterstein_2023-11-16.pgn
3,3b7b3143-6348-46a1-a12c-7920c7980ddc,4,Chess party with DJ Assios,https://lichess.org/U4XRlo1O,2023.07.15,?,KontraJaKO,DrNykterstein,0-1,3119,...,86,B00,Normal,60+0,2023.07.15,23:23:50,Standard,,2023-11-16T11:34:26+0000,lichess_DrNykterstein_2023-11-16.pgn
4,385deafb-e2c7-4ca0-9ccd-e7ae630dd39b,5,Chess party with DJ Assios,https://lichess.org/CjNfGKGP,2023.07.15,?,DrNykterstein,mutdpro,1-0,3201,...,148,C47,Normal,60+0,2023.07.15,23:22:05,Standard,,2023-11-16T11:34:26+0000,lichess_DrNykterstein_2023-11-16.pgn


In [5]:
columns_to_keep = ['game_id','move_no', 'move_no_pair', 'move_sequence','notation', 'piece', 'color'] 
s_df = df1[columns_to_keep]
s_df.head(10)

Unnamed: 0,game_id,move_no,move_no_pair,move_sequence,notation,piece,color
0,67512e0d-96dc-4e3e-8d86-c208cdc372c7,1,1,d4,d4,P,White
1,67512e0d-96dc-4e3e-8d86-c208cdc372c7,2,1,d4|g6,g6,P,Black
2,67512e0d-96dc-4e3e-8d86-c208cdc372c7,3,2,d4|g6|c4,c4,P,White
3,67512e0d-96dc-4e3e-8d86-c208cdc372c7,4,2,d4|g6|c4|Bg7,Bg7,B,Black
4,67512e0d-96dc-4e3e-8d86-c208cdc372c7,5,3,d4|g6|c4|Bg7|Nc3,Nc3,N,White
5,67512e0d-96dc-4e3e-8d86-c208cdc372c7,6,3,d4|g6|c4|Bg7|Nc3|Nf6,Nf6,N,Black
6,67512e0d-96dc-4e3e-8d86-c208cdc372c7,7,4,d4|g6|c4|Bg7|Nc3|Nf6|Bh6,Bh6,B,White
7,67512e0d-96dc-4e3e-8d86-c208cdc372c7,8,4,d4|g6|c4|Bg7|Nc3|Nf6|Bh6|O-O,O-O,K,Black
8,67512e0d-96dc-4e3e-8d86-c208cdc372c7,9,5,d4|g6|c4|Bg7|Nc3|Nf6|Bh6|O-O|Bxg7,Bxg7,B,White
9,67512e0d-96dc-4e3e-8d86-c208cdc372c7,10,5,d4|g6|c4|Bg7|Nc3|Nf6|Bh6|O-O|Bxg7|Kxg7,Kxg7,K,Black


In [6]:
columns_to_keep = ['game_id', 'result'] 
v_df = df2[columns_to_keep]
v_df.head(10)

Unnamed: 0,game_id,result
0,67512e0d-96dc-4e3e-8d86-c208cdc372c7,1-0
1,31889391-a167-4dae-9b75-277239b1f8d0,0-1
2,c3886b21-f98d-424e-9cff-0c405c4a5a11,1-0
3,3b7b3143-6348-46a1-a12c-7920c7980ddc,0-1
4,385deafb-e2c7-4ca0-9ccd-e7ae630dd39b,1-0
5,79e582e3-bcc2-4a63-a00f-d7e535c7b260,0-1
6,d53e2a32-37ba-47f9-b1f7-6ee727e586cd,1-0
7,7b974ac1-5edc-4fee-8fe7-a7d46a8e6007,0-1
8,9a0f820b-c58c-4eb5-92b5-2c141561d1b9,1-0
9,4716f3f8-40ae-4caf-a061-2467acba4734,0-1


In [7]:
merged_df = pd.merge(s_df, v_df, on="game_id")
merged_df.head()

Unnamed: 0,game_id,move_no,move_no_pair,move_sequence,notation,piece,color,result
0,67512e0d-96dc-4e3e-8d86-c208cdc372c7,1,1,d4,d4,P,White,1-0
1,67512e0d-96dc-4e3e-8d86-c208cdc372c7,2,1,d4|g6,g6,P,Black,1-0
2,67512e0d-96dc-4e3e-8d86-c208cdc372c7,3,2,d4|g6|c4,c4,P,White,1-0
3,67512e0d-96dc-4e3e-8d86-c208cdc372c7,4,2,d4|g6|c4|Bg7,Bg7,B,Black,1-0
4,67512e0d-96dc-4e3e-8d86-c208cdc372c7,5,3,d4|g6|c4|Bg7|Nc3,Nc3,N,White,1-0
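
The merge above attaches the game result to every move row. As a small sketch on invented toy data, the same join can be written with the `validate` argument of `pd.merge`, which raises an error if a `game_id` unexpectedly appears more than once in the game table. The frame names `moves` and `games` are only used for this illustration.

```python
import pandas as pd

# Toy stand-ins for the moves file and the game-info file.
moves = pd.DataFrame({
    "game_id": ["g1", "g1", "g2", "g2"],
    "move_no": [1, 2, 1, 2],
    "move_sequence": ["d4", "d4|g6", "e4", "e4|e5"],
})
games = pd.DataFrame({"game_id": ["g1", "g2"], "result": ["1-0", "0-1"]})

# Many move rows map to exactly one game row; pandas checks this for us.
merged = pd.merge(moves, games, on="game_id", how="inner", validate="many_to_one")
print(merged)
```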


In [8]:
new_df = merged_df[(merged_df['move_no_pair'] == 1) & (merged_df['move_no'] == 2)]
new_df.head(20)

Unnamed: 0,game_id,move_no,move_no_pair,move_sequence,notation,piece,color,result
1,67512e0d-96dc-4e3e-8d86-c208cdc372c7,2,1,d4|g6,g6,P,Black,1-0
44,31889391-a167-4dae-9b75-277239b1f8d0,2,1,f4|g6,g6,P,Black,0-1
126,c3886b21-f98d-424e-9cff-0c405c4a5a11,2,1,e4|e5,e5,P,Black,1-0
191,3b7b3143-6348-46a1-a12c-7920c7980ddc,2,1,e4|Nc6,Nc6,N,Black,0-1
331,385deafb-e2c7-4ca0-9ccd-e7ae630dd39b,2,1,e4|Nf6,Nf6,N,Black,1-0
438,79e582e3-bcc2-4a63-a00f-d7e535c7b260,2,1,Nf3|g6,g6,P,Black,0-1
528,d53e2a32-37ba-47f9-b1f7-6ee727e586cd,2,1,Nf3|d5,d5,P,Black,1-0
581,7b974ac1-5edc-4fee-8fe7-a7d46a8e6007,2,1,d4|Nf6,Nf6,N,Black,0-1
641,9a0f820b-c58c-4eb5-92b5-2c141561d1b9,2,1,Nf3|Nf6,Nf6,N,Black,1-0
708,4716f3f8-40ae-4caf-a061-2467acba4734,2,1,e4|c6,c6,P,Black,0-1
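
Because `move_no` counts half-moves, the row with `move_no == 2` holds the opening after one move by each side (the extra condition on `move_no_pair` is then always true). The following sketch, on invented toy data, wraps the filter in a hypothetical helper `opening_after` so that the depth can be changed, for example to 4 half-moves for the first two moves of each player.

```python
import pandas as pd

def opening_after(merged, n_plies):
    """Keep one row per game: the move sequence after n_plies half-moves."""
    return merged[merged["move_no"] == n_plies].copy()

toy = pd.DataFrame({
    "game_id": ["g1"] * 4 + ["g2"] * 3,
    "move_no": [1, 2, 3, 4, 1, 2, 3],
    "move_sequence": ["d4", "d4|g6", "d4|g6|c4", "d4|g6|c4|Bg7",
                      "e4", "e4|e5", "e4|e5|Nf3"],
    "result": ["1-0"] * 4 + ["0-1"] * 3,
})

two = opening_after(toy, 2)    # one move per player
four = opening_after(toy, 4)   # two moves per player
print(two["move_sequence"].tolist())
print(four["move_sequence"].tolist())
```

Games that end before the requested depth simply drop out, as game `g2` does for `n_plies=4`.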


In [9]:
# Drop draws first, then encode the result as a binary label.
# .copy() avoids assigning into a slice of merged_df.
ml_df = new_df[new_df['result'].isin(['1-0', '0-1'])].copy()
ml_df['result'] = ml_df['result'].map({'1-0': 1, '0-1': 0}) # 1 = white wins, 0 = black wins
ml_df.head()

Unnamed: 0,game_id,move_no,move_no_pair,move_sequence,notation,piece,color,result
1,67512e0d-96dc-4e3e-8d86-c208cdc372c7,2,1,d4|g6,g6,P,Black,1
44,31889391-a167-4dae-9b75-277239b1f8d0,2,1,f4|g6,g6,P,Black,0
126,c3886b21-f98d-424e-9cff-0c405c4a5a11,2,1,e4|e5,e5,P,Black,1
191,3b7b3143-6348-46a1-a12c-7920c7980ddc,2,1,e4|Nc6,Nc6,N,Black,0
331,385deafb-e2c7-4ca0-9ccd-e7ae630dd39b,2,1,e4|Nf6,Nf6,N,Black,1


In [10]:
columns_left = ['move_no', 'move_no_pair','result', 'move_sequence'] 
fin_df = ml_df[columns_left]
fin_df.head(20)

Unnamed: 0,move_no,move_no_pair,result,move_sequence
1,2,1,1,d4|g6
44,2,1,0,f4|g6
126,2,1,1,e4|e5
191,2,1,0,e4|Nc6
331,2,1,1,e4|Nf6
438,2,1,0,Nf3|g6
528,2,1,1,Nf3|d5
581,2,1,0,d4|Nf6
641,2,1,1,Nf3|Nf6
708,2,1,0,e4|c6


In [11]:
minobs = min(fin_df.value_counts('result').values)
fin_df = fin_df.groupby('result').sample(n=minobs, random_state=1).sample(frac=1, random_state=1)
print(fin_df['result'].value_counts())

0    4095
1    4095
Name: result, dtype: int64
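
The cell above balances the two classes by undersampling: each class is reduced to the size of the smaller one, and the rows are then shuffled. A minimal sketch of the same idea on invented toy data:

```python
import pandas as pd

# Five wins for White (1) and two wins for Black (0).
toy = pd.DataFrame({
    "move_sequence": ["e4|e5", "d4|d5", "e4|c5", "d4|Nf6", "e4|e6", "c4|e5", "Nf3|d5"],
    "result": [1, 1, 1, 1, 1, 0, 0],
})

n_min = int(toy["result"].value_counts().min())   # size of the smaller class

balanced = (
    toy.groupby("result")
       .sample(n=n_min, random_state=1)   # same number of rows per class
       .sample(frac=1, random_state=1)    # shuffle the combined rows
)
print(balanced["result"].value_counts())
```

Fixing `random_state` keeps the sample reproducible, at the cost of discarding the surplus rows of the larger class.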


In [12]:
fin_df['result'] = fin_df['result'].astype(np.int8)
fin_df.dtypes

move_no           int64
move_no_pair      int64
result             int8
move_sequence    object
dtype: object

In [13]:
next_df = pd.get_dummies(fin_df)
next_df

Unnamed: 0,move_no,move_no_pair,result,move_sequence_Nc3|Nf6,move_sequence_Nc3|b5,move_sequence_Nc3|b6,move_sequence_Nc3|c5,move_sequence_Nc3|c6,move_sequence_Nc3|d5,move_sequence_Nc3|e5,...,move_sequence_g3|e6,move_sequence_g3|g6,move_sequence_g3|h5,move_sequence_g4|a5,move_sequence_g4|c5,move_sequence_g4|d5,move_sequence_g4|d6,move_sequence_g4|e5,move_sequence_g4|g6,move_sequence_h4|d5
591037,2,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
449235,2,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
211109,2,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
598697,2,1,1,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
497836,2,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
275853,2,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
726288,2,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
520802,2,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
548467,2,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
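
`get_dummies` turns every distinct move sequence into its own 0/1 column. As a sketch on invented toy data, the encoding can also be restricted to the `move_sequence` column by name, which avoids relying on column positions later on and keeps the list of feature names available for reuse.

```python
import pandas as pd

toy = pd.DataFrame({
    "result": [1, 0, 1],
    "move_sequence": ["d4|g6", "e4|e5", "d4|g6"],
})

# One 0/1 column per distinct opening, named "move_sequence_<opening>".
features = pd.get_dummies(toy["move_sequence"], prefix="move_sequence", dtype=int)
feature_names = list(features.columns)
X_toy = features.to_numpy()

print(feature_names)
print(X_toy)
```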


In [14]:
next_df.value_counts('result') # just a double check

result
0    4095
1    4095
dtype: int64

In [15]:
X = np.asarray(next_df.iloc[:,3:]) # Important: this selects the input features of the model
X                                  # Only the one-hot move-sequence columns are used

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

In [16]:
Xmax = X.max(0)

In [17]:
X = X / Xmax

print(X)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
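
Dividing by the column maximum scales every feature to the range 0 to 1. For pure 0/1 columns this leaves the values unchanged, so it mainly matters if other numeric features are added later. The sketch below, on an invented array, also guards against columns that are entirely zero, which would otherwise produce NaN values.

```python
import numpy as np

X_demo = np.array([[1, 0, 0],
                   [0, 0, 2],
                   [1, 0, 4]], dtype=float)

col_max = X_demo.max(axis=0)
# Use 1 as divisor for all-zero columns so they stay 0 instead of becoming NaN.
safe_max = np.where(col_max == 0, 1.0, col_max)
X_scaled = X_demo / safe_max

print(X_scaled)
```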


In [18]:
Y = np.asarray(next_df['result'], dtype='int8')
print(Y)

[1 1 0 ... 0 0 1]


In [19]:
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

print(X.shape, Y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15)
print('training size:', X_train.shape[0], 
      'testing size:', X_test.shape[0],
      'label counts:', np.unique(y_train, return_counts=True)[1])

clf = MLPClassifier(hidden_layer_sizes=(100,20,),
                    max_iter=50).fit(X_train, y_train)

print('score train:', clf.score(X_train, y_train))
print('score test: ', clf.score(X_test, y_test))

(8190, 154) (8190,)
training size: 6961 testing size: 1229 label counts: [3458 3503]
score train: 0.5912943542594454
score test:  0.580960130187144


In [20]:
print('pred prob: ', clf.predict_proba(X[:1]))
print('pred class:', clf.predict(X[:1]))

pred prob:  [[0.41359949 0.58640051]]
pred class: [1]


### What does this mean?
The model estimates roughly a 41.36% chance that Black wins (class 0) and a 58.64% chance that White wins (class 1).

Since the predicted class is 1, the model predicts a win for White for this particular opening.
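
The link between the two printed lines can be sketched in plain Python: the columns of `predict_proba` follow the order of the classifier's `classes_` attribute, and the predicted class is the one with the highest probability. The `classes` list below is assumed to mirror `clf.classes_` for this model, and the `labels` dictionary is only added for readability.

```python
classes = [0, 1]                    # assumed to mirror clf.classes_
proba = [0.41359949, 0.58640051]    # the probabilities printed above

labels = {0: "Black wins", 1: "White wins"}

# Index of the largest probability, then the class stored at that index.
best = max(range(len(proba)), key=lambda i: proba[i])
predicted = classes[best]

print(f"{labels[predicted]} with probability {proba[best]:.2%}")
```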

## Upcoming steps
The next steps of the project run on a local machine, so the following libraries need to be installed there: scikit-learn, pandas, flask, and gunicorn. The code from this Jupyter notebook is copied into the function predict_outcome() in the file flaskapp_Rueckerl.py. (The model is also saved there in a pickle file so that it can be reused later to handle additional inputs.)

A small website lets you enter an ID and a move sequence, and then uses the ML model described above to predict who will win.
The rest of the code is in the Flask file.
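
Saving the model to a pickle file can be sketched as follows. This is an illustration rather than the code in the Flask file: a plain dictionary stands in for the fitted classifier, the file name `chess_model.pkl` is hypothetical, and storing the feature names next to the model is a suggestion so that new inputs can later be encoded in the same column order.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted MLPClassifier; any picklable object works the same way.
model = {"kind": "placeholder-model"}
feature_names = ["move_sequence_d4|g6", "move_sequence_e4|e5"]

# Keep the model and its column order together in one file.
bundle = {"model": model, "feature_names": feature_names}

path = os.path.join(tempfile.mkdtemp(), "chess_model.pkl")  # hypothetical file name
with open(path, "wb") as fh:
    pickle.dump(bundle, fh)

# Later, for example when the web app starts:
with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(restored["feature_names"])
```

Pickle files should only be loaded from trusted sources, since loading one can execute arbitrary code.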

#### Quick summary:
- Init DB: initializes the database
- List Game Data: shows the data that was entered via the tab "Enter Game Data"
- The website has the following tabs: List Game Data, Get Game Prediction, Enter Game Data, Init DB
- Get Game Prediction: enter an ID that belongs to previously entered game data; the model saved in the pickle file is then called and returns the result
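
The data flow behind these tabs can be sketched with the standard-library `sqlite3` module. This is only an assumption about how such a backend might look: the notebook does not state which database the Flask app uses, and the table name `game_data` and all function names are invented for this example.

```python
import sqlite3

def init_db(conn):
    """'Init DB': create an empty table for user-entered openings."""
    conn.execute("DROP TABLE IF EXISTS game_data")
    conn.execute(
        "CREATE TABLE game_data (id INTEGER PRIMARY KEY, move_sequence TEXT NOT NULL)"
    )

def enter_game_data(conn, game_id, move_sequence):
    """'Enter Game Data': store one opening under an ID."""
    conn.execute(
        "INSERT INTO game_data (id, move_sequence) VALUES (?, ?)",
        (game_id, move_sequence),
    )

def list_game_data(conn):
    """'List Game Data': return all stored rows."""
    return conn.execute(
        "SELECT id, move_sequence FROM game_data ORDER BY id"
    ).fetchall()

def get_move_sequence(conn, game_id):
    """Lookup used by 'Get Game Prediction' before the model is called."""
    row = conn.execute(
        "SELECT move_sequence FROM game_data WHERE id = ?", (game_id,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")   # in-memory database for the demo
init_db(conn)
enter_game_data(conn, 2401, "e4|e5")
print(list_game_data(conn))
print(get_move_sequence(conn, 2401))
```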

### Toughest task
Integrating my ML model into the Flask app proved to be quite a journey. The main issue was encoding the move_sequence values of newly inserted data. Move sequence values are plain strings that need to be encoded exactly as they were for training. During training, the get_dummies function created one column for each unique move sequence, but this approach is not feasible for a single new entry in the Flask app, because get_dummies would then produce only one column. In retrospect, the solution may appear straightforward, but it took some time to understand where the issue lay and how to solve it.
To solve this, I replicated the effect of get_dummies with a loop that performs the same encoding, so that the model receives data in the format it was trained on. This adaptation was the key to aligning the data processing of the Flask app with the requirements of the trained model. In the end, I am pleased that I got quite a good result, and I will keep working on the project in my spare time.
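
The idea of such a loop can be sketched as follows. This is a minimal illustration, not the exact code from the Flask file: the helper name `encode_move_sequence` is made up, and `feature_names` is assumed to be the list of training columns in their original order.

```python
def encode_move_sequence(move_sequence, feature_names, prefix="move_sequence_"):
    """Build one 0/1 row in the same column order the model was trained on."""
    target = prefix + move_sequence
    row = []
    for name in feature_names:
        # Exactly one column is 1 if the opening was seen during training.
        row.append(1 if name == target else 0)
    return row

# Assumed to be the training columns, in training order.
feature_names = ["move_sequence_d4|g6", "move_sequence_e4|c6", "move_sequence_e4|e5"]

known = encode_move_sequence("e4|e5", feature_names)
unknown = encode_move_sequence("a3|h5", feature_names)
print(known)
print(unknown)
```

An opening that never occurred in the training data yields a row of zeros, so the app may want to warn the user that the prediction is not backed by any observed games in that case.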


I also created two robot.txt files (r1_jr and r2_jr). The first, r1_jr, creates the database. The second, r2_jr, enters a move sequence under "Enter Game Data" with ID 2401, then clicks "Get Game Prediction" and displays the result for 5 seconds.

- 1. Read this Jupyter Notebook
- 2. Run the attached flaskapp_rueckerl.py in a terminal, then enter the address (http://localhost:8088) in the Chrome browser
- 3. In another terminal, execute r1_jr.txt and r2_jr.txt
- 4. Feel free to create a new ID and insert an opening of your choice

Don't hesitate to load your own chess data and try out some of your favourite moves. They may be better than you would expect.