# Data Sprint #39: Chess Endgame

#### Problem Statement
 
Chess endgames are complex but enumerable domains. Endgame databases are tables of stored game-theoretic values for the enumerated elements (legal positions) of the domain. The stored values denote whether a position is won for either side, and may also include the depth of win (the number of moves to a win), assuming minimax-optimal play.

#### Objective
 
Build a machine learning model to predict the depth of win (i.e., the number of moves required to win the game) for a given position.

### Import Libraries

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

### Import the training dataset

In [2]:
df = pd.read_csv('Train_Data.csv')

### Basic EDA

In [3]:
df.head()

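| Unnamed: 0 | White King file | White King rank | White Rook file | White Rook rank | Black King file | Black King rank | optimal depth-of-win |
|---|---|---|---|---|---|---|---|
| 0 | d | 3 | f | 5 | a | 5 | nine |
| 1 | d | 3 | e | 5 | f | 2 | eight |
| 2 | d | 1 | g | 6 | d | 7 | thirteen |
| 3 | c | 2 | e | 8 | a | 4 | ten |
| 4 | d | 4 | a | 8 | b | 1 | eight |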
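Beyond `df.head()`, it usually helps to check column types, missing values, and the class distribution of the target. Here is a minimal sketch that uses only the five rows shown above as stand-in data; the real `Train_Data.csv` is assumed to have the same columns, and `sample` is an illustrative name.

```python
import pandas as pd

# Stand-in for Train_Data.csv: the five rows printed by df.head() above.
sample = pd.DataFrame({
    "White King file": ["d", "d", "d", "c", "d"],
    "White King rank": [3, 3, 1, 2, 4],
    "White Rook file": ["f", "e", "g", "e", "a"],
    "White Rook rank": [5, 5, 6, 8, 8],
    "Black King file": ["a", "f", "d", "a", "b"],
    "Black King rank": [5, 2, 7, 4, 1],
    "optimal depth-of-win": ["nine", "eight", "thirteen", "ten", "eight"],
})

# Column dtypes: the file columns hold strings, the rank columns integers.
print(sample.dtypes)

# Number of missing values per column.
missing = sample.isna().sum()
print(missing)

# Class distribution of the target; a skewed result motivates rebalancing.
class_counts = sample["optimal depth-of-win"].value_counts()
print(class_counts)
```

On the full dataset, the same `value_counts()` call shows how uneven the depth-of-win classes are.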


### Separating the target from the features

In [4]:
X = df.drop(columns=['optimal depth-of-win'])
y = df['optimal depth-of-win']
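The target is stored as English number words ("nine", "eight", ...), while the objective is a move count. The sketch below maps the words to integers and back. It assumes the labels run from "zero" to "sixteen" plus a possible "draw" label, which should be verified with `y.unique()`. The names `WORD_TO_DEPTH` and `DEPTH_TO_WORD`, and the `-1` sentinel for draws, are illustrative choices.

```python
import pandas as pd

# Assumed label vocabulary: "zero" .. "sixteen" (verify with y.unique()).
NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen",
]
WORD_TO_DEPTH = {word: depth for depth, word in enumerate(NUMBER_WORDS)}
# Assumption: a "draw" label, if present, is mapped to -1 as a sentinel.
WORD_TO_DEPTH["draw"] = -1
DEPTH_TO_WORD = {depth: word for word, depth in WORD_TO_DEPTH.items()}

# Target values from the df.head() output above.
y_words = pd.Series(["nine", "eight", "thirteen", "ten", "eight"])

y_depth = y_words.map(WORD_TO_DEPTH)   # words -> integers
y_back = y_depth.map(DEPTH_TO_WORD)    # integers -> words, e.g. for a submission file

print(y_depth.tolist())  # [9, 8, 13, 10, 8]
```

An integer target makes the ordering of the classes explicit, which is useful when errors are measured in moves rather than as exact-match accuracy.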

### Replacing the categorical file letters with numbers

In [5]:
from sklearn.preprocessing import LabelEncoder
lb = LabelEncoder()
X['White King file'] = lb.fit_transform(X['White King file'])  
X['White Rook file'] = lb.fit_transform(X['White Rook file'])
X['Black King file'] = lb.fit_transform(X['Black King file'])
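Calling `fit_transform` repeatedly on a single encoder overwrites its fitted classes each time, so the mapping learned for an earlier column cannot be reapplied later. A possible alternative, sketched here on toy data, keeps one fitted encoder per column and reuses it with `transform`. The names `encoders`, `train_sample`, and `new_rows` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

file_cols = ["White King file", "White Rook file", "Black King file"]

# Toy training rows (file columns only), taken from the df.head() output.
train_sample = pd.DataFrame({
    "White King file": ["d", "d", "d", "c", "d"],
    "White Rook file": ["f", "e", "g", "e", "a"],
    "Black King file": ["a", "f", "d", "a", "b"],
})
# Hypothetical unseen rows that only use letters present in train_sample.
new_rows = pd.DataFrame({
    "White King file": ["c", "d"],
    "White Rook file": ["a", "g"],
    "Black King file": ["b", "f"],
})

# Fit one encoder per column on the training data and keep it.
encoders = {col: LabelEncoder().fit(train_sample[col]) for col in file_cols}

train_encoded = train_sample.copy()
new_encoded = new_rows.copy()
for col in file_cols:
    train_encoded[col] = encoders[col].transform(train_sample[col])
    # transform (not fit_transform) reuses the mapping learned on the training rows.
    new_encoded[col] = encoders[col].transform(new_rows[col])

print(new_encoded)
```

`LabelEncoder.transform` raises a `ValueError` when it meets a label it did not see during `fit`, which surfaces mismatches between datasets instead of silently re-mapping them.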

### Scaling every variable to the same range

In [6]:
from sklearn.preprocessing import MinMaxScaler  # rescales each column to the range [0, 1]
scaler = MinMaxScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
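`MinMaxScaler` learns each column's minimum and maximum during `fit` and then applies `(x - min) / (max - min)`. The sketch below, on toy rank values, shows the fitted statistics being reused on other rows and reproduces the result by hand. The arrays `train_ranks` and `new_ranks` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy "training" ranks from the df.head() output (White King rank, White Rook rank).
train_ranks = np.array([[3, 5], [3, 5], [1, 6], [2, 8], [4, 8]], dtype=float)
# Hypothetical new rows to be scaled with the training statistics.
new_ranks = np.array([[1, 5], [4, 8], [2, 2]], dtype=float)

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_ranks)  # learns per-column min and max
new_scaled = scaler.transform(new_ranks)          # reuses them, no refit

# The same result by hand: (x - min) / (max - min), column by column.
col_min = train_ranks.min(axis=0)
col_max = train_ranks.max(axis=0)
manual = (new_ranks - col_min) / (col_max - col_min)

print(new_scaled)
```

The last toy row contains a rank below the training minimum, so its scaled value falls outside [0, 1]; with default settings the scaler does not clip such values.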

### Balancing the classes

In [7]:
from imblearn.over_sampling import SMOTE
smote = SMOTE()  # oversample the smaller classes until every class matches the largest one
X_train_ns, y_train_ns = smote.fit_resample(X, y)
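SMOTE creates synthetic minority samples by interpolating between a sample and one of its nearest neighbours from the same class. The NumPy sketch below illustrates only that interpolation idea on toy points. It is a simplified illustration, not imblearn's implementation, and `smote_like_samples` is a hypothetical helper.

```python
import numpy as np

def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (simplified SMOTE idea)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest neighbours

    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(minority))         # pick a random minority sample
        neigh = neighbours[base, rng.integers(k)]  # pick one of its neighbours
        lam = rng.random()                         # interpolation factor in [0, 1)
        synthetic[i] = minority[base] + lam * (minority[neigh] - minority[base])
    return synthetic

# Toy minority class: four already-scaled 2-D points.
minority_points = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.3]])
new_points = smote_like_samples(minority_points, n_new=6)
print(new_points.shape)  # (6, 2)
```

Each synthetic row lies on a line segment between two existing rows. For board coordinates, which are discrete, such interpolated values may not correspond to real squares, so it is worth checking whether oversampling actually improves the score.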

### Applying XGBoost to the dataset and training the model

In [8]:
from xgboost import XGBClassifier
xgb = XGBClassifier()
xgb.fit(X_train_ns, y_train_ns)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=0, num_parallel_tree=1,
              objective='multi:softprob', random_state=0, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=None, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

### Importing the test dataset and applying the same transformations as for the training dataset

In [9]:
test = pd.read_csv('Test_Data.csv')

In [10]:
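# Use a fixed a-h -> 0-7 mapping instead of refitting the encoder on the test set.
# This matches LabelEncoder on the training data if its letters run from 'a' with no gaps.
file_to_int = {f: i for i, f in enumerate('abcdefgh')}
for col in ['White King file', 'White Rook file', 'Black King file']:
    test[col] = test[col].map(file_to_int)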

In [11]:
# Reuse the scaler fitted on the training data; refitting on the test set would scale it differently.
test = pd.DataFrame(scaler.transform(test), columns=test.columns)

### Making predictions

In [12]:
Y_pred = xgb.predict(test)

### Saving predictions to a CSV file

In [None]:
res = pd.DataFrame(Y_pred) 
res.columns = ["prediction"]
res.to_csv("XGBsubmission.csv", index=False) 

This is the short, final version; there were many intermediate phases before arriving at this code. It is just one example of a solution to this Data Sprint.

I hope it helps.