# Machine Learning Classification Model

##### Creating classification models for the data available at https://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King-Pawn%29

##### Predicting whether White can win ("won") or cannot win ("nowin").


##### Describing the dataset: for more information about how the dataset is organized, see the description file printed below:

In [10]:
# Import the first modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

# Print the dataset description; each line already ends with '\n',
# so end='' avoids doubling the line breaks
with open('dataset/kr-vs-kp.names', 'r') as arch:
    for line in arch:
        print(line, end='')


1. Title: Chess End-Game -- King+Rook versus King+Pawn on a7
   (usually abbreviated KRKPA7).  The pawn on a7 means it is one square
   away from queening.  It is the King+Rook's side (white) to move.

2. Sources:
    (a) Database originally generated and described by Alen Shapiro.
    (b) Donor/Coder: Rob Holte (holte@uottawa.bitnet).  The database
        was supplied to Holte by Peter Clark of the Turing Institute
        in Glasgow (pete@turing.ac.uk).
    (c) Date: 1 August 1989

3. Past Usage:
     - Alen D. Shapiro (1983,1987), "Structured Induction in Expert Systems",
       Addison-Wesley.  This book is based on Shapiro's Ph.D. thesis (1983)
       at the University of Edinburgh entitled "The Role of Structured
       Induction in Expert Systems".
     - Stephen Muggleton (1987), "Structuring Knowledge by Asking Questions",
       pp.218-229 in "Progress in Machine Learning", edited by I. Bratko
       and Nada Lavrac, Sigma Press, Wilmslow, England  SK9 5BB.

In [11]:
# Import the dataset: the file has no header row, so pandas numbers the columns 0-36
data = pd.read_csv('dataset/kr-vs-kp.data', header=None)
data.head()

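|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f | f | f | f | f | f | f | f | f | f | ... | f | f | f | f | f | f | t | t | n | won |
| 1 | f | f | f | f | t | f | f | f | f | f | ... | f | f | f | f | f | f | t | t | n | won |
| 2 | f | f | f | f | t | f | t | f | f | f | ... | f | f | f | f | f | f | t | t | n | won |
| 3 | f | f | f | f | f | f | f | f | t | f | ... | f | f | f | f | f | f | t | t | n | won |
| 4 | f | f | f | f | f | f | f | f | f | f | ... | f | f | f | f | f | f | t | t | n | won |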
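Before encoding, it can help to confirm that the table has the expected 37 columns (36 attributes plus the class) and to check how balanced the two classes are. The sketch below is a minimal, self-contained illustration under that assumption: instead of reading `dataset/kr-vs-kp.data`, it builds two hypothetical rows in memory with the same header-less, comma-separated layout, and uses `header=None` so pandas numbers the columns 0 to 36.

```python
import io
import pandas as pd

# Two hypothetical rows in the same header-less layout:
# 36 attribute values followed by the class label
raw = io.StringIO(
    ','.join(['f'] * 36 + ['won']) + '\n'
    + ','.join(['t'] * 36 + ['nowin']) + '\n'
)

# header=None makes pandas number the columns 0..36
df = pd.read_csv(raw, header=None)
print(df.shape)

# Class balance of the target column (36)
print(df[36].value_counts())
```

On the real data, `data[36].value_counts()` gives the same kind of class-balance summary.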


In [12]:
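# Encode the categorical values ('f', 't', ...) as integers, because
# scikit-learn estimators expect numeric input. fit_transform is called
# once per column, so each column gets its own mapping.
from sklearn.preprocessing import LabelEncoder
trf=LabelEncoder()
data_converted=data.apply(lambda col: trf.fit_transform(col))
# Drop the encoded target column (36); the string labels in data[36] are used as y
data_converted=data_converted.drop(36,axis=1)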
data_converted.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,26,27,28,29,30,31,32,33,34,35
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,1,0
1,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,1,1,0
2,0,0,0,0,1,0,1,0,0,0,...,0,0,0,0,0,0,0,1,1,0
3,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,1,1,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,1,0
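`LabelEncoder` maps each column's categories to 0, 1, 2, ... in sorted order. For the `t`/`f` flags that is harmless, but if a column has more than two values the integer codes imply an ordering that may not exist. One common alternative, shown here as a swapped-in technique rather than what this notebook does, is one-hot encoding. The toy frame and its column names (`c0`, `c1`, `c2`) are hypothetical, and `drop='if_binary'` assumes scikit-learn 0.23 or later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy frame: two binary columns and one three-valued column
toy = pd.DataFrame({
    'c0': ['f', 't', 'f', 't'],
    'c1': ['t', 't', 'f', 'f'],
    'c2': ['b', 'n', 'w', 'n'],
})

# Per-column label encoding (one fit per column, categories sorted):
# 'f'->0, 't'->1 and 'b'->0, 'n'->1, 'w'->2, which implies b < n < w
label_encoded = toy.apply(lambda col: LabelEncoder().fit_transform(col))
print(label_encoded)

# One-hot encoding: one 0/1 indicator per category, no implied order;
# drop='if_binary' keeps a single column for two-valued features
ohe = OneHotEncoder(drop='if_binary')
one_hot = ohe.fit_transform(toy).toarray()
print(one_hot.shape)  # (4, 5): 1 + 1 + 3 columns
```

Tree-based models are usually insensitive to this choice, while distance-based models such as KNN can be affected by it.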


##### Splitting the dataset

In [13]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(data_converted,data[36],
                                               random_state=0,test_size=0.15)
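With a 15% test split, the `won`/`nowin` ratio can drift slightly between the training and test sets. Passing `stratify` to `train_test_split` keeps the class proportions equal in both parts. The sketch below is illustrative only: it uses hypothetical labels (60 `won`, 40 `nowin`) and a dummy feature in place of `data_converted` and `data[36]`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels standing in for data[36]: 60 'won', 40 'nowin'
y = np.array(['won'] * 60 + ['nowin'] * 40)
X = np.arange(len(y)).reshape(-1, 1)  # dummy single feature

# stratify=y keeps the won/nowin ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.15, random_state=0, stratify=y)

for name, part in [('train', y_tr), ('test', y_te)]:
    labels, counts = np.unique(part, return_counts=True)
    print(name, dict(zip(labels, counts / len(part))))
```

On the notebook's data, the equivalent call would add `stratify=data[36]` to the split above.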

##### Importing the classification modules


In [14]:
# KNN classifier
from sklearn.neighbors import KNeighborsClassifier
# Logistic regression classifier
from sklearn.linear_model import LogisticRegression
# Kernelized support vector machine (SVC) and linear support vector machine (LinearSVC)
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
# Decision tree, random forest (many trees trained independently on bootstrap samples)
# and gradient boosting (trees trained sequentially, each correcting the previous ones)
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
# Gaussian naive Bayes classifier
from sklearn.naive_bayes import GaussianNB
# Multi-layer perceptron (feed-forward neural network) classifier
from sklearn.neural_network import MLPClassifier
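Because every imported classifier follows the same `fit`/`score` interface, they can be compared in a single loop. This is a minimal sketch, not the notebook's procedure: it uses synthetic data from `make_classification` as a stand-in for the encoded chess features, and the settings (`max_iter` values, `random_state`) are assumptions chosen to keep the run short and repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the encoded chess features (hypothetical data)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)

# Mostly default settings; max_iter raised to reduce convergence warnings
models = {
    'KNN': KNeighborsClassifier(),
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'SVC': SVC(),
    'LinearSVC': LinearSVC(max_iter=5000),
    'DecisionTree': DecisionTreeClassifier(random_state=0),
    'RandomForest': RandomForestClassifier(random_state=0),
    'GradientBoosting': GradientBoostingClassifier(random_state=0),
    'GaussianNB': GaussianNB(),
    'MLP': MLPClassifier(max_iter=1000, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)  # mean accuracy on the test split
    print(f'{name:<20}{scores[name]:.3f}')
```

`score` reports plain accuracy; the metrics imported in the next cell give a more detailed view per class.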

##### Importing the evaluation metrics

In [15]:
# Individual scores
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Per-class report (precision, recall, F1 and support for each class)
from sklearn.metrics import classification_report
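Since the target labels here are the strings `won` and `nowin`, `precision_score`, `recall_score` and `f1_score` need an explicit `pos_label`; the default `pos_label=1` is not one of the labels and raises an error. The sketch below uses six hypothetical predictions and treats `won` as the positive class, which is an assumption about which class matters most.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, classification_report)

# Hypothetical true and predicted labels
y_true = ['won', 'won', 'nowin', 'nowin', 'won', 'nowin']
y_pred = ['won', 'nowin', 'nowin', 'nowin', 'won', 'won']

# With string labels, the positive class must be named explicitly
acc = accuracy_score(y_true, y_pred)                    # 4 of 6 correct
prec = precision_score(y_true, y_pred, pos_label='won')  # TP=2, FP=1
rec = recall_score(y_true, y_pred, pos_label='won')      # TP=2, FN=1
f1 = f1_score(y_true, y_pred, pos_label='won')
print(f'accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}')

# The report lists both classes, so no pos_label is needed
print(classification_report(y_true, y_pred))
```

Here all four scores come out to 0.667, since there are 2 true positives, 1 false positive, 1 false negative and 2 true negatives.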

##### KNN model

In [49]:
# Try k = 1..30 and keep the k with the highest test accuracy, considering
# only values of k whose test score exceeds their training score
knn_max_train = 0
knn_max_test = 0
i_max = 0
for i in range(1, 31):
    knn = KNeighborsClassifier(n_neighbors=i).fit(X_train, y_train)
    # Compute each score once instead of re-scoring inside the conditions
    train_score = knn.score(X_train, y_train)
    test_score = knn.score(X_test, y_test)
    if test_score > train_score and test_score > knn_max_test:
        knn_max_train = train_score
        knn_max_test = test_score
        i_max = i
print(f'The best number of neighbors is {i_max} with these scores:\n'
      f'training:\t{knn_max_train:.3f}\ntest:\t\t{knn_max_test:.3f}')

The best number of neighbors is 8 with these scores:
training:	0.975
test:		0.979
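Choosing `k` by its test score means the test set is also used for model selection, so the reported test accuracy is likely to be somewhat optimistic. A common alternative, named here as a swapped-in technique rather than the notebook's method, is to pick `k` by cross-validation on the training data with `GridSearchCV` and evaluate on the test set only once. This sketch uses synthetic data from `make_classification` in place of the chess features; the 5 folds and the data size are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the encoded chess features (hypothetical data)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0)

# 5-fold cross-validation over k = 1..30, using only the training data
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={'n_neighbors': list(range(1, 31))},
                      cv=5)
search.fit(X_train, y_train)

best_k = search.best_params_['n_neighbors']
print(f'best k (CV): {best_k}, mean CV accuracy: {search.best_score_:.3f}')
# The held-out test set is used once, for the final estimate
print(f'test accuracy: {search.score(X_test, y_test):.3f}')
```

By default, `GridSearchCV` refits the best model on the full training set, so `search.score` evaluates that refitted model.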


In [None]:
#As we saw above the best number among the 