* **Introduction**

The aim is to predict the outcome of a match between two players using XGBoost. After you enter the information for the two players, the model predicts whether player_1 will win.

In [None]:
#import the library
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from termcolor import colored
import warnings
warnings.filterwarnings('ignore')

* **Data preprocessing**

Let's load the match data as "df" and explore the first 5 rows.

In [None]:
df = pd.read_csv('../input/sc2-matches-history.csv')
df.head()

Based on the information in the data, I think I can build a classification model to predict whether player_1 will win.
I chose the column 'player_1_match_status' as the dependent variable and encoded it ('1' means player_1 wins and '0' means player_1 loses).

I chose the columns 'player_1', 'player_2', 'player_1_race', 'player_2_race', 'addon', 'tournament_type' as the independent variables and encoded them.

Then I split the data into training data and test data.

In [None]:
# Data processing
# .copy() gives an independent frame, so overwriting columns below does not touch df
X = df[['player_1', 'player_2', 'player_1_race', 'player_2_race', 'addon', 'tournament_type']].copy()
y = df['player_1_match_status']
# Encoding the categorical data (one encoder per column, kept for encoding new input later)
labelencoder_X_1 = LabelEncoder()
X['player_1'] = labelencoder_X_1.fit_transform(X['player_1'])
labelencoder_X_2 = LabelEncoder()
X['player_2'] = labelencoder_X_2.fit_transform(X['player_2'])
labelencoder_X_3 = LabelEncoder()
X['player_1_race'] = labelencoder_X_3.fit_transform(X['player_1_race'])
labelencoder_X_4 = LabelEncoder()
X['player_2_race'] = labelencoder_X_4.fit_transform(X['player_2_race'])
labelencoder_X_5 = LabelEncoder()
X['addon'] = labelencoder_X_5.fit_transform(X['addon'])
labelencoder_X_6 = LabelEncoder()
X['tournament_type'] = labelencoder_X_6.fit_transform(X['tournament_type'])
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
y = pd.Series(y)
# Splitting the dataset into the Training set and Validation set
Xt, Xv, yt, yv = train_test_split(X, y, test_size=0.25, random_state=0)
# .values replaces .as_matrix(), which was removed in pandas 1.0
dt = xgb.DMatrix(Xt.values, label=yt.values)
dv = xgb.DMatrix(Xv.values, label=yv.values)
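
To make the encoding step concrete, here is a minimal sketch on a toy column. The race values below are made up for illustration and are not taken from the dataset. LabelEncoder assigns integer codes following the sorted order of the distinct values, and inverse_transform maps the codes back to the original labels.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column; the real values come from the CSV.
races = np.array(["T", "Z", "P", "T"])

encoder = LabelEncoder()
codes = encoder.fit_transform(races)

# classes_ holds the distinct labels in sorted order;
# each label's code is its position in that array.
print(encoder.classes_.tolist())                  # ['P', 'T', 'Z']
print(codes.tolist())                             # [1, 2, 0, 1]
print(encoder.inverse_transform(codes).tolist())  # ['T', 'Z', 'P', 'T']
```

This is why each fitted encoder is kept in its own variable: the same mapping is needed later to encode new input and to turn a code back into a player name.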

* **Model Selection and Prediction**

I tried several models for this classification problem and found that XGBoost seemed to work better than the others.

I used grid search on my own machine to find the best parameters (it took a long time, so I did not include all of the candidate parameters here), then built the XGBoost model and used it to fit and predict.

The AUC score is only about 0.75. But considering that match results are quite random and depend on many outside factors, I think it is good enough.

According to the confusion matrix, the prediction accuracy on the test set is around 70%. I think that is good enough for predicting the outcome of a match.


In [None]:
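# Build the model
params = {
    "eta": 0.2,
    "max_depth": 4,
    "objective": "binary:logistic",
    "verbosity": 0,  # replaces the deprecated "silent" flag
    "base_score": np.mean(yt),
    "eval_metric": "logloss"
}
# 'n_estimators' belongs to the scikit-learn wrapper and is not used by xgb.train,
# so it is dropped; the number of boosting rounds is set by num_boost_round.
model = xgb.train(params, dt, num_boost_round=3000,
                  evals=[(dt, "train"), (dv, "valid")], verbose_eval=400)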

In [None]:
#Prediction on validation set
y_pred = model.predict(dv)

# Making the Confusion Matrix
cm = confusion_matrix(yv, (y_pred > 0.5))
print(colored('The Confusion Matrix is: ', 'red'), '\n', cm)
# Calculate the accuracy on the held-out set: correct predictions (the diagonal) over all predictions
predict_accuracy_on_test_set = np.trace(cm) / cm.sum()
print(colored('The Accuracy on Test Set is: ', 'blue'), colored(predict_accuracy_on_test_set, 'blue'))
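
The AUC quoted earlier is not computed in the cells above, which only track log loss. One way to obtain both numbers is with scikit-learn's metrics, sketched here on made-up toy values. With the real data, the same calls would presumably take yv and the output of model.predict(dv) in place of the toy arrays.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy labels and predicted probabilities, made up for illustration.
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

# AUC is computed from the raw probabilities, not from thresholded labels.
auc = roc_auc_score(y_true, y_prob)

# Accuracy needs hard labels, so threshold at 0.5 as in the confusion-matrix cell.
y_label = (y_prob > 0.5).astype(int)
acc = accuracy_score(y_true, y_label)

print(auc)  # 0.75
print(acc)  # 0.75
```

AUC measures how well the model ranks wins above losses across all thresholds, while accuracy depends on the single 0.5 cut-off, so the two numbers can differ on real data.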

* **Predict the outcome of any two players**

Let's input any two players' information and predict the outcome.

In [None]:
# Input the data you want to predict
print("Please input the following information:")
player_1_name = input("Player_1_name:")
player_2_name = input("Player_2_name:")
player_1_race = input("Player_1_race:")
player_2_race = input("Player_2_race:")
addon = input("Addon:")
tournament_type = input("Tournament_type:")
# Encoding categorical data with the encoders fitted on the training data.
# Each value must have appeared in the corresponding column of the dataset;
# LabelEncoder.transform raises a ValueError for unseen labels.
player_1_name = labelencoder_X_1.transform([player_1_name])
player_2_name = labelencoder_X_2.transform([player_2_name])
player_1_race = labelencoder_X_3.transform([player_1_race])
player_2_race = labelencoder_X_4.transform([player_2_race])
addon = labelencoder_X_5.transform([addon])
tournament_type = labelencoder_X_6.transform([tournament_type])

In [None]:
# Make prediction
# Build a single-row feature matrix in the same column order used for training
features = np.array([[player_1_name[0], player_2_name[0], player_1_race[0],
                      player_2_race[0], addon[0], tournament_type[0]]])
new_prediction = model.predict(xgb.DMatrix(features))
# The model outputs the predicted probability that player_1 wins
if new_prediction[0] > 0.5:
    print(labelencoder_X_1.inverse_transform(player_1_name)[0], "should be the winner")
else:
    print(labelencoder_X_2.inverse_transform(player_2_name)[0], "should be the winner")
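
One practical caveat: a fitted LabelEncoder only knows the labels it saw during fitting, so entering a player who never appears in the dataset makes transform raise a ValueError. A minimal sketch of a guard is shown below. The helper name `encode_known` and the player names are hypothetical and exist only to make the example runnable.

```python
from sklearn.preprocessing import LabelEncoder

def encode_known(encoder, value):
    """Return the integer code for value, or None if the encoder never saw it."""
    if value not in set(encoder.classes_.tolist()):
        return None
    return int(encoder.transform([value])[0])

# Hypothetical player names, used only to make the sketch runnable.
demo_encoder = LabelEncoder().fit(["Alice", "Bob", "Carol"])

print(encode_known(demo_encoder, "Bob"))   # 1
print(encode_known(demo_encoder, "Dave"))  # None
```

Checking for None before building the feature row lets the notebook print a friendly message instead of failing with a traceback.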

* **Thank you**