# Project Introduction
The Pokemon series, published by Nintendo, is one of the most popular game series in the world. Players capture Pokemon in the game world and train them to battle other players' Pokemon. There are hundreds of Pokemon with very different properties, and some of them are naturally stronger than others.
The main purpose of this project is to train a model that predicts the winner of a combat between two Pokemon. We downloaded the Pokemon data and combat data from https://www.kaggle.com/sekarmg/pokemon/data, which includes the properties of 800 Pokemon and the results of 50000 combats.

This project has two parts.
* The first part is a warm-up: we train an SVM to predict whether a Pokemon is 'Legendary' from its properties.
* In the second part, we build a model to predict the combat result, as mentioned above. We try several models and compare them, aiming for the highest prediction accuracy we can get.

## Package Imports

In [27]:
# import packages
import pandas as pd
import numpy as np
from numpy import argmax
from sklearn import svm
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelBinarizer
from sklearn import linear_model
import numpy.polynomial.polynomial as poly

## Part I: Pokemon Legendary Prediction

### Data Specification
Read data from Pokemon dataset.

In [28]:
# read pokemon data from dataset
pk = pd.read_csv('Pokemon.csv', na_values = ['', ' ', 'NaN', np.nan], index_col = 0)

The Pokemon dataset includes name, types and properties of each Pokemon.

In [29]:
pk.head(6)

| # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 4 | Mega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 5 | Charmander | Fire | | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 6 | Charmeleon | Fire | | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |


Convert the dataset to a NumPy array and remove the 'Name' column.

In [30]:
pk = np.array(pk)[:,1:]

We one-hot encode each Pokemon's pair of types using the sklearn preprocessing module.

In [31]:
# Data preprocessing
# merge the two types of each Pokemon into one unordered pair
type_merge = []
for pair in zip(pk[:,0], pk[:,1]):
    pair = set(pair)
    type_merge.append(pair)
values = np.array(type_merge)
# transform the type pairs into a one-hot encoding
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
onehot_encoder = OneHotEncoder(sparse = False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
onehot_len = onehot_encoded.shape[1]
pk = np.delete(pk, [0,1], 1)
pk = np.hstack((onehot_encoded, pk))
# transform bool to int
lb = LabelBinarizer()
pk[:,-1] = np.transpose(lb.fit_transform(pk[:,-1].tolist()))
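The idea of encoding an unordered type pair can be illustrated on toy data. The sketch below is not the notebook's cell: it assumes a string key such as `Grass/Poison` for each pair (an illustrative choice, not taken from the original code) and builds the one-hot matrix with NumPy only, so that the two orderings of the same types map to the same column.

```python
import numpy as np

# Toy (Type 1, Type 2) pairs; None stands in for a missing second type.
types = [("Grass", "Poison"), ("Poison", "Grass"), ("Fire", None), ("Fire", "Flying")]

# Build an order-independent string key for each pair, e.g. "Grass/Poison".
# The "/" separator is an assumption made for this sketch.
keys = ["/".join(sorted(t for t in pair if t is not None)) for pair in types]

# np.unique returns the sorted distinct keys and, for each row, the index of its key.
categories, codes = np.unique(keys, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for category i.
onehot = np.eye(len(categories), dtype=int)[codes]
```

Here `onehot` has one row per Pokemon and one column per distinct type pair, and the first two rows are identical because they contain the same two types in a different order.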

In [32]:
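# Use the first 160 rows (20%) as the test set and the rest for training
# Training uses rows from 161 on, so row 160 belongs to neither split
Xts = pk[:160,:-1]
Xtr = pk[161:,:-1]
yts = pk[:160,-1]
ytr = pk[161:,-1]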
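A head/tail split like this one can be written so that both slices share the same boundary index and every row lands in exactly one split. The helper below is a minimal sketch on toy data; `head_split` is a hypothetical name, and it assumes, as in the cell above, that the last column holds the label.

```python
import numpy as np

def head_split(data, n_test):
    """Use the first n_test rows for testing and all remaining rows for training."""
    test, train = data[:n_test], data[n_test:]
    # The last column is the label; everything before it is a feature.
    return train[:, :-1], train[:, -1], test[:, :-1], test[:, -1]

toy = np.arange(30).reshape(10, 3)   # 10 rows: 2 features + 1 label
Xtr_toy, ytr_toy, Xts_toy, yts_toy = head_split(toy, n_test=2)
```

With 10 toy rows and `n_test=2`, the training set has 8 rows and the test set has 2, so no row is dropped.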

We train an SVM with an "rbf" kernel.

In [33]:
svc = svm.SVC(probability = False, kernel = "rbf", C = 100, gamma = 1, verbose = False)
svc.fit(Xtr, ytr.astype(float))
yhat = svc.predict(Xts)
acc = np.mean(yhat == yts)
print('Accuracy = {0:f}'.format(acc))

Accuracy = 0.981250


The model classifies 98.1% of the test Pokemon correctly, so it works well on this split.
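An accuracy figure is easier to judge next to a trivial baseline. If Legendary Pokemon are rare in the data (something this notebook does not check), always predicting the majority class would already score high. The sketch below computes that baseline on toy labels; `majority_baseline_accuracy` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    labels, counts = np.unique(y_train, return_counts=True)
    majority = labels[np.argmax(counts)]
    return float(np.mean(np.asarray(y_test) == majority))

# Toy labels: 1 marks a rare positive class.
y_train_toy = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])
y_test_toy = np.array([0, 0, 1, 0])
baseline = majority_baseline_accuracy(y_train_toy, y_test_toy)
```

In the notebook, the same helper could be called as `majority_baseline_accuracy(ytr.astype(int), yts.astype(int))` and compared with the SVM accuracy.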

## Part II: Pokemon Combat Result Prediction

In this part, we first read the combat data, drop the columns we no longer need, and build the training and test labels.

In [34]:
# read combats data from dataset
cb = pd.read_csv('combats.csv', na_values = ['', ' ', 'NaN', np.nan])

pk = np.delete(pk, -1, 1) # Delete the Legendary column
pk = np.delete(pk, -1, 1) # Delete the Generation column

cb = np.array(cb)

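# Training data (first 80% of the combats) and test data (the remaining rows)
# The +1 below leaves row cb.shape[0]*4//5 out of both splits
Xtr_cb = cb[:cb.shape[0]*4//5]
Xts_cb = cb[cb.shape[0]*4//5+1:]
ytr = []
yts = []
# Label y as 1 if the first Pokemon wins and 0 if the second Pokemon wins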
for pair in Xtr_cb:
    if pair[2] == pair[0]: ytr.append(1)
    else: ytr.append(0)
for pair in Xts_cb:
    if pair[2] == pair[0]: yts.append(1)
    else: yts.append(0)
ytr = np.array(ytr)
yts = np.array(yts)

(800, 527)
521
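The labeling loops above can also be expressed as a single vectorized comparison. The following is a minimal sketch on toy data, assuming the same column order as in the loops (first Pokemon id, second Pokemon id, winner id).

```python
import numpy as np

# Toy combats: columns are (first Pokemon id, second Pokemon id, winner id).
combats_toy = np.array([[1, 2, 1],
                        [3, 4, 4],
                        [5, 6, 5]])

# 1 where the first Pokemon won, 0 where the second one did.
first_won = (combats_toy[:, 2] == combats_toy[:, 0]).astype(int)
```

Applied to the notebook's arrays, the equivalent expression would be `(Xtr_cb[:, 2] == Xtr_cb[:, 0]).astype(int)`.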


A direct way to describe the relationship between the two sides of a combat is to take the difference between their properties. The types of the two Pokemon cannot be subtracted in a meaningful way, so we keep each side's one-hot encoded type as a separate group of features and let the model learn from them.

In [36]:
def combat_features(first, second):
    # Pokemon ids are 1-based, so id k is stored in row k-1 of pk
    a, b = pk[first-1], pk[second-1]
    # Columns 1..onehot_len of each Pokemon, then the difference of the remaining columns
    return np.concatenate((a[1:onehot_len+1], b[1:onehot_len+1], a[onehot_len+1:]-b[onehot_len+1:]))

Xtr_new = np.zeros((Xtr_cb.shape[0], pk.shape[1]-1+onehot_len))
Xts_new = np.zeros((Xts_cb.shape[0], pk.shape[1]-1+onehot_len))
for i, pair in enumerate(Xtr_cb):
    Xtr_new[i] = combat_features(pair[0], pair[1])
for i, pair in enumerate(Xts_cb):
    Xts_new[i] = combat_features(pair[0], pair[1])
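The feature layout described above (both type encodings side by side, followed by the stat differences) can be checked on two toy Pokemon. This is an illustrative sketch only: `pair_features` is a hypothetical helper, and the three-column type encoding and the two stats are made up for the example.

```python
import numpy as np

def pair_features(type_a, type_b, stats_a, stats_b):
    """Concatenate both one-hot type vectors with the element-wise stat difference."""
    return np.concatenate((type_a, type_b, stats_a - stats_b))

# Toy Pokemon: a 3-dimensional one-hot type, then two stats (say Attack and Speed).
type_a, stats_a = np.array([1, 0, 0]), np.array([52.0, 65.0])
type_b, stats_b = np.array([0, 0, 1]), np.array([49.0, 45.0])

features_ab = pair_features(type_a, type_b, stats_a, stats_b)
features_ba = pair_features(type_b, type_a, stats_b, stats_a)
```

Swapping the two sides swaps the two type blocks and flips the sign of every stat difference, which is the asymmetry the classifier relies on to tell the first Pokemon from the second.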

We first applied LogisticRegression(), since this is a classic binary classification problem.

In [37]:
logreg = linear_model.LogisticRegression()
logreg.fit(Xtr_new, ytr)
yhat = logreg.predict(Xts_new)
acc = np.mean(yhat == yts)
print("Accuracy = {0:f}".format(acc))

Accuracy = 0.890089


The accuracy is 89%, which is acceptable but leaves room for improvement. Next, we tried a linear SVM.

In [8]:
svc = svm.SVC(probability = False, kernel = "linear")
svc.fit(Xtr_new, ytr.astype(float))
yhat = svc.predict(Xts_new)
acc = np.mean(yhat == yts)
print('Accuracy = {0:f}'.format(acc))

Accuracy = 0.915292


Accuracy rose by 2.5 percentage points to 91.5%, which is better than logistic regression.

Accuracy = 0.524052
