# **Winning the Premier League**
---
#### _Data analysis_
No exploratory analysis is needed here: the only information available before a match is played is the names of the two teams.

#### _Approach_
Every column except the two team names and the result is dropped. In production the model only knows which teams are playing, so those two names are the only inputs it can use to predict the winner of a match that has not been played yet.

### _Reading data and dropping unwanted columns_

In [170]:
import pandas as pd
df = pd.read_csv('premierleague.csv')
df = df[['Home Team', 'Away Team', 'Winner']]
df

| | Home Team | Away Team | Winner |
|---|---|---|---|
| 0 | MAN UTD | SWANSEA | Away |
| 1 | WEST BROM | SUNDERLAND | Draw |
| 2 | LEICESTER CITY | EVERTON | Draw |
| 3 | WEST HAM | TOTTENHAM | Away |
| 4 | QPR | HULL CITY | Away |
| ... | ... | ... | ... |
| 2655 | ARSENAL | BRIGHTON | Home |
| 2656 | SHEFFIELD UTD | BURNLEY | Home |
| 2657 | LEICESTER CITY | TOTTENHAM | Away |
| 2658 | WEST HAM | SOUTHAMPTON | Home |


### _Encoding teams and results with sklearn's LabelEncoder_

In [171]:
from sklearn.preprocessing import LabelEncoder
teamEncoder = LabelEncoder()
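# Fit on both columns so a team that only ever appears away is still known to the encoder
teamEncoder.fit(pd.concat([df['Home Team'], df['Away Team']]))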

LabelEncoder()

In [172]:
winnerEncoder = LabelEncoder()
winnerEncoder.fit(df['Winner'].values)

LabelEncoder()

In [173]:
df['Home Team'] = teamEncoder.transform(df['Home Team'])
df['Away Team'] = teamEncoder.transform(df['Away Team'])
df['Winner'] = winnerEncoder.transform(df['Winner'])
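
To make the encoding step concrete, here is a minimal pure-Python sketch of the mapping that `LabelEncoder` produces: each label gets the integer position it holds in the sorted list of unique labels. The helper names `fit_labels`, `transform` and `inverse_transform` and the toy fixtures are assumptions made for illustration; they are not part of the notebook or of sklearn.

```python
def fit_labels(values):
    """Return the sorted unique labels; a label's code is its index in this list."""
    return sorted(set(values))


def transform(classes, values):
    """Map each label to its integer code; an unseen label raises KeyError."""
    index = {label: code for code, label in enumerate(classes)}
    return [index[v] for v in values]


def inverse_transform(classes, codes):
    """Map integer codes back to their labels."""
    return [classes[c] for c in codes]


# Toy fixtures (hypothetical, not rows from premierleague.csv)
home = ["MAN UTD", "WEST BROM", "ARSENAL"]
away = ["SWANSEA", "ARSENAL", "MAN UTD"]

# Fit on the union of both columns so every team has a code
team_classes = fit_labels(home + away)
winner_classes = fit_labels(["Away", "Draw", "Home", "Away"])

home_codes = transform(team_classes, home)
winner_codes = transform(winner_classes, ["Home", "Away"])

print(team_classes)
print(home_codes, winner_codes)
```

Because the codes follow alphabetical order, the results always map to Away = 0, Draw = 1, Home = 2, which is the order of the three softmax outputs later on.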

### _Data processing_

In [174]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical

# Features: the two encoded team columns
x = df.drop('Winner', axis=1)

# Standardize both columns to zero mean and unit variance
sc = StandardScaler()
sc.fit(x.values)
x = pd.DataFrame(sc.transform(x.values))

# One-hot encode the three results (Away, Draw, Home)
y = df['Winner']
y = to_categorical(y)

# Hold out 20% of the matches for validation
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=32)
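
The two transformations in this cell can be sketched in plain Python to show what they do to the numbers. This is only an illustration under stated assumptions: `standardize` and `one_hot` are hypothetical helpers that mirror the arithmetic of `StandardScaler` (population standard deviation) and `to_categorical`, and the input codes are made up.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Subtract the mean and divide by the population standard deviation."""
    mu = fmean(column)
    sigma = pstdev(column, mu)
    return [(v - mu) / sigma for v in column]


def one_hot(codes, num_classes):
    """Turn each integer code into a row with a single 1.0 at that index."""
    return [[1.0 if i == c else 0.0 for i in range(num_classes)] for c in codes]


# Hypothetical encoded team column and result codes
home_codes = [0, 1, 2, 3]
scaled = standardize(home_codes)
targets = one_hot([2, 0, 1], num_classes=3)

print(scaled)
print(targets)
```

After scaling, the column has mean 0 and standard deviation 1, and each result becomes a three-element row that matches the three softmax outputs of the network.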




### _Network_

In [175]:
# Import from tensorflow.keras to match the to_categorical import above
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(8, activation='relu', input_shape=(2, )))
model.add(Dense(6, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

Model: "sequential_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_45 (Dense)            (None, 8)                 24        
                                                                 
 dense_46 (Dense)            (None, 6)                 54        
                                                                 
 dense_47 (Dense)            (None, 3)                 21        
                                                                 
Total params: 99
Trainable params: 99
Non-trainable params: 0
_________________________________________________________________
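
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A small sketch, where `dense_params` is a hypothetical helper name:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input width followed by the width of each Dense layer in the model
layer_sizes = [2, 8, 6, 3]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [24, 54, 21] 99
```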


In [176]:
# validation_data is documented as a tuple (x_val, y_val)
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test), batch_size=40)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x25d2e04e100>

### _Predicting a value_

In [177]:
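# Encode the fixture as [home, away], then apply the fitted scaler
xpred = teamEncoder.transform(['MAN CITY', 'SOUTHAMPTON'])
xpred = sc.transform([xpred])
# Pick the most probable class and map it back to its label
# (separate names so the training labels in y are not overwritten)
probs = model.predict(xpred)
pred = probs.argmax(axis=-1)
value = winnerEncoder.inverse_transform(pred)[0]
value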

'Home'
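
The last two steps of the prediction, `argmax` followed by `inverse_transform`, amount to picking the index of the largest softmax probability and looking up its label. A minimal sketch, assuming the class order Away, Draw, Home that the sorted labels give; `decode` is a hypothetical helper and the probabilities are invented, not model output:

```python
# Class order as produced by sorting the result labels
classes = ["Away", "Draw", "Home"]


def decode(probabilities, classes):
    """Return the label whose probability is largest."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return classes[best]


# Hypothetical softmax output for a single fixture
probs = [0.21, 0.24, 0.55]
label = decode(probs, classes)

print(label)
```

Keeping the full probability row around, rather than only the decoded label, also shows how confident the network is in its pick.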