# Mchezopesa Football Outcome Predictions

## Defining the Question

### a) Problem Statement

> Mchezopesa Limited, a company in the sports gambling world, wishes to build a model that can predict the result of any game between two teams, Team 1 and Team 2.

> There are two possible approaches to this problem (as shown below), given the datasets provided:

> **Approach 1: Polynomial approach** - What to train given:

            Rank of home team
            Rank of away team
            Tournament type

    Model 1: Predict how many goals the home team scores.

    Model 2: Predict how many goals the away team scores.

> **Approach 2: Logistic approach** - Feature Engineering 

    Figure out from the home team’s perspective if the game is a Win, Lose or Draw (W, L, D)
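
As a rough illustration of Approach 1 above, the sketch below fits one polynomial regression per target (home goals and away goals) on a handful of made-up rows. The toy values, the integer `tournament_code` encoding, the helper name `fit_goal_model`, and the choice of degree 2 are all assumptions for illustration; none of them come from the datasets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy rows: [home_rank, away_rank, tournament_code]
X = np.array([
    [1, 50, 0],
    [10, 5, 1],
    [80, 20, 0],
    [30, 30, 1],
    [120, 3, 0],
    [5, 150, 1],
    [60, 90, 0],
    [15, 40, 1],
])
home_goals = np.array([3, 1, 0, 1, 0, 4, 2, 2])  # made-up targets for Model 1
away_goals = np.array([0, 2, 2, 1, 3, 0, 1, 1])  # made-up targets for Model 2


def fit_goal_model(features, goals, degree=2):
    """Expand the features polynomially, then fit ordinary least squares."""
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(features, goals)
    return model


home_model = fit_goal_model(X, home_goals)
away_model = fit_goal_model(X, away_goals)

# Predicted goals for one hypothetical fixture
new_match = np.array([[20, 60, 0]])
predicted_home = home_model.predict(new_match)
predicted_away = away_model.predict(new_match)
```

With so few rows the fit is not meaningful; the point is only the shape of the two-model setup.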


### b) Success Metrics
* Accuracy score above 80% (logistic approach)
* Lowest achievable RMSE (polynomial approach)
* Identifying the best model
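
The two numeric metrics can be computed as in the minimal sketch below. The helper names `rmse` and `accuracy` and the toy values are assumptions for illustration; `sklearn.metrics` offers equivalent functions.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def accuracy(labels_true, labels_pred):
    """Fraction of labels predicted correctly."""
    return float(np.mean(np.asarray(labels_true) == np.asarray(labels_pred)))


# Made-up goal counts: squared errors are 1, 0, 0, 4
goals_true = [2, 0, 1, 3]
goals_pred = [1, 0, 1, 1]
score_rmse = rmse(goals_true, goals_pred)

# Made-up outcomes: 3 of 5 predictions are correct
outcomes_true = ['Win', 'Draw', 'Lose', 'Win', 'Win']
outcomes_pred = ['Win', 'Draw', 'Win', 'Win', 'Lose']
score_acc = accuracy(outcomes_true, outcomes_pred)

# Compare against the 80% accuracy target
meets_target = score_acc > 0.80
```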


### c) Understanding the context 

> After a long period of testing and analysing the best way to calculate the FIFA/Coca-Cola World Ranking, a new model took effect in August 2018 after approval by the FIFA Council.

> This new version developed by FIFA was named "SUM" as it relies on adding/subtracting points won or lost for a game to/from the previous point totals rather than averaging game points over a given time period as in the previous version of the World Ranking.

> The points which are added or subtracted are partially determined by the relative strength of the two opponents, including the logical expectation that teams higher in the ranking should fare better against teams lower in the ranking.

This shall be the basis of our analysis.
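
The add/subtract rule described above can be sketched as follows, using the formula FIFA published for this revision: P = P_before + I × (W − W_e), with W_e = 1 / (10^(−dr/600) + 1) and dr the difference in points before the match. The function names and the example numbers, including the importance value of 25, are assumptions for illustration and are not taken from the datasets.

```python
def expected_result(points_team, points_opponent):
    """Expected result W_e for a team, given both teams' points before the match."""
    dr = points_team - points_opponent
    return 1.0 / (10 ** (-dr / 600.0) + 1.0)


def updated_points(points_before, points_opponent, result, importance):
    """P = P_before + I * (W - W_e); result W is 1 (win), 0.5 (draw) or 0 (loss)."""
    w_e = expected_result(points_before, points_opponent)
    return points_before + importance * (result - w_e)


# Illustrative numbers only
favourite, underdog, importance = 1600.0, 1400.0, 25.0

# Points after a win for each side, computed independently
fav_after_win = updated_points(favourite, underdog, 1.0, importance)
dog_after_win = updated_points(underdog, favourite, 1.0, importance)

fav_gain = fav_after_win - favourite
dog_gain = dog_after_win - underdog
```

A win by the lower-ranked side earns more points than a win by the favourite, which is the "relative strength" effect the quoted text describes.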

### d) Recording the Experimental Design

* Cleaning the data
* Merging the datasets
* Performing any necessary feature engineering
* Checking for multicollinearity
* Building the model
* Cross-validating the model
* Computing RMSE
* Creating residual plots for the models and assessing heteroscedasticity using Bartlett’s test
* Conclusion
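
The Bartlett’s test step could look like the sketch below, which uses `scipy.stats.bartlett` on residuals grouped by fitted value. The synthetic residuals, the three-way split, and the 0.05 threshold are assumptions for illustration; a real run would use the residuals of the fitted models.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for a fitted model's predictions and residuals
fitted = rng.uniform(0, 4, size=200)
residuals = rng.normal(0, 1, size=200)

# Group the residuals into low, middle and high fitted-value bands
order = np.argsort(fitted)
low, mid, high = np.array_split(residuals[order], 3)

# Bartlett's test: the null hypothesis is equal variance across the groups
statistic, p_value = stats.bartlett(low, mid, high)

# A small p-value (for example below 0.05) would suggest heteroscedasticity
equal_variance_plausible = p_value >= 0.05
```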

### e) Data Relevance

For this study there are two datasets available:
* Team rankings provided by FIFA
* Previous match results

Together they describe each team in terms of its rank and its previous match results. Combining the two provides the input the model needs.

## Reading the Data and Loading Dependencies

In [2]:
# DEPENDENCIES

# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import datetime as dt

# ML Processes

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, f1_score, accuracy_score, confusion_matrix



In [3]:
# Loading data
team_ranking = pd.read_csv('/content/fifa_ranking.csv')
match_results = pd.read_csv('/content/results.csv')


## Checking the Data

FIFA Team Ranking Data

In [5]:
# No of records in our dataset
team_ranking.shape

(57793, 16)

In [6]:
# Previewing the top of our dataset
team_ranking.head()

Unnamed: 0,rank,country_full,country_abrv,total_points,previous_points,rank_change,cur_year_avg,cur_year_avg_weighted,last_year_avg,last_year_avg_weighted,two_year_ago_avg,two_year_ago_weighted,three_year_ago_avg,three_year_ago_weighted,confederation,rank_date
0,1,Germany,GER,0.0,57,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,UEFA,1993-08-08
1,2,Italy,ITA,0.0,57,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,UEFA,1993-08-08
2,3,Switzerland,SUI,0.0,50,9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,UEFA,1993-08-08
3,4,Sweden,SWE,0.0,55,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,UEFA,1993-08-08
4,5,Argentina,ARG,0.0,51,5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08


In [7]:
# Previewing the bottom of our dataset
team_ranking.tail()

Unnamed: 0,rank,country_full,country_abrv,total_points,previous_points,rank_change,cur_year_avg,cur_year_avg_weighted,last_year_avg,last_year_avg_weighted,two_year_ago_avg,two_year_ago_weighted,three_year_ago_avg,three_year_ago_weighted,confederation,rank_date
57788,206,Anguilla,AIA,0.0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONCACAF,2018-06-07
57789,206,Bahamas,BAH,0.0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONCACAF,2018-06-07
57790,206,Eritrea,ERI,0.0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CAF,2018-06-07
57791,206,Somalia,SOM,0.0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CAF,2018-06-07
57792,206,Tonga,TGA,0.0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,OFC,2018-06-07


In [9]:
# Checking whether each column has an appropriate datatype
team_ranking.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57793 entries, 0 to 57792
Data columns (total 16 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   rank                     57793 non-null  int64  
 1   country_full             57793 non-null  object 
 2   country_abrv             57793 non-null  object 
 3   total_points             57793 non-null  float64
 4   previous_points          57793 non-null  int64  
 5   rank_change              57793 non-null  int64  
 6   cur_year_avg             57793 non-null  float64
 7   cur_year_avg_weighted    57793 non-null  float64
 8   last_year_avg            57793 non-null  float64
 9   last_year_avg_weighted   57793 non-null  float64
 10  two_year_ago_avg         57793 non-null  float64
 11  two_year_ago_weighted    57793 non-null  float64
 12  three_year_ago_avg       57793 non-null  float64
 13  three_year_ago_weighted  57793 non-null  float64
 14  confederation         

In [10]:
# Further statistical description of the data
team_ranking.describe()

Unnamed: 0,rank,total_points,previous_points,rank_change,cur_year_avg,cur_year_avg_weighted,last_year_avg,last_year_avg_weighted,two_year_ago_avg,two_year_ago_weighted,three_year_ago_avg,three_year_ago_weighted
count,57793.0,57793.0,57793.0,57793.0,57793.0,57793.0,57793.0,57793.0,57793.0,57793.0,57793.0,57793.0
mean,101.628086,122.068637,332.302926,-0.009897,61.798602,61.798602,61.004602,30.502377,59.777462,17.933277,59.173916,11.834811
std,58.618424,260.426863,302.872948,5.804309,138.014883,138.014883,137.688204,68.844143,136.296079,40.888849,135.533343,27.106675
min,1.0,0.0,0.0,-72.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,51.0,0.0,56.0,-2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,101.0,0.0,272.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,152.0,92.79,525.0,1.0,32.25,32.25,26.66,13.33,21.5,6.45,21.25,4.25
max,209.0,1775.03,1920.0,92.0,1158.66,1158.66,1169.57,584.79,1159.71,347.91,1200.77,240.15


Match Result Data

In [11]:
# No of records in our dataset
match_results.shape

(40839, 9)

In [12]:
# Previewing the top of our dataset
match_results.head()

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
0,1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1,1873-03-08,England,Scotland,4,2,Friendly,London,England,False
2,1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False
3,1875-03-06,England,Scotland,2,2,Friendly,London,England,False
4,1876-03-04,Scotland,England,3,0,Friendly,Glasgow,Scotland,False


In [13]:
# Previewing the bottom of our dataset
match_results.tail()

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
40834,2019-07-18,American Samoa,Tahiti,8,1,Pacific Games,Apia,Samoa,True
40835,2019-07-18,Fiji,Solomon Islands,4,4,Pacific Games,Apia,Samoa,True
40836,2019-07-19,Senegal,Algeria,0,1,African Cup of Nations,Cairo,Egypt,True
40837,2019-07-19,Tajikistan,North Korea,0,1,Intercontinental Cup,Ahmedabad,India,True
40838,2019-07-20,Papua New Guinea,Fiji,1,1,Pacific Games,Apia,Samoa,True


In [14]:
# Checking whether each column has an appropriate datatype
match_results.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40839 entries, 0 to 40838
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   date        40839 non-null  object
 1   home_team   40839 non-null  object
 2   away_team   40839 non-null  object
 3   home_score  40839 non-null  int64 
 4   away_score  40839 non-null  int64 
 5   tournament  40839 non-null  object
 6   city        40839 non-null  object
 7   country     40839 non-null  object
 8   neutral     40839 non-null  bool  
dtypes: bool(1), int64(2), object(6)
memory usage: 2.5+ MB


In [15]:
# Further statistical description of the data
match_results.describe()

Unnamed: 0,home_score,away_score
count,40839.0,40839.0
mean,1.745709,1.188105
std,1.749145,1.40512
min,0.0,0.0
25%,1.0,0.0
50%,1.0,1.0
75%,2.0,2.0
max,31.0,21.0


## Tidying the Dataset

In [16]:
# Checking for duplicate values
def duplicated_data_check(data):
  return data.duplicated().any()

print('Data 1: ', duplicated_data_check(team_ranking))
print('Data 2: ', duplicated_data_check(match_results))

Data 1:  True
Data 2:  False


* The team_ranking dataset contains duplicate rows.

In [17]:
# Removing the duplicates from the team_ranking dataset
team_ranking.drop_duplicates(keep='first', inplace = True)

In [19]:
# Checking for missing values
def missing_data_check(data):
  # Returns the number of columns that contain at least one missing value
  return data.isnull().any().sum()

print('Data 1: ', missing_data_check(team_ranking))
print('Data 2: ', missing_data_check(match_results))

Data 1:  0
Data 2:  0


In [50]:
# Creating a new column of Win, Lose or Draw (from the home team's perspective) for the match_results data

conditions = [match_results['home_score'] > match_results['away_score'],
              match_results['home_score'] < match_results['away_score']]

# Rows matching neither condition (equal scores) fall through to 'Draw'
match_results['class'] = np.select(conditions, ['Win', 'Lose'], default='Draw')

In [20]:
# Checking the columns of both datasets
print('Data 1: ', team_ranking.columns)
print('\n')
print('Data 2', match_results.columns)

Data 1:  Index(['rank', 'country_full', 'country_abrv', 'total_points',
       'previous_points', 'rank_change', 'cur_year_avg',
       'cur_year_avg_weighted', 'last_year_avg', 'last_year_avg_weighted',
       'two_year_ago_avg', 'two_year_ago_weighted', 'three_year_ago_avg',
       'three_year_ago_weighted', 'confederation', 'rank_date'],
      dtype='object')


Data 2 Index(['date', 'home_team', 'away_team', 'home_score', 'away_score',
       'tournament', 'city', 'country', 'neutral'],
      dtype='object')


* When combining the two datasets, each team's rank should be taken from the ranking period that corresponds to the match date.
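
One way to take each rank from the latest ranking published on or before the match date is `pandas.merge_asof`. This is an alternative sketch on made-up rows, not the merge used in this notebook, which matches on year and month further below. The toy frames `rankings` and `matches` and all rank values are assumptions.

```python
import pandas as pd

# Toy stand-ins for the two datasets (values are made up)
rankings = pd.DataFrame({
    'country_full': ['Kenya', 'Kenya', 'Uganda', 'Uganda'],
    'rank_date': pd.to_datetime(['2018-04-12', '2018-05-17',
                                 '2018-04-12', '2018-05-17']),
    'rank': [113, 111, 74, 82],
})
matches = pd.DataFrame({
    'date': pd.to_datetime(['2018-05-01', '2018-05-20']),
    'home_team': ['Kenya', 'Uganda'],
    'away_team': ['Uganda', 'Kenya'],
})

# merge_asof requires both frames to be sorted by their time key
rankings = rankings.sort_values('rank_date')
matches = matches.sort_values('date')

# For each match, attach the home team's most recent ranking
home = pd.merge_asof(
    matches, rankings,
    left_on='date', right_on='rank_date',
    left_by='home_team', right_by='country_full',
    direction='backward',  # latest ranking on or before the match date
)
```

The same call with `left_by='away_team'` would attach the away team's rank.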

In [22]:
# Renaming the rank_date column in the team_ranking dataset to match the match_results dataset
#
team_ranking = team_ranking.rename(columns={'rank_date': 'date'})
team_ranking.columns

Index(['rank', 'country_full', 'country_abrv', 'total_points',
       'previous_points', 'rank_change', 'cur_year_avg',
       'cur_year_avg_weighted', 'last_year_avg', 'last_year_avg_weighted',
       'two_year_ago_avg', 'two_year_ago_weighted', 'three_year_ago_avg',
       'three_year_ago_weighted', 'confederation', 'date'],
      dtype='object')

In [25]:
# Comparing date formats
print(list(team_ranking.date[:10]))
print('')
print(list(match_results.date[:10]))

['1993-08-08', '1993-08-08', '1993-08-08', '1993-08-08', '1993-08-08', '1993-08-08', '1993-08-08', '1993-08-08', '1993-08-08', '1993-08-08']

['1872-11-30', '1873-03-08', '1874-03-07', '1875-03-06', '1876-03-04', '1876-03-25', '1877-03-03', '1877-03-05', '1878-03-02', '1878-03-23']


In [35]:
# Replacing any / separators with - so both date columns share one format
team_ranking.date = team_ranking.date.apply(lambda x: x.replace('/', '-'))
match_results.date = match_results.date.apply(lambda x: x.replace('/', '-'))

In [37]:
# Converting the two date columns into datetime variables
team_ranking.date = pd.to_datetime(team_ranking.date)
match_results.date = pd.to_datetime(match_results.date)

In [42]:
# Checking the date range for the match_results data
match_results.date.describe()

count                   40839
unique                  15115
top       2012-02-29 00:00:00
freq                       66
first     1872-11-30 00:00:00
last      2019-07-20 00:00:00
Name: date, dtype: object

In [43]:
# Checking the date range for the team_ranking data
team_ranking.date.describe()

count                   57756
unique                    286
top       2017-04-06 00:00:00
freq                      211
first     1993-08-08 00:00:00
last      2018-06-07 00:00:00
Name: date, dtype: object

* One dataset covers 1872 to 2019, while the other covers 1993 to 2018.
* Combining the two over the full period would generate a lot of missing data for the machine learning model, so we can only work with the period for which both are available.
* To have complete data, we restrict the match_results data to the period 1993 to 2018; this is applied later.
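
One explicit way to restrict the match data to the ranking period is a date filter, sketched below on made-up rows. The frame `toy_matches` is hypothetical; the two boundary dates are the first and last ranking dates shown in the output above.

```python
import pandas as pd

# Toy stand-in for match_results (made-up rows)
toy_matches = pd.DataFrame({
    'date': pd.to_datetime(['1872-11-30', '1993-08-08', '2005-06-01',
                            '2018-06-07', '2019-07-20']),
    'home_score': [0, 1, 2, 3, 1],
})

# First and last ranking dates from the team_ranking data
start, end = pd.Timestamp('1993-08-08'), pd.Timestamp('2018-06-07')

# between() is inclusive on both ends by default
in_range = toy_matches['date'].between(start, end)
filtered = toy_matches[in_range].reset_index(drop=True)
```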

In [57]:
# Creating year and month columns for each of the two datasets above

# team_ranking data 'year' and 'month' columns
team_ranking['year'] = team_ranking.date.dt.year
team_ranking['month'] = team_ranking.date.dt.month

# match_results data 'year' and 'month' columns
match_results['year'] = match_results.date.dt.year
match_results['month'] = match_results.date.dt.month

In [60]:
# Creating home_team information DataFrame
home_team = pd.merge(match_results, team_ranking, left_on=['home_team', 'year', 'month'], 
                     right_on = ['country_full', 'year', 'month'], how = 'inner')

home_team.shape

(18593, 28)

In [61]:
# Creating away_team information DataFrame
away_team = pd.merge(match_results, team_ranking, left_on=['away_team', 'year', 'month'], 
                     right_on = ['country_full', 'year', 'month'], how = 'inner')

away_team.shape

(18502, 28)
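
The inner merges above silently drop every match without a ranking row for the same team, year and month. To see which rows would be dropped, a left merge with `indicator=True` can be used, as in this sketch on made-up rows. The toy frames and rank values are assumptions.

```python
import pandas as pd

# Made-up matches; 'Zanzibar' has no ranking row in the toy data
toy_matches = pd.DataFrame({
    'home_team': ['Kenya', 'Uganda', 'Zanzibar'],
    'year': [2018, 2018, 2018],
    'month': [5, 5, 5],
})
toy_rankings = pd.DataFrame({
    'country_full': ['Kenya', 'Uganda'],
    'year': [2018, 2018],
    'month': [5, 5],
    'rank': [111, 82],
})

# A left merge keeps every match; '_merge' records where each row came from
checked = pd.merge(
    toy_matches, toy_rankings,
    left_on=['home_team', 'year', 'month'],
    right_on=['country_full', 'year', 'month'],
    how='left', indicator=True,
)

# Matches with no ranking row for that team, year and month
unmatched = checked[checked['_merge'] == 'left_only']
```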

In [62]:
# away_team columns
away_team.columns

Index(['date_x', 'home_team', 'away_team', 'home_score', 'away_score',
       'tournament', 'city', 'country', 'neutral', 'class', 'year', 'month',
       'rank', 'country_full', 'country_abrv', 'total_points',
       'previous_points', 'rank_change', 'cur_year_avg',
       'cur_year_avg_weighted', 'last_year_avg', 'last_year_avg_weighted',
       'two_year_ago_avg', 'two_year_ago_weighted', 'three_year_ago_avg',
       'three_year_ago_weighted', 'confederation', 'date_y'],
      dtype='object')

In [69]:
# Renaming some of the away_team dataset columns
# The 'at_' prefix denotes the away team
away_team = away_team.rename(columns = {'rank':'at_rank', 'total_points':'at_total_points', 'previous_points':'at_previous_points',
                                        'rank_change':'at_rank_change','cur_year_avg':'at_cur_year_avg', 'cur_year_avg_weighted':'at_cur_year_avg_weighted',
                                        'last_year_avg':'at_last_year_avg','last_year_avg_weighted':'at_last_year_avg_weighted',
                                        'two_year_ago_avg':'at_two_year_ago_avg', 'two_year_ago_weighted':'at_two_year_ago_weighted',
                                        'three_year_ago_avg':'at_three_year_ago_avg', 'three_year_ago_weighted':'at_three_year_ago_weighted',
                                        'confederation':'at_confederation'}, inplace = False)

In [79]:
# Previewing the away team data
away_team.head()

Unnamed: 0,date_x,home_team,away_team,home_score,away_score,tournament,city,country,neutral,class,year,month,at_rank,country_full,country_abrv,at_total_points,at_previous_points,at_rank_change,at_cur_year_avg,at_cur_year_avg_weighted,at_last_year_avg,at_last_year_avg_weighted,at_two_year_ago_avg,at_two_year_ago_weighted,at_three_year_ago_avg,at_three_year_ago_weighted,at_confederation,date_y
0,1993-08-01,Colombia,Paraguay,0,0,FIFA World Cup qualification,Barranquilla,Colombia,False,Draw,1993,8,67,Paraguay,PAR,0.0,22,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08
1,1993-08-29,Argentina,Paraguay,0,0,FIFA World Cup qualification,Buenos Aires,Argentina,False,Draw,1993,8,67,Paraguay,PAR,0.0,22,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08
2,1993-08-01,Peru,Argentina,0,1,FIFA World Cup qualification,Lima,Peru,False,Lose,1993,8,5,Argentina,ARG,0.0,51,5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08
3,1993-08-08,Paraguay,Argentina,1,3,FIFA World Cup qualification,Asunción,Paraguay,False,Lose,1993,8,5,Argentina,ARG,0.0,51,5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08
4,1993-08-15,Colombia,Argentina,2,1,FIFA World Cup qualification,Barranquilla,Colombia,False,Win,1993,8,5,Argentina,ARG,0.0,51,5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08


In [70]:
# Away team new columns
away_team.columns

Index(['date_x', 'home_team', 'away_team', 'home_score', 'away_score',
       'tournament', 'city', 'country', 'neutral', 'class', 'year', 'month',
       'at_rank', 'country_full', 'country_abrv', 'at_total_points',
       'at_previous_points', 'at_rank_change', 'at_cur_year_avg',
       'at_cur_year_avg_weighted', 'at_last_year_avg',
       'at_last_year_avg_weighted', 'at_two_year_ago_avg',
       'at_two_year_ago_weighted', 'at_three_year_ago_avg',
       'at_three_year_ago_weighted', 'at_confederation', 'date_y'],
      dtype='object')

In [71]:
# Home team columns
home_team.columns

Index(['date_x', 'home_team', 'away_team', 'home_score', 'away_score',
       'tournament', 'city', 'country', 'neutral', 'class', 'year', 'month',
       'rank', 'country_full', 'country_abrv', 'total_points',
       'previous_points', 'rank_change', 'cur_year_avg',
       'cur_year_avg_weighted', 'last_year_avg', 'last_year_avg_weighted',
       'two_year_ago_avg', 'two_year_ago_weighted', 'three_year_ago_avg',
       'three_year_ago_weighted', 'confederation', 'date_y'],
      dtype='object')

In [72]:
# Combining the final DataFrame
match_keys = ['date_x', 'home_team', 'away_team', 'home_score', 'away_score', 'tournament',
              'city', 'country', 'neutral', 'class', 'year', 'month']
final_df = pd.merge(home_team, away_team, on=match_keys, how='inner')

In [74]:
final_df.head()

Unnamed: 0,date_x,home_team,away_team,home_score,away_score,tournament,city,country,neutral,class,year,month,rank,country_full_x,country_abrv_x,total_points,previous_points,rank_change,cur_year_avg,cur_year_avg_weighted,last_year_avg,last_year_avg_weighted,two_year_ago_avg,two_year_ago_weighted,three_year_ago_avg,three_year_ago_weighted,confederation,date_y_x,at_rank,country_full_y,country_abrv_y,at_total_points,at_previous_points,at_rank_change,at_cur_year_avg,at_cur_year_avg_weighted,at_last_year_avg,at_last_year_avg_weighted,at_two_year_ago_avg,at_two_year_ago_weighted,at_three_year_ago_avg,at_three_year_ago_weighted,at_confederation,date_y_y
0,1993-08-01,Colombia,Paraguay,0,0,FIFA World Cup qualification,Barranquilla,Colombia,False,Draw,1993,8,19,Colombia,COL,0.0,36,16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08,67,Paraguay,PAR,0.0,22,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08
1,1993-08-15,Colombia,Argentina,2,1,FIFA World Cup qualification,Barranquilla,Colombia,False,Win,1993,8,19,Colombia,COL,0.0,36,16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08,5,Argentina,ARG,0.0,51,5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08
2,1993-08-29,Colombia,Peru,4,0,FIFA World Cup qualification,Barranquilla,Colombia,False,Win,1993,8,19,Colombia,COL,0.0,36,16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08,70,Peru,PER,0.0,16,8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08
3,1993-08-01,Peru,Argentina,0,1,FIFA World Cup qualification,Lima,Peru,False,Lose,1993,8,70,Peru,PER,0.0,16,8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08,5,Argentina,ARG,0.0,51,5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08
4,1993-08-08,Peru,Colombia,0,1,FIFA World Cup qualification,Lima,Peru,False,Lose,1993,8,70,Peru,PER,0.0,16,8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08,19,Colombia,COL,0.0,36,16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,CONMEBOL,1993-08-08


* This final_df has data on both the home team and the away team, i.e.

> Rank, Country Abbreviation, Total Points, Previous Points, Rank Change, Average Previous Years Points, Average Previous Years Points Weighted (50%), Average 2 Years Ago Points, Average 2 Years Ago Points Weighted (30%), Average 3 Years Ago Points, Average 3 Years Ago Points Weighted (20%), Confederation
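
The percentages in this list can be read as weights applied to each yearly average. The sketch below applies them to made-up averages for two hypothetical teams. The means in the `describe()` output earlier are consistent with these weights (for example, 30.50 is half of 61.00), and the four weighted means sum to roughly the `total_points` mean of 122.07. Treating `total_points` as that weighted sum is an inference, not something the source states.

```python
import pandas as pd

# Made-up yearly averages for two hypothetical teams
toy = pd.DataFrame({
    'cur_year_avg': [400.0, 120.0],
    'last_year_avg': [300.0, 80.0],
    'two_year_ago_avg': [200.0, 50.0],
    'three_year_ago_avg': [100.0, 10.0],
})

# Weights assumed from the column description: 100%, 50%, 30%, 20%
weights = pd.Series({
    'cur_year_avg': 1.0,
    'last_year_avg': 0.5,
    'two_year_ago_avg': 0.3,
    'three_year_ago_avg': 0.2,
})

# Multiplying a DataFrame by a Series aligns on column names
weighted = toy * weights

# One weighted sum per team
weighted_total = weighted.sum(axis=1)
```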

In [77]:
# Dropping unnecessary columns as a result of merging
final_df = final_df.drop(['year','month','country_full_x','country_abrv_x','date_y_x','country_full_y',
                          'country_abrv_y','at_confederation', 'date_y_y'], axis=1)

In [78]:
# shape of our final df
final_df.shape

(16918, 35)

In [51]:
# Viewing the new 'class' column
match_results.head()

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral,class
0,1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False,Draw
1,1873-03-08,England,Scotland,4,2,Friendly,London,England,False,Win
2,1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False,Win
3,1875-03-06,England,Scotland,2,2,Friendly,London,England,False,Draw
4,1876-03-04,Scotland,England,3,0,Friendly,Glasgow,Scotland,False,Win


## Exploratory Analysis

In [None]:
# Plotting the univariate summaries and recording our observations
#

## Implementing the Solution

In [None]:
# Implementing the Solution
# 

## Challenging the Solution

> The easy solution is nice because it is, well, easy, but you should never allow those results to hold the day. You should always be thinking of ways to challenge the results, especially if those results comport with your prior expectation.






In [None]:
# Reviewing the Solution 
#