BestClassifier.pkl
BestClassifier.py
Classifier.py
Classifier_aditya.py
Classifier_top_features.py
LR_team_win_stats.png
README
RF_team_win_stats.png
Results
SVM_team_win_stats.png
TestBestClassifier.py
checkclass.py
confusion_matrix_RF.png
confusion_matrix_linearSVC.png
confusion_matrix_logreg.png
confusion_matrix_svm.png
corr.py
corr.txt
correlation.txt
correlation_big.png
correlation_small.png
dataframe.p
dataframeTestData.p
epl0001.csv
epl0102.csv
epl0203.csv
epl0304.csv
epl0405.csv
epl0506.csv
epl0607.csv
epl0708.csv
epl0809.csv
epl0910.csv
epl1011.csv
epl1112.csv
epl1213.csv
epl1314.csv
epl1415.csv
epl1516.csv
epl1617.csv
epl9697.csv
epl9798.csv
epl9899.csv
epl9900.csv
epl_08-09.txt
epl_09-10.txt
epl_10-11.txt
epl_11-12.txt
epl_12-13.txt
epl_13-14.txt
epl_14-15.txt
epl_15-16.txt
featureCalculation.py
feature_big.png
feature_selection.py
feature_small.png
features.txt
featurextraction.py
figure_linsvc_1.png
figure_linsvc_2.png
figure_lr_1.png
figure_lr_2.png
figure_rf_1.png
figure_rf_2.png
figure_svm_1.png
figure_svm_2.png
linearSVC_team_win_stats.png
notes
players.csv
plots.py
premier-league-2012.csv
premier-league-2013.csv
premier-league-2014.csv
premier-league-2015.csv
premier-league-2016.csv
premier-league-2017.csv
team_stats.py
testFeatureXtraction.py
topnteams.txt

README

BestClassifier.py is the main file. It contains the best-performing classifier, tuned using hyperparameter optimization. The classifier object is stored in BestClassifier.pkl and can be loaded as follows:
####################################################################################################################
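# sklearn.externals.joblib was removed in scikit-learn 0.23; fall back to it only on older releases
try:
    import joblib
except ImportError:
    from sklearn.externals import joblib

classifier_object = joblib.load('BestClassifier.pkl')
# The loaded classifier can now be used on any test data as follows:
classifier_object.predict(X_test_data)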
####################################################################################################################

For convenience, we also provide dataframeTestData.p, a pickle file containing the pandas dataframe of the test data we used. It can be processed to create X_test_data and X_test_target as follows:

####################################################################################################################
import pandas as pd

dataframe1 = pd.read_pickle('dataframeTestData.p')
X_test_data = []
X_test_target = []
hometeam = list(dataframe1['HomeTeam'])
awayteam = list(dataframe1['AwayTeam'])
rating_home = list(dataframe1['rating_home'])
rating_away = list(dataframe1['rating_away'])
_5HW = list(dataframe1['5HW'])
_5HL = list(dataframe1['5HL'])
_5HD = list(dataframe1['5HD'])
_5AW = list(dataframe1['5AW'])
_5AL = list(dataframe1['5AL'])
_5AD = list(dataframe1['5AD'])
_5FTHGS = list(dataframe1['5FTHGS'])
_5FTHGC = list(dataframe1['5FTHGC'])
_5FTAGS = list(dataframe1['5FTAGS'])
_5FTAGC = list(dataframe1['5FTAGC'])
_5HST = list(dataframe1['5HST'])
_5AST = list(dataframe1['5AST'])
_5HC = list(dataframe1['5HC'])
_5AC = list(dataframe1['5AC'])
_5HR = list(dataframe1['5HR'])
_5AR = list(dataframe1['5AR'])
_H2H5FTHGS = list(dataframe1['H2H5FTHGS'])
_H2H5FTAGS = list(dataframe1['H2H5FTAGS'])
_H2H5FTHGC = list(dataframe1['H2H5FTHGC'])
_H2H5FTAGC = list(dataframe1['H2H5FTAGC'])
_H2H5HST = list(dataframe1['H2H5HST'])
_H2H5AST = list(dataframe1['H2H5AST'])
_H2H5HC = list(dataframe1['H2H5HC'])
_H2H5AC = list(dataframe1['H2H5AC'])
_H2H5HR = list(dataframe1['H2H5HR'])
_H2H5AR = list(dataframe1['H2H5AR'])
teams = list(set(dataframe1['HomeTeam']) | set(dataframe1['AwayTeam']))
for item in list(dataframe1['FTR']):
	if item == 'H': #home
		X_test_target.append(1)
	elif item == 'A': #away
		X_test_target.append(2)
	else: #draw
		X_test_target.append(3)


for index in range(len(dataframe1)):
	X_test_data.append([rating_home[index],rating_away[index],_5HW[index],_5HL[index],_5HD[index],_5AW[index],_5AL[index],_5AD[index],_5FTHGS[index],_5FTHGC[index],_5FTAGS[index],_5FTAGC[index],_5HST[index],_5AST[index],_5HC[index],_5AC[index],_5HR[index],_5AR[index],_H2H5FTHGS[index],_H2H5FTAGS[index],_H2H5FTHGC[index],_H2H5FTAGC[index],_H2H5HST[index],_H2H5AST[index],_H2H5HC[index],_H2H5AC[index],_H2H5HR[index],_H2H5AR[index]])
####################################################################################################################
The above code can be found in TestBestClassifier.py, which can be run with "python3 TestBestClassifier.py" to obtain the accuracy score of the classifier on the test data.
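
The accuracy score is the fraction of matches whose predicted label equals the true label. The following is a minimal sketch in plain Python, assuming the 1/2/3 label encoding shown above. The function names encode_result and accuracy and the sample predictions are hypothetical and are not taken from TestBestClassifier.py.

```python
def encode_result(ftr):
    """Map a full-time result code to the integer label used above."""
    if ftr == 'H':    # home win
        return 1
    if ftr == 'A':    # away win
        return 2
    return 3          # anything else is treated as a draw


def accuracy(y_true, y_pred):
    """Return the fraction of positions where prediction and target agree."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical example: four matches and made-up predictions
results = ['H', 'A', 'D', 'H']
y_true = [encode_result(r) for r in results]   # [1, 2, 3, 1]
y_pred = [1, 2, 1, 1]                          # the draw is mispredicted
print(accuracy(y_true, y_pred))                # 0.75
```

With a loaded classifier, y_pred would instead come from classifier_object.predict(X_test_data) and y_true from X_test_target.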


Classifier.py takes as input the pickled dataframe objects of the training set and the test set, processes them, and fits the data to each classifier. Each classifier is passed to GridSearchCV together with a range of hyperparameter values, and GridSearchCV returns the best classifier found among those values. It can be run with:
"python3 Classifier.py".
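
The GridSearchCV step described above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (28 feature columns and 3 classes, mirroring the shape of the feature vectors above), and the estimator and the parameter grid are hypothetical choices, not the ones used in Classifier.py.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real training data (assumption):
# 28 features per match, 3 target classes (home win, away win, draw).
X, y = make_classification(
    n_samples=300,
    n_features=28,
    n_informative=6,
    n_classes=3,
    random_state=0,
)

# Hypothetical grid: candidate values for the regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# GridSearchCV fits the estimator once per parameter combination and fold,
# then refits the best combination on the full training data.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_classifier = search.best_estimator_
print(search.best_params_)
print(round(search.best_score_, 3))
```

The refitted best_classifier is an ordinary estimator, so it can be saved with joblib.dump and later reloaded for prediction.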