Predict who will win the FIFA World Cup 2018

Project Description

Objective:

  • Predict the outcome of international matches, either as "Win / Lose / Draw" or as the goal difference.
  • Apply the model to predict the results of the FIFA World Cup 2018.
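Both prediction targets can be derived from a match's final score. A minimal sketch, assuming each match is stored with the goals scored by team_1 and team_2. The `Match` record and both function names are hypothetical and are not taken from this repository:

```python
from typing import NamedTuple


class Match(NamedTuple):
    """Hypothetical record of one international match."""
    team_1: str
    team_2: str
    goals_1: int
    goals_2: int


def result_label(match: Match) -> str:
    """W/D/L target: "win", "draw" or "lose" from team_1's point of view."""
    if match.goals_1 > match.goals_2:
        return "win"
    if match.goals_1 < match.goals_2:
        return "lose"
    return "draw"


def goal_difference(match: Match) -> int:
    """Goal-difference target: team_1's goals minus team_2's goals."""
    return match.goals_1 - match.goals_2


if __name__ == "__main__":
    m = Match("US", "NZ", 2, 1)
    print(result_label(m), goal_difference(m))  # win 1
```

Both targets are computed from team_1's side, which matches the "team1 - team2" convention used for the features.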

Data: The data are assembled from multiple sources. Most come from Kaggle; the others come from the FIFA website and EA's FIFA video games.

Feature Engineering: To determine which team is more likely to win a match, I came up with 4 main groups of features, based on my domain knowledge:

  1. head-to-head match history between the 2 teams
  2. recent performance of each team over its 10 most recent matches, aka "form"
  3. betting odds before the match
  4. squad strength (from the FIFA video game)

The feature list reflects these factors.
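The head-to-head and form groups can both be computed from the historical results table. A minimal sketch, assuming matches are stored as (date, home, away, home goals, away goals) rows, that only matches played before the target date count, that head-to-head uses every earlier meeting, and that form features are totals (not averages) over the last 10 games. The `Game` record and all function names are hypothetical:

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple, Tuple


class Game(NamedTuple):
    """Hypothetical row of the historical results table."""
    day: date
    home: str
    away: str
    home_goals: int
    away_goals: int


def _goals(game: Game, team: str) -> Tuple[int, int]:
    """Return (goals scored, goals conceded) by `team` in `game`."""
    if game.home == team:
        return game.home_goals, game.away_goals
    return game.away_goals, game.home_goals


def head_to_head(history: Iterable[Game], t1: str, t2: str, before: date) -> Dict[str, int]:
    """h_win_diff and h_draw over all meetings of t1 and t2 played before `before`."""
    wins_1 = wins_2 = draws = 0
    for g in history:
        if g.day >= before or {g.home, g.away} != {t1, t2}:
            continue
        scored, conceded = _goals(g, t1)  # venue-independent, from t1's side
        if scored > conceded:
            wins_1 += 1
        elif scored < conceded:
            wins_2 += 1
        else:
            draws += 1
    return {"h_win_diff": wins_1 - wins_2, "h_draw": draws}


def form(history: Iterable[Game], team: str, before: date, n: int = 10) -> Dict[str, int]:
    """Totals over `team`'s n most recent games played before `before`."""
    recent = sorted(
        (g for g in history if g.day < before and team in (g.home, g.away)),
        key=lambda g: g.day,
    )[-n:]
    stats = {"goalF": 0, "goalA": 0, "win": 0, "draw": 0}
    for g in recent:
        scored, conceded = _goals(g, team)
        stats["goalF"] += scored
        stats["goalA"] += conceded
        stats["win"] += int(scored > conceded)
        stats["draw"] += int(scored == conceded)
    return stats


def form_diff(history: Iterable[Game], t1: str, t2: str, before: date, n: int = 10) -> Dict[str, int]:
    """form_diff_* features: team_1's form totals minus team_2's."""
    history = list(history)  # iterated twice below
    f1 = form(history, t1, before, n)
    f2 = form(history, t2, before, n)
    return {f"form_diff_{key}": f1[key] - f2[key] for key in f1}
```

Restricting every lookup to games before the match date keeps the features free of information from the match being predicted.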

Lifecycle

Report

Check the Full Report for more insight into this project. The report contains:

  • Exploratory Data Analysis: investigating correlations, the importance of each feature to the results, and interesting hypotheses
  • Methodology: how I carried out this project and which experiments I ran
  • Models: baseline model, logistic regression, random forest, gradient boosting tree, AdaBoost tree, neural network
  • Evaluation Criteria: F1 score and 10-fold cross-validation accuracy
  • Results and Conclusion
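To make the evaluation criteria concrete, the sketch below computes mean 10-fold cross-validation accuracy and F1 for a majority-class baseline using only the standard library. It assumes macro-averaged F1 (the report's averaging choice is not stated here) and a hypothetical `fit_predict(X_train, y_train, X_test)` interface; in practice scikit-learn's `cross_val_score` and `f1_score` cover the same ground:

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: fit_predict(X_train, y_train, X_test) -> predicted labels
FitPredict = Callable[[Sequence, Sequence[str], Sequence], List[str]]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def cross_validate(X: Sequence, y: Sequence[str], fit_predict: FitPredict,
                   k: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Mean accuracy and mean macro-F1 over k shuffled folds."""
    if len(y) < k:
        raise ValueError("need at least k samples")
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint, near-equal folds
    accuracies, f1s = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        truth = [y[i] for i in test_idx]
        accuracies.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
        f1s.append(macro_f1(truth, preds))
    return sum(accuracies) / k, sum(f1s) / k


def majority_baseline(X_train, y_train, X_test):
    """Predict the most frequent training label for every test match."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)
```

Any classifier wrapped in the same `fit_predict` shape can be scored this way and compared against the baseline.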

Project Structure

  1. EDA: Exploratory Data Analysis
  2. LE: saved Label Encoder models
  3. data: the completed dataset
  4. save_model: Machine Learning models saved after training
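A label encoder maps each categorical value (for example a nation code or a match result) to an integer, and saving the fitted encoder lets later runs decode predictions consistently. The sketch below is a standard-library stand-in; whether the project used scikit-learn's `LabelEncoder` (which exposes the same `fit` / `transform` / `inverse_transform` / `classes_` interface) and which columns it encoded are assumptions. `SimpleLabelEncoder` is a hypothetical name:

```python
import pickle
from typing import Iterable, List


class SimpleLabelEncoder:
    """Maps each distinct label to its index in the sorted list of labels."""

    def fit(self, labels: Iterable[str]) -> "SimpleLabelEncoder":
        self.classes_: List[str] = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels: Iterable[str]) -> List[int]:
        return [self._index[label] for label in labels]

    def inverse_transform(self, codes: Iterable[int]) -> List[str]:
        return [self.classes_[code] for code in codes]


if __name__ == "__main__":
    encoder = SimpleLabelEncoder().fit(["win", "draw", "lose"])
    print(encoder.transform(["win", "lose"]))  # [2, 1]
    # Round-trip through pickle; pickle.dump to a file under LE/ works the same way.
    restored = pickle.loads(pickle.dumps(encoder))
    print(restored.inverse_transform([0]))  # ['draw']
```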

Data

Data Source

The dataset covers all international matches from 2000 to 2018, with their results, betting odds, FIFA rankings and squad strengths, drawn from:

  1. FIFA World Cup 2018
  2. International matches 1872 - 2018
  3. FIFA Ranking through Time
  4. Bet Odds
  5. Bet Odds 2
  6. Squad Strength - SoFIFA
  7. Squad Strength - FIFA index

Feature List

  • *diff: value for team_1 minus value for team_2
  • *form: performance over the 10 most recent matches

Feature Name | Description | Source (Data Source number)
team_1 | Nation code (e.g. US, NZ) | 1 & 2
team_2 | Nation code (e.g. US, NZ) | 1 & 2
date | Date of the match (yyyy-mm-dd) | 1 & 2
tournament | Friendly, EURO, AFC, FIFA WC | 1 & 2
h_win_diff | Head-to-head: difference in number of wins | 2
h_draw | Head-to-head: number of draws | 2
form_diff_goalF | Form: difference in "Goals For" | 2
form_diff_goalA | Form: difference in "Goals Against" | 2
form_diff_win | Form: difference in number of wins | 2
form_diff_draw | Form: difference in number of draws | 2
odd_diff_win | Betting odds: difference in the bet rate for a win | 4 & 5
odd_draw | Betting odds: bet rate for a draw | 4 & 5
game_diff_rank | Squad strength: difference in FIFA ranking | 3
game_diff_ovr | Squad strength: difference in Overall Strength | 6
game_diff_attk | Squad strength: difference in Attack Strength | 6
game_diff_mid | Squad strength: difference in Midfield Strength | 6
game_diff_def | Squad strength: difference in Defense Strength | 6
game_diff_prestige | Squad strength: difference in prestige | 6
game_diff_age11 | Squad strength: difference in age of the 11 starting players | 6
game_diff_ageAll | Squad strength: difference in age of all players | 6
game_diff_bup_speed | Squad strength: difference in Build Up Play Speed | 6
game_diff_bup_pass | Squad strength: difference in Build Up Play Passing | 6
game_diff_cc_pass | Squad strength: difference in Chance Creation Passing | 6
game_diff_cc_cross | Squad strength: difference in Chance Creation Crossing | 6
game_diff_cc_shoot | Squad strength: difference in Chance Creation Shooting | 6
game_diff_def_press | Squad strength: difference in Defense Pressure | 6
game_diff_def_aggr | Squad strength: difference in Defense Aggression | 6
game_diff_def_teamwidth | Squad strength: difference in Defense Team Width | 6
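The betting-odds and squad-strength features are simple transformations of per-team values. A sketch, assuming decimal odds for the three outcomes and assuming the "bet rate" is the implied probability (1/odds, rescaled to remove the bookmaker margin); the table does not say which convention the project used. `odds_baseline` illustrates the kind of simple classification based on betting odds mentioned in the task list. All function names are hypothetical:

```python
from typing import Dict, Tuple


def implied_probabilities(odd_win_1: float, odd_draw: float,
                          odd_win_2: float) -> Tuple[float, ...]:
    """Decimal odds -> probabilities for (team_1 win, draw, team_2 win).

    The raw 1/odds values sum to more than 1 because of the bookmaker's
    margin, so they are rescaled to sum to exactly 1.
    """
    raw = (1 / odd_win_1, 1 / odd_draw, 1 / odd_win_2)
    total = sum(raw)
    return tuple(r / total for r in raw)


def odds_features(odd_win_1: float, odd_draw: float, odd_win_2: float) -> Dict[str, float]:
    """odd_diff_win and odd_draw under the implied-probability assumption."""
    p1, p_draw, p2 = implied_probabilities(odd_win_1, odd_draw, odd_win_2)
    return {"odd_diff_win": p1 - p2, "odd_draw": p_draw}


def odds_baseline(odd_win_1: float, odd_draw: float, odd_win_2: float) -> str:
    """Predict the outcome the bookmaker rates most likely (lowest odds)."""
    outcomes = {"win": odd_win_1, "draw": odd_draw, "lose": odd_win_2}
    return min(outcomes, key=outcomes.get)


def squad_diff(stats_1: Dict[str, float], stats_2: Dict[str, float]) -> Dict[str, float]:
    """game_diff_* features: team_1's rating minus team_2's, key by key."""
    return {f"game_diff_{key}": stats_1[key] - stats_2[key] for key in stats_1}
```

For example, `squad_diff({"ovr": 80, "attk": 82}, {"ovr": 75, "attk": 79})` gives `{"game_diff_ovr": 5, "game_diff_attk": 3}`.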

How to Run:

python experiment1-W-D-L.py
python experiment2-GoalDiff.py
python experiment3-WorldCup.py


Task List

Complete

  • Add predictions for Matchday 2
  • Add feature importance
  • Add squad and player info features
  • Build a web crawler for each team's squad
  • Build a web crawler for FIFA video game player data
  • Add a simple classification based on betting odds
  • Add feature group 1
    • Add h_win_diff, h_draw
    • Add rank_diff, title_diff
  • Add feature group 2
  • Add feature group 3
  • Simple EDA and a small story
  • Add feature group 4
  • Prepare a framework for running classifiers
  • Add evaluation metrics and plots
    • Add accuracy, precision, recall, F1
    • Add ROC curves
  • Build a dataset without player ratings and squad values
  • Generate data and perform predictions for EURO 2016 (now my story is more interesting)
  • Create more data: "teamA vs teamB -> win" is equivalent to "teamB vs teamA -> lose"
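The last task doubles the dataset by viewing every match from the other team's side. Because each *diff* feature is defined as team_1 minus team_2, swapping the teams negates those features, leaves symmetric ones such as h_draw and odd_draw unchanged, and flips the label. A sketch, assuming each row is a dict keyed by the feature names above plus a hypothetical "result" label column:

```python
from typing import Dict, List

# Swapping the teams turns a win into a loss and vice versa; draws stay draws.
FLIP_RESULT = {"win": "lose", "lose": "win", "draw": "draw"}


def mirror(row: Dict) -> Dict:
    """Return the same match seen from the other team's side."""
    flipped = dict(row)
    flipped["team_1"], flipped["team_2"] = row["team_2"], row["team_1"]
    for key, value in row.items():
        # Every *diff* feature is team_1 - team_2, so it changes sign.
        # Features without "diff" in the name (h_draw, odd_draw, date, ...) are kept.
        if "diff" in key:
            flipped[key] = -value
    flipped["result"] = FLIP_RESULT[row["result"]]  # "result" is a hypothetical column
    return flipped


def augment(rows: List[Dict]) -> List[Dict]:
    """Original rows followed by their mirrored copies (the dataset doubles)."""
    return list(rows) + [mirror(r) for r in rows]
```

A side benefit is that the augmented data is balanced between wins and losses by construction, so the model cannot learn a bias toward whichever team happens to be listed first.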