Predict who will win the FIFA World Cup 2018

Project Description


  • Predict the outcome of international matches. The prediction target is either "Win / Lose / Draw" or "goal difference".
  • Apply the model to predict the results of the FIFA World Cup 2018.

Data: The data are assembled from multiple sources. Most come from Kaggle; the rest come from the FIFA website and EA games.

Feature Engineering: To determine which team is more likely to win a match, I came up with 4 main groups of features, based on my own knowledge:

  1. head-to-head match history between 2 teams
  2. recent performance of each team (10 recent matches), aka "form"
  3. betting odds before each match
  4. squad strength (from FIFA video game)

Feature list reflects those factors.
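
As an illustration of feature group 2, the "form" differences could be computed roughly as below. This is a minimal sketch, not the project's actual code: the `Match` record layout, the helper names `team_form` and `form_diff`, and the toy match history are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record layout; the project's real columns may differ.
Match = namedtuple("Match", "date team_1 team_2 goals_1 goals_2")


def team_form(matches, team, before, n=10):
    """Summarise a team's last n matches played strictly before a date."""
    played = [m for m in matches
              if m.date < before and team in (m.team_1, m.team_2)]
    recent = sorted(played, key=lambda m: m.date)[-n:]
    form = {"goalF": 0, "goalA": 0, "win": 0, "draw": 0}
    for m in recent:
        # Orient the score from the point of view of `team`.
        if m.team_1 == team:
            goals_for, goals_against = m.goals_1, m.goals_2
        else:
            goals_for, goals_against = m.goals_2, m.goals_1
        form["goalF"] += goals_for
        form["goalA"] += goals_against
        form["win"] += int(goals_for > goals_against)
        form["draw"] += int(goals_for == goals_against)
    return form


def form_diff(matches, team_1, team_2, before, n=10):
    """Form features expressed as differences: team_1 minus team_2."""
    f1 = team_form(matches, team_1, before, n)
    f2 = team_form(matches, team_2, before, n)
    return {f"form_diff_{key}": f1[key] - f2[key] for key in f1}


# Toy history, invented for the example (ISO dates sort correctly as strings).
history = [
    Match("2018-03-01", "BR", "DE", 1, 0),
    Match("2018-03-10", "DE", "ES", 1, 1),
    Match("2018-03-20", "BR", "ES", 2, 2),
]
features = form_diff(history, "BR", "DE", before="2018-06-01")
print(features)
```

Only matches played before the match date are used, so the form of a team never leaks information from the match being predicted.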



Check the Full Report to gain more insight about this Project. The report contains:

  • Exploratory Data Analysis: investigate correlations, the importance of features to results, and some interesting hypotheses
  • Methodology: How I carried out this project, which experiments I did.
  • Models: baseline model, logistic regression, random forest, gradient boosting tree, AdaBoost tree, neural network.
  • Evaluation Criteria: F1, 10-fold cross validation accuracy
  • Results and Conclusion
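
To make the evaluation criteria concrete, here is a minimal sketch of 10-fold cross-validation accuracy and F1 on a three-class outcome. Several things are assumptions for illustration only: the toy labels, the majority-class predictor standing in for the baseline model, the unshuffled contiguous folds, and the macro averaging of F1 (the report may use a different averaging scheme).

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_class(train_labels):
    """Assumed stand-in baseline: always predict the most frequent label."""
    return Counter(train_labels).most_common(1)[0][0]


# Toy outcomes, invented for the example (not project data).
labels = ["win"] * 12 + ["lose"] * 5 + ["draw"] * 3

accuracies, all_true, all_pred = [], [], []
for train_idx, test_idx in k_fold_indices(len(labels), k=10):
    guess = majority_class([labels[i] for i in train_idx])
    y_true = [labels[i] for i in test_idx]
    y_pred = [guess] * len(y_true)
    accuracies.append(sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true))
    all_true += y_true
    all_pred += y_pred

cv_accuracy = sum(accuracies) / len(accuracies)
cv_macro_f1 = macro_f1(all_true, all_pred)
print(cv_accuracy, cv_macro_f1)
```

The gap between accuracy and macro F1 for a predictor that never outputs "lose" or "draw" shows why it helps to report both. In practice the folds would normally be shuffled or stratified, and scikit-learn's `cross_val_score` covers the same ground.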

Project Structure

  1. EDA: Exploratory Data Analysis
  2. LE: saved model for Label Encoder
  3. data: completed dataset
  4. save_model: saved Machine Learning model after training


Data Source

The dataset covers all international matches from 2000 to 2018: results, betting odds, rankings, and squad strengths.

  1. FIFA World Cup 2018
  2. International match 1872 - 2018
  3. FIFA Ranking through Time
  4. Bet Odd
  5. Bet Odd 2
  6. Squad Strength - Sofia
  7. Squad Strength - FIFA index

Feature List

  • *difference: team1 - team2
  • *form: performance in 10 recent matches

| Feature Name | Description | Source |
|---|---|---|
| team_1 | Nation code (e.g. US, NZ) | 1 & 2 |
| team_2 | Nation code (e.g. US, NZ) | 1 & 2 |
| date | Date of match (yyyy-mm-dd) | 1 & 2 |
| tournament | Friendly, EURO, AFC, FIFA WC | 1 & 2 |
| h_win_diff | Head2Head: win difference | 2 |
| h_draw | Head2Head: number of draws | 2 |
| form_diff_goalF | Form: difference in "Goal For" | 2 |
| form_diff_goalA | Form: difference in "Goal Against" | 2 |
| form_diff_win | Form: difference in number of wins | 2 |
| form_diff_draw | Form: difference in number of draws | 2 |
| odd_diff_win | Betting Odd: difference in bet rate for win | 4 & 5 |
| odd_draw | Betting Odd: bet rate for draw | 4 & 5 |
| game_diff_rank | Squad Strength: difference in FIFA Rank | 3 |
| game_diff_ovr | Squad Strength: difference in Overall Strength | 6 |
| game_diff_attk | Squad Strength: difference in Attack Strength | 6 |
| game_diff_mid | Squad Strength: difference in Midfield Strength | 6 |
| game_diff_def | Squad Strength: difference in Defense Strength | 6 |
| game_diff_prestige | Squad Strength: difference in prestige | 6 |
| game_diff_age11 | Squad Strength: difference in age of 11 starting players | 6 |
| game_diff_ageAll | Squad Strength: difference in age of all players | 6 |
| game_diff_bup_speed | Squad Strength: difference in Build Up Play Speed | 6 |
| game_diff_bup_pass | Squad Strength: difference in Build Up Play Passing | 6 |
| game_diff_cc_pass | Squad Strength: difference in Chance Creation Passing | 6 |
| game_diff_cc_cross | Squad Strength: difference in Chance Creation Crossing | 6 |
| game_diff_cc_shoot | Squad Strength: difference in Chance Creation Shooting | 6 |
| game_diff_def_press | Squad Strength: difference in Defense Pressure | 6 |
| game_diff_def_aggr | Squad Strength: difference in Defense Aggression | 6 |
| game_diff_def_teamwidth | Squad Strength: difference in Defense Team Width | 6 |

How to Run:



  1. A machine learning framework for sport result prediction
  2. t-test definition
  3. Confusion Matrix Multi-Label example
  4. Precision-Recall Multi-Label example
  5. ROC curve example
  6. Model evaluation
  7. Tuning the hyper-parameters of an estimator
  8. Validation curves
  9. Understand Bet odd format
  10. EURO 2016 bet odd

Task List


  • Add prediction for Matchday 2
  • Add feature Importance
  • Add feature of squad and player info
  • Build a web crawler for each team's squad
  • Build a web crawler for FIFA game player
  • Add a simple classification based on "bet odd".
  • Add features group 1
    • Add h_win_diff, h_draw
    • Add rank_diff, title_diff
  • Add features group 2
  • Add features group 3
  • Simple EDA and a small story
  • Add features group 4
  • Prepare framework for running classifiers
  • Add evaluation metrics and plot
    • Add accuracy, precision, recall, F1
    • Add ROC curves
  • Build a dataset without player rating and squad value
  • Generate data and perform prediction for EURO 2016; now my story is more interesting
  • Create more data: "teamA vs teamB -> win" is equivalent to "teamB vs teamA -> lose"
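
The last task, mirroring each match to create more data, could look roughly like the sketch below. The `result` column name, the label strings, and the `mirror` helper are assumptions for illustration. The sketch relies on the naming pattern in the feature list, where every "team1 - team2" feature contains `_diff` and symmetric features such as `h_draw` and `odd_draw` do not.

```python
# Assumed label strings; the real dataset may encode outcomes differently.
FLIP = {"win": "lose", "lose": "win", "draw": "draw"}


def mirror(row):
    """Return the same match seen from team_2's side."""
    mirrored = dict(row)
    mirrored["team_1"], mirrored["team_2"] = row["team_2"], row["team_1"]
    mirrored["result"] = FLIP[row["result"]]
    for name, value in row.items():
        # Difference features are team1 - team2, so swapping teams negates them.
        if "_diff" in name:
            mirrored[name] = -value
    return mirrored


# Toy row, invented for the example.
row = {
    "team_1": "BR", "team_2": "DE", "result": "win",
    "h_win_diff": 2, "h_draw": 3,
    "form_diff_goalF": 4, "odd_diff_win": -1.5, "odd_draw": 3.2,
}
augmented = [row, mirror(row)]
print(augmented[1])
```

Mirrored rows describe the same match as their originals, so both copies should stay in the same cross-validation fold; otherwise the test fold would contain matches the model has effectively already seen.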