# Chapter 4: Road to the Finals!

The World Cup final this year is on Dec 18. With the final quickly approaching, people are placing their bets on who might take home the trophy. But you don't want to bet blindly, so here we're going to develop a method to calculate the odds of a country winning, depending on which round of the tournament they are in and their historical record.

## Data used for calculations

We are going to use data provided here: https://github.com/jfjelstul/worldcup

The specific dataset we are going to use is `matches.csv`, which contains information about every World Cup match played since 1930, including the stage of the tournament and who won.

In [1]:
#Import pandas and altair
import pandas as pd
import altair as alt

In [21]:
#Import matches.csv data from worldcup database on github
matches = pd.read_csv('https://raw.githubusercontent.com/jfjelstul/worldcup/master/data-csv/matches.csv')
matches = matches.drop(columns = ['key_id', 'tournament_id', 'match_id', 'match_name', 'group_name','group_stage', 'knockout_stage', 'replayed','replay', 'match_date', 'match_time', 'stadium_id', 'stadium_name', 'city_name', 'country_name', 'home_team_id', 'home_team_code', 'away_team_id','away_team_code', 'score', 'home_team_score_margin', 'away_team_score_margin', 'extra_time', 'penalty_shootout', 'score_penalties', 'home_team_score_penalties','away_team_score_penalties', 'draw'])
matches.head(10)

| | tournament_name | stage_name | home_team_name | away_team_name | home_team_score | away_team_score | result | home_team_win | away_team_win |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1930 FIFA World Cup | group stage | France | Mexico | 4 | 1 | home team win | 1 | 0 |
| 1 | 1930 FIFA World Cup | group stage | United States | Belgium | 3 | 0 | home team win | 1 | 0 |
| 2 | 1930 FIFA World Cup | group stage | Yugoslavia | Brazil | 2 | 1 | home team win | 1 | 0 |
| 3 | 1930 FIFA World Cup | group stage | Romania | Peru | 3 | 1 | home team win | 1 | 0 |
| 4 | 1930 FIFA World Cup | group stage | Argentina | France | 1 | 0 | home team win | 1 | 0 |
| 5 | 1930 FIFA World Cup | group stage | Chile | Mexico | 3 | 0 | home team win | 1 | 0 |
| 6 | 1930 FIFA World Cup | group stage | Yugoslavia | Bolivia | 4 | 0 | home team win | 1 | 0 |
| 7 | 1930 FIFA World Cup | group stage | United States | Paraguay | 3 | 0 | home team win | 1 | 0 |
| 8 | 1930 FIFA World Cup | group stage | Uruguay | Peru | 1 | 0 | home team win | 1 | 0 |
| 9 | 1930 FIFA World Cup | group stage | Chile | France | 1 | 0 | home team win | 1 | 0 |


## Calculating historical success rate
We want to use the historical record available in the matches dataset to calculate the likelihood of winning depending on which stage of the tournament the country is playing in.

Let's try to figure out what the chances are that Argentina ({numref}`argentina`) wins one of their group stage matches based on their historical record.

```{figure} https://upload.wikimedia.org/wikipedia/commons/1/1a/Flag_of_Argentina.svg
---
name: argentina
height: 100px
---
The flag of Argentina.
```

To calculate this, we will use the following formula:

```{math}
:label: hist_win
WinOdds = \frac {AwayWins+HomeWins}{TotalGames}
```
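
As a quick illustration of the formula, here is a minimal sketch with made-up numbers. The helper name `win_odds` and the record of 3 away wins and 5 home wins in 16 games are hypothetical and are not taken from the dataset.

```python
def win_odds(away_wins, home_wins, total_games):
    """Share of games won, as a fraction between 0 and 1."""
    return (away_wins + home_wins) / total_games

# Hypothetical record: 3 away wins and 5 home wins in 16 games
odds = win_odds(away_wins=3, home_wins=5, total_games=16)
print(odds)        # 0.5
print(odds * 100)  # 50.0, the same value expressed as a percentage
```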

In [23]:
#Divide the data into only group stage matches based on Argentina as the home or away team
group_only = matches[matches['stage_name'] == 'group stage']
arg_home = group_only[group_only['home_team_name'] == 'Argentina']
arg_away = group_only[group_only['away_team_name'] == 'Argentina']
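
To see what each filter keeps, the same split can be sketched on a small invented table. The `toy` DataFrame below only mimics the column names of `matches`; its rows are made up and are not real World Cup results. The sketch also shows that a single combined mask selects the same rows as the two separate filters.

```python
import pandas as pd

# Invented rows that mimic the columns of `matches` (not real results)
toy = pd.DataFrame({
    'stage_name': ['group stage', 'group stage', 'final', 'group stage'],
    'home_team_name': ['Argentina', 'France', 'Argentina', 'Chile'],
    'away_team_name': ['France', 'Argentina', 'Chile', 'Mexico'],
    'home_team_win': [1, 0, 1, 1],
    'away_team_win': [0, 1, 0, 0],
})

# Same three filters as above, applied to the toy table
toy_group = toy[toy['stage_name'] == 'group stage']
toy_home = toy_group[toy_group['home_team_name'] == 'Argentina']
toy_away = toy_group[toy_group['away_team_name'] == 'Argentina']

# One combined mask picks every group stage match Argentina played
either = toy_group[(toy_group['home_team_name'] == 'Argentina')
                   | (toy_group['away_team_name'] == 'Argentina')]

print(len(toy_home), len(toy_away), len(either))  # 1 1 2
```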

After subdividing the dataset, we can apply {eq}`hist_win` with the following code, multiplying by 100 to express the result as a percentage:

In [36]:
total_home = arg_home['home_team_win'].sum()
total_away = arg_away['away_team_win'].sum()
total_arg_win = total_home + total_away
arg_win_rate = (total_arg_win/(arg_home.shape[0] + arg_away.shape[0]))*100
arg_win_rate

62.5

Let's create a function that automates the process by taking in the stage of the tournament we are betting on and the country. The code for the function should:

1. parse the dataset down to only the stage of interest and only the country of interest
2. calculate the total wins from the country in that stage
3. based on the total games played, determine the chances of winning

In [37]:
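def chance_win(country, stage):
    # Keep only the matches from the stage of interest
    stage_only = matches[matches['stage_name'] == stage]
    # Split into matches where the country was the home or the away team
    home = stage_only[stage_only['home_team_name'] == country]
    away = stage_only[stage_only['away_team_name'] == country]
    # Wins divided by games played in that stage, as a percentage
    total_wins = home['home_team_win'].sum() + away['away_team_win'].sum()
    total_games = home.shape[0] + away.shape[0]
    win_rate = (total_wins / total_games) * 100
    return win_rate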
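
The three steps listed above can also be sketched as a variant that takes the DataFrame as an argument and guards against a country that never played in the requested stage. This is only one possible way to write it. The name `chance_win_safe` and the `toy` table are hypothetical, and the toy rows are invented rather than real results.

```python
import pandas as pd

def chance_win_safe(matches, country, stage):
    """Percentage of `stage` matches that `country` won, or None if it played none."""
    # Step 1: narrow the data to the stage and country of interest
    stage_only = matches[matches['stage_name'] == stage]
    home = stage_only[stage_only['home_team_name'] == country]
    away = stage_only[stage_only['away_team_name'] == country]
    total_games = len(home) + len(away)
    if total_games == 0:
        return None  # no games in this stage, so avoid dividing by zero
    # Step 2: total wins as the home side plus wins as the away side
    total_wins = home['home_team_win'].sum() + away['away_team_win'].sum()
    # Step 3: wins over games played, as a percentage
    return float(total_wins / total_games * 100)

# Invented rows that mimic the columns of `matches` (not real results)
toy = pd.DataFrame({
    'stage_name': ['group stage', 'group stage', 'final', 'group stage'],
    'home_team_name': ['Argentina', 'France', 'Argentina', 'Chile'],
    'away_team_name': ['France', 'Argentina', 'Chile', 'Mexico'],
    'home_team_win': [1, 0, 1, 1],
    'away_team_win': [0, 0, 0, 0],
})

print(chance_win_safe(toy, 'Argentina', 'group stage'))  # 50.0
print(chance_win_safe(toy, 'Argentina', 'final'))        # 100.0
print(chance_win_safe(toy, 'Mexico', 'final'))           # None
```

Passing the DataFrame in explicitly keeps the function independent of any global variable, which makes it easier to try on a small table like this one before running it on the full dataset.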