# Which is the Best International Football team ?
Hi !
Starting from this question, I decided to explore a few datasets and look for interesting statistics and rankings of the best (and worst) international teams:
* [**International football results from 1872 to 2019**](https://www.kaggle.com/martj42/international-football-results-from-1872-to-2017), by *Mart Jürisoo*
* [**FIFA World Ranking 1992-2021**](https://www.kaggle.com/cashncarry/fifaworldranking), by *Alex*
* [**Country Code**](https://www.kaggle.com/koki25ando/country-code), by *Koki Ando*

As a reminder, the current FIFA World Champion is **France**, and the country that has won this trophy the most times is **Brazil**.

I'm a football fan, but I will try to keep my observations and comments as impartial as possible.

This notebook is updated as of July 9th, 2021 (waiting for the Euro and Copa América winners!)

Feel free to leave a comment or upvote if you liked this kernel ! :)

### Table of Contents
1. [Data Preprocessing](#data-preprocessing)
2. [Ranking 0 - Official FIFA Ranking](#ranking-0)
3. [Ranking I - Unofficial World Champion](#ranking-1)
4. [Ranking II - *Joga Bonito* Champion](#ranking-2)
5. [Ranking III - ELO Ranking](#ranking-3)
6. [Some interesting facts](#facts)

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
init_notebook_mode(connected = True)
import plotly.figure_factory as ff
import plotly.express as px

from datetime import datetime

## Data Preprocessing
<a id = 'data-preprocessing'></a>

In [None]:
code_df = pd.read_csv('../input/country-code/country_code.csv',  usecols=['Country_name', 'code_3digit'])
code_df.head()

In [None]:
games = pd.read_csv('../input/international-football-results-from-1872-to-2017/results.csv')
#Drop some NA games, still to be played!
games.dropna(inplace = True)
print("Number of Null values : {}".format(games.isnull().sum().sum()))
#Nice

In [None]:
games['Winner'] = np.where((games['home_score'] > games['away_score']), games['home_team'], np.where((games['home_score'] < games['away_score']), games['away_team'], 'Draw'))
games['Loser'] = np.where((games['home_score'] > games['away_score']), games['away_team'], np.where((games['home_score'] < games['away_score']), games['home_team'], 'Draw'))

# Parse the date column once, then derive the calendar fields from it
parsed_dates = pd.to_datetime(games['date'])
games['Year'] = parsed_dates.dt.year
games['Month'] = parsed_dates.dt.month
games['Day'] = parsed_dates.dt.day
games['Date'] = parsed_dates

In [None]:
games.tail()

## Ranking 0 - Official FIFA Ranking
<a id = 'ranking-0'></a>

This ranking is regularly criticized as biased and less accurate than ELO ratings (see Ranking III), but it already gives us an idea of the strength of each team.

Unfortunately, I can't find data on Kaggle for FIFA Rankings after 2018, but as of May 2021, the top 5 teams are **Belgium** (1783 points), **France** (1757 points), **Brazil** (1743 points), **England** (1687 points) and **Portugal** (1666 points).
[Source](https://www.fifa.com/fifa-world-ranking/ranking-table/men/)

In [None]:
fifa_ranking = pd.read_csv('../input/fifaworldranking/fifa_ranking-2021-05-27.csv')
fifa_ranking = fifa_ranking[fifa_ranking['rank_date'] >= '1999-12-22']

In [None]:
ranking_df = pd.pivot_table(data = fifa_ranking, 
                            values = 'total_points',
                            index = 'country_full',
                            columns = 'rank_date').fillna(0.0)
ranking_df.head()

Let's visualize the evolution of the rankings for the top 10 teams in the 21st century:

In [None]:
best_ranks = ranking_df.loc[ranking_df['2021-05-27'].sort_values(ascending = False)[:10].index]
fig = go.Figure()

for i in range(len(best_ranks.values)):
    fig.add_trace(go.Scatter(x = best_ranks.columns, 
                             y = best_ranks.iloc[i],
                             name = best_ranks.index[i]))
    
fig.update_layout(
    title="Evolution of the FIFA Ranking for today\'s 10 best teams",
    yaxis_title="Points"
)
fig.show()

The peaks in 2006 and 2018 are not a mistake in the data: they may be due to the ranking's evaluation method, or to the World Cups that took place in those years... Note how Belgium has grown stronger year after year.

Now, let's code and compare this ranking to some home-made ones !

## Ranking I - Unofficial World Champion
<a id = 'ranking-1'></a>
The first [Unofficial World Champion](https://en.wikipedia.org/wiki/Unofficial_Football_World_Championships) was England, who won an international football game for the first time on the 8th of March, 1873. The first team to beat them then became Unofficial World Champion, and so on...
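
To make the rule concrete, here is a minimal sketch of how the title moves on a toy results table. The teams `A`, `B`, `C`, the dates and the helper `title_lineage` are hypothetical and only illustrate the mechanism; the column names mirror the `Winner`, `Loser` and `Date` columns built during preprocessing.

```python
import pandas as pd

def title_lineage(results, first_champion):
    """Follow the unofficial title through chronologically ordered results.

    `results` needs 'Date', 'Winner' and 'Loser' columns. Drawn games carry
    'Draw' in both columns, so they never move the title.
    """
    holder = first_champion
    lineage = [(None, holder)]
    for game in results.itertuples(index=False):
        # The title only changes hands when the current holder loses
        if game.Loser == holder:
            holder = game.Winner
            lineage.append((game.Date, holder))
    return lineage

# Hypothetical results, for illustration only
toy_results = pd.DataFrame({
    'Date': pd.to_datetime(['1900-01-01', '1900-02-01', '1900-03-01', '1900-04-01']),
    'Winner': ['A', 'Draw', 'B', 'C'],
    'Loser': ['B', 'Draw', 'A', 'A'],
})
lineage = title_lineage(toy_results, first_champion='A')
holders = [team for _, team in lineage]
print(holders)
```

Note that `C` beats `A` in the last game but does not take the title, because `A` had already lost it to `B`.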

In [None]:
first_win = games.iloc[1] # Game 0 is a draw
real_world_champion = first_win['Winner']
day_one = first_win['Date']
champions = [real_world_champion]
dates = [day_one]
champions_time = {}

# itertuples walks the rows in order, so it does not rely on the index labels
# (which can have gaps after dropna)
for game in games.itertuples(index = False):
    if game.Loser == real_world_champion:
        # Credit the outgoing champion with the days since it took the title
        champions_time[real_world_champion] = champions_time.get(real_world_champion, 0) + (game.Date - dates[-1]).days
        real_world_champion = game.Winner
        champions.append(real_world_champion)
        dates.append(game.Date)
# The current champion may be holding the title for the first time, hence .get
champions_time[real_world_champion] = champions_time.get(real_world_champion, 0) + (datetime.now() - dates[-1]).days
countries_df = pd.DataFrame.from_dict(champions_time, orient = 'index', columns=['Days champion'])

print(" First unofficial world champion : {} \n Current unofficial world champion : {}".format(champions[0], real_world_champion))

Congrats to **Italy** which is, as of July 2021, the Unofficial World Champion ! 

Let's see what we can learn from this ranking. Here I decided to create a '*United Kingdom*' team, whose value is the mean of the results of the 4 national teams (England, Northern Ireland, Scotland and Wales), in order to plot it on the Plotly map. I have also made a few tweaks to the country names to make them all plottable.
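
As a small illustration of these two steps, the sketch below averages the four home nations and maps a dataset name to the name used by a code table. All numbers are made up, and `aliases` and `canonical_name` are hypothetical helpers, not part of the notebook.

```python
import pandas as pd

# Hypothetical day counts, for illustration only
days = pd.Series(
    {'England': 400, 'Scotland': 800, 'Wales': 100, 'Northern Ireland': 300, 'France': 50},
    name='Days champion',
)

# The 'United Kingdom' value is the plain mean of the four national teams
home_nations = ['England', 'Scotland', 'Wales', 'Northern Ireland']
uk_days = days.loc[home_nations].mean()

# Hypothetical alias table: dataset name -> name used by the country-code table
aliases = {
    'United States': 'United States of America',
    'Russia': 'Russian Federation',
}

def canonical_name(team):
    """Return the code-table name for a team, or the name itself if no alias exists."""
    return aliases.get(team, team)

print(uk_days, canonical_name('Russia'), canonical_name('France'))
```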

In [None]:
# Team names in the results dataset that differ from the names used in code_df
name_fixes = {
    'United States': 'United States of America',
    'Russia': 'Russian Federation',
    'South Korea': 'Korea (South)',
    'Republic of Ireland': 'Ireland',
    'North Korea': 'Korea (North)',
    'Venezuela': 'Venezuela (Bolivarian Republic)',
    'China PR': 'China',
}

def return_country_code(con):
    # Use the name as-is when code_df knows it, otherwise try the alias table
    name = con if con in code_df['Country_name'].values else name_fixes.get(con)
    match = code_df.loc[code_df['Country_name'] == name, 'code_3digit']
    # Teams without a code return None and are dropped later by dropna()
    return match.values[0] if len(match) else None

countries_df['Country'] = countries_df.index
countries_df['Code'] = countries_df['Country'].apply(return_country_code)
home_nations = ['England', 'Scotland', 'Northern Ireland', 'Wales']
uk_df = pd.DataFrame([[countries_df.loc[home_nations, 'Days champion'].mean(),
                          'United Kingdom',
                           code_df[code_df['Country_name'] == 'United Kingdom']['code_3digit'].values[0]
                         ]], 
                         index = ['United Kingdom'], 
                         columns = ['Days champion', "Country", 'Code'])
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
final_uc_df = pd.concat([countries_df, uk_df]).dropna().sort_values('Days champion', ascending = False)
final_uc_df.head()

In [None]:
data=dict(
    type = 'choropleth',
    locations = final_uc_df['Code'],
    z = final_uc_df['Days champion'],
    text = final_uc_df['Country'],
    colorscale = 'YlOrRd',
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = 'Number of days being unofficial Champions',
)

layout = dict(title_text='The Longest Unofficial World Champions',
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type='equirectangular'
    ))

fig = go.Figure(data = [data], layout = layout)
iplot(fig)

In [None]:
top_uc_df = countries_df.sort_values('Days champion', ascending = False)[:15]
plt.figure(figsize = (14,4))
plt.title('The Longest Unofficial World Champions')
sns.barplot(x=top_uc_df['Country'], y=top_uc_df['Days champion'], palette="vlag")

We see the domination of **United Kingdom** - and particularly **Scotland**, which is historically the *longest Unofficial Champion* - with its four teams being in the top 5 of the longest champions!

Portugal has only 398 days of domination, France 1093 and Brazil 1466.

Here, I had difficulty plotting a timeline showing the order of the Unofficial World Champions. In the following Gantt chart, we can however see the variety of teams that were unofficial champions, and how quickly the title can pass from one team to another: indeed, it changes hands as soon as the unofficial world champion is defeated!

In [None]:
df_timeline = []
for i in range(len(champions) - 1):
    df_timeline.append(dict(Task = champions[i], Start=dates[i], Finish=dates[i + 1]))
df_timeline.append(dict(Task = champions[-1], Start=dates[-1], Finish=datetime.now()))
fig = ff.create_gantt(df_timeline, group_tasks=True, title='Timeline of the Unofficial World Champions')
fig.show()

See how the United Kingdom teams have dominated the first 60 years of this sport? Also, congratulations to **Belarus** with the longest domination, **4434** days, or more than 12 years!

In [None]:
games[(games['Winner'] == 'Belarus') & (games['date'] >= '1967-06-04') & (games['date'] <= '1979-07-25')]

Oh ...

They only played 3 games during this period! Obviously, this is the main disadvantage of such a ranking, not to mention its "simplicity"...

## Ranking II - *Joga Bonito* Champion
<a id='ranking-2'></a>
For this ranking, I will see which teams have better stats, only taking into account the scores of each game.
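
As an illustration of the statistics involved, here is a minimal vectorized sketch on a toy table. The teams and scores are hypothetical, and the column names `team`, `scored`, `taken` are chosen for this example only. It computes a winning rate and an average goal difference per team from the scores alone.

```python
import pandas as pd

# Hypothetical games, for illustration only
toy_games = pd.DataFrame({
    'home_team': ['A', 'B', 'A'],
    'away_team': ['B', 'C', 'C'],
    'home_score': [2, 1, 0],
    'away_score': [0, 1, 3],
})

# One row per (team, game): stack the home and the away point of view
home = toy_games.rename(columns={'home_team': 'team', 'home_score': 'scored', 'away_score': 'taken'})
away = toy_games.rename(columns={'away_team': 'team', 'away_score': 'scored', 'home_score': 'taken'})
per_team = pd.concat([home[['team', 'scored', 'taken']],
                      away[['team', 'scored', 'taken']]], ignore_index=True)
per_team['win'] = per_team['scored'] > per_team['taken']

stats = per_team.groupby('team').agg(
    games=('win', 'size'),
    wins=('win', 'sum'),
    scored=('scored', 'sum'),
    taken=('taken', 'sum'),
)
stats['Winning rate'] = stats['wins'] / stats['games']
stats['Average difference'] = (stats['scored'] - stats['taken']) / stats['games']
print(stats)
```

Stacking the two points of view avoids a Python-level loop over games, which matters on a table with tens of thousands of rows.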

In [None]:
def make_stats_df(df):
    my_columns =  ['Wins', 'Draws', 'Loses', 'Total games', 'Goals scored', 'Goals taken', 
                   'Goals difference', 'World Cup games', 'World Cup wins']
    # pd.concat replaces Series.append, which was removed in pandas 2.0
    teams = pd.concat([df['home_team'], df['away_team']]).unique()
    data_df = pd.DataFrame(0, index = teams, columns = my_columns)
    
    # .loc[row, column] writes into data_df itself, whereas chained indexing
    # (data_df.loc[row][column] += 1) may update a temporary copy instead
    for game in df.itertuples(index = False):
        home, away = game.home_team, game.away_team
        if game.Winner == 'Draw':
            data_df.loc[home, 'Draws'] += 1
            data_df.loc[away, 'Draws'] += 1
        else:
            data_df.loc[game.Winner, 'Wins'] += 1
            data_df.loc[game.Loser, 'Loses'] += 1
        if game.tournament == 'FIFA World Cup':
            data_df.loc[home, 'World Cup games'] += 1
            data_df.loc[away, 'World Cup games'] += 1
            if game.Winner != 'Draw':
                data_df.loc[game.Winner, 'World Cup wins'] += 1
        # Scores may have been read as floats, so cast them to keep integer counters
        data_df.loc[home, 'Goals scored'] += int(game.home_score)
        data_df.loc[home, 'Goals taken'] += int(game.away_score)
        data_df.loc[away, 'Goals scored'] += int(game.away_score)
        data_df.loc[away, 'Goals taken'] += int(game.home_score)

    data_df['Total games'] = data_df['Wins'] + data_df['Draws'] + data_df['Loses']
    data_df['Goals difference'] = data_df['Goals scored'] - data_df['Goals taken']
    data_df['Winning rate'] = data_df['Wins'] / data_df['Total games']
    data_df['Goals per game'] = data_df['Goals scored'] / data_df['Total games']
    data_df['Average difference'] = data_df['Goals difference'] / data_df['Total games']
    data_df['WC Winning rate'] = data_df['World Cup wins'] / data_df['World Cup games']
    return data_df

In [None]:
data_df = make_stats_df(games)

# Filter once per chart so that labels and bar heights always come from the same rows
min_games = 100
regular_teams = data_df[data_df['Total games'] >= min_games]
wc_teams = data_df[data_df['World Cup games'] >= 10]
top_rate = regular_teams.sort_values('Winning rate', ascending = False).head(10)
top_diff = regular_teams.sort_values('Average difference', ascending = False).head(10)
top_wc = wc_teams.sort_values('WC Winning rate', ascending = False).head(10)

plt.figure(figsize = (12,8))
plt.subplot(311)
plt.title('International Teams by Winning Rate')
sns.barplot(x = top_rate.index, y = top_rate['Winning rate'], palette="vlag")
plt.subplot(312)
plt.title('International Teams by Total Average Difference')
sns.barplot(x = top_diff.index, y = top_diff['Average difference'], palette="vlag")
plt.subplot(313)
plt.title('International Teams by Winning Rate at the FIFA World Cup')
sns.barplot(x = top_wc.index, y = top_wc['WC Winning rate'], palette="vlag")
plt.tight_layout()

The last ranking seems to be the most accurate one, but relying only on World Cup games introduces a potential bias: some teams are excluded, and others have only played a small number of games.

Here are two other rankings that only take into account games of *modern football*. I chose July 7th, 1957 as the first day of this era: it is the date of Pelé's first international game with Brazil. I also excluded friendly games, since teams sometimes make less effort or try unusual strategies in them.

In [None]:
modern_games = games[(games['Date'] >= pd.to_datetime('1957-07-07')) & (games['tournament'] != 'Friendly')]
modern_data = make_stats_df(modern_games)

# Filter once so that labels and bar heights always come from the same rows
modern_regulars = modern_data[modern_data['Total games'] >= 100]
top_modern_rate = modern_regulars.sort_values('Winning rate', ascending = False).head(10)
top_modern_diff = modern_regulars.sort_values('Average difference', ascending = False).head(10)

plt.figure(figsize = (12,8))
plt.subplot(211)
plt.title('International Teams by Winning Rate since 1957')
sns.barplot(x = top_modern_rate.index, y = top_modern_rate['Winning rate'], palette="vlag")
plt.subplot(212)
plt.title('International Teams by Average Goal Difference since 1957')
sns.barplot(x = top_modern_diff.index, y = top_modern_diff['Average difference'], palette="vlag")
plt.tight_layout()

We observe some teams that we aren't used to seeing in international top rankings (New Caledonia, Iran, Fiji). We can deduce that regional games have a huge impact on this ranking, but leaving them out would eliminate some teams that don't play in international tournaments. Thus, this ranking should be interpreted with caution. We can however see some clear indicators of the dominance of some teams, such as Germany, Spain or Brazil.

## Ranking III - ELO Ranking
<a id='ranking-3'></a>
[ELO Ranking](https://en.wikipedia.org/wiki/World_Football_Elo_Ratings) is based on the ELO Rating system and could lead to more accurate rankings. 

To compute it, I've separated international games by competitions, distinguishing the major ones and their qualification tournaments from the other games. I'm not sure of the calculations, and I think some data is missing in the dataset (particularly the Olympic Games, which is a major competition), thus leading to some differences with the Wikipedia ranking, but the rankings I have obtained look reliable.
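
For reference, here is a minimal sketch of a single rating update, following the formula described on the linked Wikipedia page: the new rating is the old one plus K × G × (W − Wₑ), where Wₑ = 1 / (10^(−dr/400) + 1). The function names, the default `k` and the starting ratings of 1500 are assumptions made for this example, not values taken from the notebook.

```python
def expected_result(rating_diff):
    """Expected result (between 0 and 1) for a given rating difference."""
    return 1 / (10 ** (-rating_diff / 400) + 1)

def elo_update(home_elo, away_elo, home_goals, away_goals, k=20, neutral=False, home_bonus=100):
    """Return the (home, away) ratings after one game."""
    # Goal multiplier G depends on the absolute margin of victory
    margin = abs(home_goals - away_goals)
    if margin <= 1:
        g = 1.0
    elif margin == 2:
        g = 1.5
    else:
        g = (11 + margin) / 8

    # Actual result W from the home team's point of view
    if home_goals > away_goals:
        w_home = 1.0
    elif home_goals == away_goals:
        w_home = 0.5
    else:
        w_home = 0.0

    # The home team gets a rating bonus unless the game is on neutral ground
    dr = home_elo - away_elo + (0 if neutral else home_bonus)
    delta = k * g * (w_home - expected_result(dr))
    # What one team gains, the other loses
    return home_elo + delta, away_elo - delta

# Two equally rated teams on neutral ground, 1-0 win for the "home" side
new_home, new_away = elo_update(1500, 1500, 1, 0, k=20, neutral=True)
print(new_home, new_away)
```

Applying the same `delta` with opposite signs keeps the total number of rating points constant across each game.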

In [None]:
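# pd.concat replaces Series.append, which was removed in pandas 2.0
all_teams = pd.concat([games['home_team'], games['away_team']]).unique()
# Start from 0.0 so that the rating columns are floats from the beginning
elo_df = pd.DataFrame(0.0, index = all_teams, columns= range(1870,2020))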

major_comp = ['UEFA Euro', 'African Cup of Nations', 'Copa América', 'AFC Asian Cup', 'UEFA Nations League',
              'Confederations Cup', 'African Nations Championship', 'CONCACAF Championship', 'Gold Cup',
             'Pan American Championship', 'Pacific Games', 'Oceania Nations Cup']
qualif = ['Copa América qualification', 'AFC Asian Cup qualification', 'UEFA Euro qualification', 
          'African Cup of Nations qualification', 'FIFA World Cup qualification', 'CONCACAF Championship qualification',
          'Gold Cup qualification', 'Oceania Nations Cup qualification']

def def_k(comp):
    if comp == 'FIFA World Cup':
        return 60
    elif comp in major_comp:
        return 50
    elif comp in qualif:
        return 40
    elif comp == 'Friendly':
        return 20
    else:
        return 30

def def_g(team_goals, enemy_goals):
    # The multiplier depends on the size of the margin, so it is the same
    # for the winner and for the loser of a game
    goal_diff = abs(team_goals - enemy_goals)
    if goal_diff <= 1:
        return 1
    elif goal_diff == 2:
        return 3/2
    elif goal_diff == 3:
        return 7/4
    else:
        return 7/4 + (goal_diff - 3)/8

def def_w(team, winner):
    if team == winner:
        return 1
    elif winner == 'Draw':
        return 1/2
    else:
        return 0
    
def def_dr(team_elo, enemy_elo, neutral):
    if neutral:
        return team_elo - enemy_elo
    else: 
        return team_elo - enemy_elo + 100

for year in range(1871, 2020):
    elo_df[year] = elo_df[year - 1]
    for game in games[games['Year'] == year].itertuples(index = False):
        home, away = game.home_team, game.away_team
        # Expected home result, computed from both ratings before either one is updated
        expected_home = 1/(10 **(- def_dr(elo_df.loc[home, year], elo_df.loc[away, year], game.neutral) / 400) + 1)
        # K and the goal multiplier (based on the absolute margin) are shared by both teams
        k_g = def_k(game.tournament) * def_g(abs(game.home_score - game.away_score), 0)
        elo_df.loc[home, year] += k_g * (def_w(home, game.Winner) - expected_home)
        # The away team's expected result is the complement, so the exchange is zero-sum
        elo_df.loc[away, year] += k_g * (def_w(away, game.Winner) - (1 - expected_home))

In [None]:
plt.figure(figsize=(12,4))
plt.title('Years being #1 ELO team')
plt.bar(x = elo_df.idxmax().value_counts().index, height = elo_df.idxmax().value_counts().values)
plt.tight_layout()

In [None]:
best_elos = elo_df.loc[elo_df.idxmax().unique()]
fig = go.Figure(layout = dict(title='ELO Ranking of the best international teams'))

for i in range(len(best_elos.values)):
    fig.add_trace(go.Scatter(x = best_elos.columns, 
                             y = best_elos.iloc[i],
                             name = best_elos.index[i]))
fig.show()

Here, I have plotted every team that has held the best ELO score at some point. More than with the other rankings, we see how England and Scotland dominated the first half of football history, and Brazil the second one.

To conclude on these rankings, the ELO one is the best suited to assessing the real level of a team. The statistics-based ranking gives us some good indicators, but it should be interpreted with caution, and the Unofficial World Champion one is more fun than accurate.

In the modern era (1957-Today), it is clear that Brazil has dominated the football world: from this analysis, it is the best international football team. Some other teams also deserve a mention for their high rankings, such as Germany, Spain or France.

## Some interesting facts
<a id = 'facts'></a>
Here are some additional fun facts that I've extracted from the international games dataset.

Let's start with the teams having the lowest winning rate:

In [None]:
data_df[data_df['Total games'] >= 150].sort_values('Winning rate').head(10)

In [None]:
# The only win of San Marino, out of 167 games !
games[games['Winner'] == 'San Marino']

In [None]:
# The game with the highest number of goals
games[(games['home_score'] + games['away_score']) == (games['home_score'] + games['away_score']).max()]

For the next DataFrame, I was wondering which countries were "worst enemies", i.e. for each team, the opponent that has beaten it most often.
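
Here is a minimal sketch of that idea on a toy table. The teams and results are hypothetical; the `Winner` and `Loser` columns follow the same convention as in the preprocessing step, with 'Draw' in both columns for drawn games.

```python
import pandas as pd

# Hypothetical results, for illustration only
toy = pd.DataFrame({
    'Winner': ['A', 'A', 'C', 'Draw', 'B'],
    'Loser':  ['B', 'B', 'B', 'Draw', 'A'],
})

# Draws have no winner or loser, so they are left out
decided = toy[toy['Winner'] != 'Draw']

# For each losing team, keep the opponent that appears most often as the winner
worst_enemy = decided.groupby('Loser')['Winner'].agg(
    lambda winners: winners.value_counts().idxmax()
)
print(worst_enemy)
```

In this toy table, `B` lost twice to `A` and once to `C`, so its worst enemy is `A`.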

In [None]:
# Only teams which have lost a game; 'Draw' is a placeholder, not a team
all_teams = [team for team in games['Loser'].unique() if team != 'Draw']
enemys_df = pd.DataFrame('', index = all_teams, columns = ['Worst enemy'])
for country in all_teams:
    # .loc[row, column] writes into enemys_df itself rather than into a temporary copy
    enemys_df.loc[country, 'Worst enemy'] = games[games['Loser'] == country]['Winner'].value_counts().index[0]
enemys_df['Worst enemy'].value_counts().head(10)

If we exclude Padania (North of Italy), which doesn't correspond to a country, the South Korean team is the worst enemy of 8 other countries! Let's see the worst enemy of each team that was once #1 in the ELO ranking:

In [None]:
enemys_df.loc[best_elos.index]

We can see that some "hate" between teams is highlighted here, such as the one between France and Belgium, or between Brazil and Argentina!

To conclude this study, let's look at the longest winning streaks for each team in the dataset, starting with strict winning streaks (i.e. no draws):
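
The streak computation relies on a common pandas idiom: compare each value with the previous one (`shift`), then take a cumulative sum to label consecutive runs. Below is a minimal sketch on a hypothetical sequence of results; `longest_streak` is an illustrative helper, not a function from the notebook.

```python
import pandas as pd

def longest_streak(flags):
    """Length of the longest run of consecutive True values in a boolean Series."""
    flags = flags.astype(bool)
    # A new run starts whenever the flag differs from the previous one
    run_id = (flags != flags.shift()).cumsum()
    # Summing booleans inside a run gives its length for True runs and 0 for False runs
    run_lengths = flags.groupby(run_id).sum()
    return int(run_lengths.max()) if len(run_lengths) else 0

# Hypothetical sequence of results for one team, in chronological order
won = pd.Series([True, True, False, True, True, True, False])
print(longest_streak(won))
```

Passing a "did not lose" flag instead of a "won" flag gives the unbeaten-streak variant with the same helper.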

In [None]:
#We exclude teams that haven't won a single game
win_streaks = pd.DataFrame('', index = games['Winner'].unique()[1:], columns = ['Longest streak', 'Start of the streak', 'End of the streak', 'End of streak opponent'])
for team in games['Winner'].unique()[1:]:
    # .copy() so the helper columns are added to an independent frame, not to a slice of games
    team_games = games[(games['home_team'] == team) | (games['away_team'] == team)].copy()
    team_games['won'] = (team_games['Winner'] == team).astype(int)
    team_games['series'] = (team_games['won'] != team_games['won'].shift()).cumsum()
    team_games['streak'] = team_games.groupby(['won', 'series']).cumcount() + 1
    team_games.loc[team_games['won'] == 0, 'streak'] = 0
    #Find longest streak
    win_streaks.loc[team, 'Longest streak'] = team_games['streak'].max()
    last_win = team_games.loc[team_games['streak'].idxmax()]
    win_streaks.loc[team, 'Start of the streak'] = str(team_games.loc[(team_games['series'] == last_win['series']) & (team_games['streak'] == 1),'Date'].values[0])[:10]
    if team_games.loc[(team_games['series'] == last_win['series'] + 1) & (team_games['streak'] == 0),'Date'].values.size == 0:
        win_streaks.loc[team, 'End of the streak'] = 'Currently on streak'
        win_streaks.loc[team, 'End of streak opponent'] = 'NA'
    else:
        win_streaks.loc[team, 'End of the streak'] = str(team_games.loc[(team_games['series'] == last_win['series'] + 1) & (team_games['streak'] == 0),'Date'].values[0])[:10]
        win_streaks.loc[team, 'End of streak opponent'] = team_games.loc[(team_games['series'] == last_win['series'] + 1) & (team_games['streak'] == 0),'Winner'].values[0]

In [None]:
win_streaks.sort_values(by= 'Longest streak', ascending = False).head(10)

Congratulations to Mauritius, with 17 wins in a row from 1947 to 1955. Let's also note the impressive series of Spain, France and Brazil, at a time when each of these teams dominated world football. At a local level, Padania (North of Italy) also had a very long streak from 2008 to 2014.

However, this table doesn't really show which teams were the best, since it includes local teams (Padania, Guyana) and teams winning a lot of games at a regional scale (Mauritius, Australia, ...). Let's now look at the invincibility streaks, to see if we can get some interesting insights:

In [None]:
invincibility_streaks = pd.DataFrame('', index = games['Winner'].unique()[1:], columns = ['Longest streak', 'Start of the streak', 'End of the streak', 'End of streak opponent'])
for team in games['Winner'].unique()[1:]:
    # .copy() so the helper columns are added to an independent frame, not to a slice of games
    team_games = games[(games['home_team'] == team) | (games['away_team'] == team)].copy()
    #Only the following condition changes: a game counts unless the team lost it
    team_games['won'] = (team_games['Loser'] != team).astype(int)
    team_games['series'] = (team_games['won'] != team_games['won'].shift()).cumsum()
    team_games['streak'] = team_games.groupby(['won', 'series']).cumcount() + 1
    team_games.loc[team_games['won'] == 0, 'streak'] = 0
    #Find longest streak
    invincibility_streaks.loc[team, 'Longest streak'] = team_games['streak'].max()
    last_win = team_games.loc[team_games['streak'].idxmax()]
    invincibility_streaks.loc[team, 'Start of the streak'] = str(team_games.loc[(team_games['series'] == last_win['series']) & (team_games['streak'] == 1),'Date'].values[0])[:10]
    if team_games.loc[(team_games['series'] == last_win['series'] + 1) & (team_games['streak'] == 0),'Date'].values.size == 0:
        invincibility_streaks.loc[team, 'End of the streak'] = 'Currently on streak'
        invincibility_streaks.loc[team, 'End of streak opponent'] = 'NA'
    else:
        invincibility_streaks.loc[team, 'End of the streak'] = str(team_games.loc[(team_games['series'] == last_win['series'] + 1) & (team_games['streak'] == 0),'Date'].values[0])[:10]
        invincibility_streaks.loc[team, 'End of streak opponent'] = team_games.loc[(team_games['series'] == last_win['series'] + 1) & (team_games['streak'] == 0),'Winner'].values[0]

In [None]:
invincibility_streaks.sort_values(by= 'Longest streak', ascending = False).head(10)

This ranking represents the periods of domination of world or regional football more accurately. We see how Spain dominated football in the late 2000s (note that a European Championship took place during this streak!), and Brazil in the mid-90s (winning a World Cup during this streak). Finally, we can note that the streak-killer team is rarely a top team, but rather an intermediate one.

**Italy might break the record if it doesn't lose its 3 next games (which include the Euro final!)**

We can show these streaks on a map:

In [None]:
invincibility_streaks['Country'] = invincibility_streaks.index
invincibility_streaks['Code'] = invincibility_streaks['Country'].apply(return_country_code)
invincibility_streaks.head()

In [None]:
invincibility_streaks['text'] = invincibility_streaks['Start of the streak'] + ' - ' + invincibility_streaks['End of the streak']
data=dict(
    type = 'choropleth',
    locations = invincibility_streaks['Code'],
    z = invincibility_streaks['Longest streak'],
    text = invincibility_streaks['text'],
    colorscale = 'YlOrRd',
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = '#games during longest streak',
)

layout = dict(title_text='The Longest Streaks of Invincibility per country',
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type='equirectangular'
    ))

fig = go.Figure(data = [data], layout = layout)
iplot(fig)

That's it for this notebook, I plan to add new graphs and stats as soon as I have more ideas !!