## Challenge

As mentioned in the Compass, we will be using data from international football (soccer) matches that took place between 1872 and 2019 (148 years); the file loaded below also contains a few matches from January 2020. You can download the dataset from [**this link**](https://drive.google.com/file/d/1cCn5botBKzh1XZOvrxpcLle-Ua7Fh9BR/view?usp=sharing) and find more information about it on [**Kaggle**](https://www.kaggle.com/martj42/international-football-results-from-1872-to-2017).

We need to make sure we understand every variable and the information it stores before we start working on the tasks. Understanding the dataset is essential for creating meaningful visualizations.

> #### Instruction
> Use visualizations to answer the following questions. Try different Python packages.

In [96]:
import pandas as pd
import numpy as np
import plotly.express as px
from datetime import date
import datetime as dt

In [43]:
df = pd.read_csv("data/results.csv")
df

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
0,1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1,1873-03-08,England,Scotland,4,2,Friendly,London,England,False
2,1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False
3,1875-03-06,England,Scotland,2,2,Friendly,London,England,False
4,1876-03-04,Scotland,England,3,0,Friendly,Glasgow,Scotland,False
...,...,...,...,...,...,...,...,...,...
41581,2020-01-10,Barbados,Canada,1,4,Friendly,Irvine,United States,True
41582,2020-01-12,Kosovo,Sweden,0,1,Friendly,Doha,Qatar,True
41583,2020-01-15,Canada,Iceland,0,1,Friendly,Irvine,United States,True
41584,2020-01-19,El Salvador,Iceland,0,1,Friendly,Carson,United States,True
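Before starting on the tasks, a quick overview of column types, missing values, and distinct values can help with understanding the variables. The following is a minimal sketch, not part of the original notebook; `sample` is a hypothetical three-row stand-in that only borrows the column names of `data/results.csv`.

```python
import pandas as pd

# Hypothetical three-row stand-in for data/results.csv (same column names).
sample = pd.DataFrame({
    "date": ["1872-11-30", "1873-03-08", "1930-07-13"],
    "home_team": ["Scotland", "England", "France"],
    "away_team": ["England", "Scotland", "Mexico"],
    "home_score": [0, 4, 4],
    "away_score": [0, 2, 1],
    "tournament": ["Friendly", "Friendly", "FIFA World Cup"],
    "city": ["Glasgow", "London", "Montevideo"],
    "country": ["Scotland", "England", "Uruguay"],
    "neutral": [False, False, True],
})

# One row per column: dtype, number of missing values, number of distinct values.
overview = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "missing": sample.isna().sum(),
    "unique": sample.nunique(),
})
print(overview)
```

Running the same three aggregations on `df` should show whether any column has gaps and how many distinct tournaments and teams the file contains.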


## Task
Which teams scored the largest number of goals in the FIFA World Cup?

In [63]:
# keep every tournament whose name contains "FIFA"
# (this includes World Cup qualification matches, not only the finals)
# .copy() makes fifa an independent frame, so adding a column does not
# trigger pandas' SettingWithCopyWarning
fifa = df[df.tournament.str.contains("FIFA")].copy()
fifa['total_score'] = fifa['home_score'] + fifa['away_score']
fifa


Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral,total_score
1304,1930-07-13,Belgium,United States,0,3,FIFA World Cup,Montevideo,Uruguay,True,3
1305,1930-07-13,France,Mexico,4,1,FIFA World Cup,Montevideo,Uruguay,True,5
1306,1930-07-14,Brazil,Yugoslavia,1,2,FIFA World Cup,Montevideo,Uruguay,True,3
1307,1930-07-14,Peru,Romania,1,3,FIFA World Cup,Montevideo,Uruguay,True,4
1308,1930-07-15,Argentina,France,1,0,FIFA World Cup,Montevideo,Uruguay,True,1
...,...,...,...,...,...,...,...,...,...,...
41552,2019-11-19,Kyrgyzstan,Tajikistan,1,1,FIFA World Cup qualification,Bishkek,Kyrgyzstan,False,2
41553,2019-11-19,Vietnam,Thailand,0,0,FIFA World Cup qualification,Hanoi,Vietnam,False,0
41554,2019-11-19,Malaysia,Indonesia,2,0,FIFA World Cup qualification,Kuala Lumpur,Malaysia,False,2
41555,2019-11-19,Turkmenistan,Sri Lanka,2,0,FIFA World Cup qualification,Ashgabat,Turkmenistan,False,2
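The last rows of the output show that the `contains("FIFA")` filter also keeps `FIFA World Cup qualification` matches. If the question is meant to cover only the final tournament, an exact match on the tournament name is one option. This is a sketch under that assumption; `sample`, `with_qualifiers`, and `finals_only` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in rows using the notebook's column names.
sample = pd.DataFrame({
    "home_team": ["France", "Vietnam", "Scotland"],
    "away_team": ["Mexico", "Thailand", "England"],
    "home_score": [4, 0, 0],
    "away_score": [1, 0, 0],
    "tournament": ["FIFA World Cup", "FIFA World Cup qualification", "Friendly"],
})

# Substring match keeps both the finals and the qualifiers.
with_qualifiers = sample[sample["tournament"].str.contains("FIFA")].copy()

# Exact match keeps only matches played at the final tournament.
finals_only = sample[sample["tournament"] == "FIFA World Cup"].copy()

# Each subset is an independent copy, so adding a column is safe.
finals_only["total_score"] = finals_only["home_score"] + finals_only["away_score"]
print(len(with_qualifiers), len(finals_only))
```

Which of the two subsets is the right one depends on how the task is read; the cells below continue with the broader filter.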


In [86]:
# sum goals scored as home team and as away team
home_sum = fifa.groupby(by='home_team')['home_score'].sum().to_frame()
away_sum = fifa.groupby(by='away_team')['away_score'].sum().to_frame()


# concat and clean: a team missing from one side gets 0 goals there,
# and both columns are cast back to int (NaN turns them into floats)
fifa_scores = pd.concat([home_sum, away_sum], axis=1)
fifa_scores = fifa_scores.fillna(0).astype(int)


# total
fifa_scores['total_score'] = fifa_scores['home_score'] + fifa_scores['away_score']

# sort descending and keep the 20 highest-scoring teams
fifa_scores = fifa_scores.sort_values(by='total_score', ascending=False)

fifa_graph = fifa_scores.head(20)

# sort ascending so the bars grow from left to right
fifa_graph = fifa_graph.sort_values(by='total_score')
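When the home/away split is not needed, the totals can also be computed by stacking both sides of every match into one long table with a `team` and a `goals` column. This is an alternative sketch, not the approach used in the notebook; `sample`, `home`, `away`, and `goals` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in rows using the notebook's column names.
sample = pd.DataFrame({
    "home_team": ["France", "Brazil", "Mexico"],
    "away_team": ["Mexico", "France", "Brazil"],
    "home_score": [4, 1, 0],
    "away_score": [1, 2, 3],
})

# Give both sides of a match the same column names so they can be stacked.
home = sample[["home_team", "home_score"]].set_axis(["team", "goals"], axis=1)
away = sample[["away_team", "away_score"]].set_axis(["team", "goals"], axis=1)

# One row per (match, team), then sum the goals for each team.
goals = (
    pd.concat([home, away], ignore_index=True)
    .groupby("team")["goals"].sum()
    .sort_values(ascending=False)
)
print(goals)
```

The long format avoids the `fillna` step, because a team that only ever played away simply has no home rows. The trade-off is that the stacked bar chart below needs the separate home and away columns.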


In [87]:
import plotly.graph_objects as go

# stacked bars: goals scored as home team and as away team
fig = go.Figure(data=[
    go.Bar(name='home_score', x=fifa_graph.index, y=fifa_graph['home_score']),
    go.Bar(name='away_score', x=fifa_graph.index, y=fifa_graph['away_score'])
])
# stack the two series so each bar's height is the team's total
fig.update_layout(
    barmode='stack',
    title='Top 20 teams by goals in FIFA matches (World Cup and qualification)'
)
fig.show()

## Task
What is the number of matches played in each tournament throughout history?

In [100]:
df.head()
df.dtypes

date          object
home_team     object
away_team     object
home_score     int64
away_score     int64
tournament    object
city          object
country       object
neutral         bool
dtype: object

In [105]:
df['date'] = pd.to_datetime(df.date)

In [132]:
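# group by (year, tournament); count() returns the number of non-null values
# in every remaining column, so each column holds the match count
tournament_year = df.groupby([df.date.dt.year, df.tournament]).count()

# move tournament out of the index so it can be used for color; the year stays as index
tournament_year = tournament_year.reset_index(level='tournament')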


In [133]:
# x is the year (index); the "date" column holds the number of matches after count()
fig = px.line(tournament_year, x=tournament_year.index, y="date", color='tournament')
fig.show()
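The line chart shows matches per tournament per year. If the question is read as one total per tournament over the whole history, `value_counts` gives that directly. A minimal sketch, assuming that reading; `sample` and `matches_per_tournament` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in for the tournament column of the notebook's df.
sample = pd.DataFrame({
    "tournament": [
        "Friendly", "Friendly", "FIFA World Cup",
        "Friendly", "FIFA World Cup qualification", "FIFA World Cup",
    ],
})

# Count rows per tournament and turn the result into a tidy two-column frame.
matches_per_tournament = (
    sample["tournament"].value_counts()
    .rename_axis("tournament")
    .reset_index(name="matches")
)
print(matches_per_tournament)
```

The resulting frame has one row per tournament, sorted from most to fewest matches, and can be passed to `px.bar` with `x="tournament"` and `y="matches"`.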

## Task 
Show the trend in number of matches per year.

In [138]:
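# matches per year: after count(), every column holds the number of rows for that year
year = df.groupby(df.date.dt.year).count()

# any fully populated column works as the count; "city" is used here
fig = px.line(year, x=year.index, y="city")
fig.show()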
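Yearly counts can be noisy, so a rolling mean is one way to make the long-term trend easier to read. The sketch below is an illustration only: `sample` is hypothetical, and the window of 2 is chosen to suit the tiny example (a wider window, for example 5 or 10 years, is probably more appropriate for the full dataset).

```python
import pandas as pd

# Hypothetical stand-in for the parsed date column of the notebook's df.
sample = pd.DataFrame({
    "date": pd.to_datetime([
        "2000-03-01", "2000-06-10", "2001-05-05",
        "2002-01-15", "2002-06-01", "2002-11-20",
    ]),
})

# size() counts rows per year directly, without relying on a particular column.
per_year = sample.groupby(sample["date"].dt.year).size().rename("matches")

# Average over the current and previous year; min_periods=1 keeps the first year.
smoothed = per_year.rolling(window=2, min_periods=1).mean().rename("rolling_mean")

trend = pd.concat([per_year, smoothed], axis=1)
print(trend)
```

Plotting both columns of `trend` on the same axes shows the raw counts next to the smoothed curve.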

In [139]:
df.head()

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
0,1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1,1873-03-08,England,Scotland,4,2,Friendly,London,England,False
2,1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False
3,1875-03-06,England,Scotland,2,2,Friendly,London,England,False
4,1876-03-04,Scotland,England,3,0,Friendly,Glasgow,Scotland,False


## Task
Which teams are the most successful ones? (winning percentage)

In [157]:
# Check who won
df['winner'] = "0"

# identify winner
df.loc[df.home_score > df.away_score, 'winner'] = df.home_team
df.loc[df.home_score < df.away_score, 'winner'] = df.away_team
df.loc[df.home_score == df.away_score, 'winner'] = "draw"

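# count wins per team, ignoring draws
# (rename keeps the column called 'winner' on pandas 2.x, where value_counts() names it 'count')
team_wins = df[df.winner != 'draw']['winner'].value_counts().rename('winner').to_frame()
# note: this is each team's share of all decided matches, not wins divided by matches played
team_wins['percentage'] = team_wins['winner'] / team_wins['winner'].sum()
sort_wins = team_wins.sort_values(by='percentage', ascending=False).head(20)
sort_wins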




Unnamed: 0,winner,percentage
Brazil,625,0.019526
England,572,0.017871
Germany,555,0.017339
Argentina,526,0.016433
Sweden,500,0.015621
South Korea,454,0.014184
Mexico,439,0.013715
Hungary,434,0.013559
Italy,423,0.013215
France,417,0.013028


In [158]:

fig = px.bar(sort_wins, x=sort_wins.index, y='percentage')
fig.show()

## Task
Which teams are the least successful ones? (winning percentage)

In [159]:
# teams with the fewest wins; teams that never won a match are absent from team_wins
reverse_wins = team_wins.sort_values(by='percentage').head(20)
reverse_wins

Unnamed: 0,winner,percentage
Micronesia,1,3.1e-05
Romani people,1,3.1e-05
Raetia,1,3.1e-05
Gozo,1,3.1e-05
Chagos Islands,1,3.1e-05
Two Sicilies,1,3.1e-05
Saarland,1,3.1e-05
San Marino,1,3.1e-05
Surrey,1,3.1e-05
Republic of St. Pauli,1,3.1e-05


In [160]:
fig = px.bar(reverse_wins, x=reverse_wins.index, y='percentage')
fig.show()

## Task
Which months throughout history had the most matches? Is it June, July, or another month? Does the number of matches change from month to month?

In [None]:
# count values of months without year, plot values
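One way to fill in this cell is to count matches by calendar month, ignoring the year, and keep all twelve months in order even if some have no matches. This is a sketch of one possible approach, not the notebook's own solution; `sample` and `per_month` are hypothetical names.

```python
import calendar

import pandas as pd

# Hypothetical stand-in for the parsed date column of the notebook's df.
sample = pd.DataFrame({
    "date": pd.to_datetime([
        "1930-07-13", "1954-06-16", "1998-06-10",
        "2006-06-09", "2014-07-13", "2019-11-19",
    ]),
})

# Count matches per month number, then force the order 1..12 with zeros for gaps.
per_month = (
    sample["date"].dt.month.value_counts()
    .reindex(range(1, 13), fill_value=0)
)

# Replace month numbers with short names for nicer axis labels.
per_month.index = [calendar.month_abbr[m] for m in per_month.index]
print(per_month)
```

A bar chart of `per_month` keeps the months in calendar order, which makes seasonal patterns easier to spot than a chart sorted by count.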

## Task
Which teams played against each other the most?

In [None]:
# count occurrences of each team pairing, treating A vs B and B vs A as the same pair
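A possible way to fill in this cell is to build an order-independent key for every match by sorting the two team names, and then count the keys. This is a sketch under that assumption; `sample`, `pairs`, and `pair_counts` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in rows using the notebook's column names.
sample = pd.DataFrame({
    "home_team": ["Scotland", "England", "Scotland", "Brazil"],
    "away_team": ["England", "Scotland", "England", "Argentina"],
})

# Sort each pair alphabetically so "Scotland vs England" and
# "England vs Scotland" map to the same key.
pairs = sample[["home_team", "away_team"]].apply(
    lambda row: " vs ".join(sorted(row)), axis=1
)

# Most frequent pairings first.
pair_counts = pairs.value_counts()
print(pair_counts.head())
```

Row-wise `apply` is simple to read but not the fastest option; for the full dataset a vectorized sort of the two columns would likely run quicker.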

## Task
Apply your creativity to show some additional insights from the data.

## Task (Stretch)
Create these graphs in Tableau as well.