# Football Transfer Analysis

In this notebook I do a quick analysis of the top 250 football transfers from 2000 to 2018 using pandas and matplotlib.

You can also find a Tableau visualization [here](https://public.tableau.com/profile/mauricio3833#!/vizhome/TopFootballTransfers2000-2018/Historia1).

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import squarify

%matplotlib inline

In [None]:
df = pd.read_csv("../input/top-250-football-transfers-from-2000-to-2018/top250-00-19.csv")

### Clean data

This dataset is quite clean: only one column, 'Market_value', has null values (more than a thousand of them). I decided not to use this column, so I dropped it.

I also renamed some of the 'Position' values, because the same position appeared under different names and in two different formats (with spaces and with "-").

In [None]:
df.info()

In [None]:
df.drop('Market_value', axis=1, inplace=True)
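
As a small illustration of the check behind this decision, the sketch below counts missing values per column and drops every column that has any. The `sample` frame and its values are made up for the example and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature of the transfers table; values are invented for illustration
sample = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Market_value": [None, 5000000.0, None],
    "Transfer_fee": [10000000, 7000000, 3000000],
})

# Count missing values per column
null_counts = sample.isnull().sum()

# Drop every column that contains at least one missing value
cleaned = sample.drop(columns=null_counts[null_counts > 0].index)
```

On the real data, `df.isnull().sum()` gives the same per-column view as `df.info()`, but as a Series that can be filtered.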

In [None]:
df['Position'].unique()

In [None]:
# Map alias labels and hyphenated labels to a single name per position
position_names = {
    'Sweeper': 'Defensive Midfield',
    'Defender': 'Defensive Midfield',
    'Midfielder': 'Defensive Midfield',
    'Forward': 'Centre Forward',
    'Centre-Forward': 'Centre Forward',
    'Centre-Back': 'Centre Back',
    'Left-Back': 'Left Back',
    'Right-Back': 'Right Back',
}
df['Position'] = df['Position'].replace(position_names)
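
The two kinds of fix described above (unifying the format and merging aliases) can also be done in two separate steps. This is only a sketch on a hypothetical list of labels. It assumes every hyphen in a label can safely become a space, which may not hold for all values in the real column.

```python
import pandas as pd

# Hypothetical sample of raw 'Position' labels
positions = pd.Series(
    ["Centre-Forward", "Forward", "Left-Back", "Right Winger", "Centre-Back"]
)

# Step 1: unify the format by turning hyphens into spaces
normalized = positions.str.replace("-", " ", regex=False)

# Step 2: merge aliases into one label (this mapping is illustrative only)
aliases = {"Forward": "Centre Forward"}
normalized = normalized.replace(aliases)
```

`Series.replace` with a dict only swaps values that match a key exactly, so "Centre Forward" is left alone by the "Forward" entry.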

I also replaced 'Spurs' with 'Tottenham', because the former is just an alias for the same club.

In [None]:
df['Team_to'] = df['Team_to'].replace(['Spurs'], 'Tottenham')

## Visualization

First, let's find which player position appears most often among the transfers, using a treemap built with the [Squarify](https://github.com/laserson/squarify) library.

In [None]:
position = df['Position'].value_counts().rename_axis('Position').reset_index(name='counts')

sizes = position['counts']
names = position['Position']
color = ['b','m','g','y','c','r','w']
plt.figure(figsize = (12,8), dpi=80)

squarify.plot(sizes = sizes, label = names, alpha = 0.5, color = color)
plt.axis('off')
plt.title('Treemap by Player Positions', fontsize = 16)
plt.show()
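
The rectangle sizes in the treemap are simply the number of transfers per position. The sketch below reproduces that counting step on a hypothetical four-row frame and adds each position's share of the total, which is an extra column not used in the notebook.

```python
import pandas as pd

# Hypothetical positions, invented for illustration
sample = pd.DataFrame(
    {"Position": ["Centre Forward", "Centre Back", "Centre Forward", "Left Back"]}
)

# Same counting step as in the treemap cell: one row per position, most frequent first
position = (
    sample["Position"]
    .value_counts()
    .rename_axis("Position")
    .reset_index(name="counts")
)

# Fraction of all transfers that each position accounts for
position["share"] = position["counts"] / position["counts"].sum()
```

Note that this measures how often a position is transferred, not how much money is spent on it.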

Now that we know the most wanted players are Centre Forwards, let's look at the top 10 transfers and at which leagues spent the most money.

In [None]:
df.sort_values('Transfer_fee', ascending = False).head(10)

In [None]:
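# Aggregation names are passed as strings; recent pandas versions warn when given the builtins min/max/sum or np.mean
grouped_league_to = df.groupby('League_to')['Transfer_fee'].aggregate(['min', 'mean', 'max', 'sum']).sort_values('sum', ascending = False).reset_index().head(10)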
grouped_league_to
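
To make the shape of this summary table concrete, here is a minimal sketch of the same group-aggregate-sort pattern on a toy frame. The league names and fees are hypothetical.

```python
import pandas as pd

# Hypothetical transfers, invented for illustration
sample = pd.DataFrame({
    "League_to": ["League A", "League A", "League B"],
    "Transfer_fee": [10000000, 30000000, 5000000],
})

# One row per league with min, mean, max and total fee, largest total first
summary = (
    sample.groupby("League_to")["Transfer_fee"]
    .agg(["min", "mean", "max", "sum"])
    .sort_values("sum", ascending=False)
    .reset_index()
)
```

Each string in the list becomes a column name, which is why the result can be sorted by `'sum'`.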

In [None]:
sns.set_style("darkgrid")


fig, ax = plt.subplots(figsize = (10,5))
(grouped_league_to['sum']/1000000).sort_values(ascending = False).plot(kind = 'bar', color = 'magenta')
plt.xticks(range(len(grouped_league_to['League_to'])), grouped_league_to['League_to'], rotation = 45)
plt.ylabel('Million EUR', fontsize = 12)
plt.title('Total Transfer Fee', fontsize = 14)
           
plt.show()

The Premier League spent the most money over the whole period from 2000 to 2018, so let's see how its spending evolved over time.

In [None]:
premier = df['League_to'] == 'Premier League'
df_premier = df[premier]
df_premier.sort_values('Transfer_fee', ascending = False).head()

In [None]:
grouped_df_premier = df_premier.groupby('Season')['Transfer_fee'].aggregate(['sum']).reset_index()

fig, ax = plt.subplots(figsize = (15,7))
(grouped_df_premier['sum']/1000000).plot(color = 'red')
(grouped_df_premier['sum']/1000000).plot(kind = 'bar', color = 'yellow', alpha = 0.5)
plt.xticks(range(len(grouped_df_premier['Season'])),grouped_df_premier['Season'], rotation = 45)
plt.ylabel('Million EUR', fontsize = 12)
plt.xlabel('Season', fontsize = 12)
plt.title('Premier League Market Behavior', fontsize = 14)

plt.show()

Finally, let's take a look at the top 5 teams in the Premier League by total transfer fees, and at how their spending changed over time.

In [None]:
top_5_teams = df_premier.groupby('Team_to')['Transfer_fee'].aggregate('sum').sort_values(ascending = False).reset_index().head(5)
top_5_teams

In [None]:
fig, ax = plt.subplots(figsize = (10,7))

# Divide by 1,000,000 so the values match the 'Million EUR' axis label
(top_5_teams['Transfer_fee']/1000000).plot(kind = 'bar', color = 'purple', alpha = 0.8)
plt.xticks(range(len(top_5_teams['Team_to'])), top_5_teams['Team_to'], rotation = 'horizontal')
plt.ylabel('Million EUR', fontsize = 12)
plt.title('Total Transfer Fee of the Top 5 Premier League Teams', fontsize = 14)

plt.show()

In [None]:
fig, ax = plt.subplots(figsize = (15,7))

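# Use the same season axis for every team, so the lines stay aligned with the
# x tick labels even when a team has no transfer in a given season
seasons = grouped_df_premier['Season']

for team in top_5_teams['Team_to']:
    df_top_team = df_premier[df_premier['Team_to'] == team]
    # Total fee per season; seasons without transfers are filled with 0
    season_sum = df_top_team.groupby('Season')['Transfer_fee'].sum().reindex(seasons, fill_value = 0)
    ax.plot(range(len(seasons)), season_sum.values/1000000, label = team)

ax.legend()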

plt.xticks(range(len(grouped_df_premier['Season'])),grouped_df_premier['Season'], rotation = 45)
plt.ylabel('Million EUR', fontsize = 12)
plt.xlabel('Season', fontsize = 12)
plt.title('Top 5 Premier League Buyers over seasons', fontsize = 14)    
    
plt.show()
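
An alternative to looping over the teams is to build one season-by-team table with `pivot_table` and plot its columns. The sketch below shows only the table-building step on hypothetical data. The team names, season labels, and fees are invented for illustration.

```python
import pandas as pd

# Hypothetical transfers, invented for illustration
sample = pd.DataFrame({
    "Team_to": ["Team A", "Team A", "Team B"],
    "Season": ["2000-2001", "2001-2002", "2001-2002"],
    "Transfer_fee": [10000000, 20000000, 5000000],
})

# Rows are seasons, columns are teams, cells are total fees;
# seasons in which a team bought nobody are filled with 0
season_by_team = sample.pivot_table(
    index="Season",
    columns="Team_to",
    values="Transfer_fee",
    aggfunc="sum",
    fill_value=0,
)
```

Because every team shares the same season index, calling `season_by_team.plot()` would draw one aligned line per team.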