# Analysis of Minutes Played in Premier League by Country

I want to understand which countries contribute the most minutes played in the top soccer leagues in the world: the English Premier League, the Spanish La Liga, the German Bundesliga, the Italian Serie A, and the French Ligue 1. These five leagues are widely regarded as the pinnacle of club football. Football is the world's game, and each country brings its own cultural approach to playing it. Brazilian players are known for playing with flair, German players are said to play with efficiency, and Spanish players are often described as small and technical. Are certain countries producing players who contribute more to the sporting success of top-division clubs than others?

I've chosen minutes played as the marker of sporting success because when a coach picks a player for the squad, it means the coach believes that player is contributing to the team's success. Other statistics, such as goals and assists, are not evenly distributed across positions: keepers are lucky to score a single goal over the span of their career, while top strikers are regularly expected to score 20+ in a season.

There are a few issues I anticipate in this analysis:

1) In the English league, I expect English players to contribute the most minutes each year, and likewise for Germans in the German league, and so on. Each league has a limit on foreign players; in England specifically, foreign players coming to the U.K. without a visa must have been capped for a national team. It is much more difficult for a foreign player to come from overseas and find success in the EU because, due to the caps on foreign-player roster spots, they must be better value than the local talent available. Additionally, clubs recruit local talent to play in their youth academies. These kids spend years developing at the club, and once they graduate, whether or not they prove to be massive successes, they make up the bulk of the players in the league of their home country.

2) Population size vs minutes played. You would expect a larger country like Germany to produce more top level players than a smaller country like the Netherlands. However, this would not necessarily mean German players are superior. How many players per capita are playing in top divisions?
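
To make the per-capita question concrete, here is a minimal sketch of the normalization. The minutes values are placeholders, the populations are rough round figures, and the `population_millions` and `min_per_million` column names are assumptions for illustration; none of these come from the dataset.

```python
import pandas as pd

# Hypothetical inputs: minutes are placeholders and populations are rough
# round figures in millions, used only to illustrate the calculation.
toy = pd.DataFrame({
    "Nation": ["Germany", "Netherlands"],
    "Min": [30000.0, 24000.0],
    "population_millions": [83.0, 17.0],
})

# Normalize minutes by population so large and small countries are comparable.
toy["min_per_million"] = toy["Min"] / toy["population_millions"]
per_capita = toy.sort_values("min_per_million", ascending=False)
print(per_capita[["Nation", "min_per_million"]])
```

With these placeholder numbers the smaller country ranks first once population is taken into account, which is the kind of reversal the raw totals would hide.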



In [1]:
import pandas as pd
import plotly
import plotly.graph_objs as go
import plotly.express as ps
import json

import plotly.offline as offline
from plotly.graph_objs import *
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)

In [2]:
seasons = pd.read_csv('Mins_EPL_1992-2019.csv')

In [3]:
seasons.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1640 entries, 0 to 1639
Data columns (total 6 columns):
Rk           1640 non-null int64
Nation       1640 non-null object
# Players    1640 non-null int64
Min          1627 non-null float64
List         1640 non-null object
Year         1640 non-null object
dtypes: float64(1), int64(2), object(3)
memory usage: 77.0+ KB


In [4]:
seasons['Min'].value_counts()

3420.0     5
90.0       5
360.0      4
900.0      4
79.0       3
          ..
49835.0    1
475.0      1
4707.0     1
1404.0     1
8112.0     1
Name: Min, Length: 1505, dtype: int64
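
The `info()` output above shows 1627 non-null values out of 1640 in `Min`, so 13 rows have no recorded minutes. Before filling them, it may be worth looking at which rows they are. The sketch below uses a small toy frame as a stand-in for `seasons`; the rows in it are invented for illustration.

```python
import pandas as pd

# Toy stand-in for `seasons`; these rows are invented, not taken from the CSV.
toy = pd.DataFrame({
    "Nation": ["eng England", "fr France", "mt Malta"],
    "# Players": [188, 31, 1],
    "Min": [120738.0, 24181.0, None],
})

missing = toy[toy["Min"].isna()]           # rows with no recorded minutes
n_missing = int(toy["Min"].isna().sum())   # how many such rows there are
print(n_missing)
print(missing[["Nation", "# Players"]])
```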

In [5]:
seasons['Min'] = seasons['Min'].fillna(0) # fill Na values under the minutes column with zeros

In [6]:
seasons = seasons.drop("List", axis=1) # drop list of player names, unnecessary for analysis

In [7]:
seasons

Unnamed: 0,Rk,Nation,# Players,Min,Year
0,1,eng England,188,120738.0,2019-2020
1,2,fr France,31,24181.0,2019-2020
2,3,es Spain,29,24379.0,2019-2020
3,4,ie Ireland,23,11952.0,2019-2020
4,5,br Brazil,21,19652.0,2019-2020
...,...,...,...,...,...
1635,24,il Israel,1,1404.0,1992-1993
1636,25,cz Czech Republic,1,1288.0,1992-1993
1637,26,kn Saint Kitts and Nevis,1,412.0,1992-1993
1638,27,ng Nigeria,1,111.0,1992-1993


In [8]:
def clean_nations(s):
    # Remove the abbreviated nation code in front of each name,
    # e.g. "eng England" -> "England". Split once on the first space
    # and keep everything after it.
    return s.str.split(' ', n=1).str[1]


seasons['Nation']=clean_nations(seasons['Nation'])

In [9]:
# Treat players listed under the Commonwealth of Independent States as Russian
seasons['Nation'] = seasons['Nation'].replace('Commonwealth of Independent States', 'Russia')

In [10]:
seasons.rename({'Year': 'Season'}, axis=1, inplace=True) # rename Year column to Season

In [11]:
def season_to_year(series):
    # Turn values in the Season column (stored as objects in yyyy-yyyy format)
    # into ints in yyyy format, using the first year of the season.
    # Useful for graphing later on.
    no_dash = series.str.split('-')
    counter = 0
    for i in no_dash:
        i = int(i[0])
        no_dash[counter]=i
        counter+=1
    return no_dash
                           
seasons['Year']=season_to_year(seasons['Season'])
seasons['Year']

0       2019
1       2019
2       2019
3       2019
4       2019
        ... 
1635    1992
1636    1992
1637    1992
1638    1992
1639    1992
Name: Year, Length: 1640, dtype: object
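
Note that the output above reports `dtype: object` even though the values are integers. A vectorized alternative, sketched here on a small hand-written series rather than on `seasons`, does the same conversion and should yield an integer dtype.

```python
import pandas as pd

# Small hand-written sample in the same yyyy-yyyy format as the Season column.
season = pd.Series(["2019-2020", "2018-2019", "1992-1993"])

# Take the text before the dash and cast to integer in one vectorized step.
year = season.str.split("-").str[0].astype(int)
print(year.tolist())
```

A numeric dtype tends to behave more predictably on plot axes than an object column holding Python ints.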

In [12]:
seasons = seasons[seasons['Season'] != '2019-2020'] #remove current season from dataset since it is still in progress
seasons = seasons.reset_index()
seasons

Unnamed: 0,index,Rk,Nation,# Players,Min,Season,Year
0,61,1,England,210,223314.0,2018-2019,2018
1,62,2,Spain,35,51487.0,2018-2019,2018
2,63,3,France,30,55568.0,2018-2019,2018
3,64,4,Brazil,19,35874.0,2018-2019,2018
4,65,5,Belgium,19,24522.0,2018-2019,2018
...,...,...,...,...,...,...,...
1574,1635,24,Israel,1,1404.0,1992-1993,1992
1575,1636,25,Czech Republic,1,1288.0,1992-1993,1992
1576,1637,26,Saint Kitts and Nevis,1,412.0,1992-1993,1992
1577,1638,27,Nigeria,1,111.0,1992-1993,1992


# Assumption #1 Proven True

In [13]:
seasons[['Min', 'Nation', 'Season']].sort_values(by='Min', ascending=False).head(50)

Unnamed: 0,Min,Nation,Season
1492,390191.0,England,1994-1995
1551,387505.0,England,1992-1993
1524,373875.0,England,1993-1994
1455,337577.0,England,1995-1996
1418,282784.0,England,1996-1997
1318,282555.0,England,1998-1999
1202,274381.0,England,2000-2001
693,271026.0,England,2008-2009
1373,270282.0,England,1997-1998
481,269031.0,England,2011-2012


In every season that this dataset covers, English players have contributed the most minutes to the Premier League. Not only are they on top, but no other country comes close. Further inspection shows that the majority of the minutes played in the EPL by non-English players come from the other U.K. nations (Scotland and Wales) and from Ireland.
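
One way to put a number on "not even close" is each nation's share of the minutes in a season. The sketch below uses five 2018-2019 rows copied from the table shown earlier; because the full season includes many more nations, the shares here are relative to this subset only and overstate the real ones.

```python
import pandas as pd

# Five 2018-2019 rows copied from the table above. The full season has more
# nations, so these shares are relative to this subset only.
toy = pd.DataFrame({
    "Nation": ["England", "Spain", "France", "Brazil", "Belgium"],
    "Min": [223314.0, 51487.0, 55568.0, 35874.0, 24522.0],
    "Season": ["2018-2019"] * 5,
})

# Each nation's minutes divided by the total for its season.
toy["share"] = toy["Min"] / toy.groupby("Season")["Min"].transform("sum")
top = toy.sort_values("share", ascending=False).iloc[0]
print(top["Nation"], round(top["share"], 3))
```

Applied to the full `seasons` frame, the same `groupby(...).transform("sum")` pattern would give per-season shares for every nation.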

# Examining the 27-Year Minutes Played Trends of English Players in the EPL

<h4>Hypothesis:</h4> The minutes of English players in the EPL have decreased in recent years due to the influx of money into the game, which makes it easier for clubs to spend on talent from across the globe than to develop local players in-house.

In [14]:
english = seasons[seasons['Nation']=='England']
english

Unnamed: 0,index,Rk,Nation,# Players,Min,Season,Year
0,61,1,England,210,223314.0,2018-2019,2018
67,128,1,England,219,247898.0,2017-2018,2017
136,197,1,England,236,241227.0,2016-2017,2016
202,263,1,England,233,223826.0,2015-2016,2015
270,331,1,England,229,261026.0,2014-2015,2014
336,397,1,England,167,240792.0,2013-2014,2013
406,467,1,England,175,234366.0,2012-2013,2012
481,542,1,England,188,269031.0,2011-2012,2011
553,614,1,England,188,259327.0,2010-2011,2010
621,682,1,England,174,244423.0,2009-2010,2009


In [15]:
fig = ps.scatter(english, x="Year", y="Min",
           hover_name="Season")  # linear x-axis: a log scale adds nothing for calendar years
fig.update_layout(
    title="Minutes Played Each Year by English Players in EPL")
fig.show()

Indeed, the number of minutes played by English players in the EPL appears to be on a downward trend, though not as drastic as I expected it would be. The Premier League was founded in 1992, replacing the previous iteration of the top division and focusing more on growing its business and increasing revenue. This increased spending power, strengthened by a TV deal, could explain the sharp decrease that took place between 1994 and 1996. Afterwards, the slope is more gradual, but it is still apparent that the minutes contributed by English-born players are on the downswing.
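
To quantify the trend rather than judge it by eye, a least-squares line can be fitted to the minutes. This is a sketch using only the ten seasons (2009 to 2018) visible in the `english` table above, so it says nothing about the earlier, steeper part of the curve.

```python
import numpy as np

# English minutes for the 2009-2018 seasons, copied from the `english` table above.
years = np.array([2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018])
minutes = np.array([244423, 259327, 269031, 234366, 240792,
                    261026, 223826, 241227, 247898, 223314], dtype=float)

# Center the years so the least-squares fit is well conditioned.
slope, intercept = np.polyfit(years - years.mean(), minutes, deg=1)
print(f"{slope:.0f} minutes per season")
```

The slope is in minutes per season; a negative value is consistent with the downward trend described above, although ten noisy points are a thin basis for any strong claim.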

I am curious to find out whether this trend is also visible in the other four top leagues in Europe.

# Mapping Countries' Minutes Contributions From 1992-2019

In [16]:
# The four U.K. nations plus Ireland (Ireland is not part of the U.K.)
uk_and_ireland = ['England', 'Scotland', 'Ireland', 'Wales', 'Northern Ireland']
non_english = seasons[~seasons['Nation'].isin(uk_and_ireland)]
non_english

Unnamed: 0,index,Rk,Nation,# Players,Min,Season,Year
1,62,2,Spain,35,51487.0,2018-2019,2018
2,63,3,France,30,55568.0,2018-2019,2018
3,64,4,Brazil,19,35874.0,2018-2019,2018
4,65,5,Belgium,19,24522.0,2018-2019,2018
6,67,7,Netherlands,18,24636.0,2018-2019,2018
...,...,...,...,...,...,...,...
1574,1635,24,Israel,1,1404.0,1992-1993,1992
1575,1636,25,Czech Republic,1,1288.0,1992-1993,1992
1576,1637,26,Saint Kitts and Nevis,1,412.0,1992-1993,1992
1577,1638,27,Nigeria,1,111.0,1992-1993,1992


I've elected to exclude players from the U.K. and Ireland in this analysis, as we know they are outliers.

In [21]:
with open('worldmap.json') as f:  # separate name for the file handle so it is not shadowed by the parsed data
    worldmap = json.load(f)

change_map = {}

for country in worldmap["features"]: #get country codes from JSON file
    change_map[country["properties"]["name"]] = country["id"]

codes = non_english['Nation'].map(change_map) #map country codes into new column in dataframe


# assign returns a new frame, which avoids SettingWithCopyWarning on the filtered slice
non_english = non_english.assign(Country_Codes=codes)
non_english = non_english.sort_values(by='Year')

In [18]:
non_english[non_english['Country_Codes'].isnull()].sort_values(by='Min', ascending=False).head(60)

Unnamed: 0,index,Rk,Nation,# Players,Min,Season,Year,Country_Codes
843,904,7,United States,11,22332.0,2006-2007,2006,
774,835,8,United States,12,21711.0,2007-2008,2007,
561,622,9,United States,10,16230.0,2010-2011,2010,
630,691,10,United States,9,15970.0,2009-2010,2009,
492,553,12,United States,8,14813.0,2011-2012,2011,
418,479,13,United States,8,13343.0,2012-2013,2012,
1043,1104,17,United States,7,13053.0,2003-2004,2003,
352,413,17,United States,7,12227.0,2013-2014,2013,
223,284,22,Serbia,5,12011.0,2015-2016,2015,
286,347,17,Serbia,6,11351.0,2014-2015,2014,


In [25]:
iso3_codes = {'United States':'USA','Serbia':'SRB','Congo DR':'COD','Congo':'COG',
              'Barbados':'BRB','China PR':'CHN', 'Montserrat':'MSR','Curaçao':'CUW',
              'Grenada': 'GRD', 'Antigua and Barbuda':'ATG'}

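# Fill in the ISO-3 code for each nation that the world map lookup missed
for key, value in iso3_codes.items():
    iso3 = non_english[non_english['Nation'] == key]
    iso3 = iso3.fillna(value)  # Country_Codes is the only column with missing values at this point
    non_english.update(iso3)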

non_english[non_english['Country_Codes'].isnull()].sort_values(by='Min', ascending=False).head(60)

Unnamed: 0,index,Rk,Nation,# Players,Min,Season,Year,Country_Codes
823,884.0,57.0,Malta,1.0,1930.0,2007-2008,2007,
1248,1309.0,47.0,Bermuda,1.0,1747.0,2000-2001,2000,
1136,1197.0,51.0,Bermuda,1.0,1315.0,2002-2003,2002,
1410,1471.0,38.0,North Macedonia,1.0,1144.0,1997-1998,1997,
1520,1581.0,29.0,Saint Kitts and Nevis,1.0,1126.0,1994-1995,1994,
900,961.0,64.0,Malta,1.0,900.0,2006-2007,2006,
470,531.0,65.0,North Macedonia,1.0,890.0,2012-2013,2012,
1254,1315.0,53.0,Yugoslavia,1.0,761.0,2000-2001,2000,
1576,1637.0,26.0,Saint Kitts and Nevis,1.0,412.0,1992-1993,1992,
762,823.0,70.0,Malta,1.0,260.0,2008-2009,2008,
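
The loop above appears to turn the integer columns into floats (note `884.0` and `57.0` in the output), most likely as a side effect of `DataFrame.update`. A possible alternative, sketched here on a toy frame, maps the manual lookup onto `Nation` and fills only the `Country_Codes` column. The `toy` and `manual_codes` names are stand-ins, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for `non_english`, with some codes already mapped and some missing.
toy = pd.DataFrame({
    "Nation": ["United States", "Serbia", "Malta", "France"],
    "Country_Codes": [None, None, None, "FRA"],
})
manual_codes = {"United States": "USA", "Serbia": "SRB"}

# Fill only the missing codes, and only in the Country_Codes column.
toy["Country_Codes"] = toy["Country_Codes"].fillna(toy["Nation"].map(manual_codes))

# Nations that still have no code after the manual lookup.
still_missing = toy.loc[toy["Country_Codes"].isna(), "Nation"].tolist()
print(still_missing)
```

Listing `still_missing` also makes it easy to see which nations, such as Malta or Bermuda in the output above, would still be left off the map.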


In [26]:
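# hover_name set to Nation so the tooltip title identifies the country; Min is already shown via color
fig = ps.choropleth(non_english, locations="Country_Codes", color="Min", hover_name="Nation", animation_frame="Year")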
fig.update_layout(
    title="Mapping Minutes Played Each Year by Country in the EPL")
fig.show()