## 2021: Week 13 - Premier League Statistics

Before Simon joined The Data School in the UK, he was a professional sporting performance analyst. Simon has reached into his previous professional life to come up with a football (read soccer) based challenge for this week. 

Simon is channelling his inner fanalyst to use data to understand more about the game that he enjoys. 

This week we want to create a data set that allows us to analyse 'Open Play Goals' scored. We will rank the players overall and by their position. 

### Input

5 csv files, all with a similar structure. There are a lot of columns in these data sets.

![img](https://1.bp.blogspot.com/-LgrltPPfiQI/YGGYbH3V12I/AAAAAAAACIw/9RiC8fLny98MIw2GjjagIg1DpZBfPAp7gCLcBGAsYHQ/w640-h206/Screenshot%2B2021-03-29%2Bat%2B10.05.25.png)

### Requirement
Open play goal scoring prowess in the Premier League 2015-2020
1. Input all the files
2. Remove all goalkeepers from the data set
3. Remove all records where appearances = 0	
4. In this challenge we are interested in the goals scored from open play
    - Create a new “Open Play Goals” field (the goals scored from open play is the number of goals scored that weren’t penalties or freekicks)
    - Note some players will have scored free kicks or penalties with their left or right foot
    - Be careful how Prep handles null fields! (have a look at those penalty and free kick fields) 
    - Rename the original Goals scored field to Total Goals Scored
5. Calculate the totals for each of the key metrics across the whole time period for each player, (be careful not to lose their position)
6. Create an open play goals per appearance field across the whole time period
7. Rank the players for the amount of open play goals scored across the whole time period, we are only interested in the top 20 (including those that are tied for position) – Output 1
8. Rank the players for the amount of open play goals scored across the whole time period by position, we are only interested in the top 20 (including those that are tied for position) – Output 2
9. Output the data – in your solution on twitter / the forums, state the name of the player who was the only non-forward to make it into the overall top 20 for open play goals scored

### Output

- Overall Rank
![img](https://1.bp.blogspot.com/-xc4fVyBVWO0/YGMhwxNq8LI/AAAAAAAACJA/lEK27Th2KfclGSFXaPXihdLtWUOLzuTPQCLcBGAsYHQ/w640-h126/Screenshot%2B2021-03-30%2Bat%2B14.03.36.png)

- Rank by Position
![img](https://1.bp.blogspot.com/-K97KWwGSF6A/YGMiYn0oLoI/AAAAAAAACJI/Ey8XzxlsS8E6YFVa0H7YTFHY_SPpkpLEACLcBGAsYHQ/w640-h122/Screenshot%2B2021-03-30%2Bat%2B14.06.08.png)

Two files:
1. Overall Rank
- 22 Rows (23 including headers)
- 10 Fields:
    - Open Play Goals
    - Goals with Right Foot
    - Goals with Left Foot 
    - Position
    - Appearances
    - Rank
    - Total Goals
    - Open Play Goals / Game
    - Headed Goals
    - Name
2. Rank by Position
    - 65 Rows (66 including headers)
    - 10 Fields : as per the first output file

In [78]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [79]:
pl_15_16 = pd.read_csv("./data/pl_15-16.csv")
pl_16_17 = pd.read_csv("./data/pl_16-17.csv")
pl_17_18 = pd.read_csv("./data/pl_17-18.csv")
pl_18_19 = pd.read_csv("./data/pl_18-19.csv")
pl_19_20 = pd.read_csv("./data/pl_19-20.csv")

In [80]:
df = pd.concat([pl_15_16,
                pl_16_17,
                pl_17_18,
                pl_18_19,
                pl_19_20], axis=0)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4247 entries, 0 to 973
Data columns (total 54 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Season                  4247 non-null   object 
 1   Name                    4247 non-null   object 
 2   Position                4247 non-null   object 
 3   Appearances             4247 non-null   int64  
 4   Clean sheets            1824 non-null   float64
 5   Goals conceded          1824 non-null   float64
 6   Tackles                 3768 non-null   float64
 7   Tackle success %        2883 non-null   object 
 8   Last man tackles        1345 non-null   float64
 9   Blocked shots           3768 non-null   float64
 10  Interceptions           3768 non-null   float64
 11  Clearances              3768 non-null   float64
 12  Headed Clearance        3768 non-null   float64
 13  Clearances off line     1345 non-null   float64
 14  Recoveries              2883 non-null   f

In [81]:
df.columns

Index(['Season', 'Name', 'Position', 'Appearances', 'Clean sheets',
       'Goals conceded', 'Tackles', 'Tackle success %', 'Last man tackles',
       'Blocked shots', 'Interceptions', 'Clearances', 'Headed Clearance',
       'Clearances off line', 'Recoveries', 'Duels won', 'Duels lost',
       'Successful 50/50s', 'Aerial battles won', 'Aerial battles lost',
       'Own goals', 'Errors leading to goal', 'Assists', 'Passes',
       'Passes per match', 'Big chances created', 'Crosses',
       'Cross accuracy %', 'Through balls', 'Accurate long balls',
       'Yellow cards', 'Red cards', 'Fouls', 'Offsides', 'Goals',
       'Headed goals', 'Goals with right foot', 'Goals with left foot',
       'Hit woodwork', 'Goals per match', 'Penalties scored',
       'Freekicks scored', 'Shots', 'Shots on target', 'Shooting accuracy %',
       'Big chances missed', 'Saves', 'Penalties saved', 'Punches',
       'High Claims', 'Catches', 'Sweeper clearances', 'Throw outs',
       'Goal Kicks'],
     

In [82]:
# Remove all goalkeepers from the data set
goalkeepers = df.loc[df["Position"] == "Goalkeeper"].index
df = df.drop(goalkeepers, axis=0)
df.shape

(2428, 54)

In [83]:
# Remove all records where appearances = 0
df = df.loc[df["Appearances"] !=0]
df = df.reset_index(drop=True)

In [84]:
# In this challenge we are interested in the goals scored from open play
# Create a new “Open Play Goals” field (the goals scored from open play is the number of goals scored that weren’t penalties or freekicks)
# Note some players will have scored free kicks or penalties with their left or right foot
# Be careful how Prep handles null fields! (have a look at those penalty and free kick fields) 
# Rename the original Goals scored field to Total Goals Scored
df.columns

Index(['Season', 'Name', 'Position', 'Appearances', 'Clean sheets',
       'Goals conceded', 'Tackles', 'Tackle success %', 'Last man tackles',
       'Blocked shots', 'Interceptions', 'Clearances', 'Headed Clearance',
       'Clearances off line', 'Recoveries', 'Duels won', 'Duels lost',
       'Successful 50/50s', 'Aerial battles won', 'Aerial battles lost',
       'Own goals', 'Errors leading to goal', 'Assists', 'Passes',
       'Passes per match', 'Big chances created', 'Crosses',
       'Cross accuracy %', 'Through balls', 'Accurate long balls',
       'Yellow cards', 'Red cards', 'Fouls', 'Offsides', 'Goals',
       'Headed goals', 'Goals with right foot', 'Goals with left foot',
       'Hit woodwork', 'Goals per match', 'Penalties scored',
       'Freekicks scored', 'Shots', 'Shots on target', 'Shooting accuracy %',
       'Big chances missed', 'Saves', 'Penalties saved', 'Punches',
       'High Claims', 'Catches', 'Sweeper clearances', 'Throw outs',
       'Goal Kicks'],
     

In [45]:
(df["Headed goals"] + df["Goals with right foot"] + df["Goals with left foot"] + df["Own goals"]
 + df[""]

)

Unnamed: 0,Season,Name,Position,Appearances,Clean sheets,Goals conceded,Tackles,Tackle success %,Last man tackles,Blocked shots,...,Saves,Penalties saved,Punches,High Claims,Catches,Sweeper clearances,Throw outs,Goal Kicks,check_0,Open Play Goals
0,2015-16,Rolando Aarons,Midfielder,10,,,13.0,77%,,0.0,...,,,,,,,,,10,1.0
1,2015-16,Almen Abdi,Midfielder,32,,,83.0,78%,,10.0,...,,,,,,,,,6,2.0
2,2015-16,Abdul Rahman Baba,Defender,15,2.0,13.0,47.0,83%,0.0,1.0,...,,,,,,,,,10,0.0
5,2015-16,Charlie Adam,Midfielder,22,,,18.0,78%,,9.0,...,,,,,,,,,8,1.0
8,2015-16,Ibrahim Afellay,Midfielder,31,,,27.0,81%,,10.0,...,,,,,,,,,5,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
969,2019-20,Christoph Zimmermann,Defender,17,3.0,27.0,32.0,66%,1.0,1.0,...,,,,,,,,,13,0.0
970,2019-20,Oleksandr Zinchenko,Defender,19,,,31.0,61%,,7.0,...,,,,,,,,,11,0.0
971,2019-20,Richairo Zivkovic,Forward,5,,,2.0,,,1.0,...,,,,,,,,,18,0.0
972,2019-20,Nabili Zoubdi Touaizi,Forward,0,,,0.0,,,0.0,...,,,,,,,,,25,0.0


In [87]:
df.sort_values(by="Appearances", ascending=False)

Unnamed: 0,Season,Name,Position,Appearances,Clean sheets,Goals conceded,Tackles,Tackle success %,Last man tackles,Blocked shots,...,Shooting accuracy %,Big chances missed,Saves,Penalties saved,Punches,High Claims,Catches,Sweeper clearances,Throw outs,Goal Kicks
895,2018-19,Yannick Bolasie,Forward,119,,,0.0,,,0.0,...,0%,0.0,,,,,,,,
452,2016-17,Dean Marney,Midfielder,96,,,181.0,71%,,28.0,...,20%,0.0,,,,,,,,
1063,2018-19,Alex Pritchard,Midfielder,48,,,59.0,59%,,27.0,...,25%,0.0,,,,,,,,
181,2015-16,Juan Mata,Midfielder,38,,,30.0,73%,,18.0,...,39%,1.0,,,,,,,,
954,2018-19,Ryan Fraser,Midfielder,38,,,19.0,58%,,15.0,...,49%,2.0,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
677,2017-18,Freddie Ladapo,Forward,1,,,0.0,,,0.0,...,0%,0.0,,,,,,,,
1224,2019-20,Tommy Doyle,Midfielder,1,,,0.0,0%,,0.0,...,100%,0.0,,,,,,,,
239,2015-16,Patrick Roberts,Midfielder,1,,,0.0,0%,,0.0,...,0%,0.0,,,,,,,,
245,2015-16,Jordan Rossiter,Midfielder,1,,,1.0,100%,,0.0,...,0%,0.0,,,,,,,,


In [88]:
df.loc[895]

Season                             2018-19
Name                      Yannick Bolasie 
Position                           Forward
Appearances                            119
Clean sheets                           NaN
Goals conceded                         NaN
Tackles                                0.0
Tackle success %                       NaN
Last man tackles                       NaN
Blocked shots                          0.0
Interceptions                          0.0
Clearances                             0.0
Headed Clearance                       0.0
Clearances off line                    NaN
Recoveries                             NaN
Duels won                              NaN
Duels lost                             NaN
Successful 50/50s                      NaN
Aerial battles won                     NaN
Aerial battles lost                    NaN
Own goals                              NaN
Errors leading to goal                 NaN
Assists                                  0
Passes     

In [42]:
df["Goals conceded"]

0       NaN
1       NaN
2      13.0
5       NaN
8       NaN
       ... 
969    27.0
970     NaN
971     NaN
972     NaN
973    37.0
Name: Goals conceded, Length: 2428, dtype: float64