# English Premier League (2020-21)
**Player statistics for the EPL 2020-21 season**<br><br>

Dataset Author: Rajat Chaudhari<br>
Dataset Source: https://www.kaggle.com/rajatrc1705/english-premier-league202021<br><br>

### Context:
This dataset collects basic but essential statistics from the 2020-21 English Premier League season. It covers every player who appeared in the EPL, with standard stats such as Goals, Assists, xG, xA, Passes Attempted, and Pass Accuracy, among others.

### Content:
**Position:**	The position in which the player regularly plays. The positions in this dataset are FW (Forward), MF (Midfielder), DF (Defender), and GK (Goalkeeper).<br>
**Starts:**	The number of times the player was named in the starting 11 by the manager.<br>
**Mins:**	The number of minutes played by the player.<br>
**Goals:**	The number of Goals scored by the player.<br>
**Assists:**	The number of times the player assisted another player in scoring a goal.<br>
**Passes_Attempted:**	The number of passes attempted by the player.<br>
**Perc_Passes_Completed:**	The percentage of attempted passes that the player completed to a teammate.<br>
**xG:**	Expected number of goals from the player in a match.<br>
**xA:**	Expected number of assists from the player in a match.<br>
**Yellow_Cards:**	The players get a yellow card from the referee for indiscipline, technical fouls, or other minor fouls.<br>
**Red_Cards:**	The players get a red card for accumulating 2 yellow cards in a single game, or for a major foul.<br>

### Objectives:

1. Which team has the most aggressive players (red + yellow cards)?
1. Which nation had the most aggressive players?
1. Which team had the most players in the top 10 for assists?
1. Which players attempted the most passes?
1. Which players had the most accurate passes: all players and players by position
1. Top 10 goal scorers: all players and players by position
1. Top 10 assists: all players and players by position
1. Top 10 oldest players
1. Average age of each team
1. Nationalities of the league
1. Correlation graph

The objectives follow the points in atasaygin's notebook (https://www.kaggle.com/atasaygin/premier-league-player-analysis).


In [None]:
import pandas as pd

In [None]:
import matplotlib.pyplot as plt
# import squarify # pip install squarify

In [None]:
df = pd.read_csv('archive.zip')
df.head()

In [None]:
df.info()

The dataset has 532 records and 18 columns, with no null values.

### Find out which team has the most aggressive players.
Red and yellow cards

In [None]:
df.columns

In [None]:
df_1 = df[['Club','Yellow_Cards','Red_Cards']].groupby('Club').sum().sort_values(by=['Red_Cards','Yellow_Cards'],ascending=False)

In [None]:
df_1.head()

In [None]:
# df_1 is indexed by club, so these are club names
clubs = df_1.index.tolist()
red_cards = df_1['Red_Cards'].tolist()
yellow_cards = df_1['Yellow_Cards'].tolist()
width = 0.75

fig, ax = plt.subplots(figsize=(16, 8))

# Stack the red cards on top of the yellow cards
ax.bar(clubs, yellow_cards, width, label='Yellow Cards', color='yellow')
ax.bar(clubs, red_cards, width, bottom=yellow_cards, label='Red Cards', color='red')

ax.set_ylabel('Number of Cards')
ax.set_title('Number of Yellow and Red Cards in Premier League (2020/21)')
plt.xticks(clubs, rotation=45)

for index, data in enumerate(red_cards):
    plt.text(x=index , y=data + yellow_cards[index] + 1 , s=f"{data}" , fontdict=dict(fontsize=14), horizontalalignment='center')

for index, data in enumerate(yellow_cards):
    plt.text(x=index , y=20 , s=f"{data}" , fontdict=dict(fontsize=14), horizontalalignment='center')
    
ax.legend()

plt.tight_layout()
plt.show()

The chart is sorted by red cards in descending order. Brighton received the most red cards in the Premier League. Sheffield United, however, received the most yellow cards and also 4 red cards, which gave them the highest total number of cards.
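
Because the ranking changes depending on whether clubs are sorted by red cards or by all cards, it can help to compute the combined total explicitly. The following is a minimal sketch on made-up toy data; the club names and counts are placeholders, not real figures from the dataset, and the `Total_Cards` column is an assumption added for illustration.

```python
import pandas as pd

# Made-up toy data: placeholder clubs and counts, not real EPL figures.
toy = pd.DataFrame({
    'Club': ['Club A', 'Club A', 'Club B', 'Club B', 'Club C'],
    'Yellow_Cards': [5, 3, 9, 4, 2],
    'Red_Cards': [1, 1, 0, 1, 0],
})

# Sum the cards per club, then add a combined total column.
cards_by_club = toy.groupby('Club')[['Yellow_Cards', 'Red_Cards']].sum()
cards_by_club['Total_Cards'] = cards_by_club['Yellow_Cards'] + cards_by_club['Red_Cards']

# Rank by the combined total instead of by red cards alone.
ranked_by_total = cards_by_club.sort_values('Total_Cards', ascending=False)
print(ranked_by_total)
```

In this toy example Club A leads on red cards, while Club B leads on the total, which mirrors the kind of difference described above.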

---
All red cards by player:

In [None]:
df.loc[df.Red_Cards>0,['Name','Club','Yellow_Cards','Red_Cards']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Red_Cards','Yellow_Cards'],ascending=False)

Player names, their clubs, and the cards they received, together with the minutes played per card:

In [None]:
cards_per_min = df[['Name','Club','Yellow_Cards','Red_Cards','Mins']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Red_Cards','Yellow_Cards'],ascending=False)
cards_per_min['Total_Cards'] = cards_per_min.Yellow_Cards + cards_per_min.Red_Cards
cards_per_min['Mins_Card'] = round(cards_per_min.Mins / cards_per_min.Total_Cards)
cards_per_min.head(15).sort_values(by='Mins_Card',ascending=True)

Lamela received a card roughly every 102 minutes of play.
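
The minutes-per-card figure is simply minutes played divided by total cards, so players with no cards would produce a division by zero. A minimal sketch of one way to guard against that is shown below, using made-up toy data (the player names and numbers are placeholders); replacing zero totals with `NaN` is an assumption of this sketch, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Made-up toy data: placeholder players, not real EPL figures.
toy = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3'],
    'Mins': [900, 1800, 450],
    'Yellow_Cards': [4, 3, 0],
    'Red_Cards': [1, 0, 0],
})

toy['Total_Cards'] = toy['Yellow_Cards'] + toy['Red_Cards']

# Replace zero totals with NaN so the division yields NaN rather than inf.
toy['Mins_Card'] = (toy['Mins'] / toy['Total_Cards'].replace(0, np.nan)).round()

# sort_values places NaN rows last by default, so card-free players end up at the bottom.
mins_per_card = toy.sort_values('Mins_Card')
print(mins_per_card[['Name', 'Total_Cards', 'Mins_Card']])
```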

In [None]:
df_2 = df[['Nationality','Yellow_Cards','Red_Cards']].groupby('Nationality').sum().sort_values(by=['Red_Cards','Yellow_Cards'],ascending=False).head(15)
df_2

In [None]:
countries = df_2.index.tolist()
red_cards = df_2['Red_Cards'].tolist()
yellow_cards = df_2['Yellow_Cards'].tolist()
width = 0.75

fig, ax = plt.subplots(figsize=(16, 8))

ax.bar(countries, yellow_cards, width, label='Yellow Cards', color='yellow')
ax.bar(countries, red_cards, width, bottom=yellow_cards, label='Red Cards', color='red')

ax.set_ylabel('Amount of Cards')
ax.set_title('Number of Yellow and Red Cards by Nationality in Premier League (2020/21)')
plt.xticks(countries, rotation=0)

ax.legend()

plt.tight_layout()
plt.show()

As the chart shows, the majority of cards went to English players. However, English players are also the largest group, so let's count the players of each nationality and look at the number of cards per player instead.

In [None]:
cards_per_nation = df[['Name','Nationality','Yellow_Cards','Red_Cards']].groupby('Nationality').agg({'Yellow_Cards':'sum','Red_Cards':'sum','Name':'count'})
cards_per_nation['Total_Cards'] = cards_per_nation.Yellow_Cards + cards_per_nation.Red_Cards
cards_per_nation.rename(columns={'Name':'#_Players'},inplace=True)
cards_per_nation['Cards_per_Player'] = cards_per_nation['Total_Cards'] / cards_per_nation['#_Players']
cards_per_nation = cards_per_nation.sort_values(by=['Cards_per_Player','#_Players'],ascending=False).reset_index().head(25)

In [None]:
countries = cards_per_nation['Nationality']
cards = cards_per_nation['Cards_per_Player']
 
fig, ax = plt.subplots(figsize =(16, 9))
ax.barh(countries, cards)
 
# Remove axes spines
for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)
    
# Remove x, y Ticks
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')

# Add padding between axes and labels
ax.xaxis.set_tick_params(pad = 5)
ax.yaxis.set_tick_params(pad = 10)
 
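# Add x, y gridlines (the flag is passed positionally because the `b` keyword
# was renamed to `visible` in newer matplotlib releases)
ax.grid(True, color ='grey',
        linestyle ='-.', linewidth = 0.5,
        alpha = 0.2)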
 
# Show top values
ax.invert_yaxis()
 
# Add annotation to bars
for i in ax.patches:
    plt.text(i.get_width()+0.2, i.get_y()+0.5,
             str(round((i.get_width()), 2)),
             fontsize = 10, fontweight ='bold',
             color ='grey')

# Add Plot Title
ax.set_title('Number of cards (yellow + red) per 1 player within nationality',
             loc ='left', )
 
# Add Text watermark
fig.text(0.9, 0.15, 'mariuszborycki.com', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)
 
# Show Plot
plt.show()

In [None]:
df[df.Nationality=='MLI']

In [None]:
df[df.Nationality=='MKD']

Measured per player, English players rank only 25th. Footballers from Mali and North Macedonia top the list. Mali is represented by two players, Moussa Djenepo (Southampton) and Yves Bissouma (Brighton), who received 14 cards between them. North Macedonia is represented by a single player, Ezgjan Alioski, who received 14 cards on his own.
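
Since nationalities with only one or two players can dominate a per-player ranking, one option is to require a minimum number of players before ranking. The sketch below illustrates this on made-up toy data; the nationality codes, the counts, and the `MIN_PLAYERS` threshold are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Made-up toy data: placeholder nationality codes, not real EPL figures.
toy = pd.DataFrame({
    'Nationality': ['AAA', 'AAA', 'AAA', 'BBB', 'CCC', 'CCC', 'CCC'],
    'Yellow_Cards': [2, 4, 3, 9, 1, 0, 2],
    'Red_Cards': [0, 1, 0, 1, 0, 0, 0],
})
toy['Total_Cards'] = toy['Yellow_Cards'] + toy['Red_Cards']

# Count players and sum cards per nationality.
per_nation = toy.groupby('Nationality').agg(
    Players=('Total_Cards', 'size'),
    Total_Cards=('Total_Cards', 'sum'),
)
per_nation['Cards_per_Player'] = per_nation['Total_Cards'] / per_nation['Players']

# Hypothetical threshold: ignore nationalities with fewer than 3 players.
MIN_PLAYERS = 3
filtered = (
    per_nation[per_nation['Players'] >= MIN_PLAYERS]
    .sort_values('Cards_per_Player', ascending=False)
)
print(filtered)
```

Here the single-player nationality `BBB` has the highest cards-per-player value but drops out once the threshold is applied.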

---

In [None]:
df.Position.unique()

In [None]:
# Assumption: first position is valid
old_pos = ['MF,FW', 'GK', 'FW', 'DF', 'MF', 'FW,MF', 'FW,DF', 'DF,MF', 'MF,DF', 'DF,FW']
new_pos = ['Midfielder', 'Goalkeeper', 'Forward', 'Defender', 'Midfielder', 'Forward', 'Forward', 'Defender', 'Midfielder', 'Defender']

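# Map every old label to its new one in a single pass and assign the result back;
# calling replace(..., inplace=True) on df.Position may not update df in newer pandas
df['Position'] = df['Position'].replace(dict(zip(old_pos, new_pos)))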
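
The "first position is valid" assumption can also be expressed without listing every combination by hand: split each value on the comma and keep the first code. The following is a minimal sketch on a made-up toy column; the `labels` dictionary and the `Position_Label` column name are assumptions introduced for illustration.

```python
import pandas as pd

# Made-up toy column using the same position codes as the dataset.
toy = pd.DataFrame({'Position': ['MF,FW', 'GK', 'FW,DF', 'DF', 'DF,MF']})

# Hypothetical mapping from position code to a readable label.
labels = {'FW': 'Forward', 'MF': 'Midfielder', 'DF': 'Defender', 'GK': 'Goalkeeper'}

# Keep only the first listed position, then translate it to a label.
primary = toy['Position'].str.split(',').str[0]
toy['Position_Label'] = primary.map(labels)
print(toy)
```

This gives the same result as the explicit lists for these values and would also handle a combination that was not anticipated, as long as its first code is one of the four known ones.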

In [None]:
cards_per_position = df[['Name','Position','Yellow_Cards','Red_Cards']].groupby('Position').agg({'Yellow_Cards':'sum','Red_Cards':'sum','Name':'count'}).reset_index()
cards_per_position['Total_Cards'] = cards_per_position.Yellow_Cards + cards_per_position.Red_Cards
cards_per_position.rename(columns={'Name':'#_Players'},inplace=True)
cards_per_position['Cards_per_Position'] = cards_per_position['Total_Cards'] / cards_per_position['#_Players']
cards_per_position = cards_per_position.reset_index(drop=True).sort_values(by=['Cards_per_Position','#_Players'],ascending=False)
cards_per_position

In [None]:
positions = cards_per_position['Position']
cards = cards_per_position['Cards_per_Position']
 
fig, ax = plt.subplots(figsize =(16, 9))
ax.barh(positions, cards)

# Add annotation to bars
for i in ax.patches:
    plt.text(i.get_width(), i.get_y()+0.4,
             str(round((i.get_width()), 2)),
             fontsize = 10, fontweight ='bold',
             color ='grey')

# Add Plot Title
ax.set_title('Number of cards (yellow + red) per 1 player within position',
             loc ='left', )
 
# Add Text watermark
fig.text(0.9, 0.10, 'mariuszborycki.com', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

plt.show()

Interestingly, midfielders received more cards per player than defenders.

---

The goal analysis below excludes own goals.

In [None]:
df.columns

In [None]:
df[['Club','Goals','Penalty_Goals','Assists']].groupby('Club').sum().reset_index().sort_values(by=['Assists'],ascending=False)

**Which team had the most players in the top 10 for assists?**

In [None]:
df[['Name','Club','Assists']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Assists'],ascending=False).head(10)

In [None]:
df[['Name','Club','Assists']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Assists'],ascending=False).head(10).groupby('Club')['Name'].count().sort_values(ascending=False)

**Which players attempted the most passes?**

In [None]:
df.columns

In [None]:
df[['Name','Club','Passes_Attempted']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Passes_Attempted'],ascending=False).head(10)

**Which players had the most accurate passes?**

In [None]:
df[['Name','Club','Perc_Passes_Completed']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Perc_Passes_Completed'],ascending=False).head(10)

In [None]:
df[['Name','Club','Passes_Attempted','Perc_Passes_Completed']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Perc_Passes_Completed'],ascending=False).head(10)

In [None]:
df.loc[df.Passes_Attempted>1000,['Name','Club','Passes_Attempted','Perc_Passes_Completed']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Perc_Passes_Completed'],ascending=False).head(10)
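
A raw percentage ranking tends to favour players with very few attempts, which is why the last cell applies a minimum number of attempted passes. The sketch below shows the same idea on made-up toy data and also estimates the number of completed passes from the attempts and the percentage. The `MIN_ATTEMPTS` name and the `Passes_Completed_Est` column are assumptions for illustration, and the estimate is only as precise as the rounded percentage.

```python
import pandas as pd

# Made-up toy data: placeholder players, not real EPL figures.
toy = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3'],
    'Passes_Attempted': [40, 1500, 1200],
    'Perc_Passes_Completed': [95.0, 90.0, 88.0],
})

# Same cut-off as the cell above: only players with more than 1000 attempts.
MIN_ATTEMPTS = 1000
regulars = toy[toy['Passes_Attempted'] > MIN_ATTEMPTS].copy()

# Estimate completed passes as attempts * percentage / 100.
regulars['Passes_Completed_Est'] = (
    regulars['Passes_Attempted'] * regulars['Perc_Passes_Completed'] / 100
).round().astype(int)

ranking = regulars.sort_values('Perc_Passes_Completed', ascending=False)
print(ranking)
```

Player 1 has the highest percentage in the toy data but is excluded because 40 attempts is too small a sample.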

**Top 10 goal scorers: all players and players by position**

In [None]:
df[['Name','Club','Position','Goals']].groupby(['Name','Club','Position']).sum().reset_index().sort_values(by=['Goals'],ascending=False).head(10)

In [None]:
import plotly.express as px  # required for px.bar below

# Goals scored by players who played more than 10 matches
df2 = df[df['Matches']>10].sort_values('Goals',ascending=False)[['Name','xG','Goals','Club','Position']].head(10)
print(df2)
fig = px.bar(df2, x='Name', y='Goals', hover_data=['Club'], color='Position')
# category_orders=dict(group=d)
fig.update_layout(
    title={
        'text': "Top 10 Goalscorers",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    font_family="Calibri",
    title_font_family="Times New Roman",
    title_font_color="black",
    legend_title_font_color="red",
    xaxis_categoryorder = 'total descending'
)
fig.update_layout(yaxis_categoryorder = 'total ascending')
fig.show()

In [None]:
df[['Position','Goals']].groupby(['Position']).sum().reset_index().sort_values(by=['Goals'],ascending=False).head(10)

**Top 10 assists: all players and players by position**

In [None]:
df[['Name','Club','Position','Assists']].groupby(['Name','Club','Position']).sum().reset_index().sort_values(by=['Assists'],ascending=False).head(10)

In [None]:
df[['Position','Assists']].groupby(['Position']).sum().reset_index().sort_values(by=['Assists'],ascending=False).head(10)

**Canadian points (goals + assists)**

In [None]:
df_canadian = df[['Name', 'Club', 'Position','Goals', 'Assists']].copy()
# Canadian points = goals + assists
df_canadian['Canadian_Points'] = df_canadian.Goals + df_canadian.Assists
df_canadian[['Name','Club','Canadian_Points','Goals', 'Assists']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Canadian_Points'],ascending=False).head(10)

**Top 10 oldest players**

In [None]:
df[['Name','Club','Position','Age']].sort_values(by=['Age'],ascending=False).head(10)

In [None]:
df[['Position','Age']].groupby(['Position']).mean().round(0).reset_index().sort_values(by=['Age'],ascending=False).head(10)

**Show all teams with average age**

In [None]:
df[['Club','Age']].groupby('Club').mean().round(1).reset_index().sort_values(by=['Age'],ascending=False)

In [None]:
import seaborn as sns  # required for sns.boxplot below

plt.figure(figsize=(18,8))
sns.boxplot(x='Club',y='Age',data=df)
plt.xticks(rotation=90)

**Nationalities of the League**
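
One simple way to summarise nationalities is to count players per nationality and express each count as a share of all players. The following is a minimal sketch on made-up toy data; the nationality codes are placeholders and the variable names are assumptions introduced for illustration.

```python
import pandas as pd

# Made-up toy data: placeholder players and nationality codes.
toy = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3', 'Player 4', 'Player 5', 'Player 6'],
    'Nationality': ['AAA', 'BBB', 'AAA', 'CCC', 'AAA', 'BBB'],
})

# Number of players per nationality, largest group first.
nation_counts = toy['Nationality'].value_counts()

# Share of all players, in percent.
nation_share = (nation_counts / len(toy) * 100).round(1)

# Number of distinct nationalities.
n_nations = toy['Nationality'].nunique()

print(nation_counts)
print(nation_share)
print(n_nations)
```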

**Correlation Graph**

In [None]:
import seaborn as sns

In [None]:
plt.figure(figsize=(15, 8))

# Correlate numeric columns only; text columns such as Name or Club cannot be correlated
correlation = sns.heatmap(df.select_dtypes('number').corr(), vmin=-1, vmax=1, annot=True, linewidths=1, linecolor='black', cmap='YlGnBu')
correlation.set_title('Correlation Graph of the Dataset', fontdict={'fontsize': 24})
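
A heatmap can be hard to scan when there are many columns, so it may help to list the column pairs by the strength of their correlation. The sketch below does this on made-up toy data; the values are placeholders and the `pairs` variable is an assumption introduced for illustration.

```python
import numpy as np
import pandas as pd

# Made-up toy data: placeholder values, not real EPL figures.
toy = pd.DataFrame({
    'Name': ['P1', 'P2', 'P3', 'P4', 'P5'],
    'Mins': [100, 200, 300, 400, 500],
    'Passes_Attempted': [50, 110, 140, 210, 250],
    'Goals': [3, 0, 2, 1, 4],
})

# Correlation matrix of the numeric columns only.
corr = toy.select_dtypes('number').corr()

# Keep the upper triangle (above the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Sort by absolute correlation, strongest first.
pairs = pairs.sort_values(key=abs, ascending=False)
print(pairs)
```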