# **Importing the Libraries**

> **Football is a game of mistakes. Whoever makes the fewest wins!**

**This is a really nice dataset that gives us a few cheeky stats for players in the most popular and competitive football league around the globe: the Premier League!**

**I did some Exploratory Data Analysis (EDA) and created a few visualizations to gain insights into the Premier League from the dataset!**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import squarify
import folium
import plotly.figure_factory as ff
import plotly.graph_objects as go
import plotly.express as px

pd.set_option('display.max_columns', None)  # show every column when displaying DataFrames

# **Importing the Dataset**

In [None]:
data = pd.read_csv('../input/english-premier-league202021/EPL_20_21.csv')

# **Performing initial checks on the dataset**

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
data.head()

In [None]:
data.isna().sum()

# **Adding extra columns and separating positions**

In [None]:
# Average minutes per appearance, truncated to a whole number
data['Mins/Match'] = (data['Mins'] / data['Matches']).astype(int)
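
The `astype(int)` cast assumes every player has at least one appearance. If any row had `Matches == 0`, the division would give `inf` (or `NaN` for 0/0) and the integer cast would raise an error. A minimal sketch of a guarded version, on hypothetical toy rows rather than the real dataset:

```python
import pandas as pd

# Hypothetical toy rows; 'Mins' and 'Matches' mirror the column names used above
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Mins": [3420, 450, 0],
    "Matches": [38, 10, 0],
})

# Turn 0 matches into NaN so the division yields NaN instead of inf,
# then floor the result and store it in a nullable integer column
matches = df["Matches"].where(df["Matches"] > 0)
df["Mins/Match"] = (df["Mins"] / matches).floordiv(1).astype("Int64")
print(df)
```

Player C ends up with `<NA>` instead of breaking the whole column.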

In [None]:
data['Position'].unique()

In [None]:
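# Split on the first comma only: 'MF,FW' -> primary 'MF', secondary 'FW'
# (n must be passed by keyword in pandas 2.x)
position = pd.DataFrame(data['Position'].str.split(',', n=1).tolist(),
                        columns=['Position', 'Secondary_Pos'])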

In [None]:
data = data.drop(columns='Position')
position = position.drop(columns='Secondary_Pos')

In [None]:
data = pd.concat([data,position],axis =1)
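
For reference, `str.split` with `n=1` splits only on the first comma, and `expand=True` returns the parts as columns that keep the original index. A small sketch on hypothetical position strings in the comma-separated format the cell above splits on:

```python
import pandas as pd

# Hypothetical position strings: primary position first, optional secondary after a comma
positions = pd.Series(["MF,FW", "DF", "FW,MF", "GK"])

# n=1 splits on the first comma only; expand=True returns one column per part
parts = positions.str.split(",", n=1, expand=True)
parts.columns = ["Position", "Secondary_Pos"]
print(parts)
```

Because the index is preserved, the primary `Position` column could be assigned straight back to the original frame without a separate `concat`; that is an alternative, not what this notebook does.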

In [None]:
# Boolean masks select matching rows directly (no NaN masking, original dtypes kept)
df_GK = data[data['Position'] == 'GK']
df_DF = data[data['Position'] == 'DF']
df_MF = data[data['Position'] == 'MF']
df_FW = data[data['Position'] == 'FW']
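
A quick illustration of why a boolean mask is a safer way to pick out one position than `where()` followed by `dropna()`. The toy frame below is hypothetical; Keeper B has a missing `xG` value purely to show the effect:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one goalkeeper has a missing value in an unrelated column
df = pd.DataFrame({
    "Name": ["Keeper A", "Keeper B", "Defender C"],
    "Position": ["GK", "GK", "DF"],
    "Goals": [0, 0, 2],
    "xG": [0.0, np.nan, 1.4],
})

# where() turns non-matching rows into NaN; dropna() then also drops Keeper B
# because of the unrelated NaN in 'xG', and 'Goals' gets upcast to float
via_where = df.where(df["Position"] == "GK").dropna()

# A boolean mask keeps every matching row and the original dtypes
via_mask = df[df["Position"] == "GK"]

print(len(via_where), via_where["Goals"].dtype)  # 1 float64
print(len(via_mask), via_mask["Goals"].dtype)    # 2 int64
```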

# **Number of players according to position**

In [None]:
print("Number of Goalkeepers in the league : ",len(df_GK))
print("Number of Defenders in the league : ",len(df_DF))
print("Number of Midfielders in the league : ",len(df_MF))
print("Number of Forwards in the league : ",len(df_FW))
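
`value_counts` gives the same per-position numbers in a single call. A sketch on a hypothetical list of primary positions:

```python
import pandas as pd

# Hypothetical primary positions after the split above
positions = pd.Series(["DF", "MF", "DF", "FW", "GK", "DF", "MF"], name="Position")

# One call returns the count for every position, most frequent first
counts = positions.value_counts()
print(counts)
print("Number of Defenders in the league :", counts.get("DF", 0))
```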

# **Plot of Goals per position**

In [None]:
goal_team = data.groupby('Position', as_index=False)['Goals'].sum()
sns.set_theme(style="whitegrid")  # set the theme before plotting so it applies to this chart
ax = sns.barplot(x='Position', y='Goals', data=goal_team.sort_values(by="Goals"))
plt.xticks(rotation=0)
plt.title('Plot of Goals per position')

* **As expected, forwards score the most goals.**

# **Plot of Players per position**

In [None]:
sns.set_theme(style="darkgrid")
ax = sns.countplot(x="Position",data=data, order = data['Position'].value_counts(ascending = True).index)
plt.xticks(rotation=0)
plt.title('Plot of Players per position')
plt.ylabel('Number of players')

* **Defenders are the most common position in the league, followed by midfielders and forwards.**

# **Plot of Total goals scored by each club**

In [None]:
goals = data.groupby('Club', as_index=False)['Goals'].sum()
sns.set_theme(style="whitegrid", color_codes=True)
# Set the figure size before plotting; setting it afterwards only affects later figures
plt.rcParams["figure.figsize"] = (20, 8)
ax = sns.barplot(x='Club', y='Goals', data=goals.sort_values(by="Goals"))
ax.set_xlabel("Club", fontsize=30)
ax.set_ylabel("Goals", fontsize=20)
plt.xticks(rotation=75)
plt.title('Plot of Clubs vs Total goals scored', fontsize=20)

* **Manchester City does score maximum number of goals followed by Manchester United and Tottenham Hotspur**

In [None]:
# Net goals for every player
data['net_goals'] = data['Goals'] - data['Penalty_Goals']

# **Plotting Net Goals per club (Total Goals - Penalty Goals)**

In [None]:
# Plotting net (non-penalty) goals per club
goals = data.groupby('Club', as_index=False)['net_goals'].sum()
sns.set_theme(style="darkgrid", color_codes=True)
# Set the figure size before plotting so it applies to this figure
plt.rcParams["figure.figsize"] = (20, 8)
ax = sns.barplot(x='Club', y='net_goals', data=goals.sort_values(by="net_goals"))
ax.set_xlabel("Club", fontsize=30)
ax.set_ylabel("Net Goals", fontsize=20)
plt.xticks(rotation=90)
plt.title('Plot of Clubs vs Non penalty goals scored', fontsize=20)

* **On non-penalty goals, Tottenham Hotspur overtake Manchester United.**
* **Manchester City still lead the way for the most non-penalty goals.**
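
To see how subtracting penalties can reorder clubs, here is a minimal sketch with hypothetical toy numbers (not the real 2020-21 figures), where Club A leads on total goals but Club B leads on non-penalty goals:

```python
import pandas as pd

# Hypothetical toy numbers, not taken from the dataset
df = pd.DataFrame({
    "Club": ["Club A", "Club A", "Club B", "Club B"],
    "Goals": [20, 5, 12, 10],
    "Penalty_Goals": [9, 0, 1, 0],
})

# Non-penalty goals per player: total goals minus penalty goals
df["net_goals"] = df["Goals"] - df["Penalty_Goals"]

# Club totals for both measures, ranked by non-penalty goals
club_totals = df.groupby("Club")[["Goals", "net_goals"]].sum()
print(club_totals.sort_values("net_goals", ascending=False))
```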

In [None]:
# Boolean masks select each age bracket directly (no NaN masking, original dtypes kept)
df_age1 = data[data['Age'] <= 20]
df_age2 = data[(data['Age'] > 20) & (data['Age'] <= 30)]
df_age3 = data[(data['Age'] > 30) & (data['Age'] <= 40)]
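
The three age masks can also be expressed with `pd.cut`, whose bins are right-inclusive by default, so `[0, 20, 30, 40]` yields the same `<=20`, `>20 & <=30` and `>30 & <=40` brackets. A sketch on hypothetical ages; values outside the bins (such as ages above 40) become `NaN` and are left out of the counts, just like with the masks:

```python
import pandas as pd

# Hypothetical ages; bins mirror the brackets: (0, 20], (20, 30], (30, 40]
ages = pd.Series([18, 20, 21, 25, 30, 31, 36, 40])
brackets = pd.cut(ages, bins=[0, 20, 30, 40], labels=["<=20", ">20 & <=30", ">30 & <=40"])

# Count per bracket, kept in bracket order rather than by frequency
counts = brackets.value_counts(sort=False)
print(counts)
```

The resulting `counts` could feed the pie chart directly as sizes and labels.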

# **Number of players within different Age brackets**

In [None]:
print("Number of Players aged 20 or under : ", len(df_age1))
print("Number of Players aged over 20 and up to 30 : ", len(df_age2))
print("Number of Players aged over 30 and up to 40 : ", len(df_age3))

In [None]:
y = np.array([df_age1['Name'].count(),df_age2['Name'].count(),df_age3['Name'].count()])
mylabels = ["<=20", ">20 & <=30", ">30 & <=40"]
plt.title('Plot of Number of Players with Age',fontsize = 20)
plt.pie(y, labels = mylabels, autopct="%.1f%%")
plt.show()

* **The largest share of Premier League players falls in the >20 & <=30 age bracket. The future of the league is definitely bright.**

# **Plot of Age range for each Club**

In [None]:
plt.figure(figsize=(18,8))
b = sns.boxplot(x='Club',y='Age',data=data)
b.set_xlabel("Club",fontsize=25)
b.set_ylabel("Age",fontsize=20)
plt.xticks(rotation=90)
plt.title('Plot of Age range for each Club',fontsize = 20)

* **Crystal Palace have the oldest squad, judging by the median line in each box.**
* **Manchester United have the youngest squad in the league.**
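
A boxplot's middle line is the median, so ranking clubs by median age gives a simple numeric check on the youngest/oldest reading of the chart. A sketch on hypothetical squads:

```python
import pandas as pd

# Hypothetical squads, not taken from the dataset
df = pd.DataFrame({
    "Club": ["Club A"] * 3 + ["Club B"] * 3,
    "Age": [30, 32, 34, 19, 22, 27],
})

# The box in a boxplot is drawn around the median, so rank clubs by median age
median_age = df.groupby("Club")["Age"].median().sort_values()
print("Youngest:", median_age.index[0], "| Oldest:", median_age.index[-1])
```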

# **Plot of Players vs Total and Non Penalty goals scored**

In [None]:
sns.set_theme(style="dark")
# Select the top 10 scorers once so both bar sets refer to the same players in the same order
top_scorers = data.sort_values(by="Goals", ascending=False)[:10]
width = 0.5

# Total goals: shrink each bar to half width, anchored on the left
ax = sns.barplot(x='Name', y='Goals', data=top_scorers, palette='rocket')
for bar in ax.containers[0]:
    bar.set_width(bar.get_width() * width)

# Non-penalty goals on the SAME axes (no twinx), so both bars share one y scale;
# shift these bars right so they sit next to the total-goals bars
sns.barplot(x='Name', y='net_goals', data=top_scorers, palette='rocket', alpha=0.7, hatch='//', ax=ax)
for bar in ax.containers[-1]:
    x = bar.get_x()
    w = bar.get_width()
    bar.set_x(x + w * (1 - width))
    bar.set_width(w * width)

ax.set_xlabel("Name", fontsize=30)
ax.set_ylabel("Goals (solid) / Non-penalty goals (hatched)", fontsize=20)
plt.xticks(rotation=90)
plt.title('Plot of Players vs Total and Non Penalty goals scored', fontsize=20)

* **Considering non-penalty goals, Bruno Fernandes falls well behind the other top scorers. You surely don't want him in your fantasy squad unless it's for penalties.**
* **Harry Kane leads the line, as always.**
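
When comparing total and non-penalty goals, picking the top scorers once and computing both measures for those same players keeps the comparison like-for-like. A sketch with hypothetical players and numbers, adding a penalty share column for illustration:

```python
import pandas as pd

# Hypothetical players and numbers, not taken from the dataset
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Goals": [20, 18, 15, 6],
    "Penalty_Goals": [2, 9, 0, 1],
})
df["net_goals"] = df["Goals"] - df["Penalty_Goals"]

# Select the top scorers once, then compare both measures for the same players
top = df.nlargest(3, "Goals").copy()
# Fraction of each player's goals that came from penalties (top scorers have Goals > 0)
top["penalty_share"] = top["Penalty_Goals"] / top["Goals"]
print(top[["Name", "Goals", "net_goals", "penalty_share"]])
```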

# **Plot of Players per Nationality**

In [None]:
# Player count per nationality, largest first
counts = data.groupby('Nationality')['Name'].count().sort_values(ascending=False)
size = counts.tolist()
# Tile labels like "ENG 100": nationality followed by its player count
label = [f"{nation} {n}" for nation, n in counts.items()]
squarify.plot(sizes=size, label=label, alpha=.6, text_kwargs={'fontsize': 12})

* **Most players are from England, which is unsurprising since it is the domestic league.**
* **Brazil is the only non-European nation with so many players in the league.**
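
With dozens of nationalities, many treemap tiles are too small to label. One option, sketched here on hypothetical three-letter nationality codes (the dataset's actual labels may differ), is to keep the largest groups and fold the rest into an `Other` tile before calling `squarify.plot`:

```python
import pandas as pd

# Hypothetical nationality codes, not taken from the dataset
nationality = pd.Series(["ENG", "ENG", "ENG", "BRA", "FRA", "BRA", "ENG", "ESP"])
counts = nationality.value_counts()

# Keep the largest groups and fold the long tail into a single "Other" tile
top_n = 2
grouped = pd.concat([counts.head(top_n), pd.Series({"Other": counts.iloc[top_n:].sum()})])

# Same "name count" label format as the treemap cell
labels = [f"{name} {count}" for name, count in grouped.items()]
print(labels)
```

`grouped.tolist()` and `labels` would then take the place of `size` and `label` in the treemap call.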

# **Plot of correlation between variables**

In [None]:
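plt.figure(figsize=(12,8))
# numeric_only=True excludes text columns such as Name, Club and Nationality (required in pandas 2.x)
sns.heatmap(data.corr(numeric_only=True), annot=True, cmap='Pastel2')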
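
Since pandas 2.0, `DataFrame.corr` no longer skips text columns silently, so `numeric_only=True` is needed on a frame that includes names and clubs. A sketch on a hypothetical toy frame that also pulls the strongest correlates of `Goals` out of the matrix:

```python
import pandas as pd

# Hypothetical toy frame mixing text and numeric columns
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Goals": [1, 4, 9, 16],
    "xG": [1.5, 3.8, 8.2, 15.1],
    "Yellow_Cards": [5, 1, 3, 2],
})

# numeric_only=True skips text columns such as 'Name'
corr = df.corr(numeric_only=True)

# Strongest correlates of Goals, excluding Goals itself
print(corr["Goals"].drop("Goals").sort_values(ascending=False))
```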

* **Furthermore, you would want the players below in your fantasy team if you want to score every week.**

# **Plot of Forwards with 30+ starts vs Mins/Match**

In [None]:
sns.set_theme(style="darkgrid")
ax = sns.barplot(x='Name',y='Mins/Match',data=data[(data['Starts'] > 30) & (data['Position'] == 'FW')].sort_values(by ='Mins/Match',ascending= False)[:10],palette='magma')
plt.xticks(rotation=45)
plt.title('Plot of Forwards with 30+ starts vs Mins/Match',fontsize = 20)
width = 0.75
for bar in ax.containers[0]:
    bar.set_width(bar.get_width() * width)
ax.set_xlabel("Name",fontsize=20)
ax.set_ylabel("Mins/Match",fontsize=20)


# **Plot of Midfielders with 30+ starts vs Mins/Match**

In [None]:
sns.set_theme(style="dark")
ax = sns.barplot(x='Name',y='Mins/Match',data=data[(data['Starts'] > 30) & (data['Position'] == 'MF')].sort_values(by ='Mins/Match',ascending= False)[:10],palette='magma',alpha = 0.8)
plt.xticks(rotation=45)
plt.title('Plot of Midfielders with 30+ starts vs Mins/Match',fontsize = 20)
width = 0.75
for bar in ax.containers[0]:
    bar.set_width(bar.get_width() * width)
ax.set_xlabel("Name",fontsize=20)
ax.set_ylabel("Mins/Match",fontsize=20)
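
The forward and midfielder cells differ only in the position they filter on. A hypothetical helper (not part of the original notebook) could capture the shared filter; `nlargest` returns the same top rows as sorting in descending order and slicing, apart from how ties are ordered. Sketched on hypothetical players:

```python
import pandas as pd

# Hypothetical players, not taken from the dataset
df = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4", "P5"],
    "Position": ["FW", "FW", "MF", "FW", "FW"],
    "Starts": [35, 28, 33, 31, 37],
    "Mins/Match": [85, 90, 88, 79, 82],
})


def top_by_minutes(df, position, min_starts=30, n=10):
    """Hypothetical helper: the n players of one position with the highest Mins/Match,
    among those with more than `min_starts` starts."""
    mask = (df["Starts"] > min_starts) & (df["Position"] == position)
    return df[mask].nlargest(n, "Mins/Match")


print(top_by_minutes(df, "FW")[["Name", "Mins/Match"]])
print(top_by_minutes(df, "MF")[["Name", "Mins/Match"]])
```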

# **Plot of Players vs Penalties attempted and scored**

In [None]:
sns.set_theme(style="dark")
# Select the top 10 penalty takers once so both bar sets refer to the same players
top_takers = data.sort_values(by="Penalty_Attempted", ascending=False)[:10]
width = 0.5

# Penalties attempted: half-width bars anchored on the left
ax = sns.barplot(x='Name', y='Penalty_Attempted', data=top_takers, palette='viridis')
for bar in ax.containers[0]:
    bar.set_width(bar.get_width() * width)

# Penalties scored on the same axes and y scale, shifted right next to the attempts
sns.barplot(x='Name', y='Penalty_Goals', data=top_takers, palette='viridis', alpha=0.7, hatch='//', ax=ax)
for bar in ax.containers[-1]:
    x = bar.get_x()
    w = bar.get_width()
    bar.set_x(x + w * (1 - width))
    bar.set_width(w * width)

ax.set_xlabel("Name", fontsize=20)
ax.set_ylabel("Penalties attempted (solid) / scored (hatched)", fontsize=20)
plt.xticks(rotation=90)
plt.title('Plot of Players vs Penalties attempted and scored', fontsize=20)

* **When it comes to penalties, nobody comes close to Bruno Fernandes.**
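
Attempted and scored penalties can also be combined into a conversion rate per taker. A minimal sketch on hypothetical records; players with no attempts get `NaN` rather than a rate:

```python
import pandas as pd

# Hypothetical penalty records, not taken from the dataset
df = pd.DataFrame({
    "Name": ["Taker A", "Taker B", "Outfielder C"],
    "Penalty_Attempted": [10, 4, 0],
    "Penalty_Goals": [9, 3, 0],
})

# Treat zero attempts as missing so those players get NaN instead of a rate
attempts = df["Penalty_Attempted"].where(df["Penalty_Attempted"] > 0)
df["Penalty_Conversion"] = df["Penalty_Goals"] / attempts
print(df.sort_values("Penalty_Attempted", ascending=False))
```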

# **Plot of Goals vs xG for each position (players who scored more than 2 goals)**

In [None]:
sns.lmplot(x='Goals', y='xG', data = data[(data['Goals'] > 2)],markers=["+", "x", "1"],hue = 'Position',height = 10)
plt.title('Plot of Goals vs xG ',fontsize = 20)
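
The gap between goals and xG is a simple over/underperformance measure: positive when a player scored more than his chances were expected to yield. A sketch with hypothetical players that reuses the more-than-2-goals filter from the cell above and averages the gap per position:

```python
import pandas as pd

# Hypothetical players, not taken from the dataset
df = pd.DataFrame({
    "Name": ["Striker A", "Striker B", "Midfielder C"],
    "Position": ["FW", "FW", "MF"],
    "Goals": [20, 12, 7],
    "xG": [17.5, 14.0, 7.0],
})

# Positive values: more goals than the chances were expected to yield
df["Goals_minus_xG"] = df["Goals"] - df["xG"]

# Same Goals > 2 filter as the lmplot, then average the gap per position
summary = df[df["Goals"] > 2].groupby("Position")["Goals_minus_xG"].mean()
print(summary)
```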

In [None]:
data['Total Cards'] = data['Yellow_Cards'] + data['Red_Cards']

# **Plot of most aggressive clubs**

In [None]:
cards = data.groupby('Club', as_index=False)['Total Cards'].sum()
sns.set_theme(style="dark")
ax = sns.barplot(x='Club', y='Total Cards', data=cards.sort_values(by='Total Cards', ascending=False), palette='RdYlGn')
plt.xticks(rotation=90)
plt.title('Plot of Clubs vs Total Number of cards', fontsize=20)
width = 0.75
for bar in ax.containers[0]:
    bar.set_width(bar.get_width() * width)
ax.set_xlabel("Club", fontsize=20)
ax.set_ylabel("Total Cards", fontsize=20)


# **Plot of most aggressive players**

In [None]:
cards = data.groupby('Name', as_index=False)['Total Cards'].sum()
sns.set_theme(style="dark")
ax = sns.barplot(x='Name', y='Total Cards', data=cards.sort_values(by='Total Cards', ascending=False)[:10], palette='RdYlGn', alpha=0.7)
plt.xticks(rotation=90)
plt.title('Plot of Players vs Total Number of cards', fontsize=20)
width = 0.75
for bar in ax.containers[0]:
    bar.set_width(bar.get_width() * width)
ax.set_xlabel("Name", fontsize=20)
ax.set_ylabel("Total Cards", fontsize=20)


* **Be very careful picking these players; they might just bring negative points to your lineup.**

# **Final Thoughts**

* **With the new season starting in about 3 weeks, this is probably the best time to stay connected with football and start building squads for Fantasy teams.**
* **Do make sure to look out for the 3 teams promoted from the Championship.**
* **Rest assured, you can now make a calculated decision on which players to keep for the long run.**
* **I will be doing more specific Fantasy League analysis in the near future.**
* **Please have a look; any kind of feedback is welcome.**
* **I have also uploaded a much more detailed overall football dataset; have a look and drop in your thoughts on that too.**