# Exploratory Data Analysis

---

## Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Loading Dataset

In [5]:
# Raw string so the backslashes in the Windows path are not read as escape sequences
epl_df = pd.read_csv(r'D:\git_repositories\PythonProject2-Exploratory_Data_Analysis_Using_Python\Data\EPL_20_21.csv')
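
The absolute path above only resolves on the author's machine. A minimal sketch of a more portable loader follows, assuming the CSV is kept somewhere relative to the notebook; `load_epl` and the sample file name are hypothetical names introduced for illustration, and the two sample rows are copied from the `head()` output of this notebook.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_epl(csv_path):
    """Read the EPL CSV and fail with a clear message if the file is missing."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    return pd.read_csv(csv_path)


# Demonstration on a throwaway file so the sketch runs anywhere.
# In the notebook you would pass the real location instead, for example
# Path("Data") / "EPL_20_21.csv" (a hypothetical relative path).
sample_csv = (
    "Name,Club,Goals\n"
    "Mason Mount,Chelsea,6\n"
    "Timo Werner,Chelsea,6\n"
)
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "EPL_sample.csv"
    sample_path.write_text(sample_csv, encoding="utf-8")
    sample_df = load_epl(sample_path)

print(sample_df.shape)
```

Building the path with `pathlib` also avoids the backslash-escaping problem, because no path separators are typed by hand.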

In [6]:
epl_df.head()

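| | Name | Club | Nationality | Position | Age | Matches | Starts | Mins | Goals | Assists | Passes_Attempted | Perc_Passes_Completed | Penalty_Goals | Penalty_Attempted | xG | xA | Yellow_Cards | Red_Cards |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mason Mount | Chelsea | ENG | MF,FW | 21 | 36 | 32 | 2890 | 6 | 5 | 1881 | 82.3 | 1 | 1 | 0.21 | 0.24 | 2 | 0 |
| 1 | Edouard Mendy | Chelsea | SEN | GK | 28 | 31 | 31 | 2745 | 0 | 0 | 1007 | 84.6 | 0 | 0 | 0.0 | 0.0 | 2 | 0 |
| 2 | Timo Werner | Chelsea | GER | FW | 24 | 35 | 29 | 2602 | 6 | 8 | 826 | 77.2 | 0 | 0 | 0.41 | 0.21 | 2 | 0 |
| 3 | Ben Chilwell | Chelsea | ENG | DF | 23 | 27 | 27 | 2286 | 3 | 5 | 1806 | 78.6 | 0 | 0 | 0.1 | 0.11 | 3 | 0 |
| 4 | Reece James | Chelsea | ENG | DF | 20 | 32 | 25 | 2373 | 1 | 2 | 1987 | 85.0 | 0 | 0 | 0.06 | 0.12 | 3 | 0 |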


**Information about the dataset**

In [7]:
epl_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 532 entries, 0 to 531
Data columns (total 18 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Name                   532 non-null    object 
 1   Club                   532 non-null    object 
 2   Nationality            532 non-null    object 
 3   Position               532 non-null    object 
 4   Age                    532 non-null    int64  
 5   Matches                532 non-null    int64  
 6   Starts                 532 non-null    int64  
 7   Mins                   532 non-null    int64  
 8   Goals                  532 non-null    int64  
 9   Assists                532 non-null    int64  
 10  Passes_Attempted       532 non-null    int64  
 11  Perc_Passes_Completed  532 non-null    float64
 12  Penalty_Goals          532 non-null    int64  
 13  Penalty_Attempted      532 non-null    int64  
 14  xG                     532 non-null    float64
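 15  xA                     532 non-null    float64
 16  Yellow_Cards           532 non-null    int64  
 17  Red_Cards              532 non-null    int64  
dtypes: float64(3), int64(11), object(4)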

**Checking null values**

In [9]:
epl_df.isna().sum()

Name                     0
Club                     0
Nationality              0
Position                 0
Age                      0
Matches                  0
Starts                   0
Mins                     0
Goals                    0
Assists                  0
Passes_Attempted         0
Perc_Passes_Completed    0
Penalty_Goals            0
Penalty_Attempted        0
xG                       0
xA                       0
Yellow_Cards             0
Red_Cards                0
dtype: int64

This English Premier League (2020-2021) dataset has 532 rows and 18 columns, and none of the columns contains missing values.
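
The row, column and missing-value counts can also be computed directly rather than read off the printed output. A minimal sketch, using a small stand-in frame built from three rows and four columns of the `head()` output (in the notebook, `epl_df` would take the place of the hypothetical `sample_df`):

```python
import io

import pandas as pd

# Stand-in data: three rows and four columns copied from the head() output.
sample_csv = io.StringIO(
    "Name,Club,Age,Goals\n"
    "Mason Mount,Chelsea,21,6\n"
    "Edouard Mendy,Chelsea,28,0\n"
    "Timo Werner,Chelsea,24,6\n"
)
sample_df = pd.read_csv(sample_csv)

# shape gives (rows, columns); summing isna() twice counts every missing cell.
n_rows, n_cols = sample_df.shape
n_missing = int(sample_df.isna().sum().sum())

print(f"{n_rows} rows, {n_cols} columns, {n_missing} missing values")
```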

In [8]:
epl_df.describe()

| | Age | Matches | Starts | Mins | Goals | Assists | Passes_Attempted | Perc_Passes_Completed | Penalty_Goals | Penalty_Attempted | xG | xA | Yellow_Cards | Red_Cards |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 | 532.0 |
| mean | 25.5 | 19.535714 | 15.714286 | 1411.443609 | 1.853383 | 1.287594 | 717.75 | 77.823872 | 0.191729 | 0.234962 | 0.113289 | 0.07265 | 2.114662 | 0.090226 |
| std | 4.319404 | 11.840459 | 11.921161 | 1043.171856 | 3.338009 | 2.095191 | 631.372522 | 13.011631 | 0.850881 | 0.975818 | 0.148174 | 0.090072 | 2.269094 | 0.293268 |
| min | 16.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 22.0 | 9.0 | 4.0 | 426.0 | 0.0 | 0.0 | 171.5 | 73.5 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 |
| 50% | 26.0 | 21.0 | 15.0 | 1345.0 | 1.0 | 0.0 | 573.5 | 79.2 | 0.0 | 0.0 | 0.06 | 0.05 | 2.0 | 0.0 |
| 75% | 29.0 | 30.0 | 27.0 | 2303.5 | 2.0 | 2.0 | 1129.5 | 84.625 | 0.0 | 0.0 | 0.15 | 0.11 | 3.0 | 0.0 |
| max | 38.0 | 38.0 | 38.0 | 3420.0 | 23.0 | 14.0 | 3214.0 | 100.0 | 9.0 | 10.0 | 1.16 | 0.9 | 12.0 | 2.0 |
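
The summary reports a minimum of -1.0 for `Perc_Passes_Completed`, which cannot be a real percentage. It may be a placeholder for players with no attempted passes (the minimum of `Passes_Attempted` is 0.0), but that is an assumption the notebook does not confirm. A minimal sketch for listing such rows is shown here on a stand-in frame: the first two rows come from `head()`, while "Player X" is a hypothetical row added only to illustrate the -1.0 case.

```python
import pandas as pd

# Stand-in frame; "Player X" is hypothetical, the other rows are from head().
sample_df = pd.DataFrame(
    {
        "Name": ["Mason Mount", "Edouard Mendy", "Player X"],
        "Passes_Attempted": [1881, 1007, 0],
        "Perc_Passes_Completed": [82.3, 84.6, -1.0],
    }
)

# A valid percentage lies between 0 and 100 inclusive; flag everything else.
out_of_range = ~sample_df["Perc_Passes_Completed"].between(0, 100)

print(sample_df.loc[out_of_range, ["Name", "Passes_Attempted", "Perc_Passes_Completed"]])
```

Running the same mask against `epl_df` would show which players carry the -1.0 value, and whether they should be excluded before averaging pass completion.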


### Create two new columns

In [11]:
# Minutes played per match; astype(int) truncates the fractional part

epl_df['MinsPerMatch'] = (epl_df['Mins'] / epl_df['Matches']).astype(int)

In [12]:
# Goals scored per match (true division already returns floats)

epl_df['GoalsPerMatch'] = epl_df['Goals'] / epl_df['Matches']
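
Casting to `int` drops the fractional part rather than rounding, so 2745 minutes over 31 matches (about 88.5) is stored as 88. The sketch below contrasts truncation with rounding on two rows copied from `head()`; `sample_df`, `truncated` and `rounded` are names introduced here for illustration.

```python
import pandas as pd

# Two rows copied from the head() output, limited to the columns needed.
sample_df = pd.DataFrame(
    {
        "Name": ["Mason Mount", "Edouard Mendy"],
        "Matches": [36, 31],
        "Mins": [2890, 2745],
        "Goals": [6, 0],
    }
)

mins_ratio = sample_df["Mins"] / sample_df["Matches"]

# astype(int) truncates toward zero, as in the notebook cell.
truncated = mins_ratio.astype(int)

# round() first gives the nearest whole minute instead.
rounded = mins_ratio.round().astype(int)

# True division of two integer columns already yields floats.
goals_per_match = sample_df["Goals"] / sample_df["Matches"]

print(truncated.tolist(), rounded.tolist())
print(goals_per_match.round(6).tolist())
```

Either choice is defensible. `Matches` has a minimum of 1 in the `describe()` output, so no division by zero occurs in this dataset.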

In [13]:
epl_df.head()

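| | Name | Club | Nationality | Position | Age | Matches | Starts | Mins | Goals | Assists | Passes_Attempted | Perc_Passes_Completed | Penalty_Goals | Penalty_Attempted | xG | xA | Yellow_Cards | Red_Cards | MinsPerMatch | GoalsPerMatch |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mason Mount | Chelsea | ENG | MF,FW | 21 | 36 | 32 | 2890 | 6 | 5 | 1881 | 82.3 | 1 | 1 | 0.21 | 0.24 | 2 | 0 | 80 | 0.166667 |
| 1 | Edouard Mendy | Chelsea | SEN | GK | 28 | 31 | 31 | 2745 | 0 | 0 | 1007 | 84.6 | 0 | 0 | 0.0 | 0.0 | 2 | 0 | 88 | 0.0 |
| 2 | Timo Werner | Chelsea | GER | FW | 24 | 35 | 29 | 2602 | 6 | 8 | 826 | 77.2 | 0 | 0 | 0.41 | 0.21 | 2 | 0 | 74 | 0.171429 |
| 3 | Ben Chilwell | Chelsea | ENG | DF | 23 | 27 | 27 | 2286 | 3 | 5 | 1806 | 78.6 | 0 | 0 | 0.1 | 0.11 | 3 | 0 | 84 | 0.111111 |
| 4 | Reece James | Chelsea | ENG | DF | 20 | 32 | 25 | 2373 | 1 | 2 | 1987 | 85.0 | 0 | 0 | 0.06 | 0.12 | 3 | 0 | 74 | 0.03125 |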


**Total goals in the English Premier League**

In [16]:
total_goals = epl_df['Goals'].sum()
print(f"Total goals in the English Premier League: {total_goals}")

Total goals in the English Premier League: 986
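
As a sanity check, the total should equal the mean of the `Goals` column multiplied by the number of rows, both of which appear in the `describe()` output (mean 1.853383, count 532). A minimal sketch of that cross-check follows; the stand-in frame uses the five `Goals` values shown by `head()`, and the variable names are introduced here for illustration.

```python
import math

import pandas as pd

# Figures read from the describe() output of the notebook.
mean_goals = 1.853383
n_players = 532
estimated_total = mean_goals * n_players
print(round(estimated_total))

# The same identity on a stand-in frame with the Goals values from head().
sample_df = pd.DataFrame({"Goals": [6, 0, 6, 3, 1]})
sample_total = sample_df["Goals"].sum()
sample_estimate = sample_df["Goals"].mean() * len(sample_df)
print(sample_total, math.isclose(sample_total, sample_estimate))
```

The estimate is not exact because `describe()` prints a rounded mean, so it is rounded to the nearest integer before being compared with the sum.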
