## 1. Introduction

In this project, I’ll explore the English Premier League and Championship datasets. The goal is to analyze how the game has evolved across seasons and answer four key questions:
  1. How have average goals per match changed across seasons in the Premier League compared to the Championship?

  2. Do disciplinary actions (yellow/red cards) show any noticeable trend over time?

  3. What is the relationship between shots taken and goals scored—does more shooting always mean more scoring?

  4. Which teams historically stand out as the most offensively productive (most goals per match)?
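
Question 4 needs per-team figures, but each row of the data is one match with a home side and an away side. Here is a minimal sketch, assuming the column names that appear in the `head()` outputs (`HomeTeam`, `AwayTeam`, `FTH Goals`, `FTA Goals`). It stacks the home and away views into one long table, then averages goals per team. `team_goals_per_match` is a hypothetical helper, and the sample rows are copied from the Championship `head()` output.

```python
import pandas as pd


def team_goals_per_match(matches: pd.DataFrame) -> pd.DataFrame:
    """Goals scored per match played by each team, home and away combined.

    Hypothetical helper; assumes the columns 'HomeTeam', 'AwayTeam',
    'FTH Goals' (full-time home goals) and 'FTA Goals' (full-time away goals).
    """
    # One row per team per match: the home side's view and the away side's view
    home = matches[['HomeTeam', 'FTH Goals']].rename(
        columns={'HomeTeam': 'Team', 'FTH Goals': 'Goals'})
    away = matches[['AwayTeam', 'FTA Goals']].rename(
        columns={'AwayTeam': 'Team', 'FTA Goals': 'Goals'})
    long = pd.concat([home, away], ignore_index=True)

    # Average over every match a team appears in, keeping the sample size alongside
    summary = long.groupby('Team')['Goals'].agg(['mean', 'count'])
    summary.columns = ['GoalsPerMatch', 'Matches']
    return summary.sort_values('GoalsPerMatch', ascending=False)


# Rows copied from the ch.head() output
sample = pd.DataFrame({
    'HomeTeam': ['Blackburn', 'Cardiff', 'Plymouth', 'QPR', 'Sunderland'],
    'AwayTeam': ['Portsmouth', 'Watford', 'Oxford', 'Luton', 'Portsmouth'],
    'FTH Goals': [3, 1, 1, 2, 1],
    'FTA Goals': [0, 1, 1, 1, 0],
})
result = team_goals_per_match(sample)
print(result)
```

On the full data, a team with only a few seasons in the dataset can top this ranking on a small sample. It may be worth filtering on `Matches` or ranking each `League` separately.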

## 2. Imports and Data Loading

I’ll use pandas for data wrangling, numpy for numerical calculations, and matplotlib and seaborn for visualization.

In [1]:
# Dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Load the two datasets: Premier League (pl) and Championship (ch)
pl = pd.read_csv('England CSV.csv')
ch = pd.read_csv('England 2 CSV.csv')

In [None]:
pl.head()

| Unnamed: 0 | Date | Season | HomeTeam | AwayTeam | FTH Goals | FTA Goals | FT Result | HTH Goals | HTA Goals | HT Result | ... | H Fouls | A Fouls | H Corners | A Corners | H Yellow | A Yellow | H Red | A Red | Display_Order | League |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16/01/2025 | 2024/25 | Ipswich Town | Brighton & Hove Albion | 0 | 2 | A | 0.0 | 1.0 | A | ... | 13.0 | 14.0 | 1.0 | 9.0 | 2.0 | 2.0 | 0.0 | 0.0 | 20250116 | Premier League |
| 1 | 16/01/2025 | 2024/25 | Man United | Southampton | 3 | 1 | H | 0.0 | 1.0 | A | ... | 7.0 | 10.0 | 4.0 | 4.0 | 1.0 | 3.0 | 0.0 | 0.0 | 20250116 | Premier League |
| 2 | 15/01/2025 | 2024/25 | Everton | Aston Villa | 0 | 1 | A | 0.0 | 0.0 | D | ... | 17.0 | 10.0 | 8.0 | 5.0 | 2.0 | 1.0 | 0.0 | 0.0 | 20250115 | Premier League |
| 3 | 15/01/2025 | 2024/25 | Leicester | Crystal Palace | 0 | 2 | A | 0.0 | 0.0 | D | ... | 7.0 | 6.0 | 4.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 20250115 | Premier League |
| 4 | 15/01/2025 | 2024/25 | Newcastle | Wolves | 3 | 0 | H | 1.0 | 0.0 | H | ... | 10.0 | 13.0 | 4.0 | 2.0 | 0.0 | 2.0 | 0.0 | 0.0 | 20250115 | Premier League |


In [5]:
ch.head()

| Unnamed: 0 | Date | Season | HomeTeam | AwayTeam | FTH Goals | FTA Goals | FT Result | HTH Goals | HTA Goals | HT Result | ... | H Fouls | A Fouls | H Corners | A Corners | H Yellow | A Yellow | H Red | A Red | Display_Order | League |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15/01/2025 | 2024/25 | Blackburn | Portsmouth | 3 | 0 | H | 0.0 | 0.0 | D | ... | 15.0 | 19.0 | 5.0 | 6.0 | 1.0 | 3.0 | 0.0 | 0.0 | 20250115 | English Second |
| 1 | 14/01/2025 | 2024/25 | Cardiff | Watford | 1 | 1 | D | 0.0 | 0.0 | D | ... | 8.0 | 14.0 | 7.0 | 3.0 | 1.0 | 2.0 | 0.0 | 0.0 | 20250114 | English Second |
| 2 | 14/01/2025 | 2024/25 | Plymouth | Oxford | 1 | 1 | D | 0.0 | 1.0 | A | ... | 8.0 | 10.0 | 6.0 | 3.0 | 1.0 | 1.0 | 0.0 | 0.0 | 20250114 | English Second |
| 3 | 6/01/2025 | 2024/25 | QPR | Luton | 2 | 1 | H | 1.0 | 1.0 | D | ... | 10.0 | 9.0 | 3.0 | 4.0 | 1.0 | 1.0 | 0.0 | 0.0 | 20250106 | English Second |
| 4 | 5/01/2025 | 2024/25 | Sunderland | Portsmouth | 1 | 0 | H | 1.0 | 0.0 | H | ... | 9.0 | 10.0 | 7.0 | 3.0 | 1.0 | 2.0 | 0.0 | 1.0 | 20250105 | English Second |
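
Two things stand out in these previews. First, both files carry an `Unnamed: 0` column. That is the name pandas gives a header-less column, and it usually means the file was written along with its row index. Second, the Championship rows are labelled `English Second` in `League`. Below is a minimal sketch of loading and combining the files, assuming that leading column is only a saved index. The in-memory CSVs are trimmed stand-ins for `England CSV.csv` and `England 2 CSV.csv` that keep just a few columns.

```python
import io

import pandas as pd

# Trimmed stand-ins for the two CSV files (rows from the head() outputs, fewer columns);
# the empty first header cell is assumed to be a saved row index
pl_csv = io.StringIO(
    ",Date,Season,HomeTeam,AwayTeam,FTH Goals,FTA Goals,League\n"
    "0,16/01/2025,2024/25,Ipswich Town,Brighton & Hove Albion,0,2,Premier League\n"
    "1,16/01/2025,2024/25,Man United,Southampton,3,1,Premier League\n"
)
ch_csv = io.StringIO(
    ",Date,Season,HomeTeam,AwayTeam,FTH Goals,FTA Goals,League\n"
    "0,15/01/2025,2024/25,Blackburn,Portsmouth,3,0,English Second\n"
    "1,14/01/2025,2024/25,Cardiff,Watford,1,1,English Second\n"
)

# index_col=0 reads that column as the row index instead of an 'Unnamed: 0' column
pl = pd.read_csv(pl_csv, index_col=0)
ch = pd.read_csv(ch_csv, index_col=0)

# Stack both leagues; ignore_index avoids duplicate row labels (both files start at 0)
df = pd.concat([pl, ch], ignore_index=True)
print(df.shape)
print(df['League'].value_counts())
```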


In [5]:
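# 2. Combine both leagues into one DataFrame and inspect it
# (assumed construction: the output below has no 'Unnamed: 0' column, so it is dropped here)
df = pd.concat([pl, ch], ignore_index=True).drop(columns=['Unnamed: 0'], errors='ignore')
print(df.shape)
print(df.columns)
print(df.dtypes)
print(df.head())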

(12153, 25)
Index(['Date', 'Season', 'HomeTeam', 'AwayTeam', 'FTH Goals', 'FTA Goals',
       'FT Result', 'HTH Goals', 'HTA Goals', 'HT Result', 'Referee',
       'H Shots', 'A Shots', 'H SOT', 'A SOT', 'H Fouls', 'A Fouls',
       'H Corners', 'A Corners', 'H Yellow', 'A Yellow', 'H Red', 'A Red',
       'Display_Order', 'League'],
      dtype='object')
Date              object
Season            object
HomeTeam          object
AwayTeam          object
FTH Goals          int64
FTA Goals          int64
FT Result         object
HTH Goals        float64
HTA Goals        float64
HT Result         object
Referee           object
H Shots          float64
A Shots          float64
H SOT            float64
A SOT            float64
H Fouls          float64
A Fouls          float64
H Corners        float64
A Corners        float64
H Yellow         float64
A Yellow         float64
H Red            float64
A Red            float64
Display_Order      int64
League            object
dtype: object
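
In the dtypes above, every shot, foul, corner, and card column is `float64` even though these are counts. Only the full-time goals and `Display_Order` are `int64`. A common reason is missing values, because a default integer column cannot hold `NaN`. It is therefore worth checking which seasons have gaps before answering questions 2 and 3. Here is a minimal sketch on a made-up CSV extract; the seasons and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV extract: blank cells stand for stats that were not recorded (values made up)
raw = io.StringIO(
    "Season,FTH Goals,H Shots,H Yellow\n"
    "2000/01,1,,\n"
    "2000/01,2,,\n"
    "2024/25,0,12,2\n"
    "2024/25,3,15,1\n"
)
df = pd.read_csv(raw)

# Blank cells become NaN, and NaN forces an otherwise-integer column to float64
print(df.dtypes)

# Share of missing values per column within each season
missing_share = df.drop(columns='Season').isna().groupby(df['Season']).mean()
print(missing_share)
```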
   

In [6]:
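# 3. Parse dates, which are stored day-first (e.g. 16/01/2025)
# Keep the existing 'Season' labels such as '2024/25': a season spans two calendar
# years, so replacing them with the match year would split each season in two
if 'Date' in df.columns:
    df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y', errors='coerce')
    print(df['Date'].isna().sum(), "dates could not be parsed")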
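
Here is a short check of both date pitfalls. It uses dates from the `head()` outputs plus one hypothetical August fixture. An explicit day-first `format` parses `16/01/2025` and the unpadded `6/01/2025` alike. Meanwhile, `dt.year` yields two different values inside the single season `2024/25`. If a numeric season axis is needed, the starting year can be read from the `Season` label instead (`SeasonStart` is a hypothetical column name).

```python
import pandas as pd

# Two dates from the head() outputs plus a hypothetical August fixture, all in season 2024/25
df = pd.DataFrame({
    'Date': ['16/01/2025', '6/01/2025', '15/08/2024'],
    'Season': ['2024/25', '2024/25', '2024/25'],
})

# Day-first format; strptime-style %d also accepts an unpadded day such as '6'
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y', errors='coerce')

# The calendar year differs within one season, so it is not a season identifier
print(df['Date'].dt.year.tolist())

# Numeric season axis taken from the label's first four characters (hypothetical column)
df['SeasonStart'] = df['Season'].str[:4].astype(int)
print(df[['Date', 'Season', 'SeasonStart']])
```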


In [7]:
# 4. Average goals per match by season, one line per league (question 1)
df['TotalGoals'] = df['FTH Goals'] + df['FTA Goals']
avg_goals = df.groupby(['Season', 'League'])['TotalGoals'].mean().unstack('League')

avg_goals.plot(kind='line', marker='o', figsize=(10, 5))
plt.title("Average Goals per Match by Season")
plt.xlabel("Season")
plt.ylabel("Average Goals")
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
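
Questions 2 and 3 can follow the same pattern as the goals cell: derive a per-match column, then aggregate. Here is a minimal sketch on made-up rows that use this dataset's column names (`H Yellow`, `A Red`, `H Shots`, and so on). Plain `+` leaves a match total as `NaN` when either side is missing, and `mean()` then skips that match. For question 3, each match contributes one home and one away team-row, so shots and goals are paired per team.

```python
import numpy as np
import pandas as pd

# Hypothetical matches using this dataset's column names (all values are made up)
df = pd.DataFrame({
    'Season': ['2023/24', '2023/24', '2024/25', '2024/25'],
    'League': ['Premier League', 'English Second', 'Premier League', 'English Second'],
    'FTH Goals': [2, 1, 0, 3],
    'FTA Goals': [1, 1, 2, 0],
    'H Shots': [15.0, 10.0, 8.0, 20.0],
    'A Shots': [9.0, 12.0, 14.0, 6.0],
    'H Yellow': [2.0, 1.0, 3.0, np.nan],
    'A Yellow': [1.0, 2.0, 2.0, 1.0],
    'H Red': [0.0, 0.0, 1.0, 0.0],
    'A Red': [0.0, 0.0, 0.0, np.nan],
})

# Question 2: cards per match; a missing side makes the match total NaN, which mean() skips
df['Yellows'] = df['H Yellow'] + df['A Yellow']
df['Reds'] = df['H Red'] + df['A Red']
cards = df.groupby(['Season', 'League'])[['Yellows', 'Reds']].mean()
print(cards)

# Question 3: one home and one away row per match, pairing each team's shots with its goals
shots = pd.concat([df['H Shots'], df['A Shots']], ignore_index=True)
goals = pd.concat([df['FTH Goals'], df['FTA Goals']], ignore_index=True)
r = shots.corr(goals)  # Pearson correlation; NaN pairs are dropped
print(f"Pearson r (shots vs goals): {r:.2f}")

# Conversion rate: total goals per total shots over team-rows that have shot data
has_shots = shots.notna()
conversion = goals[has_shots].sum() / shots[has_shots].sum()
print(f"Goals per shot: {conversion:.3f}")
```

Pearson's r measures only linear association. The goals-per-shot ratio is a useful companion when asking whether more shooting means more scoring.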