# **La Liga 2023/24 Match Analysis**

## **Overview**
This notebook analyzes match data from the La Liga 2023/24 season to examine team performance, match outcomes, and season trends. The raw dataset is cleaned and enriched with derived features so that the results are useful to fans, analysts, and other stakeholders in European football.

---

## **Data Source**
The dataset includes detailed information about all matches played during the La Liga 2023/24 season. Key fields include:
- **Matchday**: The round of the league.
- **Match ID**: A unique identifier for each match.
- **Teams**: Home and Away teams.
- **Scores**: Final scorelines for each match.
- **UTC Time**: The date and time of the match in Coordinated Universal Time.

---

## **Key Objectives**
1. **Data Cleaning & Transformation**:
   - Process raw data to extract meaningful insights.
   - Split the combined `Score` string into home and away goals.
   - Add relevant columns such as total goals, goal differences, and match outcomes.

2. **Feature Engineering**:
   - Calculate cumulative goals and points for each team.
   - Categorize matches as "Top Clashes", "Relegation Battles", or "Regular Matches".
   - Extract temporal features (day of the week, month).

3. **Statistical Analysis**:
   - Analyze match outcomes (Home Wins, Away Wins, Draws).
   - Identify high-scoring matches and trends by matchday.

4. **Visualization**:
   - Plot distributions of match outcomes and goals.
   - Highlight key matches based on their importance.
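
The match categorization in objective 2 can be sketched as follows. This is a minimal illustration, not the notebook's actual rule: the helper `add_match_importance`, the `top_n` and `bottom_n` thresholds, and the toy teams and points are all assumptions introduced here. It labels a match by checking whether both sides fall in the top or bottom group of a points table.

```python
import pandas as pd

def add_match_importance(matches, standings, top_n=4, bottom_n=3):
    """Label each match by comparing both teams against a points table.

    top_n and bottom_n are illustrative thresholds, not values from the notebook.
    """
    top_teams = set(standings.nlargest(top_n, "Total_Points").index)
    bottom_teams = set(standings.nsmallest(bottom_n, "Total_Points").index)

    def label(row):
        teams = {row["Home Team"], row["Away Team"]}
        if teams <= top_teams:  # both sides are in the top group
            return "Top Clash"
        if teams <= bottom_teams:  # both sides are in the bottom group
            return "Relegation Battle"
        return "Regular Match"

    return matches.assign(Match_Importance=matches.apply(label, axis=1))

# Toy data with made-up teams and points, for illustration only
toy_standings = pd.DataFrame(
    {"Total_Points": [90, 80, 50, 45, 30, 25]},
    index=["Team A", "Team B", "Team C", "Team D", "Team E", "Team F"],
)
toy_matches = pd.DataFrame({
    "Home Team": ["Team A", "Team E", "Team C"],
    "Away Team": ["Team B", "Team F", "Team A"],
})

labelled = add_match_importance(toy_matches, toy_standings, top_n=2, bottom_n=2)
print(labelled)
```

Using a final points table to label matches played earlier in the season is a simplification; a stricter version would use the standings as they stood before each matchday.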

---

## **Key Features Added**
- **Home and Away Scores**: Goals scored by each team in a match.
- **Total Goals**: Sum of goals scored in a match.
- **Goal Difference**: Home goals minus away goals (`Home_Goal_Difference`) and the reverse (`Away_Goal_Difference`).
- **Match Outcome**: Categorized as Home Win, Away Win, or Draw.
- **Cumulative Goals**: Running total of goals scored by teams throughout the season.
- **Matchday Average**: Average number of goals scored per matchday.
- **Above Matchday Average**: Indicator of matches exceeding the matchday average.
- **Match Importance**: Categorization of matches based on team quality and stakes.
- **Temporal Features**: Day of the week and month of the match.
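
Several of these features can be derived with a few pandas operations. The sketch below uses a small toy frame with made-up teams and scores; the column names `Matchday_Average`, `Above_Matchday_Average`, `Day_of_Week`, `Month`, and `Cumulative_Goals` are assumed names, and the matchday average is assumed to mean the mean goals per match within each matchday.

```python
import pandas as pd

# Toy matches with made-up scores; input column names mirror those used in this notebook
toy = pd.DataFrame(
    {
        "Matchday": [1, 1, 2, 2],
        "Home Team": ["Team A", "Team C", "Team B", "Team D"],
        "Away Team": ["Team B", "Team D", "Team A", "Team C"],
        "Home_Score": [2, 0, 1, 3],
        "Away_Score": [1, 0, 1, 2],
    },
    index=pd.to_datetime(
        ["2023-08-11 17:30", "2023-08-12 15:00", "2023-08-18 19:30", "2023-08-20 17:00"],
        utc=True,
    ).rename("UTC Time"),
)
toy["Total_Goals"] = toy["Home_Score"] + toy["Away_Score"]

# Assumed definition: mean goals per match within each matchday
toy["Matchday_Average"] = toy.groupby("Matchday")["Total_Goals"].transform("mean")
toy["Above_Matchday_Average"] = toy["Total_Goals"] > toy["Matchday_Average"]

# Temporal features taken from the datetime index
toy["Day_of_Week"] = toy.index.day_name()
toy["Month"] = toy.index.month_name()

# Cumulative goals per team: reshape to one row per team per match, then cumsum
home = toy[["Home Team", "Home_Score"]].rename(
    columns={"Home Team": "Team", "Home_Score": "Goals"}
)
away = toy[["Away Team", "Away_Score"]].rename(
    columns={"Away Team": "Team", "Away_Score": "Goals"}
)
long = pd.concat([home, away]).sort_index(kind="mergesort")  # chronological order
long["Cumulative_Goals"] = long.groupby("Team")["Goals"].cumsum()

print(toy[["Total_Goals", "Matchday_Average", "Above_Matchday_Average", "Day_of_Week"]])
print(long)
```

The long format is used for the running totals because each team appears in both the home and the away column, so a per-team cumulative sum needs one row per team per match.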

---

## **Getting Started**
This notebook begins with importing the dataset, cleaning the data, and engineering new features. It then proceeds to explore match outcomes and team performance through statistical summaries and visualizations.


In [35]:
import datetime as dt  # For working with dates and times
import os  # For file and directory handling
import pandas as pd  # For data manipulation
import numpy as np  # For numerical computations
import matplotlib.pyplot as plt  # For creating plots
import seaborn as sns  # For advanced visualization
from sklearn.preprocessing import StandardScaler  # For scaling data
from sklearn.decomposition import PCA  # For Principal Component Analysis
from sklearn.cluster import KMeans  # For clustering
from statsmodels.tsa.arima.model import ARIMA  # For ARIMA forecasting
from statsmodels.tsa.holtwinters import ExponentialSmoothing  # For time series smoothing
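try:
    from pmdarima import auto_arima  # For auto ARIMA model selection (optional)
except ImportError:
    auto_arima = None  # pmdarima is optional; auto ARIMA is unavailable without it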





In [36]:
# File path for the dataset
file_path = "FIFA_datasets/laliga2023_34/Laliga_matches_23_24.csv"

# Read and process the dataset
matches_df = (
    pd.read_csv(file_path)
    .drop(columns=['Finished', 'Started', 'Cancelled', 'Awarded', 'Match Status', 'Unnamed: 12', 'Round Name'], errors='ignore')  # Drop unnecessary columns
    .assign(
        # Split the Score column into Home Score and Away Score
        Home_Score=lambda df: df['Score'].str.split('_', expand=True)[0].astype(int),
        Away_Score=lambda df: df['Score'].str.split('_', expand=True)[1].astype(int),
        # Add Total Goals
        Total_Goals=lambda df: df['Home_Score'] + df['Away_Score'],
        # Add Home and Away Goal Differences
        Home_Goal_Difference=lambda df: df['Home_Score'] - df['Away_Score'],
        Away_Goal_Difference=lambda df: df['Away_Score'] - df['Home_Score'],
        # Determine Match Outcome
        Match_Outcome=lambda df: np.select(
            [df['Home_Goal_Difference'] > 0, df['Home_Goal_Difference'] < 0],
            ['Home Win', 'Away Win'],
            default='Draw'
        )
    )
    .drop(columns=['Score'])  # Drop the Score column after processing
    .rename(columns={'Round': 'Matchday'})  # Rename Round to Matchday
)

# Convert the 'UTC Time' column to a datetime object and set as index
matches_df['UTC Time'] = pd.to_datetime(matches_df['UTC Time'])
matches_df.set_index('UTC Time', inplace=True)

# Dynamically calculate team performance metrics
performance_data = []

# Iterate through each row to aggregate performance data
for _, row in matches_df.iterrows():
    if row['Match_Outcome'] == 'Home Win':
        performance_data.append({'Team': row['Home Team'], 'Wins': 1, 'Losses': 0, 'Draws': 0,
                                 'Goals Scored': row['Home_Score'], 'Goals Conceded': row['Away_Score'],
                                 'Home Matches': 1, 'Away Matches': 0, 'Home Wins': 1, 'Away Wins': 0})
        performance_data.append({'Team': row['Away Team'], 'Wins': 0, 'Losses': 1, 'Draws': 0,
                                 'Goals Scored': row['Away_Score'], 'Goals Conceded': row['Home_Score'],
                                 'Home Matches': 0, 'Away Matches': 1, 'Home Wins': 0, 'Away Wins': 0})
    elif row['Match_Outcome'] == 'Away Win':
        performance_data.append({'Team': row['Home Team'], 'Wins': 0, 'Losses': 1, 'Draws': 0,
                                 'Goals Scored': row['Home_Score'], 'Goals Conceded': row['Away_Score'],
                                 'Home Matches': 1, 'Away Matches': 0, 'Home Wins': 0, 'Away Wins': 0})
        performance_data.append({'Team': row['Away Team'], 'Wins': 1, 'Losses': 0, 'Draws': 0,
                                 'Goals Scored': row['Away_Score'], 'Goals Conceded': row['Home_Score'],
                                 'Home Matches': 0, 'Away Matches': 1, 'Home Wins': 0, 'Away Wins': 1})
    else:  # Draw
        performance_data.append({'Team': row['Home Team'], 'Wins': 0, 'Losses': 0, 'Draws': 1,
                                 'Goals Scored': row['Home_Score'], 'Goals Conceded': row['Away_Score'],
                                 'Home Matches': 1, 'Away Matches': 0, 'Home Wins': 0, 'Away Wins': 0})
        performance_data.append({'Team': row['Away Team'], 'Wins': 0, 'Losses': 0, 'Draws': 1,
                                 'Goals Scored': row['Away_Score'], 'Goals Conceded': row['Home_Score'],
                                 'Home Matches': 0, 'Away Matches': 1, 'Home Wins': 0, 'Away Wins': 0})

# Convert performance data to a DataFrame
performance_df = pd.DataFrame(performance_data)

# Calculate aggregated performance metrics for each team
team_performance = (
    performance_df.groupby('Team')
    .agg(
        Wins=('Wins', 'sum'),
        Losses=('Losses', 'sum'),
        Draws=('Draws', 'sum'),
        Goals_Scored=('Goals Scored', 'sum'),
        Goals_Conceded=('Goals Conceded', 'sum'),
        Home_Matches=('Home Matches', 'sum'),
        Away_Matches=('Away Matches', 'sum'),
        Home_Wins=('Home Wins', 'sum'),
        Away_Wins=('Away Wins', 'sum')
    )
    .assign(
        Goal_Difference=lambda df: df['Goals_Scored'] - df['Goals_Conceded'],
        Total_Points=lambda df: df['Wins'] * 3 + df['Draws'],
        Home_Win_Percentage=lambda df: (df['Home_Wins'] / df['Home_Matches'] * 100).fillna(0),
        Away_Win_Percentage=lambda df: (df['Away_Wins'] / df['Away_Matches'] * 100).fillna(0)
    )
    .sort_values(by='Total_Points', ascending=False)
)



In [37]:
team_performance

| Team | Wins | Losses | Draws | Goals_Scored | Goals_Conceded | Home_Matches | Away_Matches | Home_Wins | Away_Wins | Goal_Difference | Total_Points | Home_Win_Percentage | Away_Win_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Real Madrid | 29 | 1 | 8 | 87 | 26 | 19 | 19 | 16 | 13 | 61 | 95 | 84.210526 | 68.421053 |
| Barcelona | 26 | 5 | 7 | 79 | 44 | 19 | 19 | 15 | 11 | 35 | 85 | 78.947368 | 57.894737 |
| Girona | 25 | 7 | 6 | 85 | 46 | 19 | 19 | 15 | 10 | 39 | 81 | 78.947368 | 52.631579 |
| Atletico Madrid | 24 | 10 | 4 | 70 | 43 | 19 | 19 | 16 | 8 | 27 | 76 | 84.210526 | 42.105263 |
| Athletic Club | 19 | 8 | 11 | 61 | 37 | 19 | 19 | 12 | 7 | 24 | 68 | 63.157895 | 36.842105 |
| Real Sociedad | 16 | 10 | 12 | 51 | 39 | 19 | 19 | 8 | 8 | 12 | 60 | 42.105263 | 42.105263 |
| Real Betis | 14 | 9 | 15 | 48 | 45 | 19 | 19 | 9 | 5 | 3 | 57 | 47.368421 | 26.315789 |
| Villarreal | 14 | 13 | 11 | 65 | 65 | 19 | 19 | 7 | 7 | 0 | 53 | 36.842105 | 36.842105 |
| Valencia | 13 | 15 | 10 | 40 | 45 | 19 | 19 | 8 | 5 | -5 | 49 | 42.105263 | 26.315789 |
| Deportivo Alaves | 12 | 16 | 10 | 36 | 46 | 19 | 19 | 9 | 3 | -10 | 46 | 47.368421 | 15.789474 |


In [38]:
matches_df

| UTC Time | Matchday | Match ID | Home Team | Away Team | Home_Score | Away_Score | Total_Goals | Home_Goal_Difference | Away_Goal_Difference | Match_Outcome |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023-08-11 17:30:00+00:00 | 1 | 4205343 | Almeria | Rayo Vallecano | 0 | 2 | 2 | -2 | 2 | Away Win |
| 2023-08-11 20:00:00+00:00 | 1 | 4205347 | Sevilla | Valencia | 1 | 2 | 3 | -1 | 1 | Away Win |
| 2023-08-12 15:00:00+00:00 | 1 | 4205351 | Real Sociedad | Girona | 1 | 1 | 2 | 0 | 0 | Draw |
| 2023-08-12 17:30:00+00:00 | 1 | 4205348 | Las Palmas | Mallorca | 1 | 1 | 2 | 0 | 0 | Draw |
| 2023-08-12 19:30:00+00:00 | 1 | 4205344 | Athletic Club | Real Madrid | 0 | 2 | 2 | -2 | 2 | Away Win |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-05-25 19:00:00+00:00 | 38 | 4205721 | Real Madrid | Real Betis | 0 | 0 | 0 | 0 | 0 | Draw |
| 2024-05-26 12:00:00+00:00 | 38 | 4205715 | Getafe | Mallorca | 1 | 2 | 3 | -1 | 1 | Away Win |
| 2024-05-26 14:15:00+00:00 | 38 | 4205714 | Celta Vigo | Valencia | 2 | 2 | 4 | 0 | 0 | Draw |
| 2024-05-26 14:15:00+00:00 | 38 | 4205722 | Las Palmas | Deportivo Alaves | 1 | 1 | 2 | 0 | 0 | Draw |


---

### Match Outcome Percentages

Based on the analysis of the La Liga 2023/24 season matches, the outcomes are distributed as follows:

- **Home Wins**: 43.95% of matches were won by the home team.
- **Away Wins**: 27.89% of matches were won by the away team.
- **Draws**: 28.16% of matches ended in a draw.

This highlights the advantage of playing at home: home teams won nearly 44% of all matches, compared with about 28% for away teams.
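
Percentages like these can be computed with `value_counts(normalize=True)` on the `Match_Outcome` column. The sketch below runs on a short, made-up series of outcomes rather than the real data, so its numbers differ from the figures above. It also shows the home share of decisive matches (draws excluded); for the reported figures that share is 43.95 / (43.95 + 27.89), roughly 61%.

```python
import pandas as pd

# Toy outcomes (made up) standing in for matches_df["Match_Outcome"]
outcomes = pd.Series(
    ["Home Win", "Away Win", "Draw", "Home Win",
     "Home Win", "Draw", "Away Win", "Home Win"],
    name="Match_Outcome",
)

# Share of each outcome as a percentage of all matches
outcome_pct = outcomes.value_counts(normalize=True).mul(100).round(2)
print(outcome_pct)

# Share of decisive matches (draws excluded) won by the home side
decisive = outcomes[outcomes != "Draw"]
home_share = (decisive == "Home Win").mean() * 100
print(round(home_share, 2))
```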


---