# Analyzing Performance and Market Value of Top 5 German Bundesliga Clubs (Season 2023-2024)

This portfolio project provides an in-depth analysis of *FC Bayern Munich*'s performance and market dynamics during the Bundesliga 2023-2024 season. It leverages a combination of Python for data manipulation and analysis, SQL for data retrieval, and R for advanced statistical analysis, aiming to provide actionable insights that highlight key trends and drivers of success.
Last Updated: April 14, 2024

Author: Moritz Philipp Haaf, BSc (WU) MA

Contact Information:

Email:   moritz_haaf@outlook.com
GitHub:  https://github.com/itzmore-mph/itzmore-mph-portfolio

# Initial Setup

All required libraries are imported at the beginning of the notebook so that dependencies are visible in one place and missing packages surface immediately.

In [None]:
# Importing data manipulation libraries
import pandas as pd
import numpy as np

# Importing visualization libraries
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

# Configuring Plotly for Jupyter
pio.renderers.default = 'notebook'

# Importing machine learning and other necessary libraries
from sklearn.linear_model import LinearRegression
import requests
from datetime import datetime


1. Data Collection

Data was sourced from the Bundesliga's official website, Kaggle datasets, and sports APIs, covering player statistics, match results, and market values.

In [None]:
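# Function to fetch JSON data from a specified URL
def fetch_data(url):
    """Fetches JSON data from the provided URL and returns a pandas DataFrame."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Fail early on HTTP errors (404, 500, ...)
    data = response.json()  # Raises an error if the response body is not valid JSON
    return pd.DataFrame(data)

# Placeholder URL: this is the portfolio repository page (HTML), not a JSON endpoint.
# Replace it with a URL that returns JSON before running this cell.
url = "https://github.com/itzmore-mph/itzmore-mph-portfolio"
bayern_stats = fetch_data(url)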
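
Sports APIs often return nested JSON, and `pd.DataFrame` on nested records leaves dictionaries inside cells. The following minimal sketch flattens such records with `pd.json_normalize`. The payload shape, including the `player` sub-object and its keys, is a hypothetical example and not the format of any specific API.

```python
import pandas as pd

def records_to_frame(payload):
    """Flatten a list of (possibly nested) JSON records into a flat DataFrame.

    Nested keys become dot-separated column names, e.g. 'player.name'.
    """
    return pd.json_normalize(payload)

# Hypothetical payload shape; real API responses will differ
payload = [
    {"match_id": 1, "player": {"name": "Player A", "position": "FW"}, "goals": 2},
    {"match_id": 1, "player": {"name": "Player B", "position": "MF"}, "goals": 0},
]
flat = records_to_frame(payload)
print(flat)
```

Inside `fetch_data`, `pd.json_normalize(data)` could replace `pd.DataFrame(data)` when the response has this nested form.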


2. Data Preparation
   
We applied various preprocessing steps to enhance data quality and usability. This included addressing missing values and standardizing the formats of date fields.

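In [None]:
# Standardizing dates first, so missing dates become NaT instead of 0 (i.e. 1970-01-01)
bayern_stats['date'] = pd.to_datetime(bayern_stats['date'], errors='coerce')
# Replacing missing values with 0 in numeric columns only
numeric_cols = bayern_stats.select_dtypes(include='number').columns
bayern_stats[numeric_cols] = bayern_stats[numeric_cols].fillna(0)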
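
Before filling gaps with 0, it can help to check how much is actually missing per column. A column that is mostly empty may be better dropped than zero-filled, because zeros would distort its averages and correlations. This small sketch reports missing counts and shares. The toy columns are hypothetical.

```python
import numpy as np
import pandas as pd

def missing_report(df):
    """Return the count and share of missing values per column, largest first."""
    counts = df.isna().sum()
    report = pd.DataFrame({"missing": counts, "share": counts / len(df)})
    return report.sort_values("missing", ascending=False)

# Toy example with hypothetical columns
df = pd.DataFrame({
    "goals": [1, np.nan, 0, 2],
    "assists": [0, 1, np.nan, np.nan],
    "date": ["2023-08-18", "2023-08-27", None, "2023-09-02"],
})
report = missing_report(df)
print(report)
```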


3. Exploratory Data Analysis (EDA)
   
Summary statistics and a correlation heatmap are used to explore player performance metrics and market value trends and to identify relationships between variables.

In [None]:
# Displaying summary statistics for the dataset
print(bayern_stats.describe())

# Generating a correlation heatmap (numeric columns only; pandas >= 2.0 raises on text columns)
sns.heatmap(bayern_stats.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Correlation Matrix of Player Performance Metrics')
plt.show()
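
A heatmap with many variables is hard to read. The following sketch lists the variable pairs with the strongest absolute Pearson correlation instead. It works on any DataFrame and ignores non-numeric columns. The demo data is made up for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def top_correlations(df, n=5):
    """Return the n variable pairs with the largest absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)
    # Each unordered pair once, skipping the diagonal
    pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs = [p for p in pairs if not np.isnan(p[2])]
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pd.DataFrame(pairs[:n], columns=["var_1", "var_2", "r"])

# Made-up demo data; the text column is ignored by corr(numeric_only=True)
demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
    "name": ["a", "b", "c", "d", "e"],
})
print(top_correlations(demo))
```

Correlation does not imply that one metric drives another, so these pairs are candidates for further analysis rather than conclusions.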

4. Performance Analysis
   
We performed a comparative analysis of wins, losses, and draws to assess the team's relative performance within the league.

In [None]:
# Visual comparison of team performance results (one bar per result category)
sns.countplot(x='result', data=bayern_stats)
plt.title('Team Performance Overview')
plt.xlabel('Match Result')
plt.ylabel('Number of Matches')
plt.show()
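
Win, draw and loss counts become comparable across clubs once they are converted to league points, using the Bundesliga's system of 3 points for a win and 1 for a draw. This sketch assumes that results are coded `'W'`, `'D'`, `'L'`. That coding is hypothetical and should be adapted to the actual `result` column.

```python
import pandas as pd

# Bundesliga points: 3 for a win, 1 for a draw, 0 for a loss
POINTS = {"W": 3, "D": 1, "L": 0}

def points_summary(results):
    """Summarize W/D/L counts, total points and points per game.

    Assumes results are coded 'W', 'D', 'L' (hypothetical coding; adapt to the data).
    """
    results = pd.Series(results)
    counts = results.value_counts().reindex(["W", "D", "L"], fill_value=0)
    points = int(results.map(POINTS).sum())
    return {
        "wins": int(counts["W"]),
        "draws": int(counts["D"]),
        "losses": int(counts["L"]),
        "points": points,
        "points_per_game": points / len(results),
    }

summary = points_summary(["W", "W", "D", "L", "W"])
print(summary)
```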


5. Market Value Analysis
An interactive Plotly line chart tracks how player market values fluctuate over the season, as a basis for examining how performance affects these valuations.

In [None]:
# Creating a line plot to track market value trends over time
fig = px.line(bayern_stats, x='date', y='market_value', title='Market Value Trend of Players')
fig.show()
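
A single line through all rows mixes different players. A per-player summary of how values changed over the season can be easier to interpret. This is a minimal sketch that assumes hypothetical columns `player_id`, `date` and `market_value`, with one row per valuation. The demo values are invented.

```python
import pandas as pd

def market_value_change(df):
    """Return first and last recorded market value per player and the percent change.

    Assumes hypothetical columns 'player_id', 'date' and 'market_value'.
    """
    ordered = df.sort_values("date")
    grouped = ordered.groupby("player_id")["market_value"]
    summary = pd.DataFrame({"first": grouped.first(), "last": grouped.last()})
    summary["pct_change"] = (summary["last"] / summary["first"] - 1) * 100
    return summary.sort_values("pct_change", ascending=False)

# Invented demo values, deliberately unsorted by date
demo = pd.DataFrame({
    "player_id": ["A", "B", "A", "B"],
    "date": pd.to_datetime(["2024-05-01", "2023-08-01", "2023-08-01", "2024-05-01"]),
    "market_value": [60.0, 80.0, 50.0, 70.0],
})
print(market_value_change(demo))
```

With the same assumed column, `px.line(bayern_stats, x='date', y='market_value', color='player_id')` would draw one line per player instead of one line through every row.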

6. SQL Analysis
   
SQL is used to extract specific data segments from the project database. Running `%%sql` cells requires the ipython-sql extension (`%load_ext sql`) and an active database connection.

In [None]:
%%sql
-- SQL query to fetch the 10 FC Bayern Munich players with the highest average goals per record (e.g., per match)
SELECT player_id, AVG(goals) AS average_goals
FROM player_performance
WHERE team_id = 'bayern'
GROUP BY player_id
ORDER BY average_goals DESC
LIMIT 10;
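
The query can be tried without a database server by using Python's built-in `sqlite3` module, because it uses only `AVG`, `GROUP BY`, `ORDER BY` and `LIMIT`. The following sketch assumes that `player_performance` holds one row per player per match. The table contents are invented for the demo. The team is passed as a query parameter instead of being hard-coded.

```python
import sqlite3

# In-memory database with a hypothetical player_performance table (one row per player per match)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_performance (player_id TEXT, team_id TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO player_performance VALUES (?, ?, ?)",
    [("p1", "bayern", 2), ("p1", "bayern", 1), ("p2", "bayern", 1), ("p3", "other", 3)],
)

query = """
SELECT player_id, AVG(goals) AS average_goals
FROM player_performance
WHERE team_id = ?
GROUP BY player_id
ORDER BY average_goals DESC
LIMIT 10
"""
rows = conn.execute(query, ("bayern",)).fetchall()
print(rows)
conn.close()
```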

7. R Programming Analysis
   
R is used to complement the Python analysis. This section plots goals scored against player market values and overlays a linear regression fit.

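In [None]:
%load_ext rpy2.ipython

In [None]:
%%R
# Cell magics such as %%R must be the first line of their own cell
library(ggplot2)
# Note: read.csv() needs the raw CSV file; a github.com page URL serves HTML instead
player_stats <- read.csv("https://github.com/itzmore-mph/itzmore-mph-portfolio/player_stats.csv")
p <- ggplot(player_stats, aes(x = goals, y = market_value)) +
    geom_point() +
    geom_smooth(method = "lm") +
    ggtitle("Relationship Between Goals and Market Value")
print(p)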
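
To cross-check the R figure from Python, you can compute the same ordinary least-squares line with NumPy. This is the line that `geom_smooth(method = "lm")` draws, without its confidence band. The function name is illustrative, and the demo values are made up. The real inputs would be the `goals` and `market_value` columns.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus R^2 of the fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    # For a simple linear regression, R^2 equals the squared Pearson correlation
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r ** 2

slope, intercept, r2 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept, r2)
```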


8. Predictive Modeling

Predictive models are developed to estimate players' market values from their performance data (goals, assists, minutes played). Such estimates can inform strategic planning and investment decisions.

In [None]:
# Preparing data for modeling
X = bayern_stats[['goals', 'assists', 'minutes_played']]
y = bayern_stats['market_value']

# Building and fitting the linear regression model
model = LinearRegression().fit(X, y)
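
A model fitted on all rows cannot show how well it predicts unseen data. The following sketch holds out part of the data and reports R² and mean absolute error on that held-out part. The demo uses synthetic data with the same three feature names. This data is not real player data, and the scores only show that the code works.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

def evaluate_linear_model(X, y, test_size=0.25, random_state=42):
    """Fit on a training split and report R^2 and MAE on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    return model, {"r2": r2_score(y_test, pred), "mae": mean_absolute_error(y_test, pred)}

# Synthetic data for illustration only (not real player data)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "goals": rng.integers(0, 20, 200),
    "assists": rng.integers(0, 15, 200),
    "minutes_played": rng.integers(500, 3000, 200),
})
y_demo = (2.0 * X_demo["goals"] + 1.0 * X_demo["assists"]
          + 0.01 * X_demo["minutes_played"] + rng.normal(0, 1, 200))
model, scores = evaluate_linear_model(X_demo, y_demo)
print(scores)
```

Because the goal is forecasting, a chronological split would be a stricter check than the random split used here. In that setup, the model is trained on earlier matchdays and tested on later ones.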


# Expected Goals (xG) Analysis
**Expected Goals (xG)** captures both the quantity and the quality of the shots a team or player takes. It estimates the probability that a given shot results in a goal based on factors such as shot angle, distance from goal, type of assist, and whether it was a header. This section analyzes FC Bayern Munich's xG throughout the season and compares it with actual goals scored to assess the team's finishing efficiency.
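
In formula form, xG over a set of $n$ shots is the sum of the per-shot goal probabilities. Comparing it with the goals actually scored, $G$, gives the over- or under-performance in finishing:

$$
\text{xG} = \sum_{i=1}^{n} p_i, \qquad p_i = P(\text{goal} \mid \text{shot } i), \qquad \Delta = G - \text{xG}
$$

A positive $\Delta$ means more goals than the chances would typically yield, and a negative $\Delta$ means fewer.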

Data Preparation for xG Analysis
First, make sure the dataset includes an xG value for each shot or scoring opportunity. If it does not, xG has to be estimated from shot-level data, which requires detailed information such as shot location, body part, and assist type.


In [None]:
# Use xG supplied by the data source if present; calculate_xG is a placeholder
# for a shot-level xG model and must be defined before this cell runs
if 'xG' not in bayern_stats.columns:
    bayern_stats['xG'] = bayern_stats['shot_detail'].apply(calculate_xG)
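
If the data has no xG values, one common approach is a logistic regression on shot distance and the angle the goal posts subtend. A model of this kind could back the placeholder `calculate_xG`. This is a minimal sketch, not the author's model. The following are all assumptions:

- the coordinate convention (metres, measured from the goal line and the goal centre);
- the column names `x`, `y`, `is_goal`;
- the synthetic training data.

Real xG models also use the other factors mentioned above, such as body part and assist type.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

GOAL_WIDTH = 7.32  # metres between the posts

def shot_features(x, y):
    """Distance (m) to the goal centre and angle (rad) subtended by the posts.

    Assumed convention: x = distance from the goal line, y = lateral offset
    from the goal centre, both in metres.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = np.atleast_1d(np.asarray(y, dtype=float))
    distance = np.hypot(x, y)
    half = GOAL_WIDTH / 2
    # Angle between the vectors from the shot to each post
    angle = np.arctan2(GOAL_WIDTH * x, x**2 + y**2 - half**2)
    return np.column_stack([distance, angle])

def fit_xg_model(shots):
    """Fit a logistic xG model on a DataFrame with hypothetical columns 'x', 'y', 'is_goal'."""
    model = LogisticRegression()
    model.fit(shot_features(shots["x"], shots["y"]), shots["is_goal"])
    return model

def predict_xg(model, x, y):
    """Return the estimated goal probability (xG) for each shot location."""
    return model.predict_proba(shot_features(x, y))[:, 1]

# Synthetic shots for illustration only: closer, more central shots score more often
rng = np.random.default_rng(1)
n = 2000
shots = pd.DataFrame({"x": rng.uniform(1, 35, n), "y": rng.uniform(-20, 20, n)})
true_p = 1 / (1 + np.exp(-(1.5 - 0.15 * np.hypot(shots["x"], shots["y"]))))
shots["is_goal"] = (rng.random(n) < true_p).astype(int)

model = fit_xg_model(shots)
print(predict_xg(model, [6, 30], [0, 15]))
```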


Exploratory Data Analysis of xG
Plot xG against actual goals to gauge the team's finishing efficiency.

In [None]:
# Plotting xG vs. actual goals
plt.figure(figsize=(10, 6))
sns.scatterplot(x='xG', y='goals', data=bayern_stats)
plt.title('xG vs. Actual Goals for FC Bayern Munich')
plt.xlabel('Expected Goals (xG)')
plt.ylabel('Actual Goals')
plt.show()

Comparative Analysis of xG and Actual Goals
Calculate the total xG and actual goals over the season to evaluate performance.

In [None]:
# Summing up xG and actual goals
total_xG = bayern_stats['xG'].sum()
total_goals = bayern_stats['goals'].sum()

print(f"Total Expected Goals (xG): {total_xG:.2f}")
print(f"Total Actual Goals Scored: {total_goals}")

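# Visualizing the cumulative xG and actual goals over the season
# (sorting by date first so the running totals follow the fixture order)
bayern_stats = bayern_stats.sort_values('date')
bayern_stats['cumulative_xG'] = bayern_stats['xG'].cumsum()
bayern_stats['cumulative_goals'] = bayern_stats['goals'].cumsum()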

plt.figure(figsize=(12, 7))
plt.plot(bayern_stats['date'], bayern_stats['cumulative_xG'], label='Cumulative xG')
plt.plot(bayern_stats['date'], bayern_stats['cumulative_goals'], label='Cumulative Goals')
plt.legend()
plt.title('Cumulative xG vs. Goals Over the Season')
plt.xlabel('Date')
plt.ylabel('Total xG/Goals')
plt.show()
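
The cumulative curves show the season total. A per-match view of goals minus xG, smoothed with a rolling mean, shows when the gap opened or closed. This sketch assumes one row per match with `date`, `goals` and `xG` columns. The function name and the demo values are illustrative.

```python
import pandas as pd

def finishing_delta(df, window=5):
    """Per-match goals minus xG plus a rolling mean over the last `window` matches.

    Assumes one row per match with 'date', 'goals' and 'xG' columns.
    """
    out = df.sort_values("date")[["date", "goals", "xG"]].copy()
    out["delta"] = out["goals"] - out["xG"]
    # min_periods=1 gives a value from the first match onwards
    out["rolling_delta"] = out["delta"].rolling(window, min_periods=1).mean()
    return out

demo = pd.DataFrame({
    "date": pd.to_datetime(["2023-08-18", "2023-08-27", "2023-09-02"]),
    "goals": [4, 2, 1],
    "xG": [2.5, 2.0, 1.5],
})
print(finishing_delta(demo, window=2))
```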


9. Conclusion
    
This analysis provides an overview of FC Bayern Munich's performance and market dynamics throughout the 2023-2024 Bundesliga season, with findings intended to support decisions on team performance and financial outcomes. Key takeaways include:

- Performance Insights: The correlation and comparative analyses help identify the performance metrics most closely associated with the team's results.
- Market Value Dynamics: Visualization and predictive modeling show how player performance relates to market value, which can guide investment strategies.
- Strategic Recommendations: The predictive outcomes can inform decisions on player acquisitions and development.
Future Directions:

- Data Integration: Incorporating additional data sources such as fan engagement metrics and economic factors could give a more holistic view of what drives market values.
- Advanced Analytics: More sophisticated machine learning models and ensemble techniques could improve prediction accuracy.
- Operational Efficiency: Streamlining data collection and processing workflows to enable real-time analytics could give FC Bayern Munich timely insights during the season.
This portfolio project not only demonstrates a deep analytical capability across Python, SQL, and R but also showcases the ability to translate complex data into strategic insights, which is crucial for roles in data science, particularly within the sports analytics industry.

From the xG analysis, we can draw conclusions about the team's offensive performance:

- Efficiency: If actual goals significantly exceed xG, the team has converted its chances more efficiently than expected.
- Potential Issues: Conversely, if xG substantially exceeds actual goals, this may indicate poor finishing, strong opposition goalkeeping, or simple variance over a limited number of matches.
Including xG analysis enriches the data-driven insights provided by the project, offering a more nuanced view of team performance that traditional metrics might overlook. This aspect of the analysis is particularly appealing to potential employers or collaborators with an interest in advanced sports analytics.
