## A Brief Recap of Data, Goals, and Tasks

This project uses an English soccer dataset sourced from Kaggle, containing match results, goals scored, cards issued, and other performance metrics across multiple seasons. The primary goal of the visualizations is to explore key trends in English soccer, focusing on:

- The total number of goals scored per season.

- The number of wins per team.

- The distribution of yellow and red cards across teams.

- The relationship between goals scored and goals conceded per team.

These tasks drove the choice of visualizations, so that each insight is presented in an intuitive, interactive chart.

In [None]:
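import polars as pl
import pyarrow as pa  # not called directly; polars' to_pandas() needs pyarrow installed
import altair as alt
import voila  # not called directly in this notebook


# Load the match data (adjust the path to wherever the CSV is stored)
data = pl.read_csv('/Users/siamakasemiesfahani/Downloads/EPL Dataset.csv')
data = (
    data.with_columns(
        [
            # Parse day/month/year strings into proper dates
            pl.col("Date").str.to_date("%d/%m/%Y"),
            # Goals by both sides in each match
            (pl.col('FTH Goals') + pl.col('FTA Goals')).alias('Total Goals')
        ]
    )
)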

In [2]:
data.head()

| Date | Season | HomeTeam | AwayTeam | FTH Goals | FTA Goals | FT Result | HTH Goals | HTA Goals | HT Result | Referee | H Shots | A Shots | H SOT | A SOT | H Fouls | A Fouls | H Corners | A Corners | H Yellow | A Yellow | H Red | A Red | Display_Order | League | Total Goals |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | str | str | str | i64 | i64 | str | i64 | i64 | str | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | i64 |
| 2025-01-16 | 2024/25 | Ipswich Town | Brighton & Hove Albion | 0 | 2 | A | 0 | 1 | A | T Harrington | 5 | 11 | 3 | 5 | 13 | 14 | 1 | 9 | 2 | 2 | 0 | 0 | 20250116 | Premier League | 2 |
| 2025-01-16 | 2024/25 | Man United | Southampton | 3 | 1 | H | 0 | 1 | A | J Brooks | 23 | 13 | 9 | 5 | 7 | 10 | 4 | 4 | 1 | 3 | 0 | 0 | 20250116 | Premier League | 4 |
| 2025-01-15 | 2024/25 | Everton | Aston Villa | 0 | 1 | A | 0 | 0 | D | S Barrott | 10 | 11 | 3 | 3 | 17 | 10 | 8 | 5 | 2 | 1 | 0 | 0 | 20250115 | Premier League | 1 |
| 2025-01-15 | 2024/25 | Leicester | Crystal Palace | 0 | 2 | A | 0 | 0 | D | A Madley | 21 | 9 | 4 | 4 | 7 | 6 | 4 | 3 | 0 | 0 | 0 | 0 | 20250115 | Premier League | 2 |
| 2025-01-15 | 2024/25 | Newcastle | Wolves | 3 | 0 | H | 1 | 0 | H | D England | 17 | 13 | 5 | 7 | 10 | 13 | 4 | 2 | 0 | 2 | 0 | 0 | 20250115 | Premier League | 3 |

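The "wins per team" task can be checked by hand on the five rows shown above. The sketch below is plain Python, independent of the notebook's polars code. It assumes that the `FT Result` values `H`, `A`, and `D` mean home win, away win, and draw; the names `matches` and `count_wins` are illustrative only. It also shows why filtering on `H` alone credits home wins but misses away wins.

```python
from collections import Counter

# (HomeTeam, AwayTeam, FT Result) copied from the five rows of data.head()
matches = [
    ("Ipswich Town", "Brighton & Hove Albion", "A"),
    ("Man United", "Southampton", "H"),
    ("Everton", "Aston Villa", "A"),
    ("Leicester", "Crystal Palace", "A"),
    ("Newcastle", "Wolves", "H"),
]


def count_wins(rows):
    """Credit each win to the side that won, home or away."""
    wins = Counter()
    for home, away, result in rows:
        if result == "H":
            wins[home] += 1
        elif result == "A":
            wins[away] += 1
        # Assumed: "D" is a draw, so neither side is credited
    return wins


wins = count_wins(matches)

# Counting only "H" results, for comparison
home_only = Counter(home for home, _, result in matches if result == "H")

print(dict(wins))
print(dict(home_only))
```

On these rows every match has a winner, so the combined count totals five wins, while the home-only count finds just two.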

In [3]:
def plot_goals_per_season():
    """Plots the total goals scored per season."""
    goals_per_season = data.group_by("Season").agg(pl.col("Total Goals").sum())
    
    return alt.Chart(goals_per_season.to_pandas()).mark_line(point=True).encode(
        x=alt.X('Season:O', title='Season'),
        y=alt.Y('Total Goals:Q', title='Total Goals Scored'),
        tooltip=['Season', 'Total Goals']
    ).properties(title='Total Goals Scored per Season').interactive()

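def plot_team_wins():
    """Plots the number of wins (home and away) per team."""
    # A win is a home match with result "H" or an away match with result "A"
    home_wins = data.filter(pl.col("FT Result") == "H").select(pl.col("HomeTeam").alias("Team"))
    away_wins = data.filter(pl.col("FT Result") == "A").select(pl.col("AwayTeam").alias("Team"))
    # pl.len() is the current name for the deprecated pl.count()
    wins = pl.concat([home_wins, away_wins]).group_by("Team").agg(pl.len().alias("Wins"))
    
    return alt.Chart(wins.to_pandas()).mark_bar().encode(
        x=alt.X('Wins:Q', title='Number of Wins'),
        y=alt.Y('Team:N', sort='-x', title='Team'),
        tooltip=['Team', 'Wins']
    ).properties(title='Total Wins per Team').interactive()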

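def plot_cards():
    """Plots yellow and red cards per team (home and away matches combined)."""
    # Stack each team's home-match and away-match cards under one "Team" column
    home = data.select(
        pl.col("HomeTeam").alias("Team"),
        pl.col("H Yellow").alias("Yellow Cards"),
        pl.col("H Red").alias("Red Cards"),
    )
    away = data.select(
        pl.col("AwayTeam").alias("Team"),
        pl.col("A Yellow").alias("Yellow Cards"),
        pl.col("A Red").alias("Red Cards"),
    )
    cards = pl.concat([home, away]).group_by("Team").agg(
        pl.col("Yellow Cards").sum(),
        pl.col("Red Cards").sum(),
    )
    
    # unpivot() is the current name for the deprecated melt()
    melted_cards = cards.unpivot(index=['Team'], variable_name='Card Type', value_name='Count')
    
    return alt.Chart(melted_cards.to_pandas()).mark_bar().encode(
        x=alt.X('Count:Q', title='Number of Cards'),
        y=alt.Y('Team:N', sort='-x', title='Team'),
        color=alt.Color('Card Type:N', title="Type of Card"),
        tooltip=['Team', 'Card Type', 'Count']
    ).properties(title='Yellow & Red Cards per Team').interactive()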

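def plot_goals_scored_vs_conceded():
    """Plots total goals scored vs goals conceded per team (home and away matches)."""
    # For the home side, FTH Goals are scored and FTA Goals conceded;
    # for the away side it is the reverse.
    home = data.select(
        pl.col("HomeTeam").alias("Team"),
        pl.col("FTH Goals").alias("Total Goals Scored"),
        pl.col("FTA Goals").alias("Total Goals Conceded"),
    )
    away = data.select(
        pl.col("AwayTeam").alias("Team"),
        pl.col("FTA Goals").alias("Total Goals Scored"),
        pl.col("FTH Goals").alias("Total Goals Conceded"),
    )
    goals = pl.concat([home, away]).group_by("Team").agg(
        pl.col("Total Goals Scored").sum(),
        pl.col("Total Goals Conceded").sum(),
    )
    
    return alt.Chart(goals.to_pandas()).mark_circle(size=80).encode(
        x=alt.X('Total Goals Scored:Q', title='Total Goals Scored'),
        y=alt.Y('Total Goals Conceded:Q', title='Total Goals Conceded'),
        size="Total Goals Scored",
        color=alt.Color('Team:N', legend=None),
        tooltip=['Team', 'Total Goals Scored', 'Total Goals Conceded']
    ).properties(title='Goals Scored vs Goals Conceded per Team').interactive()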


### Line Chart (Goals per Season)

This chart visualizes trends in goal-scoring across seasons, helping to identify increases or declines over time.

### Scatter Plot (Goals Scored vs. Goals Conceded)

A scatter plot shows the relationship between goals scored and goals conceded, helping to identify teams with strong offensive or defensive performances.

In [5]:
plot_goals_per_season() | plot_goals_scored_vs_conceded()

### Bar Chart (Wins per Team)

A bar chart suits categorical data, making it easy to compare teams by their win counts.

### Stacked Bar Chart (Yellow & Red Cards per Team)

Stacking the bars by card type clearly differentiates between the kinds of disciplinary action each team has received.

In [6]:
plot_cards() | plot_team_wins()



### Final Evaluation Approach

To evaluate the effectiveness of the visualizations, I recruited three individuals: a classmate, a colleague with a casual interest in soccer, and a friend who is an avid soccer fan. The evaluation followed this procedure:

- Participants were given access to the visualizations and a brief description of their purpose.

- They were asked to explore the interactive elements and interpret key trends.

- Feedback was collected regarding clarity, ease of use, and insights gained.

### Results:

- All participants found the visualizations intuitive and easy to navigate.

- The line chart was praised for showing season-wide trends clearly.

- The scatter plot was well-received, but one participant suggested adding a color legend for better team identification.

- Some feedback suggested adding filters to allow users to focus on specific seasons or teams.



### Synthesis of Findings and Future Refinements

#### What Worked Well:

- Interactivity enhanced user engagement and allowed for deeper exploration.

- The choice of charts effectively conveyed key insights.

- Tooltips provided additional information, making the data more accessible.

#### Areas for Improvement:

- Implementing dropdown filters to allow users to select specific teams or seasons.

- Improving the color scheme and legends for better differentiation between teams.

- Adding annotations to highlight key events or trends in the data.