In [1]:
import pandas as pd
import altair as alt
import plotly.express as px
import plotly.io as pio

In [2]:
df = pd.read_csv("top5-players.csv")

In [3]:
df.columns

Index(['Rk', 'Player', 'Nation', 'Pos', 'Squad', 'Comp', 'Age', 'Born', 'MP',
       'Starts', 'Min', '90s', 'Gls', 'Ast', 'G+A', 'G-PK', 'PK', 'PKatt',
       'CrdY', 'CrdR', 'xG', 'npxG', 'xAG', 'npxG+xAG', 'PrgC', 'PrgP', 'PrgR',
       'Gls_90', 'Ast_90', 'G+A_90', 'G-PK_90', 'G+A-PK_90', 'xG_90', 'xAG_90',
       'xG+xAG_90', 'npxG_90', 'npxG+xAG_90'],
      dtype='object')

In [4]:
fig = px.bar(df, x="Age", y="Gls_90", title="Goals per 90 min vs Player Age",
             hover_data=["Player"])
fig.show()

In [5]:
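# Mean goals per 90 minutes by position; without an aggregate, the bars stack every player's rate
alt.Chart(df).mark_bar().encode(
    alt.X("Pos:N", title="Position"),
    alt.Y("mean(Gls_90):Q", title="Mean goals per 90 min"),
).properties(
    title="Mean Goals per 90 Min by Position"
)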

In [6]:
# One chart per league: filter the data to that league and plot goal + assist contributions by nationality
for league in df["Comp"].unique():
    temp = df[df["Comp"] == league]
    fig3 = alt.Chart(temp).mark_bar().encode(
        alt.X("Nation"),
        alt.Y("G+A_90:Q"),
    ).properties(
        title = f"{league} Goal and Assist Contributions per 90 Minutes by Nationality"
    )
    display(fig3)

* At least three static data visualizations, each with a clear title, labels, a legend (if needed), and a paragraph explaining the takeaway of the visualization. (30 pts)

In [12]:
fig1 = px.scatter(
    df, 
    x="Age", 
    y="Gls_90", 
    title="Goals per 90 Minutes by Age",
    labels={"Age": "Player Age", "Gls_90": "Goals per 90 Minutes"},
    color="Age",
    hover_data=["Player", "Comp"]
)
fig1.update_layout(
    xaxis_title="Player Age",
    yaxis_title="Goals per 90 Minutes",
    showlegend=False
)
fig1.show()
pio.write_html(fig1, file="fig1_diego.html", auto_open=False)

This visualization shows the relationship between player age and goals per 90 minutes. Most players are between 20 and 30 years old, and this range also shows the widest spread in scoring rates, including a few outliers above 4 and even 6 goals per 90 minutes. Rates that high almost certainly come from players with very few minutes played, where a single goal inflates the per-90 figure, so they should not be read as sustained performance. Younger players (under 20) and older players (over 35) cluster around 0–1 goals per 90 minutes. Taken together, the chart suggests that scoring output is highest for players in their 20s and early 30s, and lower for the youngest players, who may be limited by inexperience, and for the oldest, whose output declines with age. Age appears to influence scoring, but the extreme outliers show that per-90 rates need to be read alongside playing time.
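The age pattern described above can be checked numerically rather than read off the scatter plot. The following is a minimal sketch, not part of the original analysis: the function name `summarize_by_age_band` and the band boundaries are assumptions chosen to match the age ranges discussed in the paragraph. It uses the median, which is less sensitive than the mean to the extreme outliers.

```python
import pandas as pd


def summarize_by_age_band(players: pd.DataFrame) -> pd.DataFrame:
    """Count players and summarize Gls_90 within each age band.

    The band edges are illustrative assumptions, not taken from the dataset.
    """
    bins = [0, 20, 25, 30, 35, 100]
    labels = ["<20", "20-24", "25-29", "30-34", "35+"]
    # right=False makes each band include its lower edge: [20, 25), [25, 30), ...
    bands = pd.cut(players["Age"], bins=bins, labels=labels, right=False)
    return players.groupby(bands, observed=False)["Gls_90"].agg(
        players="count",
        median_gls_90="median",
        max_gls_90="max",
    )
```

In the notebook this would be called as `summarize_by_age_band(df)`. Comparing `median_gls_90` with `max_gls_90` in each band shows how far the outliers sit from a typical player.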

In [13]:
top_scorers = df.nlargest(10, "Gls_90")  # Select top 10 players based on Gls_90
fig2 = px.bar(
    top_scorers, 
    x="Player", 
    y="Gls_90", 
    title="Top Goal Scorers (Goals per 90 Minutes)",
    labels={"Player": "Player", "Gls_90": "Goals per 90 Minutes"},
    color="Player",
    hover_data=["Nation", "Comp", "Squad"]
)
fig2.update_layout(
    xaxis_title="Player",
    yaxis_title="Goals per 90 Minutes",
    showlegend=False
)
fig2.show()
pio.write_html(fig2, file="fig2_diego.html", auto_open=False)


This visualization shows the ten players with the highest goals per 90 minutes across the top five leagues. The top two, Federico Di Francesco and Miloš Pantović, both score at nearly 6 goals per 90 minutes, well clear of the rest. The next highest, such as Chaka Traorè and Nolan Mbemba, drop to around 3.5–4 goals per 90 minutes, and the remaining players sit between 2.5 and 3. The gaps between players narrow further down the list. Because the ranking applies no minimum playing time, rates this high most likely reflect a goal scored in a brief appearance rather than sustained output, so the chart captures efficiency in small samples more than season-long scoring.
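One way to make the ranking reflect sustained scoring is to require a minimum amount of playing time before taking the top ten. The sketch below assumes the `90s` column (minutes played divided by 90) listed in `df.columns`. The function name `top_scorers_with_min_time` and the default threshold of 5 full matches are hypothetical choices, not something the original analysis specifies.

```python
import pandas as pd


def top_scorers_with_min_time(
    players: pd.DataFrame, n: int = 10, min_90s: float = 5.0
) -> pd.DataFrame:
    """Return the n highest Gls_90 rows among players with at least min_90s full matches.

    min_90s is an assumed cutoff; adjust it to the sample size you consider reliable.
    """
    regulars = players[players["90s"] >= min_90s]
    return regulars.nlargest(n, "Gls_90")
```

Passing the result of `top_scorers_with_min_time(df)` to `px.bar` in place of `top_scorers` would reuse the same chart code with the filtered ranking.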

In [14]:
top_by_nation = df.groupby("Nation", as_index=False)["Gls_90"].mean().nlargest(10, "Gls_90")
fig3 = px.bar(
    top_by_nation, 
    x="Nation", 
    y="Gls_90", 
    title="Top 10 Nations by Average Goals per 90 Minutes (Top 5 Leagues)",
    labels={"Nation": "Nation", "Gls_90": "Average Goals per 90 Minutes"},
    color="Nation",
    hover_data=["Gls_90"]
)
fig3.update_layout(
    xaxis_title="Nation",
    yaxis_title="Average Goals per 90 Minutes",
    showlegend=False
)
fig3.show()
pio.write_html(fig3, file="fig3_diego.html", auto_open=False)

This visualization ranks nations by the average goals per 90 minutes of their players in the top five leagues. Cyprus (CYP) leads with an average above 1.6, well ahead of all other nations. Congo (CGO) and Zambia (ZAM) follow at roughly 1.2 and 1.0, respectively. Other nations, such as Grenada (GRN), Canada (CAN), and Egypt (EGY), average between 0.4 and 0.8. The leading nations are smaller footballing nations that likely have only a few players in these leagues, so their averages probably reflect one or two individuals rather than national scoring strength. The chart also shows how widely average scoring rates vary across countries.
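Averaging per-player rates gives a nation with a single player the same weight as one with hundreds. A possible alternative, sketched below under stated assumptions, pools goals and playing time per nation (total `Gls` divided by total `90s`) and drops nations with too few players. The function name `goals_per_90_by_nation` and the `min_players` threshold are hypothetical; the column names come from `df.columns`.

```python
import pandas as pd


def goals_per_90_by_nation(players: pd.DataFrame, min_players: int = 5) -> pd.DataFrame:
    """Pooled goals per 90 minutes by nation, for nations with enough players.

    min_players is an assumed cutoff, not a value from the original analysis.
    """
    by_nation = players.groupby("Nation").agg(
        players=("Player", "count"),
        goals=("Gls", "sum"),
        nineties=("90s", "sum"),
    )
    # Keep nations with enough players and some recorded playing time.
    enough = (by_nation["players"] >= min_players) & (by_nation["nineties"] > 0)
    by_nation = by_nation[enough].copy()
    by_nation["pooled_gls_90"] = by_nation["goals"] / by_nation["nineties"]
    return by_nation.sort_values("pooled_gls_90", ascending=False)
```

Calling `goals_per_90_by_nation(df).head(10)` would give a top-ten table comparable to `top_by_nation`, with the `players` column showing how many individuals stand behind each bar.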