Filter to Japan and anonymize names with SHA-256

In [27]:
import hashlib
import pandas as pd

athletes = pd.read_csv("assets/athlete_events.csv")
regions = pd.read_csv("assets/noc_regions.csv")

# Merge in the region names (left join on the NOC code)
df = athletes.merge(regions, on="NOC", how="left")

# Keep only Japan's rows (one row per athlete and event)
jpn = df[df["NOC"] == "JPN"].copy()

# Hash names with SHA-256; missing names stay missing
def sha256_hash(x):
    if pd.isna(x):
        return None
    return hashlib.sha256(str(x).encode("utf-8")).hexdigest()

jpn["Name_hash"] = jpn["Name"].apply(sha256_hash)

# Drop the original names so they are not published
jpn = jpn.drop(columns=["Name"])

jpn[["Name_hash", "Sex", "Age", "Sport", "Year", "Season", "Medal"]].head()


     Name_hash                                          Sex   Age  Sport      Year  Season  Medal
625  cc4d28e52d69f2daad3f9598ba6a82df12a4c664cb36d8...  M    24.0  Athletics  1936  Summer  NaN
626  55ce63306a0ae7a626034481bd985157246dbed736785f...  M    24.0  Bobsleigh  1972  Winter  NaN
627  55ce63306a0ae7a626034481bd985157246dbed736785f...  M    24.0  Bobsleigh  1972  Winter  NaN
628  55ce63306a0ae7a626034481bd985157246dbed736785f...  M    28.0  Bobsleigh  1976  Winter  NaN
629  55ce63306a0ae7a626034481bd985157246dbed736785f...  M    28.0  Bobsleigh  1976  Winter  NaN
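An unsalted SHA-256 of a name is pseudonymization rather than full anonymization: anyone holding a list of candidate names can hash them and match the digests. A keyed hash (HMAC-SHA-256 from Python's standard `hmac` module) blocks that lookup as long as the key stays secret. A minimal sketch, where `SECRET_KEY` and `hmac_sha256` are hypothetical names and the key value is only a placeholder:

```python
import hashlib
import hmac

import pandas as pd

# Assumption: in practice the key is kept outside the published notebook
# (e.g. in an environment variable). This value is a placeholder.
SECRET_KEY = b"replace-with-a-secret-key"


def hmac_sha256(x, key: bytes = SECRET_KEY):
    """Return a keyed SHA-256 hex digest of x, or None for missing values."""
    if pd.isna(x):
        return None
    return hmac.new(key, str(x).encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical toy names: equal names still map to equal digests
names = pd.Series(["Athlete A", "Athlete A", None])
hashed = names.apply(hmac_sha256)
```

Used in place of `sha256_hash`, e.g. `jpn["Name"].apply(hmac_sha256)`, the digests stay consistent within the notebook, so later `nunique()` counts on the hash column still work.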


Japan's total Olympic medals by type. Each row is one athlete in one event, so a team medal is counted once per team member.

In [28]:
jpn_medals = jpn[jpn["Medal"].notna()].copy()

jpn_medals["Medal"].value_counts()


Medal
Bronze    357
Silver    309
Gold      247
Name: count, dtype: int64
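Because each row is one athlete in one event, the 913 medals above (357 + 309 + 247) count team medals once per team member. A minimal sketch of counting each medal once instead, assuming the `Games`, `Event` and `Medal` columns of athlete_events.csv; the toy rows and the helper name `count_event_medals` are hypothetical:

```python
import pandas as pd

# Hypothetical toy rows in the athlete_events.csv layout:
# one row per athlete and event, so a team medal appears once per member.
rows = pd.DataFrame({
    "Games": ["G1", "G1", "G1", "G1"],
    "Event": ["Team event A", "Team event A", "Team event A", "Individual event B"],
    "Medal": ["Gold", "Gold", "Gold", "Gold"],
})


def count_event_medals(medals: pd.DataFrame) -> pd.Series:
    """Count medals per type, keeping one row per (Games, Event, Medal)."""
    unique_medals = medals.drop_duplicates(subset=["Games", "Event", "Medal"])
    return unique_medals["Medal"].value_counts()


per_athlete = rows["Medal"].value_counts()  # counts every team member
per_event = count_event_medals(rows)        # counts each team medal once
```

One caveat: in events that award two bronze medals, the two bronzes would share the same `(Games, Event, Medal)` key, so this approach could undercount if both went to the same country.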

In which Olympic sports has Japan won the most medals?

In [29]:
import plotly.express as px

sport_medals = (
    jpn_medals.groupby("Sport")
    .size()
    .sort_values(ascending=False)
    .head(20)
    .reset_index(name="Medals")
)

fig = px.bar(
    sport_medals,
    x="Sport",
    y="Medals",
    title="Japan's Olympic medals by sport (top 20)"
)

fig.update_layout(
    xaxis_title=None,
    yaxis_title=None
)

fig.show()
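The chain above groups the medal rows by sport, counts rows with `.size()`, sorts in descending order and keeps the 20 largest. Since the counts are per athlete row, team sports are weighted by team size in this ranking. On a small hypothetical table, with N = 2 instead of 20, the same chain behaves as follows, and `value_counts()` gives the same ranking in one call:

```python
import pandas as pd

# Hypothetical toy medal rows; only the Sport column matters here
medals = pd.DataFrame(
    {"Sport": ["Judo", "Judo", "Judo", "Swimming", "Swimming", "Wrestling"]}
)

top = (
    medals.groupby("Sport")
    .size()                        # number of rows per sport
    .sort_values(ascending=False)  # largest first
    .head(2)                       # keep the top N (20 in the notebook)
    .reset_index(name="Medals")    # Series -> DataFrame with a named count column
)

# Same ranking via value_counts, which already sorts in descending order
via_value_counts = medals["Sport"].value_counts().head(2)
```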


Japan's medals per Games, split into Summer and Winter

In [30]:
medals_per_games = (
    jpn_medals.groupby(["Year", "Season"])
    .size()
    .reset_index(name="Medals")
    .sort_values(["Year", "Season"])
)

fig = px.bar(
    medals_per_games,
    x="Year",
    y="Medals",
    color="Season",
    barmode="group",
    title="Japan: medals per Games, Summer vs. Winter"
)

fig.update_layout(
    xaxis_title=None,
    yaxis_title=None
)

fig.show()
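Grouping on both `Year` and `Season` matters because the Summer and Winter Games were held in the same year up to 1992. The grouped table also has no row at all for a Games where Japan won nothing. A minimal sketch on hypothetical toy rows that reshapes the counts to one column per season with `unstack(fill_value=0)`, which makes those zero-medal Games explicit:

```python
import pandas as pd

# Hypothetical toy medal rows (one row per medal-winning athlete entry)
medals = pd.DataFrame({
    "Year": [1972, 1972, 1972, 1976],
    "Season": ["Summer", "Winter", "Winter", "Summer"],
})

# Long form: a Series indexed by (Year, Season); 1976 Winter is simply absent
counts = medals.groupby(["Year", "Season"]).size()

# Wide form: one row per year, one column per season, 0 where nothing was won
wide = counts.unstack(fill_value=0)
```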


Histogram of Japanese athletes' ages, women and men

In [31]:
fig = px.histogram(
    jpn.dropna(subset=["Age"]),
    x="Age",
    color="Sex",  # split by sex, as the title promises
    nbins=30,
    title="Japan: age distribution, women and men"
)

fig.update_layout(
    xaxis_title=None,
    yaxis_title=None
)

fig.show()
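The histogram counts athlete-event rows, so an athlete entered in several events at one Games contributes the same age several times. A minimal sketch, on hypothetical toy rows, of keeping one row per athlete per Games before plotting, assuming the `Games` column from athlete_events.csv is still present in `jpn`:

```python
import pandas as pd

# Hypothetical toy rows: one row per athlete and event, as in athlete_events.csv
rows = pd.DataFrame({
    "Name_hash": ["a", "a", "a", "b"],
    "Games": ["1964 Summer", "1964 Summer", "1968 Summer", "1964 Summer"],
    "Sex": ["F", "F", "F", "M"],
    "Age": [20.0, 20.0, 24.0, 30.0],
})

# Keep one row per athlete per Games so multi-event athletes are not over-weighted
per_games = rows.dropna(subset=["Age"]).drop_duplicates(subset=["Name_hash", "Games"])
```

Two different athletes with identical names share a hash, so the dataset's `ID` column, which `jpn` still contains, would be a more reliable key than `Name_hash` here.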


Do the sports in which women and men won medals differ?

In [35]:
sex_sport = (
    jpn_medals.groupby(["Sex", "Sport"])
    .size()
    .reset_index(name="Medals")
)

# Sort by medal count within each sex, then keep the first 10 rows per sex
top10_by_sex = (
    sex_sport.sort_values(["Sex", "Medals"], ascending=[True, False])
    .groupby("Sex")
    .head(10)
)

fig = px.bar(
    top10_by_sex,
    x="Sport",
    y="Medals",
    color="Sex",
    facet_col="Sex",
    title="Japan: top 10 medal sports per sex"
)
fig.show()
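The `sort_values(...).groupby(...).head(10)` chain works because `head` keeps the first rows of each group in their current order, so sorting by medal count first turns it into a top-N per group. On hypothetical toy counts, with N = 2:

```python
import pandas as pd

# Hypothetical toy counts per (Sex, Sport)
counts = pd.DataFrame({
    "Sex": ["F", "F", "F", "M", "M", "M"],
    "Sport": ["Judo", "Volleyball", "Swimming", "Judo", "Wrestling", "Gymnastics"],
    "Medals": [5, 9, 2, 8, 7, 1],
})

top2 = (
    counts.sort_values(["Sex", "Medals"], ascending=[True, False])  # largest first within each sex
    .groupby("Sex")
    .head(2)  # first 2 rows of each group, in sorted order
)
```

Plotly Express links the x-axes of facet columns by default, so each panel may also list the other sex's sports as empty categories; `fig.update_xaxes(matches=None)` is one way to unlink them.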


Japan's participation in the Art Competitions and how it fared

In [37]:
import pandas as pd
import plotly.express as px

art = jpn[jpn["Sport"] == "Art Competitions"].copy()
art_medals = art[art["Medal"].notna()].copy()

# Only a cell's last expression is displayed, so print the shapes explicitly
print(art.shape, art_medals.shape)

summary = pd.DataFrame({
    "Category": ["Entries", "Medals"],
    "Count": [art.shape[0], art_medals.shape[0]]
})

fig = px.bar(
    summary,
    x="Category",
    y="Count",
    title="Japan in the Art Competitions: entries and medals"
)

fig.update_layout(
    xaxis_title=None,
    yaxis_title=None,
    showlegend=False
)

fig.show()
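The chart puts entries and medals side by side; dividing the two gives the share of entries that won a medal. A minimal sketch on hypothetical toy rows, where `medal_rate` is an illustrative helper rather than part of the original notebook:

```python
import pandas as pd

# Hypothetical toy rows for one country's Art Competitions entries
art = pd.DataFrame({"Medal": [None, "Bronze", None, None, "Silver"]})


def medal_rate(entries: pd.DataFrame) -> float:
    """Share of entries that won a medal (0.0 when there are no entries)."""
    if len(entries) == 0:
        return 0.0
    # notna() is True for medal rows; the mean of booleans is the share of True
    return entries["Medal"].notna().mean()


rate = medal_rate(art)
```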


Which Art Competitions events produced the most medals?

In [None]:
art_event_medals = (
    art_medals.groupby("Event")
    .size()
    .sort_values(ascending=False)
    .head(15)
    .reset_index(name="Medals")
)

fig = px.bar(
    art_event_medals,
    x="Event",
    y="Medals",
    title="Japan: Art Competitions events with the most medals (top 15)"
)
)
fig.show()
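Art Competitions event names can be long, which crowds the x-axis. A minimal sketch, assuming each event name starts with the sport name (the toy names below are hypothetical), of stripping that prefix into a new `Event_short` column that could then be passed as `x="Event_short"` to `px.bar`:

```python
import pandas as pd

# Hypothetical toy counts; assumes event names begin with the sport name
events = pd.DataFrame({
    "Event": ["Art Competitions Mixed Painting", "Art Competitions Mixed Literature"],
    "Medals": [2, 1],
})

# Remove the leading sport name (and following whitespace) for shorter axis labels
events["Event_short"] = events["Event"].str.replace(
    r"^Art Competitions\s+", "", regex=True
)
```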


Number of Japanese participants in the Art Competitions, by sex

In [None]:
art_participants = (
    art.dropna(subset=["Name_hash"])
    .groupby(["Year", "Sex"])["Name_hash"]
    .nunique()
    .reset_index(name="Unique participants")
    .sort_values(["Year", "Sex"])
)

fig = px.bar(
    art_participants,
    x="Year",
    y="Unique participants",
    color="Sex",
    barmode="group",
    title="Japan: unique Art Competitions participants per year and sex"
)
)
fig.show()
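`nunique()` counts distinct hashed names, whereas `.size()` would count entries; the two differ whenever one person entered several Art Competitions events in the same year. On hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows: athlete "a" has two entries in the first year
rows = pd.DataFrame({
    "Year": [1932, 1932, 1932, 1936],
    "Sex": ["M", "M", "M", "M"],
    "Name_hash": ["a", "a", "b", "a"],
})

entries = rows.groupby(["Year", "Sex"]).size()                 # rows = entries
people = rows.groupby(["Year", "Sex"])["Name_hash"].nunique()  # distinct athletes
```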
