# Analyze Tokyo 2020 Olympics Medals Rating

Hello everyone! In this notebook we will analyze the Tokyo 2020 Olympics (held in 2021) medals rating dataset. So, let's start.

# 1) Import Libraries and Load Data

First, let's import all the libraries we need. Then we load the data.

In [None]:
import os

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

plt.style.use("ggplot")

In [None]:
# Load Data
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
PATH = "/kaggle/input/olympic-games-2021-medals/Tokyo 2021 dataset.csv"
data = pd.read_csv(PATH)

# 2) A quick look at the data
Let's look at the data frame itself, its list of columns, its size and the NaN/null values in this dataset.

In [None]:
data

In [None]:
data.info()

In [None]:
print(f"There are {data.shape[0]} rows in dataframe.")
print(f"And {data.shape[1]} columns.")

print("\n")

print(f"Columns: {data.columns}")

print("\n")

# isnull() is an alias of isna() in pandas, so one check covers both.
# mean() gives the fraction of missing values; multiply by 100 for a percentage.
print(f"Percentage of missing values: \n{data.isna().mean() * 100}")

Here we can see that:
* There are 93 rows and 8 columns: 'Rank' (rank by gold medals), 'Team/NOC' (team names), 'Gold Medal', 'Silver Medal', 'Bronze Medal', 'Total', 'Rank by Total' (rank by total medal count) and 'NOCCode' (team codes such as ROC, USA, CHN);
* Rows are sorted by the 'Rank' column (rank by gold medals);
* Fortunately, there are no NaN or null values, so we can skip that part of the data cleaning stage.
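
The same checks can be written as assertions, so the notebook fails loudly if a later version of the file gains missing values. This is only a sketch: the frame below is a hand-made toy with hypothetical team names and counts (not rows from the dataset), and it only mimics the column names listed above.

```python
import pandas as pd

# Hypothetical toy rows that only mimic the dataset's column layout.
toy = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Team/NOC": ["Team A", "Team B", "Team C"],
    "Gold Medal": [5, 3, 0],
    "Silver Medal": [2, 4, 1],
    "Bronze Medal": [1, 0, 2],
    "Total": [8, 7, 3],
    "Rank by Total": [1, 2, 3],
    "NOCCode": ["AAA", "BBB", "CCC"],
})

# Share of missing values per column, as a percentage.
missing_pct = toy.isna().mean() * 100

# Sanity checks: nothing is missing, and 'Total' is the sum of the medal columns.
medal_cols = ["Gold Medal", "Silver Medal", "Bronze Medal"]
totals_match = (toy[medal_cols].sum(axis=1) == toy["Total"]).all()

assert missing_pct.eq(0).all()
assert totals_match
```

On the real data the same two assertions could be run with `data` in place of `toy`.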


# 3) Data preprocessing.

In the second step we saw that this dataset contains no NaN or null values and no obvious mistakes, so no data cleaning is needed. Still, it is convenient to have two views of the data: one sorted by the 'Rank' column and one sorted by the 'Rank by Total' column. This will help in the EDA step.

In [None]:
gold_ranks_data = data

total_ranks_data = data.sort_values(by = "Rank by Total", ascending = True)

In [None]:
gold_ranks_data.head(3)

In [None]:
total_ranks_data.head(3)
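
One detail worth keeping in mind: `gold_ranks_data = data` only binds a second name to the same object, while `sort_values` returns a new frame that keeps the original row labels. The sketch below shows the difference on a hypothetical toy frame (team names are made up); using `.copy()` is one possible way to get an independent gold-ranked frame, not something this notebook does.

```python
import pandas as pd

# Hypothetical toy data, already ordered by gold rank.
toy = pd.DataFrame({
    "Team/NOC": ["Team A", "Team B", "Team C"],
    "Rank": [1, 2, 3],
    "Rank by Total": [2, 1, 3],
})

alias = toy                # same object under a second name
independent = toy.copy()   # separate frame with its own data
by_total = toy.sort_values(by="Rank by Total", ascending=True)

print(alias is toy)                    # True
print(independent is toy)              # False
print(by_total["Team/NOC"].tolist())   # ['Team B', 'Team A', 'Team C']
print(by_total.index.tolist())         # [1, 0, 2] (original labels are kept)
```

Because the row labels survive sorting, a label-based lookup such as `.loc[4, ...]` still refers to the row that had label 4 in the original frame.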

# 4) Analyzing.

In this step our goal is to analyze these datasets and find some interesting things.

# 4.1) Overview Visualization.

**Overview visualization of all numeric values in the data:**

In [None]:
sns.pairplot(data = gold_ranks_data, corner = True, kind = "scatter", diag_kind = "kde")

# 4.2) Teams Rating.

**Olympic Teams rating.**

In [None]:
labels = total_ranks_data["Team/NOC"]

total = [medal for medal in total_ranks_data["Total"]]
gold_medals = [gold for gold in total_ranks_data["Gold Medal"]]
silver_medals = [silver for silver in total_ranks_data["Silver Medal"]]
bronze_medals = [bronze for bronze in total_ranks_data["Bronze Medal"]]

fig = go.Figure(data = [
    go.Bar(name = 'Gold Medals', x = labels, y = gold_medals, marker_color = "#FFBF00"),
    go.Bar(name = 'Silver Medals', x = labels, y = silver_medals, marker_color = "#8A8686"), 
    go.Bar(name = 'Bronze Medals', x = labels, y = bronze_medals, marker_color= "#874913"),
    go.Bar(name = 'Total Medals Count', x = labels, y = total, marker_color = "blue")])

fig.update_layout(title = "Olympic Teams Total rating", xaxis_title = "Teams", yaxis_title = "Count", legend_title="Legend")

fig.update_layout(barmode = 'group',
    xaxis = dict(rangeslider = dict(visible = True)),
    height = 500)

fig.show()

*Tip: you can drag the range slider below this plot with your mouse.*

**Top 5 teams' medals count.**

In [None]:
labels = data["Team/NOC"][:5]

total = [medal for medal in data["Total"][:5]]
gold_medals = [gold for gold in data["Gold Medal"][:5]]
silver_medals = [silver for silver in data["Silver Medal"][:5]]
bronze_medals = [bronze for bronze in data["Bronze Medal"][:5]]

fig = go.Figure(data = [
    go.Bar(name = 'Gold Medals', x = labels, y = gold_medals, marker_color = "#FFBF00"),
    go.Bar(name = 'Silver Medals', x = labels, y = silver_medals, marker_color = "#8A8686"), 
    go.Bar(name = 'Bronze Medals', x = labels, y = bronze_medals, marker_color= "#874913"),
    go.Bar(name = 'Total Medals Count', x = labels, y = total, marker_color = "blue"),])

fig.update_layout(title = "Top 5 Teams' Medals Count", xaxis_title = "Teams", yaxis_title = "Count", legend_title="Legend")
fig.update_layout(barmode = "group")

fig.show()

These two charts show the top 5 Olympic teams in two ratings:

1) Gold medals rating:

1. USA;
2. People's Republic of China (the Chinese team has one gold medal fewer than the USA);
3. Japan;
4. Great Britain;
5. Russia (listed as ROC in the dataset).
* (Last places: Ghana, Grenada, Kuwait, Republic of Moldova, Syrian Arab Republic)

2) Total medals rating:

1. USA;
2. People's Republic of China;
3. Russia (listed as ROC in the dataset);
4. Great Britain;
5. Japan.
* (Last places: Ghana, Grenada, Kuwait, Republic of Moldova, Syrian Arab Republic)
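
Both ratings can also be read straight from the frame instead of from the charts. The following is a minimal sketch on a hypothetical toy frame (team names and medal counts are made up for illustration) that shows how the two orderings can differ when a team wins many silver and bronze medals.

```python
import pandas as pd

# Hypothetical toy medal table.
toy = pd.DataFrame({
    "Team/NOC": ["Team A", "Team B", "Team C", "Team D"],
    "Gold Medal": [10, 9, 7, 2],
    "Silver Medal": [3, 8, 1, 2],
    "Bronze Medal": [2, 6, 12, 1],
})
toy["Total"] = toy[["Gold Medal", "Silver Medal", "Bronze Medal"]].sum(axis=1)

# Top 3 teams by gold medals and by total medals.
top_by_gold = toy.nlargest(3, "Gold Medal")["Team/NOC"].tolist()
top_by_total = toy.nlargest(3, "Total")["Team/NOC"].tolist()

print(top_by_gold)   # ['Team A', 'Team B', 'Team C']
print(top_by_total)  # ['Team B', 'Team C', 'Team A']
```

On the real data, replacing `toy` with `data` and `3` with `5` should give the two top-5 lists above, assuming the column names match those shown in section 2.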

# 4.3) Total Medals count rating.

**Count of different medal types.**

In [None]:
gold_medals_count = np.sum(total_ranks_data["Gold Medal"])
silver_medals_count = np.sum(total_ranks_data["Silver Medal"])
bronze_medals_count = np.sum(total_ranks_data["Bronze Medal"])

# Note: px.pie's `labels` argument expects a dict for renaming, so the slice names go in `names` only.
medal_names = ["Gold Medals Count", "Silver Medals Count", "Bronze Medals Count"]

fig = px.pie(names = medal_names, values = [gold_medals_count, silver_medals_count, bronze_medals_count],
             title = "Count of different medal types",
             hole = 0.1,
             color = ["Gold", "Silver", "Bronze"],
             color_discrete_map = {"Bronze" : "#874913",
                                   "Silver" : "#8A8686",
                                   "Gold" : "#FFBF00"})

fig.update_traces(textposition = "inside", textinfo = "label+percent+value", hoverinfo = "label+percent", textfont_size = 13)
fig.update_layout(legend_title = "Legend")

fig.show()

From this pie chart we can see that:

1. A total of 1080 medals were awarded at these Olympic Games.

2. Of these 1080 medals:
* 340 are gold medals (31.5%);
* 338 are silver medals (31.3%);
* 402 are bronze medals (37.2%).
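
The percentages can be checked with a few lines of plain Python. This sketch only uses the three medal counts quoted above; nothing is read from the dataset.

```python
# Medal counts quoted in the text above.
counts = {"Gold": 340, "Silver": 338, "Bronze": 402}

total = sum(counts.values())
# Share of each medal type, as a percentage rounded to one decimal place.
shares = {medal: round(100 * n / total, 1) for medal, n in counts.items()}

print(total)   # 1080
print(shares)  # {'Gold': 31.5, 'Silver': 31.3, 'Bronze': 37.2}
```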


**Distribution of the Total, Gold, Silver and Bronze medal counts.**

In [None]:
noccode = total_ranks_data["Team/NOC"]

gold_medals_count = total_ranks_data["Gold Medal"]
silver_medals_count = total_ranks_data["Silver Medal"]
bronze_medals_count = total_ranks_data["Bronze Medal"]
total_medals = total_ranks_data["Total"]

fig = go.Figure()

fig.add_trace(go.Box(customdata = noccode, y = total_medals, 
                     boxmean = True, boxpoints = "all",
                     name = "Total", marker_color = "blue",
                     hoverinfo = "all", hovertemplate = "Team: %{customdata}; Count: %{y}"))

fig.add_trace(go.Box(customdata = noccode, y = gold_medals_count,
                     boxmean = True, boxpoints = "all",
                     name = "Gold Medals", marker_color = "#FFBF00",
                     hoverinfo = "all", hovertemplate = "Team: %{customdata}; Count: %{y}"))

fig.add_trace(go.Box(customdata = noccode, y = silver_medals_count,
                     boxmean = True, boxpoints = "all",
                     name = "Silver Medals", marker_color = "#8A8686",
                     hoverinfo = "all", hovertemplate = "Team: %{customdata}; Count: %{y}"))

fig.add_trace(go.Box(customdata = noccode, y = bronze_medals_count,
                     boxmean = True, boxpoints = "all",
                     name = "Bronze Medals", marker_color = "#874913",
                     hoverinfo = "all", hovertemplate = "Team: %{customdata}; Count: %{y}"))

fig.update_layout(title = "Boxplot of Total, Gold, Silver and Bronze medal counts", legend_title = "Legend", xaxis_title = "Boxplots", yaxis_title = "Count",
                  yaxis=dict(autorange = True, showgrid = True, zeroline = True, gridcolor = 'rgb(255, 255, 255)', gridwidth = 1, zerolinecolor = 'rgb(255, 255, 255)', zerolinewidth = 2))


fig.show()

Here we can see:

1. Means and medians of the medal counts:

* The mean total count is about 11.6 medals; the mean gold count is about 3.7, silver about 3.6 and bronze about 4.3;
* The median total count is 4; the median gold count is 1, silver 1 and bronze 2.

2. Maximum and minimum values:

* Total: max 113, min 1; gold: max 39, min 0; silver: max 41, min 0; bronze: max 33, min 0.

3. Other statistics, such as quartiles and fences, can be read from the plot.
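
The means can be cross-checked from the medal counts and the 93 teams quoted earlier, and the quartiles and fences drawn by a boxplot can be computed by hand. The sketch below does the first on the numbers from this notebook and the second on a hypothetical toy series (the values are made up). It assumes the common Tukey rule of 1.5 × IQR for the fences and pandas' default quantile interpolation; libraries differ in their quartile conventions, so the plot may show slightly different quartiles.

```python
import pandas as pd

# Cross-check of the means from the counts quoted earlier (93 teams).
n_teams = 93
medal_counts = {"Total": 1080, "Gold": 340, "Silver": 338, "Bronze": 402}
means = {name: count / n_teams for name, count in medal_counts.items()}
for name, value in means.items():
    print(f"{name}: {value:.2f}")

# Quartiles and Tukey fences on hypothetical toy counts.
toy_totals = pd.Series([1, 1, 2, 4, 4, 7, 10, 40])
q1, median, q3 = toy_totals.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are the ones a boxplot marks as outliers.
outliers = toy_totals[(toy_totals < lower_fence) | (toy_totals > upper_fence)]
print(outliers.tolist())  # [40]
```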

# 5) Conclusion.

**Final visualization. Interactive Map.**

In [None]:
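# Use .loc for the assignment; chained indexing (df[col][label] = ...) may not update the frame.
total_ranks_data.loc[4, "Team/NOC"] = "Russia"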

total = total_ranks_data["Total"]
labels = total_ranks_data["Team/NOC"]
ranks = total_ranks_data["Rank"]

 
fig = px.scatter_geo(locations = labels, hover_name = labels, locationmode = 'country names', 
                      size = total, color = labels, 
                      hover_data = {"Gold Medals" : gold_medals_count, "Silver Medals" : silver_medals_count, "Bronze Medals" : bronze_medals_count, "Rank" : ranks},
                      projection = "natural earth")

fig.update_layout(title = "Olympic teams rating interactive map.", legend_title = "Legend")

fig.show()

In this notebook we analyzed the Tokyo 2020 Olympics medals rating and found some interesting facts along the way.

**Thank you to everyone who checked out this notebook. If you like it, please upvote it; if you don't, please leave a comment, as it will help me improve my skills.**