# Bar Charts

In this tutorial we will visualize medal counts from the Tokyo 2020 Summer Olympics in Japan, which were postponed and held in 2021.
* The data for this tutorial is available at https://www.kaggle.com/datasets/berkayalan/2021-olympics-medals-in-tokyo
* This tutorial is partly adapted from https://www.youtube.com/watch?v=trMfzrun9FA

To get started, we import pandas and Matplotlib's `pyplot` module, then read in the data set.

In [None]:
# imports

import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# read in data

df = pd.read_csv("tokyo_medals_2021.csv")
df.head()

We can use the `plot` method from pandas to create an initial bar chart to explore the data:

In [None]:
# Initial chart

df.plot.bar("Country", "Total")
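The two positional arguments in `df.plot.bar("Country", "Total")` are pandas' `x` and `y` parameters. A minimal sketch of the keyword form, using a small made-up data frame `toy` (hypothetical countries and totals, not the Kaggle data), also shows that pandas hands back a Matplotlib Axes with one bar patch per row:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data with the same two column names used in this tutorial
toy = pd.DataFrame({"Country": ["A", "B", "C"], "Total": [5, 12, 8]})

# Keyword form of toy.plot.bar("Country", "Total")
ax = toy.plot.bar(x="Country", y="Total")

# pandas returns a Matplotlib Axes; each row becomes one Rectangle in ax.patches
heights = [p.get_height() for p in ax.patches]

plt.show()
```

Because the return value is an ordinary Axes, everything later in this tutorial (spines, bar labels, colors) could also be applied to it.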

<b>What issues does this chart have?</b>

Some problems include:
* There are too many countries to compare
* The labels are too small and crowded to read
* Most of the figure is blank space

Another thing to consider: what story are we trying to tell?

## Question:  Which countries received the most medals?

Suppose we want to measure a country's success by its total medal count and compare how the countries performed. For now, let's focus on the top 10 countries by medal count:

In [None]:
top_ten = df["Rank By Total"] <= 10  # Filtering criterion
new_df = df[top_ten]  # Create new data frame based on filtering criterion

new_df.plot.bar("Country", "Total")
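Another way to take the countries with the most medals, without relying on the precomputed `Rank By Total` column, is `DataFrame.nlargest`. The sketch below uses a made-up data frame `toy` and a top 3 instead of a top 10. One caveat: `nlargest` keeps only the first of any tied rows by default (`keep="first"`), while a rank filter may keep every tied row, depending on how the ranks were assigned.

```python
import pandas as pd

# Hypothetical toy data: made-up totals plus a rank column computed from them
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Total": [40, 15, 25, 5, 30],
})
toy["Rank By Total"] = toy["Total"].rank(ascending=False, method="min").astype(int)

# Approach 1: boolean mask on the rank column (as in the cell above)
by_rank = toy[toy["Rank By Total"] <= 3]

# Approach 2: let pandas pick the rows with the largest totals directly
by_total = toy.nlargest(3, "Total")

# Same countries either way; nlargest also returns them in descending order
print(sorted(by_rank["Country"]), sorted(by_total["Country"]))  # → ['A', 'C', 'E'] ['A', 'C', 'E']
```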

<b>One problem</b>: the country labels are hard to read! Let's switch to a horizontal bar chart using `barh`:

In [None]:
new_df.plot.barh("Country", "Total")

So far so good, but to make the chart easier to customize, we will now switch to Matplotlib's own interface and draw on an Axes object created with `plt.subplots()`. (All of this can also be done through pandas, but the code can be harder to read.) The formatting looks a little different:
* `plt.subplots()` returns a Figure together with one or more Axes; passing `nrows` and `ncols` creates a grid of Axes in a single call
* https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html

In [None]:
fig, ax = plt.subplots()
bars = ax.barh(new_df["Country"], new_df["Total"])

plt.show()
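Because `plt.subplots()` can return several Axes at once, it is also a convenient way to compare the pandas and Matplotlib versions side by side. pandas' plotting methods accept an `ax=` argument that draws onto an existing Axes. A sketch with a made-up data frame `toy`:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data (not the Kaggle data set)
toy = pd.DataFrame({"Country": ["A", "B", "C"], "Total": [5, 12, 8]})

# One row, two columns of Axes; `axes` is a NumPy array of length 2
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Left: pandas draws onto the Axes we hand it
toy.plot.barh(x="Country", y="Total", ax=axes[0], legend=False)
axes[0].set_title("pandas")

# Right: the same chart drawn directly with Matplotlib
bars = axes[1].barh(toy["Country"], toy["Total"])
axes[1].set_title("matplotlib")

plt.show()
```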

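<b>Another problem</b>: it would be nice to see the totals sorted from greatest to least. We can sort the data frame and replot. Because `barh` draws the first row at the bottom of the chart, sorting in ascending order (the `sort_values` default) puts the largest total at the top: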

In [None]:
sorted_df = new_df.sort_values(by="Total") # will sort the entire dataframe by the values in the Total column

fig, ax = plt.subplots()
bars = ax.barh(sorted_df["Country"], sorted_df["Total"])

plt.show()
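An alternative to relying on ascending order is to sort in descending order and flip the y-axis with `ax.invert_yaxis()`, so that the first row of the data frame is drawn at the top. A sketch with a made-up data frame `toy`:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data
toy = pd.DataFrame({"Country": ["A", "B", "C"], "Total": [5, 12, 8]})

# Descending order: the largest total is now the first row
desc = toy.sort_values(by="Total", ascending=False)

fig, ax = plt.subplots()
bars = ax.barh(desc["Country"], desc["Total"])

# Flip the y-axis so the first row (largest total) appears at the top
ax.invert_yaxis()

plt.show()
```

This keeps the data frame in the same order as the chart reads, which can be convenient when printing the table alongside the figure.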

### Chart Junk

This looks much better already! Now for some additional cleanup:

<b>Remove borders</b>: let's remove the chart borders (called *spines* in Matplotlib) on the bottom, top, and right sides using `ax.spines`:

In [None]:
fig, ax = plt.subplots()
bars = ax.barh(sorted_df["Country"], sorted_df["Total"])
ax.spines[["bottom", "top", "right"]].set_visible(False)

plt.show()
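Indexing `ax.spines` with a list of names requires Matplotlib 3.4 or later. On older versions, a loop over the individual spines should do the same job; a sketch on made-up values:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.barh(["A", "B", "C"], [5, 12, 8])  # hypothetical toy values

# Hide each spine individually (no list indexing needed)
for side in ["bottom", "top", "right"]:
    ax.spines[side].set_visible(False)

plt.show()
```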

<b>Remove ticks</b>: next, we hide the x-axis entirely with `ax.xaxis.set_visible(False)`, which removes its ticks and tick labels. We will label the individual bars with their medal totals later for easier reading:

In [None]:
fig, ax = plt.subplots()
bars = ax.barh(sorted_df["Country"], sorted_df["Total"])
ax.spines[["bottom", "top", "right"]].set_visible(False)
ax.xaxis.set_visible(False)

plt.show()
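Hiding the whole x-axis also hides any x-axis label. If you only want the ticks gone but would like to keep a label such as "Total medals" (a hypothetical label, not part of the original chart), one option is `ax.set_xticks([])`. A sketch on made-up values:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.barh(["A", "B", "C"], [5, 12, 8])  # hypothetical toy values
ax.spines[["bottom", "top", "right"]].set_visible(False)

# Remove every x tick (and with it every x tick label) but keep the axis itself
ax.set_xticks([])
ax.set_xlabel("Total medals")

plt.show()
```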

To label each bar with its value, use `ax.bar_label` (available in Matplotlib 3.4 and later):

In [None]:
fig, ax = plt.subplots()
bars = ax.barh(sorted_df["Country"], sorted_df["Total"])
ax.spines[["bottom", "top", "right"]].set_visible(False)
ax.xaxis.set_visible(False)
ax.bar_label(bars)

plt.show()

Much better! Now we can see the specific medal count for each country. 
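`ax.bar_label` also has a `label_type` parameter: the default `"edge"` places each label at the end of its bar, while `"center"` places it in the middle of the bar. The call returns the list of text annotations it created, one per bar, which is handy for checking or restyling labels afterwards. A sketch on made-up values:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.barh(["A", "B", "C"], [5, 12, 8])  # hypothetical toy values

# Place each label in the middle of its bar instead of at the bar's end
labels = ax.bar_label(bars, label_type="center", color="white")

# One annotation per bar, in the same order as the bars
print([t.get_text() for t in labels])  # → ['5', '12', '8']

plt.show()
```

Centered labels avoid the negative `padding` trick used below, at the cost of sitting farther from the end of each bar.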

We can format the bar labels further. Some useful parameters include:
* `padding`: distance in points between the label and the end of the bar; negative values move the label inside the bar
* `color`: text color
* `fontsize` and `fontweight`: text size and weight

See https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.bar_label.html for additional parameters:

In [None]:
fig, ax = plt.subplots()
bars = ax.barh(sorted_df["Country"], sorted_df["Total"])
ax.spines[["bottom", "top", "right"]].set_visible(False)
ax.xaxis.set_visible(False)
ax.bar_label(bars, padding=-30, color="white", fontsize=12, fontweight="bold")

plt.show()

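Because the labels are left-aligned, a padding that fits the three-digit 113 inside its bar leaves a visible gap to the right of every two-digit label. We can fix that by passing a function to `fmt` (supported in Matplotlib 3.7 and later) that prepends two spaces to numbers below 100: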

In [None]:
fig, ax = plt.subplots()
bars = ax.barh(sorted_df["Country"], sorted_df["Total"])
ax.spines[["bottom", "top", "right"]].set_visible(False)
ax.xaxis.set_visible(False)
ax.bar_label(bars, padding=-30, color="white", fontsize=12, fontweight="bold",
             fmt=lambda x: f"  {int(x)}" if x < 100 else str(int(x)))  # pad two-digit labels

plt.show()
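The padding trick assumes that two spaces are roughly as wide as one digit in Matplotlib's default font (DejaVu Sans). A hypothetical helper, `pad_label`, generalizes the idea to any number of digits. Since it is a plain function, its output can be checked without drawing anything:

```python
def pad_label(value, width=3):
    """Format a bar value as an integer string, left-padded with two
    spaces per missing digit. Hypothetical helper; assumes two spaces
    are about one digit wide in the default font."""
    text = str(int(value))
    missing = max(width - len(text), 0)
    return "  " * missing + text


print(repr(pad_label(113.0)))  # → '113'
print(repr(pad_label(58.0)))   # → '  58'
print(repr(pad_label(7.0)))    # → '    7'
```

It could then be passed as `fmt=pad_label` in place of the lambda.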

### Adding Color

Since the Games were hosted in Tokyo, let's highlight Japan to make it stand out from the rest.

We can add a new column to our data frame specifying the color for each row, and then pass that column as the `color` argument to `ax.barh`:

In [None]:
# Add new column to dataframe

sorted_df["colors"] = sorted_df["Country"].apply(lambda x: "red" if x == "Japan" else "darkgray")

fig, ax = plt.subplots()
bars = ax.barh(sorted_df["Country"], sorted_df["Total"], color=sorted_df["colors"])
ax.spines[["bottom", "top", "right"]].set_visible(False)
ax.xaxis.set_visible(False)
ax.bar_label(bars, padding=-30, color="white", fontsize=12, fontweight="bold",
             fmt=lambda x: f"  {int(x)}" if x < 100 else str(int(x)))  # pad two-digit labels

plt.show()
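Adding a `colors` column works, but the colors can also be built as a plain list without modifying the data frame. A sketch with a made-up data frame `toy` (the totals, including Japan's, are invented), using `matplotlib.colors.to_rgba` to confirm which bar received the highlight color:

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Hypothetical toy data; the totals are made up
toy = pd.DataFrame({"Country": ["A", "Japan", "B"], "Total": [5, 8, 12]})

# One color per row, computed without adding a column to the data frame
colors = ["red" if country == "Japan" else "darkgray" for country in toy["Country"]]

fig, ax = plt.subplots()
bars = ax.barh(toy["Country"], toy["Total"], color=colors)

# Each bar stores its face color as an RGBA tuple
print(bars[1].get_facecolor() == to_rgba("red"))  # → True

plt.show()
```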

Matplotlib's named colors are listed here: https://matplotlib.org/stable/gallery/color/named_colors.html



### Adding Countries

Now suppose we want to include more than the top ten countries. How does this affect our chart? Let's select every country that received at least ten medals:


In [None]:
ten_medals = df["Total"] >= 10  # Filtering criterion
new_df = df[ten_medals]  # Create new data frame based on filtering criterion

sorted_df = new_df.sort_values(by="Total") # will sort the entire dataframe by the values in the Total column

sorted_df["colors"] = sorted_df["Country"].apply(lambda x: "red" if x == "Japan" else "darkgray")

fig, ax = plt.subplots()
bars = ax.barh(sorted_df["Country"], sorted_df["Total"], color=sorted_df["colors"])
ax.spines[["bottom", "top", "right"]].set_visible(False)
ax.xaxis.set_visible(False)
ax.bar_label(bars, padding=-30, color="white", fontsize=12, fontweight="bold",
             fmt=lambda x: f"  {int(x)}" if x < 100 else str(int(x)))  # pad two-digit labels

plt.show()


Now the chart is getting crammed again! One way to fix this is to enlarge the figure with the `figsize` argument to `plt.subplots`, which passes it on to `plt.figure`:
* The default `figsize` is 6.4 by 4.8 inches (width, height)
* https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html

We can also adjust the font size and the padding on the bar labels:

In [None]:
fig, ax = plt.subplots(figsize=(6.4, 6.4))

bars = ax.barh(sorted_df["Country"], sorted_df["Total"], color=sorted_df["colors"])
ax.spines[["bottom", "top", "right"]].set_visible(False)
ax.xaxis.set_visible(False)
ax.bar_label(bars, padding=-25, color="white", fontsize=10, fontweight="bold",
             fmt=lambda x: f"  {int(x)}" if x < 100 else str(int(x)))  # pad two-digit labels

plt.show()
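Rather than choosing a height by hand, one option is to scale the figure height with the number of bars. The 0.3 inches per bar below is an arbitrary assumption to tune, not a Matplotlib default; the width is read from `plt.rcParams["figure.figsize"]`, which is 6.4 by 4.8 inches unless your environment changes it. A sketch with a made-up data frame `toy` of 30 countries:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data: 30 made-up countries
toy = pd.DataFrame({
    "Country": [f"Country {i}" for i in range(30)],
    "Total": list(range(10, 40)),
})

default_width, default_height = plt.rcParams["figure.figsize"]

# Assumed rule of thumb: about 0.3 inches per bar, never below the default height
height = max(default_height, 0.3 * len(toy))

fig, ax = plt.subplots(figsize=(default_width, height))
bars = ax.barh(toy["Country"], toy["Total"])
ax.spines[["bottom", "top", "right"]].set_visible(False)
ax.xaxis.set_visible(False)
ax.bar_label(bars)

plt.show()
```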