# Bar Charts

The data for this tutorial is available at https://www.kaggle.com/datasets/berkayalan/2021-olympics-medals-in-tokyo

In [None]:
# imports

import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# read in data

df = pd.read_csv("tokyo_medals_2021.csv")

## Grouped Bar Charts

In the previous tutorial, we compared each country's performance based on total medal count. Another question to ask is how each country's total medal count breaks down into gold, silver, and bronze medals. Recall that this information is included in the data table:

In [None]:
df.head()

Let's grab the top ten countries by total medal count, and visualize their medal categories. Here's what `pandas` gives us straight out of the box:

In [None]:
top_ten = df["Rank By Total"] <= 10  # Filtering criterion
new_df = df[top_ten]  # Create new data frame based on filtering criterion

new_df.plot.bar("Country", ["Gold Medal", "Silver Medal", "Bronze Medal"])
plt.show()
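
If a table has no precomputed rank column, a similar selection can be made from the totals directly. The sketch below uses `DataFrame.nlargest` on a small made-up table (the team names and counts are placeholders, not the Tokyo data). By default it returns exactly `n` rows when at least `n` exist, whereas a rank filter may return more or fewer rows when totals tie.

```python
import pandas as pd

# Made-up medal table for illustration only (not the Tokyo data)
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Total": [12, 30, 7, 30, 18],
})

# Keep the three rows with the largest totals, sorted in descending order
top_three = toy.nlargest(3, "Total")
print(top_three)
```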

We can apply some of our tricks from before:
* Rotate the chart
* Sort by total medal count
* Maybe even make the bar colors match the medal type!

In [None]:
sorted_df = new_df.sort_values(by="Total") # will sort the entire dataframe by the values in the Total column

sorted_df.plot.barh("Country", ["Gold Medal", "Silver Medal", "Bronze Medal"], color = ["gold", "silver", "#CD7F32"])
plt.show()

Infuriatingly, pandas draws the first column at the bottom of each group, so Gold ends up on the bottom with Bronze on top. Reversing the order of the inputs fixes the bars but also reverses the legend, so we will need to retrieve the legend handles and labels and reverse them back:

In [None]:
sorted_df = new_df.sort_values(by="Total") # will sort the entire dataframe by the values in the Total column

ax = sorted_df.plot.barh("Country", ["Bronze Medal", "Silver Medal", "Gold Medal"], color = ["#CD7F32", "silver", "gold"])

h, l = ax.get_legend_handles_labels()
ax.legend(h[::-1], l[::-1]) # reverse order of legend

plt.show()
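
To see what the reversal is doing, it may help to look at the two lists on their own. The following is a minimal sketch with made-up values on a plain matplotlib axes; only the label order matters here.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Made-up values; the two series stand in for medal categories
ax.barh([0, 1], [3, 5], height=0.3, label="Bronze")
ax.barh([0.3, 1.3], [4, 2], height=0.3, label="Gold")

# Handles are the drawn artists, labels are their label strings, both in plotting order
handles, labels = ax.get_legend_handles_labels()
print(labels)

# Reverse both lists together so each handle stays paired with its own label
legend = ax.legend(handles[::-1], labels[::-1])
print([t.get_text() for t in legend.get_texts()])
```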

To create the same plot directly in matplotlib, we will need to specify the location of each bar on the y-axis, plus the thickness of each bar (the `height` argument). Note that the y-coordinate specifies where the **center** of the bar goes:

In [None]:
fig, ax = plt.subplots()

y_vals_gold = [i + 0.2 for i in range(10)]
y_vals_silver = list(range(10))
y_vals_bronze = [i - 0.2 for i in range(10)]

ax.barh(y_vals_gold, sorted_df["Gold Medal"], height=0.2, color = "gold")
ax.barh(y_vals_silver, sorted_df["Silver Medal"], height=0.2, color = "silver")
ax.barh(y_vals_bronze, sorted_df["Bronze Medal"], height=0.2, color = "#CD7F32")

plt.show()

Another good way to specify the y-coordinates is by using the `np.arange` function from numpy:

In [None]:
import numpy as np

fig, ax = plt.subplots()

y_vals_silver = np.arange(10) # similar to the range function
y_vals_gold = y_vals_silver + 0.2 # easy to shift values using numpy arrays
y_vals_bronze = y_vals_silver - 0.2

ax.barh(y_vals_gold, sorted_df["Gold Medal"], height=0.2, color = "gold")
ax.barh(y_vals_silver, sorted_df["Silver Medal"], height=0.2, color = "silver")
ax.barh(y_vals_bronze, sorted_df["Bronze Medal"], height=0.2, color = "#CD7F32")

plt.show()

We can also fix the y-labels, and add in the legend:

In [None]:
import numpy as np

fig, ax = plt.subplots()

y_vals_silver = np.arange(10) # similar to the range function
y_vals_gold = y_vals_silver + 0.2 # easy to shift values using numpy arrays
y_vals_bronze = y_vals_silver - 0.2

ax.barh(y_vals_gold, sorted_df["Gold Medal"], height=0.2, color = "gold", label = "Gold")
ax.barh(y_vals_silver, sorted_df["Silver Medal"], height=0.2, color = "silver", label = "Silver")
ax.barh(y_vals_bronze, sorted_df["Bronze Medal"], height=0.2, color = "#CD7F32", label = "Bronze")

ax.set_yticks(y_vals_silver)
ax.set_yticklabels(sorted_df["Country"])

ax.legend()

plt.show()
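
The offsets of -0.2, 0, and +0.2 work for three categories, but the same idea extends to any number of series. The sketch below uses a hypothetical helper, `group_offsets`, that spreads the bars evenly around each group's center. The helper name and its arguments are illustrative, not part of matplotlib or numpy.

```python
import numpy as np

def group_offsets(n_series, bar_height):
    """Return one y-offset per series, centered on zero (hypothetical helper)."""
    return (np.arange(n_series) - (n_series - 1) / 2) * bar_height

centers = np.arange(10)          # one group per country
offsets = group_offsets(3, 0.2)  # three series, each bar 0.2 thick

# Shift the group centers by each offset to get the bar positions
bronze_y, silver_y, gold_y = (centers + off for off in offsets)
print(offsets)
```

Each of the resulting arrays can be passed as the first argument to `ax.barh`, just like `y_vals_gold` and friends above.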


## Stacked Bar Charts

Side-by-side bars can make it hard to compare overall performance. We can pass in `stacked = True` to `plot.barh` to place all three categories on a single bar per country:

In [None]:
sorted_df.plot.barh("Country", ["Gold Medal", "Silver Medal", "Bronze Medal"], color = ["gold", "silver", "#CD7F32"], stacked = True)
plt.show()


Now we can see the total medal counts again, and we can easily compare which countries received the most gold medals (since they all begin at the axis). 

In order to create this chart directly from matplotlib, we will plot all three bars at the same coordinates on the y-axis. Notice what happens directly out-of-the-box:

In [None]:
fig, ax = plt.subplots()
ax.barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold", label = "Gold")
ax.barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver", label = "Silver")
ax.barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32", label = "Bronze")

ax.legend()

plt.show()

Each subsequent bar is drawn **on top** of the previous bars, and all of them start at zero, so the bars overlap instead of stacking! One way to fix this is to add the values of the other medal counts into the subcategories and plot from longest to shortest. So if we want to show gold, then silver, then bronze, we will:
* Plot bronze first, with a length equal to the total medal count, then
* Plot silver with a length equal to the number of golds plus the number of silvers, and finally
* Plot the gold medals with their own value on top of the rest

That way the portion of each bar left visible on the graph will match the actual value of each subcategory:

In [None]:
fig, ax = plt.subplots()
ax.barh(sorted_df["Country"], sorted_df["Bronze Medal"] + sorted_df["Silver Medal"] + sorted_df["Gold Medal"], color = "#CD7F32", label = "Bronze")
ax.barh(sorted_df["Country"], sorted_df["Silver Medal"] + sorted_df["Gold Medal"], color = "silver", label = "Silver")
ax.barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold", label = "Gold")

ax.legend()

plt.show()
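
An alternative to drawing cumulative bars over each other is the `left` parameter of `ax.barh`, which sets where each bar starts. The sketch below uses made-up counts for three hypothetical teams. Each bar keeps its true length, and the legend comes out in plotting order without any reversal.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up counts for three hypothetical teams (not the Tokyo data)
teams = ["A", "B", "C"]
gold = np.array([5, 3, 1])
silver = np.array([2, 4, 2])
bronze = np.array([1, 1, 3])

fig, ax = plt.subplots()
ax.barh(teams, gold, color="gold", label="Gold")                   # starts at 0
ax.barh(teams, silver, left=gold, color="silver", label="Silver")  # starts where gold ends
bronze_bars = ax.barh(teams, bronze, left=gold + silver,           # starts where silver ends
                      color="#CD7F32", label="Bronze")
ax.legend()
```

The right edge of each bronze bar then lands on that team's total medal count.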

We can adjust the bar thickness and order of the legend as desired (default height is 0.8):

In [None]:
fig, ax = plt.subplots()
ax.barh(sorted_df["Country"], sorted_df["Bronze Medal"] + sorted_df["Silver Medal"] + sorted_df["Gold Medal"], height = 0.6, color = "#CD7F32", label = "Bronze")
ax.barh(sorted_df["Country"], sorted_df["Silver Medal"] + sorted_df["Gold Medal"], height = 0.6, color = "silver", label = "Silver")
ax.barh(sorted_df["Country"], sorted_df["Gold Medal"], height = 0.6, color = "gold", label = "Gold")

h, l = ax.get_legend_handles_labels()
ax.legend(h[::-1], l[::-1]) # reverse order of legend

plt.show()

## Multiple Bar Charts

The stacked bar chart does show the total medal counts, but makes it difficult to compare categories that are not aligned along an axis. We can apply the **small multiples** approach for plotting each category in a separate chart:

In [None]:
fig, ax = plt.subplots(1, 3)
ax[0].barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold")
ax[1].barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver")
ax[2].barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32")
plt.show()


**Notice that the values along the x-axis differ on each chart!** We need to make sure the scale is the same across all images:

In [None]:
fig, ax = plt.subplots(1, 3)
ax[0].barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold")
ax[1].barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver")
ax[2].barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32")

print(ax[0].get_xlim())
print(ax[1].get_xlim())
print(ax[2].get_xlim())

plt.show()


The largest of the three upper limits is 43.05, on the silver chart, so we will set the x-axis to range from 0 to 43.05 on each chart. Notice how this shrinks the length of the bars on the bronze chart:

In [None]:
fig, ax = plt.subplots(1, 3)
ax[0].barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold")
ax[1].barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver")
ax[2].barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32")

ax[0].set_xlim(0, 43.05)
ax[1].set_xlim(0, 43.05)
ax[2].set_xlim(0, 43.05)

plt.show()
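
Rather than copying a limit by hand, the axes can also be linked when they are created. The following sketch passes `sharex=True` to `plt.subplots`, so autoscaling takes the data from all panels into account. The values are made up for illustration.

```python
import matplotlib.pyplot as plt

teams = ["A", "B", "C"]  # made-up teams and counts

# sharex=True links the x-axes of all three panels
fig, axes = plt.subplots(1, 3, sharex=True)
axes[0].barh(teams, [39, 20, 10], color="gold")
axes[1].barh(teams, [41, 25, 12], color="silver")
axes[2].barh(teams, [33, 18, 9], color="#CD7F32")

# Every panel now reports the same limits
for a in axes:
    print(a.get_xlim())
```

The explicit `set_xlim` calls remain useful when the panels need different but related scales, as in the total chart later on.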


Now we can remove axis lines and axis labels to yield the following:

In [None]:
fig, ax = plt.subplots(1, 3)
ax[0].barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold")
ax[1].barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver")
ax[2].barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32")

ax[0].set_xlim(0, 43.05)
ax[1].set_xlim(0, 43.05)
ax[2].set_xlim(0, 43.05)

ax[0].spines[['top', 'right', 'bottom']].set_visible(False)
ax[1].spines[['top', 'right', 'bottom']].set_visible(False)
ax[2].spines[['top', 'right', 'bottom']].set_visible(False)

ax[0].set_xticks([])
ax[1].set_xticks([])
ax[2].set_xticks([])

ax[1].set_yticks([])
ax[2].set_yticks([])

plt.show()


Finally, add in chart labels and titles:

In [None]:
fig, ax = plt.subplots(1, 3)
ax[0].barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold")
ax[1].barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver")
ax[2].barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32")

ax[0].set_xlim(0, 43.05)
ax[1].set_xlim(0, 43.05)
ax[2].set_xlim(0, 43.05)

ax[0].spines[['top', 'right', 'bottom']].set_visible(False)
ax[1].spines[['top', 'right', 'bottom']].set_visible(False)
ax[2].spines[['top', 'right', 'bottom']].set_visible(False)

ax[0].set_xticks([])
ax[1].set_xticks([])
ax[2].set_xticks([])

ax[1].set_yticks([])
ax[2].set_yticks([])

ax[0].text(21.525, 10, "Gold", horizontalalignment="center", color="gold")
ax[1].text(21.525, 10, "Silver", horizontalalignment="center", color="silver")
ax[2].text(21.525, 10, "Bronze", horizontalalignment="center", color="#CD7F32")

plt.suptitle("Tokyo 2020: top ten countries by total medal count")

plt.show()
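
Placing the labels with `ax.text` relies on data coordinates (x at the midpoint of the axis, y just above the top bar), so the numbers must change whenever the limits or the number of bars change. A possible alternative, sketched here with made-up data, is `set_title`, which centers the text above each axes on its own.

```python
import matplotlib.pyplot as plt

teams = ["A", "B", "C"]  # made-up teams and counts
fig, axes = plt.subplots(1, 2)
axes[0].barh(teams, [5, 3, 1], color="gold")
axes[1].barh(teams, [2, 4, 2], color="silver")

# set_title centers the label above the axes regardless of the data limits
axes[0].set_title("Gold", color="gold")
axes[1].set_title("Silver", color="silver")
```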


One more thing we can do is add a **total** column in our small multiples chart:

In [None]:
fig, ax = plt.subplots(1, 4)
ax[0].barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold")
ax[1].barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver")
ax[2].barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32")
ax[3].barh(sorted_df["Country"], sorted_df["Total"])

ax[0].set_xlim(0, 43.05)
ax[1].set_xlim(0, 43.05)
ax[2].set_xlim(0, 43.05)

for i in range(4):
    ax[i].spines[['top', 'right', 'bottom']].set_visible(False)
    ax[i].set_xticks([])
    if i != 0:
        ax[i].set_yticks([])

ax[0].text(21.525, 10, "Gold", horizontalalignment="center", color="gold")
ax[1].text(21.525, 10, "Silver", horizontalalignment="center", color="silver")
ax[2].text(21.525, 10, "Bronze", horizontalalignment="center", color="#CD7F32")
ax[3].text(62, 10, "Total", horizontalalignment="center", color="tab:blue")

plt.suptitle("Tokyo 2020: top ten countries by total medal count")

plt.show()


Note that the largest total medal count is 113, whereas the largest value in a subcategory (silver medals) is just 41. Let's adjust the x-scale to be the same across all charts:

In [None]:
fig, ax = plt.subplots(1, 4)
ax[0].barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold")
ax[1].barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver")
ax[2].barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32")
ax[3].barh(sorted_df["Country"], sorted_df["Total"])

x_total = 115

for i in range(4):
    ax[i].set_xlim(0, x_total)
    ax[i].spines[['top', 'right', 'bottom']].set_visible(False)
    ax[i].set_xticks([])
    if i != 0:
        ax[i].set_yticks([])

ax[0].text(x_total / 2, 10, "Gold", horizontalalignment="center", color="gold")
ax[1].text(x_total / 2, 10, "Silver", horizontalalignment="center", color="silver")
ax[2].text(x_total / 2, 10, "Bronze", horizontalalignment="center", color="#CD7F32")
ax[3].text(x_total / 2, 10, "Total", horizontalalignment="center", color="tab:blue")

plt.suptitle("Tokyo 2020: top ten countries by total medal count")

plt.show()


There is a lot of unused space in the first three charts (gold, silver, and bronze). We can **halve** the x-range there to reduce wasted space, but then we will need to **double** the width of the total chart (set by `width_ratios` in the call to `subplots`) so that one medal still covers the same distance in every chart. The important thing is to keep the spacing between the charts the same, and preserve the meaning of the bar lengths across all figures:

In [None]:
fig, ax = plt.subplots(1, 4, width_ratios=[1,1,1,2]) # final plot is twice as wide as the others
ax[0].barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold")
ax[1].barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver")
ax[2].barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32")
ax[3].barh(sorted_df["Country"], sorted_df["Total"])

x_total = 115

for i in range(4):
    # the total chart keeps the full scale; the medal charts use half of it
    ax[i].set_xlim(0, x_total if i == 3 else x_total / 2)
    ax[i].spines[['top', 'right', 'bottom']].set_visible(False)
    ax[i].set_xticks([])
    if i != 0:
        ax[i].set_yticks([])


ax[0].text(x_total / 4, 10, "Gold", horizontalalignment="center", color="gold")
ax[1].text(x_total / 4, 10, "Silver", horizontalalignment="center", color="silver")
ax[2].text(x_total / 4, 10, "Bronze", horizontalalignment="center", color="#CD7F32")
ax[3].text(x_total / 2, 10, "Total", horizontalalignment="center", color="tab:blue")

plt.suptitle("Tokyo 2020: top ten countries by total medal count")

plt.show()


Using 115 as the max x value in the total chart, we see that 115 is just over 2.8 times the max number of silver medals. We can adjust the chart again to more closely align to this value:

In [None]:
fig, ax = plt.subplots(1, 4, width_ratios=[1,1,1,2.8]) # final plot is 2.8 times as wide as the others
ax[0].barh(sorted_df["Country"], sorted_df["Gold Medal"], color = "gold")
ax[1].barh(sorted_df["Country"], sorted_df["Silver Medal"], color = "silver")
ax[2].barh(sorted_df["Country"], sorted_df["Bronze Medal"], color = "#CD7F32")
ax[3].barh(sorted_df["Country"], sorted_df["Total"])

x_total = 115
x_reduced = x_total / 2.8

for i in range(4):
    # the total chart keeps the full scale; the medal charts use the reduced one
    ax[i].set_xlim(0, x_total if i == 3 else x_reduced)
    ax[i].spines[['top', 'right', 'bottom']].set_visible(False)
    ax[i].set_xticks([])
    if i != 0:
        ax[i].set_yticks([])


ax[0].text(x_reduced / 2, 10, "Gold", horizontalalignment="center", color="gold")
ax[1].text(x_reduced / 2, 10, "Silver", horizontalalignment="center", color="silver")
ax[2].text(x_reduced / 2, 10, "Bronze", horizontalalignment="center", color="#CD7F32")
ax[3].text(x_total / 2, 10, "Total", horizontalalignment="center", color="tab:blue")

plt.suptitle("Tokyo 2020: top ten countries by total medal count")

plt.show()
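
The width ratio can also be derived from the data instead of being read off by eye. The sketch below, using made-up values and only two panels, picks the ratio so that one medal covers the same figure width in both charts. The 5% margin is an arbitrary choice, and the `gridspec_kw` spelling is used here because it should also work on Matplotlib releases that predate the `width_ratios` shortcut.

```python
import matplotlib.pyplot as plt

teams = ["A", "B", "C"]  # made-up teams and counts
parts = [41, 25, 12]     # one medal category
totals = [113, 70, 30]   # total medal counts

x_part = max(parts) * 1.05    # upper x-limit of the category panel
x_total = max(totals) * 1.05  # upper x-limit of the total panel
ratio = x_total / x_part      # how much wider the total panel must be

fig, axes = plt.subplots(1, 2, gridspec_kw={"width_ratios": [1, ratio]})
axes[0].barh(teams, parts, color="silver")
axes[1].barh(teams, totals)
axes[0].set_xlim(0, x_part)
axes[1].set_xlim(0, x_total)

# Figure width per medal should be the same in both panels
scale = [a.get_position().width / a.get_xlim()[1] for a in axes]
print(scale)
```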
