# Multi-Barcharts in Pandas

In this notebook, we'll explore how to generate multi-barcharts using pandas. These charts are especially useful when we want to compare the average values of several numerical variables across different categories. Let's start by importing the pandas library:

In [None]:
import pandas as pd

## Load the dataset
Load and preview the sample dataset with the code below:

In [2]:
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv'
df = pd.read_csv(url)
df.head()

## Non-Stacked Multi-Barchart

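We'll start with a multi-barchart that is not stacked: the bars for each variable sit side by side within every category. This layout is useful when you want to compare absolute values across categories. Here we compare the average horsepower and the average miles per gallon (mpg) of cars from different origins. Keep in mind that both variables share one y-axis even though their units differ.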

In [None]:
(pd.pivot_table(
    df,
    index=['origin'],
    values=['horsepower', 'mpg'],
    aggfunc='mean')
 .plot
 .bar(stacked=False))
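Before plotting, it can help to look at the table the bars are drawn from. The sketch below uses a small made-up DataFrame (`toy` and its values are hypothetical, not taken from the mpg dataset) to show that the pivot table holds one row per category and one column per variable, and that it should match a plain `groupby(...).mean()`:

```python
import pandas as pd

# Hypothetical toy data standing in for the mpg dataset
toy = pd.DataFrame({
    'origin': ['usa', 'usa', 'japan', 'japan'],
    'horsepower': [150.0, 100.0, 70.0, 90.0],
    'mpg': [15.0, 20.0, 30.0, 34.0],
})

# One row per origin, one column per averaged variable
means = pd.pivot_table(
    toy,
    index='origin',
    values=['horsepower', 'mpg'],
    aggfunc='mean')

# The same averages computed with groupby, for comparison
means_groupby = toy.groupby('origin')[['horsepower', 'mpg']].mean()

print(means)
print(means_groupby)
```

Each row of this table becomes one group of bars, and each column becomes one bar (and one legend entry) within that group.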

## Stacked Multi-Barchart

Now, let's create a stacked multi-barchart. This type of plot is useful when you want to see how sub-categories contribute to the total of each main category. Here we count, per model year (from 1975 onward) and per region of origin, the cars that had more than 100 horsepower, and stack the regional counts on top of each other. From this chart, you can immediately see how dominant US cars were when it came to high horsepower, at least until the second oil crisis of 1979 worked its way through the industry in the early 1980s.

In [None]:
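(pd.pivot_table(
    # Keep model years 75 and later; flag cars with more than 100 horsepower
    df.query('model_year >= 75').assign(hp_is_above_100=df['horsepower'] > 100),
    index=['model_year', 'origin'],
    values='hp_is_above_100',
    aggfunc='sum')  # summing booleans counts the flagged cars
 .unstack()  # move origin from the row index into the columns
 .plot
 .bar(stacked=True)
 .legend(loc='center left', bbox_to_anchor=(1.0, 0.5)))  # place the legend right of the axes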
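The cell above reshapes the counts with `unstack()`; passing `columns=` to `pivot_table` is another way to get the same wide table. The sketch below compares the two routes on made-up data (the `toy` frame and its `year` and `flag` columns are hypothetical stand-ins for the filtered dataset and `hp_is_above_100`). The numbers should agree, but the `unstack()` route keeps a two-level column index, which typically shows up as tuple-like legend labels:

```python
import pandas as pd

# Hypothetical toy data: one row per car, flag marks "high horsepower"
toy = pd.DataFrame({
    'year': [75, 75, 75, 76, 76, 76],
    'origin': ['usa', 'japan', 'usa', 'usa', 'japan', 'japan'],
    'flag': [True, False, True, True, True, False],
})

# Route 1: group by both keys, then move origin into the columns
via_unstack = pd.pivot_table(
    toy,
    index=['year', 'origin'],
    values='flag',
    aggfunc='sum').unstack()

# Route 2: ask pivot_table for origin as columns directly
via_columns = pd.pivot_table(
    toy,
    index='year',
    columns='origin',
    values='flag',
    aggfunc='sum')

print(via_unstack.columns.nlevels)  # two levels: value name and origin
print(via_columns.columns.nlevels)  # a single level: origin
print(via_unstack['flag'])          # selecting the value name flattens route 1
```

Either table can be passed to `.plot.bar(stacked=True)`; the single-level version simply yields cleaner legend entries.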

In addition to displaying absolute values, multi-barcharts are also a great tool for comparing relative proportions (i.e., percentages). In this case, we first need to calculate the share of each `origin` within each `model_year`.

To do that, we can use the `div` method to divide the count for each `origin` by the total count of its `model_year` (the row sum). Passing `axis=0` aligns the totals with the rows rather than the columns. The result is a dataframe whose values are the percentage of each `origin` within each `model_year`, so every row adds up to 100:

In [None]:
pivot = pd.pivot_table(
    df.query('model_year >= 75').assign(hp_is_above_100=df['horsepower'] > 100),
    index='model_year',
    columns='origin',
    values='hp_is_above_100',
    aggfunc='sum'
)

# Calculate the percentage of each category within each group
pivot = pivot.div(pivot.sum(axis=1), axis=0) * 100

# Plot
pivot.plot(kind='bar', stacked=True).legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
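The row-wise normalization is easy to verify by hand on a tiny table. The sketch below uses made-up counts (the `counts` frame is hypothetical, not derived from the dataset) and applies the same `div` step as the cell above:

```python
import pandas as pd

# Hypothetical counts: rows are model years, columns are origins
counts = pd.DataFrame(
    {'europe': [1, 0], 'japan': [1, 2], 'usa': [2, 6]},
    index=pd.Index([75, 76], name='model_year'))

# Row totals: 4 cars in year 75, 8 cars in year 76
row_totals = counts.sum(axis=1)

# axis=0 matches each total to its row, so every row is divided by its own sum
shares = counts.div(row_totals, axis=0) * 100

print(shares)
print(shares.sum(axis=1))  # each row should add up to 100
```

Without `axis=0`, `div` would try to align the totals with the column labels instead of the row labels, which is not what we want here.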


Congratulations! You have learned how to use multi-barcharts to visualize and compare multiple numeric variables across different categories. Multi-barcharts are a powerful tool in your data visualization arsenal!