# 06 - Box Plots

Let's start this lesson as we always do: by importing our libraries and data sets.

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv('../data/fuel-econ.csv')
df.shape

In [None]:
df.head(5)

## Introduction to Box Plots

You saw how violin plots can be used to depict the relationship between a quantitative variable and a qualitative variable. An alternative plot that you might want to use is the box plot.

Both Matplotlib and Seaborn provide a `boxplot` function.

In [None]:
# Types of sedan cars
sedan_classes = ['Minicompact Cars', 'Subcompact Cars', 'Compact Cars', 'Midsize Cars', 'Large Cars']

# Build an ordered categorical dtype whose categories follow the order given in sedan_classes
# Refer - https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.api.types.CategoricalDtype.html
vclasses = pd.CategoricalDtype(ordered=True, categories=sedan_classes)

# Use Series.astype() to convert the "VClass" column from a plain object type into an ordered categorical type
df['VClass'] = df['VClass'].astype(vclasses)
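To see what the ordered categorical type changes, here is a minimal sketch on a toy Series. The names `toy`, `order`, and `toy_cat` are hypothetical and the values are made up for illustration; they are not read from `fuel-econ.csv`.

```python
import pandas as pd

# Hypothetical toy column of vehicle classes (illustrative values only)
toy = pd.Series(['Large Cars', 'Compact Cars', 'Minicompact Cars', 'Compact Cars'])

# The order we want, from smallest to largest class
order = ['Minicompact Cars', 'Compact Cars', 'Large Cars']
toy_cat = toy.astype(pd.CategoricalDtype(categories=order, ordered=True))

# Plain strings sort alphabetically; the ordered categorical sorts by category order
alphabetical = toy.sort_values().tolist()
by_category = toy_cat.sort_values().tolist()

print(alphabetical)
print(by_category)
print(toy_cat.min(), '<', toy_cat.max())
```

Seaborn uses the same category order when it lays out the x-axis, which is why the classes in the plot appear from smallest to largest rather than alphabetically.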

In [None]:
ax1 = sns.boxplot(data=df, x='VClass', y='comb', color='tab:blue')
plt.xticks(rotation=15);
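For comparison, the sketch below draws a similar plot with Matplotlib's own `boxplot`, which takes one array per box rather than a DataFrame with column names. The data here are randomly generated stand-ins (the `groups` dictionary and its means are assumptions for illustration), not the fuel economy data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical stand-in data: one array of values per category
groups = {
    'Compact Cars': rng.normal(27, 4, 200),
    'Midsize Cars': rng.normal(25, 4, 200),
    'Large Cars': rng.normal(22, 4, 200),
}

fig, ax = plt.subplots()

# Axes.boxplot returns a dict of the drawn artists (boxes, medians, whiskers, ...)
artists = ax.boxplot(list(groups.values()))

# Boxes are placed at x = 1, 2, 3, ... by default, so label those positions
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups.keys()), rotation=15)
ax.set_xlabel('VClass')
ax.set_ylabel('comb');
```

Seaborn's version is usually more convenient when the data already live in a tidy DataFrame, since it handles the grouping and axis labels for you.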

In a box plot, descriptive statistics are computed within each category and drawn as a box with whiskers. The three horizontal lines of each box mark the quartiles: the bottom edge is the 25th percentile, the middle line is the 50th percentile (the median), and the top edge is the 75th percentile. Whiskers extend from the top and bottom of each box to the largest and smallest values that are not plotted as outliers.

Box plots often show outliers as individual points beyond the ends of the whiskers. The most common upper bound on the length of a whisker is 1.5 times the interquartile range (IQR), that is, 1.5 times the height of the box. This is also the default in both Seaborn and Matplotlib.
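The 1.5 × IQR rule can be worked through by hand. The sketch below uses a small made-up sample (the `values` array is hypothetical, not taken from `fuel-econ.csv`) and NumPy's default linear-interpolation percentiles; plotting libraries may compute quartiles slightly differently on small samples.

```python
import numpy as np

# Hypothetical sample of combined fuel efficiencies (mpg), for illustration only
values = np.array([18.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 28.0, 45.0])

# The three lines of the box: 25th, 50th, and 75th percentiles
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# A whisker may reach at most 1.5 * IQR beyond the box
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Each whisker ends at the most extreme data point inside those limits
inside = values[(values >= lower_limit) & (values <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the limits is drawn as an individual outlier point
outliers = values[(values < lower_limit) | (values > upper_limit)]

print(f'Q1={q1}, median={median}, Q3={q3}, IQR={iqr}')
print(f'whiskers: {whisker_low} to {whisker_high}')
print(f'outliers: {outliers}')
```

Note that the whiskers stop at actual data points (18 and 28 here), not at the computed limits themselves, so the two whiskers of a box are often different lengths.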