# <b>boxplot</b>

Boxplots in <b><font color="blue" style="font-family:'Courier New'">fivecentplots</font></b> are modeled after the "Variability Chart" in JMP.  Data can be split into multiple subsets for easier visualization by listing the DataFrame column names of interest in the <font style="font-family:'Courier New'">groups</font> keyword.

## Setup

### Imports

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import sys
# Optional: point to a local checkout of fivecentplots (machine-specific path)
sys.path = [r'C:\GitHub\fivecentplots'] + sys.path
import fivecentplots as fcp
import pandas as pd
import numpy as np
import os, pdb
osjoin = os.path.join
st = pdb.set_trace
fcp

### Sample data

Read some fake boxplot data:

In [None]:
df = pd.read_csv(osjoin(os.path.dirname(fcp.__file__), 'tests', 'fake_data_box.csv'))
df.head()

### Set theme

Optionally set the design theme:

In [None]:
#fcp.set_theme('gray')
#fcp.set_theme('white')

### Other

In [None]:
SHOW = False

## Groups

Consider the following boxplot of made-up data:

In [None]:
fcp.boxplot(df=df, y='Value', show=SHOW)

### Single group

Rather than lumping the data into a single box, we can separate them into categories to get more information.  First, set a single group column of "Batch":

In [None]:
fcp.boxplot(df=df, y='Value', groups='Batch', show=SHOW)

### Multiple groups

We can dive deeper by specifying more than one value for <font style="font-family:'Courier New'">groups</font>:

In [None]:
fcp.boxplot(df=df, y='Value', groups=['Batch', 'Sample'], show=SHOW)
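
To make the effect of `groups` concrete, the sketch below uses plain pandas on a small made-up DataFrame (hypothetical values, not the contents of `fake_data_box.csv`) and counts the subsets that would each get their own box. It illustrates the grouping idea only; it does not assume anything about how fivecentplots partitions the data internally.

```python
import pandas as pd

# Hypothetical stand-in data; only the column names match the sample file above
toy = pd.DataFrame({
    'Batch':  [101, 101, 101, 101, 102, 102, 102, 102],
    'Sample': [1, 1, 2, 2, 1, 1, 2, 2],
    'Value':  [1.0, 1.2, 2.1, 1.9, 3.0, 3.3, 4.2, 3.8],
})

# One group column -> one subset (box) per unique Batch
single = toy.groupby('Batch')['Value'].size()

# Two group columns -> one subset (box) per unique (Batch, Sample) pair
multi = toy.groupby(['Batch', 'Sample'])['Value'].size()

print(len(single), 'boxes for groups="Batch"')
print(len(multi), 'boxes for groups=["Batch", "Sample"]')
```

Each additional column in `groups` multiplies the number of boxes by the number of unique values it contributes, so deep groupings are easiest to read on wider axes.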

### Groups + legend

Boxplots also support legending for another level of visualization:

In [None]:
fcp.boxplot(df=df, y='Value', groups=['Batch', 'Sample'], legend='Region', show=SHOW)

## Grid plots

Like the <font style="font-family:'Courier New'">plot</font> function, boxplots can be broken into subplots based on "row" and/or "col" values or "wrap" values.

### Row plot

In [None]:
fcp.boxplot(df=df, y='Value', groups=['Batch', 'Sample'], row='Region', show=SHOW, ax_size=[300, 300])

### Wrap plot

In [None]:
fcp.boxplot(df=df, y='Value', groups=['Sample', 'Region'], wrap='Batch', show=SHOW, ax_size=[300, 300])
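
Grid plots split the data at two levels: an outer split into subplots (by the `row`, `col`, or `wrap` column) and an inner split into boxes (by `groups`). The pandas sketch below mimics that two-level partition on hypothetical data. The region labels `'A'` and `'B'` are made up for illustration, and the code is not taken from the fivecentplots implementation.

```python
import pandas as pd

# Hypothetical stand-in data for illustration only
toy = pd.DataFrame({
    'Region': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Batch':  [101, 101, 102, 102, 101, 101, 102, 102],
    'Value':  [1.0, 1.2, 2.1, 1.9, 3.0, 3.3, 4.2, 3.8],
})

# Outer split: one subplot per unique value of the row/col/wrap column
panels = {region: sub for region, sub in toy.groupby('Region')}

# Inner split: within each subplot, one box per unique value of the groups column
boxes_per_panel = {region: sub['Batch'].nunique() for region, sub in panels.items()}
print(boxes_per_panel)
```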

## Other options

### Stat line

In addition to displaying boxes with a median line and interquartile ranges, a connecting line can be drawn between boxes at some statistical value.  By default, the line connects the mean value of each distribution, but other DataFrame statistics (such as the median or standard deviation) can be selected.  The stat line accepts the typical styling keywords of any line object with the prefix `box_stat_line_` (e.g., `box_stat_line_color` or `box_stat_line_width`).
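
As a rough guide to what the stat line traces, the sketch below computes one value per group with pandas, using hypothetical data. It assumes the `box_stat_line` names correspond to the pandas aggregations of the same name (`mean`, `median`, `std`), which matches the examples in this section but is not confirmed against the fivecentplots source.

```python
import pandas as pd

# Hypothetical stand-in data for illustration only
toy = pd.DataFrame({
    'Batch': [101, 101, 101, 102, 102, 102],
    'Value': [1.0, 2.0, 6.0, 2.0, 4.0, 6.0],
})

# One statistic per group; a stat line would connect one column of this table
stats = toy.groupby('Batch')['Value'].agg(['mean', 'median', 'std'])
print(stats)
```

For a skewed group such as batch 101 here, the mean and median differ, so the choice of statistic visibly changes where the line passes through each box.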

#### Mean

In [None]:
fcp.boxplot(df=df, y='Value', groups=['Batch', 'Sample'], show=SHOW, box_stat_line='mean', ax_size=[300, 300])

#### Median

In [None]:
fcp.boxplot(df=df, y='Value', groups=['Batch', 'Sample'], show=SHOW, box_stat_line='median', ax_size=[300, 300])

#### Std dev

In [None]:
fcp.boxplot(df=df, y='Value', groups=['Batch', 'Sample'], show=SHOW, box_stat_line='std', ax_size=[300, 300])

### Dividers

Using the keyword `box_divider`, lines can be drawn on the boxplot to visually segregate main groups of boxes.  These lines are enabled by default but can be turned off easily:

In [None]:
fcp.boxplot(df=df, y='Value', groups=['Batch', 'Sample'], show=SHOW, box_divider=False, ax_size=[300, 300])

### Range lines

Because outlier points by definition fall outside the span of the box, lines can be drawn that cover the entire range of the data.  This is particularly useful for indicating that some data points fall outside the limits of the y-axis.  These lines are enabled by default but can be disabled, or styled through keywords with the prefix `box_range_lines_`:

In [None]:
fcp.boxplot(df=df, y='Value', groups=['Batch', 'Sample'], show=SHOW, box_range_lines=False, ax_size=[300, 300])
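
The difference between the box span and the full data range can be checked numerically. The sketch below uses hypothetical values and the common 1.5 × IQR fence convention for flagging outliers. That convention is an assumption made here for illustration; this notebook does not state which rule fivecentplots applies.

```python
import pandas as pd

# Hypothetical values with one obvious outlier
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0])

# Box edges: first and third quartiles
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1

# Assumed outlier fences (1.5 x IQR convention)
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

# A range line would span the full extent of the data, outliers included
data_range = (values.min(), values.max())
print('outliers:', list(outliers))
print('range:', data_range)
```

If the y-axis were limited to, say, 0 to 10, the point at 30 would be clipped from view, and the range line is what would signal that data exist beyond the visible limits.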