# Box Plotting
## Learning Goal

After completing this tutorial, the interns should be able to:

- Create, read, and interpret a box plot

To explore this, we'll use the `iris` dataset:

| Variable    | Type    | Description           |
|:-------------|:---------|:-----------------------|
| SepalLength | Ratio   | the sepal length (cm) |
| SepalWidth  | Ratio   | the sepal width (cm)  |
| PetalLength | Ratio   | the petal length (cm) |
| PetalWidth  | Ratio   | the petal width (cm)  |
| Species     | Nominal | the flower species    |

<div style="text-align:center;font-size: smaller">
 <b>Source:</b> This dataset was taken from the <a href="https://archive.ics.uci.edu/ml/datasets/iris">UCI Machine Learning Repository</a>.</div>
<br>

We can calculate central tendency and spread using `pandas` dataframes.
Let's start by importing `pandas`:

- `import pandas as pd`

In [None]:
import pandas as pd


Next, import `plotly`, a Python library that produces interactive plots:

- `import plotly.express as px`



In [None]:
import plotly.express as px


Load the CSV file `"datasets/flower-data-2020.csv"` into a dataframe:

In [None]:
dataframe = pd.read_csv('datasets/flower-data-2020.csv')

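As a minimal sketch of the central tendency and spread calculations mentioned earlier, the cell below builds a tiny dataframe in place of the CSV file. The values are invented for illustration, and the column names are assumed to match the table above; the real file may differ.

```python
import pandas as pd

# Hypothetical sample rows standing in for datasets/flower-data-2020.csv
sample = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "Species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Central tendency: mean and median of one numeric column
mean_length = sample["SepalLength"].mean()
median_length = sample["SepalLength"].median()

# Spread: sample standard deviation and the full range (max minus min)
std_length = sample["SepalLength"].std()
value_range = sample["SepalLength"].max() - sample["SepalLength"].min()

print(mean_length, median_length, std_length, value_range)
```

The same method calls should work on `dataframe` once the CSV is loaded, provided the column exists under that name.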

## Parts of a Box Plot 

**Median**
The median (second quartile) marks the midpoint of the data and is shown by the line that divides the box into two parts. Half the scores are greater than or equal to this value and half are less than or equal to it.

**Inter-quartile range**
The box spans the lower quartile to the upper quartile. This span is referred to as the inter-quartile range, and the middle 50% of scores for the group fall within it.

**Upper quartile**
Seventy-five percent of the scores fall below the upper quartile.

**Lower quartile**
Twenty-five percent of scores fall below the lower quartile.

**Whiskers**
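The upper and lower whiskers represent scores outside the middle 50%. By a common convention, each whisker extends to the most extreme score that lies within 1.5 times the inter-quartile range of the box, and scores beyond the whiskers are drawn as individual outlier points. Whiskers often (but not always) stretch over a wider range of scores than the middle quartile groups.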
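The parts described above can be computed by hand. The sketch below uses Python's standard `statistics` module on invented scores and applies the 1.5 × inter-quartile range convention for the whiskers. Plotting libraries may calculate quartiles slightly differently, so the numbers on a chart can differ a little from these.

```python
import statistics

# Invented scores for illustration only
scores = [4.3, 4.9, 5.0, 5.1, 5.4, 5.8, 6.0, 6.3, 6.7, 7.9, 9.9]

# Lower quartile, median, and upper quartile.
# method="inclusive" interpolates linearly between the sorted scores.
lower_q, median, upper_q = statistics.quantiles(scores, n=4, method="inclusive")

# Inter-quartile range: the height of the box
iqr = upper_q - lower_q

# Fences under the 1.5 * IQR convention
lower_fence = lower_q - 1.5 * iqr
upper_fence = upper_q + 1.5 * iqr

# Whiskers end at the most extreme scores that are still inside the fences
inside = [s for s in scores if lower_fence <= s <= upper_fence]
lower_whisker, upper_whisker = min(inside), max(inside)

# Scores outside the fences would be drawn as individual outlier points
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print(lower_whisker, lower_q, median, upper_q, upper_whisker, outliers)
```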



![image.png](attachment:image.png)
[from https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51]

You can draw a box plot, which visualizes the five-number summary (minimum, lower quartile, median, upper quartile, and maximum), as follows:

- Get a `with px do box using` block
- Inside that block put a `create list with` block, and inside that block put
    - `dataframe` (from VARIABLES)
    - a freestyle block with `y = "SepalLength"` in it
    - a freestyle block with `x = "Species"` in it

In [None]:
px.box(dataframe, y = "SepalLength", x = "Species")
