## Why tidy data is useful for exploratory analysis

- Visualizing individuals, distributions, or aggregations of numerical measures
- Splitting by categorical variables:
    - separating subsets spatially along an axis,
    - distinguishing by color,
    - or making separate plots in columns or rows
    
*The `tips` dataset works well for exploring how numerical values and their distributions differ across a population described by several categorical variables.*

In [None]:
import seaborn as sns
sns.set_style("whitegrid")

tips = sns.load_dataset("tips")

tips.head(10)
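
For comparison, here is a minimal sketch of what "tidy" means in practice. The small `wide` table and its values are invented for illustration (they are not taken from `tips`). `melt` reshapes it so that each row is one observation and each column one variable, which is the layout `tips` already has.

```python
import pandas as pd

# Hypothetical wide-form table: one column per day (values are invented).
wide = pd.DataFrame({
    "server": ["A", "B"],
    "Thur": [10.0, 12.0],
    "Fri": [20.0, 18.0],
})

# Reshape to tidy form: one row per (server, day) observation.
tidy = wide.melt(id_vars="server", var_name="day", value_name="total_bill")

print(tidy)
```

In the tidy version, `day` is an ordinary categorical column, so it can be passed as `x=`, `hue=`, or `col=` to a plotting call.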

### Individual variables & distributions

The most basic form of exploration is to visualize the distribution of values in a numerical column. A histogram is the classic choice, but there are some nice alternatives for smaller datasets.

#### Swarm plot 

One interesting alternative is a `swarmplot()`. Points are stacked at their data value rather than overlapping. This doesn't scale very well to huge datasets, but with small data it's nice to see each individual point as a mark.

In [None]:
ax = sns.swarmplot(y="total_bill", data=tips)

### Splitting by a categorical variable

Now we can start seeing the power of splitting / subsetting the data (in space and/or color) by the values of a categorical variable. Here we split in space by "day".

*(It's not clear why, but Seaborn's default is to also vary the color for each day, so I'm forcing it to all one color.)*

Colors can be specified through
[RGBA values](https://matplotlib.org/users/colors.html), or
[names](https://python-graph-gallery.com/100-calling-a-color-with-seaborn/), or they will be sequentially chosen from the default or specified
[color palette](https://seaborn.pydata.org/tutorial/color_palettes.html).

In [None]:
ax = sns.swarmplot(x="day", y="total_bill", color='grey', data=tips)
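
The ways of specifying a color can be compared directly. This is a small sketch, assuming `matplotlib.colors.to_rgba` and `sns.color_palette` behave as they do in recent releases. It only inspects color values and draws nothing.

```python
import matplotlib.colors as mcolors
import seaborn as sns

# A named color resolves to an RGBA tuple of floats in [0, 1].
grey_rgba = mcolors.to_rgba("grey")

# The same color written out explicitly as RGBA (128/255 per channel).
explicit_rgba = (128 / 255, 128 / 255, 128 / 255, 1.0)

# The current palette is a list of RGB tuples; when colors are not
# given explicitly, they are taken from this list in order.
palette = sns.color_palette()

print(grey_rgba)
print(palette[0])
```

Either form of grey can be passed as `color=` in the call above.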

#### Splitting by space and hue

We can also split by hue at the same time, either mixed together:

In [None]:
ax = sns.swarmplot(x="day", y="total_bill", hue="sex", data=tips)

or in separate strips with `dodge=True`:

In [None]:
ax = sns.swarmplot(x="day", y="total_bill", hue="sex", dodge=True, data=tips)

#### More complex splits

Then with other plot types we can split and aggregate (here calculating means and confidence intervals) in even more complex ways.

**If the data weren't tidy, we wouldn't have this flexibility to split or aggregate numerical variables by categorical ones!**

In [None]:
# catplot is figure-level: it returns a FacetGrid, not a matplotlib Axes
g = sns.catplot(x="day", y="total_bill", hue="sex",
                kind="point", col="smoker", dodge=True,
                data=tips)
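
The aggregation behind the point plot can be approximated by hand. This is a rough sketch on an invented tidy table (the `bills` frame and its values are not from `tips`). Grouping by the same categorical columns and taking the mean gives the kind of point estimates a point plot draws by default. The confidence intervals, which seaborn estimates by bootstrapping, are not reproduced here.

```python
import pandas as pd

# Invented tidy data: one row per bill, one column per variable.
bills = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri"],
    "sex": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "total_bill": [10.0, 14.0, 20.0, 12.0, 18.0, 30.0],
})

# Split by two categorical variables, then aggregate the numerical one.
means = bills.groupby(["day", "sex"])["total_bill"].mean()

print(means)
```

Because the categorical variables are ordinary columns, changing the split only requires changing the list passed to `groupby`.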