# Lab - Data Visualizations with Seaborn & Pandas

## Introduction

In this lab, you will apply your knowledge of the advanced visualization library Seaborn to generate plots that provide insight.

## Objectives

You will be able to:
    
- Create a boxplot using Seaborn
- Label plots with appropriate axis labels and titles
- Create data visualizations with Pandas

## Part I: Seaborn

We will use a randomly generated dataset to practice using Seaborn. Begin by running the code below without changes.

In [None]:
# CodeGrade step0
# Run this cell without changes

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# The seed must be 42 for the data to replicate
seed = 42
np.random.seed(seed)

# Data
data = np.random.normal(size=(20, 10)) + np.arange(10) / 2

### Step 1

Create a boxplot of data and store the returned object in the variable boxplot1.

In [None]:
# CodeGrade step1
# Replace None with your code

boxplot1 = None

### Step 2
Repeat Step 1 to create another boxplot, but now also call the boxplot object's set() method to set the title and axis labels as follows:
* X axis should be labeled 'X Label'
* Y axis should be labeled 'Y Label'
* Title should be 'Example Boxplot'
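
As a minimal sketch of how set() works, the example below draws a boxplot from a small hypothetical array (named `toy` here, not the lab's data) and labels it with placeholder text. It assumes that `sns.boxplot` returns a matplotlib Axes, whose set() method accepts several properties as keyword arguments in one call.

```python
import numpy as np
import seaborn as sns

# Hypothetical toy data: 30 rows, 4 columns (not the lab's `data` array)
rng = np.random.default_rng(0)
toy = rng.normal(size=(30, 4))

# sns.boxplot returns a matplotlib Axes; one box is drawn per column
ax = sns.boxplot(data=toy)

# Axes.set takes properties as keyword arguments and applies them together
ax.set(xlabel="Column", ylabel="Value", title="Toy Boxplot")
```

The labels can be read back with `ax.get_xlabel()`, `ax.get_ylabel()`, and `ax.get_title()` to confirm they were applied.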

In [None]:
# CodeGrade step2
# Replace None with your code

boxplot2 = None
boxplot2.set(None)

### Step 3

Repeat Step 2, this time also using Seaborn to set the style to 'darkgrid'. Keep the title and axis labels.
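
To illustrate what a style change does, the sketch below switches between two of Seaborn's built-in styles and inspects one of the parameters that differs. It uses 'whitegrid' rather than the style requested above, and the variable names are only illustrative.

```python
import seaborn as sns

# Start from a style without grid lines and record the grid setting
sns.set_style("white")
grid_before = sns.axes_style()["axes.grid"]

# Switch to a grid style; this updates matplotlib's global rcParams,
# so it affects every plot created afterwards
sns.set_style("whitegrid")
grid_after = sns.axes_style()["axes.grid"]
```

Because the style is global, it should be set before the plot is created.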

In [None]:
# CodeGrade step3
# Replace None with your code

# Set style
None

# Plot
boxplot3 = None
boxplot3.set(None)

### Step 4

Recreate the labeled boxplot from Step 3 with the following changes:
* Using Seaborn's context setting, adjust the size of the text and plot elements so that the plot is more legible in presentations and on large screens
* Use 'poster' from Seaborn's preconfigured options
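
The sketch below shows one way to see the effect of a context, by comparing the base font size of two preconfigured contexts. It uses 'talk' rather than 'poster', and the variable names are illustrative only.

```python
import seaborn as sns

# plotting_context(name) returns the parameters for a context
# without applying them
notebook_font = sns.plotting_context("notebook")["font.size"]
talk_font = sns.plotting_context("talk")["font.size"]

# set_context applies the parameters globally for subsequent plots
sns.set_context("talk")
current_font = sns.plotting_context()["font.size"]
```

Larger contexts scale fonts and line widths up together, so labels stay proportionate to the plot elements.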



In [None]:
# CodeGrade step4
# Replace None with your code

# Context
None

# Plot
boxplot4 = None
boxplot4.set(None)

### Step 5

You are now going to take a look at the canonical Seaborn Penguins dataset. This dataset contains biometric measurements and categorical information regarding three species of penguins.

In [None]:
# CodeGrade step0
# Run this cell without changes

penguins = pd.read_csv("penguins.csv")
penguins.head()

Now use Seaborn to create a histogram of the 'body_mass_g' column with the following parameters:
*   Context is set to 'talk'
*   Color of the bars should distinguish male and female from each other
*   The labels should be:
    - X Axis: 'Body Mass (g)'
    - Y Axis: 'Number of Penguins'
    - Title: 'Penguin Mass Distribution by Sex'

In [None]:
# CodeGrade step5
# Replace None with your code

# Context
None

# Plot
histplot1 = None
histplot1.set(None)


### Step 6

Create a scatter plot of the bill length (horizontal axis) vs. the bill depth (vertical axis) with the following parameters:

* The context should be set to 'paper'
* The color of the points should represent the sex
* The shape/style of the points should represent the species
* The labels should be:
    - X Axis: 'Bill Length (mm)'
    - Y Axis: 'Bill Depth (mm)'
    - Title: 'Scatterplot of Bill Sizes of Penguins'

In [None]:
sns.set_context('paper')
scatterplot1 = sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="sex", style="species")
scatterplot1.set(xlabel = 'Bill Length (mm)', ylabel='Bill Depth (mm)', title='Scatterplot of Bill Sizes of Penguins')

In [None]:
# CodeGrade step6
# Replace None with your code

# Context
None

# Plot
scatterplot1 = None
scatterplot1.set(None)

## Part II: Pandas

### Visualizing High Dimensional Data

You are now going to take a look at the canonical iris dataset. This classic multivariate dataset includes the sepal length, sepal width, petal length, and petal width for 150 samples, 50 from each of three species of the iris flower.

In [None]:
# CodeGrade step0
# Run this cell without changes

iris = pd.read_csv("iris.csv")
iris.head()

### Step 7

A primary question you might ask of this data is whether average measurements differ across species. A simple bar chart can help answer this. To build one, you must first aggregate the data. You are interested in seeing how much the mean sepal width differs across the three species. Label the chart as follows:
* X Axis should be labeled as 'Species'
* Y Axis should be labeled as 'Mean Sepal Width'
* Title should be labeled as 'Distribution of Sepal Width'

In [None]:
# CodeGrade step7
# Replace None with your code

# Create your grouped-by means; this should be a pandas Series
species_mean_sepal_width = iris.groupby('species')['sepal_width'].mean()

# Plot from the species_mean_sepal_width Series
barplot1 = species_mean_sepal_width.plot(kind='bar', title='Distribution of Sepal Width', xlabel='Species', ylabel='Mean Sepal Width')

### Step 8

Utilize pandas plotting (.plot) to create a histogram plot to show the distribution of the 'sepal_length' column separated and grouped by 'species'. You should end up with a chart showing three histograms, one on top of the other. You can accomplish this with one .plot() call.
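
One possible approach, sketched below on made-up data, is to reshape the data so that each group becomes its own column and then draw one histogram per column with `subplots=True`. The DataFrame `toy` and its columns `length` and `kind` are hypothetical, and this is only one of several ways to get stacked histograms from a single .plot() call.

```python
import pandas as pd

# Hypothetical toy data, not the iris dataset
toy = pd.DataFrame({
    "length": [1.0, 1.2, 1.1, 2.0, 2.3, 2.1, 3.0, 3.4, 3.2],
    "kind": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
})

# Reshape to wide form: one column per group, NaN where a row belongs
# to a different group
wide = toy.pivot(columns="kind", values="length")

# subplots=True draws each column on its own Axes, stacked vertically;
# the call returns an array holding one Axes per column
axes = wide.plot(kind="hist", subplots=True)
```

With `subplots=True` the returned object is an array of Axes rather than a single Axes, so titles and labels are set on each element individually.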

In [None]:
# CodeGrade step8
# Replace None with your code

# Plot
histplot2 = None