# Sea-ing the Data!

There are many great tools, like pandas and NumPy, that allow you to easily manipulate data.
But what's often more important is VISUALIZING the data.

Matplotlib provides a great way to create graphs from data, but the syntax is often very cumbersome.

Seaborn is essentially a "high-level" library built on top of matplotlib that lets you make graphs with far less code!

Where does the name "Seaborn" come from? We are open to suggestions...

In [None]:
# We will need data in order to make graphs! We will use pandas
import pandas as pd

# matplotlib is essential whenever we are making graphs!
# Seaborn is simply a shortcut for using matplotlib!
import matplotlib.pyplot as plt

# A Jupyter 'magic' command that displays graphs inside the notebook
%matplotlib inline

# Import Seaborn!
import seaborn as sns

In [None]:
# First we will import our data using pandas
iris = pd.read_csv('./week3/iris.csv')
iris

In [None]:
# To illustrate the beauty of data visualization...
# Let's start with perhaps the most powerful type of plot applicable to this data set: a pair plot

# The POWER of seaborn: one-line code
PairPlot = sns.pairplot(data=iris, hue='Species')
# Making graphs has never been easier!

# Unpacking this...
# pairplot() is the function that - you guessed it - makes the pairplot
# 'data' is ... where the data comes from (our pandas data frame)
# 'hue' colors the dots based on values in the designated ('Species') column in the data frame

# What on Earth is a pair plot?
# A scatter plot compares two values (e.g. length and width)
# A pair plot simply creates a scatter plot for every pair of numeric columns
# You can see which values are being compared by looking at the axis labels!
# The diagonal plots, however, show the distribution of a single value (a 'univariate' distribution)

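# Why did we name the plot?
# A notebook prints the value of the last expression in a cell
# Without a name, that value is the plot object's text representation (its type and memory address), which shows up above the graph
# Assigning the result to a name (or ending the line with a semicolon) keeps the output clean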
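
To make the "every pair" idea concrete, here is a minimal sketch that lists the panels a pair plot would draw. It uses a tiny hand-made table instead of the real iris.csv: the numbers and species labels are made up for illustration, and 'Sepal length' is an assumed column name (this notebook only confirms 'Petal length', 'Petal width', and 'Species').

```python
from itertools import product

import pandas as pd

# Tiny made-up stand-in for the iris data frame (values are illustrative only).
# 'Sepal length' is an assumed column name; the others appear in this notebook.
toy = pd.DataFrame({
    'Sepal length': [5.1, 7.0, 6.3],
    'Petal length': [1.4, 4.7, 6.0],
    'Petal width': [0.2, 1.4, 2.5],
    'Species': ['setosa', 'versicolor', 'virginica'],
})

# Only numeric columns get a row and a column in the grid;
# the text column 'Species' is used for color (hue) instead.
numeric_cols = toy.select_dtypes(include='number').columns.tolist()

# One panel per (row variable, column variable) combination.
panels = list(product(numeric_cols, repeat=2))

for y_var, x_var in panels:
    # Panels on the diagonal pair a column with itself,
    # so they show a single-variable distribution instead of a scatter plot.
    kind = 'distribution' if x_var == y_var else 'scatter'
    print(f'{y_var} vs {x_var}: {kind}')
```

With three numeric columns this gives a 3 x 3 grid, so the number of panels grows quickly as columns are added.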

In [None]:
# Now we know how striking Seaborn and data visualization can be
# Let's explore the relationship between petal length and petal width using a single scatter plot

ScatterPlot = sns.scatterplot(x='Petal length', y='Petal width', data=iris, hue='Species')

# Unpacking this...
# scatterplot() is the function that - you guessed it - makes the scatterplot
# 'x' is ... the column plotted on the x axis
# 'y' is ... the column plotted on the y axis

# Can you find this graph in the pair plot?

In [None]:
# We see that there may be some kind of linear relationship between these variables!
# We can use an lmplot (linear model plot) to add a regression line to the data

LmPlot = sns.lmplot(x='Petal length', y='Petal width', data=iris)

# If you add hue='Species', the data will be separated by species and you can view the best-fit line for each one

# Seaborn draws the best-fit line but does not report its equation
# Feel free to look up how to compute it using another library like NumPy or SciPy
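
One possible way to get the equation is a straight-line fit with NumPy. This is only a sketch: the arrays below are made-up numbers chosen to lie exactly on a line, not values from iris.csv. With the real data you would presumably pass `iris['Petal length']` and `iris['Petal width']` instead, and the result should be close to the line lmplot draws, since both are least-squares fits.

```python
import numpy as np

# Made-up measurements that lie exactly on one straight line,
# so the fitted slope and intercept are easy to sanity-check.
petal_length = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
petal_width = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

# A degree-1 polynomial fit is an ordinary least-squares straight line.
# np.polyfit returns the coefficients from the highest power to the lowest.
slope, intercept = np.polyfit(petal_length, petal_width, deg=1)

print(f'Petal width = {slope:.2f} * Petal length + {intercept:.2f}')
```

To get one equation per species, the same call could be repeated on each group of rows.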

In [None]:
# Let's try something else now...
# Which flowers have the longest petals?
# You could create something like a dot plot or a box plot, but let's try something else

ViolinPlot = sns.violinplot(x='Species', y='Petal length', data=iris, inner='stick')

# Unpacking this...
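# inner='stick' draws a line (a 'stick') for each data point
# Notice that the plots get wider in areas with more sticks!
# The width shows the estimated density of the data at that value

# A violin plot is similar to a box-and-whiskers plot, but it shows the shape of the whole distribution instead of only a few summary values (and is much cooler)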
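
If you also want numbers to go with the picture, a grouped summary is one option. The sketch below uses a small made-up table (two species labels and six invented petal lengths), so the output says nothing about the real iris data. It only shows the pattern, which should carry over to `iris` if the column names match.

```python
import pandas as pd

# Made-up petal lengths for two species (illustrative values only).
toy = pd.DataFrame({
    'Species': ['setosa', 'setosa', 'setosa',
                'virginica', 'virginica', 'virginica'],
    'Petal length': [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

# For each species, compute a few of the summary values
# that a box plot would draw.
summary = toy.groupby('Species')['Petal length'].agg(['min', 'median', 'max'])
print(summary)

# The species whose median petal length is largest.
longest = summary['median'].idxmax()
print(f'Longest petals (by median): {longest}')
```

The violin plot carries the same information visually, plus the shape of the distribution between those summary values.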

As you can see, seaborn provides an extremely easy way to make graphs.

Check out https://seaborn.pydata.org/examples/index.html for more ideas!