# V03: Getting Started with Seaborn

Before we start you will all need to install Seaborn! You can do this by opening up the command line and typing:

    conda install seaborn
    
This will start the automatic process of downloading and installing Seaborn.

We've already seen how we can create some basic charts in Matplotlib. This is a really useful way of quickly and easily eyeballing your data on the fly, but as we've seen, it takes a lot of work to make the charts more presentable. We might also want to make more detailed and complex statistical charts, and that's where Seaborn comes in.

Like the plotting functions in pandas, Seaborn is built on top of Matplotlib, so this is a dependency you have to bear in mind when installing Seaborn. Seaborn's main strength is its ability to produce attractive and useful statistical visualisations with very little effort on the user's part. We'll start by importing all the libraries we need.

In [None]:
import pandas as pd
import numpy as np
from numpy.random import randn
import matplotlib.pyplot as plt 
import seaborn as sns # Standard convention for Seaborn
%matplotlib inline

In this example we're going to be using NumPy arrays for convenience, but Seaborn integrates very well with pandas and will accept both Series and DataFrame columns as arguments.

In [None]:
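data1 = randn(1000)      # 1,000 draws from a standard normal distribution
data2 = randn(1000) + 2  # the same, shifted to a mean of 2
data3 = randn(1000) + 4  # shifted to a mean of 4
data4 = randn(1000) + 6  # shifted to a mean of 6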

Now let's have a look at some of the charts that Seaborn can generate to analyse the distribution of data. Most of these can be created with just a couple of lines of code and are pre-formatted to look much better than the basic plots produced by Matplotlib.

We won't be looking at how to fully customise charts created in Seaborn. However, as Seaborn is built on top of Matplotlib, we can customise charts in exactly the same way as we learned in the previous two lessons. Additionally, the <a href = "https://stanford.edu/~mwaskom/software/seaborn/api.html">Seaborn API reference</a> is excellent and makes this a simple process.
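
As a rough sketch of that Matplotlib-style customisation, the cell below draws a Seaborn chart onto an Axes object that we create ourselves and then adjusts it with ordinary Matplotlib methods. The sample data, title and label text are made up for illustration, and the boxplot itself is covered later in this lesson.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)            # fixed seed, for illustration only
sample = rng.standard_normal(1000)        # hypothetical sample data

fig, ax = plt.subplots(figsize=(8, 3))    # a Matplotlib figure and axes
sns.boxplot(x=sample, ax=ax)              # ask Seaborn to draw onto our axes

# From here on, this is plain Matplotlib
ax.set_title("Distribution of sample")
ax.set_xlabel("Value")
```

Passing `ax=` keeps control of the figure in our hands, which is handy when a Seaborn chart has to sit alongside other Matplotlib plots.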

## JointPlot

A Seaborn jointplot shows the relationship between two variables as a scatter plot, with a histogram of each variable along the margins:

In [None]:
chart1 = sns.jointplot(x=data1, y=data2)  # recent Seaborn versions require x and y as keywords

A jointplot with hexagonal bins:

In [None]:
sns.jointplot(x=data1, y=data2, kind='hex')

## Kernel Density Estimation (KDE)

A good way to find clusters in data is to use a density smoothing function, in this case <a href = "http://www.mvstat.net/tduong/research/seminars/seminar-2001-05/">Kernel Density Estimation (KDE)</a>. A KDE plot works by placing a small density function, referred to as a 'kernel', on each data point and then summing these to estimate the density in each region. The result is a kind of 'heatmap' of where the data is concentrated, although as we shall see below, Seaborn can produce traditional heatmaps as well.

In [None]:
chart = sns.kdeplot(x=data1, y=data2, fill=True)  # fill replaces the older shade argument
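
To make the 'kernel on each data point' idea concrete, here is a minimal one-dimensional sketch in plain NumPy. It is not how Seaborn computes its estimate: the function name `gaussian_kde_1d`, the choice of a Gaussian kernel and the hand-picked bandwidth of 0.3 are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian kernel per sample, evaluated at each grid point."""
    # Scaled distance from every grid point (rows) to every sample (columns)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # One Gaussian bump per sample, each integrating to 1
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Summing the bumps and dividing by the sample count gives the density
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)                 # fixed seed, for illustration only
samples = rng.standard_normal(1000)
grid = np.linspace(-5, 5, 201)
density = gaussian_kde_1d(samples, grid, bandwidth=0.3)

# Sanity checks: the curve should integrate to roughly 1 and peak near 0
area = density.sum() * (grid[1] - grid[0])     # simple Riemann sum
peak = grid[np.argmax(density)]
print(area, peak)
```

A larger bandwidth gives a smoother curve and a smaller one follows the individual points more closely. The two-dimensional plot above applies the same idea across both axes.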

Combining a jointplot with KDE:

In [None]:
sns.jointplot(x=data1, y=data2, kind='kde')

## Box and Violin Plots

<a href = "http://www.bbc.co.uk/schools/gcsebitesize/maths/statistics/representingdata3hirev6.shtml">Boxplots</a> visualise the shape and variability of data. The box spans the <a href = "http://stattrek.com/statistics/dictionary.aspx?definition=Interquartile%20range">interquartile range</a>, and the position of the median within it shows whether the data is centred or 'skewed' to either the left or the right.

In [None]:
sns.boxplot(x=data1)
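
The numbers behind a boxplot can be worked out by hand. The sketch below uses NumPy on a seeded random sample (an assumption for the example) and applies the common 1.5 × IQR rule for the whiskers, which matches Seaborn's default `whis=1.5`.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, for illustration only
sample = rng.standard_normal(1000)

# The box runs from the first quartile to the third, with a line at the median
q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1                           # interquartile range: the height of the box

# Points beyond 1.5 * IQR from the box are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

print(q1, median, q3, iqr, len(outliers))
```

If the median sits close to the middle of the box, the data is roughly symmetric; a median pushed towards one edge suggests skew.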

<a href = "https://en.wikipedia.org/wiki/Violin_plot">Violinplots</a> are a cross between boxplots and Kernel Density Estimation. Typically, violin plots will include a marker for the median of the data and a box indicating the interquartile range, as in standard box plots, with a KDE overlaid.

In [None]:
sns.violinplot(x=data2)

You can also draw grouped box and violin plots, with one plot per category. This is best illustrated with a pandas DataFrame, so we're going to load the tips dataset, one of Seaborn's example datasets. Note that `load_dataset` downloads the data, so it needs an internet connection.

In [None]:
tips = sns.load_dataset("tips")
tips.head(5)

In [None]:
sns.violinplot(x="sex", y="total_bill", data=tips)

In [None]:
sns.violinplot(x="day", y="total_bill", data=tips)

## Heatmaps

Seaborn's other example datasets let us explore some of the other types of charts, including heatmaps.

In [None]:
flight_dframe = sns.load_dataset('flights')  # Importing the dataset
flight_dframe = flight_dframe.pivot(index="month", columns="year", values="passengers")  # Pivoting the dataset
sns.heatmap(flight_dframe)  # Creating the heatmap
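
The pivot step is what turns a long table (one row per observation) into the grid that a heatmap needs. Here is a small sketch of that reshaping on a hand-written table; the four rows are illustrative values rather than the real flights data.

```python
import pandas as pd

# Hypothetical long-format table: one row per (year, month) observation
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Rows become months, columns become years, cells hold the passenger counts
wide_df = long_df.pivot(index="month", columns="year", values="passengers")
print(wide_df)
```

Each cell of `wide_df` then maps to one coloured square in the heatmap.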

## Regression & Pair Plots

Seaborn can also produce some pretty powerful regression and pair plots. We'll look at those in more depth in the section on Regression and Machine Learning!