## Introduction to Seaborn ##

We will learn:

a) How to create data visualizations in Python

b) How to use matplotlib to draw line plots, histograms, and scatter plots

c) How to use Seaborn to create statistically meaningful plots

In [None]:
#Remember these will become the standard imports from now on!
import numpy as np
from numpy.random import randn
from pandas import Series,DataFrame
import pandas as pd

# New import for matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

**Line Charts**

A line chart connects (x, y) coordinates with line segments, so it can also be used to plot functions such as sine and cosine.

In [None]:
#Generate x,y co-ordinates 

# Linearly spaced data: start, stop, and total number of points (both endpoints included)
x = np.linspace(0,10,100)

y = np.sin(x)

plt.plot(x,y)
plt.show()
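
Since the text mentions both sine and cosine, here is a minimal sketch that draws the two curves on the same axes and labels them. The labels and title are illustrative choices, not part of the original notebook. In a notebook with `%matplotlib inline` the figure appears when the cell finishes; in a plain script, add `plt.show()` at the end.

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced x values from 0 to 10 (both endpoints included)
x = np.linspace(0, 10, 100)

plt.figure()                           # start a fresh figure
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")
plt.xlabel("x")
plt.ylabel("value")
plt.title("Sine and cosine")
plt.legend()                           # uses the label= text given above
```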

In [None]:
# Let's try it together:
# Change x = np.linspace(0,10,100) to generate 10 points between 500 and 1000.
# Plot the values again and print them
#
# Add X-axis label, Y-axis-label, Title
# Use the following functions:
# plt.xlabel("X") 
# plt.ylabel("Sin(X)") 
# plt.title("sine curve")
# plt.plot(x,y)
# plt.show()

**Scatter Plots**

Scatter plots show how two variables relate to each other and how widely the data points are spread.
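
If the CSV file used in the next cells is not at hand, the idea can be tried on synthetic data first. The sketch below is only an illustration: `x_demo`, `y_demo`, the seed, and the noise level are all made up for this example and do not come from the original data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed so the picture is repeatable

# Made-up data: y_demo depends on x_demo, plus random noise
x_demo = rng.uniform(0, 10, size=200)
y_demo = 2.0 * x_demo + rng.normal(0, 2.0, size=200)

plt.figure()
plt.scatter(x_demo, y_demo, s=10)
plt.xlabel("x_demo")
plt.ylabel("y_demo")
plt.title("Synthetic scatter plot")

# Correlation coefficient as a numeric companion to the picture
r = np.corrcoef(x_demo, y_demo)[0, 1]
print(f"correlation: {r:.2f}")
```

A tight, upward-sloping cloud goes with a correlation close to 1; a shapeless cloud goes with a value near 0.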

In [None]:
#Load the data using pandas
# as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
array = pd.read_csv("scatter_plot.csv").to_numpy()

In [None]:
## Print the array
print(array)

In [None]:
# Let's assign the x and y variables
x = array[:,0]
y = array[:,1]

plt.scatter(x,y)

In [None]:
## You can add title and other labels and then run the cell!!

In [None]:
# plot histogram on x
plt.hist(x)

In [None]:
## plot the histogram on y

In [None]:
# assign bins
plt.hist(x, bins=20)
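
To see what the `bins` argument changes, it can help to look at the numbers behind the bars. This sketch uses `np.histogram`, which computes the same counts and bin edges that `plt.hist` draws, on a made-up sample (`sample` is a hypothetical name, not the notebook's `x`).

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(size=1000)  # made-up data for illustration

# Same data, two different bin counts
counts_10, edges_10 = np.histogram(sample, bins=10)
counts_20, edges_20 = np.histogram(sample, bins=20)

# n bins always need n + 1 edges
print(len(counts_10), len(edges_10))      # 10 11
print(len(counts_20), len(edges_20))      # 20 21

# Every point lands in exactly one bin, so the totals match
print(counts_10.sum(), counts_20.sum())   # 1000 1000
```

More bins show finer detail but each bar holds fewer points, so the outline gets noisier.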

In [None]:
# plot histogram for y with bins=50


## Seaborn ##

b) creates aesthetically pleasing plots by default

c) creates statistically meaningful plots

d) understands pandas DataFrame so the two work well together
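
As a small illustration of the last point, Seaborn plotting functions accept a DataFrame through `data=` and refer to its columns by name. The table below is made up for this sketch, and its column names are hypothetical.

```python
import pandas as pd
import seaborn as sns

# Small made-up table; the column names are hypothetical
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 64, 70, 75],
    "group": ["a", "b", "a", "b", "a", "b"],
})

# Columns are passed by name; hue colors the points by group
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
ax.set_title("Columns referenced by name")
```

The axis labels are taken from the column names, so no separate `xlabel` or `ylabel` call is needed here.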

In [None]:
# New import for seaborn
import seaborn as sns

In [None]:
# Plotting univariate distributions
x = randn(100)
# distplot is deprecated in recent Seaborn releases; histplot is its replacement
sns.histplot(x, kde=True)

In [None]:
# Try changing the kde parameter to False and plot again


In [None]:
# Plotting bivariate distributions

# Scatterplots
tips = sns.load_dataset("tips")
print(tips.head())

In [None]:
# joint plot
sns.jointplot(x="total_bill", y="tip", data=tips)

In [None]:
# Hexbin plots give a more concise picture when many points overlap
# Try setting kind="hex" and plot again


In [None]:
#Visualizing pairwise relationships between all variables in a dataset
sns.pairplot(tips)

In [None]:
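# Build conditional plots: this lets us see what the data looks like when segmented by one or more variables
# factorplot was renamed to catplot; kind="point" keeps the old default look
sns.catplot(x='sex', y='tip', data=tips, kind='point')

In [None]:
# Plot the same kind of categorical plot with the smoker variable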


In [None]:
# Best fitting linear regression line 
sns.lmplot(x="total_bill", y="tip", data=tips)
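
To make "best fitting" concrete: by default the line drawn by `lmplot` is an ordinary least-squares fit. The sketch below computes such a line directly with `np.polyfit` on made-up data. The `bill` and `tip` arrays here are synthetic and are not the `tips` dataset; the 15% slope is an assumption chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up bills and tips: tip is roughly 15% of the bill plus noise
bill = rng.uniform(5, 50, size=300)
tip = 0.15 * bill + rng.normal(0, 0.5, size=300)

# A degree-1 polynomial fit is the least-squares straight line
slope, intercept = np.polyfit(bill, tip, 1)
print(f"tip ~ {slope:.3f} * bill + {intercept:.3f}")
```

The fitted slope should land close to the 0.15 used to generate the data, which is a quick sanity check that the fit recovers the underlying trend.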

In [None]:
# We might also check if smokers tip differently than non-smokers
sns.lmplot(x="total_bill", y="tip", data=tips, col="smoker")

In [None]:
# Try the same plot split by gender: use col="sex"
