### Pandas: visualizing with Seaborn

We need some data. For this we use a well-known R dataset, diamonds (from the ggplot2 package), which we download as a CSV file from GitHub.

In [None]:
import pandas as pd

df = pd.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/ggplot2/diamonds.csv')
df.head()

We do not need the column 'Unnamed: 0', which holds the old R row names, so we drop it.

In [None]:
df = df.drop(columns=['Unnamed: 0'])
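Instead of dropping the column afterwards, `read_csv` can be told to use it as the index via `index_col=0`. The sketch below assumes the CSV still starts with an unnamed row-name column, as the 'Unnamed: 0' name suggests; to stay offline it reads a tiny inline CSV in that layout (the values are illustrative only).

```python
import io

import pandas as pd

# Tiny inline CSV in the same layout as the downloaded file (illustrative values):
# the first header field is empty because it holds the old R row names.
csv_text = '''"","carat","cut","price"
"1",0.23,"Ideal",326
"2",0.21,"Premium",326
"3",0.23,"Good",327
'''

# Default: the unnamed first column becomes an ordinary column called 'Unnamed: 0'.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 turns that column into the index, so nothing needs dropping later.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))  # ['Unnamed: 0', 'carat', 'cut', 'price']
print(list(clean.columns))       # ['carat', 'cut', 'price']
```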

Another way to get a grip on the DataFrame you are working with is the info() method, which lists every column with its dtype and number of non-null values, plus the memory usage.

In [None]:
df.info()
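`info()` prints its report rather than returning it. A minimal sketch on a small made-up frame (not the diamonds data) that gets the same facts as objects you can work with: dtypes, non-null counts and, via `describe()`, summary statistics of the numeric columns.

```python
import pandas as pd

# Small made-up frame with the same kinds of columns as diamonds.
df_small = pd.DataFrame({
    "carat": [0.23, 0.21, 0.29, None],   # None becomes NaN, so this column is float64
    "cut": ["Ideal", "Premium", "Good", "Good"],
    "price": [326, 326, 334, 335],
})

df_small.info()  # printed report: dtypes, non-null counts, memory usage

dtypes = df_small.dtypes            # one dtype per column
non_null = df_small.notna().sum()   # non-null values per column
summary = df_small.describe()       # count, mean, std, min, quartiles, max of numeric columns
```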

Visualizing or plotting is a field of its own. Broadly, one visualizes either to get a firmer grip on the data one is working with or, once that is done, to get an argument across.

Which library to use depends on which tool you judge best for the job at hand.

Matplotlib and the built-in pandas plotting are often used in the exploratory phase. For top-notch presentation graphics there are several libraries to choose from, such as Bokeh and Seaborn.

In [None]:
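%matplotlib inline
import matplotlib.pyplot as plt

# Scatter plot of carat against depth, with semi-transparent blue markers
fig, ax = plt.subplots()
ax.scatter(x='carat', y='depth', data=df, c='b', alpha=.10)
ax.set_xlabel('carat')
ax.set_ylabel('depth')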

Built-in plotting in pandas, for Series and DataFrames, is a simple wrapper around Matplotlib's plt.plot().

This means we can also draw a scatter plot straight from the DataFrame, here carat against price:

In [None]:
df.plot.scatter(x='carat', y='price')
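Because pandas delegates to Matplotlib, the two styles mix: a pandas plot method accepts an existing Matplotlib `Axes` through `ax=` and returns that `Axes`. The sketch uses a small illustrative DataFrame in place of `df` and selects the non-interactive Agg backend so it also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # script-friendly backend; omit these two lines in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in for the diamonds DataFrame (made-up values).
df_small = pd.DataFrame({"carat": [0.23, 0.5, 1.0, 1.5],
                         "depth": [61.5, 62.0, 60.8, 63.1],
                         "price": [326, 1500, 4500, 9000]})

# Create the figure and two side-by-side Axes with Matplotlib ...
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# ... draw the left panel with Matplotlib directly ...
ax_left.scatter(x="carat", y="depth", data=df_small, c="b", alpha=0.5)
ax_left.set(xlabel="carat", ylabel="depth")

# ... and let pandas draw into the right one. It returns the same Axes,
# so ordinary Matplotlib methods still work on the result.
returned = df_small.plot.scatter(x="carat", y="price", ax=ax_right)
returned.set_title("carat vs price")
```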

What else can you do with the plot() method?

  - plot.area
  - plot.bar
  - plot.barh
  - ...
  - plot.pie

And there are more plotting functions in pandas.plotting (called pandas.tools.plotting in older pandas versions):

  - Scatter matrix
  - Andrews curves
  - Parallel coordinates
  - Lag plot
  - Autocorrelation plot
  - Bootstrap plot
  - RadViz
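In current pandas these are the functions `scatter_matrix`, `andrews_curves`, `parallel_coordinates`, `lag_plot`, `autocorrelation_plot`, `bootstrap_plot` and `radviz` in the `pandas.plotting` module. A sketch of the first one, on synthetic data standing in for three numeric diamonds columns (the numbers are invented):

```python
import matplotlib
matplotlib.use("Agg")  # script-friendly backend; omit these two lines in a notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic stand-in: random carats and a price that grows with carat (invented data).
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 2.0, size=200)
sample = pd.DataFrame({
    "carat": carat,
    "depth": rng.normal(61.7, 1.4, size=200),
    "price": 4000 * carat + rng.normal(0, 500, size=200),
})

# One scatter plot per pair of columns, with a histogram of each column on the diagonal.
axes = scatter_matrix(sample, alpha=0.3, figsize=(6, 6), diagonal="hist")
```

On the real data, `scatter_matrix(df[['carat', 'depth', 'price']], alpha=0.1)` would be the analogous call; with roughly 54,000 rows, a low `alpha` helps against overplotting.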

Seaborn is a visualization library for easy exploration of data held in Series or DataFrames. Just as with Matplotlib, on which Seaborn is built, you specify which data goes on the x and y axes.

In [None]:
import seaborn as sns

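# One bar per cut; bar height = number of diamonds with that cut
sns.countplot(x='cut', data=df)
# Remove the top and right spines of the plot
sns.despine()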
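The bar heights of a count plot are simply row counts per category, so the numbers behind the chart can be checked with `value_counts()`. A sketch on a made-up column:

```python
import pandas as pd

# Made-up stand-in for df['cut'].
cuts = pd.DataFrame({"cut": ["Ideal", "Premium", "Ideal", "Good", "Ideal", "Premium"]})

# Rows per category: the same numbers countplot turns into bar heights.
counts = cuts["cut"].value_counts()
print(counts)
```

Passing `order=df['cut'].value_counts().index` to `sns.countplot` would sort the bars from most to least frequent.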

In [None]:
# Bar height = mean price per cut; error bars show a 95% confidence interval by default
sns.barplot(x='cut', y='price', data=df)
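By default `barplot` aggregates with the mean, so each bar is the average price for that cut, and the figures can be reproduced with `groupby`. Since prices are skewed, the median may be worth comparing; `barplot` accepts a different `estimator`, for example `np.median`. Illustrative numbers only:

```python
import pandas as pd

# Made-up prices; one expensive stone pulls the 'Ideal' mean up.
sample = pd.DataFrame({
    "cut": ["Ideal", "Ideal", "Ideal", "Good", "Good"],
    "price": [300, 600, 3000, 500, 700],
})

# What barplot shows by default: the mean price per cut.
mean_price = sample.groupby("cut")["price"].mean()

# What sns.barplot(..., estimator=np.median) would show instead.
median_price = sample.groupby("cut")["price"].median()
```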

One can create pretty amazing visualizations with Seaborn (but I am not an expert). The documentation is thorough, so you will probably find your way around.

One last example: we plot price against carat, with one panel per color, to investigate whether some colors are more "expensive" than others.

In [None]:
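# One panel per diamond color (four panels per row), each showing price against
# carat with a fitted linear regression line
g = sns.FacetGrid(df, col='color', hue='color', col_wrap=4)
g.map(sns.regplot, 'carat', 'price')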
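To compare the panels by number rather than by eye, one could compute the slope of each fitted line: `regplot` fits an ordinary least-squares line by default, and `np.polyfit` with `deg=1` fits the same kind of line. The data below are synthetic, chosen so the slopes are known in advance; on the real data one would group `df` by `'color'` instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: two colors whose price rises linearly with carat
# at different, known rates (invented numbers).
carat = np.array([0.5, 1.0, 1.5, 2.0])
sample = pd.DataFrame({
    "color": ["D"] * 4 + ["J"] * 4,
    "carat": np.concatenate([carat, carat]),
    "price": np.concatenate([5000 * carat, 3000 * carat]),
})

# Least-squares line per color; polyfit returns [slope, intercept] for deg=1.
slopes = {
    color: np.polyfit(group["carat"], group["price"], deg=1)[0]
    for color, group in sample.groupby("color")
}
print(slopes)
```

A steeper slope means price rises faster with carat for that color, which is one way to make "expensive" concrete.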

Further reading:

  - [Matplotlib](http://matplotlib.org/api/)
  - [Pandas plotting](http://pandas.pydata.org/pandas-docs/stable/visualization.html)
  - [Seaborn](http://seaborn.pydata.org/)