Cleaned up Seaborn tutorial
James A. Bednar committed Feb 24, 2015
1 parent 125234b commit db427a6
Showing 1 changed file with 51 additions and 36 deletions.
87 changes: 51 additions & 36 deletions doc/Tutorials/Pandas_Seaborn.ipynb
@@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:09c17db37d0962cd1fe637e815965c7b9fd3c44a9c3bbadb6cc82efd72785aed"
"signature": "sha256:0e7307ca81daa4d37eb9ae1175763357264aef587982d00fc433573029073063"
},
"nbformat": 3,
"nbformat_minor": 0,
@@ -20,20 +20,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook we'll look at interfacing between the composability and ability to generate complex visualizations that HoloViews provides and the power of ``pandas`` dataframes and the great looking plots incorporated in the ``seaborn`` library.\n",
"In this notebook we'll look at combining the composability and complex-visualization capabilities of HoloViews with the power of [pandas](http://pandas.pydata.org/) dataframes for manipulating tabular data and the great-looking statistical plots and analyses provided by the [Seaborn](http://stanford.edu/~mwaskom/software/seaborn) library.\n",
"\n",
"Additionally we explore how a Pandas dframe can be wrapped in a general purpose Element type, which can either be used to convert the data into standard Element types or be visualized directly using a wide array of plotting options, including:\n",
"We also explore how a pandas ``DataFrame`` can be wrapped in the general-purpose ``DFrame`` ``Element`` type, which can either be used to convert the data into other standard ``Element`` types or be visualized directly using a wide array of Seaborn-based plotting options, including:\n",
"\n",
"* Regression plots, correlation plots, box plots, autocorrelation plots, scatter matrices, histograms or regular scatter or line plots.\n",
"* [regression plots](#Regression)\n",
"* [correlation plots](#Correlation)\n",
"* [box plots](#Box)\n",
"* autocorrelation plots\n",
"* scatter matrices\n",
"* [histograms](#Histogram)\n",
"* scatter or line plots\n",
"\n",
"This tutorial assumes you're already familiar with some of the core concepts of HoloViews, if you're not or just need a quick refresher have a look at our [other Tutorials](http://ioam.github.io/holoviews/Tutorials/index.html)."
"This tutorial assumes you're already familiar with some of the core concepts of HoloViews, which are explained in the [other Tutorials](http://ioam.github.io/holoviews/Tutorials/index.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start with the imports all we'll need is NumPy, Pandas and Seaborn:"
"This tutorial requires NumPy, Pandas, and Seaborn to be installed and imported:"
]
},
{
@@ -59,7 +65,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Having dealt with the imports we begin by loading the HoloViews IPython extension and selecting static and animation backends."
"We can now select static and animation backends:"
]
},
{
@@ -77,14 +83,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualizing Distributions of Data"
"# Visualizing Distributions of Data <a id='Histogram'/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll start by generating a number of Distribution Elements containing normal distribution with different means and standard deviations and overlaying them. Using the opts magic you can specify specific plot and style options as usual, for example we can deactivate the default histogram and shade the kernel density estimate."
"If ``import seaborn`` succeeds, HoloViews will provide a number of additional ``Element`` types, including ``Distribution``, ``Bivariate``, ``TimeSeries``, ``Regression``, and ``DFrame`` (a ``Seaborn``-visualizable version of the ``DFrame`` ``Element`` class provided when only pandas is available).\n",
"\n",
"We'll start by generating a number of ``Distribution`` ``Element``s containing normal distributions with different means and standard deviations and overlaying them. Using the ``%%opts`` magic you can set plot and style options as usual; here we deactivate the default histogram and shade the kernel density estimate:"
]
},
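A kernel density estimate like the shaded one described above can be sketched directly in NumPy. This is a minimal illustration only, assuming a Gaussian kernel with Scott's rule-of-thumb bandwidth; Seaborn's own estimator and bandwidth choice may differ. The helper name ``gaussian_kde_1d`` and the (mean, std) pairs are hypothetical, not taken from the notebook.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: evaluate a Gaussian KDE of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for a 1-D bandwidth (an assumption, not Seaborn's code)
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample: shape (len(grid), n)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Sum the kernels and rescale so the estimate integrates to 1
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
grid = np.linspace(-10, 10, 2001)
densities = {}
for mean, std in [(0.0, 1.0), (2.0, 1.5), (-3.0, 0.5)]:
    samples = rng.normal(mean, std, size=1000)
    densities[(mean, std)] = gaussian_kde_1d(samples, grid)
```

Each entry of ``densities`` is one curve to overlay, roughly analogous to one ``Distribution`` per (mean, std) pair; narrower distributions produce taller, sharper peaks.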
{
@@ -107,7 +115,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Thanks to Seaborn you can choose to plot your distribution using as histograms, kernel density estimates and rug plots."
"Thanks to Seaborn you can choose to plot your distribution as histograms, kernel density estimates, or rug plots:"
]
},
{
@@ -125,7 +133,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also visualize the same data with a Bivariate distribution:"
"We can also visualize the same data with ``Bivariate`` distributions:"
]
},
{
@@ -145,7 +153,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This plot type also has the option of enabling a joint plot with marginal distribution along each axis. Finally, through the kind option you can control whether to visualize the distribution as a ``scatter``, ``reg``, ``resid``, ``kde`` or ``hex`` plot."
"This plot type also has the option of enabling a joint plot with marginal distributions along each axis, and the ``kind`` option lets you control whether to visualize the distribution as a ``scatter``, ``reg``, ``resid``, ``kde`` or ``hex`` plot:"
]
},
{
@@ -170,14 +178,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working with TimeSeries data"
"# Working with ``TimeSeries`` data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next let's take a look at the TimeSeries View type, which allows you to visualize statistical time series data. TimeSeries data can take the form of a number of observations of some dependent variable at multiple timepoints. By controlling the plot and style option the data can be visualized in a number of ways including confidence intervals, error bars, traces or scatter points."
"Next let's take a look at the ``TimeSeries`` View type, which allows you to visualize statistical time-series data. ``TimeSeries`` data can take the form of a number of observations of some dependent variable at multiple timepoints. By controlling the plot and style options, the data can be visualized in a number of ways, including confidence intervals, error bars, traces or scatter points."
]
},
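The observations-by-timepoints layout described above is simply a 2-D array with one row per observation. The sketch below assumes NumPy and uses a hypothetical ``noisy_sine`` helper (the notebook's own ``sine_wave`` function is not shown in this diff). It builds one such array per parameter combination in a plain dictionary keyed by (observation error, random error) pairs, as in the notebook's ``HoloMap``, and computes a simple normal-approximation 95% interval per timepoint; Seaborn's plotted intervals may be computed differently.

```python
import itertools
import numpy as np

def noisy_sine(n_points, obs_error, trial_error, phase=0.0, rng=None):
    """Hypothetical helper: one noisy sine trace (not the notebook's sine_wave)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0, 2 * np.pi, n_points)
    offset = rng.normal(0, trial_error)              # one random shift per trace
    noise = rng.normal(0, obs_error, size=n_points)  # independent noise per timepoint
    return np.sin(t + phase) + offset + noise

rng = np.random.default_rng(42)
stack = {}
for oe, te in itertools.product(np.linspace(0.5, 2, 4), np.linspace(0.5, 2, 4)):
    # Rows are repeated observations, columns are timepoints
    stack[(oe, te)] = np.array([noisy_sine(31, oe, te, rng=rng) for _ in range(20)])

def mean_and_ci(observations, z=1.96):
    """Per-timepoint mean with a normal-approximation confidence interval."""
    mean = observations.mean(axis=0)
    sem = observations.std(axis=0, ddof=1) / np.sqrt(observations.shape[0])
    return mean, mean - z * sem, mean + z * sem

mean, lower, upper = mean_and_ci(stack[(0.5, 0.5)])
```

Plotting ``mean`` with a band between ``lower`` and ``upper`` gives a confidence-interval view of one entry; the dictionary keys play the role of the ``HoloMap`` key dimensions.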
{
@@ -211,15 +219,15 @@
"cell_type": "code",
"collapsed": false,
"input": [
"sine_stack = holoviews.HoloMap(key_dimensions=['Observation error', 'Random error'])\n",
"sine_stack = holoviews.HoloMap(key_dimensions=['Observation error', 'Random error'])\n",
"cos_stack = holoviews.HoloMap(key_dimensions=['Observation error', 'Random error'])\n",
"for oe, te in itertools.product(np.linspace(0.5,2,4), np.linspace(0.5,2,4)):\n",
" sines = np.array([sine_wave(31, oe, te) for _ in range(20)])\n",
" sine_stack[(oe, te)] = TimeSeries(sines, label='Sine', key_dimensions=['Time', 'Observation'],\n",
" group='Activity')\n",
" sine_stack[(oe, te)] = TimeSeries(sines, label='Sine', group='Activity',\n",
" key_dimensions=['Time', 'Observation'])\n",
" cosines = np.array([sine_wave(31, oe, te, phase=np.pi) for _ in range(20)])\n",
" cos_stack[(oe, te)] = TimeSeries(cosines, label='Cosine', key_dimensions=['Time', 'Observation'],\n",
" group='Activity')"
" cos_stack[(oe, te)] = TimeSeries(cosines, label='Cosine', group='Activity',\n",
" key_dimensions=['Time', 'Observation'])"
],
"language": "python",
"metadata": {},
@@ -247,7 +255,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And the cosine stack with error bars."
"And the cosine stack with error bars:"
]
},
{
@@ -265,7 +273,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the opts cell magic has applied the style to each plot individually we can now overlay the two with different visualization styles."
"Since the ``%%opts`` cell magic has applied the style to each object individually, we can now overlay the two with different visualization styles in the same plot:"
]
},
{
@@ -282,7 +290,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's apply the databounds across the HoloMap again and visualize all the observations as unit points."
"Let's apply the databounds across the HoloMap again and visualize all the observations as unit points:"
]
},
{
@@ -300,14 +308,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working with Pandas DataFrames"
"# Working with pandas DataFrames"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to make this a little more interesting let's use some realworld data from the seaborn library. Seaborn provides dates a number of datasets to visualize. The holoviews DFrame object can be used to wrap the pandas dataframes just like this:"
"In order to make this a little more interesting, we can use some of the real-world datasets provided with the Seaborn library. The HoloViews ``DFrame`` object can be used to wrap the pandas dataframes that Seaborn loads, like this:"
]
},
{
@@ -326,7 +334,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"By default the DFrame simply inherits the column names of the data frames and converts them into Dimensions. This works very well as a default but in order to override it you can either supply an explicit list of key_dimensions to the DFrame object or a dimensions dictionary, which maps from the column name to the appropriate Dimension object. In this case we define a Months Dimension, which defines the ordering of months:"
"By default the ``DFrame`` simply inherits the column names of the data frames and converts them into ``Dimension``s. This works very well as a default, but if you wish to override it, you can either supply an explicit list of ``key_dimensions`` to the ``DFrame`` object or a dimensions dictionary, which maps from the column name to the appropriate ``Dimension`` object. In this case, we define a ``Month`` ``Dimension``, which defines the ordering of months:"
]
},
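Outside HoloViews, the same month-ordering idea can be expressed in plain pandas with an ordered ``Categorical``, so that sorting follows the calendar rather than the alphabet. A minimal sketch, assuming pandas is installed; the small table and its values are made up for illustration and are not the notebook's dataset.

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# A small made-up table whose rows are out of calendar order
df = pd.DataFrame({
    "month": ["March", "January", "December", "February"],
    "count": [30, 10, 120, 20],
})

# An ordered categorical makes comparisons and sorting follow MONTHS
df["month"] = pd.Categorical(df["month"], categories=MONTHS, ordered=True)
by_calendar = df.sort_values("month")

# For contrast: sorting the same labels as plain strings is alphabetical
by_alphabet = df.assign(month=df["month"].astype(str)).sort_values("month")
```

Without an explicit ordering, month names sort alphabetically (December before January), which is exactly the problem an explicit ``Dimension`` ordering avoids.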
{
@@ -364,7 +372,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can easily use the conversion methods on the DFrame object to create standard HoloViews Elements, e.g. a TimeSeries and a HeatMap in this case:"
"Now we can easily use the conversion methods on the ``DFrame`` object to create HoloViews ``Element``s, e.g. a Seaborn-based ``TimeSeries`` ``Element`` and a HoloViews standard ``HeatMap``:"
]
},
{
@@ -383,14 +391,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tipping data"
"### Tipping data <a id='Regression'/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A simple regression can easily be visualized using the Regression Element type. However, we'll also split out ``smoker`` and ``sex`` as ``Dimensions``, overlaying the former and laying out the latter. Allowing us to compare how tipping among male and female, smokers and non-smokers compares."
"A simple regression can easily be visualized using the ``Regression`` ``Element`` type. However, here we'll also split out ``smoker`` and ``sex`` as ``Dimensions``, overlaying the former and laying out the latter, so that we can compare tipping between smokers and non-smokers, separately for males and females."
]
},
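Splitting out ``smoker`` and ``sex`` amounts to fitting one regression per (``sex``, ``smoker``) group. A minimal sketch with pandas and NumPy's least-squares ``np.polyfit``, assuming columns named as in Seaborn's tips data (``total_bill``, ``tip``, ``sex``, ``smoker``); the rows are made up so that the fitted slopes are easy to check, and Seaborn's regression plots additionally draw confidence bands that are not computed here.

```python
import numpy as np
import pandas as pd

# Made-up rows: within each group, tip is an exact linear function of total_bill
bills = np.array([10.0, 20.0, 30.0, 40.0])
rows = []
for sex, smoker, rate in [("Male", "Yes", 0.10), ("Male", "No", 0.15),
                          ("Female", "Yes", 0.12), ("Female", "No", 0.18)]:
    for bill in bills:
        rows.append({"total_bill": bill, "tip": rate * bill + 1.0,
                     "sex": sex, "smoker": smoker})
tips = pd.DataFrame(rows)

# One straight-line fit (slope, intercept) per (sex, smoker) combination
fits = {}
for (sex, smoker), group in tips.groupby(["sex", "smoker"]):
    slope, intercept = np.polyfit(group["total_bill"], group["tip"], deg=1)
    fits[(sex, smoker)] = (slope, intercept)
```

In the notebook, ``smoker`` is overlaid and ``sex`` is laid out side by side; here both are simply parts of the dictionary key.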
{
@@ -409,21 +417,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When you're dealing with higher dimensional data you can work with Pandas dataframes directly by displaying the DFrame object directly. This allows you to perform all the standard HoloViews operations on more complex Seaborn and Pandas plot types."
"When you're dealing with higher-dimensional data, you can also work with pandas dataframes directly by displaying the ``DFrame`` ``Element`` itself. This allows you to perform all the standard HoloViews operations on more complex Seaborn and pandas plot types, as explained in the following sections."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iris Data"
"### Iris Data <a id='Box'/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's visualize the relationship between sepal length and width in the Iris flower dataset. Here we can make use of some of the inbuilt Seaborn plot types, a PairGrid which can plot each variable in a dataset against each other variable. We can customize this plot further by passing arguments via the style options, this way we can define what plot types the PairPlot will use and define the dimension that will be used to apply the hue option to. "
"Let's visualize the relationship between sepal length and width in the Iris flower dataset. Here we can make use of some of the built-in Seaborn plot types, such as a ``pairplot``, which plots each variable in a dataset against every other variable. We can customize this plot further by passing arguments via the style options, to define what plot types the ``pairplot`` will use and the dimension to which the hue option will be applied."
]
},
{
@@ -441,7 +449,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When working with a DFrame object directly, you can select particular columns of your DFrame to visualize by supplying ``x`` and ``y`` parameters corresponding to the Dimensions or columns you want visualize. Here we'll visualize the sepal_width and sepal_length by species as a box plot and violin plot respectively."
"When working with a ``DFrame`` object directly, you can select particular columns of your ``DFrame`` to visualize by supplying ``x`` and ``y`` parameters corresponding to the ``Dimension``s or columns you want to visualize. Here we'll visualize ``sepal_width`` and ``sepal_length`` by species as a box plot and a violin plot, respectively."
]
},
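A box plot summarizes each group by its quartiles. The sketch below, assuming pandas and using a tiny made-up stand-in for the Iris ``species`` and ``sepal_width`` columns, computes the numbers such a box is drawn from; Seaborn additionally draws whiskers and outlier points according to its own rules, which are not reproduced here.

```python
import pandas as pd

# Tiny made-up stand-in for two Iris species
iris_like = pd.DataFrame({
    "species": ["setosa"] * 5 + ["virginica"] * 5,
    "sepal_width": [3.0, 3.2, 3.4, 3.6, 3.8, 2.6, 2.8, 3.0, 3.2, 3.4],
})

# 25th, 50th and 75th percentiles per species (pandas' default linear interpolation);
# unstack() turns the quantile level into columns 0.25, 0.5 and 0.75
quartiles = (iris_like.groupby("species")["sepal_width"]
             .quantile([0.25, 0.5, 0.75])
             .unstack())

# Interquartile range: the height of each box
iqr = quartiles[0.75] - quartiles[0.25]
```

Each row of ``quartiles`` gives the bottom, middle line and top of one box.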
{
@@ -459,14 +467,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Titanic passenger data"
"## Titanic passenger data <a id='Correlation'/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a truly large dataset so we can make use of some of the more advanced features of Seaborn and Pandas. Above we saw the usage of a pairgrid, which allows you to quickly compare each variable in your dataset. HoloViews also support Seaborn based [FacetGrids](http://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html#subsetting-data-with-facetgrid). The FacetGrid specification is simply passed via the style options, where the map keyword should be supplied as a tuple of the plotting function to use and the Dimensions to place on the x-axis and y-axis. Additionally you may specify the Dimensions to lay out along the rows and columns of the plot and the hue groups:"
"The Titanic passenger data is a larger dataset, so we can make use of some of the more advanced features of Seaborn and pandas. Above we saw the usage of a ``pairplot``, which allows you to quickly compare each variable in your dataset. HoloViews also supports Seaborn-based [FacetGrids](http://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html#subsetting-data-with-facetgrid). The FacetGrid specification is simply passed via the style options, where the ``map`` keyword should be supplied as a tuple of the plotting function to use and the ``Dimension``s to place on the x axis and y axis. You may also specify the ``Dimension``s to lay out along the rows and columns of the plot (``row`` and ``col``), and the ``hue`` groups:"
]
},
{
@@ -484,7 +492,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"FacetGrids support most seaborn and matplotlib plot types:"
"FacetGrids support most Seaborn and matplotlib plot types:"
]
},
{
@@ -502,7 +510,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally we can summarize our data using a correlation plot and split out Dimensions using the holomap method, which groups by the specified dimension, giving you a frame for each value along that Dimension. In this case for the survived and not survived conditions."
"Finally, we can summarize our data using a correlation plot and split out ``Dimension``s using the ``.holomap`` method, which groups by the specified dimension, giving you a frame for each value along that ``Dimension``. Here we group by the ``survived`` ``Dimension`` (with 1 if the passenger survived and 0 otherwise), which thus provides a widget to allow us to compare those two values."
]
},
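Grouping by a dimension and summarizing each group follows the same pattern as a pandas ``groupby``. As a rough sketch, assuming pandas and NumPy and using made-up numeric columns loosely modelled on the Titanic data (with ``survived`` coded 1/0 as described above), this builds one correlation matrix per value of ``survived``, i.e. one "frame" per key:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
# Made-up data; 'survived' is coded 1 (survived) or 0 (did not survive)
passengers = pd.DataFrame({
    "survived": rng.integers(0, 2, size=n),
    "age": rng.uniform(1, 80, size=n),
    "fare": rng.uniform(5, 500, size=n),
    "pclass": rng.integers(1, 4, size=n),
})

# One correlation matrix per value of 'survived', like one frame per HoloMap key
frames = {
    key: group.drop(columns="survived").corr()
    for key, group in passengers.groupby("survived")
}
```

Because the data are random, the off-diagonal correlations here are near zero; with real data, comparing the two matrices is what the widget in the notebook makes easy.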
{
@@ -515,6 +523,13 @@
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the Seaborn plot types and pandas interface provide substantial additional capabilities to HoloViews, while HoloViews allows simple animation, combinations of plots, and visualization across parameter spaces. Note that the ``DFrame`` ``Element`` is still available even if Seaborn is not installed, but it will use the standard ``HoloViews`` visualizations rather than ``Seaborn`` in that case."
]
}
],
"metadata": {}
