Merge pull request #1391 from ioam/operations_tutorial
Added Operations tutorial
jlstevens authored May 5, 2017
2 parents f3af5c8 + a9d83dc commit 64ca484
Showing 1 changed file with 299 additions and 0 deletions.
299 changes: 299 additions & 0 deletions doc/Tutorials/Operations.ipynb
@@ -0,0 +1,299 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import param\n",
"import numpy as np\n",
"import holoviews as hv\n",
"hv.notebook_extension('bokeh', 'matplotlib')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"HoloViews objects provide a convenient way of wrapping your data along with some metadata for exploration and visualization. In addition to the Elements and containers, HoloViews also provides so-called ``Operations``, which can transform objects in custom ways, allowing the user to build data processing pipelines. Examples of such operations are ``histogram``, ``rolling``, ``datashade`` and ``decimate``, which apply some computation to certain types of Element and return a new Element with the transformed data.\n",
"\n",
"In this tutorial we will discover how operations work, how to control their parameters and how to chain them. The [Dynamic_Operations](Dynamic_Operations.ipynb) tutorial extends what we learn here to demonstrate how operations can be applied lazily by using the ``dynamic`` flag, letting us define deferred processing pipelines that can drive highly complex visualizations and dashboards.\n",
"\n",
"\n",
"## Inspecting operations\n",
"\n",
"The most common and useful kind of operation is the ``ElementOperation`` class, which transforms a single Element or an Overlay of Elements and returns a new, transformed Element. All operations are so-called ``ParameterizedFunction`` objects, which means their parameters are declared with the ``param`` library, providing validation for every parameter that controls how the operation is applied. This also means we can change the parameters of an operation at the class level, at the instance level and when we call the operation. Let's start by looking at the ``histogram`` operation. As with all other HoloViews objects, we can inspect the ``histogram`` operation with the ``hv.help`` function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from holoviews.operation import histogram\n",
"hv.help(histogram)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Applying operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above we can see a listing of all the parameters of the operation, with their defaults, expected types and detailed docstrings. The ``histogram`` operation can be applied to any Element and by default generates a histogram of the first value dimension of the object it is applied to. As a simple example, we can create a ``BoxWhisker`` Element containing samples from a normal distribution and then apply the ``histogram`` operation to those samples in two ways: 1) by creating an instance on which we change ``num_bins``, and 2) by passing ``bin_range`` directly in the call to the operation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"boxw = hv.BoxWhisker(np.random.randn(10000))\n",
"hist_instance = histogram.instance(num_bins=50)\n",
"\n",
"boxw + hist_instance(boxw).relabel('num_bins=50') + histogram(boxw, bin_range=(0, 3)).relabel('bin_range=(0, 3)')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that these two ways of using operations give us convenient control over how the parameters are applied. An instance allows us to persist some defaults, while passing keyword arguments to the operation applies the parameters to just that particular call.\n",
"\n",
"As the name implies, ``ElementOperation``s are applied to individual Elements. This means that even when you apply an operation to a container holding multiple Elements of one type, such as an ``NdLayout``, ``GridSpace`` or ``HoloMap``, the operation is applied per Element, effectively mapping it over every Element in the container. As a simple example, we can define a ``HoloMap`` of ``BoxWhisker`` Elements, varying the width of the distribution via the ``Sigma`` value, and then apply the ``histogram`` operation to it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"holomap = hv.HoloMap({(i*0.1+0.1): hv.BoxWhisker(np.random.randn(10000)*(i*0.1+0.1)) for i in range(5)},\n",
" kdims=['Sigma'])\n",
"holomap + histogram(holomap)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the operation has generated a ``Histogram`` for each value of ``Sigma`` in the ``HoloMap``. In this way we can apply an operation across the entire parameter space defined by a ``HoloMap``, ``GridSpace`` or ``NdLayout``."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Combining operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since operations take a HoloViews object as input and return another HoloViews object, we can easily chain and combine multiple operations to perform complex analyses while instantly visualizing the output.\n",
"\n",
"In this example we'll work with a timeseries, so we'll define a small function to generate a random, noisy timeseries:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from holoviews.operation import timeseries\n",
"\n",
"def time_series(T=1, N=100, mu=0.1, sigma=0.1, S0=20):\n",
"    \"\"\"Parameterized noisy time series\"\"\"\n",
"    dt = float(T)/N\n",
"    t = np.linspace(0, T, N)\n",
"    W = np.random.standard_normal(size=N)\n",
"    W = np.cumsum(W)*np.sqrt(dt)  # standard Brownian motion\n",
"    X = (mu-0.5*sigma**2)*t + sigma*W\n",
"    S = S0*np.exp(X)  # geometric Brownian motion\n",
"    return S\n",
"\n",
"curve = hv.Curve(time_series(N=1000))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will start applying some operations to this data. HoloViews ships with two ready-to-use timeseries operations: the ``rolling`` operation, which applies a function over a rolling window, and the ``rolling_outlier_std`` operation, which detects outliers as the points lying more than ``sigma`` standard deviations away from the rolling mean:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%opts Scatter [width=600] (color='black')\n",
"smoothed = curve * timeseries.rolling(curve) * timeseries.rolling_outlier_std(curve)\n",
"smoothed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Defining custom operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now define our own custom ``ElementOperation``. Recall that operations accept both Elements and Overlays. This means we can define a simple operation that takes our Overlay of the original and smoothed Curve Elements and subtracts the second from the first. This subtraction gives us the residual between the unsmoothed and smoothed Curves, removing long-term trends and leaving the short-term variation.\n",
"\n",
"Defining an operation is very simple. An ``ElementOperation`` subclass should define a ``_process`` method, which accepts an ``element`` and an optional (and deprecated) ``key`` argument. We can also declare parameters on the operation, which are accessible via its ``self.p`` attribute. In this case we declare a ``String`` parameter, which will become the name of the value dimension holding the difference on the returned Element."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from holoviews.operation import ElementOperation\n",
"\n",
"class residual(ElementOperation):\n",
"    \"\"\"\n",
"    Subtracts the second Curve in an Overlay from the first.\n",
"    \"\"\"\n",
"\n",
"    label = param.String(default='Residual', doc=\"\"\"\n",
"        Defines the name of the value dimension of the returned Element.\"\"\")\n",
"\n",
"    def _process(self, element, key=None):\n",
"        # Get the first and second Element in the Overlay;\n",
"        # any further layers (such as outlier points) are ignored\n",
"        el1, el2 = element.get(0), element.get(1)\n",
"\n",
"        # Get the x-values and y-values of the curves\n",
"        xvals = el1.dimension_values(0)\n",
"        yvals = el1.dimension_values(1)\n",
"        yvals2 = el2.dimension_values(1)\n",
"\n",
"        # Return a new Element with the subtracted y-values\n",
"        # and the new value dimension\n",
"        return el1.clone((xvals, yvals-yvals2),\n",
"                         vdims=[self.p.label])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Having defined the ``residual`` operation, let's try it out right away by applying it to the Overlay of our original and smoothed ``Curve``s. We'll lay out the Overlay and the residual in a single column so that they line up along the x-axis and we can compare them directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%opts Curve [width=600] Overlay [xaxis=None]\n",
"(smoothed + residual(smoothed)).cols(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this view we can immediately see that only a very small residual is left when applying this level of smoothing. However, we have only tried one particular ``rolling_window`` value, the default of ``10``. To assess how this parameter affects the residual, we can evaluate the operation for a number of different values, as we do in the next section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluating operation parameters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When applying an operation there are often various parameters to vary. Using traditional plotting methods, it's often difficult to evaluate them interactively to get a detailed understanding of what they do. Here we will apply the ``rolling`` operation with varying ``rolling_window`` widths and ``window_type``s:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rolled = hv.HoloMap({(w, str(wt)): timeseries.rolling(curve, rolling_window=w, window_type=wt)\n",
" for w in [10, 25, 50, 100, 200] for wt in [None, 'hamming', 'triang']},\n",
" kdims=['Window', 'Window Type'])\n",
"rolled"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This view is already useful, since we can compare the various parameter values. Because operations can be chained, we can also easily compute the residual and view the two together. To do so, we overlay the ``HoloMap`` of smoothed curves on the original curve and pass the result to our new ``residual`` operation. We then lay out the smoothed curves overlaid on the original above their residuals, to see how the smoothing and the residual vary with the operation parameters:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%opts Curve [width=600] Overlay [legend_position='top_left']\n",
"(curve(style=dict(color='black')) * rolled + residual(curve * rolled)).cols(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With just a few more lines, we have evaluated the operation over combinations of different parameters, which not only processes the data but also gives us a better understanding of both the operation's parameters and the underlying data.\n",
"\n",
"## Benefits of using operations\n",
"\n",
"Now that we have seen some operations in action, we can appreciate what makes them useful. When working with data interactively, you often end up doing a lot of data wrangling, which provides maximum flexibility but is neither reproducible nor maintainable. Operations allow you to encapsulate analysis code behind a well-defined interface that is well suited to building complex analysis pipelines:\n",
"\n",
"1. Their parameters are well defined, because they are declared on the class. These parameters also validate the types and ranges of the values passed to them.\n",
"\n",
"2. Both the inputs and outputs of operations are visualizable, because the data **is** the visualization. This means you're not constantly context switching between data processing and visualization: you essentially get the visualization for free.\n",
"\n",
"3. Operations understand HoloViews data structures and can be applied to many Elements at once, allowing you to evaluate the operation over combinations of parameter values. This flexibility makes it easier to assess what the parameters of the operation are actually doing and how they shape your data.\n",
"\n",
"4. As we will discover in the [Dynamic Operations Tutorial](Dynamic_Operations.ipynb), operations can be applied lazily to build up complex deferred data-processing pipelines, which can aid your data exploration and drive your interactive visualizations and dashboards."
]
}
],
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
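The class-level, instance-level and call-time parameter handling described in this tutorial, and the per-Element mapping over a ``HoloMap``, can be illustrated without HoloViews. The following is a minimal pure-Python sketch under stated assumptions: ``ToyHistogram`` and ``map_over`` are hypothetical names, not part of ``param`` or HoloViews, and the code only mimics the precedence rules (call-time keywords override instance values, which override class defaults), not the real ``histogram`` implementation. A plain dict stands in for a ``HoloMap`` keyed by ``Sigma``.

```python
import random


class ToyHistogram:
    """Hypothetical stand-in for a parameterized operation; not HoloViews' histogram."""

    # Class-level defaults, analogous to parameters declared on an operation class
    num_bins = 20
    bin_range = None
    _params = ("num_bins", "bin_range")

    @classmethod
    def instance(cls, **params):
        """Return a new object whose values override the class-level defaults."""
        obj = cls()
        for name, value in params.items():
            obj._check(name)
            setattr(obj, name, value)
        return obj

    def _check(self, name):
        # Reject undeclared names, loosely mimicking parameter validation
        if name not in self._params:
            raise TypeError(f"unknown parameter {name!r}")

    def __call__(self, samples, **params):
        """Count samples per bin; call-time keywords override instance and class values."""
        for name in params:
            self._check(name)
        num_bins = params.get("num_bins", self.num_bins)
        bin_range = params.get("bin_range", self.bin_range)
        lo, hi = bin_range if bin_range is not None else (min(samples), max(samples))
        width = (hi - lo) / num_bins
        counts = [0] * num_bins
        for x in samples:
            if lo <= x <= hi:
                # Samples exactly on the upper edge fall into the last bin
                counts[min(int((x - lo) / width), num_bins - 1)] += 1
        return counts


def map_over(operation, container, **params):
    """Apply `operation` to every element of a dict, keeping the keys (like a HoloMap)."""
    return {key: operation(element, **params) for key, element in container.items()}


rng = random.Random(0)
samples = [rng.gauss(0, 1) for _ in range(1000)]

toy_histogram = ToyHistogram()                      # uses the class defaults
fifty_bins = ToyHistogram.instance(num_bins=50)     # persistent, per-instance override
default_counts = toy_histogram(samples)
clipped_counts = toy_histogram(samples, bin_range=(0, 3))  # one-off, per-call override

# A dict keyed by sigma stands in for a HoloMap with a 'Sigma' key dimension
by_sigma = {s: [rng.gauss(0, s) for _ in range(500)] for s in (0.1, 0.2, 0.3)}
hists = map_over(fifty_bins, by_sigma)
```

Note that ``ToyHistogram.instance(num_bins=50)`` leaves the class default untouched, while ``bin_range=(0, 3)`` affects only the single call it is passed to.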
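The smoothing, outlier detection and residual steps of this tutorial can also be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the code behind ``timeseries.rolling``, ``timeseries.rolling_outlier_std`` or the ``residual`` operation: the hypothetical helpers ``rolling_mean``, ``rolling_outliers`` and ``residual_values`` use a roughly centred window that is truncated at the ends of the series, and flag a point when it lies more than ``sigma`` standard deviations (of the values in its window) from the window mean. HoloViews' own windowing and edge handling may differ.

```python
import statistics


def _window(values, i, width):
    """Slice of `values` in a window of `width` points roughly centred on index i.

    Near the ends of the series the window is truncated rather than padded.
    """
    start = max(0, i - width // 2)
    stop = min(len(values), i - width // 2 + width)
    return values[start:stop]


def rolling_mean(values, width=10):
    """Smooth a series with a roughly centred moving average."""
    return [statistics.mean(_window(values, i, width)) for i in range(len(values))]


def rolling_outliers(values, width=10, sigma=2.0):
    """Return (index, value) pairs lying more than `sigma` window standard
    deviations away from the mean of their window."""
    outliers = []
    for i, y in enumerate(values):
        window = _window(values, i, width)
        if len(window) < 2:
            continue  # a sample standard deviation needs at least two points
        mean, std = statistics.mean(window), statistics.stdev(window)
        if std > 0 and abs(y - mean) > sigma * std:
            outliers.append((i, y))
    return outliers


def residual_values(original, smoothed):
    """Subtract the smoothed series from the original, point by point."""
    if len(original) != len(smoothed):
        raise ValueError("series must have the same length")
    return [a - b for a, b in zip(original, smoothed)]


# A flat series with a single spike at index 10
series = [1.0] * 20
series[10] = 10.0

smoothed = rolling_mean(series, width=10)
spikes = rolling_outliers(series, width=10, sigma=2.0)
residual = residual_values(series, smoothed)
```

Each helper takes a sequence and returns a new one, so they chain the same way operations do; for example, ``residual_values(series, rolling_mean(series))`` is the residual of the default smoothing.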
