Updated 15-Large_Data.ipynb to show by, hide fake points, and support run all
holoviz · May 27, 2020 · 781e70c
1 parent d883dae
Showing 1 changed file with 51 additions and 15 deletions.
diff --git a/examples/user_guide/15-Large_Data.ipynb b/examples/user_guide/15-Large_Data.ipynb
@@ -250,7 +250,7 @@
    "source": [
     "# Multidimensional plots\n",
     "\n",
-    "The above plots show two dimensions of data plotted along *x* and *y*, but Datashader operations can be used with additional dimensions as well.  For instance, an extra dimension (here called `k`), can be treated as a category label and used to colorize the points or lines.  Compared to a standard scatterplot that would suffer from overplotting, here the result will be merged mathematically by Datashader, completely avoiding any overplotting issues except local ones due to spreading:"
+    "The above plots show two dimensions of data plotted along *x* and *y*, but Datashader operations can be used with additional dimensions as well.  For instance, an extra dimension (here called `k`) can be treated as a category label and used to colorize the points or lines, with the data points aggregated separately for each category value. Compared to a standard overlaid scatterplot, which would suffer from overplotting, here the result is merged mathematically by Datashader, avoiding overplotting entirely apart from local effects of spreading when zoomed in:"
    ]
   },
   {
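To make "merged mathematically" concrete, here is a minimal NumPy sketch of one way per-category counts can be combined into a single color per pixel: a weighted average of the category colors, weighted by each category's share of the points in that pixel. This is our reading of what Datashader's categorical shading does, not its actual implementation; the function name `mix_categorical` is hypothetical, and the brightness/alpha scaling from the total count (e.g. `eq_hist`) is deliberately omitted.

```python
import numpy as np

def mix_categorical(counts, colors):
    """Hypothetical sketch: blend per-category counts into one RGB image.

    counts: array of shape (height, width, n_categories)
    colors: array of shape (n_categories, 3), RGB values in 0-255
    Returns a float RGB array (height, width, 3) and a boolean mask of empty pixels.
    """
    counts = np.asarray(counts, dtype=float)
    colors = np.asarray(colors, dtype=float)
    total = counts.sum(axis=-1)  # points per pixel, summed over all categories
    empty = total == 0
    # Each category's share of the pixel; 0/0 in empty pixels gives NaN, replaced by 0
    with np.errstate(invalid="ignore", divide="ignore"):
        weights = counts / total[..., None]
    # Weighted average of the category colors for every pixel
    rgb = np.einsum("hwk,kc->hwc", np.nan_to_num(weights), colors)
    return rgb, empty
```

A pixel containing only red-category points stays pure red, while a pixel with equal counts from a red and a blue category becomes purple; no category is drawn "on top" of another, which is why ordering-dependent overplotting does not occur.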
@@ -263,23 +263,50 @@
     "kdims=['d1','d2']\n",
     "num_ks=8\n",
     "\n",
-    "def rand_gauss2d():\n",
-    "    return 100*np.random.multivariate_normal(np.random.randn(2), random_cov(), (100000,))\n",
-    "\n",
-    "gaussians = {i: hv.Points(rand_gauss2d(), kdims) for i in range(num_ks)}\n",
-    "lines = {i: hv.Curve(time_series(N=10000, S0=200+np.random.rand())) for i in range(num_ks)}\n",
+    "def rand_gauss2d(value=0, n=100000):\n",
+    "    \"\"\"Return a randomly shaped 2D Gaussian distribution with an associated numeric value\"\"\"\n",
+    "    g = 100*np.random.multivariate_normal(np.random.randn(2), random_cov(), (n,))\n",
+    "    return np.hstack((g,value*np.ones((g.shape[0],1))))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "gaussians = {i: hv.Points(rand_gauss2d(i), kdims, \"i\") for i in range(num_ks)}\n",
+    "c = dynspread(datashade(hv.NdOverlay(gaussians, kdims='k'), aggregator=ds.by('k', ds.count())))\n",
+    "m = dynspread(datashade(hv.NdOverlay(gaussians, kdims='k'), aggregator=ds.by('k', ds.mean(\"i\"))))\n",
     "\n",
-    "gaussspread = dynspread(datashade(hv.NdOverlay(gaussians, kdims='k'), aggregator=ds.count_cat('k')))\n",
-    "linespread  = dynspread(datashade(hv.NdOverlay(lines,     kdims='k'), aggregator=ds.count_cat('k')))\n",
+    "(c + m).opts(opts.RGB(width=400))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Above you can see that (as of Datashader 0.11) categorical aggregates can take any reduction function, either `count`ing the data points (left) or reporting some other statistic (e.g. the mean value of a column, right).\n",
     "\n",
-    "(gaussspread + linespread).opts(opts.RGB(width=400))"
+    "Categorical aggregates are one way to keep separate lines or other shapes visually distinct from one another without letting overplotting obscure the data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "lines = {i: hv.Curve(time_series(N=10000, S0=200+np.random.rand())) for i in range(num_ks)}\n",
+    "linespread = dynspread(datashade(hv.NdOverlay(lines, kdims='k'), aggregator=ds.by('k', ds.count())))\n",
+    "linespread.opts(opts.RGB(width=400))"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Because Bokeh only ever sees an image, providing legends and keys has to be done separately, though we are working to make this process more seamless.  For now, you can show a legend by adding a suitable collection of labeled points:"
+    "Note that Bokeh only ever sees the image produced by `hd.datashade`, never the actual data. As a result, legends and keys have to be provided separately, though we are working to make this process more seamless.  For now, you can show a legend by adding a suitable collection of \"fake\" labeled points (size zero and thus invisible):"
    ]
   },
   {
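The difference between `ds.by('k', ds.count())` and `ds.by('k', ds.mean("i"))` can be illustrated with a small NumPy sketch of per-pixel, per-category aggregation: both produce one value per pixel per category, one counting points and the other averaging a column. This only illustrates the semantics as we understand them; Datashader's real reductions are compiled and handle many more cases. The function `by_category` and its parameters are hypothetical, points outside the ranges are dropped, and row 0 is the lowest *y* bin.

```python
import numpy as np

def by_category(x, y, cat, value, n_cats, shape, x_range, y_range):
    """Hypothetical sketch: bin points into a (height, width, n_cats) grid.

    Returns (counts, means): the point count per pixel per category, and the
    mean of `value` per pixel per category (NaN where a category has no points).
    """
    x, y, value = (np.asarray(a, dtype=float) for a in (x, y, value))
    cat = np.asarray(cat, dtype=int)
    # Drop points that fall outside the requested ranges
    inside = ((x >= x_range[0]) & (x <= x_range[1]) &
              (y >= y_range[0]) & (y <= y_range[1]))
    x, y, cat, value = x[inside], y[inside], cat[inside], value[inside]
    h, w = shape
    # Map coordinates to pixel indices; points on the upper edge go in the last bin
    xi = np.clip(((x - x_range[0]) / (x_range[1] - x_range[0]) * w).astype(int), 0, w - 1)
    yi = np.clip(((y - y_range[0]) / (y_range[1] - y_range[0]) * h).astype(int), 0, h - 1)
    counts = np.zeros((h, w, n_cats))
    sums = np.zeros((h, w, n_cats))
    # Unbuffered accumulation, so repeated (pixel, category) indices all add up
    np.add.at(counts, (yi, xi, cat), 1)
    np.add.at(sums, (yi, xi, cat), value)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts  # 0/0 -> NaN for empty (pixel, category) cells
    return counts, means
```

In the notebook cell above, each Gaussian's `i` column is filled with its own category index by `rand_gauss2d(i)`, so the per-category mean of `i` is simply that index wherever the category has points.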
@@ -289,13 +316,13 @@
    "outputs": [],
    "source": [
     "# definition copied here to ensure independent pan/zoom state for each dynamic plot\n",
-    "gaussspread = dynspread(datashade(hv.NdOverlay(gaussians, kdims=['k']), aggregator=ds.count_cat('k')))\n",
+    "gaussspread2 = dynspread(datashade(hv.NdOverlay(gaussians, kdims=['k']), aggregator=ds.by('k', ds.count())))\n",
     "\n",
     "from datashader.colors import Sets1to3 # default datashade() and shade() color cycle\n",
     "color_key = list(enumerate(Sets1to3[0:num_ks]))\n",
-    "color_points = hv.NdOverlay({k: hv.Points([0,0], label=str(k)).opts(color=v) for k, v in color_key})\n",
+    "color_points = hv.NdOverlay({k: hv.Points([0,0], label=str(k)).opts(color=v, size=0) for k, v in color_key})\n",
     "\n",
-    "(color_points * gaussspread).opts(width=600)"
+    "(color_points * gaussspread2).opts(width=600)"
    ]
   },
   {
@@ -329,7 +356,7 @@
     "\n",
     "dates = pd.date_range(start=\"2014-01-01\", end=\"2016-01-01\", freq='1D') # or '1min'\n",
     "curve = hv.Curve((dates, time_series(N=len(dates), sigma = 1)))\n",
-    "datashade(curve, cmap=[\"blue\"]).opts(width=800)"
+    "datashade(curve, cmap=[\"blue\"], width=800).opts(width=800)"
    ]
   },
   {
@@ -350,7 +377,7 @@
     "outliers = rolling_outlier_std(curve, rolling_window=50, sigma=2)\n",
     "\n",
     "ds_curve = datashade(curve, cmap=[\"blue\"])\n",
-    "spread = dynspread(datashade(smoothed, cmap=[\"red\"]),max_px=1) \n",
+    "spread = dynspread(datashade(smoothed, cmap=[\"red\"], width=800),max_px=1) \n",
     "\n",
     "(ds_curve * spread * outliers).opts(\n",
     "    opts.Scatter(line_color=\"black\", fill_color=\"red\", size=10, tools=['hover', 'box_select'], width=800))"
@@ -573,6 +600,15 @@
     "dynspread(datashade(hv.NdLayout(curves,'sign')))"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "hv.output(backend='bokeh')"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},