Add Sankey Element #2328

philippjfr · 2018-02-10T13:07:42Z

As requested in #1123, this PR implements a Sankey element based on the implementation in d3-sankey. The code consists of three components, which closely mirror the Chord element:

The Sankey element type based on Graph
An Operation, called by the Sankey constructor, which lays out the nodes of the diagram and computes an abstract graph representation
The plotting classes

%%opts Sankey [label_position='left' show_values=False]
edges = pd.read_csv('energy.csv')
Sankey(edges).redim(value='TWh')

The plots look pretty much identical across backends. Here is what a simpler example looks like:

%%output backend='matplotlib' fig='svg'
Sankey([
         ('Wage', 'Pension', 200),
         ('Wage', 'Social Security', 200),
         ('Wage', 'Net Income', 3500),
         ('Net Income', 'Rent', 1000),
         ('Net Income', 'Utilities', 200),
         ('Utilities', 'Electricity', 39),
         ('Utilities', 'Internet', 50),
         ('Utilities', 'Gas', 89),
         ('Net Income', 'Investment', 500),
         ('Wage', 'Tax', 1000),
         ('Net Income', 'Leisure', 300),
         ('Net Income', 'Food', 400),
         ('Leisure', 'Cinema', 20),
         ('Leisure', 'Drinks', 50)])

Suggestions for reference examples welcome.

Add tests
Add reference gallery examples
Add gallery demos

philippjfr · 2018-04-05T02:19:09Z

Ready to review, going to add unit tests in the morning.

philippjfr · 2018-04-05T02:20:53Z

Here's the reference guide example I used to demonstrate the label_index option together with integer node indices:

nodes = ["PhD", "Career Outside Science",  "Early Career Researcher",
         "Permanent Research Staff",  "Professor",  "Non-Academic Research"]
nodes = hv.Dataset(enumerate(nodes), 'index', 'label')
edges = [
    (0, 1, 53), (0, 2, 47), (2, 5, 17), (2, 3, 30), (3, 1, 22.5), (3, 5, 4.), (3, 4, 0.45)   
]

value_dim = hv.Dimension('Percentage', unit='%')
hv.Sankey((edges, nodes), ['From', 'To'], vdims=value_dim).options(
    label_index='label', label_position='left', width=800, height=400, edge_color_index='To')

jlstevens · 2018-04-07T22:17:09Z

holoviews/core/data/__init__.py

@@ -173,7 +173,7 @@ class Dataset(Element):

    # In the 1D case the interfaces should not automatically add x-values
    # to supplied data
-    _auto_indexable_1d = False
+    _auto_indexable_1d = True


Is this correct? The auto indexing behavior caused problems fixed in a recently merged PR...

jlstevens · 2018-04-07T22:18:54Z

holoviews/element/sankey.py

+                                 'which matches the node ids on the edges.')
+            self._nodes = nodes
+            chord = layout_sankey(self)
+            self._nodes = chord.nodes


Shouldn't be chord

jlstevens · 2018-04-07T22:25:10Z

holoviews/element/sankey.py

+            kdims = element.node_type.kdims
+        nodes = element.node_type(node_data, kdims=kdims, vdims=element.nodes.vdims)
+        edges = element.edge_type(paths)
+        sankey = Sankey((element.data, nodes, edges), compute=False)


It is rather odd how the Sankey element uses this operation in its constructor which then returns a Sankey object! It would be better to have a method just computing the information needed by Sankey and using that in _process.
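A rough sketch of the shape I have in mind, in plain Python. All names here (`compute_layout`, `SankeySketch`) are hypothetical and not HoloViews API, and the node-size rule is only a stand-in for the real layout: a helper returns plain data and the constructor consumes it, so no second element has to be built.

```python
# Hypothetical sketch; none of these names are HoloViews API.

def compute_layout(edges):
    """Return plain layout data (node totals) for (source, target, value) edges."""
    inflow, outflow = {}, {}
    for source, target, value in edges:
        outflow[source] = outflow.get(source, 0) + value
        inflow[target] = inflow.get(target, 0) + value
    nodes = sorted(set(inflow) | set(outflow))
    # Stand-in rule: a node is as large as the bigger of its inflow and outflow.
    return {n: max(inflow.get(n, 0), outflow.get(n, 0)) for n in nodes}


class SankeySketch:
    def __init__(self, edges):
        self.edges = list(edges)
        # The constructor works with plain data; nothing returns another element.
        self.node_values = compute_layout(self.edges)


sankey = SankeySketch([('Wage', 'Net Income', 3500), ('Wage', 'Tax', 1000),
                       ('Net Income', 'Rent', 1000)])
```

The helper can then be tested on its own, without constructing an element at all.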

jlstevens · 2018-04-07T22:26:56Z

holoviews/element/sankey.py

+        _, y0, _, y1 = self.p.bounds
+        py = self.p.node_padding
+
+        def initializeNodeBreadth():


It might be nice to avoid these kinds of inline functions - explicit arguments are nicer than closures...
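To illustrate the difference, here is a small sketch. The function names and the scaling formula are made up for the example and are not taken from the PR; only the `bounds` and padding inputs echo the diff above.

```python
# Illustrative only; names and formula are assumptions, not the PR's code.

def scale_with_closure(columns, bounds, padding):
    _, y0, _, y1 = bounds

    def column_scale(column):
        # Silently reads y0, y1 and padding from the enclosing scope.
        return (y1 - y0 - (len(column) - 1) * padding) / sum(column)

    return min(column_scale(c) for c in columns)


def column_scale(column, y0, y1, padding):
    # Same computation, but every input is visible in the signature.
    return (y1 - y0 - (len(column) - 1) * padding) / sum(column)


def scale_with_arguments(columns, bounds, padding):
    _, y0, _, y1 = bounds
    return min(column_scale(c, y0, y1, padding) for c in columns)


columns = [[4500], [3500, 1000]]
bounds = (0, 0, 1000, 500)
```

Both versions give the same result, but the second one can be called and tested in isolation.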

jlstevens · 2018-04-07T22:30:57Z

holoviews/element/sankey.py

+from .util import quadratic_bezier
+
+
+class layout_sankey(Operation):


Another idea would be to move this code to the element class itself, e.g. as a number of classmethods used by the constructor to compute the layout. I suppose the parameters of this operation could then be plot parameters. What other use does this operation have, other than to create Sankey elements, which you can already do using Sankey?

I mention this as it is a little confusing to see such a long operation in elements.

jlstevens · 2018-04-07T22:34:37Z

holoviews/element/stats.py

@@ -18,6 +18,9 @@ class StatisticsElement(Chart):

    __abstract = True

+    # Ensure Interface does not add an index
+    _auto_indexable_1d = False
+


Is this needed?

jlstevens · 2018-04-07T22:40:55Z

holoviews/plotting/bokeh/sankey.py

+
+    def get_extents(self, element, ranges):
+        """
+        A Chord plot is always drawn on a unit circle.


Not a Chord plot!

jlstevens · 2018-04-07T22:43:12Z

holoviews/plotting/bokeh/sankey.py

+
+    _style_groups = dict(GraphPlot._style_groups, quad='nodes', text='label')
+
+    _draw_order = ['patches', 'multi_line', 'quad', 'text']


Is 'multi_line' used?

jlstevens · 2018-04-07T22:44:02Z

holoviews/plotting/bokeh/sankey.py

+
+    def _patch_hover(self, element, data):
+        """
+        Replace edge start and end hover data with label_index data.


Worth checking if ChordPlot needs this too...

jlstevens · 2018-04-07T22:45:55Z

holoviews/plotting/mpl/__init__.py

@@ -83,7 +85,7 @@ def get_color_cycle():

 # Define Palettes and cycles from matplotlib colormaps
 Palette.colormaps.update({cm: plt.get_cmap(cm) for cm in plt.cm.datad
-                          if 'spectral' not in cm and 'Vega' not in cm})
+                          if not ('spectral' in cm or (mpl_ge_200 and 'Vega' in cm))})


Is this relevant to Sankey or did this diff come from somewhere else?
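For reference, the two conditions can be compared in isolation. This sketch assumes `mpl_ge_200` is a boolean meaning "matplotlib is at least 2.0" (inferred from its name), and the colormap names are just sample strings:

```python
def old_filter(cm):
    # Original condition: drop any name containing 'spectral' or 'Vega'.
    return 'spectral' not in cm and 'Vega' not in cm


def new_filter(cm, mpl_ge_200):
    # New condition: 'Vega' names are only dropped when mpl_ge_200 is True.
    return not ('spectral' in cm or (mpl_ge_200 and 'Vega' in cm))


names = ['viridis', 'spectral', 'Vega20']
kept_old = [n for n in names if old_filter(n)]
kept_new_mpl2 = [n for n in names if new_filter(n, mpl_ge_200=True)]
kept_new_mpl1 = [n for n in names if new_filter(n, mpl_ge_200=False)]
```

So the only behavioural change is that 'Vega' colormaps are kept when `mpl_ge_200` is False.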

jlstevens · 2018-04-07T22:53:46Z

tests/plotting/bokeh/testgraphplot.py

@@ -71,7 +71,7 @@ def test_graph_inspection_policy_edges(self):
        renderer = plot.handles['glyph_renderer']
        hover = plot.handles['hover']
        self.assertIsInstance(renderer.inspection_policy, EdgesAndLinkedNodes)
-        self.assertEqual(hover.tooltips, [('start', '@{start}'), ('end', '@{end}')])
+        self.assertEqual(hover.tooltips, [('start', '@{start_values}'), ('end', '@{end_values}')])


My understanding is these names are a convention established by holoviews for use in bokeh's hover tools. It would be good either to document this in this PR or open an issue as a reminder to document these names.
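As a starting point for that documentation, the change in the test amounts to the following. The helper `tooltips_for` is hypothetical; only the `(label, '@{column}')` tooltip pairs come from the diff above.

```python
def tooltips_for(labels, suffix=''):
    """Build (label, '@{column}') tooltip pairs for the given labels."""
    return [(label, '@{%s%s}' % (label, suffix)) for label in labels]


before = tooltips_for(['start', 'end'])
after = tooltips_for(['start', 'end'], suffix='_values')
```

The displayed labels stay the same; only the referenced data columns gain the `_values` suffix.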

jlstevens · 2018-04-07T22:58:33Z

examples/gallery/demos/bokeh/energy_sankey.ipynb

+    "edges = pd.DataFrame(data['links'])\n",
+    "nodes = pd.DataFrame(data['nodes'])\n",
+    "edges['source'] = nodes['name'].values[edges['source'].values]\n",
+    "edges['target'] = nodes['name'].values[edges['target'].values]\n",


These two lines are rather obscure. It would be nice if they weren't needed.
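As far as I can tell, they replace each integer source/target with the name found at that position in the node table. A plain-Python equivalent, using made-up data in place of the notebook's DataFrames:

```python
# Made-up data standing in for data['nodes'] and data['links'].
nodes = [{'name': 'Coal'}, {'name': 'Electricity'}, {'name': 'Industry'}]
links = [{'source': 0, 'target': 1, 'value': 10},
         {'source': 1, 'target': 2, 'value': 7}]

# Position in the node list -> node name.
names = [node['name'] for node in nodes]

# Swap the integer indices on each edge for the corresponding names.
edges = [(names[link['source']], names[link['target']], link['value'])
         for link in links]
```

This assumes the integer indices on the links are positions into the node list, which is what the indexing in the notebook implies.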

jlstevens · 2018-04-07T23:06:46Z

I've made a number of comments, most of which are fairly minor (e.g. outdated docstrings). The biggest issue right now is that the Sankey element is currently more operation than element, which is surprising for something living in the element module.

If that is addressed, I am happy to see the PR merged once the tests are passing.

jlstevens · 2018-04-08T03:15:08Z

Looks good. Merging.

philippjfr added the "type: feature" and "tag: component: plotting" labels Feb 10, 2018

philippjfr force-pushed the sankey branch from a09283d to 26e0c5a Compare February 10, 2018 16:28

philippjfr force-pushed the sankey branch from 26e0c5a to b289c8d Compare March 27, 2018 02:45

philippjfr force-pushed the sankey branch 2 times, most recently from 7139a13 to 400968c Compare April 4, 2018 20:05

philippjfr force-pushed the sankey branch 4 times, most recently from 9f2db0c to e2e43ac Compare April 5, 2018 17:51

jlstevens reviewed Apr 7, 2018


philippjfr added 5 commits April 7, 2018 21:29

Added Sankey element

2a29026

Fixed Graph plot bug when using non-numeric source/target indices

0a913ce

Fixed autoindexing of element types

2de9d5f

Handled Sankey colors for matplotlib < 2.0

37add0e

Added Sankey gallery examples

0b55b96

philippjfr added 15 commits April 7, 2018 21:29

Reversed change to Dataset auto-index

d4b481e

Various Sankey improvements

a295534

Added Sankey reference guides

97a08ad

Fixed flakes

75067ec

Minor refactoring

a229a6b

Added Sankey unit tests

7ab6442

Further Sankey refactoring

12ee42f

Various minor fixes

3080d4c

Define RecursionError for python 2

72e1c7c

Fixed py2 division

d83b219

Implemented mpl Sankey updating

a059889

Added unit tests for Sankey matplotlib plot

b5be030

Fixed Sankey mpl tests

b809ad9

Addressed review comments

030f2f9

Fixed Sankey unit tests

df19c02

philippjfr force-pushed the sankey branch from 1205ecd to df19c02 Compare April 8, 2018 02:29

jlstevens merged commit ed031ad into master Apr 8, 2018

philippjfr deleted the sankey branch July 4, 2018 11:15
