Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Sankey Element #2328

Merged
merged 20 commits into from Apr 8, 2018
Merged

Add Sankey Element #2328

merged 20 commits into from Apr 8, 2018

Conversation

philippjfr
Copy link
Member

@philippjfr philippjfr commented Feb 10, 2018

As requested in #1123 this PR implements a Sankey element based on the implementation in d3-sankey. The code consists of three components, which are very similar to the Chord element:

  • The Sankey element type based on Graph
  • An Operation which is called by the Sankey constructor which lays out the nodes of diagram and computes an abstract graph representation
  • The plotting classes
%%opts Sankey [label_position='left' show_values=False]
edges = pd.read_csv('energy.csv')
Sankey(edges).redim(value='TWh')

screen shot 2018-02-10 at 1 03 59 pm

The plots looks pretty much identical across backends. Here is what a simpler example looks like:

%%output backend='matplotlib' fig='svg'
Sankey([
         ('Wage', 'Pension', 200),
         ('Wage', 'Social Security', 200),
         ('Wage', 'Net Income', 3500),
         ('Net Income', 'Rent', 1000),
         ('Net Income', 'Utilities', 200),
         ('Utilities', 'Electricity', 39),
         ('Utilities', 'Internet', 50),
         ('Utilities', 'Gas', 89),
         ('Net Income', 'Investment', 500),
         ('Wage', 'Tax', 1000),
         ('Net Income', 'Leisure', 300),
         ('Net Income', 'Food', 400),
         ('Leisure', 'Cinema', 20),
         ('Leisure', 'Drinks', 50)])

image

Suggestions for reference examples welcome.

  • Add tests
  • Add reference gallery examples
  • Add gallery demos

@philippjfr
Copy link
Member Author

Ready to review, going to add unit tests in the morning.

@philippjfr
Copy link
Member Author

philippjfr commented Apr 5, 2018

Here's the reference guide example I used to demonstrate the label_index option together with integer node indices:

nodes = ["PhD", "Career Outside Science",  "Early Career Researcher",
         "Permanent Research Staff",  "Professor",  "Non-Academic Research"]
nodes = hv.Dataset(enumerate(nodes), 'index', 'label')
edges = [
    (0, 1, 53), (0, 2, 47), (2, 5, 17), (2, 3, 30), (3, 1, 22.5), (3, 5, 4.), (3, 4, 0.45)   
]

value_dim = hv.Dimension('Percentage', unit='%')
hv.Sankey((edges, nodes), ['From', 'To'], vdims=value_dim).options(
    label_index='label', label_position='left', width=800, height=400, edge_color_index='To')

screen shot 2018-04-05 at 3 26 00 am

@philippjfr philippjfr force-pushed the sankey branch 4 times, most recently from 9f2db0c to e2e43ac Compare April 5, 2018 17:51
@@ -173,7 +173,7 @@ class Dataset(Element):

# In the 1D case the interfaces should not automatically add x-values
# to supplied data
_auto_indexable_1d = False
_auto_indexable_1d = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? The auto indexing behavior caused problems fixed in a recently merged PR...

'which matches the node ids on the edges.')
self._nodes = nodes
chord = layout_sankey(self)
self._nodes = chord.nodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be chord

kdims = element.node_type.kdims
nodes = element.node_type(node_data, kdims=kdims, vdims=element.nodes.vdims)
edges = element.edge_type(paths)
sankey = Sankey((element.data, nodes, edges), compute=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is rather odd how the Sankey element uses this operation in its constructor which then returns a Sankey object! It would be better to have a method just computing the information needed by Sankey and using that in _process.

_, y0, _, y1 = self.p.bounds
py = self.p.node_padding

def initializeNodeBreadth():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to avoid these kinds of inline function - explicit arguments are nicer than closures...

from .util import quadratic_bezier


class layout_sankey(Operation):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea would be to move this code to the element class itself e.g as a number of classmethods used by the constructor to compute the layout. I suppose the parameters of this operation could then be plot parameters. What other use does this operation have other than to create sankey elements which you can just do using Sankey?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mention this as it is a little confusing to see such a long operation in elements.

@@ -18,6 +18,9 @@ class StatisticsElement(Chart):

__abstract = True

# Ensure Interface does not add an index
_auto_indexable_1d = False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is.


def get_extents(self, element, ranges):
"""
A Chord plot is always drawn on a unit circle.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a Chord plot!


_style_groups = dict(GraphPlot._style_groups, quad='nodes', text='label')

_draw_order = ['patches', 'multi_line', 'quad', 'text']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 'multi_line' used?


def _patch_hover(self, element, data):
"""
Replace edge start and end hover data with label_index data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth checking if ChordPlot needs this too...

@@ -83,7 +85,7 @@ def get_color_cycle():

# Define Palettes and cycles from matplotlib colormaps
Palette.colormaps.update({cm: plt.get_cmap(cm) for cm in plt.cm.datad
if 'spectral' not in cm and 'Vega' not in cm})
if not ('spectral' in cm or (mpl_ge_200 and 'Vega' in cm))})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this relevant to Sankey or did this diff come from somewhere else?

@@ -71,7 +71,7 @@ def test_graph_inspection_policy_edges(self):
renderer = plot.handles['glyph_renderer']
hover = plot.handles['hover']
self.assertIsInstance(renderer.inspection_policy, EdgesAndLinkedNodes)
self.assertEqual(hover.tooltips, [('start', '@{start}'), ('end', '@{end}')])
self.assertEqual(hover.tooltips, [('start', '@{start_values}'), ('end', '@{end_values}')])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is these names are a convention established by holoviews for use in bokeh's hover tools. It would be good either to document this in this PR or open an issue as a reminder to document these names.

"edges = pd.DataFrame(data['links'])\n",
"nodes = pd.DataFrame(data['nodes'])\n",
"edges['source'] = nodes['name'].values[edges['source'].values]\n",
"edges['target'] = nodes['name'].values[edges['target'].values]\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines are rather obscure. It would be nice if they weren't needed.

@jlstevens
Copy link
Contributor

I've made a number of comments, most of which are fairly minor (e.g outdated docstrings). The biggest issue right now is that the sankey element is currently more operation than element which is surprising/unusual for something living in the element module.

If that is addressed, I am happy to see the PR merged once the tests are passing.

@jlstevens
Copy link
Contributor

Looks good. Merging.

@jlstevens jlstevens merged commit ed031ad into master Apr 8, 2018
@philippjfr philippjfr deleted the sankey branch July 4, 2018 11:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants