This notebook produces the Sankey diagrams in the paper. To get it set up, see [README](README.md).

In [1]:
import pandas as pd
import networkx as nx

from sankeyview import *
from sankeyview.jupyter import show_sankey

## Load dataset and set up hierarchies and partitions

In [None]:
dataset = Dataset.from_csv('fruit_flows.csv', 'fruit_processes.csv')

These hierarchies and partitions are used later:

In [None]:
tree_func = nx.DiGraph()
tree_func.add_edges_from([
        ('*', x) for x in ('inputs', 'composting', 'landfill', 'growers', 'consumers')
    ] + [
        ('growers', x) for x in ('allotment', 'farm')
    ] + [
        ('farm', x) for x in ('small farm', 'large farm')
    ] + [
        ('composting', x) for x in ('composting process', 'composting stock')
    ])
h_func = Hierarchy(tree_func, 'function')

farm_ids = ['farm{}'.format(i) for i in range(1, 16)]
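
Later cells call `h_func(...)` and combine the result with extra conditions such as `' and location != "London"'`, so the call evidently returns a selection string. The sketch below illustrates the idea of expanding a node in the tree to the leaves beneath it. It uses a plain dict instead of networkx; `leaves_below` and `hierarchy_query` are hypothetical helpers, and the exact string format is an assumption rather than the library's documented output.

```python
# Sketch only: a plain-dict stand-in for the tree built with networkx above.
# `leaves_below` and `hierarchy_query` are hypothetical, not sankeyview API.
tree = {
    '*': ['inputs', 'composting', 'landfill', 'growers', 'consumers'],
    'growers': ['allotment', 'farm'],
    'farm': ['small farm', 'large farm'],
    'composting': ['composting process', 'composting stock'],
}


def leaves_below(tree, node):
    """Return the sorted leaf labels at or below `node`."""
    children = tree.get(node, [])
    if not children:
        return [node]  # a leaf stands for itself
    leaves = []
    for child in children:
        leaves.extend(leaves_below(tree, child))
    return sorted(leaves)


def hierarchy_query(tree, column, *nodes):
    """Build a selection string matching every leaf below the given nodes."""
    leaves = sorted({leaf for node in nodes for leaf in leaves_below(tree, node)})
    if len(leaves) == 1:
        return '{} == "{}"'.format(column, leaves[0])
    return '{} in {!r}'.format(column, leaves)


print(hierarchy_query(tree, 'function', 'growers'))
print(hierarchy_query(tree, 'function', 'composting stock'))
```

Under these assumptions, selecting `'growers'` picks up `allotment`, `small farm` and `large farm` without naming them individually, which is what makes the hierarchy convenient in the node definitions below.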

In [None]:
farm_partition_2 = (
    Partition([Group('other', [('process', farm_ids[2:])])]) +
    Partition.Simple('process', farm_ids[:2])
)

farm_partition_5 = (
    Partition([Group('Other farms', [('process', farm_ids[5:])])]) +
    Partition.Simple('process', farm_ids[:5])
)

partition_fruit = Partition.Simple('material', ['bananas', 'apples', 'oranges'])

partition_sector = Partition.Simple('process.sector', ['government', 'industry', 'domestic'])
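
To make the effect of `farm_partition_2` concrete, here is a minimal sketch of how such a partition maps process ids to group labels: the first two farms keep their own label and the remaining thirteen share the label `other`. The `groups` list and `assign_group` function are hypothetical stand-ins for illustration, not sankeyview objects.

```python
# Sketch only: `assign_group` is a hypothetical helper, not part of sankeyview.
farm_ids = ['farm{}'.format(i) for i in range(1, 16)]  # farm1 ... farm15

# Mirror of farm_partition_2 as (label, member ids) pairs:
# one catch-all group first, then one group per individually shown farm.
groups = [('other', farm_ids[2:])] + [(fid, [fid]) for fid in farm_ids[:2]]


def assign_group(groups, process_id):
    """Return the label of the first group containing `process_id`."""
    for label, members in groups:
        if process_id in members:
            return label
    return None  # not covered by the partition


labels = {fid: assign_group(groups, fid) for fid in farm_ids}
print(labels['farm1'], labels['farm2'], labels['farm3'], labels['farm15'])
# farm1 farm2 other other
```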

In [None]:
default_margins = {'top': 15, 'right': 80, 'bottom': 5, 'left': 80}

## Figure 4: Diagram structure

This is one possible Sankey Diagram Definition (SDD) based on the fruit dataset:

In [None]:
nodes = {
    'inputs':     ProcessGroup(['inputs']),
    'compost':    ProcessGroup(h_func('composting stock')),
    'landfill':   ProcessGroup('function == "landfill" and location != "London"'),
    'composting': ProcessGroup(h_func('composting process') + ' and location != "London"'),
    'farms':      ProcessGroup(h_func('growers')),
    'eat':        ProcessGroup('function == "consumers" and location != "London"'),
}
ordering = [
    ['inputs', 'compost'],
    ['farms'],
    ['eat'],
    ['landfill', 'composting'],
]
bundles = [
    Bundle('eat', 'landfill'),
    Bundle('eat', 'composting'),
    Bundle('inputs', 'farms'),
    Bundle('compost', 'farms'),
    Bundle('farms', 'eat'),
    Bundle('farms', 'compost'),
    Bundle('composting', 'compost'),
]
sdd = SankeyDefinition(nodes, bundles, ordering)

This definition leads to the following Sankey diagram:

In [None]:
show_sankey(sdd, dataset, width=500, height=250, override_node_layout={
        '__farms_compost_1^*': {'y': 0.75},
        '__farms_compost_0^*': {'y': 0.75},
    }, override_link_layout={
        "__composting_compost_0^*-compost^*-*": {'r0': 20.5, 'r1': 20.5},
    }, margins=default_margins
).auto_save_svg('structure_sankey_1.svg')

> *The layout overrides are currently experimental and undocumented features of the Sankey layout code.*
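
The definition above has three parts that must agree with each other: the nodes, the ordering of nodes into layers, and the bundles connecting them. As a rough sanity check, the sketch below uses plain tuples in place of `Bundle` objects to confirm that every node is placed in a layer and every bundle endpoint is defined, and to list the bundles that run from a later layer back to an earlier one. `check_definition` is a hypothetical helper, not a sankeyview function.

```python
# Sketch only: plain tuples stand in for Bundle objects; `check_definition`
# is a hypothetical helper, not a sankeyview function.
node_names = {'inputs', 'compost', 'landfill', 'composting', 'farms', 'eat'}
ordering = [['inputs', 'compost'], ['farms'], ['eat'], ['landfill', 'composting']]
bundles = [
    ('eat', 'landfill'), ('eat', 'composting'), ('inputs', 'farms'),
    ('compost', 'farms'), ('farms', 'eat'), ('farms', 'compost'),
    ('composting', 'compost'),
]


def check_definition(node_names, ordering, bundles):
    """Return each node's layer index and a list of inconsistencies."""
    layer_of = {name: i for i, layer in enumerate(ordering) for name in layer}
    problems = []
    for name in sorted(node_names - set(layer_of)):
        problems.append('node not in ordering: ' + name)
    for source, target in bundles:
        for end in (source, target):
            if end not in node_names:
                problems.append('undefined bundle endpoint: ' + end)
    return layer_of, problems


layer_of, problems = check_definition(node_names, ordering, bundles)

# Bundles whose target is not in a later layer than their source.
backward = [(s, t) for s, t in bundles if layer_of[t] <= layer_of[s]]
print(problems)
print(backward)
```

The two bundles reported as running backwards, `farms` to `compost` and `composting` to `compost`, are the same ones named in the layout override keys above, which suggests the overrides are there to tidy up the return flows.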

Other options are possible, for example grouping the 'farms' and 'eat' processes into a single node and starting with 'composting' on the left:

In [None]:
nodes = {
    'inputs':      ProcessGroup(['inputs']),
    'compost':     ProcessGroup(h_func('composting stock')),
    'farms & eat': ProcessGroup(h_func('growers', 'consumers')),
    'landfill':    ProcessGroup('function == "landfill"'),
    'composting' : ProcessGroup('function == "composting process"'),
}
ordering = [
    ['inputs', 'composting'],
    ['compost'],
    ['farms & eat'],
    ['landfill'],
]
bundles = [
    Bundle('inputs', 'farms & eat'),
    Bundle('compost', 'farms & eat'),
    Bundle('farms & eat', 'compost'),
    Bundle('farms & eat', 'landfill'),
    Bundle('farms & eat', 'composting'),
    Bundle('composting', 'compost'),
]
sdd = SankeyDefinition(nodes, bundles, ordering)

show_sankey(sdd, dataset, width=500, height=250, override_node_layout={
        'compost^*': {'y': 0.55},
        'composting^*': {'y': 0.55},
        'landfill^*': {'y': 0}
    }, margins=default_margins
).auto_save_svg('structure_sankey_2.svg')

## Figure 5: Partitioning processes

The definitions in Figure 4 produce Sankey diagrams without much detail, as all of the underlying processes in each node are grouped together. In fact, there is a direct correspondence between the high-level graph and the resulting Sankey diagram. To see more detail, we can specify partitions for each node (these are defined above):

In [None]:
nodes = {
    'inputs':     ProcessGroup(['inputs']),
    'compost':    ProcessGroup(h_func('composting stock')),
    'farms':      ProcessGroup(h_func('growers'), partition=farm_partition_2, title='farms'),
    'eat':        ProcessGroup('function == "consumers" and location != "London"',
                               partition=partition_sector, title='eat'),
    'landfill':   ProcessGroup('function == "landfill" and location != "London"'),
    'composting': ProcessGroup('function == "composting process" and location != "London"'),
}
ordering = [
    ['inputs', 'compost'],
    ['farms'],
    ['eat'],
    ['landfill', 'composting'],
]
bundles = [
    Bundle('inputs', 'farms'),
    Bundle('compost', 'farms'),
    Bundle('farms', 'eat'),
    Bundle('farms', 'compost'),
    Bundle('eat', 'landfill'),
    Bundle('eat', 'composting'),
    Bundle('composting', 'compost'),
]
sdd = SankeyDefinition(nodes, bundles, ordering)

show_sankey(sdd, dataset, width=500, height=250, margins=default_margins
).auto_save_svg('partition_processes_sankey.svg')

## Figure 6: Waypoints adding extra layers

To introduce new layers between the `farms` and `eat` nodes, we add waypoints to the bundle from `farms` to `eat`; each waypoint becomes an extra layer with its own partition. For simplicity, only the part of the Sankey diagram between `farms` and `eat` is shown in this figure:

In [None]:
from datetime import date
partition_weeks = Partition([
    Group(day, [('time', [date(2011, 8, d).strftime('%Y-%m-%d')
                          for d in range(1, 32) if date(2011, 8, d).strftime('%a') == day])])
    for day in ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
])
nodes = {
    'farms': ProcessGroup(h_func('growers', 'composting stock'), dataset.partition('source.location'),
                          title='farm location'),
    'eat':   ProcessGroup(h_func('consumers'), dataset.partition('target.location'), title='consumer location'),
    'waypoint1': Waypoint(dataset.partition('material'), title='material'),
    'waypoint2': Waypoint(partition_weeks, title='day of week'),
    'waypoint3': Waypoint(dataset.partition('target.sector'), title='consumer sector'),

}
bundles = [
    Bundle('farms', 'eat', waypoints=['waypoint1', 'waypoint2', 'waypoint3']),
]
ordering = [
    ['farms'], ['waypoint1'], ['waypoint2'], ['waypoint3'], ['eat'],
]
sdd = SankeyDefinition(nodes, bundles, ordering)
show_sankey(sdd, dataset, width=650, height=250,
            margins={'top': 15, 'right': 110, 'bottom': 5, 'left': 80}
).auto_save_svg('waypoint_layers_sankey.svg')
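
The `partition_weeks` definition above builds one group per weekday, each holding the August 2011 dates that fall on that day. The standard-library sketch below reproduces just that grouping so the membership of each group can be inspected. It uses `date.weekday()` instead of `strftime('%a')`, which is a substitution made here to avoid depending on the locale; `dates_by_day` is a name introduced for illustration.

```python
from datetime import date

day_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Group the dates of August 2011 by day of the week.
dates_by_day = {day: [] for day in day_names}
for d in range(1, 32):
    current = date(2011, 8, d)
    # date.weekday() is 0 for Monday, so it indexes day_names directly.
    # isoformat() yields the same 'YYYY-MM-DD' text as strftime('%Y-%m-%d').
    dates_by_day[day_names[current.weekday()]].append(current.isoformat())

for day in day_names:
    print(day, dates_by_day[day])
```

August 2011 started on a Monday, so the Monday, Tuesday and Wednesday groups contain five dates each and the other groups contain four.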

## Figure 7: Partitioning flows

So far flows have been distinguished based only on their start and end points in the diagram. Bundles can also be partitioned based on attributes of the underlying flows. Here the flows are partitioned based on the flow `material` attribute:

In [None]:
nodes = {
    'farms': ProcessGroup(h_func('growers', 'composting stock'), dataset.partition('source.location')),
    'eat':   ProcessGroup(h_func('consumers'), dataset.partition('target.sector')),
    'waypoint1': Waypoint(dataset.partition('material')),
 
}
bundles = [
    Bundle('farms', 'eat', waypoints=['waypoint1',]),
]
ordering = [
    ['farms'], ['waypoint1'], ['eat'],
]
sdd = SankeyDefinition(nodes, bundles, ordering, flow_partition=partition_fruit)
show_sankey(sdd, dataset, width=500, height=250, margins=default_margins
).auto_save_svg('partition_flows_sankey_1.svg')

In [None]:
sdd = SankeyDefinition(nodes, bundles, ordering, flow_partition=dataset.partition('source.location'))
show_sankey(sdd, dataset, width=500, height=250, margins=default_margins,
).auto_save_svg('partition_flows_sankey_2.svg')
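
Conceptually, a flow partition changes the key used to aggregate flows into links: instead of one link per pair of end points, there is one link per pair of end points and attribute value. The sketch below shows this with a few invented flows. The data and the `aggregate` helper are hypothetical and do not come from `fruit_flows.csv` or from sankeyview.

```python
from collections import defaultdict

# Hypothetical flows, invented for illustration only.
flows = [
    {'source': 'farm1', 'target': 'eat1', 'material': 'apples', 'value': 3.0},
    {'source': 'farm1', 'target': 'eat1', 'material': 'bananas', 'value': 2.0},
    {'source': 'farm1', 'target': 'eat1', 'material': 'apples', 'value': 1.0},
    {'source': 'farm2', 'target': 'eat1', 'material': 'oranges', 'value': 4.0},
]


def aggregate(flows, keys):
    """Sum flow values, grouping by the given attribute names."""
    totals = defaultdict(float)
    for flow in flows:
        totals[tuple(flow[k] for k in keys)] += flow['value']
    return dict(totals)


# Without a flow partition: one link per (source, target) pair.
by_endpoints = aggregate(flows, ('source', 'target'))

# With a flow partition on 'material': one link per pair and material.
by_material = aggregate(flows, ('source', 'target', 'material'))

print(by_endpoints)
print(by_material)
```

Switching the partition attribute from `material` to `source.location`, as in the second cell above, corresponds to changing the last element of the key.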

## Figure 8: Final Sankey diagram

In [None]:
nodes = {
    'inputs':     ProcessGroup(['inputs'], title='Other inputs'),
    'compost':    ProcessGroup(h_func('composting stock'), title='Compost'),
    'farms':      ProcessGroup(h_func('growers'), partition=farm_partition_5),
    'eat':        ProcessGroup('function == "consumers" and location != "London"', partition=partition_sector,
                      title='consumers by sector'),
    'landfill':   ProcessGroup('function == "landfill" and location != "London"', title='Landfill'),
    'composting': ProcessGroup('function == "composting process" and location != "London"', title='Composting'),

    'fruit':        Waypoint(partition_fruit, title='fruit type'),
    'w1':           Waypoint(direction='L', title=''),
    'w2':           Waypoint(direction='L', title=''),
    'export fruit': Waypoint(Partition.Simple('material', ['apples', 'bananas', 'oranges'])),
    'exports':      Waypoint(title='Exports'),
}
ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]
bundles = [
    Bundle('inputs', 'farms'),
    Bundle('compost', 'farms'),
    Bundle('farms', 'eat', waypoints=['fruit']),
    Bundle('farms', 'compost', waypoints=['w2']),
    Bundle('eat', 'landfill'),
    Bundle('eat', 'composting'),
    Bundle('composting', 'compost', waypoints=['w1', 'w2']),
    Bundle('farms', Elsewhere, waypoints=['exports', 'export fruit', ]),
]
sdd = SankeyDefinition(nodes, bundles, ordering, flow_partition=dataset.partition('material'))

show_sankey(sdd, dataset, width=800, height=500, override_link_layout={
        "farms^Other farms-w2^*-compost": {'r0': 12, 'r1': 12},
        "farms^farm1-w2^*-compost": {'r0': 9, 'r1': 9},
        "farms^farm2-w2^*-compost": {'r0': 8, 'r1': 8},
        "farms^farm3-w2^*-compost": {'r0': 7, 'r1': 7},
        "farms^farm4-w2^*-compost": {'r0': 6, 'r1': 6},
        "farms^farm5-w2^*-compost": {'r0': 5, 'r1': 5},
    }, margins=default_margins
).auto_save_svg('final_diagram.svg')
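
Unlike the earlier figures, the `ordering` for Figure 8 nests one level deeper: each layer is a list of three bands, and each band is a list of nodes. The sketch below flattens that structure into a (layer, band) position per node and checks that every node is placed exactly once. The `positions` mapping is introduced here for illustration, and reading band 0 as the top of the diagram is an assumption, not something stated in this notebook.

```python
# The ordering from the cell above: layers -> bands -> node names.
ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]

# Node names as defined in the `nodes` dict of the final diagram.
node_names = [
    'inputs', 'compost', 'farms', 'eat', 'landfill', 'composting',
    'fruit', 'w1', 'w2', 'export fruit', 'exports',
]

# Map each node to its (layer index, band index).
positions = {}
for layer_idx, layer in enumerate(ordering):
    for band_idx, band in enumerate(layer):
        for name in band:
            assert name not in positions, 'node placed twice: ' + name
            positions[name] = (layer_idx, band_idx)

missing = sorted(set(node_names) - set(positions))
print(missing)
for name in node_names:
    print(name, positions[name])
```

In this layout the export waypoints sit in band 0, the main processes in band 1, and the return-flow waypoints `w1` and `w2` in band 2, which keeps the exports and the compost return loop clear of the main left-to-right flow.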