In [1]:
import plotly.graph_objs as go
import plotly.offline as offline

offline.init_notebook_mode(connected=True)

**Sankey Diagrams**

Sankey diagrams show flow between nodes. The width of each link is proportional to the quantity moving between the two nodes it connects.

The example here uses an airline's flight routes between several US cities.

**Create a list for all the locations/nodes**

In [2]:
locations = ['Cleveland', 
             'Detroit', 
             'Indianapolis', 
             'Nashville', 
             'NewYork', 
             'WashingtonD.C.',
             'Harrisburg'
            ]

### Define the Sankey diagram
The nodes are the locations defined above. Their order matters: each node is identified by its zero-based position in the list, and the links refer to nodes by that index.

The source and target are the nodes at either end of a link, and the value is the weight of the link. Taking the first element of each list gives (source=0, target=4, value=2), which means two flights from Cleveland to NewYork.
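
To make the index mapping concrete, the link lists can be decoded back into city names. This is a minimal sketch in plain Python; the helper name `describe_links` is hypothetical and is not part of plotly.

```python
# Node labels and link lists copied from the cells in this notebook.
locations = ['Cleveland', 'Detroit', 'Indianapolis', 'Nashville',
             'NewYork', 'WashingtonD.C.', 'Harrisburg']
source = [0, 0, 2, 1, 1, 3, 4, 2, 5, 3]
target = [4, 5, 4, 4, 5, 4, 6, 5, 6, 5]
value = [2, 2, 2, 1, 1, 2, 3, 2, 3, 1]


def describe_links(labels, source, target, value):
    """Hypothetical helper: turn index-based links into (from, to, weight) tuples."""
    if not (len(source) == len(target) == len(value)):
        raise ValueError('source, target and value must have the same length')
    return [(labels[s], labels[t], v) for s, t, v in zip(source, target, value)]


links = describe_links(locations, source, target, value)
for origin, destination, flights in links:
    print(f'{origin} -> {destination}: {flights}')
```

Reordering `locations` without updating `source` and `target` would silently relabel every link, which is why the node order has to be fixed before the links are written down.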

In [3]:
data = go.Sankey(node = dict(label = locations),
                 
                 link = dict(source = [0, 0, 2, 1, 1, 3, 4, 2, 5, 3], 
                             target = [4, 5, 4, 4, 5, 4, 6, 5, 6, 5], 
                             value =  [2, 2, 2, 1, 1, 2, 3, 2, 3, 1]
                            )
                )

In [4]:
layout =  dict(title = 'Basic Sankey Diagram',
               font = dict(size = 10)
)

**Plot the diagram**



In [5]:
fig = dict(data=[data], 
           layout=layout)

offline.iplot(fig)

**Using a real dataset**

Download the dataset here: https://github.com/plotly/dash-app-datasets/blob/master/scottish-votes.csv

In [6]:
import pandas as pd

data = pd.read_csv('datasets/scottish-votes.csv')
data


Unnamed: 0,Source,Target,Value,Color,"Node, Label",Link-Color
0,0,5,20,#F27420,Remain+No – 28,"rgba(253, 227, 212, 0.5)"
1,0,6,3,#4994CE,Leave+No – 16,"rgba(242, 116, 32, 1)"
2,0,7,5,#FABC13,Remain+Yes – 21,"rgba(253, 227, 212, 0.5)"
3,1,5,14,#7FC241,Leave+Yes – 14,"rgba(219, 233, 246, 0.5)"
4,1,6,1,#D3D3D3,Didn’t vote in at least one referendum – 21,"rgba(73, 148, 206, 1)"
5,1,7,1,#8A5988,46 – No,"rgba(219, 233, 246,0.5)"
6,2,5,3,#449E9E,39 – Yes,"rgba(250, 188, 19, 1)"
7,2,6,17,#D3D3D3,14 – Don’t know / would not vote,"rgba(250, 188, 19, 0.5)"
8,2,7,2,,,"rgba(250, 188, 19, 0.5)"
9,3,5,3,,,"rgba(127, 194, 65, 1)"


**Plot the Sankey diagram**

Format the nodes:

* pad sets the padding between the nodes, in pixels
* thickness sets the thickness of each node, in pixels

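The 'Node, Label' column holds the list of node names. It is shorter than the link columns, so the dataframe pads it with NaN values in the remaining rows. We drop those values before passing the column to the diagram, and do the same for the 'Color' column.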
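
The effect of dropping the missing labels can be seen on a small frame. This is a sketch with made-up rows, not the real dataset (only the 'Source', 'Target' and 'Node, Label' column names match), and it assumes pandas is installed.

```python
import pandas as pd

# Toy frame: five links but only four nodes, so the label column
# is padded with a missing value in the last row.
toy = pd.DataFrame({
    'Source': [0, 0, 1, 1, 2],
    'Target': [2, 3, 2, 3, 3],
    'Node, Label': ['A', 'B', 'C', 'D', None],
})

# Keep only the rows that actually carry a node name.
labels = toy['Node, Label'].dropna()

# Every link index must point at one of the remaining labels.
highest_index = max(toy['Source'].max(), toy['Target'].max())
assert highest_index < len(labels)

print(len(toy), 'links,', len(labels), 'nodes')
print(labels.tolist())
```

Because the missing values sit at the end of the column, dropping them leaves the position of each remaining label unchanged, so the indices in 'Source' and 'Target' still point at the right nodes.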

In [7]:
data_trace = go.Sankey(node = dict(pad = 8,
                                   thickness = 20,
                                   
                                   label =  data['Node, Label'].dropna(),
                                   
                                   color = data['Color'].dropna()),
                       
                       link = dict(source = data['Source'],
                                   target = data['Target'],
                                   value = data['Value'],
                                   color = data['Link-Color'])
                        )

In [8]:
layout =  dict(title = "Scottish referendum voters")

In [9]:
fig = dict(data=[data_trace], 
           layout=layout)

offline.iplot(fig)