Skip to content
Go to file
Cannot retrieve contributors at this time
230 lines (165 sloc) 9.09 KB
output: html_document
```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message=FALSE, echo=TRUE)
## Migration Flows Sankey
As well as supporting the generation of parameterised reports, reproducible workflows also support the automated generation of (templated) code that implements interactive charts.
For example, inspired by Oli Hawkins ([Visualising migration between the countries of the UK]( [[demo](]), we can generate interactive Sankey plots using the `googleVis` or `rCharts` packages.
Note that the packages generate standalone HTML/Javascript code to implement the charts, code that can be used elsewhere or embedded in other HTML pages.
#The RCharts package throws a wobbly if we don't load knitr in explicitly
#Data from ONS:
regionsquarematrix2015 = read_csv("../data/laandregionsquarematrices2015/regionsquarematrix2015.csv", skip = 8)
#The data has thousand separator commas - so remove them and convert to numeric
#There is probably a more idiomatic way of doing this using tidyr...
regionsquarematrix2015 = cbind(regionsquarematrix2015[1:2],
function(x) as.numeric(gsub(",", "", x)) ) )
The Sankey diagram generators seem to expect the data to be provided as edge lists (*from*, *to*, *value*).
#Melt the data (wide to long) so we have from/to/value flows
rr=regionsquarematrix2015 %>% gather(source, value, 3:ncol(.))
#Merge in names for the source areas
rr=merge(rr, unique(data.frame(SOURCE=rr$DESTINATION,
#The Sankey diagram generators dislike cycles - so set unique labels for from/to
rr$SOURCE=paste0(rr$SOURCE,' ')
#Drop rows that have no flow associated with them
colnames(rr) = c("source","targetName","target","value","sourceName")
rr = rr[,c("sourceName","targetName","source","target","value")]
### `googleVis`
[`googleVis`]( is an R package that provides an R wrapper around/interface to *Google Chart tools*.
We can generate a Sankey diagram using `googleVis` from a data frame representing an edge list in the following way:
```{r, results='asis'}
#For use in Rmd/knitr, set the block parameter: results='asis'
#Generate the Sankey diagram HTML
#And render it
Notwithstanding the availability of `from`, `to` and `weight` parameters for specifying column names, the function appears to want the dataframe passed in in a particular way, specifically `from`, `to`, `weight`.
According to the [Google Sankey diagram documentation](, node labels, as well as the color of nodes and edges, can be controlled; see [Sankey diagrams with googleVis]( or [this StackOverflow answer]( for an example of how to pass the parameters in.
To color the nodes, we need to provide node colors in the order in which the nodes are added to the chart. We can find the node order by interleaving the source and target columns:
nodeOrder=unique(c(rbind(rr$source, rr$target)))
Add some node colors:
```{r , results='asis'}
#Now we need to get the color for the node order.
nodeColor=unname(colormapl[substring(nodeOrder, 1, 1)])
colors_node_array = paste0("[", paste0("'", nodeColor,"'", collapse = ','), "]")
opts = paste0("{ node: { colors: ", colors_node_array ," } }" )
s=gvisSankey(rr[,c('source','target','value')], options=list( sankey=opts))
Add some edge colors, again in the order of edges supplied.
```{r, results='asis'}
#Use the originating node colour for the edge
opts = paste0("{ link: { colorMode: 'source' },node: { colors: ", colors_node_array ," } }" )
s=gvisSankey(rr[,c('source','target','value')], options=list(sankey=opts))
The labels are the node values. If we map the identifiers to (distinct) labels, they make the chart more informative.
s=gvisSankey(rr[,c('sourceName','targetName','value')], options=list(sankey=opts))
Generate a view of the chart that omits flows within the same country.
#Limit the rows
rr2=rr[substring(rr$target, 1, 1)!=substring(rr$source, 1, 1),]
#ABstract out the code that allows us to generate a new color array
#Interleave the nodes from the edgelist in the order they are introduced
nodeOrder=unique(c(rbind(df[[source]], df[[target]])))
#Generae a color mapping from the country indicator at the start of the country/region code
nodeColor=unname(colormapl[substring(nodeOrder, 1, 1)])
#Get the data in the form that the Sankey widget wants it...
colors_node_array = paste0("[", paste0("'", nodeColor,"'", collapse = ','), "]")
opts = paste0("{ link: { colorMode: 'source' }, node: { colors: ", colors_node_array ," } }" )
s=gvisSankey(rr2[,c('sourceName','targetName','value')], options=list(sankey=opts))
Finally, how about we group the (English) regional flows to a single English flow.
countrymap=c(E='England',N='Northern Ireland',S='Scotland',W='Wales')
rr2$countrysource=countrymap[substring(rr2$source, 1, 1)]
rr2$countrytarget=paste0(countrymap[substring(rr2$target, 1, 1)],' ')
rrg = rr2 %>%
group_by(countrysource,countrytarget) %>%
summarise(value = sum(value))
#Generate new color array
opts = paste0("{ link: { colorMode: 'source' }, node: { colors: ", colors_node_array ," } }" )
We can also change the edge color to a gradient between the source and target color values, but this just looks a horrible mess to me!
```{r, results='asis'}
opts = paste0("{ link: { colorMode: 'gradient' }, node: { colors: ", colors_node_array ," } }" )
### `rCharts`
Generate a Sankey diagram using `rCharts`:
```{r, results='asis'}
#Based on
#There is also a particle flow enhancement demoed at
sankeyPlot2 <- rCharts$new()
data = rr[,c('source','target','value')],
nodeWidth = 15,
nodePadding = 10,
layout = 32,
width = 750,
height = 500
sankeyPlot2$show('iframesrc', cdn = TRUE)
#Note that at the time of writing, the rCharts_d3_sankey bakes in the http protocol for loading three
#resources that breaks if the output HTML page is served as https.
Some control over colouring can be introduced by extending the template, as demonstrated in [this Stack Overflow answer]( (following [this original explanation]( from @timelyportfolio, the author of the rCharts Sankey package.)
A wide range of interactive chart types can be generated in this way. The [*htmlwidgets for R*]( project represents the latest iteration in the production of interactive Javascript widgets for use in RMarkdown documents and Shiny applications.
### `sankeyD3`
This seems to be the most recent attempt at an R/Sankey diagram library, again using D3.js.
The library appears to require nodes being identified as consecutive integers, starting at 0.
```{r, results='asis'}
#Get a mapping from codes to numeric node IDs
#Need to interleave the nodes appropriately
rrd=data.frame(rid= c(rbind(regionsquarematrix2015$Region,
paste0(regionsquarematrix2015$Region,'_'))) )
#Map the edges
sankeyNetwork(Links = rr, Nodes = rrd, Source = "source", title="Migration Flows",
Target = "target", Value = "value", NodeID = "name",
fontSize = 12, nodeWidth = 30,showNodeValues = FALSE)
You can’t perform that action at this time.