# Interactive Innovation Mapping with Python

Bring some life to your innovation mapping notebook analysis with interactive data viz! 🕹

---

This tutorial covers a few examples of interactive data visualisation with Python that can be used to create rich notebooks or prototypes for web visualisations.

This tutorial is built around Bokeh and also makes use of HoloViews, GeoViews and Datashader. There are many other options for interactive data visualisation with Python, however, including Altair, Plotly, Dash, and even Matplotlib, so try them out too!

## Preamble

In [None]:
%load_ext autoreload
%autoreload 2
# install im_tutorial package
# !pip install git+https://github.com/nestauk/im_tutorials.git
# !pip install python-louvain

In [None]:
# useful Python tools
import itertools
import collections

# matplotlib for static plots
import matplotlib.pyplot as plt
# networkx for networks
import networkx as nx
# numpy for mathematical functions
import numpy as np
# pandas for handling tabular data
import pandas as pd
# seaborn for pretty statistical plots
import seaborn as sns

from im_tutorials.data.gtr import gtr_table, gtr_link_table, gtr_table_list

pd.set_option('display.max_columns', 99)

## GtR Projects

In [None]:
gtr_projects_df = gtr_table('projects')
gtr_funds_df = gtr_table('funds')
gtr_funds_link_table = gtr_link_table('funds')

- Join funding table to link table to get project ids. Groupby project to get start and end date, sum of funding.
- Group leads and collaborators and create network
- Join with project descriptions and make collaboration network with SDGs
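
The first step in the plan above can be sketched on toy data. This is only an illustration: the two small tables below are made up, and we assume (as in the cells that follow) that the funding and link tables share an `id` column and that the link table carries a `project_id`.

```python
import pandas as pd

# Made-up funding records: two funds for project p1, one for p2.
toy_funds = pd.DataFrame({
    'id': ['f1', 'f2', 'f3'],
    'start': pd.to_datetime(['2015-01-01', '2016-06-01', '2017-03-01']),
    'end': pd.to_datetime(['2016-01-01', '2018-06-01', '2019-03-01']),
    'amount': [100_000, 250_000, 50_000],
})
# Made-up link table mapping each fund to a project.
toy_links = pd.DataFrame({
    'id': ['f1', 'f2', 'f3'],
    'project_id': ['p1', 'p1', 'p2'],
})

# Join on the shared fund id, then summarise per project:
# earliest start, latest end and total amount.
toy_project_funding = (
    toy_funds.merge(toy_links, on='id')
    .groupby('project_id')
    .agg(start=('start', 'min'), end=('end', 'max'), amount=('amount', 'sum'))
)
```

Note that the cells below take a simpler route and drop duplicate `project_id`/`amount` rows rather than aggregating.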

In [None]:
gtr_funds_df.head(2)

In [None]:
gtr_funds_link_table.head(2)

In [None]:
gtr_funds_df = gtr_funds_df.merge(gtr_funds_link_table, left_on='id', right_on='id')
gtr_funds_df = gtr_funds_df.drop_duplicates(['project_id', 'amount'])

In [None]:
gtr_funds_df.head()

In [None]:
print('Earliest start date:', gtr_funds_df['start'].min())
print('Earliest end date:', gtr_funds_df['end'].min())
print('\n')
print('Latest start date:', gtr_funds_df['start'].max())
print('Latest end date:', gtr_funds_df['end'].max())

In [None]:
gtr_funds_df['start'].dt.year.value_counts()[:15]

In [None]:
gtr_funds_df['end'].dt.year.value_counts()

In [None]:
min_start_year = 2006
max_start_year = 2019
max_end_year = 2030

gtr_funds_df = gtr_funds_df[(gtr_funds_df['start'].dt.year >= min_start_year) & 
                            (gtr_funds_df['start'].dt.year <= max_start_year)]
gtr_funds_df = gtr_funds_df[(gtr_funds_df['end'].dt.year <= max_end_year)]

In [None]:
gtr_projects_funds_df = gtr_projects_df.merge(
    gtr_funds_df, left_on='id', right_on='project_id', 
    how='left', suffixes=('_proj', '_fund'))

# keep one row per project; copy so that new columns can be added safely
gtr_project_funds_df = gtr_projects_funds_df.drop_duplicates(subset=['project_id']).copy()

In [None]:
# create a column that just has the start year
gtr_project_funds_df['start_year'] = gtr_project_funds_df['start_fund'].dt.year
# group funds by start of year and funder
amount_year_sum = gtr_project_funds_df.groupby(['start_year', 'leadFunder'])['amount'].sum()
amount_year_sum = amount_year_sum.loc[2006:2018]

In [None]:
amount_year_sum.head(10)

In [None]:
amount_year_sum = amount_year_sum.unstack()
amount_year_sum.head()

In [None]:
rolling_window = 3
amount_year_sum_rolling = amount_year_sum.rolling(rolling_window).mean()
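
To see what the rolling mean does to each funder's series, here is a minimal pure-Python sketch of a trailing window average. The yearly totals are made up, the helper `rolling_mean` is hypothetical, and `None` stands in for the `NaN` values that pandas returns until a full window is available.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; positions without a full window are None."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            # not enough history yet to fill the window
            smoothed.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed

toy_yearly_totals = [10.0, 20.0, 60.0, 40.0]  # made-up amounts
toy_smoothed = rolling_mean(toy_yearly_totals, window=3)
```

The first two years have no value, which is why the plots below start two years after the earliest year in the data.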

### The Usual Suspect

**Matplotlib** is a very popular library for plotting in Python. It can be used to produce publication-worthy plots, but is typically restricted to producing static images.

Here we will plot the three-year rolling mean of the total amount awarded by each funder, with a legend.

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(amount_year_sum_rolling, marker='o')
ax.legend(amount_year_sum_rolling.columns)
ax.set_xlabel('Year')
ax.set_ylabel('Total Funding (£)')
ax.set_title('Total Funding Over Time');

### Next Level

Matplotlib is great for scientific plots, but for interaction we need to look elsewhere. Here we are going to try **Bokeh** to produce an interactive version of exactly the same plot.

In [None]:
# need these to make plot visible in notebook
from bokeh.io import show, output_notebook
# figure object to draw plots on
from bokeh.plotting import figure
# basic bokeh building blocks
from bokeh.models import ColumnDataSource, Circle, Line
from bokeh.models import PrintfTickFormatter, HoverTool
from bokeh.palettes import Category10
output_notebook()

In [None]:
n_funders = len(amount_year_sum_rolling.columns)
cmap = Category10[n_funders]

In [None]:
# create a figure object with desired dimensions
p = figure(width=550, height=350,
           title='Total Awards by Funder over Time')

# loop through columns, select color, plot line and circles
for i, c in enumerate(amount_year_sum_rolling.columns):
    color = cmap[i]
    p.line(
        x=amount_year_sum_rolling.index.values, 
        y=amount_year_sum_rolling[c], 
        legend_label=c,
        color=color,
        line_width=2,
        alpha=0.7,
        name=c,
        muted_alpha=0.1,
        muted_color=color
          )
    p.circle(
        x=amount_year_sum_rolling.index.values, 
        y=amount_year_sum_rolling[c], 
        legend_label=c,
        color=color, 
        name=c,
        muted_alpha=0.1, 
        muted_color=color
    )

# build a hover tool that will display the funding amount (y value),
# year (x value) and funder (renderer name)
hover = HoverTool(tooltips=[('Amount', '£@y{( 0.00 a)}'),
                            ('Year', '@x{0}'),
                            ('Funder', '$name')],
                 )
p.add_tools(hover)

# add labels and formatting
p.xaxis.axis_label = 'Year'
p.yaxis.axis_label = 'Total Funding'    
p.yaxis[0].formatter = PrintfTickFormatter(format="£%.1e")
# add interactive legend
p.legend.click_policy = "mute"
p.legend.location = 'top_left'
p.legend.label_text_font_size = '6pt'
    
show(p)

### Bokeh Scatter (Circle)

We can also use Bokeh to make simple interactive plots such as a scatter diagram.

In [None]:
duration = (gtr_funds_df['end'] - gtr_funds_df['start']).dt.days / 365.25
amount = gtr_funds_df['amount']

p = figure(width=550, height=350, y_axis_type="log")
p.grid.visible = False

p.circle(x=duration, y=amount, size=1, alpha=0.05)

p.xaxis.axis_label = 'Duration (years)'
p.yaxis.axis_label = 'Funding Amount (£)'

p.add_tools(HoverTool(
    tooltips=[("Duration", "$x"), ("Amount", "$y")],
    mode="mouse", point_policy="follow_mouse"
))

show(p)

## Declarative Data Visualisation

What is it?

> By declarative, we mean that while plotting any chart, you only need to declare links between data columns to the encoding channels, such as x-axis, y-axis, colour, etc. and rest all of the plot details are handled automatically. Let’s understand it by an example. 

[https://www.analyticsvidhya.com/blog/2017/12/introduction-to-altair-a-declarative-visualization-in-python/](https://www.analyticsvidhya.com/blog/2017/12/introduction-to-altair-a-declarative-visualization-in-python/)

Instead of having to faff around with lots of details, declarative plotting libraries do lots of the work behind the scenes and are loaded with good default settings. These can normally be overridden by just changing some options, making the code a lot cleaner, and the workflow a lot faster for exploratory analysis.

### Declarative Hexbin

For our declarative data visualisation, we are going to use **Holoviews**. Holoviews is a library that bundles your data, the mapping between the data and the plot, and any plot settings you want to apply into a structure that is ready to be plotted.

It can't actually do the plotting on its own, so we need to specify a "back end" to do the work. Because of this, Holoviews can actually be initialised with different plotting libraries as the back end. In this case we will use Bokeh as we want to retain the interactivity, but we could also have used Matplotlib.

In [None]:
import holoviews as hv
hv.extension('bokeh')

In [None]:
# keep only positive amounts so that the log is defined for every row
mask = amount > 0
df_hex = pd.DataFrame({'Project Duration (years)': duration[mask],
                       'Funding Amount': np.log10(amount[mask])})

hx = hv.HexTiles(df_hex)
hx.opts(width=550, height=350, logz=True, yformatter='£10^%d',
        tools=['hover'], hover_color='pink', hover_alpha=0.7,
        title='Funding Amount by Project Duration')

To see the plotting options for a specific Holoviews chart type, you can interrogate the object with `?` notation.

```python
from holoviews import opts
opts.HexTiles?
```

While we're at it, let's see what it would have looked like to make the scatter plot above with Holoviews.

In [None]:
scat = hv.Scatter(df_hex.sample(frac=0.1))
scat.opts(width=550, height=350, alpha=0.1)

Holoviews is also great at handling layouts. Let's overlay a scatter on top of the hexbin.

In [None]:
scat = hv.Scatter(df_hex.sample(frac=0.05))
scat.opts(alpha=0.3, color='white', size=0.5)

hx * scat

## Maps

A popular type of interactive chart is a map, due to the ability to zoom and pan. This makes sense as large geographic datasets can have both high level and low level structure.

Let's pull the details of the organisations in Gateway to Research and explore their locations.

In [None]:
gtr_org_locs_df = gtr_table('organisations_locations')
gtr_org_locs_df.head()

# drop orgs where no location is available
gtr_org_locs_df = gtr_org_locs_df.dropna(subset=['latitude', 'longitude']).copy()

### Folium

**Folium** is a Python wrapper around the web mapping library Leaflet.js. It can easily be used to produce familiar-looking visualisations, using openly available map tile sets. Let's use it to make a cluster marker style plot.

In [None]:
from im_tutorials.data.gis import country_basic_info
country_df = country_basic_info()

In [None]:
import folium
from folium.plugins import MarkerCluster, FastMarkerCluster

In [None]:
lat_c, lng_c = country_df.set_index('alpha3Code').loc['GBR']['latlng']

cluster_map = folium.Map(location=[lat_c, lng_c], zoom_start=5, width=550, height=550)
cluster_map.add_child(FastMarkerCluster(data=gtr_org_locs_df[['latitude', 'longitude']].values.tolist()))

Folium can also be used to plot vectors, rasters and choropleths.

**Tasks**

1. Change the marker icon.
2. Remove the organisations at duplicate addresses.

## Datashader

Does one job. Does it well.

Sometimes, you may have a lot of points to plot. We saw earlier with Bokeh that it took a while to plot a few thousand scatter points. We also saw that the hexbin did a much better job of showing the information at a low zoom level. Datashader solves both of these problems.

Datashader takes two-dimensional point data and rapidly turns it into an intensity map image. Let's try it on the same data as above.

We're also going to make use of Geoviews which adds some extra functionality in the Bokeh/Datashader/Holoviews family for working with geographical data.
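
The core idea behind turning points into an intensity image can be pictured as counting how many points fall into each pixel of a grid. The sketch below is a deliberately simplified, pure-Python illustration of that idea and not Datashader's actual implementation; the helper `rasterise` and the sample points are made up.

```python
def rasterise(points, x_range, y_range, width, height):
    """Count points per pixel; returns a height x width grid (row 0 = lowest y)."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        # ignore anything outside the current viewport
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        # scale to pixel coordinates, clamping the upper edge into the last pixel
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid

toy_points = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.9), (0.5, 0.5), (2.0, 2.0)]
toy_grid = rasterise(toy_points, (0, 1), (0, 1), width=2, height=2)
```

Because the output size depends only on the number of pixels and not on the number of points, the cost of drawing stays constant as the dataset grows. The counts are then mapped to colours to produce the final image.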

In [None]:
import datashader as ds
# import datashader.transfer_functions as tf
# from datashader.colors import colormap_select, Greys9
from datashader.utils import lnglat_to_meters as webm

from holoviews.element import tiles
from holoviews.operation.datashader import datashade, dynspread
from holoviews.streams import RangeXY

import geoviews as gv
import cartopy.crs as crs

First we need to convert our latitude and longitude values into map projection coordinates (Web Mercator eastings and northings, in metres).

In [None]:
gtr_org_locs_df['easting'], gtr_org_locs_df['northing'] = webm(
    gtr_org_locs_df['longitude'], gtr_org_locs_df['latitude'])
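
For reference, the spherical Web Mercator conversion can be written out by hand. The sketch below is our own illustration of the standard formula rather than Datashader's code; the function name `lnglat_to_web_mercator` is hypothetical and the London coordinates are approximate.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (WGS84 semi-major axis)

def lnglat_to_web_mercator(lng, lat):
    """Convert degrees of longitude/latitude to Web Mercator metres."""
    easting = math.radians(lng) * EARTH_RADIUS_M
    northing = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * EARTH_RADIUS_M
    return easting, northing

# approximate coordinates of central London
london_easting, london_northing = lnglat_to_web_mercator(-0.1276, 51.5072)
```

Eastings grow linearly with longitude, while northings stretch increasingly towards the poles, which is why the projection is undefined at latitudes of exactly ±90 degrees.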

Next we fetch some background map tile images.

In [None]:
url = 'https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{Z}/{Y}/{X}.jpg'
map_tiles = gv.WMTS(url, crs=crs.GOOGLE_MERCATOR)

#### Colour - An Aside

So far, we've been using palettes and colourmaps that come with the plotting libraries. There are also other libraries that provide premade palettes. 

**Colorcet** is one of these and it focuses on providing *perceptually linear* palettes. These palettes change incrementally and account for the perceived difference in intensity of different colours by the human eye, so that equal steps in the data look like equal steps in colour.

In [None]:
import colorcet

Now let's datashade these organisation locations!

In [None]:
width=600
height=600
cmap = colorcet.kbc

opts = dict(width=width, height=height, x_sampling=1, y_sampling=1, cmap=cmap, dynamic=False)
tile_opts  = dict(width=width, height=height, xaxis=None, yaxis=None, bgcolor='white', show_grid=False)

def make_view(x_range, y_range, **kwargs):
    tiles = map_tiles.options(alpha=0.5, **tile_opts)
    points = hv.Points(gtr_org_locs_df, ['easting', 'northing'])
    d = dynspread(datashade(points, x_range=x_range, y_range=y_range, **opts), shape='circle', threshold=.1)
    return d * tiles

In [None]:
dmap = hv.DynamicMap(make_view, streams=[RangeXY()])
plot = hv.renderer('bokeh').instance(mode='server').get_plot(dmap)
dmap

## Visualising Text

We have a set of documents that relate to the United Nations Sustainable Development Goals (SDGs). We have seen how to turn text into vectors, but how can we visualise high-dimensional text vectors?

Here we are going to use Bokeh, as well as t-SNE (`TSNE` in scikit-learn), a dimensionality reduction algorithm that is useful for visualisation.

In [None]:
from im_tutorials.data.sdg import sdg_web_articles

In [None]:
df_sdg = sdg_web_articles()

In [None]:
df_sdg.head()

In [None]:
sdg_definitions = {
     1: '1. No Poverty',
     2: '2. Zero Hunger',
     3: '3. Good Health & Well-being',
     4: '4. Quality Education',
     5: '5. Gender Equality',
     6: '6. Clean Water & Sanitation',
     7: '7. Affordable & Clean Energy',
     8: '8. Decent Work & Economic Growth',
     9: '9. Industry, Innovation & Infrastructure',
     10: '10. Reduced Inequalities',
     11: '11. Sustainable Cities & Communities',
     12: '12. Responsible Consumption & Production',
     13: '13. Climate Action',
     14: '14. Life Below Water',
     15: '15. Life on Land',
     16: '16. Peace, Justice & Strong Institutions',
     17: '17. Partnerships for the Goals'
}

In [None]:
df_sdg.head()

In [None]:
def remove_goal(l, goal=17):
    new_goals = [g for g in l if g != goal]
    return new_goals

df_sdg['sdg_goals'] = df_sdg['sdg_goals'].apply(remove_goal)

df_sdg['n_goals'] = [len(x) for x in df_sdg['sdg_goals']]
df_sdg = df_sdg[(df_sdg['n_goals'] > 0) & (df_sdg['n_goals'] < 4)]

df_sdg = df_sdg[df_sdg['text'].str.len() > 140]
df_sdg = df_sdg.drop_duplicates('text')
df_sdg = df_sdg.drop('index', axis=1)
df_sdg = df_sdg.reset_index()

In [None]:
df_sdg.shape

### Text Preprocessing

In [None]:
from im_tutorials.data.sdg import sdg_web_articles
from im_tutorials.features.text_preprocessing import *
from itertools import chain

In [None]:
tokenized = [list(chain(*tokenize_document(document))) for document in df_sdg['text'].values]

In [None]:
from collections import Counter
from itertools import chain

term_counts = Counter(chain(*tokenized))
term_counts.most_common(30)

In [None]:
stop_words = ['development', 'sdg', 'new', 'global', 'also', 'including', 'support', 'international',
             'report', 'implementation', 'national', 'said', 'agenda', 'meeting', 'regional']
text_clean = [' '.join([t for t in d if t not in stop_words]) for d in tokenized]

In [None]:
text_clean[0]

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE, Isomap

In [None]:
# apply tfidf
tfidf = TfidfVectorizer(min_df=10, max_df=0.5)
tfidf_vecs = tfidf.fit_transform(text_clean)

# reduce dimensions to 30
svd = TruncatedSVD(n_components=30)
svd_vecs = svd.fit_transform(tfidf_vecs)

# use tsne to create x, y positions
tsne = TSNE(n_components=2)
tsne_vecs = tsne.fit_transform(svd_vecs)
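
As a reminder of what the TF-IDF step computes, here is a rough pure-Python sketch. It assumes the smoothed inverse document frequency and L2 normalisation that scikit-learn uses by default, ignores the `min_df` and `max_df` filtering, and runs on made-up token lists; the helper `toy_tfidf` is hypothetical.

```python
import math
from collections import Counter

def toy_tfidf(docs):
    """docs: list of non-empty token lists -> list of {term: weight} dicts."""
    n_docs = len(docs)
    # number of documents each term appears in
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # smoothed idf: terms found in every document get a weight of exactly 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        raw = {t: c * idf[t] for t, c in counts.items()}
        # scale each document vector to unit length
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors

toy_docs = [['water', 'health'], ['water', 'energy'], ['water', 'poverty', 'poverty']]
toy_vecs = toy_tfidf(toy_docs)
```

The term `water` appears in every toy document, so it is down-weighted relative to the rarer terms. Common words contribute less to the document vectors in the same way in the real pipeline.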

In [None]:
from bokeh.palettes import Category20_16

In [None]:
df_sdg.reset_index(inplace=True)
single_goals = (df_sdg[df_sdg['n_goals'] == 1]).index.values
tsne_vecs_single = tsne_vecs[single_goals]
goal_labels_single = [g[0] for g in df_sdg['sdg_goals'][single_goals]]
titles_single = df_sdg['title'][single_goals].values

In [None]:
colors = [Category20_16[g-1] for g in goal_labels_single]

cds = ColumnDataSource(data={
    'tsne_0': tsne_vecs[:, 0][single_goals],
    'tsne_1': tsne_vecs[:, 1][single_goals],
    'color': colors,
    'goal': [sdg_definitions[g] for g in goal_labels_single],
    'title': titles_single,
    'id': single_goals
})

p = figure(width=600, height=500, 
           title='TSNE Plot of Single SDG Article Vectors')

hover = HoverTool(
    tooltips=[
        ('Goal', '@goal'), 
        ('Title', '@title'), 
        ('Doc ID', '@id')]
)

p.circle(source=cds, x='tsne_0', y='tsne_1', color='color', line_width=0, legend_field='goal', radius=0.4, alpha=0.9)
p.add_tools(hover)

p.legend.label_text_font_size = '6pt'

show(p)

- What does this tell you about the overlap of language between the articles that discuss the different SDGs? 
- Can you use the hover interactivity to investigate the human labelling accuracy in mixed clusters?

**Task**
- Add legend muting

## Networks

In [None]:
from im_tutorials.data.gtr import gtr_sample
gtr_sample_df = gtr_sample()

In [None]:
gtr_sample_df.head()

### Cooccurrence Networks

We are going to define communities of research topics as groups of topics which commonly occur together. An effective way of finding these clusters, and visualising the results, is by creating a topic cooccurrence network.

A cooccurrence graph is a network structure, where nodes are elements and an edge represents the elements of two nodes having cooccurred at least once. The edges can then be "weighted" by the frequencies of each cooccurring pair. In the case of our research projects, we can say that two topics have cooccurred if they appear in at least one project together. To find all cooccurrences we therefore need to find the pairwise combinations of research topics for every project. For example, a single project with the topics
```
['Materials Characterisation', 'High Performance Computing', 'Condensed Matter Physics']
```

will become a set of topic pairs:

In [None]:
# The combinations function from itertools generates all possible
# combinations of length r from a list.
topics = ['Materials Characterisation', 'High Performance Computing', 'Condensed Matter Physics']
list(itertools.combinations(topics, 2))

To create a cooccurrence network across all projects in our dataset, we will loop over the projects (or, equivalently, use a list comprehension) and chain together all of the cooccurring pairs into one long list.

In [None]:
# Generate every pair combination of research topics from each project.
# Each pair is sorted alphabetically to make sure that there is only one 
# possible permutation of each edge.
cooccurrences = []

for topics in gtr_sample_df['research_topics']:
    topic_pairs = itertools.combinations(topics, 2)
    for pair in topic_pairs:
        cooccurrences.append(tuple(sorted(pair)))

# The same can be achieved in this one-liner
# cooccurrences = list(
# chain(*[[tuple(sorted(c)) for c in (itertools.combinations(d, 2))] for d in gtr_sample_df['research_topics']])
# )

# Count the frequency of each cooccurring pair.
research_topic_co_counter = Counter(cooccurrences)
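
To see why each pair is sorted before counting, consider two made-up projects that list the same topics in a different order. This small sketch compares the counts with and without sorting.

```python
import itertools
from collections import Counter

toy_projects = [
    ['Robotics', 'Artificial Intelligence'],
    ['Artificial Intelligence', 'Robotics', 'Ethics'],
]

# Without sorting, ('Robotics', 'Artificial Intelligence') and
# ('Artificial Intelligence', 'Robotics') are counted as different edges.
toy_unsorted = Counter(
    pair for topics in toy_projects
    for pair in itertools.combinations(topics, 2))

# Sorting each pair gives every edge a single canonical form.
toy_sorted = Counter(
    tuple(sorted(pair)) for topics in toy_projects
    for pair in itertools.combinations(topics, 2))
```

With sorting, the Robotics and Artificial Intelligence pair is counted twice under one key. Without it, the same undirected edge is split across two keys with a count of one each.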

In [None]:
print("Top Research Topic Cooccurrences by Frequency", '\n')
print('{:<70}{}'.format('Cooccurrence', 'Frequency'))
for k, v in research_topic_co_counter.most_common(20):
    topics = k[0] + ' <---> ' + k[1]
    print(f'{topics:<70}{v}')
    
print('\nMedian Topic Cooccurrence Frequency:')
print(np.median(list(research_topic_co_counter.values())))

### Normalising Edge Weights

Looking at the most frequently cooccurring topics, we can see pairs that make intuitive sense and that generally sit neatly within higher-order academic disciplines.

However, this, along with the individual topic frequencies, also shows that using the raw cooccurrence frequency as our edge weight might not be such a good idea. High-frequency elements are simply more likely to cooccur by chance, so we should normalise our edge weights. One method is to calculate the association strength: an edge weight in which the cooccurrence frequency is normalised by the product of the individual terms' occurrence counts. It is defined as

$$ a = \frac{2 n c_{ij}}{o_{i}o_{j}} $$

where $n$ is the total number of elements, $c_{ij}$ is the number of cooccurrences between elements $i$ and $j$, and $o_{i}$ and $o_{j}$ are the individual frequency counts of each element.

To build our cooccurrence network, we need to generate a list of unique edges from our long list of cooccurrences and then calculate the association strength for each edge.

In [None]:
def association_strength(combo, occurrences, cooccurrences, total):
    '''association_strength
    Calculates the association strength between a cooccurring pair.
    '''
    a_s = ((2 * total * cooccurrences[combo]) / 
           (occurrences[combo[0]] * occurrences[combo[1]]))
    return a_s

research_topic_counter = Counter(chain(*gtr_sample_df['research_topics']))

# Generate a set of cooccurrences (the unique pairs).
# This will form the edges of our cooccurrence graph.
edges = set(cooccurrences)
# Calculate the total number of elements
n = len(list(itertools.chain(*gtr_sample_df['research_topics'])))
# Calculate the association strength for each edge.
# We take the log of the association strength so that the weights
# are closer to normally distributed.
assoc_strengths = np.log10([association_strength(
    edge,
    research_topic_counter, 
    research_topic_co_counter, 
    n) for edge in edges])
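
A small worked example shows the effect of the normalisation. The four toy projects and single-letter topic names below are made up; the formula is the association strength defined above, before taking the logarithm.

```python
import itertools
from collections import Counter

toy_projects = [['A', 'B'], ['A', 'B'], ['A', 'C'], ['B', 'C', 'D']]

# individual topic counts: A=3, B=3, C=2, D=1
toy_occ = Counter(itertools.chain(*toy_projects))
# pair counts, with each pair sorted into a canonical order
toy_cooc = Counter(
    tuple(sorted(pair)) for topics in toy_projects
    for pair in itertools.combinations(topics, 2))
# total number of topic occurrences across all projects
toy_n = sum(toy_occ.values())

def toy_strength(pair):
    """Association strength: 2 * n * c_ij / (o_i * o_j)."""
    i, j = pair
    return 2 * toy_n * toy_cooc[pair] / (toy_occ[i] * toy_occ[j])

strength_ab = toy_strength(('A', 'B'))  # frequent topics, 2 cooccurrences
strength_cd = toy_strength(('C', 'D'))  # rare topics, 1 cooccurrence
```

Here the pair A and B cooccurs twice but scores 4.0, while C and D cooccur only once and score 9.0, because C and D are individually much rarer. This is the correction for chance cooccurrence that raw frequencies lack.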

In [None]:
fig, ax = plt.subplots()
ax.hist(assoc_strengths, bins=50)
ax.set_xlabel('Log10 Association Strength');

The distribution of the log association strengths is fairly smooth and approximately normal. We can see that without applying a logarithm, there would be weights in our graph 100,000 times larger than others!

Python has 3 main tools for working with networks: [`networkx`](https://networkx.github.io/), [`igraph`](https://igraph.org/redirect.html) and [`graph-tool`](https://graph-tool.skewed.de). The first of these, `networkx`, is easy to install and interacting with it is straightforward. It is suitable for networks with up to hundreds of thousands of nodes or edges. With very large networks, it is recommended to use `graph-tool`.

In [None]:
import networkx as nx

weighted_edges = []
for (s, t), a_s in zip(edges, assoc_strengths):
    weighted_edges.append((s, t, a_s))

g = nx.Graph()
g.add_weighted_edges_from(weighted_edges, weight='association_strength')

In [None]:
# `python-louvain` imports as `community`
import community

In [None]:
part = community.best_partition(g, resolution=0.6, random_state=42, weight='association_strength')
n_communities = len(set(part.values()))
print('{} communities detected.'.format(n_communities))

In [None]:
from bokeh.plotting import figure, from_networkx
from bokeh.palettes import Category20, Spectral4
from bokeh.models import MultiLine, TapTool, NodesAndLinkedEdges

Before we make the plot, we will add some extra properties to the nodes in our network. First, we will give each node an attribute, `topic_name`, which is the name of the research topic that the node represents. Second, we will give the node a colour based on the community to which it belongs.

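Note: This code will break if more than 20 communities are detected, because `Category20` has at most 20 colours (it will also fail for fewer than 3, the smallest palette size). In this situation a different colour palette would be needed, or a different way of selecting colours from a small palette.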
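
One possible way of selecting colours from a small palette is to wrap around with the modulo operator. This is only a sketch: the five-colour list and the toy partition are made up for illustration, and the helper `color_for` is hypothetical.

```python
# hypothetical small palette (any list of colour strings would do)
small_palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

def color_for(community_id, palette=small_palette):
    """Pick a colour for a community, cycling when ids exceed the palette."""
    return palette[community_id % len(palette)]

# toy partition in the same {node: community id} form as `part`
toy_partition = {'Topic A': 0, 'Topic B': 3, 'Topic C': 5, 'Topic D': 12}
toy_colors = {node: color_for(c) for node, c in toy_partition.items()}
```

The trade-off is that colours repeat, so two communities can share a colour and are then only distinguishable by their position in the layout.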

In [None]:
names = {k: k for k, _ in part.items()}
nx.set_node_attributes(g, names, name='topic_name')
community_colors = {k: Category20[n_communities][c] for k, c in part.items()}
nx.set_node_attributes(g, community_colors, name='color')

We can now print a node to see the properties it holds.

In [None]:
print(g.nodes['Materials Characterisation'])

To plot our network on a two-dimensional plane, we will need to calculate coordinates for each node. There are ready-made algorithms for positioning network nodes visually, and some are built in to `networkx`. The spring layout tries to position nodes according to their edges and relative levels of attraction based on edge weights.

In [None]:
pos = nx.spring_layout(g, weight='association_strength', scale=2)

Now we have everything we need to make a nice plot. Luckily, `bokeh` has built-in support for `networkx` graphs, which makes plotting and interacting with them easy.

You can read more about this and see examples in the [Bokeh network visualisation documentation](https://bokeh.pydata.org/en/latest/docs/user_guide/graph.html)

In [None]:
# Create a plot and give it some basic features.
plot = figure(title="Research Topic Cooccurrence Network",
              x_range=(-2.1,2.1), y_range=(-2.1,2.1),
             )

# Use the renderer built in to `bokeh` to transform our Graph
# object into something that `bokeh` can plot.
graph_renderer = from_networkx(g, pos, center=(0,0))
# Draw glyphs for our nodes and assign properties for interactions.
# These include how the nodes respond to clicks and hovering.
graph_renderer.node_renderer.glyph = Circle(size=7, fill_color='color', line_color=None)
graph_renderer.node_renderer.selection_glyph = Circle(size=7, fill_color='color')
graph_renderer.node_renderer.hover_glyph = Circle(size=7, fill_color='color')
graph_renderer.node_renderer.muted_glyph = Circle(size=7, fill_color='color', fill_alpha=0.9)
# Draw glyphs for edges and assign properties for interactions.
graph_renderer.edge_renderer.glyph = MultiLine(line_color="#CCCCCC", line_alpha=0.2, line_width=1)
graph_renderer.edge_renderer.selection_glyph = MultiLine(line_color=Spectral4[2], line_width=1.5)
graph_renderer.edge_renderer.hover_glyph = MultiLine(line_color=Spectral4[1], line_width=1.5)
# Add the ability to select nodes.
graph_renderer.selection_policy = NodesAndLinkedEdges()
# Add a hover tool, that allows us to investigate nodes with a tooltip. 
node_hover_tool = HoverTool(tooltips=[("Topic", "@topic_name")])
# Put everything on the plot.
plot.add_tools(node_hover_tool, TapTool())
plot.renderers.append(graph_renderer)

show(plot)

### With Datashader

While we have a nice node layout, and visible communities, we have lots of overlapping edges and understanding the graph edge structure is pretty hard. Luckily we can use Datashader to bundle the edges and render them so that the structure is easier to read.

In [None]:
from datashader.layout import random_layout, circular_layout, forceatlas2_layout
from datashader.bundling import connect_edges, hammer_bundle

In [None]:
def nx_layout(graph, layout):
    data = [[node]+layout[node].tolist() for node in graph.nodes]

    nodes = pd.DataFrame(data, columns=['id', 'x', 'y'])
    nodes.set_index('id', inplace=True)

    edges = pd.DataFrame(list(graph.edges), columns=['source', 'target'])
    return nodes, edges

In [None]:
nodes, edges = nx_layout(g, pos)

In [None]:
bw = 0.1
r_nodes = hv.Points(nodes)
# bundle edges directly from the node and edge DataFrames returned by nx_layout
r_bundled = hv.Curve(hammer_bundle(nodes, edges, initial_bandwidth=bw))

In [None]:
datashade.cmap=colorcet.fire[40:]

In [None]:
# rendering options for the edge layer (assumed values)
ds_opts = dict(width=600, height=600)
ds_network = dynspread(datashade(r_nodes, cmap=["cyan"])) * datashade(r_bundled, **ds_opts)
ds_network

More examples of using Datashader for networks can be found [here](https://datashader.org/user_guide/Networks.html), including a great example of visualising UK [research collaboration networks](https://anaconda.org/jbednar/uk_researchers/notebook).

**Task**

- Make this look pretty!

## Try It Yourself

1. Import a dataset (or use one from above)
2. Pick one or two variables you want to visualise (or something more complex like a network)
3. Plot with Bokeh, Holoviews, and/or Datashader
4. Change the options to add interactivity and customise the plot.

In [None]:
# start here!

## Resources

### Documentation
- [Bokeh](https://docs.bokeh.org/en/latest/index.html)
- [Holoviews](http://holoviews.org/)
- [Datashader](http://datashader.org)
- [Geoviews](http://geoviews.org/)


### Other Interactive Python Visualisation Packages
- [Altair](https://altair-viz.github.io)
- [Plotly](https://plot.ly/python/)


### For Dashboards
- Holoviews
- Bokeh Server