# Community Detection in Topic Cooccurrence Networks: Finding Research Disciplines in UK Research Projects

This tutorial looks at the use of [cooccurrence networks](https://en.wikipedia.org/wiki/Co-occurrence_network) and community detection to identify academic disciplines from research projects in the [Gateway to Research](https://gtr.ukri.org/) database. 

Disciplines are high level subject areas, such as _biological science_ or _engineering_ (they might map well to the faculties of a university). The Gateway to Research data does not categorise projects at this level, but in many cases it is useful to do so. In an analysis, we may want to break down research funding according to discipline, see which projects are multi-disciplinary, or understand the differences in the nature of research outputs in different fields.

## Preamble

In [None]:
%load_ext autoreload
%autoreload 2

!pip install git+https://github.com/nestauk/im_tutorials.git

In [None]:
# importing useful Python utility libraries we'll need
import ast
import smart_open

from collections import Counter, defaultdict
import itertools

# matplotlib for static plots
import matplotlib.pyplot as plt
# numpy for mathematical functions
import numpy as np
# pandas for handling tabular data
import pandas as pd

from im_tutorials.utilities import chunks

## Import Data

The data for this project is stored as a csv on Amazon Web Services (AWS) S3, a static cloud file storage service. We can use `pandas` to pull the data directly into a DataFrame.

In [None]:
bucket='innovation-mapping-tutorials'
gtr_projects_key='gateway-to-research/gtr_projects.csv'
list_cols = ['research_topics', 'research_subjects']
# We use ast.literal_eval to convert the two columns above from
# string representations of lists to actual lists.
gtr_projects_df = pd.read_csv(
    smart_open.smart_open(f'https://s3.us-east-2.amazonaws.com/{bucket}/{gtr_projects_key}'),
    converters={k: ast.literal_eval for k in list_cols}
)
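
To make the role of the `converters` argument more concrete, here is a minimal sketch of what `ast.literal_eval` does to a single cell. The CSV text and project IDs are made up for illustration, and the standard-library `csv` module stands in for `pandas` so that the snippet runs without downloading anything.

```python
import ast
import csv
import io

# Hypothetical two-row CSV in which one column holds lists serialised as strings.
raw_csv = io.StringIO(
    'project_id,research_topics\n'
    'p1,"[\'Sociology\', \'Economics\']"\n'
    'p2,"[]"\n'
)

rows = list(csv.DictReader(raw_csv))

# Straight out of the file, each cell is just a string.
print(type(rows[0]['research_topics']))

# ast.literal_eval safely parses that string into a real Python list.
parsed = [ast.literal_eval(row['research_topics']) for row in rows]
print(parsed)
```

Unlike `eval`, `ast.literal_eval` only accepts Python literals, which is why it is a reasonable choice for parsing data read from a file.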

A quick look at the top few rows of data shows us the fields and the format of the data within each column.

In [None]:
gtr_projects_df.head()

At first glance, the data in the research topic and research subject fields look fairly similar. For every project, both fields contain a list of terms that appear similar in content, although the topics may be more granular than the subjects. We can count how many unique terms there are in each field to find out whether that is the case.

In [None]:
# flatten the lists of research subjects and topics and count the contents
research_subject_counter = Counter(itertools.chain(*gtr_projects_df['research_subjects']))
research_topic_counter = Counter(itertools.chain(*gtr_projects_df['research_topics']))
print('There are {} unique research subjects in the GtR projects dataset.'.format(len(research_subject_counter)))
print('There are {} unique research topics in the GtR projects dataset.'.format(len(research_topic_counter)))

It looks like we were probably right. There are 82 research subjects, while there are over 600 research topics, indicating that these might be a finer representation of the contents of each project.

Let's also have a look at the frequencies of each subject and topic.

In [None]:
print("Top Research Subjects by Frequency", '\n')
print('{:<40}{}'.format('Subject', 'Frequency'))
for k, v in research_subject_counter.most_common(10):
    print('{:<40}{}'.format(k, v))

print('\nMedian Subject Frequency:')
print(np.median(list(research_subject_counter.values())))

In [None]:
print("Top Research Topics by Frequency", '\n')
print('{:<40}{}'.format('Topic', 'Frequency'))
for k, v in research_topic_counter.most_common(10):
    print('{:<40}{}'.format(k, v))
    
print('\nMedian Topic Frequency:')
print(np.median(list(research_topic_counter.values())))

We can see that the top research subject is _Info. & commun. Technol._ and the top research topic is _Climate & Climate Change_, both by some margin. However, we can also see that the top spots are populated by subjects and topics from several disciplines. 50% of the subjects occur 575 times or fewer, while for topics the median frequency is 69.

While the research topics and subjects are useful as keywords, neither tells us directly which high level discipline a project belongs to.

## Discipline Identification Through Community Detection

### Cooccurrence Networks

We are going to define communities of research topics as groups of topics which commonly occur together. An effective way of finding these clusters, and visualising the results, is by creating a topic cooccurrence network.

A cooccurrence graph is a network structure in which nodes are elements and an edge between two nodes means that their elements have cooccurred at least once. The edges can then be "weighted" by the frequency of each cooccurring pair. In the case of our research projects, we can say that two topics have cooccurred if they appear in at least one project together. To find all cooccurrences we therefore need to find the pairwise combinations of research topics for every project. For example, a single project with the topics
```
['Materials Characterisation', 'High Performance Computing', 'Condensed Matter Physics']
```

will become a set of topic pairs:

In [None]:
# The combinations function from itertools generates all the possible
# combinations of length r from the elements of a list (here r = 2).
list(itertools.combinations(['Materials Characterisation', 'High Performance Computing', 'Condensed Matter Physics'], 2))

These cooccurrences would form a triangular network; 3 nodes and 3 edges, where each edge has a frequency weight of 1.

Let's now imagine that we have several projects, and we repeat this process for each of them in turn. We will generate a list of cooccurring pairs, which we can then turn into a small cooccurrence network. The image below shows the cooccurrence network that is generated by applying this method to 3 projects. We can see that:

- Project 1 forms a single cooccurring pair.
- _Economic & Social History_ and _Music & Society_ are present in more than one project and bridge groups of topics that have not appeared together.

<img src="https://github.com/nestauk/im_tutorials/blob/master/img/topic_cooccurrence_network.png?raw=true" alt="drawing" width="700"/>

It is easy to see how repeating this process across hundreds or thousands of projects could quickly build up a picture of which topics commonly cooccur and form clusters that we might be able to identify as subjects or disciplines.
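
As a rough sketch of that process, the snippet below counts cooccurring pairs across three toy projects. The projects and their topics are invented for illustration and are not the ones shown in the image above.

```python
from collections import Counter
import itertools

# Three hypothetical projects, each tagged with a list of topics.
toy_projects = [
    ['Sociology', 'Economics'],
    ['Economics', 'Statistics', 'Sociology'],
    ['Statistics', 'Machine Learning'],
]

toy_pairs = Counter()
for topics in toy_projects:
    for pair in itertools.combinations(topics, 2):
        # Sorting each pair means (A, B) and (B, A) count as the same edge.
        toy_pairs[tuple(sorted(pair))] += 1

for pair, count in toy_pairs.most_common():
    print(pair, count)
```

Here _Economics_ and _Sociology_ appear together in two projects, so their edge gets a frequency weight of 2, while the other three edges each have a weight of 1.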

To create a cooccurrence network across all projects in our dataset, we will loop over the projects and collect all of the cooccurring pairs in one long list. An equivalent one-liner, which uses a list comprehension and chains the results together, is included as a comment.

In [None]:
# Generate every pair combination of research topics from each project.
# Each pair is sorted alphabetically to make sure that there is only one 
# possible permutation of each edge.
cooccurrences = []

for topics in gtr_projects_df['research_topics']:
    topic_pairs = itertools.combinations(topics, 2)
    for pair in topic_pairs:
        cooccurrences.append(tuple(sorted(pair)))

# The same can be achieved in this one-liner
# cooccurrences = list(itertools.chain(*[
#     [tuple(sorted(c)) for c in itertools.combinations(d, 2)]
#     for d in gtr_projects_df['research_topics']]))

# Count the frequency of each cooccurring pair.
research_topic_co_counter = Counter(cooccurrences)

In [None]:
print("Top Research Topic Cooccurrences by Frequency", '\n')
print('{:<70}{}'.format('Cooccurrence', 'Frequency'))
for k, v in research_topic_co_counter.most_common(20):
    print('{:<70}{}'.format((k[0] + ' + ' +k[1]), v))
    
print('\nMedian Topic Cooccurrence Frequency:')
print(np.median(list(research_topic_co_counter.values())))

### Normalising Edge Weights

Looking at the most frequently cooccurring topics, we can see pairs that make intuitive sense and are all generally captured neatly within higher order academic disciplines.

However this, along with the individual topic frequencies, also shows us that using the raw cooccurrence frequency as our edge weight might not be such a good idea. High frequency elements are simply more likely to cooccur by chance, so we should normalise our edge weights. One method is to calculate the association strength, an edge weight in which the cooccurrence frequency is normalised by the product of the individual terms' occurrence counts. It is defined as

$$ a = \frac{2 n c_{ij}}{o_{i}o_{j}} $$

where $n$ is the total number of elements, $c_{ij}$ is the number of cooccurrences between elements $i$ and $j$, and $o_{i}$ and $o_{j}$ are the individual frequency counts of each element.
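
As a worked example of the formula, the sketch below plugs in made-up counts for two pairs that cooccur equally often. The function name and all of the numbers are assumptions chosen for illustration only.

```python
import math

def toy_association_strength(n, c_ij, o_i, o_j):
    """Association strength a = 2 * n * c_ij / (o_i * o_j)."""
    return (2 * n * c_ij) / (o_i * o_j)

n = 1000  # hypothetical total number of topic occurrences

# Both pairs cooccur 5 times, but the first pair is made of very common
# topics and the second of rare ones.
common_pair = toy_association_strength(n, c_ij=5, o_i=200, o_j=100)
rare_pair = toy_association_strength(n, c_ij=5, o_i=10, o_j=5)

print(common_pair, math.log10(common_pair))
print(rare_pair, math.log10(rare_pair))
```

With the same raw cooccurrence count, the pair of rare topics receives a much higher association strength (200 against 0.5), because five cooccurrences are far more notable for topics that seldom appear at all.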

In [None]:
def association_strength(combo, occurrences, cooccurrences, total):
    '''association_strength
    Calculates the association strength between a cooccurring pair.
    '''
    a_s = ((2 * total * cooccurrences[combo]) / 
           (occurrences[combo[0]] * occurrences[combo[1]]))
    return a_s

To build our cooccurrence network, we need to generate a list of unique edges from our long list of cooccurrences and then calculate the association strength for each edge.

In [None]:
# Generate a set of cooccurrences (the unique pairs).
# This will form the edges of our cooccurrence graph.
edges = set(cooccurrences)
# Calculate the total number of elements
n = len(list(itertools.chain(*gtr_projects_df['research_topics'])))
# Calculate the association strength for each edge.
# We take the log of the association strength, which brings its
# distribution close to a normal distribution.
assoc_strengths = np.log10([association_strength(
    edge,
    research_topic_counter, 
    research_topic_co_counter, 
    n) for edge in edges])

In [None]:
fig, ax = plt.subplots()
ax.hist(assoc_strengths, bins=100)
ax.set_xlabel('log10(Association Strength)')
plt.show()

The log-transformed association strengths follow a fairly smooth, roughly normal distribution. We can see that without applying a logarithm, there would be weights in our graph 100,000 times larger than others!

### Building the Cooccurrence Network


Python has 3 main tools for working with networks: [`networkx`](https://networkx.github.io/), [`igraph`](https://igraph.org/redirect.html) and [`graph-tool`](https://graph-tool.skewed.de). The first of these, `networkx`, is easy to install and interacting with it is straightforward. It is suitable for networks with up to hundreds of thousands of nodes or edges. With very large networks, it is recommended to use `graph-tool`.

In [None]:
import networkx as nx

To add the edges, we simply create a list of tuples that represent our edges, with each containing the source node `s`, the target node `t`, and the association strength `a_s`. We then instantiate a `networkx` `Graph` object, and simply use the method `.add_weighted_edges_from()` to put the list of edges into the network.

In [None]:
weighted_edges = []
for (s, t), a_s in zip(edges, assoc_strengths):
    weighted_edges.append((s, t, a_s))

g = nx.Graph()
g.add_weighted_edges_from(weighted_edges, weight='association_strength')

We can then call on an edge in the graph to view its properties.

In [None]:
print(g.edges[('Materials Characterisation', 'Materials Synthesis & Growth')])

### Community Detection

Community detection is the process of finding sets of nodes in a network that are densely connected internally. Algorithms for this process generally find the boundaries of communities by comparing the density of connections within a group of nodes with the density of connections to nodes outside the group. A pair of nodes is more likely to be connected if they are both members of the same community.


<img src="https://github.com/nestauk/im_tutorials/blob/master/img/community_detection.png?raw=true" alt="communities" width=200>
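
Many community detection algorithms, including the Louvain method, score a candidate partition by its modularity: the share of edges that fall inside communities minus the share expected if edges were placed at random while keeping node degrees fixed. The sketch below computes modularity by hand for an unweighted toy graph. The graph, the partitions and the function name are all made up for illustration, and this is not how `python-louvain` is implemented internally.

```python
from collections import defaultdict

def toy_modularity(edges, partition):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal_edges = defaultdict(int)  # edges with both ends in the community
    total_degree = defaultdict(int)    # sum of node degrees in the community
    for u, v in edges:
        total_degree[partition[u]] += 1
        total_degree[partition[v]] += 1
        if partition[u] == partition[v]:
            internal_edges[partition[u]] += 1
    return sum(
        internal_edges[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )

# Two triangles joined by a single bridging edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

two_groups = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}
one_group = {node: 'a' for node in range(6)}

q_two = toy_modularity(toy_edges, two_groups)
q_one = toy_modularity(toy_edges, one_group)
print(q_two, q_one)
```

Splitting the graph into its two triangles scores higher than lumping every node into one community, which matches the intuition that each triangle is densely connected internally and only weakly connected to the other.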

There are [many different types of community detection](https://github.com/benedekrozemberczki/awesome-community-detection). Here we will use the Louvain Method, as there is an actively maintained, easy to use Python implementation, [`python-louvain`](https://python-louvain.readthedocs.io).

In [None]:
# `python-louvain` imports as `community`
import community

To find which community each research topic is in, we apply `best_partition` to our cooccurrence network. We can vary the resolution to change how granular the community detection is. We also pass in the name of the edge weight that we want the method to use when determining where community boundaries are.

In [None]:
part = community.best_partition(g, resolution=0.6, random_state=42, weight='association_strength')
n_communities = len(set(part.values()))
print('{} communities detected.'.format(n_communities))

### Interactive Network Visualisation

Now we have a cooccurrence network with each node assigned to a community. This seems like a nice place to visualise our output so far. To do this, we will use [`bokeh`](https://bokeh.pydata.org/en/latest/), a Python library for creating interactive plots. `bokeh` renders its plots in the browser using its own JavaScript library, BokehJS.

In [None]:
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from bokeh.palettes import Category20, Spectral4
from bokeh.models import Circle, MultiLine, HoverTool, TapTool
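# Note: in more recent bokeh releases, `from_networkx` is imported
# from `bokeh.plotting` instead.
from bokeh.models.graphs import from_networkx, NodesAndLinkedEdges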

output_notebook()

Before we make the plot, we will add some extra properties to the nodes in our network. First, we will give each node an attribute, `topic_name`, which is the name of the research topic that the node represents. Second, we will give the node a colour based on the community to which it belongs.

Note: This code will break if more than 20 communities are used, because the `Category20` palette only provides between 3 and 20 colours. In this situation a different colour palette would be needed, or a different way of selecting colours from a small palette.
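
One possible way of selecting colours from a small palette is to wrap around it with the modulo operator. The sketch below assumes a hypothetical five-colour palette and a toy partition with eight communities; neither comes from the dataset.

```python
# A hypothetical five-colour palette; any list of hex colours would do.
small_palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

# A toy partition in the same shape as the output of `best_partition`:
# node name -> integer community id.
toy_part = {'topic_{}'.format(i): i for i in range(8)}

# Wrapping around the palette means any community id maps to a valid
# colour, at the cost of some communities sharing one.
toy_colors = {
    node: small_palette[community_id % len(small_palette)]
    for node, community_id in toy_part.items()
}
print(toy_colors['topic_0'], toy_colors['topic_5'], toy_colors['topic_7'])
```

The trade-off is that communities 0 and 5 end up with the same colour, so this only works well when the layout already separates the communities visually.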

In [None]:
names = {k: k for k, _ in part.items()}
nx.set_node_attributes(g, names, name='topic_name')
community_colors = {k: Category20[n_communities][c] for k, c in part.items()}
nx.set_node_attributes(g, community_colors, name='color')

We can now print a node to see the properties it holds.

In [None]:
print(g.nodes['Materials Characterisation'])

To plot our network on a 2 dimensional plane, we will need to calculate coordinates for each node. There are ready-made algorithms for positioning network nodes visually, and some are built into `networkx`. The spring layout tries to position nodes according to their edges and relative levels of attraction based on edge weights.

In [None]:
pos = nx.spring_layout(g, weight='association_strength', scale=2, seed=42)

Now we have everything we need to make a nice plot. Luckily, `bokeh` has built-in support for `networkx` graphs, which makes plotting and interacting with them easy.

In [None]:
# Create a plot and give it some basic features.
plot = figure(title="Research Topic Cooccurrence Network",
              x_range=(-2.1,2.1), y_range=(-2.1,2.1),
             )

# Use the renderer built in to `bokeh` to transform our Graph
# object into something that `bokeh` can plot.
graph_renderer = from_networkx(g, pos, center=(0,0))
# Draw glyphs for our nodes and assign properties for interactions.
graph_renderer.node_renderer.glyph = Circle(size=7, fill_color='color', line_color=None)
graph_renderer.node_renderer.selection_glyph = Circle(size=7, fill_color='color')
graph_renderer.node_renderer.hover_glyph = Circle(size=7, fill_color='color')
graph_renderer.node_renderer.muted_glyph = Circle(size=7, fill_color='color', fill_alpha=0.9)
# Draw glyphs for edges and assign properties for interactions.
graph_renderer.edge_renderer.glyph = MultiLine(line_color="#CCCCCC", line_alpha=0.2, line_width=1)
graph_renderer.edge_renderer.selection_glyph = MultiLine(line_color=Spectral4[2], line_width=1.5)
graph_renderer.edge_renderer.hover_glyph = MultiLine(line_color=Spectral4[1], line_width=1.5)
# Add the ability to select nodes.
graph_renderer.selection_policy = NodesAndLinkedEdges()
# Add a hover tool, that allows us to investigate nodes with a tooltip. 
node_hover_tool = HoverTool(tooltips=[("Topic", "@topic_name")])
# Put everything on the plot.
plot.add_tools(node_hover_tool, TapTool())
plot.renderers.append(graph_renderer)

show(plot)

# Uncomment this line if using google colab
# output_notebook()

### Investigating the Communities

Let's manually inspect the topics in each community to see if we can see what disciplines they might form.

In [None]:
reverse_part = defaultdict(list)
for k, v in part.items():
    reverse_part[v].append(k)
    
for c, topics in reverse_part.items():
    print(c)
    for chunk in chunks(topics, 4):
        print(', '.join(chunk))
    print('')

We can now create a community ID to discipline mapping.

In [None]:
community_discipline_map = {
    0: '',
    1: '',
    2: '',
    3: '',
    4: '',
    5: '',
    6: '',
    7: '',
    8: '',
    9: '',
}

### Assigning Subjects to Projects

If we want to do any analysis on the research projects using the discipline as a feature, we need to label each project with the correct discipline, according to its research topics.

The first step is to map each topic to the discipline community that it belongs to.

In [None]:
topic_discipline_mapping = {top: community_discipline_map[disc] for top, disc in part.items()}

Now we have this mapping we can:

1. Apply it to the research topics
    - `['Sociology', 'Economics', 'Information & Knowledge Mgmt']` might become `['social', 'social', 'maths_computing_ee']`
2. Get the unique set of disciplines for each project
    - `['social', 'social', 'maths_computing_ee']` becomes `{'social', 'maths_computing_ee'}`
3. Count the number of disciplines in each project
4. Flag projects that are mono-disciplinary
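
The four steps above can be sketched in plain Python on a couple of toy projects. The mapping below reuses the example labels from the list and is hypothetical; the real labels depend on how the community to discipline map is filled in.

```python
# A hypothetical topic -> discipline mapping.
toy_mapping = {
    'Sociology': 'social',
    'Economics': 'social',
    'Information & Knowledge Mgmt': 'maths_computing_ee',
}

toy_project_topics = [
    ['Sociology', 'Economics', 'Information & Knowledge Mgmt'],
    ['Sociology', 'Economics'],
]

results = []
for topics in toy_project_topics:
    disciplines = [toy_mapping[t] for t in topics]  # step 1: map topics
    discipline_set = set(disciplines)               # step 2: unique disciplines
    n_disciplines = len(discipline_set)             # step 3: count them
    is_single = n_disciplines == 1                  # step 4: mono-disciplinary?
    results.append((sorted(discipline_set), n_disciplines, is_single))

for row in results:
    print(row)
```

The first toy project spans two disciplines, while the second collapses to a single one even though it has two topics.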

In [None]:
# Map topics to disciplines using pandas' apply method on
# the `research_topics` column.
gtr_projects_df['disciplines'] = gtr_projects_df['research_topics'].apply(
    lambda x: [topic_discipline_mapping[val] for val in x])
gtr_projects_df['discipline_set'] = [set(d) for d in gtr_projects_df['disciplines']]

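# Projects funded by MRC and NC3Rs have no research topics.
# We will make the assumption that they are all medical_sciences.
# The column is rebuilt in one pass to avoid chained assignment.
is_medical = gtr_projects_df['funder_name'].isin(['MRC', 'NC3Rs'])
gtr_projects_df['discipline_set'] = [
    {'medical_sciences'} if medical else discipline_set
    for discipline_set, medical in zip(gtr_projects_df['discipline_set'], is_medical)]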
# Count the number of unique disciplines for each project
gtr_projects_df['n_disciplines'] = [len(x) for x in gtr_projects_df['discipline_set']]
# Create a field that flags whether a project is mono-disciplinary
gtr_projects_df['is_single_discipline'] = [True if len(x)==1 else np.nan if len(x)==0 else False 
                                           for x in gtr_projects_df['discipline_set']]

print('{:.2f}% of projects are mono-disciplinary.'.format(gtr_projects_df['is_single_discipline'].mean() * 100))

Let's have another look at our dataframe now that we've added these extra research discipline fields. 

In [None]:
gtr_projects_df.head()

### Analysis

Now that we have our projects labelled by discipline, we can do some analysis.

#### Interdisciplinarity

First, we are going to look at which disciplines are commonly found together in research projects to see what the landscape of interdisciplinary research is like in the UK.

To do this, we are going to apply our method for finding cooccurring pairs of entities to the `discipline_set` field.

In [None]:
# This time we apply our one-liner to find cooccurring disciplines
discipline_cooccurrences = list(
    itertools.chain(*[[tuple(sorted(c)) for c in itertools.combinations(d, 2)] for d in gtr_projects_df['discipline_set']])
)
# Count the frequency of each cooccurring pair.
discipline_edge_counter = Counter(discipline_cooccurrences)

We then create a pivot table of our discipline pair counts.

In [None]:
discipline_cooccurrence_df = pd.DataFrame({
    'subj0': [dcc[0] for dcc in discipline_edge_counter.keys()],
    'subj1': [dcc[1] for dcc in discipline_edge_counter.keys()],
    'count': list(discipline_edge_counter.values()),
}).pivot_table(index='subj0', columns='subj1')['count']

In [None]:
# Seaborn is a plotting library based on matplotlib
# It has lots of nice presets for statistical plotting
import seaborn as sns

Finally, let's plot a heatmap of the frequency of disciplinary pairs.

In [None]:
def format_discipline_labels(labels):
    return [l.get_text().replace('_', ' ').title() for l in labels]

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(discipline_cooccurrence_df, annot=True, fmt='.0f', ax=ax, cbar=None, cmap='viridis')
ax.set_xticklabels(format_discipline_labels(ax.get_xticklabels()), rotation=30, ha='right')
ax.set_yticklabels(format_discipline_labels(ax.get_yticklabels()))
ax.invert_yaxis()
ax.set_xlabel(None)
ax.set_ylabel(None)
ax.set_title('Discipline Crossover in Multidisciplinary Projects')
plt.show()

While we're at it, let's look at the distribution of disciplinarity among the projects.

In [None]:
fig, ax = plt.subplots()
gtr_projects_df['n_disciplines'].value_counts().plot.bar(color='C0', ax=ax)
ax.set_xlabel('N Disciplines')
ax.set_ylabel('Frequency')
ax.set_title('Project Frequency by Discipline Count')
plt.show()

#### Disciplines and Funding Bodies

It could be argued that we could infer the discipline of a project from its funding body. However, the funding bodies offer a coarser level of domain granularity than we may want, and relying on them would ignore the overlap of disciplines between funders that exists in the real world.

Let's have a look at how our disciplines match up against the funders. To do this, we will create another heatmap, this time plotting the fraction of projects that contain a discipline, broken down by funding body.

In [None]:
from sklearn.preprocessing import MultiLabelBinarizer

In [None]:
# Use every discipline label present in the data as a class, so that
# none is dropped by the binarizer.
discipline_classes = sorted(set(itertools.chain(*gtr_projects_df['discipline_set'])))
mlb = MultiLabelBinarizer(classes=discipline_classes)
discipline_binarized = mlb.fit_transform(gtr_projects_df['discipline_set'])
discipline_binarized_df = pd.DataFrame(discipline_binarized, columns=mlb.classes_)

# Group projects by funder and count the discipline labels, then normalise
# by the total number of labels for each funder (rows add to 100)
funder_discipline_df = discipline_binarized_df.groupby(gtr_projects_df['funder_name']).sum().divide(
    discipline_binarized_df.groupby(gtr_projects_df['funder_name']).sum().sum(axis=1), axis=0) * 100
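
To illustrate the row normalisation, here is a minimal pure-Python sketch of the same calculation on a handful of made-up records. The funder and discipline names are only placeholders, and plain dictionaries stand in for the `pandas` group-by.

```python
from collections import Counter, defaultdict

# Hypothetical (funder, discipline_set) records.
toy_records = [
    ('ESRC', {'social'}),
    ('ESRC', {'social', 'maths_computing_ee'}),
    ('EPSRC', {'maths_computing_ee'}),
    ('EPSRC', {'maths_computing_ee', 'social'}),
    ('EPSRC', {'maths_computing_ee'}),
]

# Count how often each discipline label appears for each funder.
label_counts = defaultdict(Counter)
for funder, discipline_set in toy_records:
    label_counts[funder].update(discipline_set)

# Divide by the funder's total label count so that each row sums to 100.
toy_percentages = {
    funder: {d: 100 * c / sum(counts.values()) for d, c in counts.items()}
    for funder, counts in label_counts.items()
}
print(toy_percentages)
```

Because the denominator is the number of discipline labels rather than the number of projects, a project with two disciplines contributes to two cells of its funder's row.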

In [None]:
fig, ax = plt.subplots(figsize=(6, 4.5))
sns.heatmap(funder_discipline_df, annot=True, fmt='.0f', ax=ax, cmap='viridis')
ax.set_title('Percentages of Projects Containing a Discipline by Funder')
ax.set_xlabel('Discipline')
ax.set_ylabel('Funder')
ax.set_xticklabels(ax.get_xticklabels(), rotation=30, ha='right')
plt.show()

- AHRC (Arts and Humanities Research Council)
- BBSRC (Biotechnology and Biological Sciences Research Council)     
- EPSRC (Engineering and Physical Sciences Research Council) 
- ESRC (Economic and Social Research Council) 
- JISC (Joint Information Systems Committee) 
- MRC (Medical Research Council)
- NC3Rs (The National Centre for the 3Rs)
- NERC (Natural Environment Research Council) 
- STFC (Science and Technology Facilities Council)

## Conclusions

In this tutorial, we have seen how to form a small cooccurrence network of research topics, apply community detection and identify clusters of commonly cooccurring topics, which can be considered as an approximation of high level research disciplines.

**Where can we go from here?**

For a start, cooccurrence networks and community detection do not need to be applied only to topics or keywords. For example, we could use them for people or organisations to study the nature of social networks and collaborations.

We can also combine the results from a method such as the one shown here with other techniques, such as supervised machine learning, to create a document labelling algorithm. In this instance, we could train a model to predict discipline labels from project descriptions, and then apply the model to another dataset that does not have research topic, subject or discipline tags. We have used this to transfer discipline labels from Gateway to Research to CORDIS, the European Union's research project database.

Or perhaps we might explore the possibilities of creating a more detailed data visualisation, for example one that helps the user to see the hierarchical relationships between topics, subjects and disciplines.

How could you apply these methods to your domain or data?

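## Extra: Aggregating Community Detection

The class below runs the Louvain method several times with different random seeds, counts how often each pair of topics lands in the same community, and builds a new graph weighted by those counts. Communities are then detected again on this aggregated graph.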

In [None]:
class AggregatePartition:
    '''AggregatePartition'''
    def __init__(self, graph):
        self.graph = graph
    
    def edgelist_to_cooccurrence(self, repeats, **best_partition_kwargs):
        edge_counter = Counter()
        for i in range(repeats):
            partition = community.best_partition(self.graph, random_state=i, **best_partition_kwargs)
            edgelist = self.partition_to_edgelist(partition)
            edge_counter.update(edgelist)

        g = nx.Graph()
        g.add_weighted_edges_from([(e[0][0], e[0][1], e[1]) for e in edge_counter.items()])
        return g
    
    def partition_to_edgelist(self, partition):
        partition_reverse_mapping = self.reverse_index_partition(partition)
        edgelist = []
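        # Only the members of each community are needed here. Iterating over
        # the values also avoids shadowing the imported `community` module.
        for elements in partition_reverse_mapping.values():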
            combos = [tuple(sorted(e)) for e in itertools.combinations(elements, 2)]
            edgelist.extend(combos)
        return edgelist
     
    def reverse_index_partition(self, partition):
        partition_reverse_mapping = defaultdict(list)
        for k, v in partition.items():
            partition_reverse_mapping[v].append(k)
        return partition_reverse_mapping

In [None]:
cp = AggregatePartition(g)
c_co = cp.edgelist_to_cooccurrence(5, resolution=0.8)

In [None]:
part_c_co = community.best_partition(c_co, resolution=0.4, random_state=42, weight='weight')
n_c_co_communities = len(set(part_c_co.values()))
print('{} communities detected.'.format(n_c_co_communities))

In [None]:
names = {k: k for k, _ in part.items()}
nx.set_node_attributes(c_co, names, name='topic_name')
c_co_community_colors = {k: Category20[n_c_co_communities][c] for k, c in part_c_co.items()}
nx.set_node_attributes(c_co, c_co_community_colors, name='color')

In [None]:
pos = nx.spring_layout(c_co, weight='weight', scale=2, seed=42)

plot = figure(title="Research Topic Cooccurrence Network",
              x_range=(-2.1,2.1), y_range=(-2.1,2.1),
             )

graph_renderer = from_networkx(c_co, pos, center=(0,0))
graph_renderer.node_renderer.glyph = Circle(size=7, fill_color='color', line_color=None)
graph_renderer.node_renderer.selection_glyph = Circle(size=7, fill_color='color')
graph_renderer.node_renderer.hover_glyph = Circle(size=7, fill_color='color')
graph_renderer.node_renderer.muted_glyph = Circle(size=7, fill_color='color', fill_alpha=0.9)


graph_renderer.edge_renderer.glyph = MultiLine(line_color="#CCCCCC", line_alpha=0.2, line_width=1)
graph_renderer.edge_renderer.selection_glyph = MultiLine(line_color=Spectral4[2], line_width=1.5)
graph_renderer.edge_renderer.hover_glyph = MultiLine(line_color=Spectral4[1], line_width=1.5)

graph_renderer.selection_policy = NodesAndLinkedEdges()

node_hover_tool = HoverTool(tooltips=[("Topic", "@topic_name")])
plot.add_tools(node_hover_tool, TapTool())

plot.renderers.append(graph_renderer)

show(plot)