# Navigating the Hemibrain connectome
*created by Gabrielle J. Gutierrez, PhD for CAMP 2023 Pune, India*

This notebook is designed to introduce students to some basic functionality of the neuprint-python API and to the nuances of the Hemibrain data for the *Drosophila* brain connectome. Many of the examples will feature the circadian clock neurons analyzed in: Orie T Shafer, Gabrielle J Gutierrez, Kimberly Li, Amber Mildenhall, Daphna Spira, Jonathan Marty, Aurel A Lazar, Maria de la Paz Fernandez (2022) Connectomic analysis of the Drosophila lateral neuron clock cells reveals the synaptic basis of functional pacemaker classes *eLife 11:e79139* https://doi.org/10.7554/eLife.79139

The documentation for many of the functions that we'll be using can be found here: https://connectome-neuprint.github.io/neuprint-python/docs/queries.html.

## Getting set up  
Enter your client information below to start a neuprint session. We'll also import the most important packages we'll need. 

In [1]:
from neuprint import Client
c = Client('neuprint.janelia.org', dataset='hemibrain:v1.2.1', token='YOUR-PERSONAL-TOKEN')
# insert your personal token above (or omit token= and set the NEUPRINT_APPLICATION_CREDENTIALS environment variable). see https://connectome-neuprint.github.io/neuprint-python/docs/quickstart.html#client-and-authorization-token for instructions
c.fetch_version()

'0.1.0'

In [2]:
# import important stuff here
import numpy as np
import pandas as pd
import matplotlib

Every neuron, or fragment of a neuron, has its own body ID. Below is a manually created list of the body IDs for the labeled and annotated clock neurons in the Hemibrain. We'll use these body IDs to access information about these neurons from neuprint.

In [3]:
clock_bodyIds = [2068801704, 1664980698, 2007068523, 1975347348, 5813056917, 5813021192, 5813069648, 511051477,
                  296544364, 448260940, 5813064789, 356818551, 480029788, 450034902, 546977514, 264083994, 5813022274,
                  5813010153, 324846570, 325529237, 387944118, 387166379, 386834269, 5813071319, 1884625521,
                  2065745704, 5813001741, 5813026773]

## Fetch neurons
We'll start by fetching summary information about each of these neurons using the fetch_neurons function. This function takes neuron criteria as its input and returns two dataframes: the first contains summary information about each neuron that matches the criteria, and the second contains the number of synaptic sites on each neuron broken down by the ROI in which those sites are located. 

In [4]:
from neuprint import fetch_neurons

neuron_df, roi_counts_df = fetch_neurons(clock_bodyIds)

Display each of the dataframes below and look at the information in their columns. In addition to a bodyId, each neuron has a type and an instance. There are some objects that have a bodyId but no type or instance. These tend to be fragments of neurons that remain unidentified in the Hemibrain data.

The pre and post columns give the numbers of presynaptic and postsynaptic sites attributed to the neuron. The presynaptic sites are where the neuron releases neurotransmitter; the postsynaptic sites are where it receives input. The downstream and upstream columns count connections rather than sites: downstream is the number of postsynaptic sites on partner neurons that are connected to this neuron's presynaptic sites, and upstream is the number of connections onto this neuron's postsynaptic sites. There are often multiple post sites for every pre site, so downstream is usually several times larger than pre, while upstream matches post for every neuron in the table below. Take note, though, that in general not every pre site is connected to a post site and vice versa.
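As a quick sanity check on these definitions, the sketch below copies three rows (and a few columns) from the neuron_df output below into a small dataframe and computes how many partner postsynaptic sites each presynaptic site reaches on average. It assumes the downstream/upstream interpretation described above; the name toy_neurons is just for illustration.

```python
import pandas as pd

# Three rows copied from the neuron_df output below (columns abridged)
toy_neurons = pd.DataFrame({
    'bodyId': [264083994, 296544364, 324846570],
    'type': ['DN1a', 'LNd', 'DN1pA'],
    'pre': [395, 281, 187],
    'post': [1277, 733, 451],
    'downstream': [2847, 2068, 1085],
    'upstream': [1277, 733, 451],
})

# Average number of partner postsynaptic sites per presynaptic site
toy_neurons['downstream_per_pre'] = (toy_neurons['downstream'] / toy_neurons['pre']).round(2)

# In these rows upstream equals post: each postsynaptic site counts as one input connection
print((toy_neurons['upstream'] == toy_neurons['post']).all())  # True
print(toy_neurons[['type', 'downstream_per_pre']])
```

For these three neurons, each presynaptic site reaches roughly 6 to 7 postsynaptic sites on other neurons.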

Mito is the number of mitochondria that were counted in the neuron. CellBodyFiber is related to the hemilineage of the neuron. It indicates which neurons likely derived from the same stem cell. InputRois, outputRois, and roiInfo contain information about the ROIs in which the synaptic sites of this neuron are located. Much of this information is also contained in the roi_counts_df dataframe.
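The roiInfo column appears (truncated in the output below) to hold a nested dictionary: one entry per ROI, each mapping count names to values. Assuming it is a plain Python dict, it can be unpacked into a small table. The pre/post values below are copied from the roi_counts_df output for bodyId 264083994; the variable names are illustrative.

```python
import pandas as pd

# Nested dict in the same shape as a roiInfo entry: ROI name -> {count name: value}.
# pre/post values copied from the roi_counts_df output for bodyId 264083994.
toy_roi_info = {
    'SNP(R)': {'pre': 232, 'post': 1037},
    'SLP(R)': {'pre': 230, 'post': 1022},
    'dACA(R)': {'pre': 14, 'post': 85},
}

# One row per ROI, one column per count
roi_table = pd.DataFrame.from_dict(toy_roi_info, orient='index')
print(roi_table.sort_values('post', ascending=False))
```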

There is also information about the status of the data in the Hemibrain, for example, the extent to which the neuron has been traced from the EM reconstruction. 

In [5]:
display(neuron_df)

| | bodyId | instance | type | pre | post | downstream | upstream | mito | size | status | cropped | statusLabel | cellBodyFiber | somaRadius | somaLocation | inputRois | outputRois | roiInfo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 264083994 | DN1a_R | DN1a | 395 | 1277 | 2847 | 1277 | 326 | 1277856419 | Traced | False | Roughly traced | PDM10 | 270.0 | [11339, 22506, 4104] | [AME(R), CA(R), INP, MB(+ACA)(R), MB(R), OL(R)... | [AME(R), CA(R), INP, MB(+ACA)(R), MB(R), OL(R)... | {'SNP(R)': {'pre': 232, 'post': 1037, 'downstr... |
| 1 | 296544364 | LNd_R | LNd | 281 | 733 | 2068 | 733 | 284 | 1629467924 | Traced | False | Roughly traced | ADL30 | 451.5 | [4088, 26003, 19232] | [SIP(R), SLP(R), SMP(L), SMP(R), SNP(L), SNP(R)] | [SIP(R), SLP(R), SMP(L), SMP(R), SNP(L), SNP(R)] | {'SNP(R)': {'pre': 219, 'post': 692, 'downstre... |
| 2 | 324846570 | DN1pA_R | DN1pA | 187 | 451 | 1085 | 451 | 153 | 835395927 | Traced | False | Roughly traced | PDM24 | 278.0 | [17791, 19036, 5000] | [SLP(R), SMP(L), SMP(R), SNP(L), SNP(R)] | [SLP(R), SMP(L), SMP(R), SNP(L), SNP(R)] | {'SNP(R)': {'pre': 99, 'post': 371, 'downstrea... |
| 3 | 325529237 | DN1pA_R | DN1pA | 201 | 443 | 1161 | 443 | 148 | 823942629 | Traced | False | Roughly traced | PDM24 | 339.0 | [17387, 19226, 5776] | [SLP(R), SMP(L), SMP(R), SNP(L), SNP(R)] | [SLP(R), SMP(L), SMP(R), SNP(L), SNP(R)] | {'SNP(R)': {'pre': 116, 'post': 373, 'downstre... |
| 4 | 356818551 | LPN_R | LPN | 646 | 1511 | 4668 | 1511 | 396 | 1966628095 | Traced | False | Roughly traced | PDL18 | 366.0 | [8635, 11798, 15840] | [CA(R), INP, MB(+ACA)(R), MB(R), PLP(R), SCL(R... | [INP, MB(+ACA)(R), PLP(R), SCL(R), SIP(R), SLP... | {'SNP(R)': {'pre': 636, 'post': 1381, 'downstr... |
| 5 | 386834269 | DN1pB_R | DN1pB | 572 | 1121 | 3504 | 1121 | 328 | 1899760660 | Traced | False | Roughly traced | PDM24 | 357.0 | [18893, 20415, 3856] | [AOTU(R), INP, PLP(R), SCL(R), SIP(R), SLP(R),... | [AOTU(R), INP, PLP(R), SCL(R), SIP(R), SLP(R),... | {'SNP(R)': {'pre': 427, 'post': 921, 'downstre... |
| 6 | 387166379 | DN1pA_R | DN1pA | 178 | 494 | 1025 | 494 | 163 | 836685076 | Traced | False | Roughly traced | PDM24 | 301.0 | [16224, 19247, 5372] | [MB(+ACA)(R), SLP(R), SMP(L), SMP(R), SNP(L), ... | [SLP(R), SMP(L), SMP(R), SNP(L), SNP(R)] | {'SNP(R)': {'pre': 89, 'post': 397, 'downstrea... |
| 7 | 387944118 | DN1pA_R | DN1pA | 144 | 468 | 775 | 468 | 163 | 767049416 | Traced | False | Roughly traced | PDM24 | 319.5 | [16744, 19299, 4696] | [PLP(R), SLP(R), SMP(L), SMP(R), SNP(L), SNP(R... | [SLP(R), SMP(L), SMP(R), SNP(L), SNP(R)] | {'SNP(R)': {'pre': 83, 'post': 358, 'downstrea... |
| 8 | 448260940 | LNd_R | LNd | 255 | 863 | 1968 | 863 | 383 | 1659640924 | Traced | False | Roughly traced | ADL30 | 376.5 | [3107, 25129, 18592] | [MB(+ACA)(R), SIP(R), SLP(R), SMP(L), SMP(R), ... | [SIP(R), SLP(R), SMP(L), SMP(R), SNP(L), SNP(R)] | {'SNP(R)': {'pre': 181, 'post': 786, 'downstre... |
| 9 | 450034902 | LPN_R | LPN | 369 | 1099 | 2593 | 1099 | 292 | 1447849563 | Traced | False | Roughly traced | PDL18 | 361.0 | [10497, 12517, 15648] | [INP, MB(+ACA)(R), SCL(R), SLP(R), SMP(R), SNP... | [INP, SCL(R), SLP(R), SMP(R), SNP(R)] | {'SNP(R)': {'pre': 365, 'post': 1049, 'downstr... |


In [6]:
roi_counts_df

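| | bodyId | roi | pre | post | downstream | upstream | mito |
|---|---|---|---|---|---|---|---|
| 0 | 264083994 | SNP(R) | 232 | 1037 | 1764 | 1037 | 212 |
| 1 | 264083994 | SLP(R) | 230 | 1022 | 1748 | 1022 | 207 |
| 2 | 264083994 | dACA(R) | 14 | 85 | 106 | 85 | 12 |
| 3 | 264083994 | MB(+ACA)(R) | 23 | 392 | 167 | 392 | 29 |
| 4 | 264083994 | lACA(R) | 7 | 289 | 48 | 289 | 17 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 244 | 5813071319 | INP | 30 | 17 | 193 | 17 | 12 |
| 245 | 5813071319 | SCL(R) | 30 | 17 | 193 | 17 | 12 |
| 246 | 5813071319 | VLNP(R) | 75 | 104 | 307 | 104 | 20 |
| 247 | 5813071319 | PLP(R) | 3 | 27 | 13 | 27 | 3 |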
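roi_counts_df is in "long" form, one row per (neuron, ROI) pair. For comparing neurons side by side it can help to pivot it into a wide table with one row per neuron and one column per ROI. The sketch below does this on a few rows copied from the output above; with the real data you would pass roi_counts_df directly. The variable names are illustrative.

```python
import pandas as pd

# A few rows copied from the roi_counts_df output above
toy_roi_counts = pd.DataFrame({
    'bodyId': [264083994, 264083994, 264083994, 5813071319, 5813071319],
    'roi': ['SNP(R)', 'SLP(R)', 'dACA(R)', 'SCL(R)', 'PLP(R)'],
    'post': [1037, 1022, 85, 17, 27],
})

# Wide table: one row per neuron, one column per ROI; ROIs a neuron does not touch become 0
post_by_roi = toy_roi_counts.pivot_table(index='bodyId', columns='roi', values='post',
                                         aggfunc='sum', fill_value=0)
print(post_by_roi)
```

Be careful when summing across the ROI columns: ROIs can be nested (SLP(R) lies within SNP(R)), so the same synaptic site can be counted in more than one column.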


We fetched a set of neurons based on a list of body IDs, but what if we want to add criteria so that we return a more targeted set of neurons? To do this, we use NeuronCriteria. In the example below, I specify criteria based on cellBodyFiber and pass them to fetch_neurons to retrieve the matching neurons. We can add other criteria to this. Try including cell type along with cellBodyFiber (uncomment the second criteria line). 

In [9]:
from neuprint import NeuronCriteria as NC

criteria = NC(cellBodyFiber='PDM10')
#criteria = NC(cellBodyFiber='PDM10', type='DN1a')

In [10]:
neuron_df, roi_counts_df = fetch_neurons(criteria)
neuron_df

| | bodyId | instance | type | pre | post | downstream | upstream | mito | size | status | cropped | statusLabel | cellBodyFiber | somaRadius | somaLocation | inputRois | outputRois | roiInfo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 264083994 | DN1a_R | DN1a | 395 | 1277 | 2847 | 1277 | 326 | 1277856419 | Traced | False | Roughly traced | PDM10 | 270.0 | [11339, 22506, 4104] | [AME(R), CA(R), INP, MB(+ACA)(R), MB(R), OL(R)... | [AME(R), CA(R), INP, MB(+ACA)(R), MB(R), OL(R)... | {'SNP(R)': {'pre': 232, 'post': 1037, 'downstr... |
| 1 | 5813022274 | DN1a_R | DN1a | 413 | 1255 | 2966 | 1255 | 340 | 1395281217 | Traced | False | Roughly traced | PDM10 | 288.5 | [15975, 21557, 4352] | [AME(R), CA(R), INP, MB(+ACA)(R), MB(R), ME(R)... | [AME(R), INP, MB(+ACA)(R), ME(R), OL(R), PLP(R... | {'SNP(R)': {'pre': 231, 'post': 945, 'downstre... |
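NeuronCriteria filters on the neuprint server, across every neuron in the dataset. Once you have a dataframe locally, ordinary pandas filtering gives a comparable result, but only over the rows you already fetched. The sketch below copies two rows from the output above plus one LNd row from the first fetch; it is a local analogue of NC(cellBodyFiber='PDM10', type='DN1a'), not how NeuronCriteria is implemented.

```python
import pandas as pd

# Rows copied from the neuron_df outputs above (columns abridged)
toy_neurons = pd.DataFrame({
    'bodyId': [264083994, 5813022274, 296544364],
    'type': ['DN1a', 'DN1a', 'LNd'],
    'cellBodyFiber': ['PDM10', 'PDM10', 'ADL30'],
})

# Local filter: both conditions must hold, like combining two NeuronCriteria fields
subset = toy_neurons[(toy_neurons['cellBodyFiber'] == 'PDM10') & (toy_neurons['type'] == 'DN1a')]
print(subset['bodyId'].tolist())
```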


## Fetch skeletons
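There is extensive information available about the reconstructions of the neurons in the Hemibrain. Below, I plot the skeleton of a neuron using fetch_skeleton. This function returns a table of skeleton nodes, one row per node, with the node's coordinates (x, y, z), its radius, and a link column giving the rowId of its parent node. Joining each node to its parent gives the line segments that make up the neuron.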
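To see what the parent/child join in the cells below produces, here is the same merge applied to a tiny made-up skeleton. It assumes the column layout of a fetch_skeleton result (rowId, x, y, z, radius, link) and that the root node has link == -1; the coordinates and the names toy_skel and toy_segments are hypothetical.

```python
import pandas as pd

# Hypothetical 4-node skeleton: node 1 is the root, node 2 branches into nodes 3 and 4
toy_skel = pd.DataFrame({
    'rowId':  [1, 2, 3, 4],
    'x':      [0.0, 10.0, 20.0, 10.0],
    'y':      [0.0, 0.0, 0.0, 5.0],
    'z':      [0.0, 5.0, 10.0, 15.0],
    'radius': [2.0, 1.5, 1.0, 1.0],
    'link':   [-1, 1, 2, 2],
})

# Match each node's 'link' (its parent) to the parent's 'rowId'
toy_segments = toy_skel.merge(toy_skel, how='inner', left_on=['link'], right_on=['rowId'],
                              suffixes=['_child', '_parent'])

# The root has no parent row, so N nodes give N - 1 segments
print(len(toy_segments))
print(toy_segments[['rowId_child', 'rowId_parent']].values.tolist())
```

Each resulting row is one segment from (x_child, z_child) to (x_parent, z_parent), which is exactly what p.segment draws.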

In [None]:
from neuprint import fetch_skeleton

s = fetch_skeleton(clock_bodyIds[13])
s

In [None]:
# Join parent/child nodes for plotting as line segments below.
# (Using each row's 'link' (parent) ID, find the row with matching rowId.)
segments = s.merge(s, 'inner', left_on=['link'], right_on=['rowId'], suffixes=['_child', '_parent'])

In [None]:
import bokeh
import bokeh.palettes
from bokeh.plotting import figure, show, output_notebook
output_notebook()

In [None]:
p = figure()
p.y_range.flipped = True

# Plot skeleton segments (in 2D)
p.segment(x0='x_child', x1='x_parent',
          y0='z_child', y1='z_parent',source=segments)

show(p)

We'll return to plotting neural skeletons and plotting synaptic sites on them later on. 

## Fetch synapses
There are multiple ways to obtain information about the synapses in the Hemibrain data. The most general is the fetch_synapses function, which returns spatial information about the synaptic sites associated with the criteria you provide. This function takes neuron criteria and, optionally, synapse criteria. In the example below, I retrieve the synapse information for an example clock neuron using synapse criteria that restrict the results to primary ROIs. Because ROIs can be nested (for example, SLP(R) lies within SNP(R)), a single synaptic site can belong to several ROIs at once; restricting to primary ROIs ensures that each site is returned only once. Try toggling the synapse criteria below to see what I mean.
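The duplication problem can be illustrated on made-up rows. Below, one postsynaptic site is listed twice because it falls in both SNP(R) and the nested SLP(R). Dropping rows with the same neuron, site type, and coordinates collapses it to one row. The rows are hypothetical, and this local clean-up is only a sketch: SC(primary_only=True) is the simpler fix.

```python
import pandas as pd

# Hypothetical synapse rows: the site at (100, 200, 300) is reported once per ROI it falls in
toy_syn = pd.DataFrame({
    'bodyId': [450034902] * 3,
    'type':   ['post', 'post', 'pre'],
    'roi':    ['SNP(R)', 'SLP(R)', 'SMP(R)'],
    'x': [100, 100, 400],
    'y': [200, 200, 500],
    'z': [300, 300, 600],
})

# Keep one row per physical site: same neuron, same kind of site, same location
unique_syn = toy_syn.drop_duplicates(subset=['bodyId', 'type', 'x', 'y', 'z'])
print(len(toy_syn), len(unique_syn))
```

Note that drop_duplicates keeps whichever ROI label happens to come first (here the parent ROI SNP(R)), whereas restricting to primary ROIs reports the primary ROI for each site.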

In [None]:
from neuprint import fetch_synapses, NeuronCriteria as NC, SynapseCriteria as SC

# returns each synapse once and provides primary ROI where synapse is located
syn_sites = fetch_synapses(clock_bodyIds[13], SC(primary_only=True))
syn_sites

To visualize the synaptic sites, I create a colormap below that assigns pre and postsynaptic sites to distinct colors. The colors are then added to the dataframe of synaptic sites as a new column.

In [None]:
# create a colormap so that pre and post each get a different color
colormap = dict(zip(syn_sites['type'].value_counts().index, bokeh.palettes.Dark2[5]))

# add the color information to the dataframe
syn_sites['color'] = syn_sites['type'].map(colormap)
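One thing to watch for: value_counts() orders the site types by how often they occur, so with the colormap above, whichever of pre or post is more numerous for a given neuron gets the first Dark2 color. If you want the colors to stay consistent across neurons, a fixed mapping is a simple alternative. The color names and the toy_sites rows below are arbitrary choices for illustration.

```python
import pandas as pd

# Fixed mapping: 'pre' and 'post' always get the same color, whichever is more common
fixed_colormap = {'pre': 'darkorange', 'post': 'steelblue'}

# Hypothetical rows with the same 'type' column as syn_sites
toy_sites = pd.DataFrame({'type': ['post', 'pre', 'post'], 'x': [1, 2, 3], 'z': [4, 5, 6]})
toy_sites['color'] = toy_sites['type'].map(fixed_colormap)
print(toy_sites['color'].tolist())
```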

In [None]:
syn_sites

Plot the skeleton from the previous section again and this time add the synapses using scatter.

In [None]:
p = figure()
p.y_range.flipped = True

# Plot skeleton segments (in 2D) in the x-z plane
p.segment(x0='x_child', x1='x_parent',
          y0='z_child', y1='z_parent',source=segments)

# Also plot the synapses from the above example in the x-z plane
p.scatter(syn_sites['x'], syn_sites['z'], color=syn_sites['color'])

show(p)

## Fetch connections
If we want information about how those synapses actually connect neurons in the Hemibrain, there are multiple options depending on exactly how much detail we want. 

If we want detailed information about the synapses that includes the pre-synaptic and post-synaptic neurons, the locations of these synaptic sites, the ROIs they are in, and the confidence score for each connection, then we would use fetch_synapse_connections.

We have to specify a source and a target. Each can be a single neuron or a group of neurons specified with neuron criteria, or we can pass None to leave the source or the target unrestricted. I don't recommend using None for both the source and the target, because the function will then attempt to return every synaptic connection in the Hemibrain, which will likely time out. Below, I fetch all of the synapse connections onto one of the clock neurons.

In [None]:
from neuprint import fetch_synapse_connections

conn_sites = fetch_synapse_connections(None, clock_bodyIds[13], SC(primary_only=True))

In [None]:
conn_sites.sort_values(by='x_pre')

Suppose we want more aggregated information about the number of synaptic connections between neurons. For this, we would use fetch_simple_connections. It returns one row per connected pair of neurons, with a weight column giving the number of synaptic connections between them. 
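Conceptually, a pair's weight is what you get by counting the synapse-level rows for that pair. The sketch below does this with groupby on hypothetical rows shaped like a few columns of fetch_synapse_connections output (bodyIds 1001 and 2002 are made up). With real data the counts may not match fetch_simple_connections exactly, since fetch_synapse_connections applies its own synapse filters (for example primary_only).

```python
import pandas as pd

# Hypothetical synapse-level rows (one row per synaptic connection)
toy_conn_sites = pd.DataFrame({
    'bodyId_pre':  [1001, 1001, 1001, 2002, 2002],
    'bodyId_post': [450034902] * 5,
    'x_pre':  [10, 12, 15, 40, 41],
    'x_post': [11, 13, 16, 42, 43],
})

# One row per synapse -> one row per neuron pair, with the synapse count as the weight
toy_weights = (toy_conn_sites.groupby(['bodyId_pre', 'bodyId_post'])
               .size()
               .rename('weight')
               .reset_index()
               .sort_values('weight', ascending=False))
print(toy_weights)
```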

In [None]:
from neuprint import fetch_simple_connections

fetch_simple_connections(None,clock_bodyIds[13])

Now we will get all of the connections from clock neurons that are in the PDM10 cellBodyFiber to the LNds. We do this by setting neuron criteria.

In [None]:
pre_criteria = NC(cellBodyFiber='PDM10',bodyId=clock_bodyIds)
post_criteria = NC(type='LNd')

conns = fetch_simple_connections(pre_criteria,post_criteria)
conns

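There is also fetch_adjacencies, which returns more detailed information: a dataframe of neuron properties and a dataframe of connection weights broken down by ROI. I prefer fetch_simple_connections, but fetch_adjacencies performs better when your criteria encompass a large number of neurons.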

In [None]:
from neuprint import fetch_adjacencies

neuron_df, conn_df = fetch_adjacencies(pre_criteria,post_criteria)

In [None]:
conn_df.sort_values(by='bodyId_post')

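The weight is the number of synaptic connections between a pair of neurons. By default, it counts the postsynaptic sites on the downstream neuron of the pair that receive input from the upstream neuron. Note that conn_df from fetch_adjacencies breaks the weights down by ROI, so the same pair of neurons can appear on several rows. 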
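Because the fetch_adjacencies connection table has one row per (pre, post, ROI), a pair that connects in several ROIs appears several times. Summing the weights over ROIs gives one total per pair. The rows below are hypothetical (made-up bodyIds and weights) but use the bodyId_pre, bodyId_post, roi, and weight columns of that table.

```python
import pandas as pd

# Hypothetical per-ROI connection rows: pair (101, 201) connects in two ROIs
toy_conn_df = pd.DataFrame({
    'bodyId_pre':  [101, 101, 102],
    'bodyId_post': [201, 201, 201],
    'roi':         ['SLP(R)', 'SMP(R)', 'SLP(R)'],
    'weight':      [4, 2, 3],
})

# Total weight per neuron pair, summed over ROIs
toy_total = toy_conn_df.groupby(['bodyId_pre', 'bodyId_post'], as_index=False)['weight'].sum()
print(toy_total)
```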

We can make a connection matrix using connection_table_to_matrix. You pass it the dataframe of connections, the property to use for the rows and columns of the matrix (here, bodyId), and the property you'd like the data to be sorted by.
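To make the idea of a connection matrix concrete, here is a rough pandas equivalent built with pivot_table on hypothetical rows: presynaptic neurons as rows, postsynaptic neurons as columns, and 0 wherever a pair is not connected. This is a sketch of the concept, not how connection_table_to_matrix is implemented; the library function also handles grouping and sorting by properties such as type.

```python
import pandas as pd

# Hypothetical pair-level connections (same columns as fetch_simple_connections output)
toy_conns = pd.DataFrame({
    'bodyId_pre':  [101, 101, 102],
    'bodyId_post': [201, 202, 201],
    'weight':      [6, 1, 3],
})

# Rows = presynaptic neuron, columns = postsynaptic neuron, unconnected pairs = 0
toy_matrix = toy_conns.pivot_table(index='bodyId_pre', columns='bodyId_post',
                                   values='weight', aggfunc='sum', fill_value=0)
print(toy_matrix)
```

Rows are presynaptic and columns postsynaptic, which matches the axis labels used in the heatmap cell below.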

In [None]:
from neuprint.utils import connection_table_to_matrix

matrix = connection_table_to_matrix(conns, 'bodyId', sort_by='type')
matrix

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn

fig = plt.figure(figsize=(16, 12))

# plot connectivity heatmap
seaborn.heatmap(matrix, vmin=0, annot=True, cmap=seaborn.light_palette("purple", as_cmap=True), cbar_kws={'label': 'connection strength'})
plt.title('Connectivity matrix')
plt.xlabel('postsynaptic')
plt.ylabel('presynaptic')

## Challenge #1
Create a connectivity matrix plot for all of the clock neuron types. Each row and column of the matrix should list the clock cell types and each entry in the matrix should be the combined weights from one cell type to another.

## Challenge #2
Plot the skeleton of a neuron and the synaptic sites from its top 3 strongest inputs (i.e., the 3 upstream neurons with the largest connection weights onto it). Use a different color for each of the 3 inputs. 

## Challenge #3
The 4 sLNv neurons comprise the morning (M) group of neurons and the 6 LNds along with the 5th sLNv comprise the evening (E) group. Create and plot a connectivity matrix that shows the connections from the M and E cells to all the clock neurons.

## Challenge #4
Plot the skeletons of 2 connected neurons together on the same plot along with their synaptic connections.

## possible solutions (no peeking!)

### a possible solution for challenge #1

In [None]:
conns = fetch_simple_connections(clock_bodyIds,clock_bodyIds)

In [None]:
matrix = connection_table_to_matrix(conns, 'type', sort_by='type')
matrix

In [None]:
fig = plt.figure(figsize=(16, 12))

# plot connectivity heatmap
seaborn.heatmap(matrix, vmin=0, annot=True, cmap=seaborn.light_palette("purple", as_cmap=True), cbar_kws={'label': 'connection strength'})
plt.title('Connectivity matrix')
plt.xlabel('postsynaptic')
plt.ylabel('presynaptic')

### a possible solution for challenge #2

In [None]:
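conns = fetch_simple_connections(None, clock_bodyIds[13])

# keep the 3 strongest inputs (sorting by weight explicitly rather than relying on the returned row order)
conns = conns.nlargest(3, 'weight')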

In [None]:
top3_inputs = conns['bodyId_pre']

syn_sites = fetch_synapse_connections(top3_inputs, clock_bodyIds[13], SC(primary_only=True))

In [None]:
# create a colormap so that each of the 3 input neurons gets a different color
colormap = dict(zip(syn_sites['bodyId_pre'].value_counts().index, bokeh.palettes.Dark2[5]))

# add the color information to the dataframe
syn_sites['color'] = syn_sites['bodyId_pre'].map(colormap)

In [None]:
s = fetch_skeleton(clock_bodyIds[13])

# Join parent/child nodes for plotting as line segments below.
# (Using each row's 'link' (parent) ID, find the row with matching rowId.)
segments = s.merge(s, 'inner', left_on=['link'], right_on=['rowId'], suffixes=['_child', '_parent'])

In [None]:
p = figure()
p.y_range.flipped = True

# Plot skeleton segments (in 2D) in the x-z plane
p.segment(x0='x_child', x1='x_parent',
          y0='z_child', y1='z_parent',source=segments)

# Also plot the synapses from the 3 top inputs (at their postsynaptic locations) in the x-z plane
p.scatter(syn_sites['x_post'], syn_sites['z_post'], color=syn_sites['color'])

show(p)

### a possible solution for challenge #3

In [None]:
sLNv_criteria = NC(type='s-LNv',bodyId=clock_bodyIds)
sLNv_df, _ = fetch_neurons(sLNv_criteria)

In [None]:
LNd_criteria = NC(type='LNd',bodyId=clock_bodyIds)
LNd_df, _ = fetch_neurons(LNd_criteria)

In [None]:
ME_df = pd.concat([sLNv_df,LNd_df])

In [None]:
conns = fetch_simple_connections(ME_df['bodyId'],clock_bodyIds)
conns

In [None]:
matrix = connection_table_to_matrix(conns, 'bodyId', sort_by='type')
matrix

In [None]:
fig = plt.figure(figsize=(16, 12))

# plot connectivity heatmap
seaborn.heatmap(matrix, vmin=0, annot=True, cmap=seaborn.light_palette("purple", as_cmap=True), cbar_kws={'label': 'connection strength'})
plt.title('Connectivity matrix')
plt.xlabel('postsynaptic')
plt.ylabel('presynaptic')

### a possible solution for challenge #4

In [None]:
conn_sites = fetch_synapse_connections(NC(instance='5th s-LNv'), 5813069648, SC(primary_only=True))

In [None]:
s1 = fetch_skeleton(conn_sites['bodyId_pre'].iloc[0])  # skeleton of the presynaptic neuron (first row, by position)
# Join parent/child nodes for plotting as line segments below.
# (Using each row's 'link' (parent) ID, find the row with matching rowId.)
segments1 = s1.merge(s1, 'inner', left_on=['link'], right_on=['rowId'], suffixes=['_child', '_parent'])

In [None]:
s2 = fetch_skeleton(5813069648)
# Join parent/child nodes for plotting as line segments below.
# (Using each row's 'link' (parent) ID, find the row with matching rowId.)
segments2 = s2.merge(s2, 'inner', left_on=['link'], right_on=['rowId'], suffixes=['_child', '_parent'])

In [None]:
p = figure()
p.y_range.flipped = True

# Plot skeleton segments (in 2D) in the x-z plane
p.segment(x0='x_child', x1='x_parent',
          y0='z_child', y1='z_parent',source=segments1)
p.segment(x0='x_child', x1='x_parent',
          y0='z_child', y1='z_parent',source=segments2, color='black')

# Also plot the synaptic connections in the x-z plane: presynaptic sites as red dots, postsynaptic sites as open green circles
p.scatter(conn_sites['x_pre'], conn_sites['z_pre'], color='red')
p.scatter(conn_sites['x_post'], conn_sites['z_post'], fill_color=None, line_color='green')

show(p)

## Do your own thing
Explore any neuron or cell type in the Hemibrain data. To obtain the bodyId(s) of the neurons you are interested in, try searching in NeuroNLP (https://hemibrain.neuronlp.fruitflybrain.org) or the Neuprint web interface (https://neuprint.janelia.org).

*Many of the examples in the code cells of this notebook relied on the Quickstart tutorial examples in https://connectome-neuprint.github.io/neuprint-python/docs/notebooks/QueryTutorial.html#Neuron-Search-Criteria published by Janelia.*