# Data Retrieval

To retrieve the VNC dataset, you need to instantiate a neuprint Client using the API key issued to you when you sign up to neuPrint with a Google account.

As of June 7, 2023, the MANC (male VNC) dataset, manc:v1.0, is publicly available at https://neuprint.janelia.org/. To use this dataset, create a client as follows:

In [1]:
from neuprint import Client

MANC_client = Client('neuprint.janelia.org', dataset='manc:v1.0', token='Your-API-Key')

The client essentially acts as a bridge between the specific database and the Python API, so we can also create a client object for any dataset accessible through neuPrint:
- hemibrain:v1.2.1
- hemibrain:v1.1
- hemibrain:v1.0.1
- fib19:v1.0

In [None]:
hemibrain_client = Client('neuprint.janelia.org', dataset='hemibrain:v1.2.1', token='Your-API-Key')
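
Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, reading the key from an environment variable. The helper name load_token is hypothetical and not part of this package; NEUPRINT_APPLICATION_CREDENTIALS is the variable that neuprint-python documents as its fallback when no token is passed explicitly.

```python
import os


def load_token(var_name="NEUPRINT_APPLICATION_CREDENTIALS"):
    """Return the neuPrint API token stored in an environment variable.

    load_token is a hypothetical helper, shown only for illustration.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your neuPrint API token."
        )
    return token


# Usage (needs neuprint-python and network access, so it is left commented out):
# from neuprint import Client
# MANC_client = Client('neuprint.janelia.org', dataset='manc:v1.0', token=load_token())
```

Failing early with a clear message is a deliberate choice here: a missing key otherwise only surfaces later as an authentication error from the server.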

# Data Importing/Storage
The wrapper class ConnDF, defined in data_processing.py, provides the core data import, storage and processing functions.

You can create separate ConnDF instances, one per client: for example, one instance for the MANC dataset and one for the hemibrain dataset.

In [2]:
from data_processing import ConnDF

# One instance wraps the MANC_client created above.
MANC_data = ConnDF(client=MANC_client)
# A second, independent instance wraps the hemibrain_client instead.
hemibrain_data = ConnDF(client=hemibrain_client)

The connectome datasets can be downloaded afresh in every session, but this takes a long time. Each dataset consists of three .csv files that are not large, so saving them to local disk and loading them from there is the more practical option. The method that downloads the dataset also supports loading it from disk.

To download the dataset, call the extract_full() method of your ConnDF instance with no parameters.

In [3]:
MANC_data.extract_full()

  0%|          | 0/119 [00:00<?, ?it/s]

This retrieves all neuron metadata and all 'Traced' connections in the dataset, then saves them to the current working directory (using the default path defined by neuprint.fetch_adjacencies). You can then reload the dataset from this path in future sessions by calling extract_full(file_path='default').

In [None]:
MANC_data.extract_full(file_path='default') # Throws 
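
The load-from-disk-or-download behaviour described above follows a common caching pattern. The sketch below illustrates that pattern in isolation; it is not the implementation of extract_full. The names load_or_download, download_fn and refresh are hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_or_download(csv_path, download_fn, refresh=False):
    """Return a DataFrame from csv_path, calling download_fn only when needed.

    download_fn is a hypothetical zero-argument callable that returns a
    DataFrame (for example, a wrapper around a slow neuPrint query).
    """
    csv_path = Path(csv_path)
    if csv_path.exists() and not refresh:
        return pd.read_csv(csv_path)      # fast path: reuse the local copy
    df = download_fn()                    # slow path: fetch from the server
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(csv_path, index=False)      # create or overwrite the cached copy
    return df
```

Writing with index=False avoids an extra unnamed index column when the file is read back. Passing refresh=True corresponds to the update case, where the latest data is downloaded and the local copy is overwritten.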

To update the dataset, call the extract_full() method again, which redownloads the latest dataset and overwrites the .csvs in the default path.

In [None]:
MANC_data.extract_full()

Alternatively, if you want to keep the old version saved on local disk but load the latest version as a class variable in the current Jupyter notebook session, you can call update_neurons().

In [None]:
MANC_data.update_neurons()

# Data Processing/Filtering

The available processing and filtering methods are convenience wrappers around pandas queries and operations, intended for the analyses and simulations performed by other packages of the CAST toolbox. They operate on attributes of a ConnDF instance: neuron_master stores the neuron metadata dataframe and conn_master stores the connection edgelist dataframe. Every filtering or processing step is performed on a copy of conn_master, and the result is stored in conn_filter.

In [None]:
# To access the neuron metadata dataframe, call:
MANC_data.neuron_master

# To access the connection edgelist dataframe, call:
MANC_data.conn_master

In [None]:
# Filter method. Currently this is an all-in-one method; ideally it should be split into
#   separate functions in the future.

# With the defaults (roi, threshold and normalise all set to None), the connection dataframe is left unchanged,
# so conn_filter is simply a copy of conn_master.
MANC_data.filter(roi=None, threshold=None, normalise=None)

## ROI Filtering

Filtering by region of interest (ROI) queries the roi column of the edgelist. Given a list of valid ROI names, only the connections or synapses located in those ROI(s) are kept.

In [4]:
# For example, this only includes connections/synapses within the 'LegNp(T3)(R)' and 'LegNp(T3)(L)' ROI.
MANC_data.filter(roi=['LegNp(T3)(R)', 'LegNp(T3)(L)'])

In [6]:
MANC_data.conn_filter.roi.unique()

array(['LegNp(T3)(R)', 'LegNp(T3)(L)'], dtype=object)
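
In pandas terms, an ROI filter of this kind amounts to an isin query on the roi column. The sketch below shows the idea on a toy edgelist with made-up values; filter_by_roi is a hypothetical stand-in, not the ConnDF.filter implementation.

```python
import pandas as pd

# Toy edgelist with the same columns as conn_master (values are made up).
conn_master = pd.DataFrame({
    "bodyId_pre":  [101, 101, 102, 102],
    "bodyId_post": [201, 202, 203, 204],
    "roi":         ["LegNp(T3)(R)", "LTct", "LegNp(T3)(L)", "LegNp(T3)(R)"],
    "weight":      [5, 12, 7, 3],
})


def filter_by_roi(conn, rois):
    """Return a copy of conn restricted to the given ROI names."""
    if rois is None:
        return conn.copy()                      # no filter: plain copy
    return conn[conn["roi"].isin(rois)].copy()  # keep only matching rows


conn_filter = filter_by_roi(conn_master, ["LegNp(T3)(R)", "LegNp(T3)(L)"])
print(sorted(conn_filter["roi"].unique()))
```

Returning a copy in both branches keeps the master edgelist untouched, which mirrors the conn_master / conn_filter split described earlier.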

## Threshold Filtering

Filtering by threshold is adaptive and depends on the type of normalisation being performed on the connection weights.

Normalisation options:

- If normalise is set to None, then connections are thresholded based on absolute/raw weights (synapse count).

- If normalise is set to 'relative', then connections are thresholded based on relative weights (the connection's share of the post-synaptic neuron's total inputs).




The example below extracts all connections with a relative weight greater than or equal to 1%.

In [20]:
MANC_data.filter(threshold=0.01, normalise='relative')

In [21]:
MANC_data.conn_filter

| | bodyId_pre | bodyId_post | roi | weight |
|---|---|---|---|---|
| 6 | 10000 | 10110 | LTct | 0.034120 |
| 9 | 10000 | 10228 | LTct | 0.012840 |
| 95 | 10000 | 14882 | LTct | 0.034425 |
| 103 | 10000 | 15505 | LTct | 0.010377 |
| 105 | 10000 | 15556 | LTct | 0.015826 |
| ... | ... | ... | ... | ... |
| 6253630 | 53613193093 | 49438 | LegNp(T1)(R) | 0.025210 |
| 6253632 | 53613193093 | 49907 | LegNp(T1)(R) | 0.045455 |
| 6253633 | 53613193093 | 49957 | LegNp(T1)(R) | 0.020833 |
| 6253634 | 53613193093 | 79077 | LegNp(T1)(R) | 0.037037 |
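
The relative normalisation described above can be sketched with a pandas groupby. This is an illustration under two assumptions that the text does not confirm: each post-synaptic neuron's total input is summed over the whole edgelist rather than per ROI, and the threshold comparison is inclusive. The function name threshold_connections and the toy values are hypothetical.

```python
import pandas as pd

# Toy edgelist; weights are raw synapse counts (values are made up).
conn = pd.DataFrame({
    "bodyId_pre":  [1, 2, 3, 1],
    "bodyId_post": [9, 9, 9, 8],
    "weight":      [98, 1, 1, 4],
})


def threshold_connections(conn, threshold=None, normalise=None):
    """Threshold on raw counts, or on each connection's share of the
    post-synaptic neuron's total input when normalise='relative'."""
    out = conn.copy()
    if normalise == "relative":
        # Assumption: totals are taken over every row sharing a bodyId_post.
        total_in = out.groupby("bodyId_post")["weight"].transform("sum")
        out["weight"] = out["weight"] / total_in
    if threshold is not None:
        # Assumption: the comparison is inclusive (>=).
        out = out[out["weight"] >= threshold]
    return out


strong = threshold_connections(conn, threshold=0.02, normalise="relative")
```

In this toy example neuron 9 receives 100 synapses in total, so its three inputs become 0.98, 0.01 and 0.01, and only the first survives a 2% threshold. Neuron 8 has a single input, which therefore has a relative weight of 1.0.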


Combining threshold filtering with ROI-based filtering gives the 'strongest' connections in the chosen ROI(s).
Note that every time filter is called, it starts from a new copy of conn_master, so filter operations are never applied to previously filtered dataframes.

In [24]:
# For example, this only includes connections/synapses within the 'LegNp(T3)(R)' and 'LegNp(T3)(L)' ROI.
# AND have a relative connection weight greater than 0.01
MANC_data.filter(roi=['LegNp(T3)(R)', 'LegNp(T3)(L)'], threshold=0.01, normalise='relative')

In [25]:
MANC_data.conn_filter

| | bodyId_pre | bodyId_post | roi | weight |
|---|---|---|---|---|
| 1203 | 10007 | 32068 | LegNp(T3)(R) | 0.012285 |
| 1212 | 10007 | 100363 | LegNp(T3)(R) | 0.300000 |
| 1218 | 10007 | 152218 | LegNp(T3)(R) | 0.072464 |
| 1241 | 10010 | 10108 | LegNp(T3)(L) | 0.018383 |
| 1258 | 10010 | 10184 | LegNp(T3)(R) | 0.018616 |
| ... | ... | ... | ... | ... |
| 6252661 | 53202877530 | 25155 | LegNp(T3)(R) | 0.018519 |
| 6252674 | 53202877530 | 27524 | LegNp(T3)(L) | 0.018519 |
| 6253564 | 53609617775 | 39340 | LegNp(T3)(L) | 0.014706 |
| 6253566 | 53609617775 | 47553 | LegNp(T3)(L) | 0.013514 |


# Connection Annotation and Data Type Conversion

To accommodate the wide range of analyses and simulations performed on the connection dataframes (and neuron metadata), connections can be converted into various data formats. They can also be annotated with additional information derived from the neuron metadata.

## Connection Annotation

One useful way to add additional information to the connectivity data is to make them signed, where positive weights are excitatory connections and negative weights are inhibitory connections. The package makes use of the neuron metadata, using the neurotransmitter predictions for a given neuron.

For each neuron, predictions for the neurotransmitter it releases are provided (see preprint https://www.biorxiv.org/content/10.1101/2023.06.07.543976v1). For a given connection, the sign is annotated based on the most probable neurotransmitter of the pre-synaptic neuron.

Currently, it is assumed that GABAergic and glutamatergic neurons have an inhibitory effect, and cholinergic neurons have an excitatory effect. Note that this holds true for motor neurons, but most of their connections are to muscles and rarely to any other neurons in the VNC.

Note that this step is optional.

In [30]:
# If we use conn_filter above, make_ei annotates the weights as positive or negative
# This is added as a signed_weight.
# The same can be done on connection dataframes that use absolute/raw weights.
annotated_df = MANC_data.make_ei(MANC_data.conn_filter)
annotated_df

| | bodyId_pre | bodyId_post | roi | weight | weight_map | signed_weight |
|---|---|---|---|---|---|---|
| 1203 | 10007 | 32068 | LegNp(T3)(R) | 0.012285 | 1 | 0.012285 |
| 1212 | 10007 | 100363 | LegNp(T3)(R) | 0.300000 | 1 | 0.300000 |
| 1218 | 10007 | 152218 | LegNp(T3)(R) | 0.072464 | 1 | 0.072464 |
| 1241 | 10010 | 10108 | LegNp(T3)(L) | 0.018383 | 1 | 0.018383 |
| 1258 | 10010 | 10184 | LegNp(T3)(R) | 0.018616 | 1 | 0.018616 |
| ... | ... | ... | ... | ... | ... | ... |
| 6252661 | 53202877530 | 25155 | LegNp(T3)(R) | 0.018519 | 1 | 0.018519 |
| 6252674 | 53202877530 | 27524 | LegNp(T3)(L) | 0.018519 | 1 | 0.018519 |
| 6253564 | 53609617775 | 39340 | LegNp(T3)(L) | 0.014706 | -1 | -0.014706 |
| 6253566 | 53609617775 | 47553 | LegNp(T3)(L) | 0.013514 | -1 | -0.013514 |


There is also added functionality for scaling the connection weights based on whether or not they are excitatory or inhibitory. For example, inhibitory weights can be scaled up by a factor of 2 by doing the following:

In [29]:
annotated_df = MANC_data.make_ei(MANC_data.conn_filter, inh_weighting=2)
annotated_df

# Similarly, you can do this for excitatory connections using exc_weighting;
# annotated_df = MANC_data.make_ei(MANC_data.conn_filter, exc_weighting=2)

| | bodyId_pre | bodyId_post | roi | weight | weight_map | signed_weight |
|---|---|---|---|---|---|---|
| 1203 | 10007 | 32068 | LegNp(T3)(R) | 0.012285 | 1 | 0.012285 |
| 1212 | 10007 | 100363 | LegNp(T3)(R) | 0.300000 | 1 | 0.300000 |
| 1218 | 10007 | 152218 | LegNp(T3)(R) | 0.072464 | 1 | 0.072464 |
| 1241 | 10010 | 10108 | LegNp(T3)(L) | 0.018383 | 1 | 0.018383 |
| 1258 | 10010 | 10184 | LegNp(T3)(R) | 0.018616 | 1 | 0.018616 |
| ... | ... | ... | ... | ... | ... | ... |
| 6252661 | 53202877530 | 25155 | LegNp(T3)(R) | 0.018519 | 1 | 0.018519 |
| 6252674 | 53202877530 | 27524 | LegNp(T3)(L) | 0.018519 | 1 | 0.018519 |
| 6253564 | 53609617775 | 39340 | LegNp(T3)(L) | 0.014706 | -2 | -0.029412 |
| 6253566 | 53609617775 | 47553 | LegNp(T3)(L) | 0.013514 | -2 | -0.027027 |
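
The sign annotation and scaling shown in the outputs above can be reproduced in outline with two pandas map calls. This is a sketch of the idea, not the make_ei implementation. The function name annotate_sign, the metadata column name predictedNt, the neurotransmitter labels and the toy neuron IDs are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical neuron metadata: one predicted neurotransmitter per neuron.
neurons = pd.DataFrame({
    "bodyId": [101, 102],
    "predictedNt": ["acetylcholine", "gaba"],
})
# Toy edgelist (IDs are made up).
conn = pd.DataFrame({
    "bodyId_pre":  [101, 102],
    "bodyId_post": [201, 202],
    "weight":      [0.012285, 0.014706],
})


def annotate_sign(conn, neurons, inh_weighting=1, exc_weighting=1):
    """Add weight_map (signed scaling factor) and signed_weight columns."""
    sign_by_nt = {
        "acetylcholine": exc_weighting,   # assumed excitatory
        "gaba": -inh_weighting,           # assumed inhibitory
        "glutamate": -inh_weighting,      # assumed inhibitory
    }
    nt_of = neurons.set_index("bodyId")["predictedNt"]
    out = conn.copy()
    # Look up the pre-synaptic neuron's transmitter, then its signed factor.
    out["weight_map"] = out["bodyId_pre"].map(nt_of).map(sign_by_nt)
    out["signed_weight"] = out["weight"] * out["weight_map"]
    return out


signed = annotate_sign(conn, neurons, inh_weighting=2)
```

With inh_weighting=2, the connection from the GABAergic toy neuron gets a weight_map of -2 and a signed weight of twice its magnitude, matching the pattern in the last rows of the output above. A pre-synaptic neuron with a transmitter missing from the mapping would end up with NaN in both new columns, so a real implementation would need to decide how to handle that case.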


## Data Type Conversion

The main data format is a connectivity matrix, either square (termed 'symmetrical' here, with the same neurons on both axes) or non-square. The conversion usually operates on conn_filter, so any preprocessing/filtering should be done beforehand. It can also be applied directly to the unfiltered, full dataset.

In [31]:
# Non-square matrix on the filtered data, annotating connections by setting make_ei to True.
matrix = MANC_data.df_as_matrix(which='filtered', make_ei=True, inh_weighting=1, exc_weighting=1)
matrix

# To do this on the full dataset, set which to 'full'.
# matrix = MANC_data.df_as_matrix(which='full', make_ei=True, inh_weighting=1, exc_weighting=1)



| bodyId_pre \ bodyId_post | 32068 | 100363 | 152218 | 10108 | 10184 | 10207 | 10485 | 10545 | 10651 | 10703 | ... | 157418 | 25225 | 44595 | 31329 | 29305 | 27689 | 29727 | 10896 | 164512 | 13780 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10007 | 0.012285 | 0.3 | 0.072464 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 10010 | 0.000000 | 0.0 | 0.000000 | 0.018383 | 0.018616 | 0.023705 | 0.013732 | 0.011050 | 0.016484 | 0.026734 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 10012 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.013539 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 10016 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 10018 | 0.000000 | 0.0 | 0.000000 | 0.024421 | 0.014786 | 0.014995 | 0.000000 | 0.016396 | 0.021062 | 0.017568 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 50740666615 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 52788943322 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 53199180469 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 53202877530 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [32]:
# Symmetrical matrix (guaranteed square) on the filtered data, annotating connections by setting make_ei to True.
# A symmetrical matrix adds redundant rows/columns, but can be used directly in some analyses/simulations.
matrix = MANC_data.df_as_symmatrix(which='filtered', make_ei=True, inh_weighting=1, exc_weighting=1)
matrix

bodyId_post,10007,10010,10012,10016,10018,10021,10025,10031,10048,10051,...,23162287133,49919962216,49920001117,50740666615,51151062021,52788943322,53199180469,53202877530,53588309438,53609617775
bodyId_pre,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
10007,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10010,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10012,0.0,0.0,0.0,0.0,0.0,0.043902,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10016,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10018,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52788943322,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
53199180469,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
53202877530,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
53588309438,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
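
The difference between the two matrix forms can be illustrated directly in pandas. The sketch below is an assumption about how such a conversion could be done, not the code behind df_as_matrix or df_as_symmatrix, and it uses a toy edgelist with made-up values.

```python
import pandas as pd

# Toy edgelist (values are made up).
conn = pd.DataFrame({
    "bodyId_pre":  [1, 1, 2],
    "bodyId_post": [2, 3, 3],
    "weight":      [0.5, 0.2, 0.1],
})

# Non-square matrix: rows are pre-synaptic, columns are post-synaptic neurons.
# Only neurons that actually appear on each side get a row or a column.
rect = conn.pivot_table(index="bodyId_pre", columns="bodyId_post",
                        values="weight", aggfunc="sum", fill_value=0.0)

# Square matrix: reindex both axes to the union of all neuron IDs, filling
# the added (redundant) rows and columns with zeros.
ids = sorted(set(conn["bodyId_pre"]) | set(conn["bodyId_post"]))
square = rect.reindex(index=ids, columns=ids, fill_value=0.0)

print(rect.shape, square.shape)
```

Here neuron 3 never appears as a pre-synaptic partner and neuron 1 never appears as a post-synaptic partner, so the square form gains one all-zero row and one all-zero column. The square form is larger, but row i and column i then refer to the same neuron, which is what many network analyses and simulations expect.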


The other data format is a graph object: a directed weighted graph in either NetworkX or igraph.

In [33]:
# Currently, the graph is stored as an attribute of the ConnDF instance rather than in a variable, because conversion can take
# much longer than for connectivity matrices. This 1) reduces the total number of graphs in the current notebook, and
# 2) provides a referenceable instance in case the user forgets to assign the object to a variable.

# For example, to get the full VNC/MANC as an igraph object with annotated/signed edges:
MANC_data.df_as_graph(which='full', make_ei=True, package='igraph')

# To instead get NetworkX objects (nx.DiGraph), set package to networkx
MANC_data.df_as_graph(which='full', make_ei=True, package='networkx')

In [None]:
# Then to access the object simply call:
MANC_data.graph

# Analysis Package Summary

The analysis package houses most of the analysis and visualisation tools. Unlike the data processing module, which is built around the single ConnDF class, multiple classes exist here:
- ClusterDF: For premotor/motor neuron clustering. It also has an interactive UI that performs all of the clustering and visualisation steps.
- SpikeAnalysis: Contains all the pre-processing, analysis and visualisations for the spike data generated by simulations of the VNC.