# Obtaining neuron datasets from the Janelia fruit fly brain project
The code below downloads neuron skeletons from neuPrint and then processes them further, keeping only the endpoints of the skeleton segments.


Please register and obtain your token at https://neuprint.janelia.org/help/api in order to access the data.

In [2]:
from neuprint import Client

TOKEN = "YOUR_TOKEN_HERE"  # <--- Paste your personal token here (do not share or commit it)

c = Client('neuprint.janelia.org', 'hemibrain:v1.2.1', TOKEN)
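A token pasted into a notebook is easy to leak when the notebook is shared. A minimal sketch that reads the token from an environment variable instead, assuming it has been exported beforehand. The variable name `NEUPRINT_APPLICATION_CREDENTIALS` is the one neuprint-python's `Client` falls back to when no token is passed (worth confirming for your installed version), and `get_neuprint_token` is a hypothetical helper:

```python
import os


def get_neuprint_token(var_name="NEUPRINT_APPLICATION_CREDENTIALS"):
    """Return the neuPrint token stored in an environment variable (hypothetical helper)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Fail loudly instead of sending an empty token to the server
        raise RuntimeError(
            f"Set the {var_name} environment variable to your neuPrint token "
            "(available from https://neuprint.janelia.org/help/api)."
        )
    return token
```

Usage: `c = Client('neuprint.janelia.org', 'hemibrain:v1.2.1', get_neuprint_token())`.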

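The function below returns the hierarchy of regions of interest (ROIs) as a nested dictionary of brain regions and their subregions. With the arguments used here, the hierarchy stops at the primary ROIs (`include_subprimary=False`) and primary ROIs are marked with a trailing `*` (`mark_primary=True`):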

In [3]:
from neuprint import fetch_roi_hierarchy
roi_dict = fetch_roi_hierarchy(include_subprimary=False, mark_primary=True, format='dict')
roi_dict['hemibrain'].keys()

dict_keys(['AL(L)*', 'AL(R)*', 'AOT(R)', 'CX', 'GC', 'GF(R)', 'GNG*', 'INP', 'LH(R)*', 'LX(L)', 'LX(R)', 'MB(+ACA)(R)', 'MB(L)', 'OL(R)', 'PENP', 'POC', 'SNP(L)', 'SNP(R)', 'VLNP(R)', 'VMNP', 'mALT(L)', 'mALT(R)'])
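'ME(R)' is not among the top-level keys printed above because it sits deeper in the hierarchy. A sketch of a recursive search that returns the path to a region, assuming every value in the nested dictionary is itself a dictionary of subregions (as the `.keys()` call above suggests). `find_roi_path` is a hypothetical helper, and the toy hierarchy is made up for illustration, not the real hemibrain tree:

```python
def find_roi_path(tree, target, path=()):
    """Return the list of ROI names leading to `target` in a nested ROI dict, or None.

    Trailing '*' markers (primary ROIs when mark_primary=True) are ignored
    when comparing names.
    """
    for name, children in tree.items():
        clean = name.rstrip("*")
        current = path + (clean,)
        if clean == target:
            return list(current)
        # Descend into subregions, if any
        if isinstance(children, dict):
            found = find_roi_path(children, target, current)
            if found is not None:
                return found
    return None


# Toy hierarchy for illustration only (not the real hemibrain tree)
toy = {"hemibrain": {"OL(R)": {"ME(R)*": {}, "LO(R)*": {}}, "LH(R)*": {}}}
print(find_roi_path(toy, "ME(R)"))  # ['hemibrain', 'OL(R)', 'ME(R)']
```

On the real data the call would be `find_roi_path(roi_dict, 'ME(R)')`.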

Out of the whole dataset, I chose a single region to analyze, the right medulla ('ME(R)'), and required each selected neuron to have at least one input synapse and at least one output synapse within that region. This criterion is needed because neurons may pass through and connect multiple regions, with a different number of connections in each; requiring synapses inside ME(R) keeps the neurons that actually connect there.

In [4]:
import os

import pandas as pd
from neuprint import fetch_neurons, NeuronCriteria as NC

roi_list = ['ME(R)']
dataset_lengths = {}
neuron_df_dict = {}
os.makedirs('neuron_regions_information', exist_ok=True)
for region in roi_list:
    # Strip the '*' that marks primary ROIs in the hierarchy, if present
    region = region.rstrip('*')
    # Require at least one input and one output synapse, both within the selected region
    criteria = NC(inputRois=[region], outputRois=[region], min_roi_inputs=1, min_roi_outputs=1)
    # Fetch neuron properties and per-ROI synapse counts for neurons matching the criteria
    neuron_df, roi_counts_df = fetch_neurons(criteria)
    # Store the neuron count, save the table to disk and keep it for the next steps
    dataset_lengths[region] = len(neuron_df)
    neuron_df.to_csv(os.path.join('neuron_regions_information', region + '_region.csv'))
    neuron_df_dict[region] = neuron_df

print('Number of neurons in selected datasets:')
dataset_lengths

Number of neurons in selected datasets:


{'ME(R)': 3721}
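The selection rule can be reproduced offline on a per-neuron, per-ROI synapse table. A sketch assuming a layout like the `roi_counts_df` returned by `fetch_neurons` (columns `bodyId`, `roi`, `pre`, `post`), where `post` counts a neuron's input synapses and `pre` its output synapses. `select_neurons` and the toy counts are hypothetical:

```python
import pandas as pd


def select_neurons(roi_counts, roi, min_inputs=1, min_outputs=1):
    """Return bodyIds with at least `min_inputs` inputs (post) and
    at least `min_outputs` outputs (pre) inside `roi`."""
    in_roi = roi_counts[roi_counts["roi"] == roi]
    mask = (in_roi["post"] >= min_inputs) & (in_roi["pre"] >= min_outputs)
    return sorted(in_roi.loc[mask, "bodyId"].tolist())


# Toy counts for three neurons (made up for illustration)
toy_counts = pd.DataFrame({
    "bodyId": [1, 1, 2, 3, 3],
    "roi":    ["ME(R)", "LO(R)", "ME(R)", "ME(R)", "LO(R)"],
    "pre":    [5, 0, 0, 2, 7],
    "post":   [3, 4, 6, 1, 0],
})
print(select_neurons(toy_counts, "ME(R)"))  # [1, 3]
```

Neuron 2 only receives input in ME(R) and sends its output elsewhere, so the rule drops it, which is the pass-through case the criterion is meant to exclude.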

Based on the acquired neuron list, I downloaded each neuron's skeleton (its morphology), which is stored as a series of connected linear segments. For this research I only used the start and end points of the segments, i.e. the skeleton nodes with their coordinates and radii:

In [11]:
import os

import pandas as pd

# Columns to keep: coordinates, radius and neuron identity (which neuron a point belongs to)
columns = ['x', 'y', 'z', 'radius', 'bodyId']
# Neuron ids whose skeletons could / could not be downloaded
missing_ids = []
present_ids = []

skeleton_df_dict = {}
os.makedirs('neuron_regions_points', exist_ok=True)
for region in roi_list:
    skeleton_parts = []  # collect per-neuron tables and concatenate once at the end
    for i, bodyId in enumerate(neuron_df_dict[region]['bodyId']):
        if i % 500 == 0:
            print('Completed', i, 'out of', dataset_lengths[region], 'neurons for the region', region)
        # Record (and skip) neurons whose skeleton cannot be fetched from the database
        try:
            # Fetch the skeleton (one row per node) for this neuron
            s = c.fetch_skeleton(body=int(bodyId), format='pandas')
        except Exception:
            missing_ids.append(bodyId)
            continue
        s['bodyId'] = str(bodyId)
        present_ids.append(bodyId)
        skeleton_parts.append(s[columns])
    skeleton_df = pd.concat(skeleton_parts)
    skeleton_df.to_csv(os.path.join('neuron_regions_points', region + '_points.csv'))
    skeleton_df_dict[region] = skeleton_df
print("Data successfully obtained")
total = sum(dataset_lengths.values())
print('There were', total, 'neurons in total.', 'Out of them', len(missing_ids), 'were missing and', len(present_ids), 'were downloaded')

There were 3721 neurons in total. Out of them 742 were missing and 2979 were downloaded
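The loop keeps only node coordinates, radii and bodyId. If the `link` column is kept as well, the segments themselves can be recovered: each skeleton row is a node whose `link` gives the `rowId` of its parent (`-1` for the root), so every segment runs from a node to its parent. A sketch assuming the SWC-style columns `rowId`, `x`, `y`, `z`, `radius`, `link` that `fetch_skeleton(..., format='pandas')` returns (check against your neuprint-python version). `skeleton_segments` and the toy skeleton are hypothetical:

```python
import pandas as pd


def skeleton_segments(skel):
    """Pair every non-root node with its parent node.

    Returns one row per segment with start (child) and end (parent) coordinates.
    """
    # Relabel the parent table so its rowId lines up with the child's link
    parents = skel[["rowId", "x", "y", "z"]].rename(
        columns={"rowId": "link", "x": "x_end", "y": "y_end", "z": "z_end"}
    )
    # Inner merge drops the root, whose link (-1) matches no rowId
    segs = skel.merge(parents, on="link", how="inner")
    segs = segs.rename(columns={"x": "x_start", "y": "y_start", "z": "z_start"})
    return segs[["rowId", "link", "x_start", "y_start", "z_start",
                 "x_end", "y_end", "z_end"]]


# Toy skeleton: root 1 with a chain 2 -> 1 and 3 -> 2
toy_skel = pd.DataFrame({
    "rowId":  [1, 2, 3],
    "x":      [0.0, 10.0, 20.0],
    "y":      [0.0, 0.0, 5.0],
    "z":      [0.0, 0.0, 0.0],
    "radius": [12.0, 8.0, 8.0],
    "link":   [-1, 1, 2],
})
print(skeleton_segments(toy_skel))
```

Because `rowId` restarts for every neuron, the merge has to be applied per neuron (for example to each `s` inside the loop), not to the concatenated table.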


Each region has a set of points with their radii and neuron identities (bodyId):

In [10]:
skeleton_df_dict['ME(R)']

|       | x       | y       | z       | radius    | bodyId     |
|-------|---------|---------|---------|-----------|------------|
| 0     | 9192.0  | 17552.0 | 9100.0  | 12.000000 | 543702186  |
| 1     | 9192.0  | 17576.0 | 9100.0  | 12.000000 | 543702186  |
| 2     | 9216.0  | 17600.0 | 9100.0  | 21.941099 | 543702186  |
| 3     | 9264.0  | 17648.0 | 9100.0  | 21.941099 | 543702186  |
| 4     | 9336.0  | 17696.0 | 9100.0  | 63.894699 | 543702186  |
| ...   | ...     | ...     | ...     | ...       | ...        |
| 15257 | 13954.0 | 11840.0 | 30690.0 | 11.000000 | 7112616299 |
| 15258 | 13954.0 | 11818.0 | 30690.0 | 11.000000 | 7112616299 |
| 15259 | 13932.0 | 11796.0 | 30712.0 | 11.000000 | 7112616299 |
| 15260 | 13932.0 | 11774.0 | 30712.0 | 11.000000 | 7112616299 |
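When a CSV written by `to_csv` (which saves the index) is read back without `index_col`, pandas names the index column `Unnamed: 0`. A sketch of reloading a saved points file and summarizing it per neuron (point count and bounding box). The inline CSV stands in for the saved ME(R) points file and contains a few rows copied from the table above; `summarize_points` is a hypothetical helper:

```python
import io

import pandas as pd


def summarize_points(csv_source):
    """Load a saved points CSV and return per-neuron point counts and bounding boxes."""
    # index_col=0 restores the saved index instead of creating an 'Unnamed: 0' column;
    # dtype keeps bodyId as a string, matching how it was stored
    points = pd.read_csv(csv_source, index_col=0, dtype={"bodyId": str})
    return points.groupby("bodyId").agg(
        n_points=("x", "count"),
        x_min=("x", "min"), x_max=("x", "max"),
        y_min=("y", "min"), y_max=("y", "max"),
        z_min=("z", "min"), z_max=("z", "max"),
    )


# A few rows copied from the table above, in the format written by to_csv
sample_csv = io.StringIO(
    ",x,y,z,radius,bodyId\n"
    "0,9192.0,17552.0,9100.0,12.0,543702186\n"
    "1,9192.0,17576.0,9100.0,12.0,543702186\n"
    "2,9216.0,17600.0,9100.0,21.941099,543702186\n"
    "15259,13932.0,11796.0,30712.0,11.0,7112616299\n"
    "15260,13932.0,11774.0,30712.0,11.0,7112616299\n"
)
print(summarize_points(sample_csv))
```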
