# Guide to the data acquired from neuPrint

The instructions below are based on the description from [neuPrintExplorer](https://neuprint.janelia.org/help/cypherexamples). For more information, please visit the original website. A technical paper on neuPrint is available [here](https://www.biorxiv.org/content/10.1101/2020.01.16.909465v1.full).

## Detailed information about the data

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Neuprint_connections.csv')
df.head

<bound method NDFrame.head of          Unnamed: 0  bodyId_pre  bodyId_post     roi  weight type_pre  \
0                 0   106979579   1040004619     SAD       1      NaN   
1                 1   106979579   2099569651     SAD       1      NaN   
2                 2   106979579   2157206476     SAD       1      NaN   
3                 3   106979579   2535443901     SAD       1      NaN   
4                 4   106979579   5813024910     SAD       1      NaN   
...             ...         ...          ...     ...     ...      ...   
8034505     8034505  7112626669   5813018847   AL(R)       1      NaN   
8034506     8034506  7112626733    327588225  SMP(R)       1      NaN   
8034507     8034507  7112626733    330640044  SMP(R)       1      NaN   
8034508     8034508  7112626733    357945155  SMP(R)       1      NaN   
8034509     8034509  7112626733   5813130028  SMP(R)       1      NaN   

        instance_pre type_post instance_post  
0           Franken1     CL304       CL304_R  
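Note that `df.head` without parentheses is an attribute lookup rather than a call, which is why the output above begins with `<bound method NDFrame.head of ...`. The sketch below illustrates the difference on a small frame rebuilt from the first rows displayed above. Only four of the columns are reproduced, and the variable names are illustrative.

```python
import pandas as pd

# First five rows as displayed above; only four columns are rebuilt here.
sample = pd.DataFrame({
    "bodyId_pre": [106979579] * 5,
    "bodyId_post": [1040004619, 2099569651, 2157206476, 2535443901, 5813024910],
    "roi": ["SAD"] * 5,
    "weight": [1] * 5,
})

first_rows = sample.head(3)         # calling the method returns a DataFrame
n_rows = len(sample)                # number of rows in the frame
is_method = callable(sample.head)   # the bare attribute is only the method object

print(first_rows)
print(n_rows)
```

The `Unnamed: 0` column in the output above is the row index that was saved with the CSV. Passing `index_col=0` to `pd.read_csv` would load it as the index instead of as a data column.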

In [2]:
Id_pre = list(df['bodyId_pre'])
Id_post = list(df['bodyId_post'])
R_l = list(df['roi'])
wt = list(df['weight'])
t_pre = list(df['type_pre'])
ins_pre = list(df['instance_pre'])
t_post = list(df['type_post'])
ins_post = list(df['instance_post'])

### bodyId
bodyId is the unique identifier of a body in neuPrint. A body is a segmentation piece with at least one synapse. Bodies are treated as nodes in this graph.

### nodes and edges
As the row index above shows (0 to 8034509), the total number of edges in this graph is 8034510.

In [3]:
# Count distinct IDs with a set; the top-level pd.value_counts is deprecated in recent pandas.
print('The number of source nodes is', len(set(Id_pre)))

The number of source nodes is 138578


In [4]:
# Count distinct IDs with a set; the top-level pd.value_counts is deprecated in recent pandas.
print('The number of target nodes is', len(set(Id_post)))

The number of target nodes is 165435


In [5]:
# The union of both ID sets gives every body that appears as a source or a target.
print('The total number of nodes is', len(set(Id_pre) | set(Id_post)))

The total number of nodes is 179907
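The three counts above are linked by inclusion-exclusion: the number of bodies that appear both as a source and as a target equals sources + targets - total. A minimal sketch, first on a toy edge list (the small IDs are hypothetical, not taken from the dataset) and then on the counts reported above:

```python
# Toy edge list with hypothetical body IDs.
pre_ids = [1, 1, 2, 3]
post_ids = [2, 4, 4, 1]

sources = set(pre_ids)          # bodies with at least one outgoing edge
targets = set(post_ids)         # bodies with at least one incoming edge
all_nodes = sources | targets   # every body in the graph
both = sources & targets        # bodies that are both source and target

# Inclusion-exclusion holds for any pair of sets.
assert len(both) == len(sources) + len(targets) - len(all_nodes)

# The same identity applied to the counts printed above.
n_sources, n_targets, n_total = 138578, 165435, 179907
n_both = n_sources + n_targets - n_total
n_only_source = n_sources - n_both
n_only_target = n_targets - n_both

print(n_both, n_only_source, n_only_target)
```

For the reported counts this gives 124106 bodies that act as both source and target, 14472 that only appear as a source, and 41329 that only appear as a target.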


### ROI
There are 64 regions of interest (ROIs). The names of the regions and the number of connections in each ROI are listed below.

In [6]:
print(pd.value_counts(R_l))

LO(R)      752331
SMP(R)     494343
SLP(R)     434722
SMP(L)     403630
AVLP(R)    386241
            ...  
CAN(R)       4147
BU(R)        3208
BU(L)        2603
AB(L)        1272
PRW           478
Length: 64, dtype: int64
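The counts above are numbers of rows (connections) per ROI. Since `weight` gives the number of synapses in a connection, summing it per ROI gives a synapse total alongside the connection count. A minimal sketch on a hypothetical toy frame that reuses the column names of the CSV (the rows are invented for illustration):

```python
import pandas as pd

# Hypothetical connections; column names follow the CSV above.
toy = pd.DataFrame({
    "bodyId_pre": [1, 1, 2, 3],
    "bodyId_post": [2, 3, 3, 1],
    "roi": ["SMP(R)", "SMP(R)", "LO(R)", "SMP(R)"],
    "weight": [1, 4, 2, 1],
})

# One row per ROI: number of connections and total number of synapses.
per_roi = toy.groupby("roi")["weight"].agg(connections="count", synapses="sum")
print(per_roi)
```

On the real data, the same call on `df` would be expected to reproduce the connection counts listed above in its `connections` column.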


### weight
Weight indicates the number of synapses in a connection between two neurons.
- It ranges from 1 to 1409 in this graph. The number of connections at each weight is listed below.
- The first column shows the weight value, and the second column shows the number of connections with that weight.

In [7]:
print(pd.value_counts(wt))

1       4663656
2       1449485
3        627064
4        336346
5        206011
         ...   
329           1
328           1
326           1
325           1
1409          1
Length: 398, dtype: int64
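The listing shows that the distribution is dominated by weak connections. As a rough worked example, the shares below are computed from the five largest counts printed above, assuming that the 8034510 rows reported earlier all carry a weight value:

```python
# Counts copied from the output above: weight -> number of connections.
top_counts = {1: 4663656, 2: 1449485, 3: 627064, 4: 336346, 5: 206011}
total_connections = 8034510  # total number of rows reported earlier

share_weight_1 = top_counts[1] / total_connections
share_weight_le_5 = sum(top_counts.values()) / total_connections

print(f"weight == 1: {share_weight_1:.1%}")
print(f"weight <= 5: {share_weight_le_5:.1%}")
```

Under that assumption, about 58% of the connections consist of a single synapse, and about 91% have a weight of 5 or less.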


### type
Type is the neuron type of the body.
- By default, cells get a type of the form NPXXX, where NP is an acronym for the neuropil with the largest overlap (such as CL for clamp) and XXX is a numeric ID.
- If a cell is clearly recognised as a previously published type, the systematic name is replaced by the published name.
- The types and their counts are listed in the blocks below. The first block shows the types of presynaptic neurons, and the second block shows the types of postsynaptic neurons. In each block, the first column shows the cell type name, and the second column shows the number of connections whose presynaptic (first block) or postsynaptic (second block) neuron has that type.

In [8]:
print(pd.value_counts(t_pre))

KCg-m      197167
KCab-m     103932
KCab-s      67060
KCab-c      63125
LC10        40780
            ...  
WED183          2
DNa07           2
WED084          1
DNES1           1
AOTU036         1
Length: 5617, dtype: int64


In [9]:
print(pd.value_counts(t_post))

KCg-m      184800
KCab-m      98311
KCab-s      61634
KCab-c      56578
LC10        43556
            ...  
PS229          34
H1             26
DNp28          19
ORN_VM2        15
OCG09          12
Length: 5620, dtype: int64
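The NPXXX convention described above can be checked with a simple pattern. The sketch below is a heuristic only: it assumes the prefix consists of uppercase letters and that XXX is exactly three digits, and it cannot tell whether a matching name is systematic or a published name that happens to have the same shape. The function name `split_systematic_type` is made up for this example.

```python
import re

# Assumed shape of a systematic name: uppercase prefix followed by three digits.
SYSTEMATIC = re.compile(r"^([A-Z]+)(\d{3})$")

def split_systematic_type(name):
    """Return (neuropil_prefix, numeric_id), or None if the name does not fit."""
    if not isinstance(name, str):   # missing types load as NaN (a float) in pandas
        return None
    match = SYSTEMATIC.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

for type_name in ["CL304", "AOTU036", "KCg-m", "LC10"]:
    print(type_name, split_systematic_type(type_name))
```

Names taken from the listings above such as `CL304` and `AOTU036` fit the pattern, while `KCg-m` and `LC10` do not.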


### instance
Instance is a name that identifies a more specific instance of a neuron type.
- The instances and their counts are listed in the blocks below. The first block shows the instances of presynaptic neurons, and the second block shows the instances of postsynaptic neurons. In each block, the first column shows the instance name, and the second column shows the number of connections whose presynaptic (first block) or postsynaptic (second block) neuron has that instance name.

In [10]:
print(pd.value_counts(ins_pre))

KCg-m_R        197167
KCab-m_R       103932
KCab-s_R        67060
KCab-c_R        63125
(MBDLaxon1)     48392
                ...  
(PDM31)_R           1
DNES1_R             1
AOTU036_R           1
(ADM12)_L           1
(AVM09)_R           1
Length: 7792, dtype: int64


In [11]:
print(pd.value_counts(ins_post))

KCg-m_R      184800
KCab-m_R      98311
KCab-s_R      61634
KCab-c_R      56578
LC10          42490
              ...  
(PVM02)_R         5
Franken1          5
Franken6          5
(PDM31)_R         4
(AVL22)_R         2
Length: 7802, dtype: int64
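Many instance names in the listings above are a name followed by a trailing `_R` or `_L` (for example `KCg-m_R`), while others such as `LC10` or `Franken1` carry no suffix. The source does not state what the suffix means, so the sketch below only separates it from the rest of the name. The helper `split_side_suffix` is hypothetical.

```python
def split_side_suffix(instance):
    """Split a trailing '_L' or '_R' from an instance name, if present."""
    if not isinstance(instance, str):   # missing instances load as NaN in pandas
        return None
    for suffix in ("_L", "_R"):
        if instance.endswith(suffix):
            # Return the base name and the single suffix letter.
            return instance[: -len(suffix)], suffix[1:]
    return instance, None

for name in ["KCg-m_R", "(ADM12)_L", "Franken1", "LC10"]:
    print(name, split_side_suffix(name))
```

Comparing the base name with the `type` column of the same row is one way to check how often an instance name is simply the type plus a suffix.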


## More information about the node attributes

:Segment (:Neuron) nodes

#### bodyId: a unique number for each distinct segment

#### pre: Number of pre-synaptic sites on the segment

#### post: Number of post-synaptic sites on the segment

#### type: Cell type name for given neuron (if provided)

#### instance: String identifier for a neuron (if provided)

#### size: Number of voxels in the body

#### roiInfo: JSON string showing the pre and post breakdown for each ROI the neuron intersects.

#### roi: This property only exists for the ROIs that intersect this segment

#### status: Reconstruction status for a neuron. By convention, we broadly consider proofread neurons as being “Traced”.

#### cropped: Since datasets often involve a portion of a larger brain, cropped indicates that a significant portion of a neuron is cut-off by the dataset extents. By convention, all “Traced” neurons should be explicitly noted whether they are cropped or not.

_Quoted from the [neuPrint paper](https://www.biorxiv.org/content/10.1101/2020.01.16.909465v1.full)_
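Because `roiInfo` is stored as a JSON string, it has to be parsed before the per-ROI counts can be used. The sketch below assumes a layout of one object per ROI with `pre` and `post` keys. That layout and the counts are invented for illustration, so check a real value before relying on it.

```python
import json

# Hypothetical roiInfo value: ROI names from this document, counts invented.
roi_info_text = '{"SMP(R)": {"pre": 12, "post": 30}, "SLP(R)": {"pre": 3, "post": 7}}'

roi_info = json.loads(roi_info_text)

# Sum the per-ROI breakdown; .get() tolerates ROIs that lack one of the keys.
total_pre = sum(entry.get("pre", 0) for entry in roi_info.values())
total_post = sum(entry.get("post", 0) for entry in roi_info.values())

print(sorted(roi_info), total_pre, total_post)
```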