# Data S4 FAFB connectomics Investigation

In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Initial Exploration of FAFB neurons used

In [2]:
s4_df = pd.read_csv("../data/DataS4_FAFBreconstruction.csv")
s4_df.head()


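|   | root_630 | root_783 | pos_x | pos_y | pos_z | nucleus_id | side | ito_lee_hemilineage | hartenstein_hemilineage | morphology_group | ... | acetylcholine | gaba | glutamate | dopamine | serotonin | octopamine | segregation_index | projection_score | in_ground_truth | notes |
| - | -------- | -------- | ----- | ----- | ----- | ---------- | ---- | ------------------- | ----------------------- | ---------------- | --- | ------------- | ---- | --------- | -------- | --------- | ---------- | ----------------- | ---------------- | --------------- | ----- |
| 0 | 720575940603453286 | 720575940603453286 | 138500.0 | 56630.0 | 2156 | 5056135.0 | right | ALad1 | BAmv3 | ALad1__1 | ... | 0.8 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.425 | 0.425 | False |  |
| 1 | 720575940625413395 | 720575940625413395 | 113000.0 | 59340.0 | 1655 | 4491112.0 | left | ALad1 | BAmv3 | ALad1__1 | ... | 0.8 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.543 | 0.543 | True |  |
| 2 | 720575940624789125 | 720575940624789125 | 140300.0 | 58960.0 | 1734 | 5056625.0 | right | ALad1 | BAmv3 | ALad1__1 | ... | 0.8 | 0.0 | 0.1 | 0.1 | 0.1 | 0.0 | 0.308 | 0.308 | False |  |
| 3 | 720575940629743415 | 720575940630003472 | 139800.0 | 59110.0 | 1788 | 5057631.0 | right | ALad1 | BAmv3 | ALad1__1 | ... | 0.8 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.297 | 0.297 | False |  |
| 4 | 720575940629587671 | 720575940629587671 | 171200.0 | 33940.0 | 3250 | 5056503.0 | right | ALad1 | BAmv3 | ALad1__1 | ... | 0.8 | 0.0 | 0.1 | 0.1 | 0.0 | 0.0 | 0.258 | 0.258 | False |  |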
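
The root IDs in the preview are 18-digit integers, and `float64` cannot represent every integer of that size exactly, so it may be worth pinning their dtype when loading. The following is a minimal sketch, not the notebook's actual loading code: it reads a two-row stand-in (IDs copied from the preview) from a string instead of the real file, and whether the real CSV needs explicit dtypes is an assumption.

```python
import io
import pandas as pd

# Two-row stand-in for the CSV; the IDs are copied from the preview above.
csv_text = (
    "root_630,root_783,side\n"
    "720575940603453286,720575940603453286,right\n"
    "720575940629743415,720575940630003472,right\n"
)

ids = pd.read_csv(
    io.StringIO(csv_text),
    # Keep the IDs as exact 64-bit integers rather than letting them become floats.
    dtype={"root_630": "int64", "root_783": "int64"},
    # Parse the file in one pass so type inference is not done chunk by chunk.
    low_memory=False,
)

# Rows whose ID differs between the two FlyWire versions.
changed = ids[ids["root_630"] != ids["root_783"]]
print(changed)
```

With the real file, the same `dtype` and `low_memory` arguments could be passed alongside the path used in the cell above.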


In [4]:
all_columns = s4_df.columns.tolist()
print("All columns in the DataFrame:")
for col in all_columns:
    print(f"    {col}")

All columns in the DataFrame:
    root_630
    root_783
    pos_x
    pos_y
    pos_z
    nucleus_id
    side
    ito_lee_hemilineage
    hartenstein_hemilineage
    morphology_group
    cell_class
    cell_sub_class
    cell_type
    hemibrain_type
    pre
    conf_nt
    conf_nt_p
    top_nt
    top_nt_p
    known_nt
    known_nt_source
    acetylcholine
    gaba
    glutamate
    dopamine
    serotonin
    octopamine
    segregation_index
    projection_score
    in_ground_truth
    notes


OK, so here's a quick breakdown of the columns and their meanings:

| Column Name                                                                 | Description                                                                                                               |
| --------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `root_630`                                                                  | **Neuron ID** in FlyWire version 630. Use this to view the neuron at [https://flywire.ai](https://flywire.ai).            |
| `root_783`                                                                  | Neuron ID in FlyWire version 783, a later release; the ID differs from `root_630` if the neuron was edited in between.   |
| `pos_x`, `pos_y`, `pos_z`                                                   | Approximate soma coordinates (in nm) for the neuron within the EM volume.                                                 |
| `nucleus_id`                                                                | ID of the nucleus associated with the neuron (may be `NaN` if unknown).                                                   |
| `side`                                                                      | Hemisphere the neuron resides in: `left` or `right`.                                                                      |
| `ito_lee_hemilineage`                                                       | Hemilineage name based on the Ito & Lee lineage naming convention.                                                        |
| `hartenstein_hemilineage`                                                   | Hemilineage name based on the Hartenstein lineage schema.                                                                 |
| `morphology_group`                                                          | Cluster of neurons with similar morphology, often indicating a cell type.                                                 |
| `cell_class`                                                                | Broad functional group (e.g. `sensory`, `interneuron`, `projection neuron`).                                              |
| `cell_sub_class`                                                            | Subdivision of cell class, if applicable.                                                                                 |
| `cell_type`                                                                 | Named or inferred cell type label (if available).                                                                         |
| `hemibrain_type`                                                            | Matching cell type from the Hemibrain dataset, if available.                                                              |
| `pre`                                                                       | Total number of **presynaptic sites** assigned to this neuron.                                                            |
| `conf_nt`                                                                   | Predicted neurotransmitter with **highest confidence**.                                                                   |
| `conf_nt_p`                                                                 | Proportion of presynaptic sites supporting `conf_nt`. (Confidence value, 0–1).                                            |
| `top_nt`                                                                    | Predicted transmitter based on **top probability**, even if not majority.                                                 |
| `top_nt_p`                                                                  | Probability of `top_nt` according to classifier.                                                                          |
| `known_nt`                                                                  | Literature-based known transmitter, if available.                                                                         |
| `known_nt_source`                                                           | Citation or source for `known_nt`.                                                                                        |
| `acetylcholine`, `gaba`, `glutamate`, `dopamine`, `serotonin`, `octopamine` | Per-class classifier confidence scores for each neurotransmitter (range: 0–1).                                            |
| `segregation_index`                                                         | Measure of how spatially segregated the axon and dendrite compartments are. Higher values imply clearer separation.       |
| `projection_score`                                                          | A score based on axon projection length and morphology—indicative of long-range projection neurons.                       |
| `in_ground_truth`                                                           | `True` if this neuron was used as part of the supervised ground-truth training set.                                       |
| `notes`                                                                     | Free-form field for manual annotations, often empty.                                                                      |

So the key columns I will use are:
* `root_630`: The neuron ID.
* `pos_x`, `pos_y`, and `pos_z`: The approximate location of the neuron.
* `cell_class`: The broad functional group.
* `cell_type`: The named or inferred cell type.
* `pre`: The number of presynaptic sites.
* `conf_nt`: The model's predicted neurotransmitter for the neuron.
* `known_nt`: The known neurotransmitter type(s).
* `acetylcholine`, `gaba`, `glutamate`, `dopamine`, `serotonin`, `octopamine`: The ratio of synapses in the neuron expressing each neurotransmitter.
* `in_ground_truth`: Whether the neuron is in the ground-truth dataset or not.
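
The column selection listed above could be sketched as a small helper. This is only an illustration: the helper name `select_key_columns` and the renamed column `neuron_id` are hypothetical and do not come from the dataset or the notebook.

```python
import pandas as pd

# The key columns named in the list above.
KEY_COLUMNS = [
    "root_630", "pos_x", "pos_y", "pos_z",
    "cell_class", "cell_type", "pre",
    "conf_nt", "known_nt",
    "acetylcholine", "gaba", "glutamate",
    "dopamine", "serotonin", "octopamine",
    "in_ground_truth",
]


def select_key_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df restricted to KEY_COLUMNS.

    `root_630` is renamed to `neuron_id` (a hypothetical name chosen here).
    """
    missing = [col for col in KEY_COLUMNS if col not in df.columns]
    if missing:
        # Fail early with a readable message if the CSV layout changes.
        raise KeyError(f"Missing expected columns: {missing}")
    return df[KEY_COLUMNS].rename(columns={"root_630": "neuron_id"}).copy()
```

In the notebook this would be called as `key_df = select_key_columns(s4_df)`. Returning a copy keeps later edits to `key_df` from touching `s4_df`.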


In [9]:
known_nts = s4_df['known_nt'].value_counts()
display(known_nts)

known_nt
acetylcholine                           31646
glutamate                                6857
gaba, nitric oxide                       3127
acetylcholine, allatostatin-c            2975
acetylcholine, sNPF, sparkly             2490
                                        ...  
dopamine, sparkly, Nplp1                    2
serotonin, natalisin                        2
gaba, acetylcholine                         1
allatostatin-a, allatostatin-c, Dh44        1
glycine, pdf                                1
Name: count, Length: 72, dtype: int64

OK, so there are 72 different combinations here, which we will need to trawl through. But first, let's restrict the investigation to the neurons that are part of the ground truth.

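# Assumed completion: keep only the rows flagged as ground truth.
df = s4_df[s4_df["in_ground_truth"] == True]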
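
To trawl through the combined `known_nt` labels, one option is to split each label on commas and count the individual transmitters. The following is a rough sketch of that idea, assuming the labels are always comma-separated as in the output above. The function name is hypothetical, and the toy labels are made up for illustration rather than taken from the dataset.

```python
import pandas as pd


def count_individual_transmitters(known_nt: pd.Series) -> pd.Series:
    """Count how many rows mention each transmitter, splitting combined labels."""
    parts = (
        known_nt.dropna()   # rows without a known transmitter are skipped
        .str.split(",")     # "gaba, nitric oxide" -> ["gaba", " nitric oxide"]
        .explode()          # one row per individual transmitter
        .str.strip()        # remove the space left after each comma
    )
    return parts[parts != ""].value_counts()


# Toy labels, made up for illustration only.
toy_labels = pd.Series(
    ["acetylcholine", "gaba, nitric oxide", "gaba, acetylcholine", None]
)
print(count_individual_transmitters(toy_labels))
```

Splitting this way also treats labels that differ only in order, such as `gaba, acetylcholine` and `acetylcholine, gaba`, as the same pair of transmitters.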