---

**Runtime Configuration:** This notebook has a paired setup script at `runtimes/fly_connectome_03_connectivity_analyses_post_startup.sh` which provides the complete installation recipe for all dependencies. This script can be used as a post-startup script for Google Colab to automatically configure the environment.

---

# Core Tutorial

## Setup and Configuration

In [1]:
# Dataset configuration
# Options: "banc_746", "fafb_783", "manc_121", "hemibrain_121", "malecns_09"
DATASET = "banc_746"
DATASET_ID = "banc_746_id"
SUBSET_NAME = "front_leg"  # Optional: use subset data if available

# Data location - can be GCS bucket or local path
# Option 1 (GCS - default): Access data directly from Google Cloud Storage
DATA_PATH = "gs://sjcabs_2025_data"

# Option 2 (Local): Use local copy if you've downloaded the data
# DATA_PATH = "/path/to/local/sjcabs_data"
# Example: DATA_PATH = "~/data/sjcabs_data"

# Detect if using GCS or local path
USE_GCS = DATA_PATH.startswith("gs://")

# Image output directory
import os
IMG_DIR = "images/tutorial_03"
os.makedirs(IMG_DIR, exist_ok=True)

print(f"Dataset: {DATASET}")
print(f"Data location: {DATA_PATH}")
print(f"Using GCS: {USE_GCS}")
print(f"Images will be saved to: {IMG_DIR}")

Dataset: banc_746
Data location: gs://sjcabs_2025_data
Using GCS: True
Images will be saved to: images/tutorial_03


## Import Packages

In [2]:
# Environment detection and Colab setup (auto-configured)
try:
    import google.colab
    IN_COLAB = True
    
    # Colab setup
    
    # Authenticate
    from google.colab import auth
    auth.authenticate_user()
    print("✓ Authenticated with Google Cloud")
    
    # Download utils.py
    import urllib.request, os
    HELPER_URL = "https://raw.githubusercontent.com/sjcabs/fly_connectome_data_tutorial/main/python/utils.py"
    if not os.path.exists("utils.py"):
        urllib.request.urlretrieve(HELPER_URL, "setup_helpers.py")
    
    print("✓ Colab environment ready\n")
except ImportError:
    IN_COLAB = False
    # Local environment - no output needed
    pass

In [3]:
# Import standard libraries and data science packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import gcsfs
import glob
import networkx as nx
import umap
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster, cut_tree

# Import neuroscience packages
import navis
import trimesh

# Import helper functions from utils
from utils import construct_path, read_feather_gcs, read_obj_from_gcs

print(f"pandas version: {pd.__version__}")
print(f"networkx version: {nx.__version__}")

✓ Packages loaded successfully
pandas version: 2.3.1
networkx version: 3.6.1


## Setup GCS Access

In [4]:
if USE_GCS:
    gcs = gcsfs.GCSFileSystem(token='google_default')
    print("✓ GCS filesystem initialized")
else:
    gcs = None
    print("Using local filesystem")

✓ GCS filesystem initialized


## Load Data

We'll load:
1. **Metadata** - neuron annotations
2. **Edgelist** - neuron-to-neuron connectivity

In [5]:
# Construct edgelist path (use subset if specified)
dataset_base = DATASET.split("_")[0]

if SUBSET_NAME is not None:
    # Use subset-specific edgelist
    subset_dir = f"{DATA_PATH}/{dataset_base}/{SUBSET_NAME}"
    edgelist_filename = f"{DATASET}_{SUBSET_NAME}_simple_edgelist.feather"
    edgelist_path = f"{subset_dir}/{edgelist_filename}"
    print(f"Using subset: {SUBSET_NAME}")
else:
    # Use full edgelist
    edgelist_path = construct_path(DATA_PATH, DATASET, "edgelist_simple")
    print("Using full dataset (this may be slow)")

print(f"Edgelist path: {edgelist_path}")

# Meta path (always uses full meta, not subset)
meta_path = construct_path(DATA_PATH, DATASET, "meta")
print(f"Meta path: {meta_path}")


Using subset: front_leg
Edgelist path: gs://sjcabs_2025_data/banc/front_leg/banc_746_front_leg_simple_edgelist.feather
Meta path: gs://sjcabs_2025_data/banc/banc_746_meta.feather


### Load Metadata

In [6]:
meta = read_feather_gcs(meta_path, gcs_fs=gcs)

print(f"\nTotal neurons: {len(meta):,}")
meta.head()

📦 Loading from cache: sjcabs_2025_data_banc_banc_746_meta.feather
✓ Loaded 168,759 rows (cached)

Total neurons: 168,759


Unnamed: 0,banc_746_id,supervoxel_id,region,side,hemilineage,nerve,flow,super_class,cell_class,cell_sub_class,cell_type,neurotransmitter_predicted,neurotransmitter_score,cell_function,cell_function_detailed,body_part_sensory,body_part_effector,status
0,720575941569192238,74803281603754231,central_brain,right,VPNp1_medial,,intrinsic,central_brain_intrinsic,,,"(PLP191,PLP192)a",acetylcholine,0.7534,,,,,
1,720575941574697871,74873512908765054,central_brain,right,VPNp1_medial,,intrinsic,central_brain_intrinsic,,,"(PLP191,PLP192)a",acetylcholine,0.7976,,,,,
2,720575941652939029,77477362601861709,central_brain,left,VPNp1_medial,,intrinsic,central_brain_intrinsic,,,"(PLP191,PLP192)a",dopamine,0.5825,,,,,TRACING_ISSUE_2
3,720575941452014202,74310563223910394,central_brain,right,VPNp1_medial,,intrinsic,central_brain_intrinsic,,,"(PLP191,PLP192)a",acetylcholine,0.5704,,,,,
4,720575941565035527,77406993858043384,central_brain,left,VPNp1_medial,,intrinsic,central_brain_intrinsic,,,"(PLP191,PLP192)a",acetylcholine,0.6317,,,,,


### Load Edgelist (Connectivity Data)

In [7]:
edgelist = read_feather_gcs(edgelist_path, gcs_fs=gcs)

# Rename norm column to weight for consistency
if 'norm' in edgelist.columns and 'weight' not in edgelist.columns:
    edgelist['weight'] = edgelist['norm']

print(f"\nTotal connections: {len(edgelist):,}")
print(f"Columns: {list(edgelist.columns)}")
edgelist.head()
# Filter meta to neurons present in edgelist (matches R version)
neuron_ids = set(edgelist["pre"].unique()) | set(edgelist["post"].unique())
meta = meta[meta[DATASET_ID].isin(neuron_ids)].copy()
print(f"Filtered meta to {len(meta):,} neurons present in edgelist")


📦 Loading from cache: sjcabs_2025_data_banc_front_leg_banc_746_front_leg_simple_edgelist.feather
✓ Loaded 348,856 rows (cached)

Total connections: 348,856
Columns: ['pre', 'post', 'count', 'norm', 'total_input', 'weight']
Filtered meta to 3,532 neurons present in edgelist


## Visualizing Neuropil Meshes

Before analyzing connectivity, let's visualize the 3D structure of the suboesophageal zone (SEZ) neuropils to understand their spatial organization.

**Mesh organization:**
- **Large anatomical regions** (VNC, brain, etc.): `obj/` directory
- **Smaller specific neuropils** (GNG, FLA, AMMC, etc.): `obj/neuropils/` subdirectory

In [8]:
# Helper function to load neuropil meshes
def load_neuropil_mesh(search_pattern, data_path, dataset_base, subdirectory="neuropils"):
    """Load a neuropil mesh from GCS and convert to navis.Volume"""
    if subdirectory:
        mesh_path = f"{data_path}/{dataset_base}/obj/{subdirectory}"
    else:
        mesh_path = f"{data_path}/{dataset_base}/obj"
    
    if USE_GCS:
        gcs_mesh_path = mesh_path.replace("gs://", "")
        try:
            mesh_files = gcs.ls(gcs_mesh_path)
            mesh_files = [f"gs://{f}" for f in mesh_files if f.endswith('.obj')]
        except:
            print(f"  Could not list files in {mesh_path}")
            return None
    else:
        import glob
        mesh_files = glob.glob(f"{mesh_path}/*.obj")
    
    # Find files matching the search pattern
    matching_files = [f for f in mesh_files if search_pattern.lower() in f.lower()]
    
    if len(matching_files) == 0:
        print(f"  No files found matching: {search_pattern}")
        return None
    
    # Load the first matching file
    mesh_file = matching_files[0]
    
    try:
        if USE_GCS:
            mesh = read_obj_from_gcs(gcs, mesh_file.replace('gs://', ''))
        else:
            mesh = trimesh.load(mesh_file)
        
        # Convert trimesh to navis.Volume
        volume = navis.Volume(mesh.vertices, mesh.faces, name=search_pattern)
        return volume
    except Exception as e:
        print(f"  Failed to load {search_pattern}: {e}")
        return None

# Load brain mesh (large region, for context)
print("Loading neuropil meshes...")
brain_volume = load_neuropil_mesh("brain", DATA_PATH, dataset_base, subdirectory="")

# Load SEZ neuropils (smaller specific regions)
gng_volume = load_neuropil_mesh("GNG", DATA_PATH, dataset_base)
fla_l_volume = load_neuropil_mesh("FLA_L", DATA_PATH, dataset_base)
fla_r_volume = load_neuropil_mesh("FLA_R", DATA_PATH, dataset_base)
sad_volume = load_neuropil_mesh("SAD", DATA_PATH, dataset_base)
prw_volume = load_neuropil_mesh("PRW", DATA_PATH, dataset_base)
ammc_l_volume = load_neuropil_mesh("AMMC_L", DATA_PATH, dataset_base)
ammc_r_volume = load_neuropil_mesh("AMMC_R", DATA_PATH, dataset_base)

# Collect all loaded volumes and set their colors/alphas
volumes = []

if brain_volume:
    brain_volume.color = (0.827, 0.827, 0.827, 0.1)  # lightgrey with alpha=0.1
    volumes.append(brain_volume)

if gng_volume:
    gng_volume.color = (1.0, 0.0, 0.0, 0.5)  # red with alpha=0.5
    volumes.append(gng_volume)

if fla_l_volume:
    fla_l_volume.color = (0.0, 0.0, 1.0, 0.5)  # blue with alpha=0.5
    volumes.append(fla_l_volume)

if fla_r_volume:
    fla_r_volume.color = (0.678, 0.847, 0.902, 0.5)  # lightblue with alpha=0.5
    volumes.append(fla_r_volume)

if sad_volume:
    sad_volume.color = (0.0, 0.502, 0.0, 0.5)  # green with alpha=0.5
    volumes.append(sad_volume)

if prw_volume:
    prw_volume.color = (0.502, 0.0, 0.502, 0.5)  # purple with alpha=0.5
    volumes.append(prw_volume)

if ammc_l_volume:
    ammc_l_volume.color = (1.0, 0.647, 0.0, 0.5)  # orange with alpha=0.5
    volumes.append(ammc_l_volume)

if ammc_r_volume:
    ammc_r_volume.color = (1.0, 0.627, 0.478, 0.5)  # lightsalmon with alpha=0.5
    volumes.append(ammc_r_volume)

# Plot all neuropils in 3D using navis
if len(volumes) > 0:
    print(f"\n✓ Loaded {len(volumes)} neuropil meshes")
    print("Visualizing neuropils with navis...")
    
    # Plot with navis - colors/alphas are set on Volume objects
    fig = navis.plot3d(
        volumes,
        backend='plotly',
        width=1200,
        height=800,
        title='Suboesophageal Zone Neuropils'
    )
    
    if fig is not None:
        fig.show()
    else:
        print("Note: Plot was displayed inline")
else:
    print("⚠ No neuropil meshes loaded")

Loading neuropil meshes...


  stacked = np.column_stack(stacked).round().astype(np.int64)



✓ Loaded 8 neuropil meshes
Visualizing neuropils with navis...


Note: Plot was displayed inline


## Meta Data Overview

Let's examine the distribution of neurons by `super_class` and `cell_class` in our subset.

### Neurotransmitter Consensus

First, we'll establish consensus neurotransmitter for each neuron based on prediction scores.

In [9]:
# Get neurotransmitter with highest score for each neuron
meta['nt'] = meta['neurotransmitter_predicted']

# Classify as excitatory or inhibitory
excitatory_nts = ['acetylcholine', 'glutamate']
inhibitory_nts = ['gaba', 'glycine']

meta['nt_type'] = 'other'
meta.loc[meta['nt'].isin(excitatory_nts), 'nt_type'] = 'excitatory'
meta.loc[meta['nt'].isin(inhibitory_nts), 'nt_type'] = 'inhibitory'

print("Neurotransmitter distribution:")
print(meta['nt'].value_counts())
print("\nNeurotransmitter type distribution:")
print(meta['nt_type'].value_counts())

Neurotransmitter distribution:
nt
acetylcholine    2079
gaba              970
glutamate         420
serotonin          17
dopamine            8
octopamine          8
histamine           5
Name: count, dtype: int64

Neurotransmitter type distribution:
nt_type
excitatory    2499
inhibitory     970
other           63
Name: count, dtype: int64


## Neurotransmitter Prediction and Connectivity Signs

To understand whether connections are excitatory or inhibitory, we use predicted neurotransmitter information.

We'll assign signs to connections:
- **Excitatory** (sign: +1): acetylcholine, glutamate
- **Inhibitory** (sign: -1): GABA, glycine

This creates signed connectivity weights that capture both connection strength and likely sign.

### Add Signed Connectivity

Infer edge sign based on presynaptic neuron's neurotransmitter.

In [10]:
# Map neurotransmitter types to edges
nt_map = dict(zip(meta[DATASET_ID], meta['nt_type']))
edgelist['nt_type'] = edgelist['pre'].map(nt_map)

# Assign sign: excitatory = +1, inhibitory = -1, other = 0
edgelist['sign'] = 0
edgelist.loc[edgelist['nt_type'] == 'excitatory', 'sign'] = 1
edgelist.loc[edgelist['nt_type'] == 'inhibitory', 'sign'] = -1

# Create signed weight
edgelist['weight_signed'] = edgelist['weight'] * edgelist['sign']

print(f"\n✓ Added signed connectivity")
print(f"Excitatory edges: {(edgelist['sign'] == 1).sum():,}")
print(f"Inhibitory edges: {(edgelist['sign'] == -1).sum():,}")
print(f"Other edges: {(edgelist['sign'] == 0).sum():,}")


✓ Added signed connectivity
Excitatory edges: 232,518
Inhibitory edges: 111,903
Other edges: 4,435


## Signed Weight Distribution

Let's visualize the distribution of signed connection weights to compare excitatory and inhibitory connections.

**Note:** While this signed classification is useful for circuit diagrams, it's important not to overstate its precision. In flies, glutamate can be excitatory or inhibitory, and inhibition doesn't only quell activity - it can enable it through disinhibition. Consider the visual system where histaminergic photoreceptors inhibit their targets.

In [11]:
# Filter edges with non-zero sign for visualization
edgelist_signed = edgelist[edgelist['sign'] != 0].copy()

# Create histogram of signed weights
fig = go.Figure()

fig.add_trace(go.Histogram(
    x=edgelist_signed[edgelist_signed['sign'] == 1]['weight'],
    name='Excitatory',
    marker_color='red',
    opacity=0.7,
    nbinsx=50
))

fig.add_trace(go.Histogram(
    x=edgelist_signed[edgelist_signed['sign'] == -1]['weight'],
    name='Inhibitory',
    marker_color='blue',
    opacity=0.7,
    nbinsx=50
))

fig.update_layout(
    title='Distribution of Signed Connection Weights',
    xaxis_title='Connection Weight (synapse count)',
    yaxis_title='Count',
    barmode='overlay',
    template='plotly_white',
    height=500
)

fig.write_html(f"{IMG_DIR}/{DATASET}_signed_weight_distribution.html")
fig.show()

print(f"✓ Saved signed weight distribution plot")
print(f"Excitatory connections: {(edgelist['sign'] == 1).sum():,}")
print(f"Inhibitory connections: {(edgelist['sign'] == -1).sum():,}")

✓ Saved signed weight distribution plot
Excitatory connections: 232,518
Inhibitory connections: 111,903


## Neuron Class Distribution

Let's examine the distribution of neurons by super_class.

In [12]:
# Count neurons by super_class
super_class_counts = meta['super_class'].value_counts().reset_index()
super_class_counts.columns = ['super_class', 'count']

# Create bar plot
fig = px.bar(
    super_class_counts,
    x='super_class',
    y='count',
    color='super_class',
    title='Neuron Distribution by Super Class',
    labels={'super_class': 'Super Class', 'count': 'Number of Neurons'},
    template='plotly_white',
    height=500
)

fig.update_layout(showlegend=False, xaxis_tickangle=-45)
fig.write_html(f"{IMG_DIR}/{DATASET}_super_class.html")
fig.show()

print(f"✓ Saved super_class distribution plot")
print(f"\nSuper class distribution:")
print(super_class_counts)

✓ Saved super_class distribution plot

Super class distribution:
                    super_class  count
0  ventral_nerve_cord_intrinsic   1692
1                       sensory    814
2                     ascending    570
3                    descending    339
4                         motor     78
5             sensory_ascending     18
6          visceral_circulatory      6
7            sensory_descending      2


## Basic Network Statistics

Next, we'll examine basic properties of the connectivity graph using network analysis.

### Weight Correlation Analysis

Examine the relationship between synapse count and normalized weight.

### Flow Subclass Analysis

If we have sensory (afferent) or effector (efferent) neurons, let's examine their cell sub-classes.

In [13]:
# Get sensory and effector neurons
flow_subset = meta[meta['flow'].isin(['afferent', 'efferent'])].copy()

if len(flow_subset) > 0:
    # Get top 15 cell sub-classes per flow type
    flow_counts = flow_subset.groupby(['flow', 'cell_sub_class']).size().reset_index(name='count')
    top_per_flow = flow_counts.sort_values(['flow', 'count'], ascending=[True, False])
    top_per_flow = top_per_flow.groupby('flow').head(15)
    
    # Rename flow types for display
    top_per_flow['flow_label'] = top_per_flow['flow'].map({
        'afferent': 'Sensory (afferent)',
        'efferent': 'Effector (efferent)'
    })
    
    # Create faceted bar plot
    fig = px.bar(
        top_per_flow,
        x='cell_sub_class',
        y='count',
        color='cell_sub_class',
        facet_col='flow_label',
        title='Top 15 Cell Sub-Classes per Flow Type',
        labels={'cell_sub_class': 'Cell Sub-Class', 'count': 'Number of Neurons'},
        template='plotly_white',
        height=500
    )
    
    fig.update_xaxes(tickangle=-45)
    fig.update_layout(showlegend=False)
    fig.write_html(f"{IMG_DIR}/{DATASET}_flow_subclass.html")
    fig.show()
    
    print(f"✓ Saved flow subclass plot")
    print(f"Sensory neuron count: {(flow_subset['flow'] == 'afferent').sum():,}")
    print(f"Effector neuron count: {(flow_subset['flow'] == 'efferent').sum():,}")
else:
    print("No sensory or effector neurons in this subset.")

✓ Saved flow subclass plot
Sensory neuron count: 834
Effector neuron count: 84


In [14]:
# Sample if too many points
if len(edgelist) > 50000:
    edgelist_sample = edgelist.sample(n=50000, random_state=42)
    print(f"Sampling 50,000 connections for visualization")
else:
    edgelist_sample = edgelist

# Calculate Spearman correlation
from scipy.stats import spearmanr
corr, _ = spearmanr(edgelist['count'], edgelist['norm'])

# Create scatter plot
fig = px.scatter(
    edgelist_sample,
    x='count',
    y='norm',
    log_x=True,
    log_y=True,
    opacity=0.3,
    title=f'Relationship between Synapse Count and Normalized Weight<br><sub>Spearman correlation: {corr:.3f}</sub>',
    labels={'count': 'Synapse Count (log scale)', 'norm': 'Normalized Weight (log scale)'},
    template='plotly_white',
    height=500
)

# Add trendline
fig.update_traces(marker=dict(color='steelblue', size=3))

fig.write_html(f"{IMG_DIR}/{DATASET}_weight_correlation.html")
fig.show()

print(f"✓ Saved weight correlation plot")
print(f"Spearman correlation: {corr:.3f}")

Sampling 50,000 connections for visualization


✓ Saved weight correlation plot
Spearman correlation: 0.499


### Degree Distribution

Analyze in-degree and out-degree distributions with different synapse count thresholds.

In [15]:
# Calculate in-degree and out-degree with two thresholds
degree_data = []

# Threshold 1: count > 1
in_degree_1 = edgelist[edgelist['count'] > 1].groupby('post').size().reset_index()
in_degree_1.columns = ['neuron', 'degree']
in_degree_1['type'] = 'In-degree'
in_degree_1['threshold'] = '>1 synapse'

out_degree_1 = edgelist[edgelist['count'] > 1].groupby('pre').size().reset_index()
out_degree_1.columns = ['neuron', 'degree']
out_degree_1['type'] = 'Out-degree'
out_degree_1['threshold'] = '>1 synapse'

# Threshold 2: count > 10
in_degree_10 = edgelist[edgelist['count'] > 10].groupby('post').size().reset_index()
in_degree_10.columns = ['neuron', 'degree']
in_degree_10['type'] = 'In-degree'
in_degree_10['threshold'] = '>10 synapses'

out_degree_10 = edgelist[edgelist['count'] > 10].groupby('pre').size().reset_index()
out_degree_10.columns = ['neuron', 'degree']
out_degree_10['type'] = 'Out-degree'
out_degree_10['threshold'] = '>10 synapses'

# Combine all
degree_data = pd.concat([in_degree_1, out_degree_1, in_degree_10, out_degree_10], ignore_index=True)

# Create density plot using histogram
fig = make_subplots(rows=1, cols=2, subplot_titles=('>1 synapse', '>10 synapses'))

for i, threshold in enumerate(['>1 synapse', '>10 synapses']):
    subset = degree_data[degree_data['threshold'] == threshold]
    
    for dtype, color in [('In-degree', 'orange'), ('Out-degree', 'blue')]:
        data = subset[subset['type'] == dtype]['degree']
        
        fig.add_trace(
            go.Histogram(
                x=np.log10(data),
                name=dtype,
                marker_color=color,
                opacity=0.6,
                nbinsx=50,
                histnorm='probability density',
                showlegend=(i == 0)
            ),
            row=1, col=i+1
        )

fig.update_xaxes(title_text="Degree (log10 scale)", row=1, col=1)
fig.update_xaxes(title_text="Degree (log10 scale)", row=1, col=2)
fig.update_yaxes(title_text="Density", row=1, col=1)

fig.update_layout(
    title_text='Degree Distribution by Synapse Count Threshold',
    height=500,
    template='plotly_white',
    barmode='overlay'
)

fig.write_html(f"{IMG_DIR}/{DATASET}_degree_distribution.html")
fig.show()

print(f"✓ Saved degree distribution plot")

✓ Saved degree distribution plot


## Connectivity Matrix

One effective way to visualize connectivity is through connectivity matrices, with `pre` on rows and `post` on columns.

Since we have many neurons, we'll collapse our edgelist by `cell_type`.

In [16]:
# Add cell_class to edgelist
cell_class_map = dict(zip(meta[DATASET_ID], meta['cell_class']))
edgelist['pre_class'] = edgelist['pre'].map(cell_class_map)
edgelist['post_class'] = edgelist['post'].map(cell_class_map)

# Aggregate by cell_class (sum weights)
class_connectivity = edgelist.groupby(['pre_class', 'post_class'])['weight'].sum().reset_index()

# Find most connected cell classes
top_classes_pre = class_connectivity.groupby('pre_class')['weight'].sum().nlargest(20).index
top_classes_post = class_connectivity.groupby('post_class')['weight'].sum().nlargest(20).index
top_classes = list(set(top_classes_pre) | set(top_classes_post))

# Filter to top classes
class_conn_filtered = class_connectivity[
    class_connectivity['pre_class'].isin(top_classes) &
    class_connectivity['post_class'].isin(top_classes)
]

# Create pivot table for heatmap
conn_matrix = class_conn_filtered.pivot(index='pre_class', columns='post_class', values='weight').fillna(0)

# Create heatmap
fig = go.Figure(data=go.Heatmap(
    z=conn_matrix.values,
    x=conn_matrix.columns,
    y=conn_matrix.index,
    colorscale='Viridis',
    colorbar=dict(title='Total Weight')
))

fig.update_layout(
    title=f'Connectivity Matrix: Top {len(top_classes)} Cell Classes',
    xaxis_title='Postsynaptic Cell Class',
    yaxis_title='Presynaptic Cell Class',
    height=800,
    width=900,
    template='plotly_white'
)

fig.update_xaxes(tickangle=-45)

fig.write_html(f"{IMG_DIR}/{DATASET}_connectivity_matrix.html")
fig.show()

print(f"✓ Saved connectivity matrix heatmap")
print(f"Matrix dimensions: {conn_matrix.shape}")

✓ Saved connectivity matrix heatmap
Matrix dimensions: (19, 19)


## Sensory and Effector Connectivity

In general, sensory neurons (`flow == "afferent"`) and effector neurons (`flow == "efferent"`) are quite interpretable because their cell_class labels often indicate body part innervation.

Let's re-collapse our edgelist using `cell_class` for sensory/effector neurons and `cell_type` for everything else.

In [17]:
# Create edgelist with mixed labels (cell_class for sensory/effector, cell_type for others)
# OPTIMIZED: Use vectorized operations instead of apply()

# Create mapping dictionaries from meta
id_to_flow = dict(zip(meta[DATASET_ID], meta['flow']))
id_to_class = dict(zip(meta[DATASET_ID], meta['cell_class']))
id_to_type = dict(zip(meta[DATASET_ID], meta['cell_type']))


# Map flow for pre and post
edgelist['pre_flow'] = edgelist['pre'].map(id_to_flow)
edgelist['post_flow'] = edgelist['post'].map(id_to_flow)

# Create labels: use cell_class for sensory/effector, cell_type for others
def create_label(neuron_id, flow):
    """Vectorized label creation function"""  
    if pd.isna(neuron_id) or pd.isna(flow):
        return None
    if flow in ['afferent', 'efferent']:
        return id_to_class.get(neuron_id, id_to_type.get(neuron_id))
    else:
        return id_to_type.get(neuron_id)

# Apply vectorized (much faster than row-wise apply)
import numpy as np
edgelist['pre_label'] = np.vectorize(create_label)(edgelist['pre'].values, edgelist['pre_flow'].values)
edgelist['post_label'] = np.vectorize(create_label)(edgelist['post'].values, edgelist['post_flow'].values)

# Filter out rows with missing labels
edgelist_mixed = edgelist[edgelist['pre_label'].notna() & edgelist['post_label'].notna()].copy()

print(f"✓ Created mixed labels edgelist with {len(edgelist_mixed):,} connections")


✓ Created mixed labels edgelist with 348,856 connections


### Sensory Neuron Outputs

Which cell types receive strong input from sensory neurons (≥100 synapses)?

In [18]:
# Calculate total inputs per target from sensory neurons
sensory_post_totals = edgelist_mixed[edgelist_mixed['pre_flow'] == 'afferent'].groupby('post_label')['count'].sum()

# Get sensory outputs (≥100 synapses)
sensory_outputs = edgelist_mixed[edgelist_mixed['pre_flow'] == 'afferent'].groupby(
    ['pre_label', 'post_label']
)['count'].sum().reset_index()
sensory_outputs.columns = ['pre_label', 'post_label', 'total_count']

# Add total counts and normalize
sensory_outputs = sensory_outputs.merge(
    sensory_post_totals.rename('post_total_count'),
    left_on='post_label',
    right_index=True
)
sensory_outputs['norm'] = sensory_outputs['total_count'] / sensory_outputs['post_total_count']

# Filter to ≥100 synapses
sensory_outputs = sensory_outputs[sensory_outputs['total_count'] >= 100]

if len(sensory_outputs) > 0:
    print(f"Found {len(sensory_outputs)} sensory connections ≥100 synapses")
    print(f"Sensory neuron types: {sensory_outputs['pre_label'].nunique()}")
    print(f"Target neuron types: {sensory_outputs['post_label'].nunique()}")
    
    # Create matrix for heatmap
    sensory_matrix = sensory_outputs.pivot(
        index='pre_label',
        columns='post_label',
        values='norm'
    ).fillna(0)
    
    print(f"Matrix dimensions: {sensory_matrix.shape[0]} x {sensory_matrix.shape[1]}")
    
    # Perform hierarchical clustering for ordering
    from scipy.cluster.hierarchy import linkage, dendrogram
    from scipy.spatial.distance import pdist
    
    # Cluster rows and columns
    if sensory_matrix.shape[0] > 1:
        row_linkage = linkage(pdist(sensory_matrix.values, metric='euclidean'), method='ward')
        row_order = dendrogram(row_linkage, no_plot=True)['leaves']
    else:
        row_order = [0]
    
    if sensory_matrix.shape[1] > 1:
        col_linkage = linkage(pdist(sensory_matrix.T.values, metric='euclidean'), method='ward')
        col_order = dendrogram(col_linkage, no_plot=True)['leaves']
    else:
        col_order = [0]
    
    # Reorder matrix
    sensory_matrix_ordered = sensory_matrix.iloc[row_order, col_order]
    
    # Create static heatmap with matplotlib/seaborn
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    fig, ax = plt.subplots(figsize=(10, 10))
    sns.heatmap(
        sensory_matrix_ordered,
        cmap='RdYlBu_r',  # Cold to hot
        xticklabels=False,
        yticklabels=False,
        cbar_kws={'label': 'Normalized Weight'},
        ax=ax
    )
    ax.set_title('Sensory Neuron Outputs (≥100 synapses)', fontsize=14, fontweight='bold')
    ax.set_xlabel('Target Neuron Type', fontsize=12)
    ax.set_ylabel('Sensory Neuron Type', fontsize=12)
    plt.tight_layout()
    plt.savefig(f'{IMG_DIR}/{DATASET}_sensory_outputs.png', dpi=300, bbox_inches='tight')
    plt.close()
    
    print(f"\u2713 Saved static heatmap: {IMG_DIR}/{DATASET}_sensory_outputs.png")
    
    # Create interactive heatmap with plotly
    fig = go.Figure(data=go.Heatmap(
        z=sensory_matrix_ordered.values,
        x=sensory_matrix_ordered.columns,
        y=sensory_matrix_ordered.index,
        colorscale='Viridis',
        colorbar=dict(title='Normalized<br>Weight'),
        hovertemplate='Sensory: %{y}<br>Target: %{x}<br>Weight: %{z:.3f}<extra></extra>'
    ))
    
    fig.update_layout(
        title='Sensory Neuron Outputs (≥100 synapses)',
        xaxis_title='Target Neuron Type',
        yaxis_title='Sensory Neuron Type',
        xaxis=dict(showticklabels=False),
        yaxis=dict(showticklabels=False),
        height=800,
        width=900,
        template='plotly_white'
    )
    
    fig.show()
    print(f"\u2713 Displayed interactive heatmap")
else:
    print("No sensory neurons with ≥100 synapses found in this dataset.")

Found 208 sensory connections ≥100 synapses
Sensory neuron types: 8
Target neuron types: 191
Matrix dimensions: 8 x 191


✓ Saved static heatmap: images/tutorial_03/banc_746_sensory_outputs.png


✓ Displayed interactive heatmap


### Effector Neuron Inputs

Which cell types provide strong input to effector neurons (≥100 synapses)?

We'll use signed weights here to show excitatory (+) vs inhibitory (-) connections.

In [19]:
# Calculate total inputs per effector neuron type
effector_post_totals = edgelist_mixed[edgelist_mixed['post_flow'] == 'efferent'].groupby('post_label')['count'].sum()

# Get effector inputs (≥100 synapses) with signed weights
effector_inputs = edgelist_mixed[edgelist_mixed['post_flow'] == 'efferent'].groupby(
    ['pre_label', 'post_label']
).agg({
    'count': 'sum',
    'weight_signed': 'sum'
}).reset_index()
effector_inputs.columns = ['pre_label', 'post_label', 'total_count', 'total_signed_count']

# Add total counts and normalize (signed)
effector_inputs = effector_inputs.merge(
    effector_post_totals.rename('post_total_count'),
    left_on='post_label',
    right_index=True
)
effector_inputs['signed_norm'] = effector_inputs['total_signed_count'] / effector_inputs['post_total_count']

# Filter to ≥100 synapses
effector_inputs = effector_inputs[effector_inputs['total_count'] >= 100]

if len(effector_inputs) > 0:
    print(f"Found {len(effector_inputs)} effector input connections ≥100 synapses")
    print(f"Input neuron types: {effector_inputs['pre_label'].nunique()}")
    print(f"Effector neuron types: {effector_inputs['post_label'].nunique()}")
    
    # Create matrix for heatmap
    effector_matrix = effector_inputs.pivot(
        index='pre_label',
        columns='post_label',
        values='signed_norm'
    ).fillna(0)
    
    print(f"Matrix dimensions: {effector_matrix.shape[0]} x {effector_matrix.shape[1]}")
    
    # Perform hierarchical clustering on absolute values for ordering
    if effector_matrix.shape[0] > 1:
        row_linkage = linkage(pdist(np.abs(effector_matrix.values), metric='euclidean'), method='ward')
        row_order = dendrogram(row_linkage, no_plot=True)['leaves']
    else:
        row_order = [0]
    
    if effector_matrix.shape[1] > 1:
        col_linkage = linkage(pdist(np.abs(effector_matrix.T.values), metric='euclidean'), method='ward')
        col_order = dendrogram(col_linkage, no_plot=True)['leaves']
    else:
        col_order = [0]
    
    # Reorder matrix
    effector_matrix_ordered = effector_matrix.iloc[row_order, col_order]
    
    # Cap color scale at 95th percentile for better visibility
    # Separate percentiles for positive and negative values to handle asymmetry
    positive_vals = effector_matrix_ordered.values[effector_matrix_ordered.values > 0]
    negative_vals = effector_matrix_ordered.values[effector_matrix_ordered.values < 0]
    
    if len(positive_vals) > 0:
        p95_pos = np.percentile(positive_vals, 95)
    else:
        p95_pos = 0
    
    if len(negative_vals) > 0:
        p95_neg = np.percentile(np.abs(negative_vals), 95)
    else:
        p95_neg = 0
    
    # Use maximum of 95th percentiles for symmetric color scale
    max_abs_val = max(p95_pos, p95_neg)
    
    print(f"Color scale range: [{-max_abs_val:.4f}, {max_abs_val:.4f}]")
    
    # Create static heatmap with matplotlib/seaborn (diverging colormap)
    fig, ax = plt.subplots(figsize=(10, 10))
    sns.heatmap(
        effector_matrix_ordered,
        cmap='RdBu_r',  # Diverging: red (positive/excitatory) to blue (negative/inhibitory)
        center=0,
        vmin=-max_abs_val,
        vmax=max_abs_val,
        xticklabels=False,
        yticklabels=False,
        cbar_kws={'label': 'Signed Weight'},
        ax=ax
    )
    ax.set_title('Signed Effector Neuron Inputs (≥100 synapses)', fontsize=14, fontweight='bold')
    ax.set_xlabel('Effector Neuron Type', fontsize=12)
    ax.set_ylabel('Input Neuron Type', fontsize=12)
    plt.tight_layout()
    plt.savefig(f'{IMG_DIR}/{DATASET}_effector_inputs.png', dpi=300, bbox_inches='tight')
    plt.close()
    
    print(f"\u2713 Saved static heatmap: {IMG_DIR}/{DATASET}_effector_inputs.png")
    
    # Create interactive heatmap with plotly (red-white-blue diverging colormap)
    fig = go.Figure(data=go.Heatmap(
        z=effector_matrix_ordered.values,
        x=effector_matrix_ordered.columns,
        y=effector_matrix_ordered.index,
        colorscale='RdBu',  # Red (positive) - White (0) - Blue (negative)
        zmid=0,  # Center white at zero
        zmin=-max_abs_val,
        zmax=max_abs_val,
        colorbar=dict(title='Signed<br>Weight'),
        hovertemplate='Input: %{y}<br>Effector: %{x}<br>Weight: %{z:.3f}<extra></extra>'
    ))
    
    fig.update_layout(
        title='Signed Effector Neuron Inputs (≥100 synapses)',
        xaxis_title='Effector Neuron Type',
        yaxis_title='Input Neuron Type',
        xaxis=dict(showticklabels=False),
        yaxis=dict(showticklabels=False),
        height=800,
        width=900,
        template='plotly_white'
    )
    
    fig.show()
    print(f"\u2713 Displayed interactive heatmap")
else:
    print("No effector neurons with ≥100 input synapses found in this dataset.")

Found 367 effector input connections ≥100 synapses
Input neuron types: 358
Effector neuron types: 3
Matrix dimensions: 358 x 3
Color scale range: [-0.0000, 0.0000]


✓ Saved static heatmap: images/tutorial_03/banc_746_effector_inputs.png


✓ Displayed interactive heatmap


# Your Turn: New Subset

Now try this analysis yourself with a different dataset!

**Exercise:** Switch the pre-prepared subset at the top of the notebook:
- Try `antennal_lobe` to explore olfactory circuits
- Try `front_leg` to examine leg sensorimotor circuits
- Try `mushroom_body` for learning and memory circuits

Simply change the `SUBSET_NAME` variable in the configuration cell and re-run the notebook!

---

# Extensions

In the extended code blocks below, you can learn how to identify connectivity-based clusters and visualise cluster morphologies. These analyses involve longer compute times but reveal functional groupings based on connectivity patterns.

## UMAP Dimensionality Reduction

Use UMAP to visualize connectivity patterns in 2D space based on cosine similarity.

In [20]:
# Filter neurons with sufficient connectivity (>=10 connections)
conn_counts = pd.concat([
    edgelist.groupby('pre').size().rename('n_out'),
    edgelist.groupby('post').size().rename('n_in')
], axis=1).fillna(0)
conn_counts['total'] = conn_counts['n_out'] + conn_counts['n_in']

neurons_to_use = conn_counts[conn_counts['total'] >= 10].index.tolist()
print(f"Using {len(neurons_to_use):,} neurons with ≥10 connections")

# Create input/output connectivity matrix
# Each neuron has both input and output partners
edgelist_filtered = edgelist[
    (edgelist['pre'].isin(neurons_to_use)) | 
    (edgelist['post'].isin(neurons_to_use))
].copy()

# Replace zeros with small value
edgelist_filtered['norm'] = edgelist_filtered['norm'].replace(0, 0.001)

# Prepare connectivity data (inputs and outputs as separate features)
conn_list = []

# Outputs: neuron -> partner
outputs = edgelist_filtered[edgelist_filtered['pre'].isin(neurons_to_use)][['pre', 'post', 'norm']].copy()
outputs.columns = ['neuron', 'partner', 'weight']
outputs['partner'] = 'output_' + outputs['partner']
conn_list.append(outputs)

# Inputs: partner -> neuron
inputs = edgelist_filtered[edgelist_filtered['post'].isin(neurons_to_use)][['post', 'pre', 'norm']].copy()
inputs.columns = ['neuron', 'partner', 'weight']
inputs['partner'] = 'input_' + inputs['partner']
conn_list.append(inputs)

# Combine
conn_data = pd.concat(conn_list, ignore_index=True)

# Create sparse matrix
from scipy.sparse import csr_matrix

# Create indices
neuron_idx = {n: i for i, n in enumerate(neurons_to_use)}
partner_idx = {p: i for i, p in enumerate(conn_data['partner'].unique())}

rows = conn_data['neuron'].map(neuron_idx).values
cols = conn_data['partner'].map(partner_idx).values
data = conn_data['weight'].values

connectivity_matrix = csr_matrix(
    (data, (rows, cols)),
    shape=(len(neurons_to_use), len(partner_idx))
)

print(f"Connectivity matrix: {connectivity_matrix.shape}")
print(f"Sparsity: {100 * (1 - connectivity_matrix.nnz / (connectivity_matrix.shape[0] * connectivity_matrix.shape[1])):.2f}%")

# Calculate cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
similarity_matrix = cosine_similarity(connectivity_matrix)
similarity_matrix[np.isnan(similarity_matrix)] = 0
similarity_matrix[np.isinf(similarity_matrix)] = 0

print(f"✓ Similarity matrix: {similarity_matrix.shape}")

# Convert to distance for UMAP
distance_matrix = 1 - similarity_matrix
distance_matrix = np.clip(distance_matrix, 0, 2)  # Ensure valid distances

# Run UMAP
reducer = umap.UMAP(
    n_neighbors=15,
    min_dist=0.1,
    n_components=2,
    metric='precomputed',
    random_state=42
)

umap_embedding = reducer.fit_transform(distance_matrix)
print(f"✓ UMAP complete")

# Create DataFrame with UMAP coordinates and metadata
umap_df = pd.DataFrame({
    'neuron_id': neurons_to_use,
    'UMAP1': umap_embedding[:, 0],
    'UMAP2': umap_embedding[:, 1]
})

# Add metadata
meta_subset = meta.set_index(DATASET_ID).loc[neurons_to_use]
umap_df = umap_df.merge(
    meta_subset[['cell_type', 'super_class', 'cell_class', 'flow']].reset_index(),
    left_on='neuron_id',
    right_on=DATASET_ID,
    how='left'
)

print(f"✓ Created UMAP DataFrame with {len(umap_df):,} neurons")

Using 3,532 neurons with ≥10 connections


Connectivity matrix: (3532, 7060)
Sparsity: 97.20%
✓ Similarity matrix: (3532, 3532)



using precomputed metric; inverse_transform will be unavailable


n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.


✓ UMAP complete
✓ Created UMAP DataFrame with 3,532 neurons


### Visualize UMAP by Super Class

In [21]:
# Plot UMAP colored by super_class
fig = px.scatter(
    umap_df,
    x='UMAP1',
    y='UMAP2',
    color='super_class',
    hover_data=['neuron_id', 'cell_type'],
    title='UMAP of Connectivity Patterns<br><sub>Colored by Super Class</sub>',
    template='plotly_white',
    height=600,
    width=800
)

fig.update_traces(marker=dict(size=4, opacity=0.6))
fig.write_html(f"{IMG_DIR}/{DATASET}_umap_super_class.html")
fig.show()

print(f"✓ Saved UMAP super_class plot")

✓ Saved UMAP super_class plot


## Connectivity Clusters

There are many ways to cluster nodes by connectivity. Here we use a simple but effective method: hierarchical clustering based on connectivity similarity.

### Hierarchical Clustering

Cluster neurons based on their connectivity patterns using Ward's method.

In [22]:
# Perform hierarchical clustering

# Use distance matrix
condensed_distance = squareform(distance_matrix, checks=False)

# Hierarchical clustering
linkage_matrix = linkage(condensed_distance, method='ward')

# Cut tree to get clusters (e.g., 12 clusters)
n_clusters = 12
clusters = cut_tree(linkage_matrix, n_clusters=n_clusters).flatten()

umap_df['cluster'] = clusters + 1  # 1-indexed for readability

print(f"✓ Created {n_clusters} clusters")
print(f"\nCluster sizes:")
print(umap_df['cluster'].value_counts().sort_index())

✓ Created 12 clusters

Cluster sizes:
cluster
1      623
2       34
3      155
4     1420
5       84
6      267
7      143
8      498
9       80
10      70
11     114
12      44
Name: count, dtype: int64


### Visualize UMAP by Cluster

In [23]:
# Calculate cluster centroids for labeling
cluster_centroids = umap_df.groupby('cluster').agg({
    'UMAP1': 'mean',
    'UMAP2': 'mean'
}).reset_index()

# Create color palette
import plotly.colors as pc
cluster_colors = pc.sample_colorscale('Viridis', [i/(n_clusters-1) for i in range(n_clusters)])

# Plot UMAP colored by cluster
fig = px.scatter(
    umap_df,
    x='UMAP1',
    y='UMAP2',
    color='cluster',
    hover_data=['neuron_id', 'cell_type'],
    title=f'Connectivity-Based Clusters<br><sub>{n_clusters} clusters identified by hierarchical clustering</sub>',
    template='plotly_white',
    height=600,
    width=800,
    color_continuous_scale='Viridis'
)

fig.update_traces(marker=dict(size=5, opacity=0.7))

# Add cluster labels
for _, row in cluster_centroids.iterrows():
    fig.add_annotation(
        x=row['UMAP1'],
        y=row['UMAP2'],
        text=str(int(row['cluster'])),
        showarrow=False,
        font=dict(size=14, color='black'),
        bgcolor='white',
        opacity=0.8
    )

fig.write_html(f"{IMG_DIR}/{DATASET}_umap_clusters.html")
fig.show()

print(f"✓ Saved UMAP clusters plot")

✓ Saved UMAP clusters plot


### Cluster Composition Analysis

Examine what cell types are present in each cluster.

In [24]:
# Count super_classes within each cluster
cluster_composition = umap_df.groupby(['cluster', 'super_class']).size().reset_index(name='count')

# Get top 5 super_classes per cluster
top_per_cluster = cluster_composition.sort_values(['cluster', 'count'], ascending=[True, False])
top_per_cluster = top_per_cluster.groupby('cluster').head(5)

# Create stacked bar chart
fig = px.bar(
    top_per_cluster,
    x='cluster',
    y='count',
    color='super_class',
    title='Cluster Composition by Super Class<br><sub>Top 5 super classes per cluster</sub>',
    labels={'cluster': 'Cluster', 'count': 'Number of Neurons'},
    template='plotly_white',
    height=500,
    width=900
)

fig.write_html(f"{IMG_DIR}/{DATASET}_cluster_composition.html")
fig.show()

print(f"✓ Saved cluster composition plot")

# Print detailed composition for first 3 clusters
print("\nDetailed composition for clusters 1-3:")
for cluster_id in range(1, 4):
    cluster_data = cluster_composition[cluster_composition['cluster'] == cluster_id].sort_values('count', ascending=False)
    print(f"\nCluster {cluster_id}:")
    print(cluster_data.head(5).to_string(index=False))

✓ Saved cluster composition plot

Detailed composition for clusters 1-3:

Cluster 1:
 cluster                  super_class  count
       1 ventral_nerve_cord_intrinsic    516
       1                        motor     58
       1                    ascending     24
       1                      sensory     11
       1                   descending     10

Cluster 2:
 cluster super_class  count
       2     sensory     34

Cluster 3:
 cluster                  super_class  count
       3                      sensory    147
       3 ventral_nerve_cord_intrinsic      7
       3                    ascending      1


### Sensory and Effector Neurons

Highlight sensory (afferent) and effector (efferent) neurons in the UMAP space to understand their distribution across connectivity clusters.

In [25]:
# Identify sensory and effector neurons
umap_df['is_sensory'] = umap_df['flow'] == 'afferent'
umap_df['is_effector'] = umap_df['flow'] == 'efferent'

# Create display categories
umap_df['neuron_type'] = 'Other'
umap_df.loc[umap_df['is_sensory'], 'neuron_type'] = 'Sensory (afferent)'
umap_df.loc[umap_df['is_effector'], 'neuron_type'] = 'Effector (efferent)'

# Create plot with three layers
fig = go.Figure()

# Plot "Other" neurons first (grey, low opacity)
other_df = umap_df[umap_df['neuron_type'] == 'Other']
fig.add_trace(go.Scatter(
    x=other_df['UMAP1'],
    y=other_df['UMAP2'],
    mode='markers',
    name='Other',
    marker=dict(color='grey', size=4, opacity=0.3),
    hovertext=other_df['cell_type'],
    hoverinfo='text'
))

# Plot sensory neurons (circles)
sensory_df = umap_df[umap_df['is_sensory']]
if len(sensory_df) > 0:
    fig.add_trace(go.Scatter(
        x=sensory_df['UMAP1'],
        y=sensory_df['UMAP2'],
        mode='markers',
        name='Sensory (afferent)',
        marker=dict(color='red', size=6, opacity=0.8, symbol='circle'),
        hovertext=sensory_df['cell_type'],
        hoverinfo='text'
    ))

# Plot effector neurons (squares)
effector_df = umap_df[umap_df['is_effector']]
if len(effector_df) > 0:
    fig.add_trace(go.Scatter(
        x=effector_df['UMAP1'],
        y=effector_df['UMAP2'],
        mode='markers',
        name='Effector (efferent)',
        marker=dict(color='blue', size=6, opacity=0.8, symbol='square'),
        hovertext=effector_df['cell_type'],
        hoverinfo='text'
    ))

fig.update_layout(
    title='UMAP: Sensory and Effector Neurons<br><sub>Sensory = red circles, Effector = blue squares</sub>',
    xaxis_title='UMAP1',
    yaxis_title='UMAP2',
    template='plotly_white',
    height=600,
    width=800
)

fig.write_html(f"{IMG_DIR}/{DATASET}_umap_sensory_effector.html")
fig.show()

print(f"✓ Saved sensory/effector UMAP plot")
print(f"Sensory neurons: {umap_df['is_sensory'].sum():,}")
print(f"Effector neurons: {umap_df['is_effector'].sum():,}")

✓ Saved sensory/effector UMAP plot
Sensory neurons: 834
Effector neurons: 84


## Summary

In this tutorial, we covered comprehensive connectivity analysis methods for the BANC dataset:

### Core Analyses

1. **Loading connectivity data** - Working with edgelists and meta data
2. **Neurotransmitter prediction** - Classifying connections as excitatory/inhibitory
3. **Basic network statistics** - Degree distributions, weight correlations
4. **Network visualization** - Graph plots by super_class
5. **Connectivity matrices** - Heatmaps of cell type connectivity
6. **Sensory-effector analysis** - Interpretable input-output patterns

### Extended Analyses

7. **UMAP dimensionality reduction** - Projecting connectivity patterns into 2D space
8. **Hierarchical clustering** - Grouping neurons by connectivity similarity
9. **Cluster composition** - Understanding what cell types form each cluster

### Key Takeaways

- **Edgelists** describe directed, weighted graphs of neural connectivity
- **Signed connectivity** provides insights into excitatory/inhibitory balance, but should be interpreted cautiously
- **Cosine similarity** is effective for comparing connectivity patterns
- **UMAP** reveals hidden structure in high-dimensional connectivity data
- **Sensory and effector neurons** provide interpretable anchors for circuit analysis
- **Hierarchical clustering** can identify functional groups based purely on connectivity

### Files Generated

All plots have been saved to `images/tutorial_03/` as interactive HTML and static PNG files for publication.

### Next Steps

- Compare connectivity patterns across different datasets (FAFB, MANC, hemibrain)
- Analyze specific pathways or circuits of interest
- Investigate connection specificity and network motifs
- Combine morphological and connectivity features for multimodal analysis

---

**Tutorial Complete!** 

You now have a comprehensive pipeline for analyzing fly connectome connectivity data in Python.