# Tutorial 04: Indirect Connectivity and Influence

**Author:** Alexander Bates  
**Date:** 2025-12-15

## Introduction

This tutorial introduces the **influence metric**, a measure of indirect connectivity developed and used in the [BANC paper](https://www.biorxiv.org/content/10.1101/2024.12.28.630584v1) (Eckstein et al., 2025).

### What is Influence?

While direct synaptic connections tell us which neurons are connected, they don't capture the full picture of how signals propagate through neural circuits. The influence metric quantifies how strongly a neuron or group of neurons can affect downstream targets through both direct and indirect pathways.

### How It Works

The InfluenceCalculator package implements a linear dynamical model of neural signal propagation:

**Model equation:** τ dr(t)/dt = (W - I)r(t) + s(t)

Where:
- **r(t)** = neural activity vector
- **W** = connectivity matrix (scaled by synapse counts)
- **s(t)** = stimulation input to seed neurons
- **τ** = time constant

At steady state, the influence score equals:

**r∞ = -(W̃ - I)⁻¹s**

Here W̃ is a rescaled version of W that keeps the network stable. Raw scores are then log-transformed and offset by a constant (+24), with any negative results floored at 0, to give non-negative "adjusted influence" scores.
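A minimal NumPy sketch of the model above, assuming a toy three-neuron loop (A → B → C → A) whose weights are synapse counts, indexed as `W[post, pre]`. Rescaling to a spectral radius of 0.99 is an assumption chosen for illustration; the tutorial only says that W̃ is rescaled for stability. Forward-Euler integration of the differential equation settles on the same vector as the closed-form steady state.

```python
import numpy as np

# Toy recurrent loop A -> B -> C -> A; weights are illustrative synapse counts.
# Assumed convention: W[post, pre], so (W @ r)[i] is the summed input to neuron i.
W = np.array([
    [0.0, 0.0, 10.0],   # A <- C
    [20.0, 0.0, 0.0],   # B <- A
    [0.0, 15.0, 0.0],   # C <- B
])

# ASSUMPTION: rescale W so its spectral radius is 0.99 (illustrative stability margin)
rho = np.max(np.abs(np.linalg.eigvals(W)))
W_tilde = W * (0.99 / rho)

I = np.eye(3)
s = np.array([1.0, 0.0, 0.0])   # stimulate neuron A only

# Closed-form steady state: r_inf = -(W_tilde - I)^{-1} s
r_inf = -np.linalg.solve(W_tilde - I, s)

# Forward-Euler integration of tau * dr/dt = (W_tilde - I) r + s, starting from rest
tau, dt, n_steps = 1.0, 0.1, 20000
r = np.zeros(3)
for _ in range(n_steps):
    r = r + (dt / tau) * ((W_tilde - I) @ r + s)

print(np.round(r_inf, 3))
print(np.allclose(r, r_inf, rtol=1e-6))
```

With a spectral radius close to 1 the slowest mode decays slowly (rate about 0.01/τ here), which is why the simulation runs for a long time; the closed form avoids that cost entirely.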

### Key Advantages

1. **Captures indirect effects**: Quantifies multi-synaptic pathways
2. **Accounts for network structure**: Considers convergent and divergent connections
3. **Computationally efficient**: Uses sparse matrix decomposition with caching for repeated calculations
4. **Biologically validated**: Correlates with optogenetic activation experiments
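Points 1 and 3 can be illustrated together with a minimal sketch, assuming a toy three-neuron chain with hand-picked rescaled weights and `[post, pre]` indexing. There is no direct A → C synapse, yet seeding A gives C a non-zero steady-state response. The sketch factorises (I − W̃) once with SciPy's sparse LU (`scipy.sparse.linalg.splu`) and reuses the factors for several seed vectors; it shows the general idea of caching a decomposition and is not necessarily how InfluenceCalculator implements it.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Toy feed-forward chain A -> B -> C with rescaled weights (illustrative only).
# Assumed convention: W_tilde[post, pre]. There is no direct A -> C entry.
n = 3
post_idx = [1, 2]        # B, C
pre_idx = [0, 1]         # A, B
weights = [0.5, 0.4]
W_tilde = sp.csc_matrix((weights, (post_idx, pre_idx)), shape=(n, n))

# Factorise (I - W_tilde) once; the LU factors are reused for every seed vector below.
lu = splu((sp.identity(n, format="csc") - W_tilde).tocsc())

results = {}
for name, seed in {"A": 0, "B": 1}.items():
    s = np.zeros(n)
    s[seed] = 1.0
    # (I - W_tilde)^{-1} s equals -(W_tilde - I)^{-1} s, the steady state above
    results[name] = lu.solve(s)
    print(name, results[name])
```

Seeding A gives C a response of 0.5 × 0.4 = 0.2 even though `W_tilde[2, 0]` is 0: that value comes entirely from the two-hop path through B.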

**Currently working with dataset:** banc_746

## Setup and Load Data

In [None]:
# Import required packages
import pandas as pd
import numpy as np
import pyarrow.feather as feather
import gcsfs
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import umap
from scipy.cluster.hierarchy import linkage, dendrogram
from InfluenceCalculator import InfluenceCalculator
from joblib import Parallel, delayed
import multiprocessing
import warnings
warnings.filterwarnings('ignore')

# Set up parallelization
n_cores = max(1, multiprocessing.cpu_count() - 1)
print(f"✓ All packages imported successfully")
print(f"Using {n_cores} cores for parallel processing")

In [None]:
# ConfigurationDATASET = "banc_746"DATASET_ID = "banc_746_id"DATA_PATH = "gs://sjcabs_2025_data"USE_GCS = DATA_PATH.startswith("gs://")# Setup image output directoryimport osIMG_DIR = "images/tutorial_04"os.makedirs(IMG_DIR, exist_ok=True)print(f"Working with dataset: {DATASET}")print(f"Data location: {DATA_PATH}")

In [None]:
# Helper functions
def construct_path(data_root, dataset, file_type="meta"):
    """Construct file paths for dataset files."""
    dataset_name = dataset.split("_")[0]
    
    file_mappings = {
        "meta": f"{dataset}_meta.feather",
        "edgelist": f"{dataset}_edgelist.feather",
        "edgelist_simple": f"{dataset}_simple_edgelist.feather",
        "synapses": f"{dataset}_synapses.parquet"
    }
    
    if file_type not in file_mappings:
        raise ValueError(f"Unknown file_type: {file_type}")
    
    filename = file_mappings[file_type]
    full_path = f"{data_root}/{dataset_name}/{filename}"
    
    return full_path

def read_feather_gcs(path, gcs_fs=None):
    """Read Feather file from GCS or local path."""
    if path.startswith("gs://"):
        if gcs_fs is None:
            raise ValueError("gcs_fs required for GCS paths")
        
        print(f"Reading from GCS: {path}")
        gcs_path = path.replace("gs://", "")
        
        with gcs_fs.open(gcs_path, 'rb') as f:
            df = feather.read_feather(f)
        
        print(f"✓ Loaded {len(df):,} rows")
        return df
    else:
        print(f"Reading from local path: {path}")
        df = pd.read_feather(path)
        print(f"✓ Loaded {len(df):,} rows")
        return df

print("✓ Helper functions defined")

In [None]:
# Setup GCS access if needed
if USE_GCS:
    print("Setting up Google Cloud Storage access...")
    gcs = gcsfs.GCSFileSystem(token='google_default')
    print("✓ GCS filesystem initialized")
else:
    gcs = None
    print("Using local filesystem")

In [None]:
# Load metadata
meta_path = construct_path(DATA_PATH, DATASET, "meta")
print(f"Loading metadata from: {meta_path}")
meta = read_feather_gcs(meta_path, gcs_fs=gcs)
print(f"✓ Loaded {len(meta):,} neurons")
print(f"\nMetadata columns: {list(meta.columns)}")
meta.head(3)

In [None]:
# Load edgelist
edgelist_path = construct_path(DATA_PATH, DATASET, "edgelist_simple")
print(f"Loading edgelist from: {edgelist_path}")
edgelist_simple = read_feather_gcs(edgelist_path, gcs_fs=gcs)
print(f"✓ Loaded {len(edgelist_simple):,} connections")
print(f"\nEdgelist columns: {list(edgelist_simple.columns)}")
edgelist_simple.head(3)

## Filter Strong Connections

To speed up influence calculations, we filter out weak connections (fewer than 5 synapses):

In [None]:
# Filter for connections with at least 5 synapses
edgelist_filtered = edgelist_simple[edgelist_simple['count'] >= 5].copy()

print(f"Original connections: {len(edgelist_simple):,}")
print(f"After filtering (≥5 synapses): {len(edgelist_filtered):,}")
print(f"Retained: {100 * len(edgelist_filtered) / len(edgelist_simple):.1f}%")
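One side effect of the threshold is worth checking: a sensory neuron whose outgoing connections are all weaker than 5 synapses loses every outgoing edge, so in the linear model it cannot influence anything downstream. The hypothetical helper `seeds_without_strong_outputs` below lists such seeds. It uses toy IDs and counts, and assumes columns named `pre` and `count` (as in the renamed edgelist prepared later for the calculator).

```python
import pandas as pd

# Toy edgelist (illustrative IDs and counts, not real data)
edges = pd.DataFrame({
    "pre":   ["s1", "s1", "s2", "s3", "n1"],
    "post":  ["n1", "n2", "n1", "n2", "d1"],
    "count": [12,    3,    4,    2,    30],
})

def seeds_without_strong_outputs(edgelist, seed_ids, min_count=5):
    """Return seed IDs that have no outgoing edge with at least `min_count` synapses."""
    strong = edgelist[edgelist["count"] >= min_count]
    has_output = set(strong["pre"])
    return sorted(set(seed_ids) - has_output)

missing = seeds_without_strong_outputs(edges, ["s1", "s2", "s3"])
print(missing)
```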

## Example: Sensory Influence on Dopaminergic Neurons

Let's examine how sensory neurons influence mushroom body dopaminergic neurons. This is biologically relevant because:

- Dopaminergic neurons provide **teaching signals** for associative memory
- They are hypothesised to receive unconditioned sensory information
- **PAM** dopamine neurons are involved in appetitive (reward) learning
- **PPL1** dopamine neurons are involved in aversive (punishment) learning
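To compare these two groups, MB dopamine types can be split by name prefix. A minimal sketch on a toy table, assuming PAM types start with `PAM` and PPL1 types with `PPL1` (check this against the actual `cell_type` values in `meta`); `dopamine_cluster` is a hypothetical helper.

```python
import pandas as pd

# Toy table of MB dopamine types (illustrative names; real BANC labels may differ)
mb_da = pd.DataFrame({
    "cell_type": ["PAM01", "PAM08", "PPL101", "PPL106", None],
})

def dopamine_cluster(cell_type):
    """Label a cell type as PAM, PPL1 or other, based on its name prefix."""
    if not isinstance(cell_type, str):   # missing annotations
        return "other"
    if cell_type.startswith("PAM"):
        return "PAM"
    if cell_type.startswith("PPL1"):
        return "PPL1"
    return "other"

mb_da["cluster"] = mb_da["cell_type"].map(dopamine_cluster)
print(mb_da.groupby("cluster").size())
```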

### Define Source and Target Neurons

In [None]:
# Source: All sensory neurons (afferent flow)
sensory_neurons = meta[meta['flow'] == 'afferent'][['banc_746_id', 'cell_sub_class', 'cell_type']].drop_duplicates()

print(f"Found {len(sensory_neurons):,} sensory neurons")

# Get unique sensory sub-classes
sensory_sub_classes = sorted(sensory_neurons['cell_sub_class'].unique())

print(f"\nSensory sub-classes (n={len(sensory_sub_classes)}):")
print(", ".join(sensory_sub_classes[:10]))

# Target: All mushroom body dopaminergic neurons
mb_dopamine_neurons = meta[
    meta['cell_class'] == 'mushroom_body_dopaminergic_neuron'
][['banc_746_id', 'cell_sub_class', 'cell_type']].drop_duplicates()

print(f"\nFound {len(mb_dopamine_neurons):,} mushroom body dopamine neurons")

# Get unique MB dopamine types
mb_da_types = sorted(mb_dopamine_neurons['cell_type'].unique())

print(f"MB dopamine types (n={len(mb_da_types)}):")
print(", ".join(mb_da_types[:10]))

### Set Up Influence Calculator

In [None]:
print("Initializing influence calculator...")
print("This may take a few minutes for large networks...\n")

# Prepare data for InfluenceCalculator
# The package expects a SQLite database with:
# - 'edgelist_simple' table with columns: pre, post, count, norm, post_count
# - 'meta' table with column: root_id (plus any other metadata)

# Check edgelist column names and rename if needed
edgelist_cols = list(edgelist_filtered.columns)
print(f"Edgelist columns: {', '.join(edgelist_cols)}")

if 'pre' in edgelist_cols and 'post' in edgelist_cols:
    edgelist_for_ic = edgelist_filtered.copy()
else:
    # Need to rename columns
    pre_col = f"pre_{DATASET_ID}"
    post_col = f"post_{DATASET_ID}"
    
    edgelist_for_ic = edgelist_filtered.rename(columns={
        pre_col: 'pre',
        post_col: 'post'
    })

# Add post_count column if not present (required by InfluenceCalculator)
if 'post_count' not in edgelist_for_ic.columns:
    edgelist_for_ic['post_count'] = edgelist_for_ic['count'] / edgelist_for_ic['norm']

# Prepare metadata with root_id column
meta_for_ic = meta.rename(columns={DATASET_ID: 'root_id'})

# Convert ID columns to string for SQLite compatibility
meta_for_ic['root_id'] = meta_for_ic['root_id'].astype(str)
edgelist_for_ic['pre'] = edgelist_for_ic['pre'].astype(str)
edgelist_for_ic['post'] = edgelist_for_ic['post'].astype(str)

print(f"\n✓ Data prepared for influence calculator")
print(f"  Edgelist: {len(edgelist_for_ic):,} connections")
print(f"  Metadata: {len(meta_for_ic):,} neurons\n")

# Create temporary SQLite database
import sqlite3
import tempfile

temp_db = tempfile.NamedTemporaryFile(suffix='.sqlite', delete=False)
temp_db_path = temp_db.name
temp_db.close()

print(f"Creating temporary SQLite database: {temp_db_path}")

conn = sqlite3.connect(temp_db_path)

# Write tables to database
print("Writing edgelist to database...")
edgelist_for_ic.to_sql('edgelist_simple', conn, if_exists='replace', index=False)

print("Writing metadata to database...")
meta_for_ic.to_sql('meta', conn, if_exists='replace', index=False)

conn.close()

print("✓ Database created successfully\n")

# Initialize the influence calculator
# This uses the InfluenceCalculator Python package
print("Initializing calculator (this may take several minutes)...")

ic_dataset = InfluenceCalculator(
    filename=temp_db_path,
    signed=False,
    count_thresh=5
)

print("\n✓ Influence calculator initialized")
print("Network ready for influence calculations")
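The `post_count = count / norm` step assumes `norm` is the fraction of the postsynaptic neuron's input synapses that each connection supplies. A small pandas/SQLite sketch under that assumption (toy IDs and counts) shows the round trip, plus a quick way to confirm that a database holds the two tables the calculator expects. If `norm` was computed on the unfiltered edgelist (an assumption about how the files were built), the recovered `post_count` still includes input from edges dropped by the ≥5 filter.

```python
import sqlite3
import pandas as pd

# Toy edgelist. ASSUMPTION: norm = count / (total input synapses of the post neuron)
edges = pd.DataFrame({
    "pre":   ["a", "b", "a"],
    "post":  ["c", "c", "d"],
    "count": [30, 10, 8],
})
post_totals = edges.groupby("post")["count"].transform("sum")
edges["norm"] = edges["count"] / post_totals

# Recover the postsynaptic total the same way the tutorial does
edges["post_count"] = edges["count"] / edges["norm"]

# Write both tables to an in-memory SQLite database and list what it contains
conn = sqlite3.connect(":memory:")
edges.to_sql("edgelist_simple", conn, if_exists="replace", index=False)
pd.DataFrame({"root_id": ["a", "b", "c", "d"]}).to_sql(
    "meta", conn, if_exists="replace", index=False
)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
)]
conn.close()

print(tables)
print(edges)
```

On the real temporary database, the same `sqlite_master` query against `temp_db_path` is a cheap sanity check before initialising the calculator.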

### Calculate Influence Scores

Now we calculate influence scores from each sensory sub-class to all MB dopaminergic neurons:

In [None]:
print(f"Calculating influence scores for {len(sensory_sub_classes)} sensory sub-classes...")
print(f"Using {n_cores} cores for parallel processing\n")
print("Note: This will take time - influence calculations involve matrix operations on the full network\n")

# Get MB dopamine neuron IDs for filtering
mb_dopamine_ids = set(mb_dopamine_neurons['banc_746_id'].astype(str).values)

# Define function to calculate influence for one sensory sub-class
def calculate_influence_for_subclass(i, sensory_sub_class):
    """Calculate influence from one sensory sub-class to MB dopamine neurons."""
    # Get IDs for this sensory sub-class (as strings to match database)
    sensory_ids = sensory_neurons[
        sensory_neurons['cell_sub_class'] == sensory_sub_class
    ]['banc_746_id'].astype(str).tolist()
    
    # Skip if no neurons found
    if len(sensory_ids) == 0:
        return None
    
    # Calculate influence from this sensory sub-class
    influence_df = ic_dataset.calculate_influence(
        seed_ids=sensory_ids,
        silenced_neurons=[]
    )
    
    # Ensure id is string type
    influence_df['id'] = influence_df['id'].astype(str)
    
    # Find the influence score column (may have different names)
    influence_col = [col for col in influence_df.columns if 'Influence_score' in col][0]
    
    # Add adjusted influence (log-transform with offset, floor at 0)
    adjusted_inf = np.log(influence_df[influence_col]) + 24
    adjusted_inf[adjusted_inf < 0] = 0
    influence_df['adjusted_influence'] = adjusted_inf
    
    # Filter to MB dopamine neurons and join with metadata
    influence_scores = influence_df[
        influence_df['id'].isin(mb_dopamine_ids)
    ].merge(
        meta[['banc_746_id', 'cell_sub_class', 'cell_type']].drop_duplicates().assign(
            banc_746_id=lambda x: x['banc_746_id'].astype(str)
        ),
        left_on='id',
        right_on='banc_746_id',
        how='left'
    ).rename(columns={
        'cell_sub_class': 'target_class',
        'cell_type': 'target_type'
    })
    
    influence_scores['source'] = sensory_sub_class
    
    return influence_scores

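# Run calculations in parallel; joblib's verbose=10 prints progress as tasks complete.
# Note: joblib's default process-based backend may send a copy of ic_dataset to each
# worker, which can use a lot of memory for large networks; lower n_jobs if workers fail.
print("Running parallel influence calculations...")
all_influence_scores_list = Parallel(n_jobs=n_cores, verbose=10)(
    delayed(calculate_influence_for_subclass)(i, sc) 
    for i, sc in enumerate(sensory_sub_classes)
)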

# Remove None results (from empty sensory sub-classes)
all_influence_scores_list = [df for df in all_influence_scores_list if df is not None]

print("\n✓ Influence calculations complete\n")

# Combine all results
all_influence_scores = pd.concat(all_influence_scores_list, ignore_index=True)

print(f"Total influence scores calculated: {len(all_influence_scores):,}")

# Show sample of results
print("\nSample of influence scores:")
display_cols = ['source', 'id', 'adjusted_influence', 'target_type']
# Find the influence column name
influence_col = [col for col in all_influence_scores.columns if 'Influence_score' in col]
if influence_col:
    display_cols.insert(2, influence_col[0])

print(all_influence_scores[display_cols].head(10).to_string())

### Aggregate by Cell Type

In [None]:
# Find the influence column name dynamically
influence_col = [col for col in all_influence_scores.columns if 'Influence_score' in col]
if not influence_col:
    raise ValueError("No Influence_score column found in results")
influence_col = influence_col[0]

# Aggregate influence scores by source and target cell type
all_influence_scores_ct = all_influence_scores.groupby(
    ['source', 'target_type']
).agg({
    influence_col: 'sum',
    'adjusted_influence': 'sum',
    'id': 'count'
}).reset_index().rename(columns={
    'id': 'n_targets',
    influence_col: 'influence'
})

# Filter out missing values
all_influence_scores_ct = all_influence_scores_ct[
    all_influence_scores_ct['target_type'].notna() & 
    all_influence_scores_ct['source'].notna()
]

print(f"Aggregated to {len(all_influence_scores_ct):,} source-target type pairs")

# Show top influences
print("\nTop 10 sensory → dopamine influences:")
print(all_influence_scores_ct.nlargest(10, 'adjusted_influence')[[
    'source', 'target_type', 'adjusted_influence', 'n_targets'
]].to_string())
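The aggregation above sums adjusted influence over all neurons of a target type, so types with more neurons tend to score higher; `n_targets` is kept so you can check this. A minimal sketch with made-up scores compares the sum with a per-neuron mean using pandas named aggregation. Which summary suits a given question is a judgement call; the tutorial itself uses the sum.

```python
import pandas as pd

# Made-up per-neuron adjusted scores for one source (not real results)
scores = pd.DataFrame({
    "source": ["sensory_X"] * 5,
    "target_type": ["PAM01", "PAM01", "PAM01", "PPL101", "PPL101"],
    "adjusted_influence": [5.0, 6.0, 7.0, 8.0, 8.0],
})

agg = scores.groupby(["source", "target_type"]).agg(
    total=("adjusted_influence", "sum"),
    mean_per_neuron=("adjusted_influence", "mean"),
    n_targets=("adjusted_influence", "size"),
).reset_index()
print(agg)
```

Here the sum ranks the three-neuron type first, while the mean ranks the two-neuron type first.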

## Visualisation: Influence Heatmap

Let's create an interactive heatmap showing influence from sensory sub-classes to dopamine neuron types:

In [None]:
# Create a matrix for heatmap
influence_matrix = all_influence_scores_ct.pivot(
    index='source',
    columns='target_type',
    values='adjusted_influence'
).fillna(0)

print(f"Heatmap matrix dimensions: {influence_matrix.shape[0]} x {influence_matrix.shape[1]}\n")

# Perform hierarchical clustering
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Cluster rows (sources)
row_linkage = linkage(pdist(influence_matrix.values, metric='euclidean'), method='ward')
row_order = leaves_list(row_linkage)

# Cluster columns (targets)
col_linkage = linkage(pdist(influence_matrix.T.values, metric='euclidean'), method='ward')
col_order = leaves_list(col_linkage)

# Reorder matrix
influence_matrix_ordered = influence_matrix.iloc[row_order, col_order]

# Create interactive heatmap with plotly
fig = go.Figure(data=go.Heatmap(
    z=influence_matrix_ordered.values,
    x=influence_matrix_ordered.columns,
    y=influence_matrix_ordered.index,
    colorscale='YlOrRd',
    hovertemplate='Source: %{y}<br>Target: %{x}<br>Adjusted Influence: %{z:.2f}<extra></extra>'
))

fig.update_layout(
    title='Sensory Influence on MB Dopaminergic Neurons',
    xaxis_title='Target: MB Dopamine Neuron Type',
    yaxis_title='Source: Sensory Sub-Class',
    width=1200,
    height=1000,
    xaxis={'tickangle': -45, 'tickfont': {'size': 8}},
    yaxis={'tickfont': {'size': 8}}
)

# Save as HTML
fig.write_html(f"{IMG_DIR}/{DATASET}_influence_heatmap.html")

print("✓ Interactive heatmap saved")
fig.show()
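The cell above orders rows and columns using Ward linkage on Euclidean distances followed by `scipy.cluster.hierarchy.leaves_list`. A toy matrix with made-up values shows the effect: sources with similar influence profiles end up next to each other, and so do similar targets.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Toy source x target matrix (made-up values)
m = pd.DataFrame(
    [[9.0, 0.0, 8.0],
     [0.0, 7.0, 0.0],
     [8.5, 0.0, 9.0],
     [0.0, 6.5, 0.5]],
    index=["src1", "src2", "src3", "src4"],
    columns=["tA", "tB", "tC"],
)

# Same recipe as the heatmap cell: Ward linkage on Euclidean distances, then leaf order
row_order = leaves_list(linkage(pdist(m.values, metric="euclidean"), method="ward"))
col_order = leaves_list(linkage(pdist(m.T.values, metric="euclidean"), method="ward"))
m_ordered = m.iloc[row_order, col_order]
print(m_ordered)
```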

## Visualisation: UMAP of Influence Patterns

We can also visualise the influence patterns using UMAP, where each point is a dopaminergic neuron:

In [None]:
# Aggregate influence by individual neuron
all_influence_scores_n = all_influence_scores.groupby(
    ['source', 'id']
).agg({
    'Influence_score_(unsigned)': 'sum',
    'adjusted_influence': 'sum',
    'target_type': 'first',
    'target_class': 'first'
}).reset_index().rename(columns={
    'Influence_score_(unsigned)': 'influence'
})

all_influence_scores_n = all_influence_scores_n[
    all_influence_scores_n['id'].notna() & 
    all_influence_scores_n['source'].notna()
]

# Create matrix: rows = neurons, columns = sensory sub-classes
influence_matrix_umap = all_influence_scores_n.pivot(
    index='id',
    columns='source',
    values='adjusted_influence'
).fillna(0)

print(f"UMAP input matrix: {influence_matrix_umap.shape[0]} neurons x {influence_matrix_umap.shape[1]} sensory sub-classes\n")

# Run UMAP
import umap.umap_ as umap_lib

reducer = umap_lib.UMAP(n_neighbors=15, min_dist=0.1, random_state=42)
umap_result = reducer.fit_transform(influence_matrix_umap.values)

# Create data frame for plotting
umap_df = pd.DataFrame({
    'id': influence_matrix_umap.index,
    'UMAP1': umap_result[:, 0],
    'UMAP2': umap_result[:, 1]
}).merge(
    meta[['banc_746_id', 'cell_type', 'cell_sub_class', 'cell_class']].drop_duplicates(),
    left_on='id',
    right_on='banc_746_id'
)

# Plot by cell sub-class
fig = px.scatter(
    umap_df,
    x='UMAP1',
    y='UMAP2',
    color='cell_sub_class',
    title=f'UMAP of MB Dopamine Neurons by Sensory Influence Patterns (n = {len(umap_df)})',
    labels={'cell_sub_class': 'Cell Sub-Class'},
    width=1200,
    height=600,
    template='plotly_white'
)

fig.update_traces(marker=dict(size=8, opacity=0.7))
fig.write_html(f"{IMG_DIR}/{DATASET}_influence_umap_subclass.html")
fig.show()

# Plot by cell type
fig = px.scatter(
    umap_df,
    x='UMAP1',
    y='UMAP2',
    color='cell_type',
    title=f'UMAP of MB Dopamine Neurons by Sensory Influence Patterns (n = {len(umap_df)})',
    labels={'cell_type': 'Cell Type'},
    width=1200,
    height=600,
    template='plotly_white'
)

fig.update_traces(marker=dict(size=8, opacity=0.7))
fig.write_html(f"{IMG_DIR}/{DATASET}_influence_umap_type.html")
fig.show()
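The UMAP input above is a wide matrix with one row per dopamine neuron and one column per sensory sub-class, so each row is that neuron's influence profile. A toy long-format table (made-up values) shows the reshaping: pairs with no score become 0 after `fillna(0)`. `DataFrame.pivot` raises a `ValueError` if an (index, column) pair appears twice, which is one reason the scores are grouped by `source` and `id` first.

```python
import pandas as pd

# Long-format toy scores: one row per (source, target neuron) pair (made-up values)
long = pd.DataFrame({
    "source": ["sA", "sA", "sB", "sB", "sC"],
    "id": ["n1", "n2", "n1", "n3", "n2"],
    "adjusted_influence": [4.0, 2.5, 1.0, 3.0, 6.0],
})

# Rows = target neurons, columns = sensory sources; missing pairs become 0
wide = long.pivot(index="id", columns="source", values="adjusted_influence").fillna(0)
print(wide)
```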

## Key Insights

From this analysis, we can see:

1. **Diverse sensory influences**: MB dopaminergic neurons receive influence from many sensory modalities through both direct and indirect pathways
2. **Cell type specificity**: Different dopamine neuron types show distinct sensory influence patterns
3. **Indirect pathways**: Influence scores capture multi-synaptic signal propagation beyond direct connections

## Summary

In this tutorial, you learned how to:

1. ✓ Calculate indirect connectivity using the influence metric
2. ✓ Set up and use the InfluenceCalculator package
3. ✓ Analyse influence from sensory neurons to dopaminergic neurons
4. ✓ Visualise influence patterns with heatmaps and UMAP
5. ✓ Interpret biological significance of influence scores

The influence metric provides a powerful way to understand how signals propagate through neural circuits beyond direct synaptic connections.