# DRIADA overview

[**DRIADA**](https://driada.readthedocs.io) (Dimensionality Reduction for
Integrated Activity Data) is a Python framework for neural data analysis.
It bridges two perspectives that are usually treated separately: what
*individual* neurons encode, and how the *population as a whole* represents
information.  The typical analysis workflow looks like this:

| Step | Notebook | What it does |
|---|---|---|
| **Overview** | **00 -- this notebook** | Core data structures, quick tour of INTENSE, DR, networks |
| Load & inspect | [01 -- Data loading](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/01_data_loading_and_neurons.ipynb) | Wrap recording into `Experiment`, reconstruct spikes, assess quality |
| Single-neuron selectivity | [02 -- INTENSE](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/02_selectivity_detection_intense.ipynb) | Detect which neurons encode which behavioral variables |
| Population geometry | [03 -- DR](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/03_population_geometry_dr.ipynb) | Extract low-dimensional manifolds from population activity |
| Network analysis | [04 -- Networks](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/04_network_analysis.ipynb) | Build and analyze cell-cell interaction graphs |
| Putting it together | [05 -- Advanced](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/05_advanced_capabilities.ipynb) | Combine INTENSE + DR, leave-one-out importance, RSA, RNN analysis |

**What you will learn:**

1. **Loading data into an Experiment** -- wrap numpy arrays into a DRIADA [`Experiment`](https://driada.readthedocs.io/en/latest/api/experiment/core.html#driada.experiment.exp_base.Experiment).
2. **Feature types and TimeSeries** -- understand how DRIADA represents and auto-detects behavioral variables.
3. **Quick tour: selectivity, dimensionality reduction, networks** -- run INTENSE, project onto UMAP, and build a functional connectivity graph.

In [None]:
# TODO: revert to '!pip install -q driada' after v1.0.0 PyPI release
!pip install -q git+https://github.com/iabs-neuro/driada.git@main
%matplotlib inline

import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

from driada.experiment import (
    load_exp_from_aligned_data,
    save_exp_to_pickle,
    load_exp_from_pickle,
)
from driada.experiment.synthetic import (
    generate_pseudo_calcium_multisignal,
    generate_tuned_selectivity_exp,
)

## 1. Loading your data into DRIADA

You have numpy arrays from your recording pipeline (Suite2P, CaImAn,
DeepLabCut, etc.).  [`load_exp_from_aligned_data`](https://driada.readthedocs.io/en/latest/api/experiment/loading.html#driada.experiment.exp_build.load_exp_from_aligned_data)
wraps them into an **Experiment** object that
keeps neural activity and behavioral features aligned and annotated.

The data dict must contain one neural-data key -- any of `'calcium'`,
`'activations'`, `'neural_data'`, `'activity'`, or `'rates'` -- holding a
`(n_neurons, n_frames)` array.  An optional `'spikes'` array of the same
shape can be passed alongside it.  Every other entry becomes a
**dynamic feature** (one value per timepoint).
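
Before building the dict, it can help to confirm that every array shares the same number of frames. The check below is a minimal NumPy-only sketch of that idea. `check_aligned` and `NEURAL_KEYS` are hypothetical names introduced for illustration; they are not part of DRIADA, and DRIADA's own loader may validate its input differently.

```python
import numpy as np

# Neural-data keys listed above; the constant name is illustrative only.
NEURAL_KEYS = ("calcium", "activations", "neural_data", "activity", "rates")


def check_aligned(data):
    """Return (neural_key, n_frames) if all arrays agree on the frame count."""
    present = [k for k in NEURAL_KEYS if k in data]
    if len(present) != 1:
        raise ValueError(f"expected exactly one neural-data key, found {present}")
    neural_key = present[0]
    neural = np.asarray(data[neural_key])
    if neural.ndim != 2:
        raise ValueError("neural data must have shape (n_neurons, n_frames)")
    n_frames = neural.shape[1]
    for name, values in data.items():
        if name == neural_key:
            continue
        # Assumption: time is the last axis of every other entry.
        if np.asarray(values).shape[-1] != n_frames:
            raise ValueError(f"{name!r} does not have {n_frames} timepoints")
    return neural_key, n_frames


toy = {
    "calcium": np.zeros((3, 100)),   # 3 neurons, 100 frames
    "speed": np.zeros(100),          # one value per frame
}
print(check_aligned(toy))
```

Running a check like this first turns a silent misalignment into an explicit error that names the offending feature.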

Below we use DRIADA's
[`generate_pseudo_calcium_multisignal`](https://driada.readthedocs.io/en/latest/api/experiment/synthetic.html#driada.experiment.synthetic.core.generate_pseudo_calcium_multisignal)
to create realistic synthetic calcium traces with GCaMP-like dynamics
(transient events, exponential decay, baseline noise).  In your own work,
replace this with your actual recording data.

In [None]:
# In practice: calcium = np.load('your_recording.npz')['calcium']

n_neurons = 20
fps = 30.0
duration = 200.0  # seconds

calcium = generate_pseudo_calcium_multisignal(
    n=n_neurons,
    duration=duration,
    sampling_rate=fps,
    event_rate=0.15,
    amplitude_range=(0.5, 2.0),
    decay_time=1.5,
    rise_time=0.15,
    noise_std=0.05,
    kernel='double_exponential',
    seed=0,
)
n_frames = calcium.shape[1]

# Behavioral variables (one value per timepoint)
np.random.seed(0)
x_pos = np.cumsum(np.random.randn(n_frames) * 0.5)            # continuous
y_pos = np.cumsum(np.random.randn(n_frames) * 0.5)            # continuous
speed = np.abs(np.random.randn(n_frames)) * 5.0               # continuous
head_direction = np.random.uniform(0, 2 * np.pi, n_frames)    # circular (radians)
trial_type = np.random.choice([0, 1, 2], size=n_frames)       # discrete labels

print(f"calcium:        shape={calcium.shape}, dtype={calcium.dtype}")
print(f"x_pos:          shape={x_pos.shape}, dtype={x_pos.dtype}")
print(f"y_pos:          shape={y_pos.shape}, dtype={y_pos.dtype}")
print(f"speed:          shape={speed.shape}, dtype={speed.dtype}")
print(f"head_direction: shape={head_direction.shape}, dtype={head_direction.dtype}")
print(f"trial_type:     shape={trial_type.shape}, dtype={trial_type.dtype}")

In [None]:
# Quick look at the calcium traces
fig, axes = plt.subplots(2, 1, figsize=(14, 5), sharex=True)

time_sec = np.arange(n_frames) / fps
n_show = min(5, n_neurons)

# Top: overlaid traces (first few neurons)
ax = axes[0]
for i in range(n_show):
    ax.plot(time_sec, calcium[i], linewidth=0.8, label=f'neuron {i}')
ax.set_ylabel('dF/F0')
ax.set_title(f'Synthetic calcium traces ({n_neurons} neurons, {duration:.0f}s @ {fps:.0f} Hz)')
ax.legend(loc='upper right', fontsize=8)
ax.grid(True, alpha=0.3)

# Bottom: offset traces for clearer event structure
ax = axes[1]
offsets = np.arange(n_show) * 3
for i in range(n_show):
    ax.plot(time_sec, calcium[i] + offsets[i], 'k', linewidth=0.6)
ax.set_xlabel('Time (s)')
ax.set_ylabel('dF/F0 + offset')
ax.set_title('Offset view (same neurons)')
ax.set_yticks(offsets)
ax.set_yticklabels([f'n{i}' for i in range(n_show)])
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Feature types and aggregation

DRIADA runs a multi-stage
[auto-detection pipeline](https://driada.readthedocs.io/en/latest/api/information/utilities.html#module-driada.information.time_series_types)
on every feature to determine its type.  The pipeline considers uniqueness
ratio, integer fraction, gap statistics, distribution tests, and -- for
circular candidates -- variable name, value range, wraparound jumps, and
Von Mises fit.  The result is a `primary_type` (continuous / discrete /
ambiguous) plus a `subtype`:

| Primary type | Subtypes |
|---|---|
| continuous | `linear`, `circular` |
| discrete | `binary`, `categorical`, `count`, `timeline` |
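
Two of the statistics named above are easy to compute by hand. The sketch below only illustrates what a "uniqueness ratio" and an "integer fraction" could look like. The function name and the exact definitions are assumptions made for this example, and DRIADA's pipeline combines many more signals than these two.

```python
import numpy as np


def simple_type_stats(values):
    """Toy statistics: share of distinct values and share of integer values."""
    values = np.asarray(values, dtype=float)
    uniqueness_ratio = np.unique(values).size / values.size
    integer_fraction = np.mean(values == np.round(values))
    return uniqueness_ratio, integer_fraction


rng = np.random.default_rng(0)
speed_like = np.abs(rng.standard_normal(1000)) * 5.0    # continuous values
labels_like = rng.choice([0, 1, 2], size=1000)          # discrete labels

print(simple_type_stats(speed_like))    # high uniqueness, few integers
print(simple_type_stats(labels_like))   # low uniqueness, all integers
```

A continuous variable has almost as many distinct values as samples, while a label variable has only a handful, all of them integers.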

You can override the detection with a
[`feature_types`](https://driada.readthedocs.io/en/latest/api/experiment/loading.html#driada.experiment.exp_build.load_exp_from_aligned_data)
dict mapping feature names to type strings.  Valid strings: `continuous`,
`linear`, `circular`, `phase`, `angle`, `discrete`, `binary`, `categorical`,
`count`, `timeline`.  When `feature_types` is provided, any auto-detected
circular feature **not** listed in it is overridden to `linear` (whitelist
behaviour).

`aggregate_features` groups related 1D features into a single
[`MultiTimeSeries`](https://driada.readthedocs.io/en/latest/api/information/core.html#driada.information.info_base.MultiTimeSeries)
(e.g. x_pos + y_pos -> position_2d).

`create_circular_2d=True` (the default) auto-creates a `(cos, sin)` encoding
for every circular feature.  This matters because MI estimators (GCMI,
KSG) treat values as points on the real line, whereas a raw angle wraps at
0 / 2*pi: two nearly identical headings on either side of the wrap appear
almost 2*pi apart.  The `(cos, sin)` encoding maps the circle into R^2, where
the Euclidean distance between nearby angles is small again.
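
The wraparound problem can be seen with two angles and plain NumPy. This is a small numerical illustration of the argument above, not DRIADA code; `to_cos_sin` is a hypothetical helper.

```python
import numpy as np

# Two headings 0.1 rad apart, on either side of the 0 / 2*pi wrap.
a = 0.05
b = 2 * np.pi - 0.05

# Treating the angle as a point on the real line.
raw_distance = abs(a - b)


def to_cos_sin(theta):
    """Map an angle to a point on the unit circle."""
    return np.array([np.cos(theta), np.sin(theta)])


# Euclidean distance between the two points on the unit circle.
embedded_distance = np.linalg.norm(to_cos_sin(a) - to_cos_sin(b))

print(f"raw distance:        {raw_distance:.3f}")
print(f"(cos, sin) distance: {embedded_distance:.3f}")
```

The raw difference is close to 2*pi, while the chord between the two points on the unit circle is about 0.1, matching the true angular separation.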

In [None]:
# Build the data dict
data = {
    # --- neural activity (required) --------------------------------
    "calcium": calcium,               # (n_neurons, n_frames)
    # "spikes": my_spikes_array,      # optional, same shape as calcium
    # --- dynamic features: behavioral variables (one per timepoint) -
    "x_pos": x_pos,                   # continuous
    "y_pos": y_pos,                   # continuous
    "speed": speed,                   # continuous
    "head_direction": head_direction,  # circular (radians)
    "trial_type": trial_type,         # discrete labels
}

# Override auto-detected feature types (optional)
feature_types = {
    "head_direction": "circular",   # auto-detection may miss this
    "trial_type": "categorical",    # refine from generic discrete
}

# Aggregate multi-component features (optional)
aggregate_features = {
    ("x_pos", "y_pos"): "position_2d",
}

# Build the Experiment
exp = load_exp_from_aligned_data(
    data_source="MyLab",
    exp_params={"name": "demo_recording"},
    data=data,
    feature_types=feature_types,
    aggregate_features=aggregate_features,
    static_features={"fps": fps},  # reuse the sampling rate defined above
    create_circular_2d=True,  # auto-create (cos, sin) for circular features
    verbose=True,
)

### Inspecting the Experiment

Note the auto-generated features in the list below:
- **position_2d** -- from `aggregate_features` (x_pos + y_pos)
- **head_direction_2d** -- from `create_circular_2d` (cos + sin encoding)

In [None]:
print(f"Neurons:     {exp.n_cells}")
print(f"Timepoints:  {exp.n_frames}")
print(f"FPS:         {exp.static_features.get('fps', 'unknown')}")
print(f"Calcium:     {exp.calcium.data.shape}")

print("\nDynamic features (time-varying behavioral variables):")
for name, ts in sorted(exp.dynamic_features.items()):
    ti = getattr(ts, "type_info", None)
    if ti and hasattr(ti, "primary_type"):
        dtype_str = f"{ti.primary_type}/{ti.subtype}"
        if ti.is_circular:
            dtype_str += " (circular)"
    else:
        dtype_str = "discrete" if ts.discrete else "continuous"
    shape = ts.data.shape
    print(f"  {name:25s}  shape={str(shape):15s}  type={dtype_str}")

### TimeSeries and MultiTimeSeries

Each dynamic feature is stored as one of two classes:

| Class | Description |
|---|---|
| [**`TimeSeries`**](https://driada.readthedocs.io/en/latest/api/information/core.html#driada.information.info_base.TimeSeries) | A single 1D variable (e.g. `speed`) |
| [**`MultiTimeSeries`**](https://driada.readthedocs.io/en/latest/api/information/core.html#driada.information.info_base.MultiTimeSeries) | Multiple aligned 1D variables stacked into a 2D array (e.g. `position_2d = [x, y]`) |

Key attributes on both:
- `.data` -- raw numpy array (1D or 2D)
- `.discrete` -- True if discrete, False if continuous
- `.type_info` -- rich type metadata (subtype, circularity)
- `.copula_normal_data` -- GCMI-ready transform (continuous only)
- `.int_data` -- integer-coded values (discrete only)

MultiTimeSeries additionally has `.ts_list` (list of component TimeSeries)
and `.n_dim` (number of components).
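
The copula-normal transform behind Gaussian-copula MI (GCMI) is usually described as a rank transform followed by the inverse normal CDF. The sketch below shows that general idea with NumPy and the standard library. `copula_normalize` is a hypothetical helper, and DRIADA's own implementation may treat ties and edge cases differently.

```python
import numpy as np
from statistics import NormalDist


def copula_normalize(x):
    """Rank-transform x, then map the ranks through the inverse normal CDF."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1        # 1..n, ties broken by order
    uniform = ranks / (x.size + 1)               # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(u) for u in uniform])


rng = np.random.default_rng(0)
skewed = rng.exponential(size=2000)              # strongly non-Gaussian input
gaussianized = copula_normalize(skewed)

print(f"mean={gaussianized.mean():.3f}, std={gaussianized.std():.3f}")
```

The transform keeps the ordering of the samples but gives them standard-normal marginals, which is why it only makes sense for continuous features.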

In [None]:
# Features are accessible as attributes: exp.speed, exp.position_2d, etc.
# This is equivalent to exp.dynamic_features["speed"].
speed_ts = exp.speed
print(f"speed.data.shape:   {speed_ts.data.shape}")
print(f"speed.discrete:     {speed_ts.discrete}")
print(f"speed.type_info:    {speed_ts.type_info.primary_type}"
      f"/{speed_ts.type_info.subtype}")
print(f"speed has copula:   {speed_ts.copula_normal_data is not None}")

# Access a 2D feature (MultiTimeSeries)
pos_mts = exp.position_2d
print(f"\nposition_2d.data.shape: {pos_mts.data.shape}")
print(f"position_2d.n_dim:      {pos_mts.n_dim}  (x and y)")
# Individual components are full TimeSeries objects:
print(f"position_2d.ts_list[0]: {pos_mts.ts_list[0].name}"
      f"  shape={pos_mts.ts_list[0].data.shape}")

# Discrete feature
trial_ts = exp.trial_type
print(f"\ntrial_type.discrete:  {trial_ts.discrete}")
print(f"trial_type.int_data:  {trial_ts.int_data[:8]}...")
print(f"trial_type has copula: {trial_ts.copula_normal_data is not None}")

### Batch spike reconstruction

`reconstruct_all_neurons()` applies the same reconstruction method across
the whole population.  Key parameters include `method` (`'wavelet'` or
`'threshold'`), `n_iter` (number of iterative detection passes), and
`show_progress` (display a progress bar).  After reconstruction, per-neuron
quality metrics (wavelet SNR, R-squared, event counts) are available.
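
For intuition about what threshold-based event detection involves, here is a generic threshold-crossing detector on a toy trace. It is not DRIADA's `'threshold'` method. The noise estimate (median absolute deviation), the 4-sigma threshold, and the 30-frame minimum gap are all assumptions chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 600
true_onsets = [100, 300, 450]

# Toy trace: Gaussian noise plus three fast-decaying transients.
trace = rng.normal(0.0, 0.05, n_frames)
kernel = np.exp(-np.arange(60) / 10.0)
for onset in true_onsets:
    trace[onset:onset + kernel.size] += kernel

# Robust noise level from the median absolute deviation (MAD).
baseline = np.median(trace)
noise = 1.4826 * np.median(np.abs(trace - baseline))
above = trace > baseline + 4.0 * noise

# Upward threshold crossings, keeping only those at least 30 frames apart.
crossings = np.flatnonzero(above[1:] & ~above[:-1]) + 1
detected = []
for frame in crossings:
    if not detected or frame - detected[-1] >= 30:
        detected.append(int(frame))

print(detected)
```

The minimum-gap rule matters: without it, noise around the threshold during the decay phase would register as extra events.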

In [None]:
exp.reconstruct_all_neurons(method='threshold', n_iter=3, show_progress=True)
print(f"[OK] Reconstructed spikes for {exp.n_cells} neurons")

# Collect per-neuron quality metrics
snr_list = []
r2_list = []
event_counts = []

for n in exp.neurons:
    snr_list.append(n.get_wavelet_snr())
    r2_list.append(n.get_reconstruction_r2())
    event_counts.append(n.get_event_count())

snr_arr = np.array(snr_list)
r2_arr = np.array(r2_list)
evt_arr = np.array(event_counts)

print(f"\nPopulation quality summary ({exp.n_cells} neurons):")
print(f"  Wavelet SNR:  {np.mean(snr_arr):.2f} +/- {np.std(snr_arr):.2f}"
      f"  (range {np.min(snr_arr):.2f} - {np.max(snr_arr):.2f})")
print(f"  Recon R2:     {np.mean(r2_arr):.4f} +/- {np.std(r2_arr):.4f}"
      f"  (range {np.min(r2_arr):.4f} - {np.max(r2_arr):.4f})")
print(f"  Event count:  {np.mean(evt_arr):.1f} +/- {np.std(evt_arr):.1f}"
      f"  (range {np.min(evt_arr)} - {np.max(evt_arr)})")

### Neural data access

Neural activity is stored in two complementary ways:

| View | Description |
|---|---|
| `exp.calcium` | `MultiTimeSeries` (n_neurons, n_frames) -- convenient for population-level analysis (DR, RSA, decoding) |
| `exp.neurons` | List of `Neuron` objects -- for single-cell analysis (reconstruction, kinetics, quality) |

In [None]:
# Population-level: full calcium matrix as MultiTimeSeries
print(f"exp.calcium:        {type(exp.calcium).__name__}"
      f"  shape={exp.calcium.data.shape}")
has_spikes = exp.spikes is not None and exp.spikes.data.any()
print(f"exp.spikes:         {'available' if has_spikes else 'not provided'}")

# Single-neuron level: list of Neuron objects
neuron = exp.neurons[0]
print(f"\nexp.neurons:        {len(exp.neurons)} Neuron objects")
print(f"neuron.cell_id:     {neuron.cell_id}")
print(f"neuron.ca:          {type(neuron.ca).__name__}"
      f"  shape={neuron.ca.data.shape}")
print(f"neuron.sp:          "
      f"{'shape=' + str(neuron.sp.data.shape) if neuron.sp else 'None (no spikes provided)'}")
print(f"neuron.fps:         {neuron.fps}")
# See Notebook 01 for spike reconstruction, event detection,
# kinetics optimization, and other Neuron methods.

### Save and reload

The entire Experiment (neural data + features + metadata) can be serialized
with [`save_exp_to_pickle`](https://driada.readthedocs.io/en/latest/api/experiment/loading.html#driada.experiment.exp_build.save_exp_to_pickle)
and restored with [`load_exp_from_pickle`](https://driada.readthedocs.io/en/latest/api/experiment/loading.html#driada.experiment.exp_build.load_exp_from_pickle)
for fast roundtrip storage.

In [None]:
pkl_path = os.path.join(tempfile.gettempdir(), "demo_experiment.pkl")
save_exp_to_pickle(exp, pkl_path, verbose=False)
file_size_mb = os.path.getsize(pkl_path) / 1024 / 1024
print(f"Saved:  {pkl_path} ({file_size_mb:.1f} MB)")

exp_loaded = load_exp_from_pickle(pkl_path, verbose=False)
print(f"Loaded: {exp_loaded.n_cells} neurons, {exp_loaded.n_frames} frames")

# Verify roundtrip
assert exp_loaded.n_cells == exp.n_cells
assert exp_loaded.n_frames == exp.n_frames
assert np.allclose(exp_loaded.calcium.data, exp.calcium.data)
print("Roundtrip verified -- data matches.")

# Clean up
os.remove(pkl_path)
print(f"Cleaned up {pkl_path}")

## 2. Quick tour with a synthetic population

### 2.1 Synthetic experiment with ground truth

[`generate_tuned_selectivity_exp`](https://driada.readthedocs.io/en/latest/api/experiment/synthetic.html#driada.experiment.synthetic.generators.generate_tuned_selectivity_exp)
creates a synthetic population where each neuron group has known selectivity
to specific features.  This lets us verify that downstream analyses recover
the ground truth.

In [None]:
population = [
    {'name': 'hd_cells',    'count': 10, 'features': ['head_direction']},
    {'name': 'speed_cells', 'count': 10, 'features': ['speed']},
    {'name': 'event_cells', 'count': 10, 'features': ['event_0']},
    {'name': 'mixed',       'count': 5,  'features': ['head_direction', 'speed']},
    {'name': 'background',  'count': 15, 'features': []},
]

exp_demo = generate_tuned_selectivity_exp(
    population=population,
    duration=600,
    fps=20,
    n_discrete_features=1,
    seed=42,
    verbose=True,
)

print(f"\nNeurons: {exp_demo.n_cells}, Frames: {exp_demo.n_frames}")
print(f"Features: {sorted(exp_demo.dynamic_features.keys())}")
print(f"Ground-truth selective pairs: {len(exp_demo.ground_truth['expected_pairs'])}")

### 2.2 INTENSE -- single-neuron selectivity

[INTENSE](https://driada.readthedocs.io/en/latest/api/intense/pipelines.html)
tests every neuron-feature pair for significant mutual information using a
two-stage permutation test.  See
[Notebook 02](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/02_selectivity_detection_intense.ipynb)
for the full walkthrough.
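
The logic of a two-stage shuffle test can be sketched in a few lines. This is a generic illustration, not the INTENSE implementation: it uses the absolute Pearson correlation as a stand-in for mutual information, builds the null distribution from random circular shifts, and the shuffle counts and thresholds are arbitrary values picked for the example.

```python
import numpy as np


def shuffle_pvalue(x, y, n_shuffles, rng):
    """P-value of |corr(x, y)| against circularly shifted copies of y."""
    observed = abs(np.corrcoef(x, y)[0, 1])
    null = np.empty(n_shuffles)
    for k in range(n_shuffles):
        shift = rng.integers(1, len(y))          # random non-zero circular shift
        null[k] = abs(np.corrcoef(x, np.roll(y, shift))[0, 1])
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_shuffles)


def two_stage_test(x, y, rng, n_stage1=50, n_stage2=500,
                   alpha1=0.1, alpha2=0.01):
    """Cheap screening pass first, then a larger test for the survivors."""
    if shuffle_pvalue(x, y, n_stage1, rng) > alpha1:
        return False
    return bool(shuffle_pvalue(x, y, n_stage2, rng) <= alpha2)


rng = np.random.default_rng(0)
feature = rng.standard_normal(2000)
tuned = feature + 0.5 * rng.standard_normal(2000)     # depends on the feature

print("tuned neuron significant:", two_stage_test(tuned, feature, rng))
```

The screening stage discards clearly unrelated pairs with few shuffles, so the expensive second stage only runs on candidates that have a realistic chance of passing.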

In [None]:
from driada.intense import compute_cell_feat_significance

stats, significance, info, results = compute_cell_feat_significance(
    exp_demo, verbose=True,
)

# significance is a nested dict: significance[neuron_id][feat_name] -> bool
neuron_ids = sorted(significance.keys())
feat_names = sorted(next(iter(significance.values())).keys())

print("\nSelective neurons per feature:")
for feat_name in feat_names:
    sig_neurons = [nid for nid in neuron_ids if significance[nid][feat_name]]
    print(f"  {feat_name:25s}  {len(sig_neurons):3d} neurons")

In [None]:
# stats is a nested dict: stats[neuron_id][feat_name] -> {'me': ..., ...}
# Pairs not tested in stage 1 have empty dicts, so use .get() with default 0
neuron_ids = sorted(stats.keys())
feat_names = sorted(next(iter(stats.values())).keys())
mi_matrix = np.array([[stats[nid][fn].get('me', 0.0) for fn in feat_names]
                       for nid in neuron_ids])

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(mi_matrix.T, aspect='auto', cmap='viridis')
ax.set_xlabel('Neuron index')
ax.set_yticks(range(len(feat_names)))
ax.set_yticklabels(feat_names, fontsize=9)
ax.set_title('Mutual information (neuron x feature)')
plt.colorbar(im, ax=ax, label='MI (bits)')
plt.tight_layout()
plt.show()

### 2.3 Dimensionality reduction -- population geometry

Project population activity onto a 2D UMAP embedding to see how behavioral
variables are encoded in the neural manifold.  See
[Notebook 03](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/03_population_geometry_dr.ipynb)
for the full walkthrough.
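
To make the shapes concrete, the sketch below runs a linear projection (PCA via SVD) on a toy population with NumPy alone. PCA is used here only as a simple stand-in for UMAP, and the toy data and variable names are assumptions for this example. The point is the layout: the activity matrix is `(n_neurons, n_frames)`, and the embedding has one row per timepoint.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_frames = 30, 500

# Toy population driven by a single latent angle, plus noise.
angle = np.linspace(0, 4 * np.pi, n_frames)
preferred = rng.uniform(0, 2 * np.pi, n_neurons)
activity = np.cos(angle[None, :] - preferred[:, None])
activity += 0.1 * rng.standard_normal((n_neurons, n_frames))

# PCA via SVD: samples are timepoints, so transpose to (n_frames, n_neurons).
X = activity.T - activity.T.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
embedding = X @ Vt[:2].T                      # (n_frames, 2)

explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(embedding.shape, f"variance explained by 2 PCs: {explained:.2f}")
```

Because every neuron in this toy population is tuned to the same latent angle, two components capture most of the variance and the embedded points trace out a ring.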

In [None]:
embedding = exp_demo.create_embedding('umap', n_components=2, seed=0)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

ax = axes[0]
sc = ax.scatter(embedding[:, 0], embedding[:, 1],
                c=exp_demo.head_direction.data, cmap='hsv',
                s=1, alpha=0.5)
ax.set_title('UMAP colored by head direction')
plt.colorbar(sc, ax=ax, label='head direction (rad)')

ax = axes[1]
sc = ax.scatter(embedding[:, 0], embedding[:, 1],
                c=exp_demo.speed.data, cmap='plasma',
                s=1, alpha=0.5)
ax.set_title('UMAP colored by speed')
plt.colorbar(sc, ax=ax, label='speed')

for ax in axes:
    ax.set_xlabel('UMAP 1')
    ax.set_ylabel('UMAP 2')

plt.tight_layout()
plt.show()

### 2.4 Network -- functional connectivity

Test all neuron pairs for shared mutual information, build a binary
connectivity graph, and inspect its topology.  See
[Notebook 04](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/04_network_analysis.ipynb)
for the full walkthrough.
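
The topology measures reported in this section can be checked by hand on a tiny graph. The sketch below computes edge count, mean degree, and the average clustering coefficient directly from a binary matrix with NumPy. The 4-node matrix is a made-up example, not output from DRIADA.

```python
import numpy as np

# Toy significance matrix: a triangle (0, 1, 2) plus a pendant node 3.
sig = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

adj = ((sig + sig.T) > 0).astype(int)         # symmetrize
np.fill_diagonal(adj, 0)                      # drop self-loops

degrees = adj.sum(axis=1)
n_edges = adj.sum() // 2
triangles = np.diag(adj @ adj @ adj) / 2      # triangles through each node
possible = degrees * (degrees - 1) / 2        # neighbor pairs per node
local_clustering = np.divide(
    triangles, possible, out=np.zeros(len(adj)), where=possible > 0
)

print(f"edges={n_edges}, mean degree={degrees.mean():.1f}, "
      f"clustering={local_clustering.mean():.3f}")
# prints: edges=4, mean degree=2.0, clustering=0.583
```

Nodes with fewer than two neighbors contribute a clustering value of zero to the average, which is the same convention `networkx.average_clustering` uses by default.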

In [None]:
from driada.intense import compute_cell_cell_significance
from driada.network import Network
import scipy.sparse as sp
import networkx as nx

cell_sim, cell_sig, cell_pvals, cell_ids, cell_info = compute_cell_cell_significance(
    exp_demo, verbose=True,
)

net = Network(adj=sp.csr_matrix(cell_sig), preprocessing='giant_cc')
degrees = [d for _, d in net.graph.degree()]
clustering = nx.average_clustering(net.graph)
print(f"\nNetwork: {net.n} nodes, {net.graph.number_of_edges()} edges")
print(f"Mean degree: {np.mean(degrees):.1f}")
print(f"Clustering coefficient: {clustering:.3f}")

In [None]:
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(cell_sig, cmap='Greys', interpolation='nearest')
ax.set_xlabel('Neuron')
ax.set_ylabel('Neuron')
ax.set_title('Functional connectivity (significant pairs)')
plt.tight_layout()
plt.show()

## Next steps

This notebook gave you a quick tour of DRIADA's core capabilities.
Dive deeper with the detailed tutorials:

1. [**Data loading & neuron analysis**](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/01_data_loading_and_neurons.ipynb) -- spike reconstruction, kinetics optimization, quality metrics, surrogates.
2. [**INTENSE selectivity detection**](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/02_selectivity_detection_intense.ipynb) -- two-stage permutation test, tuning curves, ground-truth validation.
3. [**Population geometry (DR)**](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/03_population_geometry_dr.ipynb) -- PCA, UMAP, Isomap, Laplacian Eigenmaps, sequential DR, alignment metrics.
4. [**Network analysis**](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/04_network_analysis.ipynb) -- degree distributions, community detection, spectral analysis, null models.
5. [**Advanced capabilities**](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/05_advanced_capabilities.ipynb) -- INTENSE + DR pipeline, leave-one-out importance, RSA, RNN analysis.