# Population geometry & dimensionality reduction

Individual neurons encode specific variables (Notebook 02), but the
activity of the population *as a whole* often lies on a low-dimensional
manifold whose geometry reflects the task. [**DRIADA**](https://driada.readthedocs.io) provides
a unified dimensionality reduction (DR) toolkit to extract, compare, and
evaluate these manifolds.

| Step | Notebook | What it does |
|---|---|---|
| **Overview** | [00 -- DRIADA overview](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/00_driada_overview.ipynb) | Core data structures, quick tour of INTENSE, DR, networks |
| Neuron analysis | [01 -- Neuron analysis](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/01_data_loading_and_neurons.ipynb) | Spike reconstruction, kinetics optimization, quality metrics, surrogates |
| Single-neuron selectivity | [02 -- INTENSE](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/02_selectivity_detection_intense.ipynb) | Detect which neurons encode which behavioral variables |
| **Population geometry** | **03 -- this notebook** | Extract low-dimensional manifolds from population activity |
| Network analysis | [04 -- Networks](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/04_network_analysis.ipynb) | Build and analyze cell-cell interaction graphs |
| Putting it together | [05 -- Advanced](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/05_advanced_capabilities.ipynb) | Combine INTENSE + DR, leave-one-out importance, RSA, RNN analysis |

**Sections:**

1. **DR API quick reference** -- [`MVData`](https://driada.readthedocs.io/en/latest/api/dim_reduction/data_structures.html#driada.dim_reduction.data.MVData) wraps a matrix and provides
   one-line access to 15 DR methods.
2. **Method comparison** -- Benchmark PCA, Isomap, and UMAP on a Swiss
   roll with quality metrics.
3. **Circular manifold & dimensionality estimation** -- Head direction
   cells encode a ring. Extract it via DR and estimate intrinsic
   dimensionality.
4. **Autoencoder-based DR** -- Standard AE with `continue_learning`,
   Beta-VAE on the same circular data. Requires PyTorch.

In [None]:
# TODO: revert to '!pip install -q driada' after v1.0.0 PyPI release
!pip install -q git+https://github.com/iabs-neuro/driada.git@main
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import time
import warnings

from sklearn.datasets import make_swiss_roll
from scipy.sparse import csr_matrix

# DRIADA dimensionality reduction
from driada.dim_reduction import (
    MVData,
    dr_sequence,
    knn_preservation_rate,
    trustworthiness,
    continuity,
    stress,
)
from driada.dim_reduction.manifold_metrics import (
    compute_embedding_alignment_metrics,
)

# DRIADA network analysis (used for the ProximityGraph demo in Section 1)
from driada.network import Network

# DRIADA experiment / synthetic data
from driada.experiment import generate_circular_manifold_exp

# DRIADA dimensionality estimation
from driada.dimensionality import (
    eff_dim,
    correlation_dimension,
    geodesic_dimension,
    pca_dimension,
)

# DRIADA visualization
from driada.utils.visual import (
    visualize_circular_manifold,
    plot_trajectories,
    DEFAULT_DPI,
)

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore', category=UserWarning)

## 1. DR API quick reference

[`MVData`](https://driada.readthedocs.io/en/latest/api/dim_reduction/data_structures.html#driada.dim_reduction.data.MVData) wraps a *(n_features x n_samples)* matrix and provides one-line
DR via [`get_embedding`](https://driada.readthedocs.io/en/latest/api/dim_reduction/data_structures.html#driada.dim_reduction.data.MVData.get_embedding).
For multi-step pipelines, see [`dr_sequence`](https://driada.readthedocs.io/en/latest/api/dim_reduction/algorithms.html#driada.dim_reduction.sequences.dr_sequence).
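To make the orientation convention concrete, here is a minimal NumPy-only sketch of PCA on a *(n_features x n_samples)* matrix. It is not DRIADA's implementation: `pca_features_by_samples` is a hypothetical helper that only illustrates why an embedding in this layout comes back as *(dim x n_samples)*, which is how the cells in this notebook index it (`coords[0, :]`, `coords[1, :]`).

```python
import numpy as np


def pca_features_by_samples(X, dim=2):
    """PCA for data laid out as (n_features, n_samples).

    Hypothetical helper for illustration only; returns coordinates
    with shape (dim, n_samples).
    """
    # Center each feature (row) across samples (columns)
    Xc = X - X.mean(axis=1, keepdims=True)
    # Left singular vectors are the principal axes in feature space
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    # Project the centered data onto the first `dim` axes
    return U[:, :dim].T @ Xc


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 200))   # 5 features, 200 samples
X_demo[0] *= 10.0                    # make feature 0 dominate the variance

coords_demo = pca_features_by_samples(X_demo, dim=2)
print(coords_demo.shape)  # (2, 200)
```

Libraries such as scikit-learn expect the transposed layout *(n_samples, n_features)*, which is why several cells in this notebook call `.T` before passing data to other functions.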

Each method is described by a
[`DRMethod`](https://driada.readthedocs.io/en/latest/api/dim_reduction/data_structures.html#driada.dim_reduction.dr_base.DRMethod)
object that specifies whether it is linear, requires a proximity graph,
distance matrix, or neural network.

| Method | Type | Graph | Disconnected |
|---|---|---|---|
| `pca` | Linear | -- | -- |
| `le` / `auto_le` | Spectral | k-NN | No |
| `dmaps` / `auto_dmaps` | Spectral | k-NN (weighted) | No |
| `isomap` | Geodesic | k-NN | No |
| `lle` / `hlle` | Local linear | k-NN | No |
| `mvu` | SDP | k-NN | No |
| `mds` | Distance | -- (needs dist matrix) | -- |
| `tsne` | Probabilistic | -- | -- |
| `umap` | Topological | k-NN | **Yes** |
| `ae` / `vae` / `flexible_ae` | Neural network | -- | -- |

Graph-based methods (9 of 15) construct a
[`ProximityGraph`](https://driada.readthedocs.io/en/latest/api/dim_reduction/data_structures.html#driada.dim_reduction.graph.ProximityGraph)
that inherits from
[`Network`](https://driada.readthedocs.io/en/latest/api/network/core.html#driada.network.net_base.Network),
giving access to spectral analysis, entropy, and community detection
(see cells below and [Notebook 04](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/04_network_analysis.ipynb)).

In [None]:
# Generate Swiss roll data for demonstration
n_samples = 1000
X_raw, color = make_swiss_roll(n_samples, noise=0.1, random_state=42)
X = X_raw.T  # Transpose to match MVData format (features x samples)

# Create MVData object
mvdata = MVData(X)

emb_pca = mvdata.get_embedding(method='pca')
print(f'PCA: {emb_pca.coords.shape}')

emb_iso = mvdata.get_embedding(method='isomap')
print(f'Isomap: {emb_iso.coords.shape}')

emb_le = mvdata.get_embedding(method='le')
print(f'Laplacian Eigenmaps: {emb_le.coords.shape}')

emb_umap = mvdata.get_embedding(method='umap', n_neighbors=50, min_dist=0.3)
print(f'UMAP: {emb_umap.coords.shape}')

In [None]:
# Visualize PCA, Isomap, UMAP, LE side by side
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.ravel()

for i, (emb, name) in enumerate(zip(
    [emb_pca, emb_iso, emb_umap, emb_le],
    ['PCA', 'Isomap', 'UMAP', 'Laplacian Eigenmaps'],
)):
    ax = axes[i]
    coords = emb.coords

    scatter = ax.scatter(
        coords[0, :], coords[1, :], c=color, cmap='viridis', s=20, alpha=0.7
    )
    ax.set_title(f'{name} Embedding')
    ax.set_xlabel('Component 1')
    ax.set_ylabel('Component 2')

    # Add colorbar to first subplot
    if i == 0:
        plt.colorbar(scatter, ax=ax, label='Position on roll')

plt.tight_layout()
plt.show()

### Advanced: sequential DR, custom metrics

In [None]:
# Sequential dimensionality reduction (PCA -> UMAP)

# Generate high-dimensional data
high_dim_data = np.random.randn(100, 500)  # 100 features, 500 samples
mvdata_highdim = MVData(high_dim_data)

print('\n1. High-dimensional data:')
emb_10d = mvdata_highdim.get_embedding(method='pca', dim=10)
print(f'   -> 10D embedding shape: {emb_10d.coords.shape}')

mvdata_10d = MVData(emb_10d.coords)
emb_2d = mvdata_10d.get_embedding(method='umap')
print(f'   -> Final 2D embedding shape: {emb_2d.coords.shape}')

print('\n2. Using custom metrics:')
emb_cosine = mvdata.get_embedding(method='isomap', metric='cosine')
print(f'   -> Cosine metric embedding shape: {emb_cosine.coords.shape}')

print('\n3. Handling sparse data:')
sparse_data = csr_matrix(X)
print(f'   -> Sparse matrix shape: {sparse_data.shape}')
mvdata_sparse = MVData(sparse_data)
emb_sparse = mvdata_sparse.get_embedding(method='pca')
print(f'   -> Sparse data embedding shape: {emb_sparse.coords.shape}')

print('\n4. Sequential dimensionality reduction (use high-dim data):')
print('   Method 1 (intuitive - manual chaining):')
emb1_seq = mvdata_highdim.get_embedding(method='pca', dim=20)
mvdata2_seq = MVData(emb1_seq.coords)
emb2_seq = mvdata2_seq.get_embedding(method='umap', dim=2)
print(f'   -> Result shape: {emb2_seq.coords.shape}')

print('\n   Method 2 (recommended - using dr_sequence):')
emb_seq = dr_sequence(mvdata_highdim, steps=[
    ('pca', {'dim': 20}),
    ('umap', {'dim': 2})
])
print(f'   -> Result shape: {emb_seq.coords.shape}')

### Graph structure behind DR

Graph-based DR methods (Isomap, LLE, Laplacian Eigenmaps) don't just
produce coordinates -- they construct an internal **proximity graph** where
nodes are data points and edges connect neighbors. In DRIADA, this graph
is a [`ProximityGraph`](https://driada.readthedocs.io/en/latest/api/dim_reduction/data_structures.html#driada.dim_reduction.graph.ProximityGraph)
that **inherits from [`Network`](https://driada.readthedocs.io/en/latest/api/network/core.html#driada.network.net_base.Network)**,
giving you access to spectral decomposition, entropy, community detection,
and all other `Network` analysis methods.

Access it via `embedding.graph` after running any graph-based method.
For a full treatment of network spectral analysis, see
[Notebook 04 -- Network analysis](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/04_network_analysis.ipynb).

**Lost nodes:** When the k-NN graph is disconnected, DRIADA extracts the
giant connected component and discards all nodes outside it. The removed
indices are stored in `embedding.graph.lost_nodes` (a set, empty if no
nodes were lost). The `max_deleted_nodes` parameter (default 0.5)
controls the maximum fraction of points that can be discarded before an
error is raised -- set it higher if your data is expected to have
outliers or sparse regions.
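The lost-node mechanism can be sketched with plain NumPy. The code below is an illustration under stated assumptions, not DRIADA's implementation: `giant_component_mask` is a hypothetical helper that builds a symmetric k-NN graph, keeps the largest connected component, and returns both a boolean mask and the set of discarded indices.

```python
import numpy as np


def giant_component_mask(points, k=2):
    """Return (mask, lost_nodes) for the symmetric k-NN graph of `points`.

    `points` has shape (n_samples, n_features). Hypothetical helper for
    illustration; it is not DRIADA's implementation.
    """
    n = len(points)
    # Pairwise Euclidean distances
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)           # exclude self-matches
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Symmetric adjacency: i ~ j if either lists the other as a neighbor
    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = True
    adj |= adj.T

    # Label connected components with a simple graph traversal
    labels = np.full(n, -1)
    n_components = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]
        labels[start] = n_components
        while stack:
            node = stack.pop()
            for nxt in np.flatnonzero(adj[node]):
                if labels[nxt] == -1:
                    labels[nxt] = n_components
                    stack.append(nxt)
        n_components += 1

    giant = np.bincount(labels).argmax()
    lost_nodes = {int(i) for i in np.flatnonzero(labels != giant)}

    mask = np.ones(n, dtype=bool)
    # A set cannot index a NumPy array directly, so convert it first
    mask[sorted(lost_nodes)] = False
    return mask, lost_nodes


# 30 points on a line plus an isolated group of 5 points far away
line = np.column_stack([np.arange(30.0), np.zeros(30)])
far_group = np.column_stack([np.arange(100.0, 105.0), np.zeros(5)])
points_demo = np.vstack([line, far_group])

mask_demo, lost_demo = giant_component_mask(points_demo, k=2)
print(sorted(lost_demo))  # [30, 31, 32, 33, 34]
print(f'fraction lost: {len(lost_demo) / len(points_demo):.2f}')  # 0.14
```

In this toy case 5 of 35 points fall outside the giant component. By the description above, that fraction is what `max_deleted_nodes` is compared against.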

In [None]:
pgraph = emb_iso.graph

print(f"Type: {type(pgraph).__name__}")
print(f"  inherits from Network: {isinstance(pgraph, Network)}")
print(f"Nodes: {pgraph.n}")
print(f"Edges: {pgraph.adj.nnz // 2}")
print(f"Mean degree: {pgraph.deg.mean():.1f}")
print(f"Metric used: {pgraph.metric}")

In [None]:
pgraph.diagonalize(mode='nlap')
nlap_spectrum = pgraph.get_spectrum('nlap')
ipr = pgraph.get_ipr('nlap')

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Normalized Laplacian spectrum
sorted_spec = np.sort(np.real(nlap_spectrum))
axes[0].plot(sorted_spec, 'o', markersize=2)
axes[0].set_xlabel('Index')
axes[0].set_ylabel('Eigenvalue')
axes[0].set_title('Normalized Laplacian spectrum')
axes[0].grid(True, alpha=0.3)

# IPR -- eigenvector localization
axes[1].plot(np.sort(ipr), 'o', markersize=2)
axes[1].axhline(1.0 / pgraph.n, color='r', linestyle='--',
                label=f'1/N = {1.0/pgraph.n:.4f}')
axes[1].set_xlabel('Eigenvector index (sorted)')
axes[1].set_ylabel('IPR')
axes[1].set_title('Inverse participation ratio')
axes[1].legend(fontsize=9)
axes[1].grid(True, alpha=0.3)

# Thermodynamic entropy
tlist = np.logspace(-2, 2, 50)
entropy = pgraph.calculate_thermodynamic_entropy(tlist, norm=True)
axes[2].semilogx(tlist, entropy, linewidth=2)
axes[2].set_xlabel('Temperature')
axes[2].set_ylabel('Entropy (bits)')
axes[2].set_title('Von Neumann entropy S(t)')
axes[2].grid(True, alpha=0.3)

plt.suptitle('Spectral analysis of Isomap k-NN graph', fontsize=14)
plt.tight_layout()
plt.show()

print(f'Fiedler value: {sorted_spec[1]:.4f}')
print(f'Spectral gap: {sorted_spec[1] - sorted_spec[0]:.4f}')
print(f'Max entropy: {np.max(entropy):.2f} bits '
      f'(upper bound = log2(N) = {np.log2(pgraph.n):.2f})')

The Laplacian spectrum reveals the graph's connectivity structure:
a large spectral gap indicates the graph is well-connected, while
clustered eigenvalues near zero suggest loosely connected components.
The IPR shows whether eigenvectors are delocalized (spread across
all nodes) or localized (concentrated on a few).

These same spectral tools apply to *any* [`Network`](https://driada.readthedocs.io/en/latest/api/network/core.html#driada.network.net_base.Network) -- functional
connectivity from INTENSE, structural connectomes, or correlation
networks. See [Notebook 04](https://colab.research.google.com/github/iabs-neuro/driada/blob/main/notebooks/04_network_analysis.ipynb)
for the full spectral analysis toolkit.
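The quantities discussed above can be reproduced on toy graphs with plain NumPy. This sketch does not use DRIADA's `Network` API: the helper functions are hypothetical and assume an unweighted, connected graph given as a dense adjacency matrix. It contrasts a well-connected graph with two cliques joined by a single edge.

```python
import numpy as np


def normalized_laplacian_spectrum(adj):
    """Eigenvalues (ascending) and eigenvectors of L = I - D^-1/2 A D^-1/2."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigh(lap)


def inverse_participation_ratio(eigvecs):
    """IPR per eigenvector (columns): sum_i v_i^4 for unit-norm v."""
    return np.sum(eigvecs ** 4, axis=0)


n = 20
# Well-connected graph: complete graph on 20 nodes
complete = np.ones((n, n)) - np.eye(n)

# Loosely connected graph: two 10-node cliques joined by a single edge
barbell = np.zeros((n, n))
barbell[:10, :10] = 1.0
barbell[10:, 10:] = 1.0
np.fill_diagonal(barbell, 0.0)
barbell[9, 10] = barbell[10, 9] = 1.0

vals_c, vecs_c = normalized_laplacian_spectrum(complete)
vals_b, vecs_b = normalized_laplacian_spectrum(barbell)

print(f'Fiedler value, complete graph: {vals_c[1]:.4f}')
print(f'Fiedler value, two cliques:    {vals_b[1]:.4f}')

# The eigenvector of the zero eigenvalue of the complete graph is uniform,
# so its IPR equals 1/N (fully delocalized)
ipr_c = inverse_participation_ratio(vecs_c)
print(f'IPR of first eigenvector: {ipr_c[0]:.4f}  (1/N = {1 / n:.4f})')
```

The complete graph has a Fiedler value of N/(N-1), while the two-clique graph has a much smaller one, reflecting the bottleneck edge between its halves.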

## 2. Method comparison

The API handles the mechanics, but which method should you use? The following
benchmark on a Swiss roll manifold illustrates the trade-offs.

DRIADA provides four quality metrics to evaluate DR embeddings:

- [`knn_preservation_rate`](https://driada.readthedocs.io/en/latest/api/dim_reduction/manifold_metrics.html#driada.dim_reduction.manifold_metrics.knn_preservation_rate) -- fraction of original k nearest neighbors preserved.
- [`trustworthiness`](https://driada.readthedocs.io/en/latest/api/dim_reduction/manifold_metrics.html#driada.dim_reduction.manifold_metrics.trustworthiness) -- are embedding neighbors real neighbors?
- [`continuity`](https://driada.readthedocs.io/en/latest/api/dim_reduction/manifold_metrics.html#driada.dim_reduction.manifold_metrics.continuity) -- do true neighbors stay close?
- [`stress`](https://driada.readthedocs.io/en/latest/api/dim_reduction/manifold_metrics.html#driada.dim_reduction.manifold_metrics.stress) -- normalized distance distortion.

We compare PCA, Isomap, and UMAP on a Swiss roll.
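As a rough guide to what a neighborhood-preservation score measures, here is a NumPy-only sketch. `knn_overlap` is a hypothetical helper that follows the one-line description of `knn_preservation_rate` above; DRIADA's implementation may differ in details such as tie handling. Trustworthiness and continuity refine the same idea by weighting neighborhood violations by rank.

```python
import numpy as np


def knn_indices(points, k):
    """Indices of the k nearest neighbors of each row (self excluded)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    return np.argsort(dists, axis=1)[:, :k]


def knn_overlap(original, embedded, k=10):
    """Mean fraction of each point's k original neighbors kept in the embedding.

    Both inputs have shape (n_samples, n_dims). Illustrative definition only.
    """
    nn_orig = knn_indices(original, k)
    nn_emb = knn_indices(embedded, k)
    overlaps = [
        len(set(a) & set(b)) / k for a, b in zip(nn_orig, nn_emb)
    ]
    return float(np.mean(overlaps))


rng = np.random.default_rng(0)
original_demo = rng.normal(size=(200, 3))

# Dropping one coordinate is a crude linear "embedding"
projected_demo = original_demo[:, :2]
# A random embedding carries no neighborhood information at all
random_demo = rng.normal(size=(200, 2))

score_identity = knn_overlap(original_demo, original_demo)
score_projection = knn_overlap(original_demo, projected_demo)
score_random = knn_overlap(original_demo, random_demo)

print(f'identity:   {score_identity:.3f}')
print(f'projection: {score_projection:.3f}')
print(f'random:     {score_random:.3f}')
```

A perfect embedding scores 1.0, and a random one scores close to k/(n-1), which gives a baseline for reading the numbers reported by the benchmark.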

In [None]:
# Compare PCA, Isomap, UMAP on Swiss roll
X_swiss, color_swiss = make_swiss_roll(1000, noise=0.05, random_state=42)

methods_cmp = {
    'PCA': {},
    'Isomap': {'n_neighbors': 10, 'max_deleted_nodes': 0.3},  # 10 neighbors: local geodesics
    'UMAP': {'n_neighbors': 15, 'min_dist': 0.1},  # min_dist: tighter clusters
}
cmp_results = {}

for name, params in methods_cmp.items():
    mvd = MVData(X_swiss.T)
    t0 = time.time()
    emb = mvd.get_embedding(method=name.lower(), **params)
    dt = time.time() - t0
    coords = emb.coords.T

    # Handle lost nodes for graph-based methods
    data_cmp, color_cmp = X_swiss, color_swiss
    if hasattr(emb, 'graph') and hasattr(emb.graph, 'lost_nodes') and len(emb.graph.lost_nodes) > 0:
        mask = np.ones(len(X_swiss), dtype=bool)
        mask[list(emb.graph.lost_nodes)] = False  # lost_nodes is a set; convert before indexing
        data_cmp, color_cmp = X_swiss[mask], color_swiss[mask]

    cmp_results[name] = {
        'coords': coords, 'color': color_cmp, 'time': dt,
        'knn': knn_preservation_rate(data_cmp, coords, k=10),
        'trust': trustworthiness(data_cmp, coords, k=10),
        'cont': continuity(data_cmp, coords, k=10),
        'stress': stress(data_cmp, coords, normalized=True),
    }
    print(f'{name:8s}  k-NN={cmp_results[name]["knn"]:.3f}  '
          f'Trust={cmp_results[name]["trust"]:.3f}  '
          f'Cont={cmp_results[name]["cont"]:.3f}  '
          f'Stress={cmp_results[name]["stress"]:.3f}  '
          f'Time={dt:.2f}s')

In [None]:
# Visualize Swiss roll embeddings
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for ax, (name, res) in zip(axes, cmp_results.items()):
    sc = ax.scatter(
        res['coords'][:, 0], res['coords'][:, 1],
        c=res['color'], cmap='viridis', s=20, alpha=0.7,
    )
    ax.set_title(
        f'{name}\n'
        f'k-NN={res["knn"]:.3f}  Trust={res["trust"]:.3f}'
    )
    ax.set_xlabel('Dim 1')
    ax.set_ylabel('Dim 2')

plt.suptitle('Swiss roll embeddings', fontsize=14)
plt.tight_layout()
plt.show()

## 3. Circular manifold & dimensionality estimation

The Swiss roll is a purely geometric benchmark; neural population data adds
tuning curves, noise, and calcium dynamics. Head direction (HD) cells encode
a circular variable, so the population activity traces out a ring -- a
topology that some DR methods preserve better than others.

We generate 100 synthetic HD cells with
[`generate_circular_manifold_exp`](https://driada.readthedocs.io/en/latest/api/experiment/synthetic.html#driada.experiment.generate_circular_manifold_exp)
and estimate intrinsic dimensionality using the
[`driada.dimensionality`](https://driada.readthedocs.io/en/latest/api/dimensionality/index.html) module.

This module provides several estimators:
- **Linear:** [`pca_dimension`](https://driada.readthedocs.io/en/latest/api/dimensionality/linear.html#driada.dimensionality.linear.pca_dimension) (variance threshold), [`effective_rank`](https://driada.readthedocs.io/en/latest/api/dimensionality/linear.html#driada.dimensionality.linear.effective_rank), [`pca_dimension_profile`](https://driada.readthedocs.io/en/latest/api/dimensionality/linear.html#driada.dimensionality.linear.pca_dimension_profile)
- **Intrinsic:** [`correlation_dimension`](https://driada.readthedocs.io/en/latest/api/dimensionality/intrinsic.html#driada.dimensionality.intrinsic.correlation_dimension), [`geodesic_dimension`](https://driada.readthedocs.io/en/latest/api/dimensionality/intrinsic.html#driada.dimensionality.intrinsic.geodesic_dimension), [`nn_dimension`](https://driada.readthedocs.io/en/latest/api/dimensionality/intrinsic.html#driada.dimensionality.intrinsic.nn_dimension) (TWO-NN)
- **Effective:** [`eff_dim`](https://driada.readthedocs.io/en/latest/api/dimensionality/effective.html#driada.dimensionality.effective.eff_dim) (Rényi entropy of eigenvalues)

Below we use a subset of these, compare real vs shuffled data,
and extract the circular manifold via DR.
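Before running the DRIADA estimators, it can help to see what two of the simpler quantities compute. The sketch below is NumPy-only and does not call `eff_dim` or `pca_dimension`. Both helpers are hypothetical, and the toy ring population (von Mises tuning curves with small Gaussian noise) is an assumption made for illustration, not the output of `generate_circular_manifold_exp`.

```python
import numpy as np


def participation_ratio(data):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues) of the covariance.

    `data` has shape (n_samples, n_features).
    """
    eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negatives
    return eigvals.sum() ** 2 / np.sum(eigvals ** 2)


def n_components_for_variance(data, threshold=0.90):
    """Smallest number of principal components reaching `threshold` variance."""
    eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]
    eigvals = np.clip(eigvals, 0.0, None)
    cumvar = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumvar, threshold) + 1)


rng = np.random.default_rng(0)
n_neurons, n_timepoints = 50, 2000

# Toy ring population: von Mises tuning curves over a random angle sequence
angles = rng.uniform(0.0, 2.0 * np.pi, size=n_timepoints)
preferred = np.linspace(0.0, 2.0 * np.pi, n_neurons, endpoint=False)
rates = np.exp(4.0 * (np.cos(angles[:, None] - preferred[None, :]) - 1.0))
rates += 0.05 * rng.normal(size=rates.shape)   # small observation noise

# Shuffle every neuron independently in time: single-cell statistics are
# kept, but the shared dependence on the angle is destroyed
shuffled = np.column_stack([rng.permutation(col) for col in rates.T])

pr_ring = participation_ratio(rates)
pr_shuffled = participation_ratio(shuffled)

for label, data in [('ring', rates), ('shuffled', shuffled)]:
    print(f'{label:9s} PR = {participation_ratio(data):5.1f}   '
          f'PCA-90% = {n_components_for_variance(data)}')
```

Linear measures such as these stay above 1 even for a clean ring, because a curved 1D manifold occupies several linear dimensions. The intrinsic estimators listed above are designed to reduce that bias.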

In [None]:
def estimate_dimensionality(neural_data, methods=None, ds=1):
    """Estimate intrinsic dimensionality using multiple DRIADA methods."""
    if methods is None:
        methods = [
            'pca_90', 'pca_95', 'participation_ratio',
            'correlation_dim', 'geodesic_dim',
        ]

    dim_estimates = {}

    # Downsample data if requested
    if ds > 1:
        neural_data_ds = neural_data[:, ::ds]
        print(f'  Downsampled: {neural_data.shape} -> {neural_data_ds.shape}')
    else:
        neural_data_ds = neural_data

    # Transpose data for methods that expect (n_samples, n_features)
    data_transposed = neural_data_ds.T

    # Linear methods
    if 'pca_90' in methods:
        dim_estimates['pca_90'] = pca_dimension(data_transposed, threshold=0.90)
    if 'pca_95' in methods:
        dim_estimates['pca_95'] = pca_dimension(data_transposed, threshold=0.95)

    # Nonlinear intrinsic methods
    if 'correlation_dim' in methods:
        try:
            print('  Computing correlation dimension...')
            dim_estimates['correlation_dim'] = correlation_dimension(data_transposed)
        except Exception as e:
            print(f'  Warning: correlation_dimension failed: {e}')
            dim_estimates['correlation_dim'] = np.nan

    if 'geodesic_dim' in methods:
        try:
            print('  Computing geodesic dimension (this may take time)...')
            dim_estimates['geodesic_dim'] = geodesic_dimension(
                data_transposed, k=20, mode='fast', factor=4
            )
        except Exception as e:
            print(f'  Warning: geodesic_dimension failed: {e}')
            dim_estimates['geodesic_dim'] = np.nan

    # Effective dimensionality (participation ratio)
    if 'participation_ratio' in methods:
        dim_estimates['participation_ratio'] = eff_dim(
            data_transposed, enable_correction=False, q=2
        )

    return dim_estimates

In [None]:
print('1. Generating head direction cell population...')

# Generate synthetic head direction cells
exp_circ, info_circ = generate_circular_manifold_exp(
    n_neurons=100,
    duration=600,  # 10 minutes
    kappa=4.0,     # von Mises concentration (higher = narrower tuning)
    seed=42,
    verbose=True,
    return_info=True,
)

# Extract neural activity and true head directions
neural_data_circ = exp_circ.calcium.scdata  # Shape: (n_neurons, n_timepoints)
true_angles = info_circ['head_direction']   # Ground truth angles

print(f'\nGenerated {neural_data_circ.shape[0]} neurons, '
      f'{neural_data_circ.shape[1]} timepoints')
print(f'Neural activity shape: {neural_data_circ.shape}')

In [None]:
# Estimate intrinsic dimensionality
print('\n2. Estimating intrinsic dimensionality of neural population...')
print('-' * 50)

dim_methods = [
    'pca_90', 'pca_95', 'participation_ratio',
    'correlation_dim', 'geodesic_dim',
]

# Use ds=5 downsampling for faster computation
dim_estimates = estimate_dimensionality(
    neural_data_circ, methods=dim_methods, ds=5
)

print('Dimensionality estimates:')
for method, estimate in dim_estimates.items():
    print(f'  {method:20s}: {estimate:.2f}')

print('\nNote: Head direction cells should have intrinsic dimensionality ~ 1')
print('      (circular manifold), but finite sampling may increase estimates')

# Compare with temporally shuffled data to demonstrate manifold structure
print('\n2b. Comparing with temporally shuffled data (destroys manifold)...')
print('-' * 50)

# Get shuffled calcium data from experiment
shuffled_calcium = exp_circ.get_multicell_shuffled_calcium()

# Estimate dimensionality on shuffled data
dim_estimates_shuffled = estimate_dimensionality(
    shuffled_calcium, methods=dim_methods, ds=5
)

print('\nDimensionality estimates (SHUFFLED data):')
for method, estimate in dim_estimates_shuffled.items():
    print(f'  {method:20s}: {estimate:.2f}')

print('\nComparison (Real vs Shuffled):')
print(f'{"Method":<20s} {"Real":>8s} {"Shuffled":>8s} {"Increase":>10s}')
print('-' * 50)
for method in dim_methods:
    real = dim_estimates[method]
    shuffled = dim_estimates_shuffled[method]
    increase = ((shuffled - real) / real) * 100
    print(f'{method:<20s} {real:8.2f} {shuffled:8.2f} {increase:+9.1f}%')

print('\nInterpretation: Temporal shuffling destroys the circular manifold structure,')
print('                dramatically increasing dimensionality.')

In [None]:
# Plot eigenvalue spectrum
print('\n3. Plotting eigenvalue spectrum...')

# Compute correlation matrix
data_centered = neural_data_circ - np.mean(neural_data_circ, axis=1, keepdims=True)
corr_mat = np.corrcoef(data_centered)

# Get eigenvalues
eigenvalues = np.linalg.eigvalsh(corr_mat)[::-1]  # Descending order
eigenvalues = eigenvalues[eigenvalues > 0]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Eigenvalue spectrum
ax1.plot(eigenvalues, 'o-', markersize=4)
ax1.set_xlabel('Component')
ax1.set_ylabel('Eigenvalue')
ax1.set_title('Eigenvalue spectrum')
ax1.set_yscale('log')
ax1.grid(True, alpha=0.3)

# Cumulative variance explained
cumvar = np.cumsum(eigenvalues) / np.sum(eigenvalues)
ax2.plot(cumvar, 'o-', markersize=4)
ax2.axhline(0.9, color='r', linestyle='--', label='90% variance')
ax2.axhline(0.95, color='orange', linestyle='--', label='95% variance')
ax2.set_xlabel('Number of Components')
ax2.set_ylabel('Cumulative Variance Explained')
ax2.set_title('Cumulative variance explained')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Apply dimensionality reduction using MVData
print('\n4. Applying dimensionality reduction methods using MVData...')
print('-' * 50)

# Create MVData object from calcium data with downsampling
downsampling_circ = 10  # use every 10th frame for speed
mvdata_circ = MVData(neural_data_circ, downsampling=downsampling_circ)

# Downsample true angles to match
true_angles_ds = true_angles[::downsampling_circ]

# Dictionary to store embeddings
embeddings_dict_circ = {}

# PCA
print('- PCA...')
pca_embedding_circ = mvdata_circ.get_embedding(method='pca', dim=2)
embeddings_dict_circ['PCA'] = pca_embedding_circ.coords.T
print(
    f'  First 2 PCs explain '
    f'{100 * sum(pca_embedding_circ.reducer_.explained_variance_ratio_):.1f}% of variance'
)

# Isomap
print('- Isomap...')
isomap_embedding_circ = mvdata_circ.get_embedding(
    method='isomap', dim=2, n_neighbors=50
)
embeddings_dict_circ['Isomap'] = isomap_embedding_circ.coords.T

# UMAP with larger n_neighbors and min_dist to favor global structure
print('- UMAP...')
umap_embedding_circ = mvdata_circ.get_embedding(
    method='umap', dim=2, n_neighbors=100, min_dist=0.5
)
embeddings_dict_circ['UMAP'] = umap_embedding_circ.coords.T

In [None]:
# Visualize extracted manifolds
print('\n5. Visualizing extracted manifolds...')

# Create embedding comparison visualization
embeddings_list_circ = [
    embeddings_dict_circ[m] for m in ['PCA', 'Isomap', 'UMAP']
]
fig_embedding = visualize_circular_manifold(
    embeddings_list_circ, true_angles_ds, ['PCA', 'Isomap', 'UMAP']
)
plt.show()

# Trajectory visualization
print('\n6. Analyzing temporal continuity of extracted manifolds...')

# Use only first 1000 timepoints for trajectory visualization
traj_len = min(1000, embeddings_dict_circ['PCA'].shape[0])
trajectories_dict = {
    method: emb[:traj_len] for method, emb in embeddings_dict_circ.items()
}

fig3 = plot_trajectories(
    embeddings=trajectories_dict,
    trajectory_kwargs={'arrow_spacing': 50, 'linewidth': 0.5, 'alpha': 0.5},
    figsize=(15, 5),
    dpi=DEFAULT_DPI,
)
plt.show()

In [None]:
# Summary statistics
print('\n7. Summary of manifold extraction quality:')
print('-' * 60)
print(f'{"Method":10s} | {"Correlation":12s} | {"Mean Error":12s} | {"Quality":8s}')
print('-' * 60)

for method, embedding in embeddings_dict_circ.items():
    # Use manifold metrics API
    alignment_metrics = compute_embedding_alignment_metrics(
        embedding, true_angles_ds, 'circular'
    )
    r = alignment_metrics['correlation']
    error = alignment_metrics['error']

    # Quality assessment
    if abs(r) > 0.95:
        quality_str = 'Excellent'
    elif abs(r) > 0.85:
        quality_str = 'Good'
    elif abs(r) > 0.70:
        quality_str = 'Fair'
    else:
        quality_str = 'Poor'

    print(f'{method:10s} | {r:12.3f} | {error:9.3f} rad | {quality_str:8s}')

### Conclusions

- Head direction cells encode a 1D ring, but finite sampling, noise, and
  calcium dynamics inflate dimensionality estimates above the true value.
- Temporal shuffling destroys the manifold structure and raises the
  dimensionality estimates, which indicates that the low-dimensional
  structure in the unshuffled data is not a sampling artifact.
- Nonlinear methods (Isomap, UMAP) tend to preserve circular topology
  better than PCA.
- PCA captures variance but may distort circular structure.
- Higher `n_neighbors` helps preserve global structure.

## 4. Autoencoder-based DR

The methods above rely on a linear projection (PCA) or on fixed
neighborhood graphs (Isomap, UMAP). How do neural network methods handle
the same circular manifold data? Autoencoders learn a flexible nonlinear
mapping, at the cost of more hyperparameters and training time.

Neural network DR via [`flexible_ae`](https://driada.readthedocs.io/en/latest/api/dim_reduction/neural_methods.html): **standard autoencoder** (AE) with
`continue_learning`, and **Beta-VAE** with KL divergence, compared against PCA.
Key parameters: `architecture` (`'ae'` or `'vae'`) selects the model type,
and `continue_learning` resumes training without resetting weights.
Requires PyTorch.
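For orientation, the objective that a Beta-VAE optimizes can be written in a few lines of NumPy (no PyTorch needed for this sketch). `beta_vae_loss` is a hypothetical helper using the standard closed-form KL term for a diagonal Gaussian posterior. How DRIADA reduces and weights its `loss_components` internally is not shown here and may differ.

```python
import numpy as np


def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Reconstruction MSE plus beta-weighted KL(q(z|x) || N(0, I)).

    x, x_hat: (batch, n_features); mu, logvar: (batch, latent_dim).
    """
    recon = np.mean((x - x_hat) ** 2)
    # Closed-form KL for a diagonal Gaussian posterior, averaged over the batch
    kl_per_sample = -0.5 * np.sum(
        1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1
    )
    kl = np.mean(kl_per_sample)
    return recon + beta * kl, recon, kl


rng = np.random.default_rng(0)
x_demo = rng.normal(size=(8, 100))
x_hat_demo = x_demo + 0.1 * rng.normal(size=x_demo.shape)

# Posterior equal to the prior: the KL term vanishes
mu0, logvar0 = np.zeros((8, 2)), np.zeros((8, 2))
total0, recon0, kl0 = beta_vae_loss(x_demo, x_hat_demo, mu0, logvar0)

# Posterior mean shifted away from the prior: the KL term grows, scaled by beta
mu1 = np.ones((8, 2))
total1, recon1, kl1 = beta_vae_loss(x_demo, x_hat_demo, mu1, logvar0)

print(f'prior-matched: recon={recon0:.4f}  KL={kl0:.4f}  total={total0:.4f}')
print(f'shifted mean:  recon={recon1:.4f}  KL={kl1:.4f}  total={total1:.4f}')
```

With `beta > 1` the KL term is penalized more heavily than in a standard VAE, trading reconstruction accuracy for a latent space that stays closer to the prior.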

In [None]:
try:
    import torch  # noqa: F401
    HAS_TORCH = True
    print('PyTorch available -- autoencoder examples will run.')
except ImportError:
    HAS_TORCH = False
    print(
        'PyTorch not found. Install with: pip install torch\n'
        'Autoencoder cells will be skipped.'
    )

In [None]:
if HAS_TORCH:
    print('[1] Reusing head direction cell data from Section 3')
    print('-' * 40)
    calcium_ae = neural_data_circ  # (n_neurons, n_timepoints) from Section 3
    head_direction_ae = true_angles  # from Section 3
    print(f'  Calcium shape: {calcium_ae.shape}')
    print(f'  Head direction shape: {head_direction_ae.shape}')

    mvdata_ae = MVData(calcium_ae, downsampling=5, verbose=False)
    color_ae = head_direction_ae[::5]

In [None]:
if HAS_TORCH:
    print('\n[2] Standard autoencoder')
    print('-' * 40)

    # Train for 5 epochs (not fully converged)
    emb_ae = mvdata_ae.get_embedding(
        method='flexible_ae',
        dim=2,
        architecture='ae',
        inter_dim=64,  # hidden (intermediate) layer width; the bottleneck is `dim`
        epochs=5,  # under-trained for demo
        lr=1e-3,
        feature_dropout=0.1,
        loss_components=[{'name': 'reconstruction', 'weight': 1.0, 'loss_type': 'mse'}],
        verbose=False,
    )
    print(f'  After 5 epochs   - loss: {emb_ae.nn_loss:.4f}')

    # Continue training for 25 more epochs
    emb_ae.continue_learning(25, lr=1e-3, verbose=False)
    print(f'  After 25 more    - loss: {emb_ae.nn_loss:.4f}')

    # Fine-tune with lower learning rate
    emb_ae.continue_learning(20, lr=1e-4, verbose=False)
    print(f'  After 20 fine-tune - loss: {emb_ae.nn_loss:.4f}')

    print('\n[3] Beta-VAE (beta=4.0)')
    print('-' * 40)
    emb_vae = mvdata_ae.get_embedding(
        method='flexible_ae',
        dim=2,
        architecture='vae',
        inter_dim=64,
        epochs=150,
        lr=1e-3,
        feature_dropout=0.1,
        loss_components=[
            {'name': 'reconstruction', 'weight': 1.0, 'loss_type': 'mse'},
            {'name': 'beta_vae', 'weight': 1.0, 'beta': 4.0},  # beta > 1: encourages disentanglement
        ],
        verbose=False,
    )
    print(f'  Embedding shape: {emb_vae.coords.shape}')
    print(f'  Test loss (reconstruction + KL): {emb_vae.nn_loss:.4f}')

    print('\n[4] PCA (for comparison)')
    print('-' * 40)
    emb_pca_ae = mvdata_ae.get_embedding(method='pca', dim=2)
    print(f'  Embedding shape: {emb_pca_ae.coords.shape}')

In [None]:
if HAS_TORCH:
    print('[5] Creating comparison plot')
    print('-' * 40)

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    embeddings_ae = [
        (emb_pca_ae, 'PCA'),
        (emb_ae, 'AE (5 + 25 + 20 epochs)'),
        (emb_vae, 'Beta-VAE'),
    ]

    for ax, (emb, title) in zip(axes, embeddings_ae):
        coords = emb.coords  # (dim, n_samples)
        sc = ax.scatter(
            coords[0], coords[1], c=color_ae, cmap='hsv',
            s=2, alpha=0.5, vmin=0, vmax=2 * np.pi
        )
        ax.set_title(title)
        ax.set_xlabel('Dim 1')
        ax.set_ylabel('Dim 2')

    fig.colorbar(sc, ax=axes[-1], label='Head direction (rad)')
    plt.suptitle('Circular manifold recovery (colored by head direction)')
    plt.tight_layout()
    plt.show()

In [None]:
if HAS_TORCH:
    print('[6] Alignment metrics for autoencoder methods')
    print('-' * 60)
    print(f'{"Method":20s} | {"Correlation":12s} | {"Mean Error":12s} | {"Quality":8s}')
    print('-' * 60)

    ae_embeddings = {
        'PCA': emb_pca_ae.coords.T,
        'AE': emb_ae.coords.T,
        'Beta-VAE': emb_vae.coords.T,
    }

    for method, embedding in ae_embeddings.items():
        alignment_metrics = compute_embedding_alignment_metrics(
            embedding, color_ae, 'circular'
        )
        r = alignment_metrics['correlation']
        error = alignment_metrics['error']

        if abs(r) > 0.95:
            quality_str = 'Excellent'
        elif abs(r) > 0.85:
            quality_str = 'Good'
        elif abs(r) > 0.70:
            quality_str = 'Fair'
        else:
            quality_str = 'Poor'

        print(f'{method:20s} | {r:12.3f} | {error:9.3f} rad | {quality_str:8s}')

## Further reading

Standalone examples (run directly, no external data needed):
- [compare_dr_methods](https://github.com/iabs-neuro/driada/tree/main/examples/compare_dr_methods) -- DR method comparison with quality metrics
- [autoencoder_dr](https://github.com/iabs-neuro/driada/tree/main/examples/autoencoder_dr) -- AE and Beta-VAE on circular manifold data
- [circular_manifold](https://github.com/iabs-neuro/driada/tree/main/examples/circular_manifold) -- Dimensionality estimation and manifold extraction
- [intense_dr_pipeline](https://github.com/iabs-neuro/driada/tree/main/examples/intense_dr_pipeline) -- INTENSE-guided neuron selection for DR

[All examples](https://github.com/iabs-neuro/driada/tree/main/examples)