# Working with Real Neural Data

This notebook demonstrates how to use INTENSE with your own neural recordings. We'll cover:
- Data preparation and formatting
- Creating Experiment objects from various data sources
- Handling common data issues
- Advanced analysis workflows
- Integration with other analysis pipelines

In [None]:
# Setup
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(os.getcwd()), 'src'))

import driada
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import signal
import h5py

print("Setup complete! ✓")

## Part 1: Data Formats and Requirements

INTENSE requires:
1. **Neural signals**: Calcium traces or spike trains
2. **Behavioral variables**: Time-aligned features
3. **Metadata**: Sampling rate, calcium dynamics
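
Before building anything, it helps to confirm that the inputs actually satisfy these requirements. The sketch below is one way to do that; `check_inputs` is a hypothetical helper written for this notebook (it is not part of INTENSE), and it assumes calcium is shaped `(n_neurons, n_timepoints)` with one behavioral sample per frame.

```python
import numpy as np

def check_inputs(calcium, features, fps):
    """Return a list of problems found in the inputs (empty list if none).

    Hypothetical helper for illustration; not part of INTENSE.
    """
    problems = []
    calcium = np.asarray(calcium)
    if calcium.ndim != 2:
        return [f"calcium must be 2D (n_neurons, n_timepoints), got shape {calcium.shape}"]
    n_timepoints = calcium.shape[1]
    if not np.isfinite(calcium).all():
        problems.append("calcium contains NaN or infinite values")
    if fps <= 0:
        problems.append(f"fps must be positive, got {fps}")
    for name, values in features.items():
        values = np.asarray(values)
        # Every behavioral variable needs exactly one sample per imaging frame
        if values.shape[0] != n_timepoints:
            problems.append(f"{name}: {values.shape[0]} samples, expected {n_timepoints}")
        elif np.issubdtype(values.dtype, np.floating) and np.isnan(values).any():
            problems.append(f"{name}: contains NaN values")
    return problems

# Toy data: 'trial_type' is deliberately too short
demo_rng = np.random.default_rng(0)
demo_calcium = demo_rng.normal(size=(5, 200))
demo_features = {
    "speed": demo_rng.random(200),
    "trial_type": np.zeros(150, dtype=int),
}
demo_problems = check_inputs(demo_calcium, demo_features, fps=20.0)
for problem in demo_problems:
    print(problem)
```

An empty list means the arrays are at least shape-compatible; it says nothing about whether the clocks of the two recordings are truly synchronized.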

In [None]:
# Example: Simulating realistic data structure
# In practice, you would load this from your files

# Simulate realistic calcium imaging data
n_neurons = 100
n_timepoints = 36000  # 30 minutes at 20 Hz
fps = 20.0

# Generate base calcium traces with realistic properties
time = np.arange(n_timepoints) / fps
calcium_traces = np.zeros((n_neurons, n_timepoints))

# Add realistic calcium dynamics
for i in range(n_neurons):
    # Baseline fluorescence
    baseline = 100 + np.random.randn() * 10
    
    # Add calcium transients
    n_events = np.random.poisson(0.1 * n_timepoints / fps)  # ~0.1 Hz firing rate
    event_times = np.random.choice(n_timepoints - 100, n_events, replace=False)
    
    trace = np.ones(n_timepoints) * baseline
    for event_time in event_times:
        # Calcium transient with rise and decay
        t_event = np.arange(100)
        rise_time = 0.5 * fps  # 0.5 seconds rise
        decay_time = 2.0 * fps  # 2 seconds decay
        transient = 20 * (1 - np.exp(-t_event/rise_time)) * np.exp(-t_event/decay_time)
        trace[event_time:event_time+100] += transient
    
    # Add noise
    trace += np.random.randn(n_timepoints) * 2
    calcium_traces[i] = trace

print(f"Generated calcium traces: {calcium_traces.shape}")
print(f"Duration: {n_timepoints/fps/60:.1f} minutes at {fps} Hz")

In [None]:
# Generate realistic behavioral data

# Continuous variables: Animal position in a 2D arena
# Simulate exploration with realistic movement patterns
velocity = 5.0  # cm/s average
dt = 1/fps

# Random walk with momentum
x_pos = np.zeros(n_timepoints)
y_pos = np.zeros(n_timepoints)
vx, vy = 0, 0

for t in range(1, n_timepoints):
    # Update velocity with small random changes
    vx += (np.random.randn() * 2 - 0.1 * vx) * dt
    vy += (np.random.randn() * 2 - 0.1 * vy) * dt
    
    # Limit velocity
    speed = np.sqrt(vx**2 + vy**2)
    if speed > velocity * 2:
        vx *= velocity * 2 / speed
        vy *= velocity * 2 / speed
    
    # Update position
    x_pos[t] = x_pos[t-1] + vx * dt
    y_pos[t] = y_pos[t-1] + vy * dt
    
    # Boundary conditions (50x50 cm arena)
    if abs(x_pos[t]) > 25:
        vx *= -1
        x_pos[t] = np.clip(x_pos[t], -25, 25)
    if abs(y_pos[t]) > 25:
        vy *= -1
        y_pos[t] = np.clip(y_pos[t], -25, 25)

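# Calculate derived features
speed = np.sqrt(np.gradient(x_pos)**2 + np.gradient(y_pos)**2) * fps  # cm/frame -> cm/s
# No separate head-angle signal is simulated, so movement direction stands in for head direction
head_direction = np.arctan2(np.gradient(y_pos), np.gradient(x_pos))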

# Discrete variables: Trial types and rewards
# Simulate task structure with trials
trial_duration = int(30 * fps)  # 30 second trials
n_trials = n_timepoints // trial_duration

trial_type = np.zeros(n_timepoints, dtype=int)
reward = np.zeros(n_timepoints, dtype=int)

for trial in range(n_trials):
    start = trial * trial_duration
    end = (trial + 1) * trial_duration
    
    # Randomly assign trial type (0: left, 1: right)
    ttype = np.random.choice([0, 1])
    trial_type[start:end] = ttype
    
    # Reward at end of trial (80% correct)
    if np.random.rand() < 0.8:
        reward[end-int(2*fps):end] = 1  # 2 second reward period

print(f"Generated behavioral variables:")
print(f"  - Position (x, y): continuous, {x_pos.shape[0]} samples")
print(f"  - Speed: continuous, range {speed.min():.1f} to {speed.max():.1f} cm/s")
print(f"  - Head direction: continuous, -π to π")
print(f"  - Trial type: discrete, {len(np.unique(trial_type))} types")
print(f"  - Reward: discrete, {reward.sum()/fps:.1f} seconds total")

## Part 2: Creating an Experiment Object

Now let's create a proper Experiment object with this data:

In [None]:
# Method 1: Basic Experiment creation
exp = driada.Experiment(
    signature='RealDataExample',
    calcium=calcium_traces,
    spikes=None,  # Will be inferred from calcium
    exp_identificators={
        'animal_id': 'mouse_001',
        'session': 'session_001',
        'date': '2024-01-15',
        'experiment_type': 'navigation_task'
    },
    static_features={
        'fps': fps,
        't_rise_sec': 0.5,    # GCaMP6f rise time
        't_off_sec': 2.0,     # GCaMP6f decay time
        'imaging_depth': 150,  # microns
        'objective': '16x',
        'indicator': 'GCaMP6f'
    },
    dynamic_features={
        'x_position': x_pos,
        'y_position': y_pos,
        'speed': speed,
        'head_direction': head_direction,
        'trial_type': trial_type,
        'reward': reward
    }
)

print(f"Created Experiment:")
print(f"  - Neurons: {exp.n_cells}")
print(f"  - Duration: {exp.n_frames / exp.fps / 60:.1f} minutes")
print(f"  - Features: {list(exp.dynamic_features.keys())}")
print(f"  - Memory usage: ~{exp.calcium.data.nbytes / 1e6:.1f} MB")

## Part 3: Data Quality Checks

Before running INTENSE, let's check data quality:

In [None]:
# Check 1: Visualize calcium traces
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)

# Plot a few example neurons
time_slice = slice(0, int(60 * fps))  # First minute
time_axis = np.arange(len(calcium_traces[0, time_slice])) / fps

for i, ax in enumerate(axes):
    neuron_idx = i * 10  # Sample neurons
    trace = calcium_traces[neuron_idx, time_slice]
    
    ax.plot(time_axis, trace, 'k-', linewidth=0.5)
    ax.set_ylabel(f'Neuron {neuron_idx}\nF (a.u.)')  # Raw fluorescence (baseline ~100), not ΔF
    ax.grid(True, alpha=0.3)
    
    # Mark detected spikes if available
    if hasattr(exp.neurons[neuron_idx], 'sp') and exp.neurons[neuron_idx].sp is not None:
        spike_times = np.where(exp.neurons[neuron_idx].sp.data[time_slice] > 0)[0] / fps
        ax.scatter(spike_times, 
                  trace[exp.neurons[neuron_idx].sp.data[time_slice] > 0], 
                  color='red', s=20, zorder=5)

axes[-1].set_xlabel('Time (seconds)')
axes[0].set_title('Example Calcium Traces')
plt.tight_layout()
plt.show()

In [None]:
# Check 2: Behavioral data coverage
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Spatial coverage
axes[0, 0].plot(x_pos[::10], y_pos[::10], 'k-', alpha=0.3, linewidth=0.5)
axes[0, 0].scatter(x_pos[::100], y_pos[::100], c=time[::100], 
                   cmap='viridis', s=2, alpha=0.5)
axes[0, 0].set_xlabel('X position (cm)')
axes[0, 0].set_ylabel('Y position (cm)')
axes[0, 0].set_title('Spatial Coverage')
axes[0, 0].set_aspect('equal')

# Speed distribution
axes[0, 1].hist(speed, bins=50, alpha=0.7, edgecolor='black')
axes[0, 1].set_xlabel('Speed (cm/s)')
axes[0, 1].set_ylabel('Count')
axes[0, 1].set_title('Speed Distribution')
axes[0, 1].axvline(speed.mean(), color='red', linestyle='--', 
                   label=f'Mean: {speed.mean():.1f} cm/s')
axes[0, 1].legend()

# Head direction coverage
axes[1, 0].hist(head_direction, bins=36, alpha=0.7, edgecolor='black')
axes[1, 0].set_xlabel('Head Direction (radians)')
axes[1, 0].set_ylabel('Count')
axes[1, 0].set_title('Head Direction Distribution')
axes[1, 0].set_xlim(-np.pi, np.pi)

# Trial structure
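# Start frame of every trial. (np.diff(trial_type) would miss boundaries between
# consecutive trials of the same type.)
trial_starts = np.arange(n_trials) * trial_duration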
reward_times = np.where(reward > 0)[0] / fps / 60  # Convert to minutes

axes[1, 1].plot(time / 60, trial_type, 'k-', linewidth=1)
axes[1, 1].scatter(reward_times, np.ones_like(reward_times) * 0.5, 
                   color='green', s=20, marker='o', label='Rewards')
axes[1, 1].set_xlabel('Time (minutes)')
axes[1, 1].set_ylabel('Trial Type')
axes[1, 1].set_title('Task Structure')
axes[1, 1].set_ylim(-0.5, 1.5)
axes[1, 1].legend()

plt.tight_layout()
plt.show()

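print("Data Quality Summary:")
# Fraction of 1 cm x 1 cm bins visited at least once
occupancy_1cm, _, _ = np.histogram2d(x_pos, y_pos, bins=50, range=[[-25, 25], [-25, 25]])
print(f"  - Spatial coverage: {np.mean(occupancy_1cm > 0) * 100:.1f}% of arena")
print(f"  - Mean firing rate (first 10 neurons): {np.mean([np.sum(exp.neurons[i].sp.data > 0) / (n_timepoints/fps) for i in range(10)]):.2f} Hz")
# Each rewarded trial contributes exactly one reward onset (0 -> 1 transition)
n_rewarded_trials = int(np.sum(np.diff(reward, prepend=0) == 1))
print(f"  - Number of trials: {n_trials}")
print(f"  - Reward rate: {n_rewarded_trials / n_trials * 100:.1f}%")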
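
Real recordings usually contain some silent or very noisy cells, so a per-neuron quality score is a useful addition to the checks above. The sketch below estimates noise from the median absolute deviation of frame-to-frame differences and compares it with the 99th-percentile amplitude. `robust_snr` is a hypothetical helper (not an INTENSE function), and any cutoff you apply to its output is a judgment call for your own data.

```python
import numpy as np

def robust_snr(traces):
    """Estimate per-neuron SNR and noise level for traces shaped (n_neurons, n_frames).

    Hypothetical helper for illustration; not part of INTENSE.
    """
    traces = np.asarray(traces, dtype=float)
    diffs = np.diff(traces, axis=1)
    # MAD of the differences is barely affected by sparse transients
    mad = np.median(np.abs(diffs - np.median(diffs, axis=1, keepdims=True)), axis=1)
    # 1.4826 converts MAD to a Gaussian std; differencing inflates std by sqrt(2)
    noise = 1.4826 * mad / np.sqrt(2)
    baseline = np.median(traces, axis=1)
    peak = np.percentile(traces, 99, axis=1) - baseline
    with np.errstate(divide="ignore", invalid="ignore"):
        snr = np.where(noise > 0, peak / noise, 0.0)  # flat traces get SNR 0
    return snr, noise

# Toy data: an active cell, a noise-only cell, and a flat ("dead") cell
snr_rng = np.random.default_rng(1)
n_demo_frames = 2000
active = snr_rng.normal(0, 1, n_demo_frames)
for onset in (500, 1200):
    active[onset:onset + 40] += 20 * np.exp(-np.arange(40) / 10)
noise_only = snr_rng.normal(0, 1, n_demo_frames)
dead = np.full(n_demo_frames, 100.0)

demo_snr, demo_noise = robust_snr(np.vstack([active, noise_only, dead]))
print("SNR per neuron:", np.round(demo_snr, 2))
print("Noise estimate per neuron:", np.round(demo_noise, 2))
```

Cells with an SNR of zero or close to the noise-only value are candidates for exclusion before running the analysis.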

## Part 4: Running INTENSE with Real Data Considerations

When working with real data, consider:
1. Appropriate shuffle numbers
2. Feature selection
3. Computational resources
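
To see why the shuffle count matters, consider a generic circular-shift permutation test. This is a sketch only: it uses absolute Pearson correlation as a stand-in statistic rather than mutual information, `circular_shift_pvalue` is a hypothetical name, and it should not be read as INTENSE's actual implementation.

```python
import numpy as np

def circular_shift_pvalue(x, y, n_shuffles=1000, min_shift=1, seed=0):
    """Empirical p-value for |corr(x, y)| against circularly shifted copies of x.

    Hypothetical helper for illustration; not part of INTENSE.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(np.corrcoef(x, y)[0, 1])
    # Shifts preserve each signal's autocorrelation but break their alignment
    shifts = rng.integers(min_shift, len(x) - min_shift + 1, size=n_shuffles)
    null = np.array([abs(np.corrcoef(np.roll(x, s), y)[0, 1]) for s in shifts])
    # The +1 terms keep the estimate away from exactly zero
    p_value = (1 + np.sum(null >= observed)) / (n_shuffles + 1)
    return observed, p_value

# Toy data: a smooth "behavior", a related trace, and an unrelated smooth trace
shuffle_rng = np.random.default_rng(42)
kernel = np.ones(20) / 20
behavior = np.convolve(shuffle_rng.normal(size=5000), kernel, mode="same")
related = behavior + 0.5 * behavior.std() * shuffle_rng.normal(size=5000)
unrelated = np.convolve(shuffle_rng.normal(size=5000), kernel, mode="same")

n_demo_shuffles = 200
r_rel, p_rel = circular_shift_pvalue(related, behavior, n_demo_shuffles, min_shift=100)
r_unrel, p_unrel = circular_shift_pvalue(unrelated, behavior, n_demo_shuffles, min_shift=100)
print(f"related:   |r| = {r_rel:.2f}, p = {p_rel:.4f}")
print(f"unrelated: |r| = {r_unrel:.2f}, p = {p_unrel:.4f}")
print(f"smallest attainable p with {n_demo_shuffles} shuffles: {1 / (n_demo_shuffles + 1):.4f}")
```

With a purely empirical p-value, the smallest value you can ever observe is `1 / (n_shuffles + 1)`, so a strict threshold after multiple-comparison correction needs correspondingly more shuffles.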

In [None]:
# Select features for analysis
# For spatial analysis
spatial_features = ['x_position', 'y_position', 'speed', 'head_direction']

# For task analysis  
task_features = ['trial_type', 'reward']

# Run INTENSE on spatial features
print("Analyzing spatial selectivity...")
stats_spatial, sig_spatial, info_spatial, results_spatial = driada.compute_cell_feat_significance(
    exp,
    cell_bunch=None,  # Analyze all neurons
    feat_bunch=spatial_features,
    metric='mi',
    mode='two_stage',
    n_shuffles_stage1=100,
    n_shuffles_stage2=2000,  # Increase for publication
    pval_thr=0.01,
    find_optimal_delays=True,
    shift_window=2,  # ±2 seconds for calcium dynamics
    parallelize=True,  # Use multiple cores
    n_jobs=-1,  # Use all available cores
    verbose=True
)

print("\nAnalyzing task selectivity...")
stats_task, sig_task, info_task, results_task = driada.compute_cell_feat_significance(
    exp,
    feat_bunch=task_features,
    metric='mi',
    mode='two_stage',
    n_shuffles_stage1=100,
    n_shuffles_stage2=2000,
    pval_thr=0.01,
    find_optimal_delays=True,
    shift_window=5,  # Longer window for task events
    verbose=True
)

## Part 5: Interpreting Results in Context

Let's analyze results with domain knowledge:

In [None]:
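# Categorize neurons by selectivity type.
# Note: despite its name, `spatial_neurons` is treated below as a mapping from each
# significant neuron to all of its significant features (spatial and task alike).
spatial_neurons = exp.get_significant_neurons()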

# Identify different cell types
place_cells = []  # Selective to x AND y
speed_cells = []  # Selective to speed
head_direction_cells = []  # Selective to head direction
task_cells = []  # Selective to trial type or reward
mixed_cells = []  # Selective to multiple categories

for neuron_id, features in spatial_neurons.items():
    spatial_sel = [f for f in features if f in spatial_features]
    task_sel = [f for f in features if f in task_features]
    
    if 'x_position' in features and 'y_position' in features:
        place_cells.append(neuron_id)
    if 'speed' in features:
        speed_cells.append(neuron_id)
    if 'head_direction' in features:
        head_direction_cells.append(neuron_id)
    if len(task_sel) > 0:
        task_cells.append(neuron_id)
    if len(spatial_sel) > 0 and len(task_sel) > 0:
        mixed_cells.append(neuron_id)

print("Cell Type Classification:")
print(f"  - Place cells: {len(place_cells)} ({len(place_cells)/exp.n_cells*100:.1f}%)")
print(f"  - Speed cells: {len(speed_cells)} ({len(speed_cells)/exp.n_cells*100:.1f}%)")
print(f"  - Head direction cells: {len(head_direction_cells)} ({len(head_direction_cells)/exp.n_cells*100:.1f}%)")
print(f"  - Task-modulated cells: {len(task_cells)} ({len(task_cells)/exp.n_cells*100:.1f}%)")
print(f"  - Mixed selectivity: {len(mixed_cells)} ({len(mixed_cells)/exp.n_cells*100:.1f}%)")

In [None]:
# Visualize place cell example
if place_cells:
    # Pick a place cell
    pc_id = place_cells[0]
    
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # Trajectory with neural activity
    neural_activity = exp.neurons[pc_id].ca.data
    scatter = axes[0].scatter(x_pos[::10], y_pos[::10], 
                             c=neural_activity[::10], 
                             cmap='hot', s=2, alpha=0.7)
    axes[0].set_xlabel('X position (cm)')
    axes[0].set_ylabel('Y position (cm)')
    axes[0].set_title(f'Place Cell {pc_id} Activity')
    plt.colorbar(scatter, ax=axes[0], label='Activity')
    
    # Place field (2D histogram)
    occupancy, xedges, yedges = np.histogram2d(x_pos, y_pos, bins=20)
    activity_map, _, _ = np.histogram2d(x_pos, y_pos, bins=20, 
                                       weights=neural_activity)
    
    # Normalize by occupancy
    with np.errstate(divide='ignore', invalid='ignore'):
        place_field = activity_map / occupancy
        place_field[occupancy < 10] = np.nan  # Mask bins visited for fewer than 10 frames
    
    im = axes[1].imshow(place_field.T, origin='lower', 
                       extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
                       cmap='jet', interpolation='gaussian')
    axes[1].set_xlabel('X position (cm)')
    axes[1].set_ylabel('Y position (cm)')
    axes[1].set_title('Place Field')
    plt.colorbar(im, ax=axes[1], label='Mean calcium activity')  # Calcium signal, not a firing rate
    
    # MI values for this neuron
    mi_values = []
    features_list = ['x_position', 'y_position', 'speed', 'head_direction']
    for feat in features_list:
        if feat in spatial_neurons[pc_id]:
            stats = exp.get_neuron_feature_pair_stats(pc_id, feat)
            mi_values.append(stats.get('me', 0))
        else:
            mi_values.append(0)
    
    axes[2].bar(range(len(features_list)), mi_values, 
               color=['green' if mi > 0 else 'gray' for mi in mi_values])
    axes[2].set_xticks(range(len(features_list)))
    axes[2].set_xticklabels(features_list, rotation=45, ha='right')
    axes[2].set_ylabel('Mutual Information')
    axes[2].set_title('Feature Selectivity')
    
    plt.tight_layout()
    plt.show()

## Part 6: Advanced Workflows

### Mixed Selectivity Analysis

When neurons respond to multiple correlated features:

In [None]:
# Check for neurons with mixed spatial/task selectivity
if mixed_cells:
    print(f"Analyzing mixed selectivity for {len(mixed_cells)} neurons...")
    
    # Example: Disentangle position vs reward selectivity
    mixed_id = mixed_cells[0]
    
    # Get the stats for each feature
    features_to_analyze = []
    mi_values = {}
    
    for feat in spatial_neurons[mixed_id]:
        stats = exp.get_neuron_feature_pair_stats(mixed_id, feat)
        mi_values[feat] = stats.get('me', 0)
        features_to_analyze.append(feat)
    
    print(f"\nNeuron {mixed_id} selectivity:")
    for feat, mi in sorted(mi_values.items(), key=lambda x: x[1], reverse=True):
        print(f"  - {feat}: MI = {mi:.3f}")
    
    # You can use disentanglement analysis here
    # See examples/mixed_selectivity.py for full implementation
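
Before interpreting mixed selectivity, it is worth checking which behavioral features are correlated with each other in the first place. The sketch below lists strongly correlated feature pairs using Pearson correlation; `correlated_feature_pairs` and its threshold are assumptions made for this example, and this is a quick screen rather than the disentanglement analysis itself.

```python
import numpy as np

def correlated_feature_pairs(features, threshold=0.3):
    """List feature pairs whose absolute Pearson correlation is at least `threshold`.

    Hypothetical helper for illustration; not part of INTENSE.
    """
    names = list(features)
    data = np.vstack([np.asarray(features[name], dtype=float) for name in names])
    corr = np.corrcoef(data)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    # Strongest relationships first
    return sorted(pairs, key=lambda pair: -abs(pair[2]))

# Toy data: reward zone is defined by x position, speed is independent
pair_rng = np.random.default_rng(3)
demo_x = pair_rng.uniform(-25, 25, size=5000)
demo_behavior = {
    "x_position": demo_x,
    "in_reward_zone": (demo_x > 10).astype(int),
    "speed": pair_rng.exponential(5.0, size=5000),
}
demo_pairs = correlated_feature_pairs(demo_behavior)
for name_a, name_b, r in demo_pairs:
    print(f"{name_a} vs {name_b}: r = {r:+.2f}")
```

Pearson correlation only captures linear relationships and is not meaningful for circular variables such as head direction, so treat an empty result as "no linear dependence found" rather than proof of independence.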

## Part 7: Saving and Loading Results

For large datasets, save intermediate results:

In [None]:
# Save results to HDF5 format
def save_intense_results(filename, exp, stats, significance, info, results):
    """Save INTENSE results to HDF5 file."""
    with h5py.File(filename, 'w') as f:
        # Save metadata
        meta = f.create_group('metadata')
        meta.attrs['signature'] = exp.signature
        meta.attrs['n_cells'] = exp.n_cells
        meta.attrs['n_frames'] = exp.n_frames
        meta.attrs['fps'] = exp.fps
        
        # Save significance matrix
        f.create_dataset('significance', data=significance)
        
        # Save feature names
        feat_names = list(exp.dynamic_features.keys())
        f.create_dataset('feature_names', 
                        data=[n.encode() for n in feat_names])
        
        # Save detailed results
        if results:
            results_grp = f.create_group('results')
            for i, res in enumerate(results):
                res_grp = results_grp.create_group(f'pair_{i}')
                for key, value in res.items():
                    if value is not None:
                        try:
                            res_grp.attrs[key] = value
                        except (TypeError, ValueError):
                            pass  # Skip values h5py cannot store as attributes
    
    print(f"Results saved to {filename}")

# Example usage (uncomment to save)
# save_intense_results('intense_results.h5', exp, stats_spatial, 
#                     sig_spatial, info_spatial, results_spatial)

In [None]:
# Export summary for further analysis
def export_summary_table(exp):
    """Export a summary table of all significant relationships."""
    sig_neurons = exp.get_significant_neurons()
    
    summary_data = []
    for neuron_id, features in sig_neurons.items():
        for feat in features:
            stats = exp.get_neuron_feature_pair_stats(neuron_id, feat)
            
            summary_data.append({
                'neuron_id': neuron_id,
                'feature': feat,
                'mi': stats.get('me', np.nan),
                'pvalue': stats.get('pval', np.nan),
                'optimal_delay': stats.get('shift_used', 0),
                'effect_size': stats.get('me', 0) / stats.get('mean_bs', 1)
            })
    
    df = pd.DataFrame(summary_data)
    return df

# Create summary
summary_df = export_summary_table(exp)
if not summary_df.empty:
    print("Summary Statistics:")
    print(summary_df.groupby('feature')['mi'].describe())
    
    # Save to CSV (uncomment to save)
    # summary_df.to_csv('neuron_selectivity_summary.csv', index=False)

## Best Practices for Real Data

### 1. Data Preparation
- **Synchronization**: Ensure neural and behavioral data share a common clock and have one behavioral sample per imaging frame
- **Sampling rates**: Downsample if needed (typically 10-30 Hz for calcium)
- **Quality control**: Remove motion artifacts and dead neurons
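
The first two points can be sketched with plain NumPy. The example assumes you have timestamps for both the imaging frames and the behavioral samples; `align_to_frames` and `block_average` are hypothetical helper names, not INTENSE functions.

```python
import numpy as np

def align_to_frames(frame_times, behavior_times, behavior_values):
    """Linearly interpolate a continuous behavioral variable onto imaging frame times."""
    return np.interp(frame_times, behavior_times, behavior_values)

def block_average(traces, factor):
    """Downsample along the last axis by averaging non-overlapping blocks of `factor` frames."""
    traces = np.asarray(traces, dtype=float)
    n_keep = (traces.shape[-1] // factor) * factor  # drop incomplete trailing block
    trimmed = traces[..., :n_keep]
    return trimmed.reshape(*traces.shape[:-1], n_keep // factor, factor).mean(axis=-1)

# Toy data: camera tracking at 50 Hz, imaging at 20 Hz
behavior_times = np.arange(500) / 50.0      # seconds
tracked_position = 2.0 * behavior_times     # animal moving at 2 cm/s
frame_times = np.arange(200) / 20.0         # seconds
aligned_position = align_to_frames(frame_times, behavior_times, tracked_position)
print(aligned_position.shape)

# Downsample 2 neurons x 6 frames by a factor of 2
small_traces = np.arange(12).reshape(2, 6)
downsampled = block_average(small_traces, factor=2)
print(downsampled)
```

Linear interpolation suits continuous, non-circular variables. Discrete labels are better assigned from the nearest sample, and angles need unwrapping first. After block averaging, remember to divide `fps` by the same factor.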

### 2. Feature Engineering
- **Continuous features**: Consider smoothing noisy measurements
- **Discrete features**: Ensure sufficient samples per category
- **Derived features**: Create task-relevant variables (e.g., distance to goal)
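
As a small illustration of the last two points, the sketch below builds a distance-to-goal variable and reports how many seconds of data each discrete category has. The helper names and the goal location are assumptions made for this example.

```python
import numpy as np

def distance_to_goal(x, y, goal_xy):
    """Euclidean distance from each (x, y) sample to a fixed goal location."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.hypot(x - goal_xy[0], y - goal_xy[1])

def category_durations(labels, fps):
    """Seconds of data available for each category of a discrete variable."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return {value.item(): float(count / fps) for value, count in zip(values, counts)}

# Toy data: two positions and a goal at the origin
demo_distance = distance_to_goal([0.0, 3.0], [0.0, 4.0], goal_xy=(0.0, 0.0))
print(demo_distance)

# Toy data: 60 frames of category 0 and 40 frames of category 1 at 20 Hz
demo_labels = np.array([0] * 60 + [1] * 40)
demo_durations = category_durations(demo_labels, fps=20.0)
print(demo_durations)
```

Categories with only a few seconds of data are unlikely to support a reliable selectivity estimate, so consider merging or dropping them.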

### 3. Computational Considerations
- **Start small**: Test with subset of neurons first
- **Parallelize**: Use n_jobs=-1 for all cores
- **Save progress**: Export intermediate results
- **Memory**: ~8 bytes (float64) per neuron × timepoint for the traces; any shuffled copies held in memory multiply this by the number of shuffles
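
For the dataset used in this notebook, the memory rule of thumb works out as follows. This is back-of-the-envelope arithmetic assuming float64 values; whether shuffled copies are ever held in memory all at once depends on the implementation, so the last figure is a worst-case bound rather than a measured footprint.

```python
n_neurons = 100
n_timepoints = 36000
n_shuffles = 2000
bytes_per_value = 8  # float64

# The calcium matrix itself
data_mb = n_neurons * n_timepoints * bytes_per_value / 1e6

# All shuffles of a single neuron, if materialized at once
one_neuron_shuffles_mb = n_timepoints * n_shuffles * bytes_per_value / 1e6

# Worst case: every shuffle of every neuron in memory simultaneously
all_shuffles_gb = n_neurons * n_timepoints * n_shuffles * bytes_per_value / 1e9

print(f"Calcium matrix:            {data_mb:.1f} MB")
print(f"One neuron, all shuffles:  {one_neuron_shuffles_mb:.1f} MB")
print(f"All neurons, all shuffles: {all_shuffles_gb:.1f} GB")
```

The gap between the first and last numbers is why testing on a subset of neurons first is a sensible default.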

### 4. Statistical Rigor
- **Shuffle numbers**: 10,000+ for publication-quality results
- **Multiple comparisons**: Built-in Holm-Bonferroni correction
- **Effect sizes**: Report both significance and effect magnitude
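
For readers unfamiliar with Holm-Bonferroni, here is a minimal sketch of the standard step-down procedure. It is a textbook version written for this notebook; details of INTENSE's built-in correction may differ, and `holm_bonferroni` is a hypothetical name.

```python
import numpy as np

def holm_bonferroni(pvalues, alpha=0.01):
    """Return a boolean array marking which hypotheses are rejected.

    Textbook step-down procedure for illustration; not INTENSE's own code.
    """
    pvalues = np.asarray(pvalues, dtype=float)
    n_tests = len(pvalues)
    order = np.argsort(pvalues)
    reject = np.zeros(n_tests, dtype=bool)
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (n_tests - k)
        if pvalues[idx] <= alpha / (n_tests - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

demo_pvalues = [0.001, 0.04, 0.03, 0.005]
demo_reject = holm_bonferroni(demo_pvalues, alpha=0.05)
print(demo_reject)  # [ True False False  True]
```

In this example 0.03 would pass an uncorrected 0.05 threshold but fails its Holm threshold of 0.05 / 2 = 0.025, which also stops the procedure before 0.04 is tested.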

### 5. Validation
- **Positive controls**: Include known relationships
- **Negative controls**: Test with shuffled neuron identities
- **Cross-validation**: Split data temporally
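
A temporal split can be as simple as comparing the first and second halves of the session, with a gap between them so that slow calcium dynamics do not leak across the boundary. The sketch below uses Pearson correlation as a stand-in statistic; `temporal_halves` and the 5-second gap are assumptions made for illustration.

```python
import numpy as np

def temporal_halves(n_frames, gap_frames=0):
    """Return frame indices for the first and second half, separated by a gap."""
    half = n_frames // 2
    first = np.arange(0, half)
    second = np.arange(half + gap_frames, n_frames)
    return first, second

# Toy data: activity that genuinely follows a behavioral feature
split_rng = np.random.default_rng(7)
demo_fps = 20.0
demo_n_frames = 36000
demo_feature = split_rng.normal(size=demo_n_frames)
demo_activity = demo_feature + 0.5 * split_rng.normal(size=demo_n_frames)

first_idx, second_idx = temporal_halves(demo_n_frames, gap_frames=int(5 * demo_fps))
r_first = np.corrcoef(demo_activity[first_idx], demo_feature[first_idx])[0, 1]
r_second = np.corrcoef(demo_activity[second_idx], demo_feature[second_idx])[0, 1]
print(f"first half:  r = {r_first:.2f} ({len(first_idx)} frames)")
print(f"second half: r = {r_second:.2f} ({len(second_idx)} frames)")
```

A relationship that appears in only one half deserves a closer look, for example for drift, bleaching, or a change in behavior over the session.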

## Troubleshooting Common Issues

**Issue: No significant results**
- Check behavioral variable coverage
- Increase recording duration
- Verify spike detection from calcium

**Issue: Too many significant results**
- Increase shuffle numbers
- Check for global artifacts
- Consider more stringent p-value threshold

**Issue: Slow computation**
- Downsample temporally
- Reduce delay search window
- Use two-stage testing
- Parallelize with n_jobs

## Next Steps

You now have the tools to:
1. Load and format your neural data
2. Run comprehensive INTENSE analysis
3. Interpret results in biological context
4. Export findings for publication

For more advanced analyses:
- See `examples/mixed_selectivity.py` for disentanglement
- Use `compute_feat_feat_significance` for behavioral relationships
- Explore `compute_cell_cell_significance` for neural correlations

Happy discovering! 🔬🧠