# Peak Detection Functions Guide

## 📖 Overview
This notebook provides a comprehensive guide to the peak detection functions implemented in `src/routes/peak_detection.py`. The system includes multiple peak detection algorithms, baseline correction methods, and data processing utilities for cyclic voltammetry analysis.

### 🎯 Key Components:
1. **Peak Detection Algorithms** - Multiple methods for detecting peaks
2. **Baseline Correction** - Advanced baseline detection and correction
3. **Data Processing** - File handling, normalization, and validation
4. **API Endpoints** - Flask routes for web interface integration
5. **Utility Functions** - Supporting functions for analysis

## 1. Peak Detection Algorithms 🔍

The system implements 7 different peak detection methods:

In [1]:
# Peak Detection Methods Available
peak_detection_methods = {
    'prominence': 'Traditional prominence-based detection with enhanced baseline',
    'derivative': 'Derivative-based method using zero crossings',
    'ml': 'Machine Learning enhanced detection (DeepCV)',
    'enhanced_v3': 'Enhanced method version 3',
    'enhanced_v4': 'Enhanced method version 4', 
    'enhanced_v4_improved': 'Improved version 4 with better filtering',
    'enhanced_v5': 'Latest enhanced method version 5'
}

for method, description in peak_detection_methods.items():
    print(f"• {method}: {description}")

• prominence: Traditional prominence-based detection with enhanced baseline
• derivative: Derivative-based method using zero crossings
• ml: Machine Learning enhanced detection (DeepCV)
• enhanced_v3: Enhanced method version 3
• enhanced_v4: Enhanced method version 4
• enhanced_v4_improved: Improved version 4 with better filtering
• enhanced_v5: Latest enhanced method version 5


### 1.1 Main Detection Function: `detect_cv_peaks()`

```python
def detect_cv_peaks(voltage, current, method='prominence'):
    """
    Main dispatcher function for peak detection
    
    Parameters:
    - voltage: numpy array of voltage values
    - current: numpy array of current values
    - method: string specifying detection method
    
    Returns:
    - Dictionary with peaks list and metadata
    """
```
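
One common way to build such a dispatcher is a lookup table from method name to detector function. The sketch below only illustrates that pattern and is not the module's code: `DETECTORS`, `_make_stub` and `dispatch_peak_detection` are hypothetical names, and the stub detectors stand in for the real functions.

```python
import numpy as np

def _make_stub(name):
    """Build a placeholder detector (hypothetical; real detectors live in the module)."""
    def detector(voltage, current):
        return {'peaks': [], 'method': name}
    return detector

# Hypothetical registry: method name -> detector callable.
DETECTORS = {
    'prominence': _make_stub('prominence'),
    'derivative': _make_stub('derivative'),
    'ml': _make_stub('ml'),
}

def dispatch_peak_detection(voltage, current, method='prominence'):
    """Validate the inputs, then forward them to the selected detector."""
    voltage = np.asarray(voltage, dtype=float)
    current = np.asarray(current, dtype=float)
    if voltage.shape != current.shape:
        raise ValueError("voltage and current must have the same length")
    if method not in DETECTORS:
        raise ValueError(f"Unknown method {method!r}; expected one of {sorted(DETECTORS)}")
    return DETECTORS[method](voltage, current)
```

Rejecting unknown method names up front keeps a typo in the URL or request from silently falling back to a default algorithm.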

### 1.2 Prominence Method: `detect_peaks_prominence()`

**Most commonly used method** combining traditional peak detection with advanced baseline correction.

In [2]:
# Prominence Method Workflow
prominence_workflow = [
    "1. Load configuration settings (prominence threshold, peak width)",
    "2. Apply Enhanced Baseline Detector v2.1 or Voltage Window Detector v4", 
    "3. Normalize current data for peak detection",
    "4. Find positive peaks (oxidation candidates)",
    "5. Find negative peaks (reduction candidates)", 
    "6. Validate peaks using electrochemical rules",
    "7. Calculate peak characteristics (height, baseline current)",
    "8. Return formatted results with baseline data"
]

for step in prominence_workflow:
    print(step)

1. Load configuration settings (prominence threshold, peak width)
2. Apply Enhanced Baseline Detector v2.1 or Voltage Window Detector v4
3. Normalize current data for peak detection
4. Find positive peaks (oxidation candidates)
5. Find negative peaks (reduction candidates)
6. Validate peaks using electrochemical rules
7. Calculate peak characteristics (height, baseline current)
8. Return formatted results with baseline data


#### Key Features of Prominence Method:

- **Voltage Zone Validation**: Ensures peaks are in appropriate voltage ranges
- **Current Direction Validation**: Oxidation peaks must have positive current, reduction peaks negative
- **Peak Size Validation**: Filters out peaks below minimum height threshold
- **Enhanced Baseline Correction**: Uses advanced algorithms for accurate baseline estimation
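
The core of a prominence-based search can be sketched with `scipy.signal.find_peaks`. This is a minimal illustration, assuming the current is already baseline-corrected; `find_candidate_peaks` is a hypothetical name, and the default thresholds are taken from the configuration section of this guide rather than from the module itself.

```python
import numpy as np
from scipy.signal import find_peaks

def find_candidate_peaks(voltage, current, prominence=0.1, width=5):
    """Return oxidation and reduction candidates from a baseline-corrected trace."""
    voltage = np.asarray(voltage, dtype=float)
    current = np.asarray(current, dtype=float)
    scale = np.max(np.abs(current))
    if scale == 0:
        return []
    # Normalize so the prominence threshold does not depend on current units.
    normalized = current / scale
    peaks = []
    # Reduction peaks are minima, so search the negated signal for them.
    for peak_type, signal in (('oxidation', normalized), ('reduction', -normalized)):
        indices, _ = find_peaks(signal, prominence=prominence, width=width)
        for i in indices:
            # Current direction rule: oxidation positive, reduction negative.
            if peak_type == 'oxidation' and current[i] <= 0:
                continue
            if peak_type == 'reduction' and current[i] >= 0:
                continue
            peaks.append({
                'type': peak_type,
                'voltage': float(voltage[i]),
                'current': float(current[i]),
            })
    return peaks
```

Voltage zone and minimum height checks would follow as a separate filtering pass over the returned candidates.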

### 1.3 Derivative Method: `detect_peaks_derivative()`

Locates peaks from the first and second derivatives of the smoothed current with respect to voltage.

In [3]:
# Derivative Method Process
derivative_process = {
    'smoothing': 'Apply Savitzky-Golay filter to reduce noise',
    'first_derivative': 'Calculate di/dv (slope)',
    'second_derivative': 'Calculate d²i/dv² (curvature)',
    'zero_crossings': 'Find zero crossings in second derivative',
    'filtering': 'Apply significance filters (height, prominence)',
    'fallback': 'Use scipy.find_peaks if no peaks found'
}

print("Derivative Method Steps:")
for step, description in derivative_process.items():
    print(f"• {step.replace('_', ' ').title()}: {description}")

Derivative Method Steps:
• Smoothing: Apply Savitzky-Golay filter to reduce noise
• First Derivative: Calculate di/dv (slope)
• Second Derivative: Calculate d²i/dv² (curvature)
• Zero Crossings: Find zero crossings in second derivative
• Filtering: Apply significance filters (height, prominence)
• Fallback: Use scipy.find_peaks if no peaks found
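
A derivative-based search can be sketched as follows. Note that this sketch uses a textbook variant: it looks for sign changes in the first derivative and uses the sign of the second derivative to tell maxima from minima, which may differ from how `detect_peaks_derivative()` uses the second derivative. The function name, window size and height fraction are assumptions for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

def derivative_extrema(voltage, current, window=21, polyorder=3, min_fraction=0.1):
    """Return (maxima, minima) index lists found from derivative sign changes."""
    voltage = np.asarray(voltage, dtype=float)
    current = np.asarray(current, dtype=float)
    # Smooth first: differentiation amplifies noise.
    smooth = savgol_filter(current, window_length=window, polyorder=polyorder)
    d1 = np.gradient(smooth, voltage)   # di/dv (slope)
    d2 = np.gradient(d1, voltage)       # d²i/dv² (curvature)
    # A sign change of the slope between samples k and k+1 marks an extremum.
    sign = np.sign(d1)
    crossings = np.where(sign[:-1] * sign[1:] < 0)[0]
    # Significance filter: ignore extrema that are small relative to the largest signal.
    threshold = min_fraction * np.max(np.abs(smooth))
    maxima, minima = [], []
    for k in crossings:
        if abs(smooth[k]) < threshold:
            continue
        if d2[k] < 0:
            maxima.append(int(k))   # concave down: local maximum
        elif d2[k] > 0:
            minima.append(int(k))   # concave up: local minimum
    return maxima, minima
```

The height filter matters in practice because flat, noisy regions produce many slope sign changes that are not peaks.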


### 1.4 Machine Learning Method: `detect_peaks_ml()`

**DeepCV Implementation** - Combines traditional methods with ML enhancement.

In [4]:
# ML Method Architecture
ml_architecture = {
    'base_detection': 'Start with prominence method for baseline peaks',
    'feature_extraction': 'Extract 8 key features per peak',
    'ml_enhancement': 'Apply neural network classification',
    'confidence_scoring': 'Calculate ML-based confidence scores',
    'validation': 'Apply electrochemical validation rules'
}

print("ML Method (DeepCV) Components:")
for component, description in ml_architecture.items():
    print(f"• {component.replace('_', ' ').title()}: {description}")

ML Method (DeepCV) Components:
• Base Detection: Start with prominence method for baseline peaks
• Feature Extraction: Extract 8 key features per peak
• Ml Enhancement: Apply neural network classification
• Confidence Scoring: Calculate ML-based confidence scores
• Validation: Apply electrochemical validation rules


#### ML Features Extracted:
```python
features = {
    'peak_width': 'Full Width at Half Maximum (FWHM)',
    'peak_symmetry': 'Left-right symmetry ratio', 
    'local_slope': 'Average slope in peak region',
    'noise_level': 'Local noise estimation',
    'baseline_quality': 'Quality of baseline fit',
    'position_score': 'Electrochemical position validity',
    'shape_factor': 'Peak shape characteristics',
    'snr_estimate': 'Signal-to-noise ratio'
}
```
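
Two of these features, peak width and symmetry, can be computed directly from the trace. The sketch below is one possible way to do it, assuming a positive peak on a baseline-corrected current; `peak_width_and_symmetry` is a hypothetical helper, not a function from the module.

```python
import numpy as np

def peak_width_and_symmetry(voltage, current, peak_index):
    """Return (fwhm, symmetry) for a positive peak on a zero baseline."""
    voltage = np.asarray(voltage, dtype=float)
    current = np.asarray(current, dtype=float)
    half = current[peak_index] / 2.0
    # Walk outwards from the apex until the signal drops to half height.
    left = peak_index
    while left > 0 and current[left] > half:
        left -= 1
    right = peak_index
    while right < len(current) - 1 and current[right] > half:
        right += 1
    left_width = abs(voltage[peak_index] - voltage[left])
    right_width = abs(voltage[right] - voltage[peak_index])
    fwhm = left_width + right_width
    widest = max(left_width, right_width)
    # 1.0 means perfectly symmetric; values near 0 mean a strongly skewed peak.
    symmetry = min(left_width, right_width) / widest if widest > 0 else 0.0
    return float(fwhm), float(symmetry)
```

For a Gaussian of the form `exp(-(v / w)**2)` the expected width is `2 * w * sqrt(ln 2)`, which gives a quick sanity check on synthetic data.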

## 2. Baseline Detection & Correction 📈

Peak height is measured relative to the baseline, so an inaccurate baseline directly distorts every reported peak height.

### 2.1 Voltage Window Baseline Detector v4

**Primary Method** - Uses voltage windows to find stable baseline segments.

In [5]:
# Voltage Window Method Parameters
voltage_window_params = {
    'voltage_windows': [0.010, 0.020, 0.030, 0.050],  # 10-50 mV windows
    'r2_threshold': 0.95,  # Minimum R² for linear segments
    'min_length': 5,       # Minimum segment length
    'max_iterations': 1000, # Computation limit
    'skip_regions': 'Skip first 10% and last 5% (steep regions)'
}

print("Voltage Window Baseline Detection Parameters:")
for param, value in voltage_window_params.items():
    print(f"• {param}: {value}")

Voltage Window Baseline Detection Parameters:
• voltage_windows: [0.01, 0.02, 0.03, 0.05]
• r2_threshold: 0.95
• min_length: 5
• max_iterations: 1000
• skip_regions: Skip first 10% and last 5% (steep regions)
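
The idea behind these parameters can be sketched as a scan over fixed-width voltage windows, keeping those where a straight line fits well. This is an illustration under simplifying assumptions (non-overlapping windows, a single window width, no skipped regions); `linear_fit_r2` and `find_linear_windows` are hypothetical names.

```python
import numpy as np

def linear_fit_r2(x, y):
    """Least-squares line through (x, y); returns slope, intercept and R²."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = float(np.sum((y - (slope * x + intercept)) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    # A perfectly constant segment has no variance to explain; treat it as linear.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return float(slope), float(intercept), r2

def find_linear_windows(voltage, current, window_v=0.05, r2_threshold=0.95, min_length=5):
    """Return windows of width window_v (in V) whose linear fit meets the R² threshold."""
    voltage = np.asarray(voltage, dtype=float)
    current = np.asarray(current, dtype=float)
    segments = []
    start, n = 0, len(voltage)
    while start < n:
        end = start
        while end < n and abs(voltage[end] - voltage[start]) <= window_v:
            end += 1
        if end - start >= min_length:
            slope, intercept, r2 = linear_fit_r2(voltage[start:end], current[start:end])
            if r2 >= r2_threshold:
                segments.append({'start': start, 'end': end, 'slope': slope, 'r2': r2})
        start = end  # move on to the next non-overlapping window
    return segments
```

One caveat of an R² criterion is that a flat but noisy baseline scores poorly, because there is little variance for the line to explain.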


### 2.2 Enhanced Baseline Detector v2.1

**Fallback Method** - Traditional approach with improvements.

In [6]:
# Enhanced Baseline Workflow
enhanced_baseline_steps = [
    "1. Detect linear segments using detect_linear_segments()",
    "2. Find peak regions for baseline avoidance", 
    "3. Select optimal segments for forward/reverse scans",
    "4. Score segments based on position and quality",
    "5. Generate baseline arrays for each scan direction",
    "6. Validate baseline quality and consistency"
]

for step in enhanced_baseline_steps:
    print(step)

1. Detect linear segments using detect_linear_segments()
2. Find peak regions for baseline avoidance
3. Select optimal segments for forward/reverse scans
4. Score segments based on position and quality
5. Generate baseline arrays for each scan direction
6. Validate baseline quality and consistency


### 2.3 Critical Baseline Functions

#### `detect_linear_segments()`
Finds all potential baseline segments using voltage windows.

#### `detect_improved_baseline_2step()`
Two-step process: find segments, then select best ones for each scan direction.

#### Segment Scoring
```python
def score_baseline_segment(segment, scan_direction):
    # Outline only: stability_score, position_score and slope_penalty
    # are not defined in this excerpt.
    score = segment['r2'] * 100  # Base R² score
    score += stability_score      # Low std deviation bonus
    score += position_score       # Correct position for scan direction
    score += slope_penalty        # Penalty for steep slopes (a value <= 0)
    return score

## 3. Data Processing Functions 🔧

### 3.1 File Loading Functions

#### `load_csv_file(file_path)`
- Handles multiple CSV formats (Palmsens, PiPot, STM32)
- Automatic header detection
- Unit conversion (mA, µA, nA to µA)
- Data validation and error handling

In [7]:
# Supported File Formats
file_formats = {
    'palmsens': {
        'headers': ['V', 'A'],  # alternative header set: ['voltage', 'current']
        'units': 'Usually Amperes (A)',
        'format': 'Standard CSV with headers'
    },
    'pipot': {
        'headers': ['FileName:', 'V', 'uA'],
        'units': 'microAmps (µA)', 
        'format': 'Instrument format with filename header'
    },
    'stm32': {
        'headers': ['voltage', 'current'],
        'units': 'Various (auto-detected)',
        'format': 'Standard CSV'
    }
}

print("Supported Instrument File Formats:")
for instrument, details in file_formats.items():
    print(f"\n{instrument.upper()}:")
    for key, value in details.items():
        print(f"  • {key}: {value}")

Supported Instrument File Formats:

PALMSENS:
  • headers: ['V', 'A']
  • units: Usually Amperes (A)
  • format: Standard CSV with headers

PIPOT:
  • headers: ['FileName:', 'V', 'uA']
  • units: microAmps (µA)
  • format: Instrument format with filename header

STM32:
  • headers: ['voltage', 'current']
  • units: Various (auto-detected)
  • format: Standard CSV
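
Header detection and unit conversion can be sketched with the standard `csv` module. This is a simplified illustration, not `load_csv_file()` itself: it assumes the optional first line starts with `FileName:` and that the second column header is the current unit; `parse_cv_csv` and `UNIT_TO_MICROAMPS` are hypothetical names.

```python
import csv
import io

# Conversion factors from the header unit to microamps.
UNIT_TO_MICROAMPS = {'A': 1e6, 'mA': 1e3, 'uA': 1.0, 'µA': 1.0, 'nA': 1e-3}

def parse_cv_csv(text):
    """Parse CSV text into (voltage in V, current in µA) lists."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    # Skip an instrument line such as "FileName: ..." if present (assumed layout).
    if rows and rows[0][0].startswith('FileName:'):
        rows = rows[1:]
    if not rows or len(rows[0]) < 2:
        raise ValueError("no two-column header row found")
    header, data = rows[0], rows[1:]
    unit = header[1].strip()
    if unit not in UNIT_TO_MICROAMPS:
        raise ValueError(f"cannot infer current unit from header {header!r}")
    factor = UNIT_TO_MICROAMPS[unit]
    voltage = [float(row[0]) for row in data]
    current = [float(row[1]) * factor for row in data]
    return voltage, current
```

Files with generic `voltage`/`current` headers carry no unit in the header, so this sketch refuses them instead of guessing; the real loader is described as auto-detecting the unit in that case.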


### 3.2 Sample Information Extraction

#### `extract_sample_info_from_filename(filename)`
Extracts metadata from filename patterns.

In [8]:
# Filename Pattern Examples
filename_patterns = {
    'concentration': {
        '5.0mM': 'Decimal format (5.0mm)',
        '5_0mM': 'Underscore format (5_0mm)', 
        '0_5mM': 'Decimal underscore (0_5mm for 0.5mM)',
        '5mM': 'Simple integer (5mm)'
    },
    'scan_rate': {
        '100mVpS': 'Standard format (100mVpS)',
        '100mvps': 'Lowercase variant',
        '100_mV_s': 'Underscore separated'
    },
    'instrument': {
        'palmsens_': 'Palmsens files',
        'pipot_ferro_': 'PiPot Ferro files',
        'stm32_': 'STM32 files'
    }
}

print("Filename Pattern Recognition:")
for category, patterns in filename_patterns.items():
    print(f"\n{category.title()}:")
    for pattern, description in patterns.items():
        print(f"  • {pattern}: {description}")

Filename Pattern Recognition:

Concentration:
  • 5.0mM: Decimal format (5.0mm)
  • 5_0mM: Underscore format (5_0mm)
  • 0_5mM: Decimal underscore (0_5mm for 0.5mM)
  • 5mM: Simple integer (5mm)

Scan_Rate:
  • 100mVpS: Standard format (100mVpS)
  • 100mvps: Lowercase variant
  • 100_mV_s: Underscore separated

Instrument:
  • palmsens_: Palmsens files
  • pipot_ferro_: PiPot Ferro files
  • stm32_: STM32 files
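
The patterns above can be matched with regular expressions on the lowercased filename. The sketch below is an assumption about how such parsing might look, not the actual `extract_sample_info_from_filename()`; the regexes, the returned keys and the short instrument labels are hypothetical.

```python
import re

# Filename prefix -> instrument label (labels are illustrative).
INSTRUMENT_PREFIXES = {
    'palmsens_': 'palmsens',
    'pipot_ferro_': 'pipot',
    'stm32_': 'stm32',
}
# Number with optional '.' or '_' decimal separator, followed by 'mm' (lowercased mM).
# The lookbehind stops digits inside names such as 'stm32' from being picked up.
CONCENTRATION_RE = re.compile(r'(?<![a-z0-9])(\d+(?:[._]\d+)?)mm')
# Matches 100mvps and 100_mv_s after lowercasing.
SCAN_RATE_RE = re.compile(r'(\d+)_?mv(?:ps|_s)')

def parse_sample_info(filename):
    """Extract instrument, concentration (mM) and scan rate (mV/s) from a filename."""
    name = filename.lower()
    info = {'instrument': None, 'concentration_mM': None, 'scan_rate_mVs': None}
    for prefix, instrument in INSTRUMENT_PREFIXES.items():
        if name.startswith(prefix):
            info['instrument'] = instrument
            break
    match = CONCENTRATION_RE.search(name)
    if match:
        # '0_5' and '0.5' both mean 0.5 mM.
        info['concentration_mM'] = float(match.group(1).replace('_', '.'))
    match = SCAN_RATE_RE.search(name)
    if match:
        info['scan_rate_mVs'] = float(match.group(1))
    return info
```

Treating an underscore as a decimal point is inherently ambiguous when underscores also separate fields, so fields that cannot be parsed are left as `None` rather than guessed.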


### 3.3 Analysis Session Management

#### In-Memory Storage
```python
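import uuid
from datetime import datetime

analysis_sessions = {}  # Global in-memory session storage

# Simplified: the values are passed in as arguments here; the route
# handler may obtain them from the request instead.
def create_analysis_session(peak_data, trace_data, detection_method):
    """Store one analysis result in memory and return its session ID."""
    session_id = str(uuid.uuid4())
    analysis_sessions[session_id] = {
        'peaks': peak_data,
        'data': trace_data,
        'method': detection_method,
        'created_at': datetime.now().isoformat()
    }
    return session_id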

## 4. API Endpoints 🌐

Flask routes providing web interface integration.

In [9]:
# API Endpoints Overview
api_endpoints = {
    'GET /api/get_saved_files': 'List all saved CSV files',
    'GET /api/load_saved_file_by_name/<filename>': 'Load specific CSV file',
    'POST /get-peaks/<method>': 'Detect peaks using specified method',
    'GET /get-progress': 'Get peak detection progress',
    'GET /get-settings': 'Get current detection settings',
    'POST /update-settings': 'Update detection parameters',
    'POST /create_analysis_session': 'Create new analysis session',
    'GET /peak_analysis/<session_id>': 'Render analysis results page'
}

print("API Endpoints:")
for endpoint, description in api_endpoints.items():
    method, route = endpoint.split(' ', 1)
    print(f"• {method:4} {route:35} - {description}")

API Endpoints:
• GET  /api/get_saved_files                - List all saved CSV files
• GET  /api/load_saved_file_by_name/<filename> - Load specific CSV file
• POST /get-peaks/<method>                 - Detect peaks using specified method
• GET  /get-progress                       - Get peak detection progress
• GET  /get-settings                       - Get current detection settings
• POST /update-settings                    - Update detection parameters
• POST /create_analysis_session            - Create new analysis session
• GET  /peak_analysis/<session_id>         - Render analysis results page


### 4.1 Main Peak Detection Endpoint

#### `POST /get-peaks/<method>`
**Primary endpoint** for peak detection analysis.

In [10]:
# Peak Detection Endpoint Workflow
endpoint_workflow = {
    'input_validation': 'Validate method and request data',
    'progress_init': 'Initialize progress tracking',
    'data_processing': 'Handle single/multi-trace data',
    'peak_detection': 'Call appropriate detection method',
    'result_formatting': 'Format results for JSON response',
    'logging': 'Optional save to parameter log',
    'progress_complete': 'Mark detection as complete'
}

print("Peak Detection Endpoint Process:")
for step, description in endpoint_workflow.items():
    print(f"• {step.replace('_', ' ').title()}: {description}")

Peak Detection Endpoint Process:
• Input Validation: Validate method and request data
• Progress Init: Initialize progress tracking
• Data Processing: Handle single/multi-trace data
• Peak Detection: Call appropriate detection method
• Result Formatting: Format results for JSON response
• Logging: Optional save to parameter log
• Progress Complete: Mark detection as complete


### 4.2 Progress Tracking

Real-time progress monitoring for long-running analyses.

In [11]:
# Progress Tracking Structure
progress_structure = {
    'active': 'Boolean - whether detection is running',
    'current_file': 'Current file number being processed',
    'total_files': 'Total number of files to process',
    'percent': 'Completion percentage (0-100)',
    'message': 'Current status message',
    'start_time': 'Processing start timestamp',
    'elapsed_time': 'Time elapsed since start'
}

print("Progress Tracking Fields:")
for field, description in progress_structure.items():
    print(f"• {field}: {description}")

Progress Tracking Fields:
• active: Boolean - whether detection is running
• current_file: Current file number being processed
• total_files: Total number of files to process
• percent: Completion percentage (0-100)
• message: Current status message
• start_time: Processing start timestamp
• elapsed_time: Time elapsed since start
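
A progress record with these fields can be kept in a module-level dictionary that the detection loop updates and the progress endpoint reads. The sketch below is a minimal single-process illustration; `start_progress` and `update_progress` are hypothetical helper names.

```python
import time

# Shared progress state using the fields listed above.
progress = {
    'active': False,
    'current_file': 0,
    'total_files': 0,
    'percent': 0.0,
    'message': '',
    'start_time': None,
    'elapsed_time': 0.0,
}

def start_progress(total_files):
    """Reset the record at the start of a detection run."""
    progress.update(
        active=True, current_file=0, total_files=total_files,
        percent=0.0, message='Starting', start_time=time.time(), elapsed_time=0.0,
    )

def update_progress(current_file, message):
    """Record that current_file of total_files has been processed."""
    total = progress['total_files']
    progress['current_file'] = current_file
    progress['percent'] = 100.0 * current_file / total if total else 0.0
    progress['message'] = message
    progress['elapsed_time'] = time.time() - progress['start_time']
    # Mark the run as finished once the last file is done.
    if total and current_file >= total:
        progress['active'] = False
```

A plain dictionary is only safe while one worker handles requests; with several workers or threads the state would need a lock or shared storage.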


## 5. Utility Functions ⚙️

### 5.1 Data Validation

#### `ensure_json_serializable(data)`
Ensures all data can be serialized to JSON (converts numpy types, handles NaN/Inf).
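
A conversion of this kind can be sketched as a recursive walk over the result structure. This is an illustration of the technique, not the module's implementation; `to_json_safe` is a hypothetical name, and mapping NaN/Inf to `None` is one possible policy.

```python
import json
import math

import numpy as np

def to_json_safe(data):
    """Recursively convert numpy values and non-finite floats to JSON-friendly types."""
    if isinstance(data, dict):
        return {str(key): to_json_safe(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return [to_json_safe(value) for value in data]
    if isinstance(data, np.ndarray):
        return [to_json_safe(value) for value in data.tolist()]
    if isinstance(data, np.generic):
        # numpy scalars (np.float64, np.int64, np.bool_) -> plain Python scalars
        return to_json_safe(data.item())
    if isinstance(data, float) and not math.isfinite(data):
        return None  # NaN and ±Inf are not valid JSON numbers
    return data

payload = {'voltage': np.array([0.1, np.nan]), 'count': np.int64(3)}
print(json.dumps(to_json_safe(payload)))
```

Without this step `json.dumps` raises `TypeError` for numpy arrays and integer scalars, and by default it emits the non-standard token `NaN` for non-finite floats.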

#### Peak Validation Rules
```python
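# Limits mirror the default settings in section 6 (voltages in V, current in µA).
# The RED_* names are assumed by analogy with the OX_* names.
OX_VOLTAGE_MIN, OX_VOLTAGE_MAX = -0.3, 0.8
RED_VOLTAGE_MIN, RED_VOLTAGE_MAX = -0.8, 0.4
MIN_PEAK_HEIGHT = 1.0

def validate_peak_pre_detection(voltage_val, current_val, peak_type):
    # Voltage zone validation
    if peak_type == 'oxidation':
        if not (OX_VOLTAGE_MIN <= voltage_val <= OX_VOLTAGE_MAX):
            return False
    elif peak_type == 'reduction':
        if not (RED_VOLTAGE_MIN <= voltage_val <= RED_VOLTAGE_MAX):
            return False

    # Current direction validation: oxidation positive, reduction negative
    if peak_type == 'oxidation' and current_val < 0:
        return False
    if peak_type == 'reduction' and current_val > 0:
        return False

    # Peak size validation
    if abs(current_val) < MIN_PEAK_HEIGHT:
        return False

    return True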

### 5.2 Parameter Logging

#### `save_analysis_to_log(voltage, current, peaks, metadata)`
Saves analysis results to the database for later review.

In [12]:
# Logged Data Structure
logged_data = {
    'measurement_data': {
        'sample_id': 'Unique sample identifier',
        'instrument_type': 'palmsens/pipot/stm32',
        'timestamp': 'Analysis timestamp',
        'scan_rate': 'mV/s scan rate',
        'voltage_range': 'Min/max voltage values',
        'data_points': 'Number of data points',
        'user_notes': 'Optional user annotations'
    },
    'peak_data': {
        'type': 'oxidation/reduction',
        'voltage': 'Peak potential',
        'current': 'Peak current', 
        'height': 'Peak height from baseline',
        'baseline_info': 'Baseline calculation details',
        'enabled': 'User selection status'
    }
}

print("Database Logging Structure:")
for category, fields in logged_data.items():
    print(f"\n{category.replace('_', ' ').title()}:")
    for field, description in fields.items():
        print(f"  • {field}: {description}")

Database Logging Structure:

Measurement Data:
  • sample_id: Unique sample identifier
  • instrument_type: palmsens/pipot/stm32
  • timestamp: Analysis timestamp
  • scan_rate: mV/s scan rate
  • voltage_range: Min/max voltage values
  • data_points: Number of data points
  • user_notes: Optional user annotations

Peak Data:
  • type: oxidation/reduction
  • voltage: Peak potential
  • current: Peak current
  • height: Peak height from baseline
  • baseline_info: Baseline calculation details
  • enabled: User selection status


## 6. Configuration & Settings ⚙️

In [13]:
# Default Configuration Settings
default_settings = {
    'PEAK_PROMINENCE': 0.1,    # Minimum peak prominence (normalized)
    'PEAK_WIDTH': 5,           # Minimum peak width in data points
    'MIN_PEAK_HEIGHT': 1.0,    # Minimum peak height (µA)
    'OX_VOLTAGE_RANGE': (-0.3, 0.8),   # Oxidation voltage range (V)
    'RED_VOLTAGE_RANGE': (-0.8, 0.4),  # Reduction voltage range (V)
    'R2_THRESHOLD': 0.95,      # Minimum R² for baseline segments
    'MAX_ITERATIONS': 1000     # Maximum iterations for baseline detection
}

print("Default Configuration Settings:")
for setting, value in default_settings.items():
    print(f"• {setting}: {value}")

Default Configuration Settings:
• PEAK_PROMINENCE: 0.1
• PEAK_WIDTH: 5
• MIN_PEAK_HEIGHT: 1.0
• OX_VOLTAGE_RANGE: (-0.3, 0.8)
• RED_VOLTAGE_RANGE: (-0.8, 0.4)
• R2_THRESHOLD: 0.95
• MAX_ITERATIONS: 1000
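
When settings are updated at runtime, it helps to merge the overrides into the defaults and reject invalid values before they reach the detectors. The sketch below is one hedged way to do that; `merge_settings` and its validation rules are assumptions, not the behavior of the update endpoint.

```python
DEFAULT_SETTINGS = {
    'PEAK_PROMINENCE': 0.1,
    'PEAK_WIDTH': 5,
    'MIN_PEAK_HEIGHT': 1.0,
    'OX_VOLTAGE_RANGE': (-0.3, 0.8),
    'RED_VOLTAGE_RANGE': (-0.8, 0.4),
    'R2_THRESHOLD': 0.95,
    'MAX_ITERATIONS': 1000,
}

def merge_settings(overrides, defaults=DEFAULT_SETTINGS):
    """Return a new settings dict with validated overrides applied."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown settings: {sorted(unknown)}")
    merged = {**defaults, **overrides}  # defaults stay untouched
    if not 0.0 < merged['R2_THRESHOLD'] <= 1.0:
        raise ValueError("R2_THRESHOLD must be in (0, 1]")
    if merged['PEAK_PROMINENCE'] <= 0:
        raise ValueError("PEAK_PROMINENCE must be positive")
    for key in ('OX_VOLTAGE_RANGE', 'RED_VOLTAGE_RANGE'):
        low, high = merged[key]
        if low >= high:
            raise ValueError(f"{key} must be (min, max) with min < max")
    return merged
```

Returning a new dictionary rather than mutating the defaults makes it easy to reset to the values listed above.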


## 7. Error Handling & Fallbacks 🛡️

In [14]:
# Error Handling Strategy
error_handling = {
    'scipy_fallback': 'If SciPy not available, use simple implementations',
    'baseline_fallback': 'Multiple baseline methods with fallback chain',
    'peak_fallback': 'Simple peak detection if advanced methods fail',
    'data_validation': 'Check for NaN/Inf values and handle gracefully',
    'file_format': 'Flexible parsing for different instrument formats',
    'memory_management': 'Chunk processing for large datasets',
    'timeout_protection': 'Maximum iteration limits to prevent hangs'
}

print("Error Handling & Fallback Mechanisms:")
for mechanism, description in error_handling.items():
    print(f"• {mechanism.replace('_', ' ').title()}: {description}")

Error Handling & Fallback Mechanisms:
• Scipy Fallback: If SciPy not available, use simple implementations
• Baseline Fallback: Multiple baseline methods with fallback chain
• Peak Fallback: Simple peak detection if advanced methods fail
• Data Validation: Check for NaN/Inf values and handle gracefully
• File Format: Flexible parsing for different instrument formats
• Memory Management: Chunk processing for large datasets
• Timeout Protection: Maximum iteration limits to prevent hangs
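
A fallback chain can be sketched as a loop that tries each method in priority order and returns the first result that succeeds. This is a generic illustration of the pattern; `run_with_fallbacks` is a hypothetical helper, and the method names in the usage note are placeholders.

```python
def run_with_fallbacks(methods, *args, **kwargs):
    """Try (name, function) pairs in order; return (name, result) of the first success."""
    errors = []
    for name, func in methods:
        try:
            return name, func(*args, **kwargs)
        except Exception as exc:
            # Remember why this method failed, then fall through to the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All methods failed: " + "; ".join(errors))
```

A call such as `run_with_fallbacks([('voltage_window', primary), ('enhanced', secondary)], voltage, current)` would return the name of the method that produced the baseline, which is useful to include in the result metadata.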


## 8. Performance Considerations 🚀

In [15]:
# Performance Optimizations
optimizations = {
    'vectorized_operations': 'Use NumPy vectorized operations where possible',
    'early_termination': 'Stop processing when sufficient quality reached',
    'adaptive_windowing': 'Adjust window sizes based on data characteristics',
    'memory_efficiency': 'Avoid unnecessary data copies',
    'caching': 'Cache computed baselines and intermediate results',
    'parallel_processing': 'Process multiple files concurrently',
    'progress_tracking': 'Provide user feedback for long operations'
}

print("Performance Optimization Strategies:")
for strategy, description in optimizations.items():
    print(f"• {strategy.replace('_', ' ').title()}: {description}")

Performance Optimization Strategies:
• Vectorized Operations: Use NumPy vectorized operations where possible
• Early Termination: Stop processing when sufficient quality reached
• Adaptive Windowing: Adjust window sizes based on data characteristics
• Memory Efficiency: Avoid unnecessary data copies
• Caching: Cache computed baselines and intermediate results
• Parallel Processing: Process multiple files concurrently
• Progress Tracking: Provide user feedback for long operations


## 9. Usage Examples 💡

In [16]:
# Example: Basic Peak Detection
import numpy as np

# Simulated CV data
voltage = np.linspace(-0.5, 0.5, 1000)
current = np.random.normal(0, 0.1, 1000)  # Base noise

# Add synthetic peaks
# Oxidation peak at +0.2V
ox_peak = np.exp(-((voltage - 0.2) / 0.05)**2) * 5.0
# Reduction peak at -0.2V  
red_peak = -np.exp(-((voltage + 0.2) / 0.05)**2) * 3.0

current += ox_peak + red_peak

print("Example CV data created:")
print(f"• Voltage range: {voltage.min():.2f} to {voltage.max():.2f} V")
print(f"• Current range: {current.min():.2f} to {current.max():.2f} µA")
print(f"• Data points: {len(voltage)}")

# This data could be passed to detect_cv_peaks(voltage, current, method='prominence')

Example CV data created:
• Voltage range: -0.50 to 0.50 V
• Current range: -3.17 to 5.10 µA
• Data points: 1000


## 10. Troubleshooting Guide 🔧

In [17]:
# Common Issues and Solutions
troubleshooting = {
    'no_peaks_detected': {
        'causes': ['Prominence threshold too high', 'Poor baseline correction', 'Noisy data'],
        'solutions': ['Lower prominence threshold', 'Try different baseline method', 'Apply smoothing']
    },
    'false_positives': {
        'causes': ['Prominence threshold too low', 'Baseline artifacts', 'Noise spikes'],
        'solutions': ['Increase prominence threshold', 'Improve baseline correction', 'Filter noise']
    },
    'baseline_errors': {
        'causes': ['Insufficient linear segments', 'Peak interference', 'Data quality issues'],
        'solutions': ['Adjust R² threshold', 'Use peak avoidance', 'Check data quality']
    },
    'processing_slow': {
        'causes': ['Large datasets', 'Complex baseline detection', 'Multiple files'],
        'solutions': ['Reduce max iterations', 'Use simpler methods', 'Process in chunks']
    }
}

print("Troubleshooting Guide:")
for issue, details in troubleshooting.items():
    print(f"\n{issue.replace('_', ' ').title()}:")
    print(f"  Causes: {', '.join(details['causes'])}")
    print(f"  Solutions: {', '.join(details['solutions'])}")

Troubleshooting Guide:

No Peaks Detected:
  Causes: Prominence threshold too high, Poor baseline correction, Noisy data
  Solutions: Lower prominence threshold, Try different baseline method, Apply smoothing

False Positives:
  Causes: Prominence threshold too low, Baseline artifacts, Noise spikes
  Solutions: Increase prominence threshold, Improve baseline correction, Filter noise

Baseline Errors:
  Causes: Insufficient linear segments, Peak interference, Data quality issues
  Solutions: Adjust R² threshold, Use peak avoidance, Check data quality

Processing Slow:
  Causes: Large datasets, Complex baseline detection, Multiple files
  Solutions: Reduce max iterations, Use simpler methods, Process in chunks


## 📚 Summary

The `peak_detection.py` module provides a comprehensive system for cyclic voltammetry analysis with:

✅ **7 different peak detection algorithms**  
✅ **Advanced baseline correction methods**  
✅ **Robust error handling and fallbacks**  
✅ **Web API integration**  
✅ **Multi-format file support**  
✅ **Real-time progress tracking**  
✅ **Parameter logging and session management**  

### Key Strengths:
- **Flexibility**: Multiple algorithms for different use cases
- **Reliability**: Extensive error handling and validation
- **Performance**: Optimized for real-time analysis
- **Integration**: Seamless web interface support

### Recommended Usage:
1. **Start with prominence method** for most applications
2. **Use ML method** for enhanced accuracy once a trained model is available
3. **Try derivative method** for noisy or complex data
4. **Adjust settings** based on your specific requirements