<a href="https://colab.research.google.com/github/fjadidi2001/AD_Prediction/blob/main/Alz_voice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Complete ADReSSo Multi-Modal Analysis Pipeline Steps

## Project Overview
This project implements a multi-modal machine learning pipeline for the ADReSSo task (Alzheimer's Dementia Recognition through Spontaneous Speech only). The system combines audio processing, text analysis, and deep learning architectures to classify speech samples as either cognitively normal (CN) or showing signs of Alzheimer's disease (AD).

## Pipeline Architecture Components

### Core Models Used:
- **Graph-based Attention Module**: For semantic relationship modeling
- **Vision Transformer (ViT)**: For spectrogram analysis
- **U-Net**: For audio feature processing
- **AlexNet**: For additional feature extraction
- **BERT**: For text processing and linguistic analysis
- **Wav2Vec2**: For audio feature extraction

---

## Step-by-Step Pipeline Process

### Step 0: Environment Setup and Data Preparation
**Purpose**: Initialize the analysis environment and prepare the dataset

**Actions**:
- Mount Google Drive
- Install required packages: `librosa`, `soundfile`, `opensmile`, `speechbrain`, `transformers`, `torch`, `openai-whisper`, `pandas`, `numpy`, `matplotlib`, `seaborn`, `torch-geometric`
- Set up base directory structure
- Initialize output directories for results

**Key Files**:
- Audio files organized by categories (diagnosis_ad, diagnosis_cn, progression_decline, progression_no_decline)
- Configuration files and model checkpoints

### Step 1: Audio File Discovery and Organization
**Purpose**: Scan and categorize all available audio files

**Actions**:
- Recursively search for audio files (.wav, .mp3, .m4a, .flac)
- Categorize files based on directory structure:
  - `diagnosis_ad/`: Alzheimer's disease diagnosis files
  - `diagnosis_cn/`: Cognitively normal diagnosis files
  - `progression_decline/`: Disease progression (decline) files
  - `progression_no_decline/`: Disease progression (stable) files
- Generate file inventory and statistics

**Output**: Dictionary of categorized audio file paths
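
The discovery step above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function name `discover_audio_files` is hypothetical, and it assumes each file's category is the first matching folder name on its path below the base directory.

```python
from collections import defaultdict
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}
CATEGORIES = (
    "diagnosis_ad",
    "diagnosis_cn",
    "progression_decline",
    "progression_no_decline",
)


def discover_audio_files(base_path):
    """Return {category: sorted list of audio paths} found under base_path."""
    base = Path(base_path)
    found = defaultdict(list)
    for path in base.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        # Look only at folder names below base_path, not at the file name itself.
        folders = path.relative_to(base).parts[:-1]
        category = next((part for part in folders if part in CATEGORIES), None)
        if category is not None:
            found[category].append(path)
    return {category: sorted(paths) for category, paths in found.items()}
```

Audio files outside the four category folders are ignored here; logging them instead may be preferable when building the file inventory.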

### Step 2: Audio Processing and Feature Extraction
**Purpose**: Extract comprehensive acoustic features from audio files

**Feature Types Extracted**:
- **Wav2Vec2 Features**: Deep learning-based audio representations
- **Mel-frequency Cepstral Coefficients (MFCCs)**: Traditional audio features
- **Spectral Features**: Spectral centroid, bandwidth, rolloff
- **Prosodic Features**: Pitch, energy, rhythm patterns
- **OpenSMILE Features**: Comprehensive acoustic feature set

**Processing Steps**:
- Load and preprocess audio files
- Extract multi-dimensional feature vectors
- Apply feature normalization and scaling
- Generate mel-spectrograms for visual analysis
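
As an illustration of the normalization step, the sketch below standardizes each feature column to zero mean and unit variance using only the standard library. The function name `zscore_columns` is hypothetical, and z-scoring is an assumption: the project may use a different scaler.

```python
from statistics import fmean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Constant columns have zero spread; divide by 1.0 so they map to 0.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    scaled = [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]
    return scaled, means, stds
```

The returned `means` and `stds` should come from the training split only and then be reused on validation and test data, so that no test statistics leak into training.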

### Step 3: Speech-to-Text Conversion
**Purpose**: Convert audio to text for linguistic analysis

**Tools Used**:
- **OpenAI Whisper**: For high-quality speech transcription
- **Alternative ASR systems**: Fallback options for different audio qualities

**Processing**:
- Transcribe all audio files to text
- Handle different audio qualities and accents
- Store transcriptions with confidence scores
- Generate metadata for each transcription
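
Whisper does not return a single confidence score. One common proxy is the per-segment `avg_logprob` field in the dictionary returned by `transcribe()`. The sketch below turns such a result into a compact record; the helper name `summarize_transcription` and the choice of averaging `exp(avg_logprob)` are assumptions for illustration, not the notebook's method.

```python
import math


def summarize_transcription(result):
    """Reduce a Whisper-style result dict to text plus a rough confidence."""
    segments = result.get("segments", [])
    if segments:
        # exp(avg_logprob) is roughly the geometric-mean token probability
        # of a segment; average it across segments.
        confidence = sum(math.exp(s["avg_logprob"]) for s in segments) / len(segments)
    else:
        confidence = 0.0
    return {
        "text": result.get("text", "").strip(),
        "language": result.get("language"),
        "num_segments": len(segments),
        "confidence": confidence,
    }
```

In use, `result` would be the output of something like `whisper.load_model("base").transcribe(path)`; the model size here is only an example.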

### Step 4: Linguistic Feature Analysis
**Purpose**: Extract detailed linguistic and semantic features from transcribed text

**Feature Categories**:
- **Semantic Features**: Word embeddings, semantic density
- **Syntactic Features**: Part-of-speech patterns, sentence structure
- **Lexical Features**: Vocabulary diversity, word frequency
- **Discourse Features**: Coherence, topic transitions
- **Fluency Measures**: Pause patterns, disfluencies

**Processing**:
- Use BERT for semantic embeddings
- Apply NLP tools for syntactic analysis
- Calculate linguistic complexity metrics
- Extract discourse markers and patterns
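
Two of the simpler measures above, vocabulary diversity and disfluency rate, can be sketched without any NLP library. The function name, the regex tokenizer, and the filler-word list are illustrative assumptions; the notebook may compute these differently.

```python
import re

# Illustrative filler list, not taken from the project.
FILLERS = {"uh", "um", "er", "erm", "hmm"}


def lexical_features(transcript):
    """Compute token count, type-token ratio, and filler rate for a transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return {"num_tokens": 0, "type_token_ratio": 0.0, "filler_rate": 0.0}
    fillers = sum(token in FILLERS for token in tokens)
    return {
        "num_tokens": len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "filler_rate": fillers / len(tokens),
    }
```

Type-token ratio depends on transcript length, so comparisons between speakers are more reliable when transcripts are of similar length or the ratio is computed over fixed-size windows.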

### Step 5: Comprehensive Analysis and Visualization
**Purpose**: Analyze extracted features and generate comprehensive reports

**Analysis Types**:
- **Statistical Analysis**: Feature distributions, correlations
- **Visualization**: Feature plots, spectrograms, linguistic patterns
- **Comparative Analysis**: AD vs CN differences
- **Quality Assessment**: Data quality metrics

**Outputs**:
- Feature correlation matrices
- Statistical summary reports
- Visualization plots and charts
- Data quality assessments
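
For the AD vs CN comparison, an effect size is often more informative than a raw mean difference. The sketch below computes Cohen's d with a pooled standard deviation; it is one possible choice of statistic, not necessarily the one used in the notebook, and it assumes each group has at least two values.

```python
from statistics import fmean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("both groups are constant; effect size is undefined")
    return (fmean(group_a) - fmean(group_b)) / pooled_var ** 0.5
```

A positive value means the first group has the higher mean for that feature.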

### Step 6: Multi-Modal Model Architecture Definition
**Purpose**: Define the complex neural network architecture for classification

**Architecture Components**:
```
MultiModalADReSSoModel:
├── Graph Attention Module
│   ├── Semantic graph construction
│   └── Graph attention networks
├── Vision Transformer Module
│   ├── Spectrogram patch embedding
│   └── Transformer encoder layers
├── U-Net Module
│   ├── Audio signal processing
│   └── Feature extraction layers
├── AlexNet Module
│   ├── Convolutional feature extraction
│   └── Classification layers
├── BERT Module
│   ├── Text embedding
│   └── Linguistic feature extraction
└── Fusion Layer
    ├── Multi-modal feature fusion
    └── Final classification
```

**Key Parameters**:
- Audio feature dimension: 768 (Wav2Vec2)
- Text feature dimension: 768 (BERT)
- Spectrogram height: 80 (Mel bins)
- Number of classes: 2 (AD vs CN)
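
The parameters above can be gathered in a small configuration object, which also makes derived sizes explicit. This is a sketch: `ModelConfig`, `patch_size`, and `n_frames` are hypothetical, and concatenation is only one possible fusion strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    audio_dim: int = 768    # Wav2Vec2 feature size (listed above)
    text_dim: int = 768     # BERT feature size (listed above)
    n_mels: int = 80        # spectrogram height in mel bins (listed above)
    num_classes: int = 2    # AD vs CN (listed above)
    patch_size: int = 16    # assumption: ViT patch edge, not stated in this README
    n_frames: int = 256     # assumption: spectrogram width after padding or cropping

    def num_patches(self):
        """Number of non-overlapping ViT patches in one spectrogram."""
        if self.n_mels % self.patch_size or self.n_frames % self.patch_size:
            raise ValueError("spectrogram size must be divisible by patch_size")
        return (self.n_mels // self.patch_size) * (self.n_frames // self.patch_size)

    def concat_fusion_dim(self):
        """Input width of a fusion layer that concatenates audio and text vectors."""
        return self.audio_dim + self.text_dim
```

With these defaults an 80 x 256 spectrogram splits into 5 x 16 = 80 patches, and a concatenating fusion layer would take a 1536-dimensional input.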

### Step 7: Model Training
**Purpose**: Train the multi-modal model on the processed dataset

**Training Configuration**:
- **Batch Size**: 8 (adjustable based on GPU memory)
- **Epochs**: 30 (with early stopping)
- **Learning Rate**: Adaptive with scheduler
- **Optimization**: Adam optimizer
- **Loss Function**: Cross-entropy loss

**Data Splitting**:
- Training: 60% of data
- Validation: 20% of data
- Testing: 20% of data
- Stratified splitting to maintain class balance
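
A stratified 60/20/20 split can be sketched as follows. The function name and the fixed seed are assumptions; in practice `sklearn.model_selection.train_test_split` with its `stratify` argument is a common alternative.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train, val, test) index lists with similar class ratios."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test
```

Each class is shuffled and cut separately, so every split keeps roughly the same AD to CN ratio as the full dataset.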

**Training Process**:
- Initialize model with random weights
- Create data loaders for each split
- Implement training loop with validation
- Save best model based on validation performance
- Monitor training metrics and convergence
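
The "save best model" and early-stopping logic can be kept separate from the training loop itself. The helper below is a framework-independent sketch; the class name, `patience`, and `min_delta` are illustrative and not taken from the notebook.

```python
class EarlyStopping:
    """Track the best validation loss and signal when to stop training."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Return (is_best, should_stop) for the latest validation loss."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience
```

Inside the epoch loop, call `step()` once after validation: save the model weights when `is_best` is true and break out of the loop when `should_stop` is true.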

### Step 8: Model Evaluation and Semantic Analysis
**Purpose**: Evaluate model performance and analyze semantic relationships

**Evaluation Metrics**:
- **Accuracy**: Overall classification accuracy
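- **Precision**: Positive predictive value (share of samples predicted as AD that are actually AD)
- **Recall**: Sensitivity, or true positive rate (share of actual AD samples predicted as AD)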
- **F1-Score**: Harmonic mean of precision and recall
- **ROC AUC**: Area under ROC curve
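
The threshold-based metrics can be computed directly from prediction counts. The sketch below treats AD as the positive class, which is an assumption; `sklearn.metrics` provides equivalent functions and also covers ROC AUC, which needs predicted scores rather than hard labels. The function expects at least one sample.

```python
def binary_metrics(y_true, y_pred, positive="AD"):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```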

**Semantic Analysis**:
- Visualize semantic relationships between audio and text features
- Generate semantic graphs showing feature correlations
- Analyze modality contributions to predictions
- Create interpretability visualizations

**Analysis Outputs**:
- Confusion matrices
- ROC curves
- Feature importance plots
- Semantic relationship graphs
- Detailed classification reports

### Step 9: Checkpointing and Incremental Processing
**Purpose**: Implement a robust checkpointing system for large-scale processing

**Checkpointing Features**:
- **Incremental Processing**: Resume from last checkpoint
- **Individual Feature Saving**: Save features for each file separately
- **Progress Tracking**: Monitor processing status
- **Error Recovery**: Handle processing failures gracefully

**Checkpoint Structure**:
```
checkpoints/
├── checkpoint.pkl (main checkpoint file)
├── features/ (individual feature files)
│   ├── diagnosis_ad_file1_features.pkl
│   ├── diagnosis_cn_file1_features.pkl
│   └── ...
└── logs/ (processing logs)
```
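
The resume-and-recover behaviour can be sketched with `pickle` and `pathlib`. This is a simplified illustration, not the project's `FeatureExtractionCheckpointer`: the function name is hypothetical, and results are stored inside the single checkpoint file rather than in separate per-file feature files.

```python
import pickle
from pathlib import Path


def process_with_checkpoint(files, extract, checkpoint_path):
    """Run extract(file) for each file, skipping files finished in earlier runs."""
    checkpoint_path = Path(checkpoint_path)
    state = {"done": {}, "failed": {}}
    if checkpoint_path.exists():
        with checkpoint_path.open("rb") as handle:
            state = pickle.load(handle)

    for name in files:
        if name in state["done"]:
            continue  # already processed in a previous run
        try:
            state["done"][name] = extract(name)
            state["failed"].pop(name, None)
        except Exception as error:  # keep going; record the failure for a retry
            state["failed"][name] = repr(error)
        # Write to a temporary file first so an interruption
        # cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path.with_suffix(".tmp")
        with tmp_path.open("wb") as handle:
            pickle.dump(state, handle)
        tmp_path.replace(checkpoint_path)
    return state
```

Files that failed are retried on the next run because they never enter `done`. Pickle files should only be loaded when they come from a trusted source, such as your own earlier runs.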

---

## Advanced Features

### Semantic Graph Visualization
- Create networkx graphs showing relationships between audio and text features
- Visualize cosine similarity between modalities
- Generate interactive relationship plots
- Analyze semantic coherence between speech and content
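
The edges of such a graph can be derived from pairwise cosine similarity. The sketch below assumes the audio and text vectors have already been projected into a shared space of equal dimension; the function names and the `threshold` value are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_edges(audio_vectors, text_vectors, threshold=0.5):
    """Return (audio_id, text_id, similarity) for pairs at or above threshold."""
    edges = []
    for audio_id, audio_vec in audio_vectors.items():
        for text_id, text_vec in text_vectors.items():
            score = cosine_similarity(audio_vec, text_vec)
            if score >= threshold:
                edges.append((audio_id, text_id, score))
    return edges
```

The resulting triples can be passed to `networkx.Graph.add_weighted_edges_from` to build the graph for plotting.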

### Feature Importance Analysis
- Calculate contribution of each modality to final predictions
- Analyze which features are most discriminative
- Generate feature importance rankings
- Create modality-specific performance metrics

### Comprehensive Reporting
- Generate detailed evaluation reports
- Create performance summaries by category
- Analyze misclassification patterns
- Provide confidence-based analysis

---

## Usage Instructions

### Basic Usage:
```python
# Initialize the extended analyzer
ExtendedAnalyzer = extend_analyzer_with_model()
analyzer = ExtendedAnalyzer(base_path="/path/to/ADReSSo21")

# Create checkpointer
checkpointer = FeatureExtractionCheckpointer(analyzer)

# Run complete pipeline
results = checkpointer.run_pipeline_with_checkpoints(
    num_epochs=30,
    batch_size=8
)
```

### Advanced Configuration:
```python
# Custom training parameters
results = checkpointer.run_pipeline_with_checkpoints(
    num_epochs=50,
    batch_size=4  # Reduce for limited GPU memory
)

# Individual steps
analyzer.step_6_define_model_architecture()
analyzer.step_7_train_model(features_dict, linguistic_features)
analyzer.step_8_evaluate_model(visualize_graphs=True)
```

---

## Output Files and Results

### Generated Files:
- `detailed_evaluation_results.csv`: Per-sample predictions and confidence scores
- `evaluation_summary.json`: Overall performance metrics
- `semantic_graph_*.png`: Semantic relationship visualizations
- `best_adresso_model.pth`: Trained model weights
- `checkpoint.pkl`: Processing checkpoint data
- Individual feature files for each audio sample

### Key Metrics Tracked:
- Overall classification accuracy
- Per-category performance (AD, CN, decline, stable)
- Confidence distributions
- Misclassification analysis
- Feature importance by modality
- Semantic relationship strengths

Together, these steps cover the full workflow for Alzheimer's dementia recognition from spontaneous speech: feature extraction from audio and transcripts, multi-modal model training, evaluation, and checkpointed processing.