<a href="https://colab.research.google.com/github/fjadidi2001/AD_Prediction/blob/main/Speech_only.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Step 1: Setting Up the Environment in Google Colab

In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [3]:
# Install required libraries
!pip install opensmile speechbrain librosa



In [4]:
# Extract .tgz files
import tarfile
import os

data_path = '/content/drive/MyDrive/Voice/'  # Adjust to your dataset path
tgz_files = [
    'ADReSSo21-progression-train.tgz',
    'ADReSSo21-progression-test.tgz',
    'ADReSSo21-diagnosis-train.tgz'
]

for tgz in tgz_files:
    # The context manager closes the archive even if extraction fails
    with tarfile.open(os.path.join(data_path, tgz), 'r:gz') as tar:
        tar.extractall('/content/data')

# Step 2: Loading and Preprocessing Datasets

## Load Label Files

In [5]:
import pandas as pd

# Load label files
task1 = pd.read_csv(os.path.join(data_path, 'task1.csv'))  # AD classification labels
task2 = pd.read_csv(os.path.join(data_path, 'task2.csv'))  # MMSE regression labels
task3 = pd.read_csv(os.path.join(data_path, 'task3.csv'))  # Cognitive decline labels

In [6]:
task1.head()

| Unnamed: 0 | ID | Dx |
|---|---|---|
| 0 | adrsdt15 | Control |
| 1 | adrsdt40 | Control |
| 2 | adrsdt26 | Control |
| 3 | adrsdt67 | Control |
| 4 | adrsdt58 | Control |


- task1.csv: ID and AD/Control labels for diagnosis (AD classification).
- task2.csv: ID and MMSE scores for regression.
- task3.csv: ID and Decline/Non-Decline labels for cognitive decline prediction.

## Map Audio Files to Labels

In [7]:
# Function to map audio files to labels
def load_audio_label_pairs(audio_dir, label_df, label_col, id_col='ID'):
    audio_files = []
    labels = []
    for _, row in label_df.iterrows():
        audio_path = os.path.join(audio_dir, f"{row[id_col]}.wav")
        if os.path.exists(audio_path):
            audio_files.append(audio_path)
            labels.append(row[label_col])
    return audio_files, labels

# AD Classification (Cookie Theft, AD vs CN)
ad_audio_dir = '/content/data/ADReSSo21/diagnosis/train/audio'
ad_audio_files, ad_labels = load_audio_label_pairs(
    ad_audio_dir, task1, 'Dx'
)

# MMSE Regression (same audio files as AD classification)
mmse_audio_files, mmse_labels = load_audio_label_pairs(
    ad_audio_dir, task2, 'MMSE'
)

# Cognitive Decline (Category Fluency Task)
prog_audio_dir = '/content/data/ADReSSo21/progression/train/audio'
decline_audio_files, decline_labels = load_audio_label_pairs(
    prog_audio_dir, task3, 'Decline'
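)

# Report how many label rows were matched to an audio file; a low count
# usually means the audio directory or the ID naming does not match the CSV
print(f"AD: {len(ad_audio_files)} of {len(task1)} IDs matched")
print(f"MMSE: {len(mmse_audio_files)} of {len(task2)} IDs matched")
print(f"Decline: {len(decline_audio_files)} of {len(task3)} IDs matched")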

- For AD classification, use the audio files from ADReSSo21-diagnosis-train.tgz with task1.csv.
- For MMSE regression, use the same audio files as AD classification, paired with task2.csv.
- For cognitive decline, use the audio files from ADReSSo21-progression-train.tgz with task3.csv.
- Check that every ID in a CSV file has a matching audio file. load_audio_label_pairs silently skips IDs without audio, so compare the number of matched files with the number of rows in each label file.
- The segmentation directories contain CSV files with speaker segment timings rather than transcripts. They can be used to isolate the participant's speech from the interviewer's before transcription or feature extraction.
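The matching step above assumes each recording sits directly in the audio directory and is named exactly <ID>.wav. If an extracted archive instead stores recordings in per-class subfolders (for example ad/ and cn/; check the extracted tree first), a recursive index avoids silent misses and makes the unmatched IDs explicit. This is a minimal sketch; the helper names index_wav_files and match_ids are hypothetical, not part of any library.

```python
import os


def index_wav_files(root):
    """Map each .wav file's stem (e.g. 'adrso024') to its full path, searching all subfolders."""
    index = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext.lower() == '.wav':
                index[stem] = os.path.join(dirpath, name)
    return index


def match_ids(ids, labels, wav_index):
    """Pair IDs with audio paths; return matched paths, their labels, and the unmatched IDs."""
    paths, kept_labels, missing = [], [], []
    for file_id, label in zip(ids, labels):
        path = wav_index.get(str(file_id))
        if path is None:
            missing.append(file_id)
        else:
            paths.append(path)
            kept_labels.append(label)
    return paths, kept_labels, missing
```

For example, match_ids(task1['ID'], task1['Dx'], index_wav_files(ad_audio_dir)) returns the matched files and labels together with the IDs that still need attention.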

# Step 3: Feature Extraction

We extract two kinds of features: acoustic features computed directly from the audio, and linguistic features computed from ASR transcripts.

Acoustic Features
- eGeMAPS: the extended Geneva Minimalistic Acoustic Parameter Set, extracted with opensmile.
- Active Data Representation (ADR): not provided by opensmile, so it needs a custom implementation.

In [8]:
import opensmile
import librosa
import numpy as np

# Initialize opensmile for eGeMAPS
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals
)

# Function to extract eGeMAPS features from audio
def extract_egemaps(audio_files):
    features = []
    for audio in audio_files:
        y, sr = librosa.load(audio, sr=16000)  # Load audio
        egemaps = smile.process_signal(y, sr)
        features.append(egemaps.values.flatten())
    return np.array(features)

# Extract eGeMAPS for each task
ad_egemaps = extract_egemaps(ad_audio_files)
mmse_egemaps = extract_egemaps(mmse_audio_files)
decline_egemaps = extract_egemaps(decline_audio_files)

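- ADR Suggestion:

ADR is a clustering-based representation rather than a pre-trained model: frame-level acoustic descriptors from the training recordings are clustered (originally with a self-organising map), and each recording is then described by how often its frames fall into each cluster. It can therefore be built from opensmile low-level descriptors without external training data. For now, we proceed with eGeMAPS. Self-supervised embeddings such as wav2vec2 (available through speechbrain) are a different, complementary option, not ADR itself.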
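The clustering idea behind ADR can be sketched as follows. Assumptions: the frame-level features of each recording are already available as NumPy arrays (for instance opensmile low-level descriptors obtained with feature_level=opensmile.FeatureLevel.LowLevelDescriptors), and plain k-means stands in for the self-organising map of the original method. The function names and the number of clusters k are hypothetical choices, and the codebook should be fitted on training recordings only.

```python
import numpy as np


def _nearest_centroid(frames, centroids):
    # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2 (avoids an N x k x d array)
    d = ((frames ** 2).sum(axis=1)[:, None]
         - 2.0 * frames @ centroids.T
         + (centroids ** 2).sum(axis=1)[None, :])
    return d.argmin(axis=1)


def fit_codebook(train_recordings, k, n_iter=100, seed=0):
    """Fit k-means centroids on all training frames stacked together (k-means replaces the SOM)."""
    frames = np.vstack(train_recordings).astype(float)
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = _nearest_centroid(frames, centroids)
        new = centroids.copy()
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def adr_histograms(recordings, centroids):
    """Describe each recording by the fraction of its frames assigned to each cluster."""
    rows = []
    for frames in recordings:
        labels = _nearest_centroid(np.asarray(frames, dtype=float), centroids)
        counts = np.bincount(labels, minlength=len(centroids))
        rows.append(counts / counts.sum())
    return np.vstack(rows)
```

In practice the low-level descriptors have very different scales, so standardising them with training-set statistics before fitting the codebook is advisable.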

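- Linguistic Features:
<br>
Use automatic speech recognition (ASR) to transcribe the audio, then store the transcripts in CHAT format so they can be analysed with the CLAN programs MOR, EVAL and FREQ.
We use SpeechBrain's pre-trained CRDNN model with an RNN language model, trained on LibriSpeech (speechbrain/asr-crdnn-rnnlm-librispeech). Because LibriSpeech consists of read audiobook speech, word error rates are likely to be higher on the spontaneous speech of older adults in this dataset.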
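The recordings contain both the interviewer and the participant, so one option is to keep only the participant's segments before transcription. This is a minimal sketch, assuming the segment boundaries have already been read from the segmentation CSVs as (start, end) pairs in seconds; the column names and time units of those files are not shown in this notebook, so check them first. The helper name participant_audio is hypothetical.

```python
import numpy as np


def participant_audio(y, sr, segments):
    """Concatenate the samples of y that fall inside the given (start_s, end_s) segments."""
    pieces = []
    for start_s, end_s in segments:
        # Convert seconds to sample indices and clip to the signal length
        start = max(0, int(round(start_s * sr)))
        end = min(len(y), int(round(end_s * sr)))
        if end > start:
            pieces.append(y[start:end])
    if not pieces:
        return np.zeros(0, dtype=y.dtype)
    return np.concatenate(pieces)
```

The resulting array could be saved to a temporary WAV file (for example with soundfile.write, from the soundfile package that librosa depends on) and passed to asr_model.transcribe_file.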

In [9]:
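# speechbrain.pretrained is deprecated since SpeechBrain 1.0; the class now lives in speechbrain.inference
from speechbrain.inference.ASR import EncoderDecoderASR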

# Load pre-trained ASR model
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech"
)

# Function to transcribe audio
def transcribe_audio(audio_files):
    transcripts = []
    for audio in audio_files:
        transcription = asr_model.transcribe_file(audio)
        transcripts.append(transcription)
    return transcripts

# Transcribe audio for each task
ad_transcripts = transcribe_audio(ad_audio_files)
mmse_transcripts = transcribe_audio(mmse_audio_files)
decline_transcripts = transcribe_audio(decline_audio_files)


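## Convert to CHAT Format

CHAT is the transcription format used by the TalkBank CLAN programs. A CHAT file starts with header lines (such as @Begin, @Participants and @ID), continues with one tier per utterance (for example *PAR for the participant), and ends with @End; timing information can optionally be attached to utterances. Below is a basic example that writes each ASR transcript as a single participant utterance.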

In [10]:
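def save_to_chat(transcripts, audio_files, output_dir, task_name):
    """Write one minimal CHAT (.cha) file per recording, named after its audio ID."""
    os.makedirs(output_dir, exist_ok=True)
    for transcript, audio_file in zip(transcripts, audio_files):
        file_id = os.path.splitext(os.path.basename(audio_file))[0]
        # The LibriSpeech ASR model outputs upper-case text without punctuation.
        # CHAT reserves capitals mainly for proper nouns and each utterance needs a
        # terminator; 'xxx' is the CHAT code for unintelligible speech.
        utterance = transcript.strip().lower() or 'xxx'
        chat_file = os.path.join(output_dir, f"{task_name}_{file_id}.cha")
        with open(chat_file, 'w', encoding='utf-8') as f:
            f.write("@UTF8\n")
            f.write("@Begin\n")
            f.write("@Languages:\teng\n")
            f.write("@Participants:\tPAR Participant\n")
            # @ID fields: language|corpus|code|age|sex|group|SES|role|education|custom|
            f.write("@ID:\teng|ADReSSo21|PAR|||||Participant|||\n")
            f.write(f"*PAR:\t{utterance} .\n")
            f.write("@End\n")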

# Save transcripts to CHAT format
save_to_chat(ad_transcripts, ad_audio_files, '/content/chat/ad', 'ad')
save_to_chat(mmse_transcripts, mmse_audio_files, '/content/chat/mmse', 'mmse')
save_to_chat(decline_transcripts, decline_audio_files, '/content/chat/decline', 'decline')
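CLAN's FREQ program reports word frequencies, type and token counts, and a type-token ratio for a chosen speaker tier. As a rough Python cross-check (not a replacement for CLAN, which also handles CHAT codes, morphology and exclusion rules), the *PAR tiers of the .cha files can be summarised as below. The function name par_lexical_stats is hypothetical, and tokenisation is simple whitespace splitting.

```python
from collections import Counter


def par_lexical_stats(cha_text):
    """Count word tokens and types on the *PAR tiers of a CHAT transcript."""
    words = []
    for line in cha_text.splitlines():
        if line.startswith('*PAR:'):
            utterance = line[len('*PAR:'):]
            # Drop CHAT utterance terminators and the 'xxx' unintelligible-speech code
            tokens = [t for t in utterance.split() if t not in {'.', '?', '!', 'xxx'}]
            words.extend(t.lower() for t in tokens)
    counts = Counter(words)
    n_tokens = sum(counts.values())
    ttr = len(counts) / n_tokens if n_tokens else 0.0
    return {'tokens': n_tokens, 'types': len(counts), 'ttr': ttr, 'freq': counts}
```

For a saved file, pass its contents, e.g. par_lexical_stats(open(path, encoding='utf-8').read()).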