# Notebook B: Feature Extraction (Robust Multi-Strategy)

## ⚠️ IMPORTANT
**Ensure you are running `notebooks/B_feature_extraction.ipynb`, NOT a file in `.ipynb_checkpoints`.**

## Overview
This notebook generates the audio embeddings consumed by the downstream classifiers. It uses a **Multi-Strategy Loading Mechanism** so that a model-loading failure in one environment does not block the pipeline:
1.  **Strategy A**: Try loading **HeAR** (Health Acoustic Representations) using `transformers` (Recommended).
2.  **Strategy B**: Try loading **HeAR** as a native TensorFlow SavedModel with `tf.saved_model.load`, from a Hugging Face snapshot (Backup).
3.  **Strategy C**: Fall back to **YAMNet** (Google's audio event model) via TensorFlow Hub if HeAR fails.

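This ensures that the rest of the pipeline, including Notebook C and validation, can proceed even if the primary HeAR model cannot be loaded because of compatibility issues with the installed Python/TensorFlow versions (for example Python 3.13/TF 2.16). In the run recorded below, which reports TensorFlow 2.20.0, Strategies A and B both failed and YAMNet was selected.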

In [1]:
# Install necessary packages including transformers and tensorflow_hub
!pip install transformers tensorflow_hub tf-keras

import os
from pathlib import Path
import numpy as np
import pandas as pd
import librosa
import tensorflow as tf
import tensorflow_hub as hub
from tqdm.notebook import tqdm
import json
import pickle

DATASETS_ROOT = Path(r"D:\datasets")
PROCESSED_ROOT = DATASETS_ROOT / 'processed'
EMBEDDINGS_DIR = DATASETS_ROOT / 'embeddings'
EMBEDDINGS_DIR.mkdir(parents=True, exist_ok=True)

TARGET_SR = 16000
N_SAMPLES = 32000  # 2 seconds at 16kHz

print(f"Embeddings output: {EMBEDDINGS_DIR}")
print(f"TensorFlow version: {tf.__version__}")

Defaulting to user installation because normal site-packages is not writeable

Embeddings output: D:\datasets\embeddings
TensorFlow version: 2.20.0


In [2]:
# Login to Hugging Face (Required for HeAR)
from huggingface_hub import notebook_login
try:
    from huggingface_hub import get_token
except ImportError:
    from huggingface_hub import HfFolder
    get_token = HfFolder.get_token

if get_token() is None:
    print("Please login to Hugging Face to access HeAR model:")
    notebook_login()
else:
    print("✓ Already logged in to Hugging Face")

✓ Already logged in to Hugging Face


In [3]:
# --- MODEL LOADING SECTION ---
model_name = None
embedding_model = None
EMBEDDING_DIM = 0

# Strategy A: Transformers (HeAR)
try:
    print("\n--- Attempting Strategy A: HeAR via Transformers ---")
    from transformers import TFAutoModel
    embedding_model = TFAutoModel.from_pretrained("google/hear", trust_remote_code=True)
    model_name = "HeAR_Transformers"
    EMBEDDING_DIM = 768  # Assumption; verify against the model's actual output shape
    print("✅ Success: Loaded HeAR via Transformers")
except Exception as e:
    print(f"❌ Failed Strategy A: {e}")

# Strategy B: Native TF (HeAR) - Only if A failed
if model_name is None:
    try:
        print("\n--- Attempting Strategy B: HeAR via Native TF/Hub ---")
        from huggingface_hub import snapshot_download
        path = snapshot_download("google/hear")
        # Try the sub-model first, then fall back to the repository root
        sub_path = os.path.join(path, "event_detector", "event_detector_large")
        try:
            embedding_model = tf.saved_model.load(sub_path, tags=['serve'])
            model_name = "HeAR_SubModel"
        except Exception:
            # Sub-model failed to load; try the root SavedModel instead
            embedding_model = tf.saved_model.load(path)
            model_name = "HeAR_Root"

        EMBEDDING_DIM = 768  # Assumption; verify against the model's actual output shape
        print(f"✅ Success: Loaded HeAR via Native TF ({model_name})")
    except Exception as e:
        print(f"❌ Failed Strategy B: {e}")

# Strategy C: YAMNet Fallback - If A and B failed
if model_name is None:
    try:
        print("\n--- Attempting Strategy C: YAMNet Fallback (TFHub) ---")
        # YAMNet is a robust, widely used general-purpose audio feature extractor
        embedding_model = hub.load('https://tfhub.dev/google/yamnet/1')
        model_name = "YAMNet"
        EMBEDDING_DIM = 1024  # YAMNet embeddings are 1024-dimensional
        print("✅ Success: Loaded YAMNet explicitly.")
        print("⚠️ NOTE: Using YAMNet instead of HeAR due to environment compatibility issues.")
        print("   This allows you to proceed with the pipeline. You can switch back later.")
    except Exception as e:
        print(f"❌ Failed Strategy C: {e}")
        raise RuntimeError("Could not load ANY feature extraction model. Check internet/installation.") from e

print(f"\nFINAL MODEL SELECTION: {model_name} (Dim: {EMBEDDING_DIM})")


--- Attempting Strategy A: HeAR via Transformers ---
❌ Failed Strategy A: Unrecognized model in google/hear. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, fl

❌ Failed Strategy B: Importing a SavedModel with `tf.saved_model.load` requires a `tags=` argument if there is more than one MetaGraph. Got `tags=None`, but there are 0 MetaGraphs in the SavedModel with tag sets: []. Pass a `tags=` argument to load this SavedModel.

--- Attempting Strategy C: YAMNet Fallback (TFHub) ---
✅ Success: Loaded YAMNet explicitly.
⚠️ NOTE: Using YAMNet instead of HeAR due to environment compatibility issues.
   This allows you to proceed with the pipeline. You can switch back later.

FINAL MODEL SELECTION: YAMNet (Dim: 1024)
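
The embedding dimensions set in the loading cell are hardcoded, and the HeAR values are marked as assumptions. One way to avoid relying on them is to probe the selected extractor with a dummy batch and read the width from the output. The sketch below assumes an extractor that maps a `(batch, samples)` array to a `(batch, dim)` array; `infer_embedding_dim` and `fake_embed` are hypothetical names used only for this illustration.

```python
import numpy as np


def infer_embedding_dim(embed_fn, n_samples=32000, batch_size=2):
    """Run a silent dummy batch through embed_fn and return the output width."""
    dummy = np.zeros((batch_size, n_samples), dtype=np.float32)
    out = np.asarray(embed_fn(dummy))
    if out.ndim != 2 or out.shape[0] != batch_size:
        raise ValueError(f"Expected shape ({batch_size}, dim), got {out.shape}")
    return int(out.shape[1])


# Stand-in extractor for illustration: returns 1024 values per clip
def fake_embed(batch):
    return np.zeros((len(batch), 1024), dtype=np.float32)


print(infer_embedding_dim(fake_embed))  # 1024
```

In this notebook, the batch extraction function defined in the next cell could be passed as `embed_fn` once a model has been selected.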


In [4]:
def load_audio_fixed(file_path):
    """Load audio and ensure it's exactly 32000 samples (2s @ 16kHz)"""
    audio, _ = librosa.load(str(file_path), sr=TARGET_SR, mono=True)
    if len(audio) < N_SAMPLES:
        audio = np.pad(audio, (0, N_SAMPLES - len(audio)))
    return audio[:N_SAMPLES].astype(np.float32)

def extract_embeddings_batch(audio_batch):
    """Extract embeddings correctly for the selected model"""
    audio_array = np.array(audio_batch, dtype=np.float32)
    
    # Logic for HeAR Transformers
    if model_name == "HeAR_Transformers":
        audio_tensor = tf.convert_to_tensor(audio_array)
        # Transformers model often callable directly
        out = embedding_model(audio_tensor)
        # Use pooler_output if available, else mean of last_hidden_state
        if hasattr(out, 'pooler_output'):
            return out.pooler_output.numpy()
        if hasattr(out, 'last_hidden_state'):
            return tf.reduce_mean(out.last_hidden_state, axis=1).numpy()
        return out[0].numpy()

    # Logic for HeAR Native
    elif "HeAR" in model_name:
        if hasattr(embedding_model, 'signatures') and 'serving_default' in embedding_model.signatures:
            fn = embedding_model.signatures['serving_default']
            res = fn(tf.convert_to_tensor(audio_array))
            return list(res.values())[0].numpy()
        else:
            return embedding_model(audio_array).numpy()

    # Logic for YAMNet
    elif model_name == "YAMNet":
        # The TF Hub YAMNet model takes a single 1-D waveform, not a batch,
        # so process the clips one at a time.
        batch_embs = []
        for waveform in audio_array:
            _, embeddings, _ = embedding_model(waveform)
            # embeddings has shape (num_frames, 1024); average over frames
            # to get one vector per clip.
            avg_emb = tf.reduce_mean(embeddings, axis=0)
            batch_embs.append(avg_emb.numpy())
        return np.stack(batch_embs)

    # No branch matched: fail loudly instead of returning all-zero embeddings
    raise ValueError(f"Unsupported model_name: {model_name!r}")
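
The two shape-handling steps in the cell above, fixing every clip to 2 seconds and averaging frame-level embeddings into one vector, can be checked in isolation with NumPy alone. This is a small sketch on synthetic arrays; `fix_length` and `mean_pool` are hypothetical helper names that mirror the notebook's logic without loading audio or a model.

```python
import numpy as np

N_SAMPLES = 32000  # 2 seconds at 16 kHz, as in the notebook


def fix_length(audio, n_samples=N_SAMPLES):
    """Zero-pad at the end or truncate so the clip has exactly n_samples."""
    if len(audio) < n_samples:
        audio = np.pad(audio, (0, n_samples - len(audio)))
    return audio[:n_samples].astype(np.float32)


def mean_pool(frame_embeddings):
    """Collapse (num_frames, dim) frame embeddings into one (dim,) vector."""
    return np.asarray(frame_embeddings).mean(axis=0)


short = fix_length(np.ones(10000))   # padded with trailing zeros
long = fix_length(np.ones(50000))    # truncated to the first 2 seconds
pooled = mean_pool(np.array([[1.0, 3.0], [3.0, 5.0]]))
print(short.shape, long.shape, pooled)
```

Note that frames computed over the zero-padded tail are included in the average, so very short clips yield embeddings that are partly driven by silence.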

In [5]:
def process_dataset_embeddings(dataset_name, batch_size=32):
    """Process all audio files from a dataset and extract embeddings"""
    input_dir = PROCESSED_ROOT / dataset_name
    if not input_dir.exists():
        print(f"⚠ {dataset_name}: Not found at {input_dir}")
        return None

    wav_files = sorted(input_dir.glob("*.wav"))
    if not wav_files:
        print(f"⚠ {dataset_name}: No WAV files in {input_dir}")
        return None

    print(f"\nProcessing {dataset_name}: {len(wav_files)} files")

    embeddings_list = []
    file_names = []

    # YAMNet is run one clip at a time inside extract_embeddings_batch, so
    # batch_size mainly controls how many files are loaded into memory at once
    for i in tqdm(range(0, len(wav_files), batch_size), desc=f"Extracting {dataset_name}"):
        batch_files = wav_files[i:i+batch_size]
        batch_audio = [load_audio_fixed(f) for f in batch_files]
        try:
            batch_embeddings = extract_embeddings_batch(batch_audio)
            embeddings_list.append(batch_embeddings)
            file_names.extend([f.stem for f in batch_files])
        except Exception as e:
            print(f"Error processing batch starting at {i}: {e}")
            continue

    if not embeddings_list:
        return 0

    embeddings = np.vstack(embeddings_list)

    output_path = EMBEDDINGS_DIR / f"{dataset_name}_embeddings.npz"
    np.savez_compressed(output_path, embeddings=embeddings, file_names=file_names)

    print(f"✓ {dataset_name}: {embeddings.shape[0]} embeddings saved")
    return embeddings.shape[0]

In [6]:
# Process all datasets
datasets = ['coughvid', 'parkinsons', 'respiratory_sounds', 'coswara']
results = {}

for name in datasets:
    results[name] = process_dataset_embeddings(name)

# Save summary
summary = {
    'embedding_dim': EMBEDDING_DIM, 
    'sample_rate': TARGET_SR, 
    'model_name': model_name,
    'datasets': results
}
with open(EMBEDDINGS_DIR / 'embeddings_summary.json', 'w') as f:
    json.dump(summary, f, indent=2)

print("\n" + "=" * 50)
total = sum(v for v in results.values() if v)
print(f"Total embeddings: {total}")
print(f"Output: {EMBEDDINGS_DIR}")
print("\nProceed to: C_multi_task_classifier.ipynb")


Processing coughvid: 11319 files
✓ coughvid: 11319 embeddings saved

Processing parkinsons: 5 files
✓ parkinsons: 5 embeddings saved

Processing respiratory_sounds: 9841 files
✓ respiratory_sounds: 9841 embeddings saved

Processing coswara: 5 files
✓ coswara: 5 embeddings saved

Total embeddings: 21170
Output: D:\datasets\embeddings

Proceed to: C_multi_task_classifier.ipynb
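
Each dataset is written as a compressed `.npz` archive with two arrays, `embeddings` and `file_names`. A downstream notebook could read them back roughly as follows. This is a sketch on synthetic data written to a temporary directory; `load_embeddings` is a hypothetical helper, and only the two array keys are taken from the saving code above.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_embeddings(npz_path):
    """Return (embeddings, file_names) from an archive written by np.savez_compressed."""
    with np.load(npz_path) as data:
        embeddings = data["embeddings"]
        file_names = data["file_names"]
    # Each embedding row must correspond to exactly one file name
    if len(embeddings) != len(file_names):
        raise ValueError("embeddings and file_names are misaligned")
    return embeddings, file_names


# Round trip on synthetic data (3 clips, 1024-dimensional vectors)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_embeddings.npz"
    np.savez_compressed(
        path,
        embeddings=np.zeros((3, 1024), dtype=np.float32),
        file_names=["a", "b", "c"],
    )
    emb, names = load_embeddings(path)

print(emb.shape, names.tolist())
```

Because batches that raise an error are skipped during extraction, the number of rows in an archive can be smaller than the number of WAV files, so joining labels by `file_names` is safer than joining by position.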
