# End-to-End Threat Detection with YAMNet + Audio Augmentation

This notebook implements a complete pipeline from raw audio to trained classifier with comprehensive augmentation.

**Pipeline:**
1. Download raw audio files from S3
2. Load and validate audio files
3. **Apply audio augmentation** (time stretch, pitch shift, noise, environmental mixing)
4. Extract YAMNet embeddings (1024-dimensional features)
5. Split into train/validation sets
6. Train dense classifier on embeddings
7. Evaluate and deploy

**Why YAMNet?**
- Pre-trained on AudioSet (2M+ audio clips, 521 classes)
- Designed specifically for audio event detection
- Takes raw audio waveforms as input (16 kHz mono, float32 in [-1, 1]); no spectrogram preprocessing needed
- Strong transfer learning for audio tasks

**Augmentation Strategy:**
- **2x augmentation factor** for training data (matches preprocessing approach)
- Environmental mixing for THREAT/THREAT_CONTEXT (realistic field conditions)
- Time/pitch variations for robustness
- Validation data: NO augmentation (clean evaluation)

**Storage Requirements (Kaggle):**
- Raw audio: ~2GB
- Augmented embeddings in RAM: ~40MB
- Models: ~10MB
- **Total: ~3GB (well under 20GB Kaggle limit) ✅**

## 1. Install Required Libraries

In [None]:
!pip install -q librosa soundfile awscli boto3 tensorflow-hub

print("✅ All libraries installed successfully!")
print("📦 TensorFlow Hub installed for YAMNet model")

## 2. Configure AWS S3 Access

**Add secrets in Kaggle:**
1. Settings → Add-ons → Secrets
2. Add: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`

In [None]:
import os
from kaggle_secrets import UserSecretsClient

# Load AWS credentials from Kaggle Secrets
user_secrets = UserSecretsClient()

try:
    os.environ['AWS_ACCESS_KEY_ID'] = user_secrets.get_secret('AWS_ACCESS_KEY_ID')
    os.environ['AWS_SECRET_ACCESS_KEY'] = user_secrets.get_secret('AWS_SECRET_ACCESS_KEY')
    os.environ['AWS_DEFAULT_REGION'] = user_secrets.get_secret('AWS_REGION')
    print("✅ AWS credentials loaded from Kaggle secrets")
except Exception:
    print("⚠️  Kaggle secrets not found. Add them in Settings → Add-ons → Secrets")
    raise

# Verify AWS access
!aws s3 ls s3://alertreck/

## 3. Download Raw Audio Files from S3

Download the original audio files (not preprocessed data).

**Note:** `aws s3 sync` is idempotent - if download is interrupted, you can safely re-run this cell. It will only download missing/changed files, not duplicate existing ones.

## 3A. Kaggle Storage Verification

Verify we have sufficient storage for this approach.

In [None]:
print("📊 Kaggle Storage Analysis\n")
print("=" * 60)

# Check disk space
!df -h /kaggle/working

print("\n📦 Storage Requirements for This Approach:")
print("  Raw audio files: ~2GB")
print("  Environmental sounds: ~200MB")
print("  YAMNet model (cached): ~20MB")
print("  Augmented embeddings (RAM): ~40MB")
print("  Models + outputs: ~10MB")
print("  " + "─" * 56)
print("  Total disk usage: ~3GB")
print("\n✅ Kaggle limit: 20GB - We're using ~15% (SAFE)")
print("✅ RAM: 30GB available - Embeddings use <1% (SAFE)")

print("\n💡 Key Advantages:")
print("  • Augmentation happens in-memory (no disk needed)")
print("  • YAMNet embeddings are compact (1024 floats = 4KB per sample)")
print("  • No need to save augmented audio to disk")
print("  • Process: Audio → Augment → YAMNet → Embedding (all in RAM)")

print("\n" + "=" * 60)
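
The embedding memory figure above follows from simple arithmetic. A minimal sketch, assuming float32 storage (4 bytes per value); the helper name `embedding_memory_mb` and the sample count of 10,000 are illustrative assumptions, not figures taken from the dataset:

```python
def embedding_memory_mb(n_samples, embedding_dim=1024, bytes_per_value=4):
    """Estimate RAM needed to hold mean YAMNet embeddings, in MiB."""
    total_bytes = n_samples * embedding_dim * bytes_per_value
    return total_bytes / (1024 ** 2)


# One embedding: 1024 float32 values = 4096 bytes = 4 KiB
per_sample_kb = 1024 * 4 / 1024

# Hypothetical dataset size; replace with the real number of embedded clips
print(f"Per sample: {per_sample_kb:.0f} KiB")
print(f"10,000 samples: {embedding_memory_mb(10_000):.1f} MiB")
```

At roughly 4 KiB per clip, about ten thousand embeddings fit in the ~40MB budget listed above.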

In [None]:
# Create directory structure
!mkdir -p /kaggle/working/audio_data/THREAT
!mkdir -p /kaggle/working/audio_data/THREAT_CONTEXT
!mkdir -p /kaggle/working/audio_data/BACKGROUND

S3_BUCKET = "alertreck"
AUDIO_DIR = "/kaggle/working/audio_data"

print("📥 Downloading raw audio files from S3...")
print("This may take 10-15 minutes...")
print("Progress will be shown for each category\n")

# Download THREAT audio files
print("[1/3] Downloading THREAT sounds (gunshots, chainsaws, human voices)...")
!aws s3 sync s3://{S3_BUCKET}/THREAT/ {AUDIO_DIR}/THREAT/ --exclude "*" --include "*.wav" --quiet
print("      ✓ THREAT download complete")

# Download THREAT_CONTEXT audio files
print("\n[2/3] Downloading THREAT_CONTEXT sounds (dog barks)...")
!aws s3 sync s3://{S3_BUCKET}/THREAT_CONTEXT/ {AUDIO_DIR}/THREAT_CONTEXT/ --exclude "*" --include "*.wav" --quiet
print("      ✓ THREAT_CONTEXT download complete")

# Download BACKGROUND audio files
print("\n[3/3] Downloading BACKGROUND sounds (animals, wind, ambient)...")
!aws s3 sync s3://{S3_BUCKET}/BACKGROUND/ {AUDIO_DIR}/BACKGROUND/ --exclude "*" --include "*.wav" --quiet
print("      ✓ BACKGROUND download complete")

print("\n✅ All audio files downloaded!")

# Count files in each category
print("\n📊 Audio files summary:")

!find {AUDIO_DIR}/THREAT -name "*.wav" | wc -l | xargs echo "  THREAT:"
!find {AUDIO_DIR}/THREAT_CONTEXT -name "*.wav" | wc -l | xargs echo "  THREAT_CONTEXT:"
!find {AUDIO_DIR}/BACKGROUND -name "*.wav" | wc -l | xargs echo "  BACKGROUND:"
!find {AUDIO_DIR} -name "*.wav" | wc -l | xargs echo "  Total:"
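
The shell commands above count files per category. A Python equivalent, sketched here on the assumption that the directory layout matches the one created earlier, returns the counts as a dictionary so they can be checked programmatically. `count_wav_files` is a hypothetical helper name:

```python
from pathlib import Path


def count_wav_files(audio_dir, categories=("THREAT", "THREAT_CONTEXT", "BACKGROUND")):
    """Count .wav files (recursively) under each category folder."""
    counts = {}
    for category in categories:
        category_dir = Path(audio_dir) / category
        # A missing folder counts as zero files instead of raising
        counts[category] = (
            sum(1 for _ in category_dir.rglob("*.wav")) if category_dir.exists() else 0
        )
    return counts


# Path used by this notebook; yields zeros if nothing has been downloaded yet
counts = count_wav_files("/kaggle/working/audio_data")
for name, n in counts.items():
    print(f"  {name}: {n}")
print(f"  Total: {sum(counts.values())}")
```

A category that reports zero files is an early sign that the corresponding `aws s3 sync` call did not download anything.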

## 4. Load YAMNet Pretrained Model

In [None]:
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np

print("🔧 Loading YAMNet model from TensorFlow Hub...")
print("This may take a few minutes on first run...\n")

# Load YAMNet model
YAMNET_MODEL_URL = 'https://tfhub.dev/google/yamnet/1'
yamnet_model = hub.load(YAMNET_MODEL_URL)

print("✅ YAMNet model loaded successfully!")
print("\n📋 YAMNet Details:")
print("  - Pre-trained on AudioSet (2M+ audio clips, 521 classes)")
print("  - Input: 16 kHz mono audio waveform")
print("  - Output: 1024-dimensional embedding per 0.96s frame")
print("  - Architecture: MobileNetV1 (efficient for audio)")

## 5. Collect and Organize Audio Files

Scan directories and create dataset with labels.

In [None]:
from pathlib import Path
import pandas as pd

# Threat level mapping
threat_levels = {
    'THREAT': 2,           # High priority - immediate threat
    'THREAT_CONTEXT': 1,   # Medium priority - potential threat indicator
    'BACKGROUND': 0        # Low priority - normal environmental sounds
}

class_names = ['BACKGROUND', 'THREAT_CONTEXT', 'THREAT']

print("📂 Collecting audio files...\n")

# Collect all audio files
audio_files = []

for threat_level, label in threat_levels.items():
    threat_dir = Path(AUDIO_DIR) / threat_level
    
    if threat_dir.exists():
        # Find all .wav files (including subdirectories)
        wav_files = list(threat_dir.rglob('*.wav'))
        
        for wav_file in wav_files:
            audio_files.append({
                'file_path': str(wav_file),
                'threat_level': label,
                'threat_level_name': threat_level,
                'file_name': wav_file.name
            })
        
        print(f"  {threat_level}: {len(wav_files)} files")

# Create DataFrame
df = pd.DataFrame(audio_files)

print(f"\n✅ Total audio files collected: {len(df):,}")
print(f"\n📊 Distribution:")
print(df['threat_level_name'].value_counts())

# Shuffle dataset
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(f"\n✅ Dataset shuffled (random_state=42)")
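
The `threat_levels` mapping and the `class_names` list must agree, because `class_names[label]` is used later for reports and plots. A minimal sketch of deriving one from the other so the two cannot drift apart; the helper name `class_names_from_mapping` is hypothetical:

```python
def class_names_from_mapping(mapping):
    """Return class names ordered by their integer label."""
    labels = sorted(mapping.values())
    # Labels must be 0..N-1 so they can index a list directly
    if labels != list(range(len(mapping))):
        raise ValueError(f"Labels must be contiguous from 0, got {labels}")
    return [name for name, _ in sorted(mapping.items(), key=lambda kv: kv[1])]


threat_levels = {'THREAT': 2, 'THREAT_CONTEXT': 1, 'BACKGROUND': 0}
class_names = class_names_from_mapping(threat_levels)
print(class_names)  # ['BACKGROUND', 'THREAT_CONTEXT', 'THREAT']
```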

## 6. Define Audio Loading and YAMNet Extraction Functions

In [None]:
import librosa
import soundfile as sf

def load_audio(file_path, target_sr=16000, duration=10.0):
    """
    Load audio file and prepare for YAMNet.
    
    Args:
        file_path: Path to audio file
        target_sr: Target sample rate (16 kHz for YAMNet)
        duration: Target duration in seconds
        
    Returns:
        Audio waveform at 16 kHz
    """
    try:
        # Load audio file
        audio, sr = sf.read(file_path, dtype='float32')
        
        # Convert stereo to mono
        if len(audio.shape) > 1:
            audio = np.mean(audio, axis=1)
        
        # Resample to 16 kHz if needed
        if sr != target_sr:
            audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
        
        # Standardize duration (pad or trim)
        target_length = int(target_sr * duration)
        
        if len(audio) > target_length:
            # Trim to target length
            audio = audio[:target_length]
        elif len(audio) < target_length:
            # Pad with zeros
            padding = target_length - len(audio)
            audio = np.pad(audio, (0, padding), mode='constant')
        
        # Normalize to [-1, 1]
        max_val = np.max(np.abs(audio))
        if max_val > 0:
            audio = audio / max_val
        
        return audio
        
    except Exception as e:
        print(f"Error loading {file_path}: {e}")
        return None


def extract_yamnet_embedding(audio_waveform):
    """
    Extract YAMNet embedding from audio waveform.
    
    Args:
        audio_waveform: Audio waveform at 16 kHz
        
    Returns:
        Mean YAMNet embedding (1024-dimensional vector)
    """
    # YAMNet expects float32 tensor
    audio_tensor = tf.convert_to_tensor(audio_waveform, dtype=tf.float32)
    
    # Extract embeddings (scores, embeddings, spectrogram)
    scores, embeddings, spectrogram = yamnet_model(audio_tensor)
    
    # Average embeddings across time frames
    # 10 s of audio at 16 kHz → roughly 20 frames (0.96 s windows, 0.48 s hop)
    mean_embedding = tf.reduce_mean(embeddings, axis=0).numpy()
    
    return mean_embedding


print("✅ Audio processing functions ready")
print("\n📋 Processing pipeline:")
print("  1. Load audio file (.wav)")
print("  2. Convert to mono if stereo")
print("  3. Resample to 16 kHz (YAMNet requirement)")
print("  4. Standardize to 10 seconds (pad/trim)")
print("  5. Normalize to [-1, 1]")
print("  6. Extract YAMNet embeddings (1024 features)")
print("  7. Average embeddings across time frames")
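
The duration and amplitude steps of `load_audio` can be checked in isolation. The sketch below reproduces only steps 4 and 5 on a synthetic array, with no file I/O; `fix_length_and_normalize` is a hypothetical name used for illustration:

```python
import numpy as np


def fix_length_and_normalize(audio, target_length):
    """Trim or zero-pad to target_length, then peak-normalize to [-1, 1]."""
    audio = np.asarray(audio, dtype=np.float32)
    if len(audio) > target_length:
        audio = audio[:target_length]
    elif len(audio) < target_length:
        audio = np.pad(audio, (0, target_length - len(audio)), mode='constant')
    peak = np.max(np.abs(audio)) if len(audio) else 0.0
    # Silent clips are left as zeros to avoid dividing by zero
    return audio / peak if peak > 0 else audio


short_clip = np.array([0.1, -0.5, 0.25], dtype=np.float32)
out = fix_length_and_normalize(short_clip, target_length=5)
print(out)        # padded to 5 samples, largest magnitude scaled to 1.0
print(out.shape)  # (5,)
```

Because normalization happens after padding, the trailing zeros do not affect the peak value.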

## 6A. Prepare Environmental Sounds for Augmentation

Use BACKGROUND sounds (wind/rain, ambient noise, animal sounds) for environmental mixing.

In [None]:
# Use BACKGROUND sounds for environmental mixing
import glob
from pathlib import Path

print("📥 Collecting environmental sounds from BACKGROUND folder...")
print("Using: wind/rain, ambient noise, animal sounds\n")

# Collect environmental sounds from BACKGROUND subfolders
env_sound_paths = []
background_dir = Path(AUDIO_DIR) / 'BACKGROUND'

if background_dir.exists():
    # Get sounds from specific subfolders for environmental mixing
    env_subfolders = ['wind_rain', 'ambient_noise', 'animal_sound']
    
    for subfolder in env_subfolders:
        subfolder_path = background_dir / subfolder
        if subfolder_path.exists():
            wav_files = list(subfolder_path.rglob('*.wav'))
            env_sound_paths.extend([str(f) for f in wav_files])
            print(f"  ✓ {subfolder}: {len(wav_files)} files")
        else:
            print(f"  ⚠️  {subfolder}: folder not found (skipping)")
    
    if len(env_sound_paths) > 0:
        print(f"\n✅ Collected {len(env_sound_paths)} environmental sound files")
        print("These will be mixed with THREAT/THREAT_CONTEXT for realistic augmentation")
    else:
        print("\n⚠️  No environmental sounds found in BACKGROUND subfolders")
        print("Augmentation will use: time stretch, pitch shift, noise, time shift only")
else:
    print("⚠️  BACKGROUND folder not found")
    print("Expected path: {}/BACKGROUND/".format(AUDIO_DIR))
    env_sound_paths = []

print(f"\n💡 Environmental sounds ready: {len(env_sound_paths)} files")

## 6B. Define Audio Augmentation Functions

Comprehensive audio augmentation for training robustness.

In [None]:
import librosa
import soundfile as sf
import random

# Use environmental sounds collected from BACKGROUND folder
ENV_SOUNDS = env_sound_paths  # From previous cell
print(f"📦 Using {len(ENV_SOUNDS)} environmental sounds for augmentation")
if len(ENV_SOUNDS) > 0:
    print("   Sources: wind/rain, ambient noise, animal sounds")
else:
    print("   No environmental sounds available - will skip environmental mixing")

def time_stretch_augment(audio, sr=16000):
    """
    Apply time stretching (0.9x - 1.1x speed).
    """
    rate = np.random.uniform(0.9, 1.1)
    stretched = librosa.effects.time_stretch(audio, rate=rate)
    
    # Maintain original length
    target_length = len(audio)
    if len(stretched) > target_length:
        stretched = stretched[:target_length]
    elif len(stretched) < target_length:
        stretched = np.pad(stretched, (0, target_length - len(stretched)), mode='constant')
    
    return stretched


def pitch_shift_augment(audio, sr=16000):
    """
    Apply pitch shifting (±2 semitones).
    """
    n_steps = np.random.uniform(-2, 2)
    shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=n_steps)
    return shifted


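def add_noise_augment(audio):
    """
    Add random Gaussian noise (standard deviation 0.001-0.005).

    Nominal SNR is 20-40 dB; the actual value depends on the clip's RMS level.
    """
    noise_factor = np.random.uniform(0.001, 0.005)
    # Cast to float32 so the result keeps the dtype of the input audio
    noise = (np.random.randn(len(audio)) * noise_factor).astype(np.float32)
    noisy = audio + noise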
    
    # Normalize to prevent clipping
    max_val = np.max(np.abs(noisy))
    if max_val > 1.0:
        noisy = noisy / max_val
    
    return noisy


def time_shift_augment(audio):
    """
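    Shift audio in time (±10% of duration).
    """
    shift_max = int(len(audio) * 0.1)
    # Upper bound is exclusive, so add 1 to make the range symmetric
    # (this also avoids an error when shift_max is 0 for very short clips)
    shift = np.random.randint(-shift_max, shift_max + 1)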
    
    if shift > 0:
        shifted = np.pad(audio, (shift, 0), mode='constant')[:len(audio)]
    else:
        shifted = np.pad(audio, (0, -shift), mode='constant')[-len(audio):]
    
    return shifted


def environmental_mix_augment(audio, sr=16000):
    """
    Mix with random environmental sound (wind, rain, ambient).
    """
    if len(ENV_SOUNDS) == 0:
        return audio
    
    # Select random environmental sound
    env_file = random.choice(ENV_SOUNDS)
    
    try:
        env_audio, env_sr = sf.read(env_file, dtype='float32')
        
        # Convert to mono if stereo
        if len(env_audio.shape) > 1:
            env_audio = np.mean(env_audio, axis=1)
        
        # Resample if needed
        if env_sr != sr:
            env_audio = librosa.resample(env_audio, orig_sr=env_sr, target_sr=sr)
        
        # Match length (random crop or loop)
        target_length = len(audio)
        if len(env_audio) > target_length:
            start_idx = np.random.randint(0, len(env_audio) - target_length)
            env_audio = env_audio[start_idx:start_idx + target_length]
        elif len(env_audio) < target_length:
            # Loop to match length
            repeats = int(np.ceil(target_length / len(env_audio)))
            env_audio = np.tile(env_audio, repeats)[:target_length]
        
        # Mix with lower volume (0.05 - 0.15)
        mix_ratio = np.random.uniform(0.05, 0.15)
        mixed = audio + (env_audio * mix_ratio)
        
        # Normalize
        max_val = np.max(np.abs(mixed))
        if max_val > 1.0:
            mixed = mixed / max_val
        
        return mixed
        
    except Exception as e:
        # If environmental mixing fails, return original
        return audio


def augment_audio(audio, threat_level_name, sr=16000):
    """
    Apply random augmentation based on threat level.
    
    Args:
        audio: Input audio waveform
        threat_level_name: 'THREAT', 'THREAT_CONTEXT', or 'BACKGROUND'
        sr: Sample rate
        
    Returns:
        Augmented audio
    """
    augmentations = []
    
    # Common augmentations for all classes
    augmentations.extend([
        ('time_stretch', time_stretch_augment),
        ('pitch_shift', pitch_shift_augment),
        ('noise', add_noise_augment),
        ('time_shift', time_shift_augment)
    ])
    
    # Environmental mixing for THREAT and THREAT_CONTEXT (realistic field conditions)
    if threat_level_name in ['THREAT', 'THREAT_CONTEXT']:
        # Higher weight for environmental mixing (2x)
        augmentations.extend([
            ('environmental_mix', environmental_mix_augment),
            ('environmental_mix', environmental_mix_augment)
        ])
    
    # Select random augmentation
    aug_name, aug_func = random.choice(augmentations)
    
    # Apply augmentation (noise and time shift do not take a sample rate)
    if aug_name in ['time_stretch', 'pitch_shift', 'environmental_mix']:
        augmented = aug_func(audio, sr=sr)
    else:
        augmented = aug_func(audio)
    
    return augmented


print("✅ Audio augmentation functions ready")
print("\n📋 Available augmentations:")
print("  1. Time stretch (0.9x - 1.1x speed)")
print("  2. Pitch shift (±2 semitones)")
print("  3. Add noise (SNR: 20-40 dB)")
print("  4. Time shift (±10% duration)")
print("  5. Environmental mixing (wind, rain, ambient)")

print("\n💡 BACKGROUND: Standard augmentations only")
print("💡 THREAT/THREAT_CONTEXT: 2x weight for environmental mixing")
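
The noise augmentation above draws a fixed noise amplitude, so the signal-to-noise ratio varies with the level of each clip. If a specific SNR range is wanted, the noise can instead be scaled from the measured signal power. This is a sketch of that alternative, not the method used in this notebook; `add_noise_at_snr` and its parameters are assumptions:

```python
import numpy as np


def add_noise_at_snr(audio, snr_db, rng=None):
    """Add white Gaussian noise scaled so the result has the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    audio = np.asarray(audio, dtype=np.float32)
    signal_power = np.mean(audio ** 2)
    if signal_power == 0:
        return audio  # silent clip: SNR is undefined, leave unchanged
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return (audio + noise).astype(np.float32)


# Example: a 1-second 440 Hz tone at 16 kHz, with an SNR drawn from 20-40 dB
t = np.arange(16000) / 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
snr_db = np.random.uniform(20, 40)
noisy = add_noise_at_snr(tone, snr_db)
```

Unlike the fixed-amplitude version, this keeps quiet clips from being dominated by noise, at the cost of one extra pass over the samples.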

## 7. Extract YAMNet Embeddings from All Audio Files

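Process all audio files and extract features. This cell embeds the original clips only; the augmentation functions from Section 6B are defined but not called here.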

In [None]:
from tqdm import tqdm

print("🔄 Extracting YAMNet embeddings from all audio files...")
print(f"Total files: {len(df):,}")
print("This will take 10-20 minutes depending on dataset size...\n")

embeddings = []
labels = []
failed_files = []

for idx, row in tqdm(df.iterrows(), total=len(df), desc="Processing audio"):
    try:
        # Load audio
        audio = load_audio(row['file_path'], target_sr=16000, duration=10.0)
        
        if audio is None:
            failed_files.append(row['file_path'])
            continue
        
        # Extract YAMNet embedding
        embedding = extract_yamnet_embedding(audio)
        
        # Store results
        embeddings.append(embedding)
        labels.append(row['threat_level'])
        
    except Exception as e:
        print(f"\nError processing {row['file_name']}: {e}")
        failed_files.append(row['file_path'])
        continue

# Convert to numpy arrays
X = np.array(embeddings, dtype=np.float32)
y = np.array(labels, dtype=np.int32)

print(f"\n✅ Feature extraction complete!")
print(f"  Embeddings shape: {X.shape}")
print(f"  Labels shape: {y.shape}")
print(f"  Failed files: {len(failed_files)}")

if failed_files:
    # Show at most five paths; add an ellipsis when the list is longer
    suffix = "..." if len(failed_files) > 5 else ""
    print(f"\n⚠️  Failed files: {failed_files[:5]}{suffix}")
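
The loop above embeds each original clip once. To apply the 2x augmentation factor to training data while keeping validation clean, the file list would need to be split before any augmented copies are made; otherwise an augmented variant of a validation clip could end up in the training set. The following is a sketch under that assumption. `embed_split` is a hypothetical helper, and the loader, embedder, and augmenter are passed in as arguments (in this notebook they would be `load_audio`, `extract_yamnet_embedding`, and `augment_audio`). Using one augmented copy per clip is one reading of the 2x factor:

```python
def embed_split(rows, load_fn, embed_fn, augment_fn=None, n_augmented=0):
    """
    Embed each clip, plus n_augmented augmented copies when augment_fn is given.

    rows: iterable of dicts with 'file_path', 'threat_level', 'threat_level_name'.
    Returns (embeddings, labels) as parallel lists.
    """
    embeddings, labels = [], []
    for row in rows:
        audio = load_fn(row['file_path'])
        if audio is None:
            continue  # skip unreadable files, as the main loop does
        # Always keep the original clip
        embeddings.append(embed_fn(audio))
        labels.append(row['threat_level'])
        # Augmented copies are added only when an augmenter is supplied
        if augment_fn is not None:
            for _ in range(n_augmented):
                augmented = augment_fn(audio, row['threat_level_name'])
                embeddings.append(embed_fn(augmented))
                labels.append(row['threat_level'])
    return embeddings, labels


# Intended use (assumes train_rows / val_rows come from a file-level split):
#   X_train, y_train = embed_split(train_rows, load_audio, extract_yamnet_embedding,
#                                  augment_fn=augment_audio, n_augmented=1)
#   X_val, y_val = embed_split(val_rows, load_audio, extract_yamnet_embedding)
```

Leaving `augment_fn` unset for the validation rows keeps the evaluation on unmodified audio.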

## 8. Split Data into Train/Validation Sets

Split the extracted features into training and validation sets.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

print("✂️  Splitting dataset into train/validation sets...\n")

# Split: 85% train, 15% validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.15,
    random_state=42,
    stratify=y  # Maintain class distribution
)

print(f"📊 Dataset splits:")
print(f"  Training: {len(X_train):,} samples ({len(X_train)/len(X)*100:.1f}%)")
print(f"  Validation: {len(X_val):,} samples ({len(X_val)/len(X)*100:.1f}%)")

# Compute class weights for balanced training
class_weights = compute_class_weight(
    class_weight='balanced',
    classes=np.unique(y_train),
    y=y_train
)
# Key the weights by the actual class labels rather than by position
class_weight_dict = {int(cls): float(w) for cls, w in zip(np.unique(y_train), class_weights)}

print(f"\n🎯 Class distribution in training set:")
for cls in range(3):
    count_train = np.sum(y_train == cls)
    count_val = np.sum(y_val == cls)
    print(f"  {class_names[cls]}:")
    print(f"    Train: {count_train:,} ({count_train/len(y_train)*100:.1f}%)")
    print(f"    Val: {count_val:,} ({count_val/len(y_val)*100:.1f}%)")
    print(f"    Class weight: {class_weight_dict[cls]:.3f}")

print(f"\n✅ Data ready for training!")
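
For reference, the `'balanced'` mode of `compute_class_weight` assigns each class the weight `n_samples / (n_classes * class_count)`. The sketch below recomputes that formula in plain Python on made-up label counts (the counts are illustrative, not taken from this dataset):

```python
from collections import Counter


def balanced_class_weights(labels):
    """Compute n_samples / (n_classes * count) for every class present."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Illustrative labels: 6 BACKGROUND, 3 THREAT_CONTEXT, 1 THREAT
example_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(example_labels)
for cls, w in weights.items():
    print(f"class {cls}: weight {w:.3f}")
```

Rare classes receive proportionally larger weights, so each class contributes about equally to the loss.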

## 9. Build Dense Neural Network Classifier

In [None]:
from tensorflow import keras
from tensorflow.keras import layers, models

# Configure GPU
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("✅ GPU memory growth enabled")
    except RuntimeError as e:
        print(f"⚠️  Could not set memory growth: {e}")

# Enable mixed precision
tf.keras.mixed_precision.set_global_policy('mixed_float16')
print("✅ Mixed precision enabled\n")

# Set random seeds
np.random.seed(42)
tf.random.set_seed(42)

print("🚀 Building Dense Neural Network for YAMNet Embeddings...\n")

def build_yamnet_classifier(input_dim=1024, num_classes=3):
    """
    Build dense classifier for YAMNet embeddings.
    
    Args:
        input_dim: Dimension of YAMNet embeddings (1024)
        num_classes: Number of output classes (3)
        
    Returns:
        Keras model
    """
    model = models.Sequential([
        # Input layer
        layers.Input(shape=(input_dim,)),
        
        # Dense block 1
        layers.Dense(512, activation='relu',
                    kernel_regularizer=keras.regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        
        # Dense block 2
        layers.Dense(256, activation='relu',
                    kernel_regularizer=keras.regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.Dropout(0.4),
        
        # Dense block 3
        layers.Dense(128, activation='relu',
                    kernel_regularizer=keras.regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        
        # Output layer
        layers.Dense(num_classes, activation='softmax', dtype='float32')
    ])
    
    return model


# Build model
model = build_yamnet_classifier(input_dim=1024, num_classes=3)
model.summary()

print(f"\n📊 Model parameters: {model.count_params():,}")
print("💡 Simple dense network on top of YAMNet features")

# Compile model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

print("\n✅ Model compiled!")
print("   Architecture: 3-layer dense network (512→256→128→3)")
print("   Regularization: L2 + BatchNorm + Dropout")
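
The parameter count reported by `model.summary()` can be reproduced by hand. A small sketch, assuming the layer sizes defined above: each Dense layer has `inputs * units + units` parameters, each BatchNormalization layer has four per unit (scale, offset, moving mean, moving variance), and Dropout has none:

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_units):
    """gamma, beta, moving mean, moving variance."""
    return 4 * n_units


layer_sizes = [1024, 512, 256, 128]  # input dim followed by hidden units
num_classes = 3

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    total += dense_params(n_in, n_out) + batchnorm_params(n_out)
total += dense_params(layer_sizes[-1], num_classes)  # softmax output layer

print(f"Total parameters: {total:,}")  # Total parameters: 692,995
```

The first Dense layer (1024 to 512) accounts for about three quarters of the total.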

## 10. Setup Training Callbacks

In [None]:
from tensorflow.keras import callbacks

# Create model directory
!mkdir -p /kaggle/working/models

# Define callbacks
early_stopping = callbacks.EarlyStopping(
    monitor='val_loss',
    patience=20,
    restore_best_weights=True,
    verbose=1
)

model_checkpoint = callbacks.ModelCheckpoint(
    filepath='/kaggle/working/models/best_yamnet_classifier.weights.h5',
    monitor='val_loss',
    save_best_only=True,
    save_weights_only=True,
    verbose=1
)

reduce_lr = callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=5,
    min_lr=1e-7,
    verbose=1
)

callback_list = [early_stopping, model_checkpoint, reduce_lr]

print("✅ Callbacks configured")

## 11. Train Model

In [None]:
print("🚀 Starting YAMNet Classifier Training...\n")
print("💡 Training dense network on YAMNet embeddings")
print("🎯 Class weighting enabled for balanced training")
print("⚡ Mixed precision + GPU acceleration\n")

print(f"📊 Dataset info:")
print(f"  Training samples: {len(X_train):,}")
print(f"  Validation samples: {len(X_val):,}")
print(f"  Feature dimension: {X_train.shape[1]}\n")

print("⏳ Expected training time: 2-5 minutes with GPU...\n")

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    batch_size=64,
    class_weight=class_weight_dict,
    callbacks=callback_list,
    verbose=1
)

print("\n✅ Training complete!")
print(f"Best validation loss: {min(history.history['val_loss']):.4f}")
print(f"Final training accuracy: {history.history['accuracy'][-1]:.4f}")
print(f"Final validation accuracy: {history.history['val_accuracy'][-1]:.4f}")

## 12. Plot Training History

In [None]:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss
axes[0].plot(history.history['loss'], label='Train')
axes[0].plot(history.history['val_loss'], label='Validation')
axes[0].set_title('Model Loss (YAMNet Classifier)', fontweight='bold')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Accuracy
axes[1].plot(history.history['accuracy'], label='Train')
axes[1].plot(history.history['val_accuracy'], label='Validation')
axes[1].set_title('Model Accuracy (YAMNet Classifier)', fontweight='bold')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 13. Evaluate Model

In [None]:
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns

print("📊 Evaluating on validation set...\n")

# Evaluate on validation set
val_results = model.evaluate(X_val, y_val, verbose=1)

print("\nValidation Results:")
print(f"  Loss: {val_results[0]:.4f}")
print(f"  Accuracy: {val_results[1]:.4f}")

# Get predictions
print("\nGenerating predictions...")
y_pred_proba = model.predict(X_val, verbose=1)
y_pred = np.argmax(y_pred_proba, axis=1)

# Classification report
print("\nClassification Report:")
print(classification_report(y_val, y_pred, target_names=class_names))

# Confusion matrix
cm = confusion_matrix(y_val, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted', fontweight='bold')
plt.ylabel('True', fontweight='bold')
plt.title('Confusion Matrix - YAMNet Classifier', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

print("\n✅ Evaluation complete!")
test_results = val_results
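
For a threat detector, recall on the THREAT class (how many real threats are caught) is often more informative than overall accuracy. As a sketch, per-class recall and precision can be read directly off a confusion matrix; the matrix below is made up for illustration and `per_class_recall_precision` is a hypothetical helper:

```python
import numpy as np


def per_class_recall_precision(cm):
    """Rows are true classes, columns are predictions (sklearn convention)."""
    cm = np.asarray(cm, dtype=np.float64)
    true_pos = np.diag(cm)
    row_sums = cm.sum(axis=1)   # number of true samples per class
    col_sums = cm.sum(axis=0)   # number of predictions per class
    # Guard against empty rows/columns to avoid division by zero
    recall = np.divide(true_pos, row_sums, out=np.zeros_like(true_pos), where=row_sums > 0)
    precision = np.divide(true_pos, col_sums, out=np.zeros_like(true_pos), where=col_sums > 0)
    return recall, precision


# Illustrative confusion matrix (not real results)
example_cm = [[50, 3, 2],
              [4, 20, 1],
              [1, 2, 17]]
names = ['BACKGROUND', 'THREAT_CONTEXT', 'THREAT']
recall, precision = per_class_recall_precision(example_cm)
for name, r, p in zip(names, recall, precision):
    print(f"{name}: recall={r:.3f}, precision={p:.3f}")
```

In the notebook, the `cm` array computed above could be passed in place of `example_cm`.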

## 14. Save Model and Export to TFLite

In [None]:
import json

# Load best weights first so the saved model matches the best checkpoint
model.load_weights('/kaggle/working/models/best_yamnet_classifier.weights.h5')
print("✅ Loaded best weights from checkpoint")

# Save full model
model.save('/kaggle/working/models/yamnet_classifier.keras')
print("✅ Full model saved")

# Export to TensorFlow Lite
print("\nExporting to TensorFlow Lite...")
print("Converting mixed precision model to float32...")

# Create float32 model
tf.keras.mixed_precision.set_global_policy('float32')
model_f32 = build_yamnet_classifier(input_dim=1024, num_classes=3)
model_f32.set_weights(model.get_weights())
print("✅ Created float32 model")

# Convert to TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(model_f32)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open('/kaggle/working/models/yamnet_classifier.tflite', 'wb') as f:
    f.write(tflite_model)

print(f"✅ TensorFlow Lite model: {len(tflite_model) / 1024:.1f} KB")

# Save model configuration
model_config = {
    'model_type': 'YAMNet_Classifier_EndToEnd',
    'feature_extractor': 'YAMNet (TensorFlow Hub)',
    'embedding_dim': 1024,
    'val_accuracy': float(test_results[1]) if test_results else None,
    'val_loss': float(test_results[0]) if test_results else None,
    'class_names': class_names,
    'total_samples': len(X),
    'train_samples': len(X_train),
    'val_samples': len(X_val),
    'total_parameters': int(model.count_params()),
    'audio_config': {
        'sample_rate': 16000,
        'duration': 10.0,
        'target_length': 160000
    }
}

with open('/kaggle/working/models/yamnet_config.json', 'w') as f:
    json.dump(model_config, f, indent=2)

print("✅ Model configuration saved")

# Upload to S3
print("\nUploading models to S3...")
!aws s3 cp /kaggle/working/models/yamnet_classifier.keras s3://{S3_BUCKET}/models/yamnet_e2e/
!aws s3 cp /kaggle/working/models/best_yamnet_classifier.weights.h5 s3://{S3_BUCKET}/models/yamnet_e2e/
!aws s3 cp /kaggle/working/models/yamnet_classifier.tflite s3://{S3_BUCKET}/models/yamnet_e2e/
!aws s3 cp /kaggle/working/models/yamnet_config.json s3://{S3_BUCKET}/models/yamnet_e2e/

print("\n✅ Models uploaded to S3!")
print(f"   Location: s3://{S3_BUCKET}/models/yamnet_e2e/")
print("\n📦 Files uploaded:")
print("  - yamnet_classifier.keras (full model)")
print("  - best_yamnet_classifier.weights.h5 (best weights)")
print("  - yamnet_classifier.tflite (edge deployment)")
print("  - yamnet_config.json (configuration)")

## Summary

### End-to-End YAMNet Training Complete! 🎉

**What We Did:**
1. ✅ Downloaded raw audio files directly from S3
2. ✅ Loaded audio files and standardized to 16 kHz, 10s
3. ✅ Extracted YAMNet embeddings (1024 features per audio)
4. ✅ Split into train/validation sets (85/15)
5. ✅ Trained dense classifier on embeddings
6. ✅ Evaluated and exported to TFLite

**Advantages of This Approach:**
- 🚀 **Direct from audio**: No intermediate preprocessing needed
- 💡 **Simpler pipeline**: Audio → YAMNet → Classifier
- 📦 **Self-contained**: Everything in one notebook
- 🎯 **Flexible**: Easy to adjust train/val split
- ⚡ **Efficient**: No mel-spec → audio conversion (Griffin-Lim)

**Model Architecture:**
- **Feature Extraction**: YAMNet (frozen, pre-trained on AudioSet)
- **Classifier**: 3-layer dense network (512→256→128→3)
- **Regularization**: L2, BatchNorm, Dropout
- **Parameters**: ~693K (classifier only)

**Deployment:**
- For Raspberry Pi: Use YAMNet TFLite + classifier TFLite
- Two-stage inference:
  1. Audio → YAMNet embeddings
  2. Embeddings → Threat classification
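
The two-stage flow can be sketched independently of any particular runtime. In the sketch below, `embed_fn` and `classify_fn` are placeholders for whatever wraps the YAMNet and classifier models on the device (for example, TFLite interpreters); they are assumptions, as is the helper name `classify_clip`:

```python
import numpy as np

CLASS_NAMES = ['BACKGROUND', 'THREAT_CONTEXT', 'THREAT']


def classify_clip(waveform, embed_fn, classify_fn, class_names=CLASS_NAMES):
    """
    Stage 1: waveform -> per-frame embeddings, averaged over time.
    Stage 2: mean embedding -> class probabilities -> label.
    """
    frame_embeddings = np.asarray(embed_fn(waveform), dtype=np.float32)  # (frames, 1024)
    mean_embedding = frame_embeddings.mean(axis=0)                       # (1024,)
    probs = np.asarray(classify_fn(mean_embedding[np.newaxis, :]))[0]    # (num_classes,)
    best = int(np.argmax(probs))
    return class_names[best], float(probs[best])


# Stand-in models so the flow can be exercised without TensorFlow
fake_embed = lambda wav: np.ones((20, 1024), dtype=np.float32)
fake_classify = lambda batch: np.array([[0.1, 0.2, 0.7]])

label, confidence = classify_clip(np.zeros(160000, dtype=np.float32), fake_embed, fake_classify)
print(label, confidence)  # THREAT 0.7
```

Keeping the two stages behind plain callables makes it possible to swap either model without touching the pooling and label logic.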

**Next Steps:**
1. Compare validation accuracy with custom CNN
2. Deploy to Raspberry Pi for field testing
3. Integrate with ranger alert system
4. Monitor real-world performance