# LazyPredict Model Comparison for Audio Classification

This notebook uses LazyPredict to quickly train and compare multiple machine learning models on the tabular audio features.

LazyPredict automatically trains 40+ sklearn models and provides a comparison table showing their performance.

## Workflow:
1. Load tabular features and labels
2. Filter to single-class samples
3. Split into train/test sets
4. Run LazyClassifier to train all models
5. Compare results and identify best models

In [1]:
# Install lazypredict if not already installed
# !pip install lazypredict -q

In [2]:
import numpy as np
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# LazyPredict imports
from lazypredict.Supervised import LazyClassifier

import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries imported successfully!")

Libraries imported successfully!


## 1. Load Data

In [3]:
# Define paths
DATA_PATH = Path('./work')
FEATURES_CSV = DATA_PATH / 'trn_curated_feature.csv'
LABELS_CSV = Path('./input/train_curated.csv')

# Load features
print("Loading features...")
features_df = pd.read_csv(FEATURES_CSV)
print(f"Features shape: {features_df.shape}")

# Load labels
print("\nLoading labels...")
labels_df = pd.read_csv(LABELS_CSV)
print(f"Labels shape: {labels_df.shape}")

# Merge features and labels
df = features_df.merge(labels_df, left_on='file', right_on='fname', how='inner')
print(f"\nMerged dataframe shape: {df.shape}")

Loading features...
Features shape: (4970, 2475)

Loading labels...
Labels shape: (4970, 2)

Merged dataframe shape: (4970, 2477)


## 2. Filter to Single-Class Samples

LazyPredict works with single-label classification, so we filter to samples with only one label.

In [4]:
# Count number of labels per sample
df['num_labels'] = df['labels'].str.count(',') + 1

# Filter to single-class only
single_class_df = df[df['num_labels'] == 1].copy()
print(f"Original samples: {len(df)}")
print(f"Single-class samples: {len(single_class_df)}")
print(f"Percentage single-class: {100 * len(single_class_df) / len(df):.2f}%")

# Check class distribution
print(f"\nNumber of unique classes: {single_class_df['labels'].nunique()}")
print(f"\nTop 10 classes:")
print(single_class_df['labels'].value_counts().head(10))

Original samples: 4970
Single-class samples: 4269
Percentage single-class: 85.90%

Number of unique classes: 74

Top 10 classes:
labels
Finger_snapping           75
Scissors                  75
Zipper_(clothing)         75
Gong                      75
Printer                   75
Marimba_and_xylophone     75
Keys_jangling             75
Skateboard                75
Computer_keyboard         75
Burping_and_eructation    74
Name: count, dtype: int64


## 3. Prepare Data for Training

We'll use a 80/20 train/test split for LazyPredict.

In [5]:
# Prepare feature columns and target
feature_cols = [col for col in single_class_df.columns 
                if col not in ['file', 'fname', 'labels', 'num_labels']]

print(f"Number of feature columns: {len(feature_cols)}")

X = single_class_df[feature_cols]
y = single_class_df['labels']

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")

Number of feature columns: 2474
X shape: (4269, 2474)
y shape: (4269,)


In [6]:
# Check for and remove missing values
print(f"Missing values in X: {X.isnull().sum().sum()}")
print(f"Missing values in y: {y.isnull().sum()}")

if X.isnull().any().any():
    print("\nRemoving rows with missing values...")
    mask = ~X.isnull().any(axis=1)
    X = X[mask]
    y = y[mask]
    print(f"After removal - X shape: {X.shape}, y shape: {y.shape}")

Missing values in X: 2474
Missing values in y: 0

Removing rows with missing values...
After removal - X shape: (4268, 2474), y shape: (4268,)


In [7]:
# Split into train and test sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set: {len(X_train)} samples ({100*len(X_train)/len(X):.1f}%)")
print(f"Test set: {len(X_test)} samples ({100*len(X_test)/len(X):.1f}%)")
print(f"\nNumber of classes: {y.nunique()}")

Training set: 3414 samples (80.0%)
Test set: 854 samples (20.0%)

Number of classes: 74


## 4. Run LazyPredict in Segments

This will train models in 4 groups to identify problematic models and avoid getting stuck.

**Strategy:** Train specific model groups separately with timeout protection.

In [8]:
# Define model groups (split into 4 segments)
# These are common sklearn classifiers that LazyPredict uses

from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier,
                               RandomForestClassifier, GradientBoostingClassifier)
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier
from sklearn.linear_model import (LogisticRegression, RidgeClassifier, RidgeClassifierCV,
                                   PassiveAggressiveClassifier, Perceptron, SGDClassifier)
from sklearn.svm import LinearSVC, SVC, NuSVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.calibration import CalibratedClassifierCV
from sklearn.semi_supervised import LabelPropagation, LabelSpreading
from sklearn.neighbors import NearestCentroid
from sklearn.dummy import DummyClassifier

# Segment 1: Fast Naive Bayes and simple models (should be quick)
segment_1 = [
    ('DummyClassifier', DummyClassifier(strategy='most_frequent', random_state=42)),
    ('BernoulliNB', BernoulliNB()),
    ('GaussianNB', GaussianNB()),
    ('NearestCentroid', NearestCentroid()),
    ('Perceptron', Perceptron(random_state=42, max_iter=1000)),
]

# Segment 2: Linear models (moderate speed)
segment_2 = [
    ('LogisticRegression', LogisticRegression(random_state=42, max_iter=1000, n_jobs=-1)),
    ('RidgeClassifier', RidgeClassifier(random_state=42)),
    ('RidgeClassifierCV', RidgeClassifierCV()),
    ('PassiveAggressiveClassifier', PassiveAggressiveClassifier(random_state=42, max_iter=1000)),
    ('SGDClassifier', SGDClassifier(random_state=42, max_iter=1000)),
    ('LinearDiscriminantAnalysis', LinearDiscriminantAnalysis()),
    ('LinearSVC', LinearSVC(random_state=42, max_iter=1000, dual=False)),
]

# Segment 3: Tree-based and neighbor models (can be slow)
segment_3 = [
    ('DecisionTreeClassifier', DecisionTreeClassifier(random_state=42)),
    ('ExtraTreeClassifier', ExtraTreeClassifier(random_state=42)),
    ('KNeighborsClassifier', KNeighborsClassifier(n_neighbors=5, n_jobs=-1)),
    ('RandomForestClassifier', RandomForestClassifier(random_state=42, n_estimators=100, n_jobs=-1)),
    ('ExtraTreesClassifier', ExtraTreesClassifier(random_state=42, n_estimators=100, n_jobs=-1)),
]

# Segment 4: Ensemble and complex models (slowest - skip problematic ones)
segment_4 = [
    ('AdaBoostClassifier', AdaBoostClassifier(random_state=42, algorithm='SAMME')),
    ('BaggingClassifier', BaggingClassifier(random_state=42, n_jobs=-1)),
    ('GradientBoostingClassifier', GradientBoostingClassifier(random_state=42)),
    ('CalibratedClassifierCV', CalibratedClassifierCV(n_jobs=-1)),
]

# Store all segments
segments = [
    ("Segment 1: Fast Models", segment_1),
    ("Segment 2: Linear Models", segment_2),
    ("Segment 3: Tree & Neighbor Models", segment_3),
    ("Segment 4: Ensemble Models", segment_4),
]

print("Model segments defined:")
total_models = 0
for i, (name, models_list) in enumerate(segments, 1):
    print(f"\n{name} ({len(models_list)} models):")
    for model_name, _ in models_list:
        print(f"  - {model_name}")
    total_models += len(models_list)

print(f"\n{'='*60}")
print(f"TOTAL: {total_models} models across 4 segments")
print(f"{'='*60}")

Model segments defined:

Segment 1: Fast Models (5 models):
  - DummyClassifier
  - BernoulliNB
  - GaussianNB
  - NearestCentroid
  - Perceptron

Segment 2: Linear Models (7 models):
  - LogisticRegression
  - RidgeClassifier
  - RidgeClassifierCV
  - PassiveAggressiveClassifier
  - SGDClassifier
  - LinearDiscriminantAnalysis
  - LinearSVC

Segment 3: Tree & Neighbor Models (5 models):
  - DecisionTreeClassifier
  - ExtraTreeClassifier
  - KNeighborsClassifier
  - RandomForestClassifier
  - ExtraTreesClassifier

Segment 4: Ensemble Models (4 models):
  - AdaBoostClassifier
  - BaggingClassifier
  - GradientBoostingClassifier
  - CalibratedClassifierCV

TOTAL: 21 models across 4 segments


# Function to train models with timeout protection
import time
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
import signal

class TimeoutError(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutError()

def train_and_evaluate_model(name, model, X_train, X_test, y_train, y_test, timeout=300):
    """
    Train and evaluate a single model with timeout protection.
    
    Args:
        timeout: Maximum time in seconds (default 300 = 5 minutes)
    """
    print(f"  Training {name}...", end=" ", flush=True)
    
    start_time = time.time()
    
    try:
        # Set timeout alarm (only works on Unix-like systems)
        signal.signal(signal.SIGALRM, timeout_handler)
        signal.alarm(timeout)
        
        # Train model
        model.fit(X_train, y_train)
        
        # Make predictions
        y_pred = model.predict(X_test)
        
        # Cancel alarm
        signal.alarm(0)
        
        # Calculate metrics
        accuracy = accuracy_score(y_test, y_pred)
        balanced_acc = balanced_accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred, average='weighted')
        
        elapsed_time = time.time() - start_time
        
        print(f"✓ Done in {elapsed_time:.2f}s (Acc: {accuracy:.4f})")
        
        return {
            'Model': name,
            'Accuracy': accuracy,
            'Balanced Accuracy': balanced_acc,
            'F1 Score': f1,
            'Time Taken': elapsed_time,
            'Status': 'Success'
        }
        
    except TimeoutError:
        signal.alarm(0)
        print(f"✗ TIMEOUT after {timeout}s")
        return {
            'Model': name,
            'Accuracy': None,
            'Balanced Accuracy': None,
            'F1 Score': None,
            'Time Taken': timeout,
            'Status': 'Timeout'
        }
    except Exception as e:
        signal.alarm(0)
        elapsed_time = time.time() - start_time
        print(f"✗ ERROR: {str(e)[:50]}")
        return {
            'Model': name,
            'Accuracy': None,
            'Balanced Accuracy': None,
            'F1 Score': None,
            'Time Taken': elapsed_time,
            'Status': f'Error: {str(e)[:50]}'
        }

print("Training function defined successfully!")

In [9]:
# RUN SEGMENT 1: Fast Models
print("=" * 80)
print("SEGMENT 1: Fast Models (Naive Bayes, Simple Classifiers)")
print("=" * 80)

segment_1_results = []
for model_name, model in segment_1:
    result = train_and_evaluate_model(model_name, model, X_train, X_test, y_train, y_test, timeout=180)
    segment_1_results.append(result)

# Convert to DataFrame
segment_1_df = pd.DataFrame(segment_1_results).set_index('Model')
print("\n" + "=" * 80)
print("SEGMENT 1 RESULTS:")
print("=" * 80)
print(segment_1_df[['Accuracy', 'Balanced Accuracy', 'F1 Score', 'Time Taken', 'Status']])

SEGMENT 1: Fast Models (Naive Bayes, Simple Classifiers)


NameError: name 'train_and_evaluate_model' is not defined

In [None]:
# RUN SEGMENT 2: Linear Models
print("=" * 80)
print("SEGMENT 2: Linear Models")
print("=" * 80)

segment_2_results = []
for model_name, model in segment_2:
    result = train_and_evaluate_model(model_name, model, X_train, X_test, y_train, y_test, timeout=300)
    segment_2_results.append(result)

# Convert to DataFrame
segment_2_df = pd.DataFrame(segment_2_results).set_index('Model')
print("\n" + "=" * 80)
print("SEGMENT 2 RESULTS:")
print("=" * 80)
print(segment_2_df[['Accuracy', 'Balanced Accuracy', 'F1 Score', 'Time Taken', 'Status']])

In [None]:
# RUN SEGMENT 3: Tree-based Models
print("=" * 80)
print("SEGMENT 3: Tree-based Models")
print("=" * 80)

segment_3_results = []
for model_name, model in segment_3:
    result = train_and_evaluate_model(model_name, model, X_train, X_test, y_train, y_test, timeout=600)
    segment_3_results.append(result)

# Convert to DataFrame
segment_3_df = pd.DataFrame(segment_3_results).set_index('Model')
print("\n" + "=" * 80)
print("SEGMENT 3 RESULTS:")
print("=" * 80)
print(segment_3_df[['Accuracy', 'Balanced Accuracy', 'F1 Score', 'Time Taken', 'Status']])

In [None]:
# RUN SEGMENT 4: Ensemble Models
print("=" * 80)
print("SEGMENT 4: Ensemble Models (Slowest - may take 10+ minutes)")
print("=" * 80)

segment_4_results = []
for model_name, model in segment_4:
    result = train_and_evaluate_model(model_name, model, X_train, X_test, y_train, y_test, timeout=900)
    segment_4_results.append(result)

# Convert to DataFrame
segment_4_df = pd.DataFrame(segment_4_results).set_index('Model')
print("\n" + "=" * 80)
print("SEGMENT 4 RESULTS:")
print("=" * 80)
print(segment_4_df[['Accuracy', 'Balanced Accuracy', 'F1 Score', 'Time Taken', 'Status']])

## 5. Combine All Results

Merge all segment results into a single DataFrame for analysis.

In [None]:
# Combine all results
all_results = segment_1_results + segment_2_results + segment_3_results + segment_4_results
models = pd.DataFrame(all_results).set_index('Model')

# Display full results
print("=" * 80)
print("COMBINED MODEL COMPARISON RESULTS")
print("=" * 80)
print(models)

# Filter to successful models only
successful_models = models[models['Status'] == 'Success'].copy()

print("\n" + "=" * 80)
print(f"SUMMARY: {len(successful_models)}/{len(models)} models completed successfully")
print("=" * 80)

# Show failed/timeout models if any
failed_models = models[models['Status'] != 'Success']
if len(failed_models) > 0:
    print("\nFailed/Timeout Models:")
    print(failed_models[['Time Taken', 'Status']])

In [None]:
# Display top 10 models by accuracy (successful models only)
if len(successful_models) > 0:
    print("\nTop 10 Models by Accuracy:")
    print("=" * 80)
    top_10 = successful_models.sort_values('Accuracy', ascending=False).head(10)
    print(top_10[['Accuracy', 'Balanced Accuracy', 'F1 Score', 'Time Taken']])
else:
    print("No successful models to display")

In [None]:
# Heatmap of metrics for top 10 models
top_10_models = models.sort_values('Accuracy', ascending=False).head(10)
metrics_to_plot = ['Accuracy', 'Balanced Accuracy', 'F1 Score']

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(top_10_models[metrics_to_plot], annot=True, fmt='.3f', 
            cmap='RdYlGn', vmin=0, vmax=1, ax=ax)
ax.set_title('Performance Metrics Heatmap - Top 10 Models')
ax.set_xlabel('Metric')
ax.set_ylabel('Model')
plt.tight_layout()
plt.show()

In [None]:
# Distribution of accuracies across all models
fig, ax = plt.subplots(figsize=(12, 6))
ax.hist(models['Accuracy'], bins=20, edgecolor='black', alpha=0.7)
ax.axvline(models['Accuracy'].mean(), color='red', linestyle='--', 
          label=f'Mean: {models["Accuracy"].mean():.3f}')
ax.axvline(models['Accuracy'].median(), color='green', linestyle='--', 
          label=f'Median: {models["Accuracy"].median():.3f}')
ax.set_xlabel('Accuracy')
ax.set_ylabel('Number of Models')
ax.set_title('Distribution of Model Accuracies')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 7. Model Recommendations

Based on the results, let's identify the best models for different use cases.

In [None]:
print("Model Recommendations:")
print("="*80)

# Best overall accuracy
best_acc_model = models['Accuracy'].idxmax()
print(f"\n1. BEST OVERALL ACCURACY:")
print(f"   Model: {best_acc_model}")
print(f"   Accuracy: {models.loc[best_acc_model, 'Accuracy']:.4f}")
print(f"   F1 Score: {models.loc[best_acc_model, 'F1 Score']:.4f}")
print(f"   Training Time: {models.loc[best_acc_model, 'Time Taken']:.2f}s")

# Best balanced accuracy (good for imbalanced classes)
best_bal_model = models['Balanced Accuracy'].idxmax()
print(f"\n2. BEST BALANCED ACCURACY (for imbalanced classes):")
print(f"   Model: {best_bal_model}")
print(f"   Balanced Accuracy: {models.loc[best_bal_model, 'Balanced Accuracy']:.4f}")
print(f"   Regular Accuracy: {models.loc[best_bal_model, 'Accuracy']:.4f}")
print(f"   Training Time: {models.loc[best_bal_model, 'Time Taken']:.2f}s")

# Fastest model with good accuracy (>50% of best accuracy)
threshold = models['Accuracy'].max() * 0.85
fast_models = models[models['Accuracy'] >= threshold].sort_values('Time Taken')
if len(fast_models) > 0:
    fastest_good = fast_models.index[0]
    print(f"\n3. FASTEST MODEL (with ≥85% of best accuracy):")
    print(f"   Model: {fastest_good}")
    print(f"   Accuracy: {models.loc[fastest_good, 'Accuracy']:.4f}")
    print(f"   Training Time: {models.loc[fastest_good, 'Time Taken']:.2f}s")

# Best F1 Score
best_f1_model = models['F1 Score'].idxmax()
print(f"\n4. BEST F1 SCORE (balance of precision & recall):")
print(f"   Model: {best_f1_model}")
print(f"   F1 Score: {models.loc[best_f1_model, 'F1 Score']:.4f}")
print(f"   Accuracy: {models.loc[best_f1_model, 'Accuracy']:.4f}")
print(f"   Training Time: {models.loc[best_f1_model, 'Time Taken']:.2f}s")

print("\n" + "="*80)

## 8. Export Results

In [None]:
# Save results to CSV
output_file = 'lazypredict_results.csv'
models.to_csv(output_file)
print(f"Results saved to: {output_file}")

# Save top 10 models
top_10_file = 'lazypredict_top10.csv'
models.sort_values('Accuracy', ascending=False).head(10).to_csv(top_10_file)
print(f"Top 10 models saved to: {top_10_file}")

## Summary

LazyPredict has trained and evaluated 40+ machine learning models on the audio classification task.

### Key Takeaways:
- Identified the best performing models without manual hyperparameter tuning
- Compared accuracy, F1 score, and training time across all models
- Found models suitable for different use cases (accuracy vs speed)

### Next Steps:
1. Take the top-performing models and fine-tune their hyperparameters
2. Try ensemble methods combining multiple top models
3. Investigate why certain models perform better on this audio dataset
4. Consider feature engineering to improve performance further