# Drum Sample Auto-Classifier - Complete Archive Edition

This notebook demonstrates how to use the trained model to automatically classify and organize drum samples from your complete drum archive, while keeping the original archive read-only.

## Key Features
- **Read-only source**: Works with your `/complete_drum_archive` without modifying it
- **Copy-based processing**: Creates copies for classification, preserving originals
- **Repeatable**: Can be run multiple times without affecting source data
- **Selective processing**: Can target specific folders or file patterns
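
As an illustration of selective processing, the sketch below narrows a list of paths to one folder and one filename pattern. `select_files` and its parameters are hypothetical names, not part of the notebook; the matching is case-insensitive and uses only the standard library.

```python
from fnmatch import fnmatch
from pathlib import Path

def select_files(paths, folder=None, name_pattern="*"):
    """Keep paths located under `folder` (if given) whose file name matches `name_pattern`."""
    selected = []
    for p in map(Path, paths):
        # A path is "under" folder when folder is one of its parent directories
        in_folder = folder is None or Path(folder) in p.parents
        if in_folder and fnmatch(p.name.lower(), name_pattern.lower()):
            selected.append(p)
    return selected
```

For example, `select_files(audio_files, folder="complete_drum_archive/kits", name_pattern="kick*")` would keep only files under a `kits` subfolder whose names start with "kick" (the subfolder name is made up for this example).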

## Prerequisites
1. **Train a model first** by running these notebooks in order:
   - `MFCC_Feature_Extractor.ipynb` (extracts features from sorted training data)
   - `Model1_Train.ipynb` or `Model2_Train.ipynb` (trains the classifier)

2. **Set up your archive path**:
   - Point to your `/complete_drum_archive` directory
   - The notebook will create a separate output directory for results

## Configuration Options
- **Source directory**: Your complete drum archive (read-only)
- **Output directory**: Where classified copies will be placed
- **Processing mode**: Copy or symbolic links (controlled by `COPY_FILES`); files are never moved out of the archive
- **File filtering**: By extension, size, or pattern
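
The code cells in this notebook refer to `ARCHIVE_PATH`, `OUTPUT_PATH`, `SUPPORTED_FORMATS`, `MAX_FILES_PER_RUN`, and `COPY_FILES`. A minimal configuration sketch might look like the following. The format list matches what the scan cell reports; the two paths and the batch size are placeholder assumptions, and `output_is_outside_archive` is a hypothetical helper added here to guard the read-only guarantee.

```python
from pathlib import Path

# Placeholder paths (assumptions): point these at your own directories.
ARCHIVE_PATH = "complete_drum_archive"   # read-only source
OUTPUT_PATH = "classified_output"        # hypothetical output location

# Extensions checked during the scan (the same list the scan cell prints).
SUPPORTED_FORMATS = ['.wav', '.aiff', '.flac', '.mp3']

# Assumed batch limit; set to None to process every file found.
MAX_FILES_PER_RUN = 500

# True copies files into the output directory; False creates symbolic links.
COPY_FILES = True

def output_is_outside_archive(archive, output):
    """Return True when `output` is neither the archive itself nor a folder inside it."""
    archive, output = Path(archive).resolve(), Path(output).resolve()
    return archive != output and archive not in output.parents

# Refuse to continue if results would be written into the source archive.
assert output_is_outside_archive(ARCHIVE_PATH, OUTPUT_PATH)
```

Keeping the paths as plain strings means they can be written to the JSON report without conversion.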

## How it works
- Scans your complete drum archive (without modifying it)
- Creates organized copies in a separate output directory
- Classifies each sample with confidence scoring
- Organizes results by instrument type
- Encodes each file's original relative path in the output filename, so its location in the archive stays traceable
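
The "organize by instrument type" step can be sketched as below, assuming one subfolder per predicted class under the output directory. `place_classified_file` is a hypothetical helper, not the notebook's own code. It only reads from the source file and writes a copy or a symbolic link into the output tree.

```python
import os
import shutil
from pathlib import Path

def place_classified_file(source, output_root, class_name, output_filename, copy_files=True):
    """Copy or symlink `source` into output_root/<class_name>/ without touching `source`."""
    source = Path(source)
    target_dir = Path(output_root) / class_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / output_filename

    # Skip files already placed by an earlier run, which keeps reruns harmless
    if target.exists() or target.is_symlink():
        return target

    if copy_files:
        shutil.copy2(source, target)           # copy file contents and metadata
    else:
        os.symlink(source.resolve(), target)   # link back to the original file
    return target
```

Symbolic links save disk space but break if the archive is later moved; copies are self-contained.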

In [21]:
import os
import numpy as np
import glob
import librosa
import librosa.display
import keras
import shutil
from pathlib import Path
import json
from datetime import datetime

In [22]:
# Scan Archive and Find Audio Files
# ===================================

def find_audio_files(archive_path, formats=SUPPORTED_FORMATS, max_files=MAX_FILES_PER_RUN):
    """Recursively find all audio files in the archive (read-only scan)"""
    audio_files = []
    
    print(f"Scanning archive: {archive_path}")
    archive_path = Path(archive_path)
    
    # Walk the tree once; match extensions case-insensitively (.WAV, .Wav, ...)
    all_files = [p for p in archive_path.rglob("*") if p.is_file()]
    
    for format_ext in formats:
        # Sort so that a MAX_FILES_PER_RUN batch is the same on every run
        files = sorted(p for p in all_files if p.suffix.lower() == format_ext.lower())
        audio_files.extend(files)
        print(f"Found {len(files)} {format_ext} files")
    
    print(f"Total audio files found: {len(audio_files)}")
    
    if max_files and len(audio_files) > max_files:
        print(f"Limiting to first {max_files} files for this run")
        audio_files = audio_files[:max_files]
    
    return audio_files

# Scan the archive
audio_files = find_audio_files(ARCHIVE_PATH)

if not audio_files:
    print("❌ No audio files found in archive!")
    print(f"Checked formats: {SUPPORTED_FORMATS}")
    print(f"In directory: {ARCHIVE_PATH}")
else:
    print(f"✅ Ready to process {len(audio_files)} audio files")
    print("Sample files:")
    for i, file in enumerate(audio_files[:5]):  # Show first 5
        print(f"  {i+1}. {file.name}")
    if len(audio_files) > 5:
        print(f"  ... and {len(audio_files) - 5} more")

Scanning archive: /Users/Gilby/Projects/MLAudioClassifier/complete_drum_archive
Found 0 .wav files
Found 0 .aiff files
Found 0 .flac files
Found 0 .mp3 files
Total audio files found: 0
❌ No audio files found in archive!
Checked formats: ['.wav', '.aiff', '.flac', '.mp3']
In directory: /Users/Gilby/Projects/MLAudioClassifier/complete_drum_archive


In [23]:
# Classification Function (Read-Only Safe)
# ========================================

def classify_audio_file(file_path, model, preserve_structure=True):
    """
    Classify a single audio file and return prediction info
    This function does NOT modify the original file
    """
    try:
        # Load and process audio
        waveform, samplerate = librosa.load(str(file_path), sr=44100, mono=True)
        waveform = librosa.util.fix_length(waveform, size=50000)
        
        # Extract MFCC features
        mfcc = librosa.feature.mfcc(y=waveform, sr=samplerate, n_mfcc=40, n_fft=2048, hop_length=512)
        features = librosa.util.normalize(mfcc)
        features = features[np.newaxis, ...]
        
        # Predict
        probs = model.predict(features, verbose=0)
        # Convert NumPy scalars to built-in types so the result can be saved as JSON
        label = int(np.argmax(probs))
        confidence = float(np.max(probs))
        
        # Map to instrument names
        instrument_names = ["Crash", "Hihat", "Kick", "Ride", "Snare", "Tom"]
        predicted_instrument = instrument_names[label]
        
        # Generate output filename preserving original structure
        file_path = Path(file_path)
        if preserve_structure:
            # Keep relative path structure in filename
            relative_path = file_path.relative_to(ARCHIVE_PATH)
            safe_path = str(relative_path).replace('/', '_').replace('\\', '_')
            output_filename = f"{predicted_instrument.lower()}_{confidence:.3f}_{safe_path}"
        else:
            output_filename = f"{predicted_instrument.lower()}_{confidence:.3f}_{file_path.name}"
        
        return {
            'original_path': str(file_path),
            'predicted_class': predicted_instrument,
            'confidence': confidence,
            'output_filename': output_filename,
            'success': True,
            'error': None
        }
        
    except Exception as e:
        return {
            'original_path': str(file_path),
            'predicted_class': None,
            'confidence': 0.0,
            'output_filename': None,
            'success': False,
            'error': str(e)
        }

print("✅ Classification function ready")
print("This function will NOT modify your original files!")

✅ Classification function ready
This function will NOT modify your original files!
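
The metadata cell that follows expects `results`, `successful`, `errors`, and `class_counts` to exist. The processing cell that builds them is not shown here, so the following is only a sketch of how they could be produced from the dictionaries returned by `classify_audio_file`. `run_classification` and its `classify_fn` parameter are hypothetical names.

```python
from collections import Counter

def run_classification(audio_files, classify_fn):
    """Classify every file and split the outcome into the lists the report cell uses."""
    results = [classify_fn(path) for path in audio_files]

    # Each result dict carries a 'success' flag set by the classifier
    successful = [r for r in results if r['success']]
    errors = [r for r in results if not r['success']]

    # Number of files per predicted instrument, e.g. {'Kick': 12, 'Snare': 7}
    class_counts = dict(Counter(r['predicted_class'] for r in successful))
    return results, successful, errors, class_counts
```

In the notebook this might be called as `results, successful, errors, class_counts = run_classification(audio_files, lambda p: classify_audio_file(p, model))`, assuming `model` has already been loaded.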


In [None]:
# Save Processing Metadata (For Repeat Runs)
# ===========================================


# Only save report if results exist (i.e., processing was run)
if 'results' in locals() and results is not None:
    # Create processing report
    report = {
        'timestamp': datetime.now().isoformat(),
        'archive_path': ARCHIVE_PATH,
        'output_path': OUTPUT_PATH,
        'total_files_found': len(audio_files),
        'files_processed': len(results),
        'successful_classifications': len(successful),
        'errors': len(errors),
        'processing_mode': 'copy' if COPY_FILES else 'symlink',
        'class_distribution': class_counts if 'class_counts' in locals() else {},
        'configuration': {
            'max_files_per_run': MAX_FILES_PER_RUN,
            'supported_formats': SUPPORTED_FORMATS,
            'model_shape': str(model.input_shape) if 'model' in locals() else None
        }
    }

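    # Make sure the metadata directory exists; use one timestamp for both files
    metadata_dir = Path(OUTPUT_PATH) / "metadata"
    metadata_dir.mkdir(parents=True, exist_ok=True)
    run_stamp = datetime.now().strftime('%Y%m%d_%H%M%S')

    # Convert NumPy scalars to plain numbers and anything else (e.g. Path) to str
    def to_jsonable(obj):
        return obj.item() if isinstance(obj, np.generic) else str(obj)

    # Save summary report
    report_file = metadata_dir / f"classification_report_{run_stamp}.json"
    with open(report_file, 'w') as f:
        json.dump(report, f, indent=2, default=to_jsonable)
    print(f"📄 Processing report saved: {report_file}")

    # Save per-file results (skipped when nothing was processed)
    if results:
        detailed_file = metadata_dir / f"detailed_results_{run_stamp}.json"
        with open(detailed_file, 'w') as f:
            json.dump(results, f, indent=2, default=to_jsonable)
        print(f"📄 Detailed results saved: {detailed_file}")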

    # Show quick stats for rerun reference
    print("\n" + "="*50)
    print("📋 SUMMARY FOR FUTURE REFERENCE")
    print("="*50)
    print(f"Archive scanned: {ARCHIVE_PATH}")
    print(f"Output directory: {OUTPUT_PATH}")
    print(f"Files in archive: {len(audio_files)}")
    print(f"Successfully classified: {len(successful)}")
    print(f"Mode: {'File copies' if COPY_FILES else 'Symbolic links'}")
    print("\nTo rerun classification:")
    print("1. Simply run all cells again")
    print("2. Adjust MAX_FILES_PER_RUN to process different batches")
    print("3. Change COPY_FILES to switch between copy/symlink modes")
    print("4. Your original archive will always remain untouched!")
else:
    print("⚠️ No results to save. Please run the processing cell above first.")

NameError: name 'results' is not defined