# 🎵 Music Embeddings Extraction Demo

**Created by Sergie Code - AI Tools for Musicians**

This notebook demonstrates how to extract embeddings from audio files using pretrained models such as **OpenL3** and **AudioCLIP**. These embeddings form the foundation for:

- 🔍 **Music Similarity Search**
- 🛡️ **Copyright & Plagiarism Detection** 
- 🤖 **AI Music Analysis Tools**
- 📊 **Music Recommendation Systems**

---

## 📋 What You'll Learn

1. How to extract embeddings from audio files
2. How to compare different embedding models
3. How to calculate audio similarity
4. How to visualize embeddings in 2D space
5. How to build a foundation for vector search systems

## 1. Project Setup and Dependencies

First, let's verify our project structure and install any missing dependencies.

In [None]:
import os
import sys

# Add the src directory to Python path.
# This assumes the notebook is run from a folder one level below the project root.
project_root = os.path.dirname(os.getcwd())
src_path = os.path.join(project_root, 'src')
if src_path not in sys.path:
    sys.path.insert(0, src_path)

print(f"📁 Project root: {project_root}")
print(f"📁 Source path: {src_path}")

# Check project structure
expected_files = [
    'src/embeddings.py',
    'src/utils.py', 
    'src/__init__.py',
    'requirements.txt',
    'README.md'
]

print("\n🔍 Checking project structure:")
for file in expected_files:
    file_path = os.path.join(project_root, file)
    status = "✅" if os.path.exists(file_path) else "❌"
    print(f"  {status} {file}")

# Create a data directory for sample audio files
data_dir = os.path.join(project_root, 'data')
os.makedirs(data_dir, exist_ok=True)
print(f"\n📂 Data directory ready: {data_dir}")

In [None]:
# Check if we have audio files to work with
audio_extensions = ['.wav', '.mp3', '.flac', '.m4a', '.ogg']
audio_files = []

for root, dirs, files in os.walk(data_dir):
    for file in files:
        if any(file.lower().endswith(ext) for ext in audio_extensions):
            audio_files.append(os.path.join(root, file))

print(f"🎵 Found {len(audio_files)} audio files in data directory:")
for i, audio_file in enumerate(audio_files[:5]):  # Show first 5
    print(f"  {i+1}. {os.path.basename(audio_file)}")

if len(audio_files) == 0:
    print("\n⚠️ No audio files found. To run this demo:")
    print(f"   1. Add some audio files to: {data_dir}")
    print("   2. Or we'll create synthetic audio for demonstration")
    
    # Create synthetic audio for demo
    import numpy as np
    import soundfile as sf
    
    print("\n🔧 Creating synthetic demo audio...")
    sample_rate = 22050
    duration = 5  # 5 seconds
    
    # Create different types of synthetic audio
    synthetic_files = [
        ('demo_sine_wave.wav', 'Sine wave at 440 Hz'),
        ('demo_chirp.wav', 'Frequency sweep'),
        ('demo_noise.wav', 'White noise')
    ]
    
    # Shared time axis; endpoint=False keeps the sample spacing at exactly 1/sample_rate
    n_samples = int(sample_rate * duration)
    t = np.linspace(0, duration, n_samples, endpoint=False)
    
    for filename, description in synthetic_files:
        filepath = os.path.join(data_dir, filename)
        
        if 'sine' in filename:
            # Pure sine wave
            audio = 0.3 * np.sin(2 * np.pi * 440 * t)  # 440 Hz (A4)
            
        elif 'chirp' in filename:
            # Linear frequency sweep from 200 Hz to 2000 Hz.
            # The phase is the integral of the instantaneous frequency
            # f(t) = f0 + (f1 - f0) * t / duration; sin(2*pi*f(t)*t) would overshoot to 3800 Hz.
            f0, f1 = 200, 2000
            phase = 2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * duration))
            audio = 0.3 * np.sin(phase)
            
        elif 'noise' in filename:
            # White Gaussian noise (flat spectrum)
            audio = np.random.normal(0, 0.1, n_samples)
            
        sf.write(filepath, audio, sample_rate)
        audio_files.append(filepath)
        print(f"  ✅ Created: {filename} ({description})")

print(f"\n🎵 Total audio files available: {len(audio_files)}")

## 2. Import Required Libraries

Import all the libraries we need for audio processing and embedding extraction.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import librosa
import librosa.display
from IPython.display import Audio, display
import warnings

# Custom modules
try:
    from embeddings import AudioEmbeddingExtractor, compare_embeddings
    from utils import load_audio, preprocess_audio, save_embeddings, load_embeddings, get_audio_info
    print("✅ Successfully imported custom modules")
except ImportError as e:
    print(f"❌ Error importing custom modules: {e}")
    print("   Make sure the src directory is in your Python path")

# Visualization libraries
try:
    from sklearn.manifold import TSNE
    from sklearn.decomposition import PCA
    print("✅ Scikit-learn available for visualizations")
except ImportError:
    print("⚠️ Scikit-learn not available - install for visualizations")

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")
%matplotlib inline

print("\n🚀 Library setup complete (check the messages above for any failed imports)")

## 3. Load and Preprocess Audio Files

Let's load our audio files and explore their characteristics.
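
The `preprocess_audio` helper lives in `src/utils.py`, and its implementation isn't shown in this notebook. As a rough sketch of what such a step typically does, the hypothetical `preprocess_sketch` below mixes to mono, trims to `max_duration`, and peak-normalizes. The function name and the `(n_samples, n_channels)` input layout are assumptions for illustration, not the project's actual code.

```python
import numpy as np

def preprocess_sketch(audio, sr, max_duration=10.0):
    """Mix to mono, trim to max_duration seconds, and peak-normalize."""
    x = np.asarray(audio, dtype=np.float32)
    if x.ndim == 2:
        # Assumed layout: (n_samples, n_channels), as returned by soundfile.read
        x = x.mean(axis=1)
    # Keep at most max_duration seconds
    x = x[: int(sr * max_duration)]
    # Scale so the loudest sample has magnitude 1; leave silence untouched
    peak = np.max(np.abs(x)) if x.size else 0.0
    return x / peak if peak > 0 else x

# Example: 3 seconds of stereo audio at a toy sample rate, trimmed to 2 seconds
sr = 100
ramp = np.linspace(-0.5, 0.5, 3 * sr)
stereo = np.stack([ramp, ramp], axis=1)
processed = preprocess_sketch(stereo, sr, max_duration=2.0)
print(processed.shape)
```

Trimming every clip to the same maximum length keeps extraction time bounded, and peak normalization removes loudness differences that would otherwise leak into the embeddings.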

In [None]:
# Select the first few audio files for our demo
demo_files = audio_files[:3] if len(audio_files) >= 3 else audio_files

print(f"🎵 Working with {len(demo_files)} audio files:")

# Load and analyze each file
audio_data = {}

for i, filepath in enumerate(demo_files):
    filename = os.path.basename(filepath)
    print(f"\n📁 File {i+1}: {filename}")
    
    try:
        # Get file info
        info = get_audio_info(filepath)
        print(f"   Duration: {info['duration_seconds']:.2f}s")
        print(f"   Sample Rate: {info['sample_rate']} Hz")
        print(f"   Channels: {info['channels']}")
        print(f"   Format: {info['format']}")
        
        # Load audio
        audio, sr = load_audio(filepath, target_sr=22050)
        
        # Preprocess
        processed_audio = preprocess_audio(audio, target_sr=sr, max_duration=10.0)
        
        # Store data
        audio_data[filename] = {
            'filepath': filepath,
            'audio': processed_audio,
            'sr': sr,
            'info': info
        }
        
        print(f"   ✅ Processed successfully")
        
    except Exception as e:
        print(f"   ❌ Error processing: {e}")

print(f"\n✅ Loaded {len(audio_data)} audio files successfully")

In [None]:
# Visualize the audio waveforms
fig, axes = plt.subplots(len(audio_data), 2, figsize=(15, 4*len(audio_data)))
if len(audio_data) == 1:
    axes = axes.reshape(1, -1)

for i, (filename, data) in enumerate(audio_data.items()):
    audio = data['audio']
    sr = data['sr']
    
    # Waveform
    axes[i, 0].plot(np.linspace(0, len(audio)/sr, len(audio)), audio)
    axes[i, 0].set_title(f'Waveform: {filename}')
    axes[i, 0].set_xlabel('Time (s)')
    axes[i, 0].set_ylabel('Amplitude')
    axes[i, 0].grid(True, alpha=0.3)
    
    # Spectrogram
    D = librosa.amplitude_to_db(np.abs(librosa.stft(audio)), ref=np.max)
    librosa.display.specshow(D, y_axis='hz', x_axis='time', sr=sr, ax=axes[i, 1])
    axes[i, 1].set_title(f'Spectrogram: {filename}')
    axes[i, 1].set_ylabel('Frequency (Hz)')

plt.tight_layout()
plt.show()

# Display audio players
print("🔊 Audio Players:")
for filename, data in audio_data.items():
    print(f"\n🎵 {filename}:")
    display(Audio(data['audio'], rate=data['sr']))

## 4. Extract Embeddings with OpenL3

Now let's extract embeddings using OpenL3, a pretrained audio embedding model. Here it is configured for music content with a 256-band mel-spectrogram input and 6144-dimensional embeddings.
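
OpenL3 produces one embedding per analysis frame, while the rest of this notebook treats each file as a single 1-D vector. How `AudioEmbeddingExtractor` aggregates frames isn't shown here; mean pooling is one common choice. The sketch below illustrates it with random data standing in for model output, and the helper name `pool_frame_embeddings` is hypothetical.

```python
import numpy as np

def pool_frame_embeddings(frame_embeddings):
    """Mean-pool a (n_frames, dim) array into one (dim,) clip-level vector."""
    frames = np.asarray(frame_embeddings, dtype=np.float32)
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (n_frames, dim) array")
    return frames.mean(axis=0)

# Random stand-in for frame-level output: 10 frames, 6144 dimensions each
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 6144)).astype(np.float32)
clip_embedding = pool_frame_embeddings(frames)
print(clip_embedding.shape)  # (6144,)
```

Mean pooling discards temporal order, which is usually acceptable for clip-level similarity but loses information about how a track evolves over time.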

In [None]:
# Initialize OpenL3 extractor
print("🔧 Initializing OpenL3 extractor...")

try:
    openl3_extractor = AudioEmbeddingExtractor(
        model_name='openl3',
        input_repr='mel256',
        content_type='music',
        embedding_size=6144
    )
    print("✅ OpenL3 extractor initialized successfully")
    
    # Get model info
    model_info = openl3_extractor.get_embedding_info()
    print(f"\n📊 Model Information:")
    print(f"   Model: {model_info['model_description']}")
    print(f"   Embedding dimension: {model_info['embedding_dimension']}")
    print(f"   Data type: {model_info['embedding_dtype']}")
    
except Exception as e:
    print(f"❌ OpenL3 not available: {e}")
    print("   Falling back to spectrogram-based embeddings...")
    
    openl3_extractor = AudioEmbeddingExtractor(
        model_name='spectrogram',
        n_mels=128
    )
    model_info = openl3_extractor.get_embedding_info()
    print(f"\n📊 Fallback Model Information:")
    print(f"   Model: {model_info['model_description']}")
    print(f"   Embedding dimension: {model_info['embedding_dimension']}")

In [None]:
# Extract embeddings for all audio files
print("🎵 Extracting embeddings with OpenL3...")

openl3_embeddings = {}

for filename, data in audio_data.items():
    print(f"\n🔍 Processing: {filename}")
    
    try:
        # Extract embedding
        embedding = openl3_extractor.extract_embeddings_from_array(
            data['audio'], 
            data['sr']
        )
        
        openl3_embeddings[filename] = embedding
        
        print(f"   ✅ Embedding shape: {embedding.shape}")
        print(f"   📊 Stats: min={embedding.min():.3f}, max={embedding.max():.3f}, mean={embedding.mean():.3f}")
        
    except Exception as e:
        print(f"   ❌ Error: {e}")

print(f"\n✅ Extracted embeddings for {len(openl3_embeddings)} files")

## 5. Extract Embeddings with AudioCLIP

Let's also try AudioCLIP, which provides multi-modal embeddings that can relate audio to text descriptions.

In [None]:
# Initialize AudioCLIP extractor
print("🔧 Initializing AudioCLIP extractor...")

try:
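    audioclip_extractor = AudioEmbeddingExtractor(
        model_name='audioclip',
        # Repeating model_name here is a SyntaxError. To select a specific
        # checkpoint (e.g. 'microsoft/unispeech-large'), pass it through the
        # keyword that AudioEmbeddingExtractor defines in src/embeddings.py.
    )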
    print("✅ AudioCLIP extractor initialized successfully")
    
    # Get model info
    model_info_audioclip = audioclip_extractor.get_embedding_info()
    print(f"\n📊 AudioCLIP Model Information:")
    print(f"   Model: {model_info_audioclip['model_description']}")
    print(f"   Embedding dimension: {model_info_audioclip['embedding_dimension']}")
    
except Exception as e:
    print(f"❌ AudioCLIP not available: {e}")
    print("   Will use OpenL3 embeddings for comparison...")
    audioclip_extractor = None
    model_info_audioclip = None

In [None]:
# Extract AudioCLIP embeddings if available
audioclip_embeddings = {}

if audioclip_extractor is not None:
    print("🎵 Extracting embeddings with AudioCLIP...")
    
    for filename, data in audio_data.items():
        print(f"\n🔍 Processing: {filename}")
        
        try:
            # Extract embedding
            embedding = audioclip_extractor.extract_embeddings_from_array(
                data['audio'], 
                data['sr']
            )
            
            audioclip_embeddings[filename] = embedding
            
            print(f"   ✅ Embedding shape: {embedding.shape}")
            print(f"   📊 Stats: min={embedding.min():.3f}, max={embedding.max():.3f}, mean={embedding.mean():.3f}")
            
        except Exception as e:
            print(f"   ❌ Error: {e}")
            
    print(f"\n✅ Extracted AudioCLIP embeddings for {len(audioclip_embeddings)} files")
else:
    print("⚠️ Skipping AudioCLIP embeddings (not available)")

## 6. Compare Different Embedding Models

Let's compare the embeddings from different models and see how they differ.

In [None]:
# Compare embedding characteristics
print("📊 Embedding Model Comparison")
print("=" * 50)

comparison_data = []

for filename in openl3_embeddings.keys():
    openl3_emb = openl3_embeddings[filename]
    
    # OpenL3 stats
    comparison_data.append({
        'File': filename,
        'Model': 'OpenL3',
        'Dimensions': openl3_emb.shape[0],
        'Min': openl3_emb.min(),
        'Max': openl3_emb.max(),
        'Mean': openl3_emb.mean(),
        'Std': openl3_emb.std(),
        'L2 Norm': np.linalg.norm(openl3_emb)
    })
    
    # AudioCLIP stats (if available)
    if filename in audioclip_embeddings:
        audioclip_emb = audioclip_embeddings[filename]
        comparison_data.append({
            'File': filename,
            'Model': 'AudioCLIP',
            'Dimensions': audioclip_emb.shape[0],
            'Min': audioclip_emb.min(),
            'Max': audioclip_emb.max(),
            'Mean': audioclip_emb.mean(),
            'Std': audioclip_emb.std(),
            'L2 Norm': np.linalg.norm(audioclip_emb)
        })

# Create comparison DataFrame
df_comparison = pd.DataFrame(comparison_data)
print("\n📋 Embedding Statistics:")
display(df_comparison.round(4))

In [None]:
# Visualize embedding distributions
if len(openl3_embeddings) > 0:
    n_files = len(openl3_embeddings)
    fig, axes = plt.subplots(n_files, 2, figsize=(15, 4*n_files))
    
    if n_files == 1:
        axes = axes.reshape(1, -1)
    
    for i, (filename, embedding) in enumerate(openl3_embeddings.items()):
        # OpenL3 histogram
        axes[i, 0].hist(embedding, bins=50, alpha=0.7, color='blue', label='OpenL3')
        axes[i, 0].set_title(f'OpenL3 Distribution: {filename}')
        axes[i, 0].set_xlabel('Embedding Value')
        axes[i, 0].set_ylabel('Frequency')
        axes[i, 0].grid(True, alpha=0.3)
        
        # Embedding values over dimensions
        axes[i, 1].plot(embedding[:100], alpha=0.8, color='blue', label='OpenL3 (first 100 dims)')  # Show first 100 dimensions
        
        # Add AudioCLIP if available
        if filename in audioclip_embeddings:
            audioclip_emb = audioclip_embeddings[filename]
            axes[i, 0].hist(audioclip_emb, bins=50, alpha=0.7, color='red', label='AudioCLIP')
            axes[i, 1].plot(audioclip_emb[:100], alpha=0.8, color='red', label='AudioCLIP (first 100 dims)')
        
        axes[i, 0].legend()
        axes[i, 1].legend()
        axes[i, 1].set_title(f'Embedding Values: {filename}')
        axes[i, 1].set_xlabel('Dimension')
        axes[i, 1].set_ylabel('Value')
        axes[i, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

## 7. Save and Load Embeddings

Let's save our embeddings to disk so we can reuse them later without re-computation.
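
The `save_embeddings` and `load_embeddings` helpers come from `src/utils.py`, and their internals aren't shown in this notebook. As an alternative sketch, an embedding can be stored as a `.npy` file with its metadata in a JSON sidecar. The helper names `save_embedding` and `load_embedding` below are hypothetical and are not part of the project.

```python
import json
import os
import tempfile

import numpy as np

def save_embedding(embedding, base_path, metadata):
    """Write <base_path>.npy and <base_path>.json side by side."""
    np.save(base_path + ".npy", np.asarray(embedding))
    with open(base_path + ".json", "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)

def load_embedding(base_path):
    """Read back the array and its metadata dictionary."""
    embedding = np.load(base_path + ".npy")
    with open(base_path + ".json", encoding="utf-8") as f:
        metadata = json.load(f)
    return embedding, metadata

# Round-trip check in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "demo_sine_wave_openl3")
    original = np.arange(8, dtype=np.float32)
    meta_in = {"model": "OpenL3", "embedding_shape": list(original.shape)}
    save_embedding(original, base, meta_in)
    restored, meta_out = load_embedding(base)
    print(np.array_equal(original, restored), meta_out["model"])
```

JSON metadata stays human-readable and avoids unpickling, which should only be done with files you trust. Note that tuples such as `embedding.shape` need converting to lists before they go into JSON.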

In [None]:
# Create embeddings directory
embeddings_dir = os.path.join(project_root, 'embeddings')
os.makedirs(embeddings_dir, exist_ok=True)

print(f"💾 Saving embeddings to: {embeddings_dir}")

# Save OpenL3 embeddings
for filename, embedding in openl3_embeddings.items():
    # Create metadata
    metadata = {
        'original_file': filename,
        'model': 'OpenL3',
        'model_config': {
            'input_repr': 'mel256',
            'content_type': 'music',
            'embedding_size': embedding.shape[0]
        },
        'extraction_date': pd.Timestamp.now().isoformat(),
        'embedding_shape': embedding.shape,
        'embedding_stats': {
            'mean': float(embedding.mean()),
            'std': float(embedding.std()),
            'min': float(embedding.min()),
            'max': float(embedding.max())
        }
    }
    
    # Save in different formats
    base_name = os.path.splitext(filename)[0]
    
    # Pickle format (includes metadata)
    pickle_path = os.path.join(embeddings_dir, f"{base_name}_openl3.pkl")
    save_embeddings(embedding, pickle_path, metadata, format='pickle')
    
    # NumPy format
    npy_path = os.path.join(embeddings_dir, f"{base_name}_openl3.npy")
    save_embeddings(embedding, npy_path, metadata, format='npy')

# Save AudioCLIP embeddings if available
for filename, embedding in audioclip_embeddings.items():
    metadata = {
        'original_file': filename,
        'model': 'AudioCLIP',
        'extraction_date': pd.Timestamp.now().isoformat(),
        'embedding_shape': embedding.shape,
        'embedding_stats': {
            'mean': float(embedding.mean()),
            'std': float(embedding.std()),
            'min': float(embedding.min()),
            'max': float(embedding.max())
        }
    }
    
    base_name = os.path.splitext(filename)[0]
    pickle_path = os.path.join(embeddings_dir, f"{base_name}_audioclip.pkl")
    save_embeddings(embedding, pickle_path, metadata, format='pickle')

print("\n✅ Embeddings saved successfully!")

# List saved files
saved_files = os.listdir(embeddings_dir)
print(f"\n📁 Saved {len(saved_files)} embedding files:")
for file in sorted(saved_files):
    print(f"   📄 {file}")

In [None]:
# Demonstrate loading embeddings
print("🔄 Testing embedding loading...")

# Load one of the saved OpenL3 pickle files (the pickle format includes metadata)
pickle_files = sorted(f for f in saved_files if f.endswith('_openl3.pkl'))
if pickle_files:
    test_name = pickle_files[0]
    test_file = os.path.join(embeddings_dir, test_name)
    loaded_embedding, loaded_metadata = load_embeddings(test_file)
    loaded_metadata = loaded_metadata or {}  # Guard against missing metadata
    
    print(f"\n📊 Loaded embedding from: {test_name}")
    print(f"   Shape: {loaded_embedding.shape}")
    print(f"   Model: {loaded_metadata.get('model', 'Unknown')}")
    print(f"   Original file: {loaded_metadata.get('original_file', 'Unknown')}")
    print(f"   Extraction date: {loaded_metadata.get('extraction_date', 'Unknown')}")
    
    # Verify it's the same as original
    original_filename = loaded_metadata.get('original_file')
    if original_filename in openl3_embeddings:
        original_embedding = openl3_embeddings[original_filename]
        if np.allclose(loaded_embedding, original_embedding):
            print("   ✅ Loaded embedding matches original!")
        else:
            print("   ❌ Loaded embedding differs from original")
    else:
        print("   ⚠️ Original embedding not found, so the comparison was skipped")
else:
    print("⚠️ No saved OpenL3 pickle files found to load")

print("\n🔄 Embedding load test finished")

## 8. Visualize Embeddings with t-SNE

Let's use dimensionality reduction to visualize our high-dimensional embeddings in 2D space.
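
With only a handful of files, t-SNE has very little neighborhood structure to work with, so a plain linear projection can be a useful sanity check alongside it. The sketch below implements PCA directly with NumPy's SVD. The helper name `pca_project` is hypothetical, and the random matrix stands in for real embeddings.

```python
import numpy as np

def pca_project(embeddings, n_components=2):
    """Project rows of `embeddings` onto their top principal components."""
    X = np.asarray(embeddings, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    return X_centered @ Vt[:k].T

# Three stand-in "embeddings" of dimension 6144
rng = np.random.default_rng(42)
fake_embeddings = rng.normal(size=(3, 6144))
points_2d = pca_project(fake_embeddings)
print(points_2d.shape)  # (3, 2)
```

The number of useful components can never exceed the number of samples, which is why a PCA step in front of t-SNE has to cap its component count when only a few files are available.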

In [None]:
# Prepare data for visualization
try:
    from sklearn.manifold import TSNE
    from sklearn.decomposition import PCA
    
    # Collect all embeddings
    all_embeddings = []
    all_labels = []
    all_models = []
    
    # Add OpenL3 embeddings
    for filename, embedding in openl3_embeddings.items():
        all_embeddings.append(embedding)
        all_labels.append(filename)
        all_models.append('OpenL3')
    
    # Add AudioCLIP embeddings if available
    for filename, embedding in audioclip_embeddings.items():
        all_embeddings.append(embedding)
        all_labels.append(filename)
        all_models.append('AudioCLIP')
    
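    # Embeddings from different models can have different dimensions and
    # cannot be stacked into one matrix; keep those matching the first one.
    keep = [i for i, e in enumerate(all_embeddings) if e.shape == all_embeddings[0].shape]
    all_embeddings = [all_embeddings[i] for i in keep]
    all_labels = [all_labels[i] for i in keep]
    all_models = [all_models[i] for i in keep]
    
    if len(all_embeddings) > 1: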
        # Convert to matrix
        embeddings_matrix = np.vstack(all_embeddings)
        
        print(f"📊 Visualizing {len(all_embeddings)} embeddings")
        print(f"   Embedding matrix shape: {embeddings_matrix.shape}")
        
        # Apply PCA first to reduce dimensions (for t-SNE efficiency)
        if embeddings_matrix.shape[1] > 50:
            print("🔧 Applying PCA preprocessing...")
            # PCA cannot return more components than there are samples
            n_pca = min(50, embeddings_matrix.shape[0])
            pca = PCA(n_components=n_pca)
            embeddings_pca = pca.fit_transform(embeddings_matrix)
            print(f"   PCA explained variance ratio: {pca.explained_variance_ratio_.sum():.3f}")
        else:
            embeddings_pca = embeddings_matrix
        
        # Apply t-SNE
        print("🔧 Applying t-SNE...")
        tsne = TSNE(n_components=2, random_state=42, perplexity=min(5, len(all_embeddings)-1))
        embeddings_2d = tsne.fit_transform(embeddings_pca)
        
        # Create visualization
        plt.figure(figsize=(12, 8))
        
        # Plot points colored by model
        unique_models = list(set(all_models))
        colors = plt.cm.Set1(np.linspace(0, 1, len(unique_models)))
        
        for model, color in zip(unique_models, colors):
            mask = np.array(all_models) == model
            plt.scatter(
                embeddings_2d[mask, 0], 
                embeddings_2d[mask, 1],
                c=[color], 
                label=model, 
                s=100, 
                alpha=0.7,
                edgecolors='black',
                linewidth=1
            )
        
        # Add labels
        for i, (x, y) in enumerate(embeddings_2d):
            plt.annotate(
                f"{all_labels[i]}\n({all_models[i]})", 
                (x, y), 
                xytext=(5, 5), 
                textcoords='offset points',
                fontsize=8,
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7)
            )
        
        plt.title('Audio Embeddings Visualization (t-SNE)', fontsize=16, fontweight='bold')
        plt.xlabel('t-SNE Component 1')
        plt.ylabel('t-SNE Component 2')
        plt.legend()
        plt.grid(True, alpha=0.3)
        
        # Add text box with info
        info_text = f"Embeddings: {len(all_embeddings)}\nOriginal dims: {embeddings_matrix.shape[1]}\nPCA dims: {embeddings_pca.shape[1]}"
        plt.text(0.02, 0.98, info_text, transform=plt.gca().transAxes, 
                bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8),
                verticalalignment='top', fontsize=10)
        
        plt.tight_layout()
        plt.show()
        
        print("✅ t-SNE visualization complete!")
    else:
        print("⚠️ Need at least 2 embeddings for visualization")
        
except ImportError:
    print("❌ Scikit-learn not available for t-SNE visualization")
    print("   Install with: pip install scikit-learn")
except Exception as e:
    print(f"❌ Error during visualization: {e}")

## 9. Calculate Audio Similarity

Now let's calculate similarity between different audio files using their embeddings.
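
The `compare_embeddings` function is imported from `src/embeddings.py`, and its exact formulas aren't shown in this notebook. The sketch below gives one plausible NumPy version of the three methods used in this section. The function names are hypothetical, and mapping Euclidean distance to a similarity with `1 / (1 + d)` is an assumption, not necessarily what the project does.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def euclidean_similarity(a, b):
    """Map Euclidean distance into (0, 1]; identical vectors give 1."""
    d = np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(1.0 / (1.0 + d))

def correlation_similarity(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return cosine_similarity(a - a.mean(), b - b.mean())

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(cosine_similarity(x, y), euclidean_similarity(x, x), correlation_similarity(x, y))
```

Cosine and correlation scores fall in the range -1 to 1, while the heatmaps in this section use a 0 to 1 color range, so any negative score would be drawn with the lowest color.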

In [None]:
# Calculate similarity matrices
if len(openl3_embeddings) > 1:
    print("🔍 Calculating audio similarities...")
    
    # Get file names and embeddings
    filenames = list(openl3_embeddings.keys())
    embeddings_list = [openl3_embeddings[f] for f in filenames]
    
    # Calculate similarity matrix using different methods
    similarity_methods = ['cosine', 'euclidean', 'correlation']
    
    fig, axes = plt.subplots(1, len(similarity_methods), figsize=(5*len(similarity_methods), 4))
    if len(similarity_methods) == 1:
        axes = [axes]
    
    for method_idx, method in enumerate(similarity_methods):
        # Calculate similarity matrix
        n_files = len(embeddings_list)
        similarity_matrix = np.zeros((n_files, n_files))
        
        for i in range(n_files):
            for j in range(n_files):
                if i == j:
                    similarity_matrix[i, j] = 1.0  # Perfect similarity with itself
                else:
                    try:
                        sim = compare_embeddings(
                            embeddings_list[i], 
                            embeddings_list[j], 
                            method=method
                        )
                        similarity_matrix[i, j] = sim
                    except Exception:  # Avoid a bare except, which would also swallow KeyboardInterrupt
                        similarity_matrix[i, j] = 0.0
        
        # Plot heatmap
        im = axes[method_idx].imshow(similarity_matrix, cmap='viridis', vmin=0, vmax=1)
        axes[method_idx].set_title(f'{method.title()} Similarity')
        
        # Add text annotations
        for i in range(n_files):
            for j in range(n_files):
                text = axes[method_idx].text(j, i, f'{similarity_matrix[i, j]:.2f}',
                                           ha="center", va="center", color="white", fontweight='bold')
        
        # Set labels
        short_names = [f[:10] + '...' if len(f) > 10 else f for f in filenames]
        axes[method_idx].set_xticks(range(n_files))
        axes[method_idx].set_yticks(range(n_files))
        axes[method_idx].set_xticklabels(short_names, rotation=45, ha='right')
        axes[method_idx].set_yticklabels(short_names)
        
        # Add colorbar
        plt.colorbar(im, ax=axes[method_idx], fraction=0.046, pad=0.04)
    
    plt.tight_layout()
    plt.show()
    
    # Find most and least similar pairs
    print("\n🔍 Similarity Analysis (using cosine similarity):")
    cosine_similarities = []
    
    for i in range(len(filenames)):
        for j in range(i+1, len(filenames)):
            sim = compare_embeddings(embeddings_list[i], embeddings_list[j], method='cosine')
            cosine_similarities.append((filenames[i], filenames[j], sim))
    
    # Sort by similarity
    cosine_similarities.sort(key=lambda x: x[2], reverse=True)
    
    print("\n📊 Most similar pairs:")
    for i, (file1, file2, sim) in enumerate(cosine_similarities[:3]):
        print(f"   {i+1}. {file1} ↔ {file2}: {sim:.3f}")
    
    print("\n📊 Least similar pairs:")
    # The list is sorted in descending order, so reverse it to rank the least similar pair first
    for i, (file1, file2, sim) in enumerate(cosine_similarities[::-1][:3]):
        print(f"   {i+1}. {file1} ↔ {file2}: {sim:.3f}")
        
else:
    print("⚠️ Need at least 2 audio files for similarity comparison")

In [None]:
# Create a similarity search function
def find_similar_audio(query_embedding, database_embeddings, top_k=3, method='cosine'):
    """
    Find the most similar audio files to a query.
    
    Args:
        query_embedding: Embedding of the query audio
        database_embeddings: Dict of {filename: embedding} for database
        top_k: Number of similar files to return
        method: Similarity method to use
    
    Returns:
        List of (filename, similarity_score) tuples
    """
    similarities = []
    
    for filename, embedding in database_embeddings.items():
        try:
            sim = compare_embeddings(query_embedding, embedding, method=method)
            similarities.append((filename, sim))
        except Exception:  # Avoid a bare except, which would also swallow KeyboardInterrupt
            similarities.append((filename, 0.0))
    
    # Sort by similarity (descending)
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    return similarities[:top_k]

# Demonstrate similarity search
if len(openl3_embeddings) > 1:
    print("🔍 Demonstrating similarity search...")
    
    # Use the first file as query
    query_filename = list(openl3_embeddings.keys())[0]
    query_embedding = openl3_embeddings[query_filename]
    
    # Remove query from database for fair comparison
    database = {k: v for k, v in openl3_embeddings.items() if k != query_filename}
    
    # Find similar files
    similar_files = find_similar_audio(query_embedding, database, top_k=len(database))
    
    print(f"\n🎵 Query: {query_filename}")
    print("📊 Most similar files:")
    
    for i, (filename, similarity) in enumerate(similar_files):
        print(f"   {i+1}. {filename}: {similarity:.3f}")
    
    print("\n✅ Similarity search complete!")
    print("💡 This is the foundation for building a music similarity search engine!")
else:
    print("⚠️ Need multiple audio files for similarity search demo")

## 10. Batch Processing Multiple Files

Let's demonstrate how to efficiently process multiple audio files and build a database of embeddings.
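
The batch cell below counts entries whose value is `None` as failures, which suggests that `extract_embeddings_batch` returns a dictionary mapping each path to an embedding or `None`. A minimal sketch of that pattern follows. Both `extract_batch` and `fake_extract` are hypothetical stand-ins for illustration, not the project's implementation.

```python
import numpy as np

def extract_batch(paths, extract_fn):
    """Apply extract_fn to each path; store None for files that fail."""
    results = {}
    for path in paths:
        try:
            results[path] = extract_fn(path)
        except Exception as exc:
            print(f"skipped {path}: {exc}")
            results[path] = None
    return results

def fake_extract(path):
    # Hypothetical extractor: rejects non-audio files, returns a tiny vector otherwise
    if path.endswith(".txt"):
        raise ValueError("not an audio file")
    return np.full(4, len(path), dtype=np.float32)

results = extract_batch(["a.wav", "notes.txt", "b.flac"], fake_extract)

# Keep only successful extractions before stacking them into a matrix
ok = {p: e for p, e in results.items() if e is not None}
matrix = np.stack(list(ok.values()))
print(len(ok), matrix.shape)  # 2 (2, 4)
```

Recording failures as `None` lets one bad file be reported without aborting the whole run, but the `None` entries have to be filtered out before the embeddings are stacked into an array.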

In [None]:
# Simulate batch processing with all available audio files
print("🔄 Demonstrating batch processing...")

# Get all audio files in the data directory
all_audio_files = []
for root, dirs, files in os.walk(data_dir):
    for file in files:
        if any(file.lower().endswith(ext) for ext in audio_extensions):
            all_audio_files.append(os.path.join(root, file))

print(f"📁 Found {len(all_audio_files)} audio files for batch processing")

if len(all_audio_files) > 0:
    # Initialize extractor
    batch_extractor = AudioEmbeddingExtractor(model_name='spectrogram')  # Use fast model for demo
    
    # Process all files
    print("\n🚀 Starting batch extraction...")
    batch_embeddings = batch_extractor.extract_embeddings_batch(all_audio_files)
    
    # Analysis
    successful = sum(1 for v in batch_embeddings.values() if v is not None)
    failed = len(batch_embeddings) - successful
    
    print(f"\n📊 Batch Processing Results:")
    print(f"   ✅ Successful: {successful}")
    print(f"   ❌ Failed: {failed}")
    print(f"   📈 Success rate: {successful/len(batch_embeddings)*100:.1f}%")
    
    # Create embeddings database
    embeddings_database = {
        'metadata': {
            'created_date': pd.Timestamp.now().isoformat(),
            'model': 'spectrogram',
            'total_files': len(all_audio_files),
            'successful_extractions': successful,
            'failed_extractions': failed
        },
        'embeddings': {}
    }
    
    for file_path, embedding in batch_embeddings.items():
        if embedding is not None:
            filename = os.path.basename(file_path)
            embeddings_database['embeddings'][filename] = {
                'file_path': file_path,
                'embedding': embedding,
                'stats': {
                    'shape': embedding.shape,
                    'mean': float(embedding.mean()),
                    'std': float(embedding.std()),
                    'norm': float(np.linalg.norm(embedding))
                }
            }
    
    # Save database
    database_path = os.path.join(embeddings_dir, 'embeddings_database.pkl')
    save_embeddings(
        np.array([e for e in batch_embeddings.values() if e is not None]),  # Successful embeddings only (failed files are None)
        database_path, 
        embeddings_database, 
        format='pickle'
    )
    
    print(f"\n💾 Saved embeddings database to: {database_path}")
    
else:
    print("⚠️ No audio files found for batch processing")

In [None]:
# Create a summary report
print("📋 Creating Summary Report...")
print("=" * 60)

report = {
    'Project': 'Music Embeddings Extraction',
    'Author': 'Sergie Code',
    'Date': pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'),
    'Audio Files Processed': len(audio_data),
    'Embedding Models Used': [],
    'Total Embeddings Generated': 0,
    'Average Embedding Dimension': 0,
    'Available Models': list(AudioEmbeddingExtractor.get_available_models().keys()),
    'Saved Files': len(os.listdir(embeddings_dir)) if os.path.exists(embeddings_dir) else 0
}

# Count embeddings and models
total_embeddings = 0
total_dimensions = 0

if openl3_embeddings:
    report['Embedding Models Used'].append('OpenL3')
    total_embeddings += len(openl3_embeddings)
    total_dimensions += list(openl3_embeddings.values())[0].shape[0]

if audioclip_embeddings:
    report['Embedding Models Used'].append('AudioCLIP')
    total_embeddings += len(audioclip_embeddings)
    total_dimensions += list(audioclip_embeddings.values())[0].shape[0]

report['Total Embeddings Generated'] = total_embeddings
report['Average Embedding Dimension'] = total_dimensions // len(report['Embedding Models Used']) if report['Embedding Models Used'] else 0

# Print report
print("\n🎵 MUSIC EMBEDDINGS EXTRACTION REPORT")
print("=" * 60)
for key, value in report.items():
    if isinstance(value, list):
        print(f"{key:<25}: {', '.join(value) if value else 'None'}")
    else:
        print(f"{key:<25}: {value}")

print("\n🚀 NEXT STEPS")
print("=" * 60)
print("1. 🔍 Build a vector search system using FAISS or Chroma")
print("2. 🌐 Create a REST API for embedding extraction")
print("3. 📊 Develop a web interface for music analysis")
print("4. 🛡️ Implement copyright detection algorithms")
print("5. 🤖 Train custom models for specific music genres")

print("\n✨ PROJECT COMPLETE!")
print("This foundation is ready for building advanced music AI tools.")
print("\n🎓 Created by Sergie Code - AI Tools for Musicians")
print("💡 Subscribe to the YouTube channel for more AI tutorials!")

---

## 🎉 Congratulations!

You've successfully completed the **Music Embeddings Extraction** demo! Here's what you've accomplished:

### ✅ What You've Built

1. **🎵 Audio Processing Pipeline** - Load, preprocess, and analyze audio files
2. **🤖 Embedding Extraction** - Extract high-quality embeddings using OpenL3 and AudioCLIP
3. **💾 Data Management** - Save and load embeddings efficiently
4. **📊 Visualization Tools** - Visualize embeddings in 2D space
5. **🔍 Similarity Search** - Find similar audio files based on embeddings
6. **⚡ Batch Processing** - Process multiple files efficiently

### 🚀 Next Steps for Your Music AI Journey

This project provides the foundation for building:

- **🎯 Music Recommendation Systems**
- **🛡️ Copyright Detection Tools**
- **📱 Music Analysis Apps**
- **🔊 Audio Search Engines**
- **🎼 Composition Analysis Tools**

### 📚 Learning Resources

**By Sergie Code - Software Engineer & AI Educator**

- 🎥 **YouTube Channel**: More AI tutorials for musicians
- 💻 **GitHub**: Find more open-source AI tools
- 🌐 **Community**: Join the discussion on AI in music

---

### 💡 Remember

This is just the beginning! The embeddings you've extracted can power sophisticated music analysis and discovery applications. Keep experimenting and building amazing tools for musicians!

**Happy coding and music making! 🎵🤖**