# CBU5201 Mini-Project
**Student Name:** Minghui Pan
**Student ID:** 231220208

## 2. Problem Description
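The task is to identify which song a participant is humming or whistling from a short audio recording. The sample used here contains 800 WAV recordings covering 8 songs.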

## 3. Methodology
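The problem is treated as multi-class classification: each recording is reduced to four summary audio features, and a classifier predicts one of the 8 song labels. Performance is measured by accuracy on a held-out validation set.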

## 4. ML Pipeline
### 4.1 Feature Transformation Stage
#### 4.1.1 Audio Loading and Preprocessing
- Load WAV files
- Normalize audio signals
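
The feature-extraction cell in Section 6 loads each file with `librosa.load` but shows no explicit normalization step. As a minimal sketch of what "normalize audio signals" could mean, the helper below applies peak normalization. The function name `peak_normalize` and the synthetic tone are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def peak_normalize(x, eps=1e-12):
    """Scale a waveform so that its largest absolute sample is 1.0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x)) if x.size else 0.0
    # Leave silent (all-zero) or empty signals unchanged to avoid dividing by zero.
    if peak < eps:
        return x
    return x / peak

# Synthetic example: a quiet 220 Hz tone sampled at 8 kHz for one second.
fs = 8000
t = np.arange(fs) / fs
quiet = 0.05 * np.sin(2 * np.pi * 220 * t)
normalized = peak_normalize(quiet)
```

Note that peak normalization changes the Power feature: after scaling, power reflects the shape of the recording's dynamics rather than how loudly it was recorded.
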
#### 4.1.2 Feature Extraction
- Power: signal energy
- Pitch Mean: average fundamental frequency
- Pitch Std: pitch variation
- Voiced Fraction: proportion of voiced regions
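
To make the four definitions concrete, here is a small worked sketch on hand-made inputs. It assumes the pitch track marks unvoiced frames with NaN, as `librosa.pyin` does; the helper name `summarize_features` and the toy arrays are illustrative only.

```python
import numpy as np

def summarize_features(x, f0):
    """Return [power, pitch_mean, pitch_std, voiced_fraction].

    x  : 1-D waveform samples
    f0 : per-frame pitch estimates in Hz, with NaN marking unvoiced frames
    """
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    voiced = ~np.isnan(f0)
    power = np.mean(x ** 2)                      # mean squared amplitude
    has_pitch = voiced.any()
    pitch_mean = f0[voiced].mean() if has_pitch else 0.0
    pitch_std = f0[voiced].std() if has_pitch else 0.0  # population std (ddof=0)
    voiced_fraction = voiced.mean() if f0.size else 0.0
    return [power, pitch_mean, pitch_std, voiced_fraction]

# Toy inputs: four samples, four pitch frames of which two are voiced.
x = np.array([0.5, -0.5, 0.5, -0.5])
f0 = np.array([np.nan, 200.0, 220.0, np.nan])
features = summarize_features(x, f0)
print(features)
```

For these inputs the power is 0.25, the mean pitch is 210 Hz, the pitch standard deviation is 10 Hz, and half of the frames are voiced.
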
#### 4.1.3 Feature Normalization
- StandardScaler normalization
### 4.2 Model Stage
#### 4.2.1 Model Selection
- SVM with RBF kernel
#### 4.2.2 Hyperparameter Tuning
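- C: penalty on misclassified training points; larger values mean weaker regularization (the experiment in Section 6 uses C=1)
- gamma: width of the RBF kernel (the experiment in Section 6 uses gamma='auto', i.e. 1 / n_features)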
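
The experiment in Section 6 fixes both values rather than searching over them. One possible way to tune them is a cross-validated grid search, sketched below. The synthetic data, the grid values, and the 5-fold setup are assumptions chosen for illustration; they are not results or settings from this project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real features: 8 classes, 20 samples each, 4 features.
rng = np.random.default_rng(0)
n_classes, n_per_class = 8, 20
y_demo = np.repeat(np.arange(n_classes), n_per_class)
X_demo = rng.normal(size=(n_classes * n_per_class, 4)) + 0.5 * y_demo[:, None]

# Scaling lives inside the pipeline so it is refit on each training fold only.
pipe = Pipeline([("scaler", StandardScaler()), ("svm", SVC(kernel="rbf"))])
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": ["scale", 0.01, 0.1, 1],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
```

Putting `StandardScaler` inside the `Pipeline` keeps validation folds out of the scaling statistics, which a scaler fitted once on all data would not guarantee.
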
#### 4.2.3 Training
- Train on 70% of the recordings
- Validate on the remaining 30%
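
The split in Section 6 is a plain random split. A stratified variant, sketched below, would keep every song at roughly the same share of the validation set. The placeholder feature matrix is an assumption; only the song names and the 100-per-song counts come from the dataset summary in Section 5.

```python
import numpy as np
from sklearn.model_selection import train_test_split

songs = ["Married", "Necessities", "RememberMe", "Feeling",
         "Friend", "Happy", "TryEverything", "NewYork"]
y_demo = np.repeat(songs, 100)          # 100 recordings per song, as in Section 5
X_demo = np.zeros((len(y_demo), 4))     # placeholder features (illustration only)

# stratify=y_demo asks scikit-learn to preserve the class proportions in both splits.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)
labels, counts = np.unique(y_va, return_counts=True)
print(dict(zip(labels, counts)))
```

Stratification does not address the fact that the same participant can appear in both splits; a group-aware split would be needed for that.
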
### 4.3 Ensemble Methods Stage
#### 4.3.1 Single Model Approach
- No ensemble is used; the single baseline SVM serves as the final model

## 5. Dataset

In [13]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os, glob, urllib.request, zipfile
from tqdm import tqdm
import librosa

In [14]:
data_path = '/Users/panmingh/Code/ML_Coursework/Data/MLEndHWII_sample_800'
files = glob.glob(os.path.join(data_path, '*.wav'))
print(f'Total files: {len(files)}')

Total files: 800


In [15]:
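# Filenames follow the pattern <participant>_<type>_<number>_<song>.wav
data = []
for f in files:
    name = os.path.basename(f)  # portable alternative to f.split('/')
    parts = name.split('_')
    song = os.path.splitext(parts[3])[0]  # drop the .wav extension
    data.append([name, parts[0], parts[1], parts[2], song])
df = pd.DataFrame(data, columns=['file', 'participant', 'type', 'number', 'song']).set_index('file')
df.head()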

file,participant,type,number,song
S7_hum_4_Married.wav,S7,hum,4,Married
S103_whistle_2_Necessities.wav,S103,whistle,2,Necessities
S33_whistle_2_RememberMe.wav,S33,whistle,2,RememberMe
S29_whistle_2_Necessities.wav,S29,whistle,2,Necessities
S84_hum_2_Feeling.wav,S84,hum,2,Feeling


In [16]:
df['song'].value_counts()

song
Married          100
Necessities      100
RememberMe       100
Feeling          100
Friend           100
Happy            100
TryEverything    100
NewYork          100
Name: count, dtype: int64

## 6. Experiments and Results

In [18]:
from joblib import Parallel, delayed
import warnings
warnings.filterwarnings('ignore')

def get_pitch(x, fs):
    f0, _, _ = librosa.pyin(y=x, fmin=80, fmax=450, sr=fs)
    return f0

def extract_single_feature(file_path, df):
    """Extract the four features and the song label for a single file."""
    try:
        name = os.path.basename(file_path)  # portable alternative to split('/')
        song = df.loc[name, 'song']
        x, fs = librosa.load(file_path, sr=None)  # keep the native sampling rate
        f0 = get_pitch(x, fs)  # NaN marks unvoiced frames
        power = np.sum(x**2)/len(x)  # mean squared amplitude
        has_pitch = not np.all(np.isnan(f0))
        pitch_mean = np.nanmean(f0) if has_pitch else 0
        pitch_std = np.nanstd(f0) if has_pitch else 0
        voiced_fr = np.sum(~np.isnan(f0))/len(f0)
        return [power, pitch_mean, pitch_std, voiced_fr], song
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
        return None, None

# Extract features in parallel with joblib (works inside a Jupyter notebook)
print("Starting parallel feature extraction...")
results = Parallel(n_jobs=-1, verbose=10)(
    delayed(extract_single_feature)(f, df) for f in files
)

# Collect the results, skipping files that failed to process
X, y = [], []
for features, label in results:
    if features is not None:
        X.append(features)
        y.append(label)

X = np.array(X)
y = np.array(y)
print(f"\nFeature extraction complete! Shapes: X={X.shape}, y={y.shape}")

Starting parallel feature extraction...


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 14 concurrent workers.
[Parallel(n_jobs=-1)]: Done   4 tasks      | elapsed:    3.4s
[Parallel(n_jobs=-1)]: Done  13 tasks      | elapsed:    4.7s
[Parallel(n_jobs=-1)]: Done  22 tasks      | elapsed:    6.2s
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    8.1s
[Parallel(n_jobs=-1)]: Done  44 tasks      | elapsed:    9.8s
[Parallel(n_jobs=-1)]: Done  57 tasks      | elapsed:   12.2s
[Parallel(n_jobs=-1)]: Done  70 tasks      | elapsed:   15.1s
[Parallel(n_jobs=-1)]: Done  85 tasks      | elapsed:   17.9s
[Parallel(n_jobs=-1)]: Done 100 tasks      | elapsed:   20.4s
[Parallel(n_jobs=-1)]: Done 117 tasks      | elapsed:   23.2s
[Parallel(n_jobs=-1)]: Done 134 tasks      | elapsed:   25.8s
[Parallel(n_jobs=-1)]: Done 153 tasks      | elapsed:   29.0s
[Parallel(n_jobs=-1)]: Done 172 tasks      | elapsed:   33.2s
[Parallel(n_jobs=-1)]: Done 193 tasks      | elapsed:   37.1s
[Parallel(n_jobs=-1)]: Done 214 tasks      | elapsed:  


Feature extraction complete! Shapes: X=(800, 4), y=(800,)


[Parallel(n_jobs=-1)]: Done 800 out of 800 | elapsed:  2.4min finished


In [19]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

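# Hold out 30% of the recordings for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)
# Fit the scaler on the training split only, then apply it to the validation split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
# RBF is SVC's default kernel; gamma='auto' sets gamma to 1 / n_features
model = SVC(C=1, gamma='auto')
model.fit(X_train, y_train)
y_pred = model.predict(X_val)
acc = accuracy_score(y_val, y_pred)
print(f'Validation Accuracy: {acc}')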

Validation Accuracy: 0.21666666666666667
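
To put the validation accuracy in context: with eight equally represented songs, uniform random guessing is expected to score 1/8 = 12.5%. The sketch below illustrates this reference point and shows how a confusion matrix could be computed to see which songs get confused. The labels and guesses are synthetic assumptions, not the model's actual predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(0)
n_classes = 8

# Synthetic stand-in for a balanced 240-recording validation set.
y_true = np.repeat(np.arange(n_classes), 30)
# Uniform random guesses play the role of a "no skill" classifier.
y_guess = rng.integers(0, n_classes, size=y_true.size)

chance_level = 1.0 / n_classes            # expected accuracy of uniform guessing
guess_acc = accuracy_score(y_true, y_guess)
cm = confusion_matrix(y_true, y_guess, labels=np.arange(n_classes))

print(f"Chance level: {chance_level:.3f}")
print(f"Random-guess accuracy on this draw: {guess_acc:.3f}")
print(cm)
```

In the real notebook, the same `confusion_matrix` call could be applied to `y_val` and `y_pred`; rows would then correspond to true songs and columns to predicted songs.
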


## 7. Conclusion
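A baseline SVM (RBF kernel, C=1, gamma='auto') trained on four summary features (power, pitch mean, pitch standard deviation, and voiced fraction) reached a validation accuracy of about 21.7% on the 800-recording, 8-song sample. With eight equally represented songs, uniform guessing would score about 12.5%, so the model performs above chance but is far from reliable. No hyperparameter search or ensemble was run in this notebook.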

## 8. References
librosa (audio loading and pYIN pitch estimation); scikit-learn (data splitting, feature scaling, SVM, accuracy metric).