# Chroma Based Christmas Carol Classification v2: Melody Extracts

This version of the model uses audio tracks with highlighted main melody (melody extracts). 

The hypothesis: highlighting the melody will improve chroma feature extraction and clasification of new data. 

In addition to melody extracts, HPCP method for feature extraction and dMax (distance measure) were used to improve predictions.

In [1]:
from ChromaCoverId.chroma_features import ChromaFeatures
from ChromaCoverId.chroma_features import display_chroma
import ChromaCoverId.cover_similarity_measures as sims
import matplotlib.pyplot as plt

In [3]:
import os
import librosa
import numpy as np
import re

# Function to extract features from an audio file
def extract_features(audio_file_path):
    audio_features = ChromaFeatures(audio_file=audio_file_path, mono=True, sample_rate=44100) 
    return audio_features.chroma_hpcp()

# Directory containing your audio files
audio_dir = "koledy/melody_extracts"

# List to store feature matrix X and label vector y
X = []
y = []

tuples2D3D = {}

# Regex pattern to extract label from the filename
pattern = re.compile(r"^(.+?)_\d{1,2}_\d+\.wav$")

# Iterate through audio files in the directory
for filename in os.listdir(audio_dir):
    if filename.endswith(".wav"):
        # Use regex to extract label from the filename
        match = pattern.match(filename)
        if match:
            label = match.group(1)
            
            # Extract features from the audio file
            features = extract_features(os.path.join(audio_dir, filename))
            features_flattened = features.flatten()
            tuples2D3D[np.array2string(features_flattened)]= features
            
            # Append features and label to X and y
            X.append(features_flattened)
            y.append(label)
# Convert lists to NumPy arrays
X = np.array(X)
y = np.array(y)

# Now, X contains your feature matrix, and y contains your label vector


== Audio vector of koledy/melody_extracts/w_zlobie_lezy_18_5.wav loaded with shape (441263,) and sample rate 44100 ==
== Audio vector of koledy/melody_extracts/gdy_sliczna_panna_40_21.wav loaded with shape (441263,) and sample rate 44100 ==
== Audio vector of koledy/melody_extracts/dzisiaj_w_betlejem_32_14.wav loaded with shape (441263,) and sample rate 44100 ==
== Audio vector of koledy/melody_extracts/do_szopy_hej_pasterze_16_1.wav loaded with shape (441263,) and sample rate 44100 ==
== Audio vector of koledy/melody_extracts/gdy_sliczna_panna_15_20.wav loaded with shape (441263,) and sample rate 44100 ==
== Audio vector of koledy/melody_extracts/aniol_pasterzom_mowil_1_4.wav loaded with shape (441263,) and sample rate 44100 ==
== Audio vector of koledy/melody_extracts/aniol_pasterzom_mowil_0_22.wav loaded with shape (441263,) and sample rate 44100 ==
== Audio vector of koledy/melody_extracts/pojdzmy_wszyscy_do_stajenki_0_14.wav loaded with shape (441263,) and sample rate 44100 ==
== 

In [5]:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# X is  feature matrix and y is  label vector
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Assuming your distance function is named 'calculate_distance'
def calculate_distance(x, y):
    # Reshape back to 3D
    x3D = tuples2D3D[np.array2string(x)]
    y3D = tuples2D3D[np.array2string(y)]
    # Compute cross recurrent plot from two chroma audio feature vectors
    sim_matrix = sims.cross_recurrent_plot(x3D, y3D)
    #Computing qmax audio similarity measure (distance)
    dmax, _ = sims.dmax_measure(sim_matrix)
    return dmax


# Instantiate and train the KNN classifier with custom distance metric
knn_classifier = KNeighborsClassifier(n_neighbors=3, metric=calculate_distance)
knn_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")


Accuracy: 0.7333333333333333


The accuracy is actually lower than for original versions of the songs.

### Example prediction for new data

In [25]:
import IPython
new_audio_file_path = "validation_data/Kapela Beskidy - Oj malućki malućki_melody_extract.wav"
IPython.display.Audio(new_audio_file_path)

In [26]:
new_features = extract_features(new_audio_file_path)
new_features_flattened = new_features.flatten()


tuples2D3D[np.array2string(features_flattened)] = features

X_validate = []
X_validate.append(features_flattened)
X_validate = np.array(X_validate)


# Make predictions using the trained KNN classifier
predicted_label = knn_classifier.predict(X_validate)

print(f"Predicted Label: {predicted_label[0]}")


== Audio vector of validation_data/Kapela Beskidy - Oj malućki malućki_melody_extract.wav loaded with shape (442748,) and sample rate 44100 ==
Predicted Label: aniol_pasterzom_mowil


Incorrect prediction

In [28]:
import IPython
new_audio_path2 = "validation_data/Gdy śliczna Panna - Mazowsze_melody_extract.wav"
IPython.display.Audio(new_audio_path2)

In [29]:
new_features2 = extract_features(new_audio_path2)
new_features2_flattened = new_features2.flatten()


tuples2D3D[np.array2string(new_features2_flattened)] = new_features2

X_validate2 = []
X_validate2.append(new_features2_flattened)
X_validate2 = np.array(X_validate2)


# Make predictions using the trained KNN classifier
p_label2 = knn_classifier.predict(X_validate2)

print(f"Predicted Label: {p_label2}")

== Audio vector of validation_data/Gdy śliczna Panna - Mazowsze_melody_extract.wav loaded with shape (441000,) and sample rate 44100 ==
Predicted Label: ['gdy_sliczna_panna']


Correct prediction!

In [40]:
import IPython
new_audio_path3 = "validation_data/Elżbieta Zającówna - Przybieżeli do Betlejem_melody_extract.wav"
IPython.display.Audio(new_audio_path3)

In [41]:
new_features3 = extract_features(new_audio_path3)
new_features3_flattened = new_features3.flatten()


tuples2D3D[np.array2string(new_features3_flattened)] = new_features3

X_validate3 = []
X_validate3.append(new_features3_flattened)
X_validate3 = np.array(X_validate3)


# Make predictions using the trained KNN classifier
p_label3 = knn_classifier.predict(X_validate3)

print(f"Predicted Label: {p_label3}")

== Audio vector of validation_data/Elżbieta Zającówna - Przybieżeli do Betlejem_melody_extract.wav loaded with shape (441000,) and sample rate 44100 ==
Predicted Label: ['a_wczora_z_wieczora']


Incorrect prediction

## Summary

The accuracy of the model trained on melody extracts is `~73%` which is lower than a model trained on original versions of the christmas carols (`~89%`).

This model seems also not to work well with new data, performed by artists not included in the training dataset. So it may be inferred that the chroma extraction was influenced by features such as music genre, the acompaniment or voice timbre.  