<a href="https://colab.research.google.com/github/littlecl42/AAI-511-03_Group2/blob/main/notebooks/GroupProject_CNN%26LSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Final Team Project Introduction
### Introduction

Music is a form of art that is ubiquitous and has a rich history. Different composers have created music with their unique styles and compositions. However, identifying the composer of a particular piece of music can be a challenging task, especially for novice musicians or listeners. The proposed project aims to use deep learning techniques to identify the composer of a given piece of music accurately.

### Objective

The primary objective of this project is to develop a deep learning model that can predict the composer of a given musical score accurately. The project aims to accomplish this objective by using two deep learning techniques: Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN).

### Project Timeline

- Module 2 (by the end of Week 2): The course instructor will group students into teams of two to three members. Canvas, USD Email, or Slack can be used to find prospective team members.
- Module 4 (by the end of Week 4): Each team's representative will need to submit the "Team Project Status Update Form."
- Module 7 (by the end of Week 7): Each team should submit deliverables for the course project in the final week:

>1.  Project Report
>1. Project Notebook

It is critical to note that no extensions will be given for any of the final projects' due dates for any reason, and final projects submitted after the final due date will not be graded.
### Dataset

The project will use a dataset consisting of musical scores from various composers. Download the dataset from the Kaggle website.

The dataset contains MIDI files of compositions from well-known classical composers such as Bach, Beethoven, Chopin, and Mozart. Each score should be labeled with the name of its composer. Restrict your predictions to the composers listed below, and select only their files from the dataset.

>1. Bach
>1. Beethoven
>1. Chopin
>1. Mozart

### Methodology

The proposed project will be implemented using the following steps:

1. Data Collection: Data is collected and provided to you.
1. Data Pre-processing: Convert the musical scores into a format suitable for deep learning models. This involves converting the musical scores into MIDI files and applying data augmentation techniques.
1. Feature Extraction: Extract features from the MIDI files, such as notes, chords, and tempo, using music analysis tools.
1. Model Building: Develop a deep learning model using LSTM and CNN architectures to classify the musical scores according to the composer.
1. Model Training: Train the deep learning model using the pre-processed and feature-extracted data.
1. Model Evaluation: Evaluate the performance of the deep learning model using accuracy, precision, and recall metrics.
1. Model Optimization: Optimize the deep learning model by fine-tuning hyperparameters.
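
As a rough illustration of the evaluation step, accuracy and per-class precision and recall can be read off a confusion matrix. The sketch below uses NumPy and a small made-up matrix; the helper name `per_class_metrics` and the counts are hypothetical and are not results from this project.

```python
import numpy as np

def per_class_metrics(cm):
    """Return (accuracy, precision, recall) from a confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    predicted_totals = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual_totals = cm.sum(axis=1)     # row sums: how many samples each class has
    # Guard against division by zero for classes never predicted or never present.
    precision = np.divide(true_pos, predicted_totals,
                          out=np.zeros_like(true_pos), where=predicted_totals > 0)
    recall = np.divide(true_pos, actual_totals,
                       out=np.zeros_like(true_pos), where=actual_totals > 0)
    accuracy = true_pos.sum() / cm.sum()
    return accuracy, precision, recall

# Hypothetical two-class counts, for illustration only.
toy_cm = [[8, 2],
          [1, 9]]
accuracy, precision, recall = per_class_metrics(toy_cm)
print(accuracy, precision, recall)
```

A class with high recall but mediocre precision is one the model predicts too readily, which is worth checking whenever one composer has far more files than the others.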

### Deliverables

1. Project Report: A comprehensive documentation/report that describes the methodology, data pre-processing steps, feature extraction techniques, model architecture, and training process for reproducibility and future reference. Write your technical report in APA 7 style (here is a Sample Professional Paper format to follow). Please submit the report in PDF format and use the File naming convention DeliverableName-TeamNumber.pdf; for example, Project_Report-Team1.pdf

Your report should:

- contain a reference list that includes any external sources, libraries, or frameworks used during the project, including proper citations or acknowledgments.
- include a concluding section or markdown cell that summarizes the project, highlights key findings, and suggests any potential future improvements or extensions to the work.

2. Project Notebook: A Jupyter Notebook file (.ipynb) that contains the entire project code, including data pre-processing, feature extraction, model building, training, evaluation, and any additional analysis or visualizations performed during the project.

This deliverable will be exported from a Jupyter Notebook and submitted as a PDF or HTML file.

### Conclusion

The proposed project aims to use deep learning techniques to accurately predict the composer of a given musical score. The project will be implemented using LSTM and CNN architectures and will involve data pre-processing, feature extraction, model building, training, and evaluation. The final model can be used by novice musicians, listeners, and music enthusiasts to identify the composer of a musical piece accurately.

### Power Usage for this Project

If you need more computational power, change the runtime of your Google Colab notebook to GPU or TPU.
If that is not enough, another option is to buy a subscription (recommended).

Please follow this link to do so: Google Colab Pro+.

NOTE: Team members may not get the same grade on the Final Team Project, depending on each team member's level of contribution.

To understand how your work will be assessed, view the assignment rubric on the Final Team Project page.



In [None]:
!pip install pretty_midi


In [None]:
# Required libs for the project
import kagglehub
import pretty_midi
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import re
import difflib
from tqdm import tqdm

In [None]:
# Download latest version
path = kagglehub.dataset_download("blanderbuss/midi-classic-music")
print("Path to dataset files:", path)
midi_path = path

In [None]:
known_composers = [
    "Bach", "Mozart", "Beethoven", "Chopin", "Tchaikovsky", "Handel", "Schubert",
    "Haydn", "Brahms", "Liszt", "Mendelssohn", "Debussy", "Ravel", "Grieg", "Dvorak",
    "Vivaldi", "Stravinsky", "Rachmaninoff", "Mahler", "Shostakovich", "Alkan", "Albeniz",
    "Ambroise", "Arensky", "Arndt", "Bacewitz"
]

def load_midi_files(directory):
    midi_data = []
    bad_files = []  # Track bad files
    for root, _, files in os.walk(directory):
        for file in files:
            if file.lower().endswith((".mid", ".midi")):
                try:
                    full_path = os.path.join(root, file)
                    midi = pretty_midi.PrettyMIDI(full_path)
                    midi_data.append((full_path, file, midi))
                except Exception:
                    bad_files.append(file)
                    continue
    return midi_data, bad_files

In [None]:
def extract_composer_from_path(full_path):
    parts = os.path.normpath(full_path).split(os.sep)
    for part in reversed(parts[:-1]):  # Exclude filename
        for composer in known_composers:
            if composer.lower() in part.lower():
                return composer
    return "Unknown"
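
One caveat with the substring test above: it also matches a name embedded in a longer word, for example "Bach" inside "Offenbach". Whether the dataset contains such folders has not been checked here. The sketch below is one possible stricter variant that only accepts a name when it is not surrounded by other letters; `match_composer` and the shortened `COMPOSERS` list are hypothetical names used for illustration.

```python
import re

# Hypothetical shortlist; the notebook's known_composers list is longer.
COMPOSERS = ["Bach", "Beethoven", "Chopin", "Mozart"]

def match_composer(path, composers=COMPOSERS):
    """Return the first composer whose name appears as a whole word in a folder name."""
    folders = path.replace("\\", "/").split("/")[:-1]  # drop the filename
    for folder in reversed(folders):
        for name in composers:
            # Require that the name is not embedded in a longer run of letters.
            pattern = rf"(?<![A-Za-z]){re.escape(name)}(?![A-Za-z])"
            if re.search(pattern, folder, flags=re.IGNORECASE):
                return name
    return "Unknown"

print(match_composer("data/Bach/Fugue/bwv846.mid"))
print(match_composer("data/Offenbach/barcarolle.mid"))
print(match_composer("data/mozart_sonatas/k545.mid"))
```

Underscores and digits still count as separators in this variant, so folder names such as `mozart_sonatas` continue to match.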

In [None]:
def extract_features(midi_dataset):
    rows = []
    for full_path, filename, midi in midi_dataset:

        try:
            pitches = []
            durations = []
            for instrument in midi.instruments:
                for note in instrument.notes:
                    pitches.append(note.pitch)
                    durations.append(note.end - note.start)
            if pitches:
                pitch_mean = np.mean(pitches)
                pitch_std = np.std(pitches)
                pitch_min = np.min(pitches)
                pitch_max = np.max(pitches)
                duration_mean = np.mean(durations)
                duration_std = np.std(durations)
                duration_min = np.min(durations)
                duration_max = np.max(durations)
            else:
                pitch_mean = pitch_std = pitch_min = pitch_max = 0
                duration_mean = duration_std = duration_min = duration_max = 0

            # Overall piece length (seconds) and instrument count
            duration = midi.get_end_time()
            num_instruments = len(midi.instruments)

            # get_tempo_changes() returns (change_times, tempi); keep the tempi
            tempi = midi.get_tempo_changes()[1]
            if len(tempi) > 0:
                tempo_mean = np.mean(tempi)
                tempo_std = np.std(tempi)
                tempo = float(tempo_mean)
            else:
                tempo_mean = tempo_std = 0
                tempo = np.nan

            # First key signature if present, otherwise -1
            key_number = midi.key_signature_changes[0].key_number if midi.key_signature_changes else -1

            # Pitch sequence without drum tracks (pitches above includes them)
            notes = [note.pitch for instrument in midi.instruments for note in instrument.notes if not instrument.is_drum]
            avg_pitch = np.mean(notes) if notes else np.nan
            note_density = len(notes) / duration if duration > 0 else 0

            composer = extract_composer_from_path(full_path)

            rows.append({
                'filename': filename,
                'composer': composer,
                'duration': duration,
                'num_instruments': num_instruments,
                'note_count': len(pitches),
                'note_density': note_density,
                'notes': notes,
                'pitch_mean': pitch_mean,
                'pitch_std': pitch_std,
                'pitch_min': pitch_min,
                'pitch_max': pitch_max,
                'avg_pitch': avg_pitch,
                'duration_mean': duration_mean,
                'duration_std': duration_std,
                'duration_min': duration_min,
                'duration_max': duration_max,
                'tempo_mean': tempo_mean,
                'tempo_std': tempo_std,
                'tempo': tempo,
                'key': key_number
            })

        except Exception as e:
            print(f"Error processing {filename}: {e}")
            continue

    return pd.DataFrame(rows)

In [None]:
midi_dataset, bad_files = load_midi_files(midi_path)
print(f"Loaded {len(midi_dataset)} good MIDI files.")
print(f"Skipped {len(bad_files)} broken files.")

In [None]:
df = extract_features(midi_dataset)

In [None]:
composer_counts = df["composer"].value_counts()
print(composer_counts)

In [None]:
# Keep only the selected composers
df = df[df["composer"].isin(["Bach", "Beethoven", "Chopin", "Mozart"])].reset_index(drop=True)

In [None]:
composer_counts = df["composer"].value_counts()
print(composer_counts)

In [None]:
df.head()

In [None]:
# Show basic summary of the data
df.info()
df.describe()

In [None]:
# Number of MIDI files per composer
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x="composer",  hue='composer', palette='Set2', legend=False)
plt.title("Number of MIDI Files per Composer")
plt.xlabel("Composer")
plt.ylabel("Number of Files")
plt.tight_layout()
plt.show()

In [None]:
# Distribution of Duration by Composer
plt.figure(figsize=(10, 8))
sns.boxplot(data=df, x="composer", y="duration", hue='composer', palette='Set2', legend=False)
plt.title("Distribution of Duration by Composer")
plt.xlabel("Composer")
plt.ylabel("Duration (seconds)")
plt.tight_layout()
plt.show()


In [None]:
# Number of Instruments per Composer
plt.figure(figsize=(10, 5))
sns.violinplot(data=df, x="composer", y="num_instruments", inner="quartile", hue='composer', palette='Set2', legend=False)
plt.title("Number of Instruments per Composer")
plt.tight_layout()
plt.show()

In [None]:
# Note Density (notes/sec) by Composer
plt.figure(figsize=(10, 10))
sns.boxplot(data=df, x='composer', y='note_density', hue='composer', palette='Set2', legend=False)
plt.title("Note Density (notes/sec) by Composer")
plt.tight_layout()
plt.show()

In [None]:
# Feature Correlation Matrix
plt.figure(figsize=(8, 6))
sns.heatmap(df.drop(columns=["filename"]).corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Feature Correlation Matrix")
plt.tight_layout()
plt.show()

In [None]:
# How do multiple features relate to each other, by composer?
sns.pairplot(df, hue="composer", vars=["duration", "tempo", "avg_pitch", "note_density"])
plt.suptitle("Pairwise Feature Distributions by Composer", y=1.02)
plt.show()

In [None]:
#@title Sequence Length Distribution
sequence_lengths = df['notes'].apply(len)
plt.hist(sequence_lengths, bins=30)
plt.title('Sequence Length Distribution')
plt.xlabel('Length')
plt.ylabel('Frequency')
plt.show()

In [None]:
#@title Pitch Distribution
all_pitches = [pitch for seq in df['notes'] for pitch in seq]
sns.histplot(all_pitches, bins=50)
plt.title('Pitch Distribution')

In [None]:
#@title Composer-wise Note Count
df['note_count'] = df['notes'].apply(len)
sns.boxplot(data=df, x='composer', y='note_count', hue='composer', palette='Set2', legend=False)

In [None]:
#@title Top N Most Frequent Pitches
from collections import Counter
Counter(all_pitches).most_common(10)

In [None]:
# Feature means + std devs for each composer:
summary_stats = df.groupby("composer")[["duration", "tempo", "avg_pitch", "note_density"]].agg(['mean', 'std'])
print(summary_stats)


## Model Building

### CNN & LSTM Hybrid Model

In [None]:
#@title Prepare data arrays
# Pad the sequences with zeros to a uniform length
max_len = df['notes'].apply(len).max()
padded_notes = df['notes'].apply(lambda x: x + [0] * (max_len - len(x)))

X = np.stack(padded_notes.values)
y_labels = df['composer'].values
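
Padding every sequence to the longest piece means a single very long file sets the width of `X`, which can make the array large. A common alternative is to cap all sequences at a fixed length. The sketch below shows that idea with NumPy; `pad_or_truncate`, `target_len`, and the toy sequences are hypothetical, and choosing the cap (for example from the sequence length distribution plotted earlier) is left as an assumption.

```python
import numpy as np

def pad_or_truncate(sequences, target_len, pad_value=0):
    """Return an int array of shape (len(sequences), target_len)."""
    out = np.full((len(sequences), target_len), pad_value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        clipped = seq[:target_len]          # keep at most target_len notes
        out[row, :len(clipped)] = clipped   # the remainder stays as padding
    return out

# Toy pitch sequences of different lengths, for illustration only.
toy_sequences = [[60, 62, 64], [67], [72, 71, 69, 67, 65]]
X_fixed = pad_or_truncate(toy_sequences, target_len=4)
print(X_fixed.shape)
```

Truncation discards the tail of long pieces, so this trades some information for a smaller and more uniform input.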

In [None]:
#@title Encode labels
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_encoded = le.fit_transform(y_labels)

In [None]:
#@title Train-test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded,
    test_size=0.2,
    stratify=y_encoded,
    random_state=42
)
# Add channel dimension for CNN
X_train = np.expand_dims(X_train, -1)
X_test = np.expand_dims(X_test, -1)

In [None]:
#@title Define the Hybrid CNN + LSTM Model
import tensorflow as tf
from tensorflow.keras import layers, models

input_length = X_train.shape[1]  # number of time steps per padded sequence

model = models.Sequential([
    layers.Input(shape=(input_length, 1)),  # already (time steps, channels); no reshape needed

    # CNN for local feature extraction
    layers.Conv1D(64, kernel_size=3, activation='relu', padding='same'),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=3, activation='relu', padding='same'),
    layers.MaxPooling1D(pool_size=2),

    # CNN feature maps stay a sequence (not flattened) and feed the LSTM layers
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.LSTM(64),

    layers.Dense(64, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(len(le.classes_), activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
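
To see how many time steps reach the LSTM layers, the arithmetic can be sketched in plain Python. This assumes what the layer arguments above suggest: `padding='same'` convolutions leave the length unchanged, and each `MaxPooling1D(pool_size=2)` uses its default stride and no padding, so it floor-divides the length by 2. The helper `lstm_timesteps` is a hypothetical name.

```python
def lstm_timesteps(input_length, num_pool_layers=2, pool_size=2):
    """Sequence length reaching the LSTM after the pooling layers.

    Assumes 'same'-padded convolutions (length unchanged) and pooling with
    stride equal to pool_size and no padding, so each pool floors length / pool_size.
    """
    length = input_length
    for _ in range(num_pool_layers):
        length = length // pool_size
    return length

print(lstm_timesteps(1000))  # 250
print(lstm_timesteps(1001))  # 250
```

The two pooling layers therefore shorten the sequence by roughly a factor of four, which reduces the number of recurrent steps the LSTM has to process.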

In [None]:
#@title Train the model
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(patience=2, factor=0.5)
]

history = model.fit(
    X_train, y_train,
    validation_split=0.2,
    epochs=20,
    batch_size=32,
    callbacks=callbacks,
    verbose=0
)
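
The effect of `patience` in early stopping can be illustrated with a simplified simulation. This is a sketch of the general idea only, not the Keras implementation: it assumes the monitored quantity is a validation loss where lower is better, and the function name and toy loss values are hypothetical.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (epoch at which training stops, epoch with the best loss)."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran through every epoch without exhausting the patience budget.
    return len(val_losses) - 1, best_epoch

# Toy validation losses, for illustration only.
toy_losses = [1.00, 0.80, 0.90, 0.85, 0.81, 0.70]
stopped_at, best_at = simulate_early_stopping(toy_losses, patience=3)
print(stopped_at, best_at)
```

In this toy run, training stops before the final, lower loss is ever seen, which shows how a small patience value can end training early when the validation curve is noisy.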

In [None]:
#@title Evaluate the model
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

plt.plot(history.history['accuracy'], label='Train Acc')
plt.plot(history.history['val_accuracy'], label='Val Acc')
plt.legend()
plt.title('Training History')
plt.show()

y_pred = model.predict(X_test).argmax(axis=1)
print(classification_report(y_test, y_pred, target_names=le.classes_))

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', xticklabels=le.classes_, yticklabels=le.classes_)
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()

The hybrid CNN-LSTM model demonstrates moderate performance in classifying musical compositions by composer, achieving an overall accuracy of 70%. As shown in the training history, the model converges steadily, with training accuracy reaching approximately 75% and validation accuracy peaking near 69%. The validation curve shows some instability, suggesting mild overfitting.

The model performs best on Bach, with high precision (0.81) and recall (0.94), indicating strong recognition of his compositional style. Performance drops significantly for Beethoven, Chopin, and Mozart, with low recall and F1-scores (≤ 0.51), implying difficulty in distinguishing among these composers. The confusion matrix confirms this imbalance, as many non-Bach samples are misclassified as Bach. Overall, while the model captures temporal patterns well for Bach, it lacks discriminative power across the other, more stylistically similar composers, indicating a need for improved feature representation, class balancing, or more complex architectures.
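
One way to approach the class balancing mentioned above is to weight each class inversely to its frequency. The sketch below computes such weights with NumPy using the common "balanced" heuristic, `n_samples / (n_classes * class_count)`; the helper name and the toy labels are hypothetical and do not reflect this dataset's actual counts.

```python
import numpy as np

def balanced_class_weights(labels):
    """Map each class index to n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Hypothetical imbalanced labels: class 0 dominates.
toy_labels = [0] * 6 + [1] * 2 + [2] * 2
class_weight = balanced_class_weights(toy_labels)
print(class_weight)
```

A dictionary of this shape can be passed to the `class_weight` argument of Keras `model.fit`, so that errors on under-represented composers contribute more to the loss. Whether this improves the results here would need to be tested.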

