# Deep Learning for the Auto-Generated Music Composition

Name: Jianqiao Li, Zhiying Cui

NetID: jl7136, zc2191

## Introduction

- This project is based on the DeepJ model from the github repository [DeepJ: A model for style-specific music generation](https://github.com/calclavia/DeepJ) with some modifications.
- The reference paper is [Mao HH, Shin T, Cottrell G. DeepJ: Style-specific music generation. In2018 IEEE 12th International Conference on Semantic Computing (ICSC) 2018 Jan 31 (pp. 377-382). IEEE.](https://arxiv.org/abs/1801.00887).

## Goal of This Work

Our model aims to auto-generate a 10 to 30 seconds polyphony given a specific music style and random formatted inputs. Objectives are as follows:
- Rebuild the DeepJ model on our local environment and compare the results with the authors’ model.
- Replace the one-hot encoding strategy on style generation with a pre-trained model.
- Try to develop a sparse representation of music as the authors recommended.

## Development Environment

- Python version: Python 3.6.
    - Considering the incompatible version of `grpcio` in Python 3.5 for TensorFlow 2, we decide to use Python 3.6 which is different from the DeepJ repo.
- Framework: TensorFlow.
- Environment: Google Cloud.

### Some useful tips for setting up

- Set up a jupyter server on Google Cloud: 
    - [Running Jupyter Notebook on Google Cloud Platform in 15 min](https://towardsdatascience.com/running-jupyter-notebook-in-google-cloud-platform-in-15-min-61e16da34d52).
- Add a different version of kernel:
    - [How to add python 3.6 kernel alongside 3.5 on jupyter](https://stackoverflow.com/questions/43759610/how-to-add-python-3-6-kernel-alongside-3-5-on-jupyter).
    - [Jupyter Notebook Kernels: How to Add, Change, Remove](https://queirozf.com/entries/jupyter-kernels-how-to-add-change-remove).
    - [Run Jupyter Notebook script from terminal](https://deeplearning.lipingyang.org/2018/03/28/run-jupyter-notebook-script-from-terminal/).
    
### Requirements

- Install dependencies for the DeepJ.
```
pip install --ignore-installed -r requirements.txt
```
- Install `python-midi` module. The original [python-midi](https://github.com/vishnubob/python-midi) is no longer maintained. We have to find an alternative python-midi from the following repos:
    - ✅ Candidate 1: https://github.com/louisabraham/python3-midi
    - ❓ Candidate 2: https://github.com/sniperwrb/python-midi
    - ❓ Candidate 3: https://github.com/jameswenzel/mydy
- Download the dataset to `data` folder. The Midi files come from [Piano-midi](http://www.piano-midi.de/).
- Details about the project directory:

In [None]:
import os

print('Project Directory')
os.chdir('/home/choi/DLFinalProject/')
!tree -L 1

Project Directory
[01;34m.[00m
├── LICENSE
├── README.md
├── [01;34m__pycache__[00m
├── [01;34marchives[00m
├── constants.py
├── [01;34mdata[00m
├── dataset.py
├── distribution.py
├── download.py
├── generate.py
├── [01;34mimages[00m
├── main.ipynb
├── midi_util.py
├── model.py
├── nohup.out
├── [01;34mout[00m
├── requirements.txt
├── [01;34mscripts[00m
├── test.py
├── train.py
├── util.py
└── visualize.py

6 directories, 16 files


Before getting start, import all dependencies and modules required for this work.

In [None]:
import tensorflow as tf
import numpy as np
from keras.callbacks import ModelCheckpoint
from keras.callbacks import EarlyStopping, TensorBoard

from constants import *             # store constant parameters for the model
from dataset import *               # load dataset and parse to formated inputs
from generate import *              # generate music
from model import *                 # model architectures
from midi_util import midi_encode   # util funcs for midi
from util import *                  # util funcs

print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.6.2


## Objective 1: Rebuild the DeepJ Model

### Dataset

We adopted the same dataset as the authors used in training DeepJ. [Piano-midi](http://www.piano-midi.de/midicoll.htm) contains a dataset of classical piano solo pieces. The pieces of each composer are recorded by using a Midi sequencer. There are 571 pieces composed by 26 composers with a total duration of 36.7 hours of MIDI files in this dataset till Feb. 2020. There are also some alternative datasets such as [mfiles](https://www.mfiles.co.uk/).

Considering the computing power of our Google Cloud server, we eliminated one genre "baroque" and reduce the number of composers to 6. Details can be found in `constant.py` and all Midi data is saved in `data` folder. Before focusing on the model, we first need to download the dataset and parse them into formatted inputs. All utility functions to process the Midi file are coded in `dataset.py`. Several functions need to be paid attention to:

- `load_all(styles, batch_size, time_steps)`: Load all Midi files and parse them into four inputs, that are `note_data`, `note_target`, `beat_data`, `style_data`, and one label `note_target`.
- `clamp_midi(sequence)`: Clamp the Midi based on the `MIN` and `MAX` notes. In the paper, the authors truncate a standard pitch to range from 36 to 84 to reduce the input dimension. 
- `stagger(data, time_steps)`: Chop the sequence data by `time_steps`. This function returns two variables: `dataX` the sequence of data in the current time step, and `dataY` the sequence of data in the next time step which is the predicted target.

Now, load all Midi files in `data/` folder and see how each input data looks like.

In [None]:
# load data
train_data, train_labels = load_all(styles, BATCH_SIZE, SEQ_LEN) # actually we can safely remove BATCH_SIZE

In [None]:
print("note_data:", train_data[0].shape)
print("note_target:", train_data[1].shape) # aka train_labels
print("beat_data:", train_data[2].shape)
print("style_data:", train_data[3].shape)

note_data: (2306, 128, 48, 3)
note_target: (2306, 128, 48, 3)
beat_data: (2306, 128, 16)
style_data: (2306, 128, 6)


Noted that all constant parameters for the model are saved in `constants.py` file. The following table gives the meaning of each variable. (Here we use syntax of Java to present the constant type).

| Variable          | Value/Type        | Representation            |
| :---------------- | :---------------: | :------------------------ |
| genre             | List<String>      | Genre of music            |
| styles            | List<List<String>>| Directory of dataset      |
| NUM_STYLES        | styles.size()     | Numbers of styles         |
| DEFAULT_RES       | 96                | Resolution                |
| MIDI_MAX_NOTES    | 128               | Notes range [1, 128]      |
| MAX_VELOCITY      | 127               | Velocity range [0, 127]   |
| NUM_OCTAVES       | 4                 | Number of octaves         |
| OCTAVE            | 12                | Notes in every octave     |
| MIN_NOTE          | 36                | Minimum note              |
| MAX_NOTE          | MIN_NOTE + NUM_OCTAVES * OCTAVE       | Maximum note                  |
| NUM_NOTES         | MAX_NOTE - MIN_NOTE   | Number of notes between MIN_NOTE and MAX_NOTE |
| BEATS_PER_BAR     | 4                 | Number of beats in a bar  |
| NOTES_PER_BEAT    | 4                 | Notes per quarter note    |
| NOTES_PER_BAR     | NOTES_PER_BEAT * BEATS_PER_BAR    | The quickest note is a half-note  |
| BATCH_SIZE        | 16                | Training batch size       |
| SEQ_LEN           | 8 * NOTES_PER_BAR | Data sequence length      |
| OCTAVE_UNITS      | 64                | Dim of hyperparameter in octave convolution layer |
| STYLE_UNITS       | 64                | Dim of hyperparameter in style embedding          |
| NOTE_UNITS        | 3                 | Three outputs: play prob, replay prob and dynamics|
| TIME_AXIS_UNITS   | 256               | Dim of hyperparameter used for LSTMs in Time-Axis |
| NOTE_AXIS_UNITS   | 128               | Dim of hyperparameter used for LSTMs in Note-Axis |


### Model architecture

The DeepJ architecture is the following.

![DeepJ](./images/architecture.png)

The corresponding codes are written in `model.py`.

In [None]:
def build_models(time_steps=SEQ_LEN, input_dropout=0.2, dropout=0.5):
    """
    Build the LSTM model
    """
    notes_in = Input((time_steps, NUM_NOTES, NOTE_UNITS))   # Note input
    beat_in = Input((time_steps, NOTES_PER_BAR))            # Context
    style_in = Input((time_steps, NUM_STYLES))              # Style
    # Target input for conditioning, feed-forward
    chosen_in = Input((time_steps, NUM_NOTES, NOTE_UNITS))  # Chosen notes

    # Dropout inputs
    notes = Dropout(input_dropout)(notes_in)
    beat = Dropout(input_dropout)(beat_in)
    chosen = Dropout(input_dropout)(chosen_in)

    # Distributed representations
    style_l = Dense(STYLE_UNITS, name='style')
    style = style_l(style_in)

    """ Time axis """
    time_out = time_axis(dropout)(notes, beat, style)

    """ Note Axis """
    naxis = note_axis(dropout)              # 1D Convolution

    """ Prediction Layer """
    notes_out = naxis(time_out, chosen, style)

    """ Build Model """
    model = Model([notes_in, chosen_in, beat_in, style_in], [notes_out])
    model.compile(optimizer='nadam',        # Nesterov Adam optimizer
                  loss=[primary_loss])      # Loss function

    """ Generation Models """
    time_model = Model([notes_in, beat_in, style_in], [time_out])

    note_features = Input((1, NUM_NOTES, TIME_AXIS_UNITS), name='note_features')
    chosen_gen_in = Input((1, NUM_NOTES, NOTE_UNITS), name='chosen_gen_in')
    style_gen_in = Input((1, NUM_STYLES), name='style_in')

    # Dropout inputs
    chosen_gen = Dropout(input_dropout)(chosen_gen_in)
    style_gen = style_l(style_gen_in)

    note_gen_out = naxis(note_features, chosen_gen, style_gen)
    note_model = Model([note_features, chosen_gen_in, style_gen_in], note_gen_out)

    return model, time_model, note_model

In [None]:
models = build_models()
models[0].summary() # LSTM model: params 1,268,388

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 128, 48, 3)] 0                                            
__________________________________________________________________________________________________
input_3 (InputLayer)            [(None, 128, 6)]     0                                            
__________________________________________________________________________________________________
dropout (Dropout)               (None, 128, 48, 3)   0           input_1[0][0]                    
__________________________________________________________________________________________________
style (Dense)                   multiple             448         input_3[0][0]                    
______________________________________________________________________________________________

In [None]:
models[1].summary() # Time axis: params 912,606
models[2].summary() # Note axis: params 356,230

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 128, 48, 3)] 0                                            
__________________________________________________________________________________________________
input_3 (InputLayer)            [(None, 128, 6)]     0                                            
__________________________________________________________________________________________________
dropout (Dropout)               (None, 128, 48, 3)   0           input_1[0][0]                    
__________________________________________________________________________________________________
style (Dense)                   multiple             448         input_3[0][0]                    
____________________________________________________________________________________________

### Training

Training was performed using stochastic gradient descent with the Nesterov Adam optimizer. The loss function is as follows:

$$
\begin{equation*}
    \begin{split}
        & L_{play} = \sum {t_{play}\log y_{play} + (1 - t_{play}) log(1-y_{play})}\\
        & L_{rplay} = \sum {t_{play}(t_{rplay}\log y_{rplay} + (1 - t_{rplay}) log(1 - y_{rplay}))}\\
        & L_{dynamics} = \sum {t_{play}(t_{dynamics} - y_{dynamics})^2} \\
        & L_{primary} = L_{play} + L_{rplay} + L_{dynamics}
    \end{split}
\end{equation*}
$$

Play and replay are treated as logistic regression problems trained using binary cross entropy, as defined in a Biaxial LSTM. Dynamics(velocity) is trained using mean squared error. Training module is in file `train.py`.

Noted that we decrease the `patience` parameter in `EarlyStopping` to 3, and reduce the number of `epochs` to 100. In our preliminary experiment, we find that it commonly spends ~800s for training one epoch on our server. And it requires the epochs ~120 to get the relatively optimal model.

In [None]:
def train(models):
    cbs = [
        ModelCheckpoint(MODEL_FILE, monitor='loss', save_best_only=True, save_weights_only=True),
        EarlyStopping(monitor='loss', patience=3),
        TensorBoard(log_dir='out/logs', histogram_freq=1)
    ]

    print('Training')
    models[0].fit(train_data, train_labels, epochs=200, callbacks=cbs, batch_size=BATCH_SIZE)

train(models)

### Generation

After having the trained model, we need to auto generate the music. Authors performed generation by sampling from the model’s probability distribution using a coin flip to determine whether to play a note or not. After deciding to play a note, they sample from the replay probability to determine if the note should be re-attacked. Dynamics level is directly used from the model given that the note is played.

In our work, we decide to use a different method to generate music. We are going to provide a piece of Midi file cut from the training dataset and observe how does the model work. Further, in DeepJ model, authors use an adaptive temperature adjustment to avoid long period of silence, which is a tricky and smart method we adopt the same. All util functions related to music generation are in `generate.py`.

In [None]:
def generateModified(models, num_bars, styles, start_notes):
    print('Generating with styles:', styles)

    _, time_model, note_model = models
    generations = [MusicGeneration(style) for style in styles]

    for t in tqdm(range(NOTES_PER_BAR * num_bars)):
        # Produce note-invariant features
        ins = process_inputs([g.build_time_inputs() for g in generations])
        
        # Use starts notes
        ins[0][0] = start_notes[t]
        ins[0][1] = start_notes[t]

        # Pick only the last time step
        note_features = time_model.predict(ins)
        note_features = np.array(note_features)[:, -1:, :]

        # Generate each note conditioned on previous
        for n in range(NUM_NOTES):
            ins = process_inputs([g.build_note_inputs(note_features[i, :, :, :]) for i, g in enumerate(generations)])
            predictions = np.array(note_model.predict(ins))

            for i, g in enumerate(generations):
                # Remove the temporal dimension
                g.choose(predictions[i][-1], n)

        # Move one time step
        yield [g.end_time(t) for g in generations]

# generate music
print('Load model from file.')
models[0].load_weights(MODEL_FILE)

stylesGene = [compute_genre(i) for i in range(len(genre))]

write_file('output', generateModified(models, 32, stylesGene, train_data[0]))

## Objective 2: Music Genre Classification

As we can see above, the DeepJ model actually generates music by music genre rather than one specific composer. It mixes all composers' composition styles under the same genre into one-hot encoding. From the perspective of the dataset, it seems insignificant to train the model using the dataset classified by composers. 

Besides, the music genres such as Baroque, Classicism, and Romanticism are known as the different periods of ages. It is hard for a human with basic music knowledge to distinguish which period a given piece belongs to. But for a deep learning model, it is probably able to distinguish the certain pattern behind the notes and beats.

Our next step is to pre-train a music genre classification to replace the style encoding `style_in` in the DeepJ model. When we start this part of work, our server is suspended by Google because of suspicious coin mining activity. So, we decide to continue our work on the Colab.

There are some excellent models for music genre classification models by using CNN, which identify spectrograms of various music genres, such as references [Music Genres Classification using Deep learning techniques](https://www.analyticsvidhya.com/blog/2021/06/music-genres-classification-using-deep-learning-techniques/), [Music Genre Classification](http://cs229.stanford.edu/proj2018/report/21.pdf). Since our dataset is represented by notes and beats matrixes. We are going to build an SVM music genre classification model. In the work by [Chet N. Gnegy](http://cs229.stanford.edu/proj2014/Chet%20Gnegy,Classification%20Of%20Musical%20Playing%20Styles.pdf), the author concluded that by using the right feature extraction, the accuracy of the SVM genre classification model can exceed 90%. In our music genre classification, we adopt the same feature extraction method in this repo [Midi Classification Tutorial](https://github.com/sandershihacker/midi-classification-tutorial).

For the style encoding `style_in` in our model, we literally want to use it to represent what a piece of music sounds like by a human. Therefore, the style encoding is supposed to be a vector with the value of the probability of the music genre. However, the SVM models don’t output probabilities natively. We have to convert the output to class probabilities. Among all possible approaches, the Platt scaling is particularly suitable for SVMs, referenced from [Can you interpret probabilistically the output of a Support Vector Machine?](https://mmuratarat.github.io/2019-10-12/probabilistic-output-of-svm).


### Import modules

In [46]:
!pip install pretty_midi



In [63]:
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
import pretty_midi
import warnings
import os

Mount the google drive. The working directory in the google drive is located at `/content/drive/MyDrive/DLFinalProject`.

In [48]:
# mount google drive
from google.colab import drive
drive.mount('/content/drive')
# change to the working dirctory
%cd /content/drive/MyDrive/DLFinalProject

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/DLFinalProject


### Download and parse the dataset

Here, we are going to use another dataset to evaluate the performance of the music genre classification. This dataset contains more genres and more durations of music pieces than the piano-midi dataset, which is better to visualize how this model works.

#### Download dataset

- Genre labels: [tagtraum industries](http://www.tagtraum.com/msd_genre_datasets.html) -> Genre Ground Truth -> "CD1"
- Midi files: [The Lakh MIDI Dataset](http://colinraffel.com/projects/lmd/) -> LMD-matched

In [6]:
# genre labels
!wget https://www.tagtraum.com/genres/msd_tagtraum_cd1.cls.zip
!unzip msd_tagtraum_cd1.cls.zip

--2021-12-19 15:07:02--  https://www.tagtraum.com/genres/msd_tagtraum_cd1.cls.zip
Resolving www.tagtraum.com (www.tagtraum.com)... 81.169.145.77, 2a01:238:20a:202:1077::
Connecting to www.tagtraum.com (www.tagtraum.com)|81.169.145.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1183001 (1.1M) [application/zip]
Saving to: ‘msd_tagtraum_cd1.cls.zip’


2021-12-19 15:07:04 (1.71 MB/s) - ‘msd_tagtraum_cd1.cls.zip’ saved [1183001/1183001]



Map the genre to index.

In [49]:
def get_genres(path):
    ids = []
    genres = []
    with open(path) as f:
        line = f.readline()
        while line:
            if line[0] != '#':
                [x, y, *_] = line.strip().split("\t")
                ids.append(x)
                genres.append(y)
            line = f.readline()
    genre_df = pd.DataFrame(data={"Genre": genres, "TrackID": ids})
    return genre_df

# get the genre dataFrame
genre_path = "msd_tagtraum_cd1.cls"
genre_df = get_genres(genre_path)

# mapping
label_list = list(set(genre_df.Genre))
label_dict = {lbl: label_list.index(lbl) for lbl in label_list}

print(label_dict)

      Genre             TrackID
0  Pop_Rock  TRAAAAK128F9318786
1       Rap  TRAAAAW128F429D538
2  Pop_Rock  TRAAABD128F429CF47
3      Jazz  TRAAAED128E0783FAB
4  Pop_Rock  TRAAAEF128F4273421 

['Pop_Rock', 'Country', 'International', 'Jazz', 'Electronic', 'Rap', 'Reggae', 'RnB', 'Latin', 'Blues', 'New Age', 'Vocal', 'Folk'] 

{'Pop_Rock': 0, 'Country': 1, 'International': 2, 'Jazz': 3, 'Electronic': 4, 'Rap': 5, 'Reggae': 6, 'RnB': 7, 'Latin': 8, 'Blues': 9, 'New Age': 10, 'Vocal': 11, 'Folk': 12} 



In [None]:
# midi files
!wget http://hog.ee.columbia.edu/craffel/lmd/lmd_matched.tar.gz
!tar -xzvf lmd_matched.tar.gz -C /content/drive/MyDrive/DLFinalProject/

Map midi filepath to genre.

In [50]:
def get_matched_midi(midi_folder, genre_df):
    track_ids, file_paths = [], []
    for dir_name, subdir_list, file_list in os.walk(midi_folder):
        if len(dir_name) == 36:
            track_id = dir_name[18:]
            file_path_list = ["/".join([dir_name, file]) for file in file_list]
            for file_path in file_path_list:
                track_ids.append(track_id)
                file_paths.append(file_path)
    all_midi_df = pd.DataFrame({"TrackID": track_ids, "Path": file_paths})
    # join with genre
    df = pd.merge(all_midi_df, genre_df, on='TrackID', how='inner')
    return df.drop(["TrackID"], axis=1)

# mapping
midi_path = "lmd_matched"
matched_midi_df = get_matched_midi(midi_path, genre_df)

print(matched_midi_df.head())
print(matched_midi_df.shape)

                                                Path          Genre
0  lmd_matched/L/L/M/TRLLMMQ128F423119C/9cafb699c...       Pop_Rock
1  lmd_matched/L/L/Q/TRLLQXV128E07943FB/fe9fee992...       Pop_Rock
2  lmd_matched/L/L/R/TRLLRHS128E079431A/0769d0162...       Pop_Rock
3  lmd_matched/L/L/Y/TRLLYMC128F146EB42/c3ad21f1c...  International
4  lmd_matched/L/L/Y/TRLLYMC128F146EB42/198c242da...  International
(13360, 2)


#### Extract features

The size of total midi files is so large that we do not have enough RAM to train the model on the colab. Therefore, we have to cut down the number and randomly select 1000 samples for our music genere classification model.

In [51]:
def getRandomIndex(n, x):
    index = np.random.choice(np.arange(n), size=x, replace=False)
    return index

data_index = getRandomIndex(len(matched_midi_df), 1000)
matched_midi = pd.DataFrame(matched_midi_df, index=data_index)
print(matched_midi.shape)

1000
(1000, 2)


The next step is to extract appropriate features from the midi file. Differ from the DeepJ model, in the music genre classification we need to decide to use another set of features that are more likely contributing to classification. Taking the advantages of python packages `python-midi`, we decide to use similar features as the author in [Midi Classification Tutorial](https://github.com/sandershihacker/midi-classification-tutorial). He has proved this set of features is able to achieve more than 70% precision. Features are as follows, details for each feature please refer to [pretty-midi](https://craffel.github.io/pretty-midi/):
- Tempo
- Number of chord signature changes
- Resolution
- Time signature

In [52]:
%%time
def normalize_features(features):
    """
    range [-1, 1]
    """
    tempo = (features[0] - 150) / 300
    num_sig_changes = (features[1] - 2) / 10
    resolution = (features[2] - 260) / 400
    time_sig_1 = (features[3] - 3) / 8
    time_sig_2 = (features[4] - 3) / 8
    return [tempo, resolution, time_sig_1, time_sig_2]


def get_features(path):
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("error")
            file = pretty_midi.PrettyMIDI(path)
            
            tempo = file.estimate_tempo()
            num_sig_changes = len(file.time_signature_changes)
            resolution = file.resolution
            ts_changes = file.time_signature_changes
            ts_1 = 4
            ts_2 = 4
            if len(ts_changes) > 0:
                ts_1 = ts_changes[0].numerator
                ts_2 = ts_changes[0].denominator
            return normalize_features([tempo, num_sig_changes, resolution, ts_1, ts_2])
    except:
        return None


def extract_midi_features(path_df):
    all_features = []
    for index, row in path_df.iterrows():
        features = get_features(row.Path)   # [tempo, num_sig_changes, resolution, ts_1, ts_2]
        genre = label_dict[row.Genre]       # [label]
        if features is not None:
            features.append(genre)
            all_features.append(features)
    return np.array(all_features)

labeled_features = extract_midi_features(matched_midi_df) # [tempo, num_sig_changes, resolution, ts_1, ts_2, label]
print(labeled_features)

[[ 0.29766837  0.55        0.125       0.125       0.        ]
 [ 0.31238361  0.55        0.125       0.125       0.        ]
 [ 0.0283018  -0.35        0.125       0.125       2.        ]
 ...
 [ 0.04577074  0.31        0.125       0.125       0.        ]
 [ 0.17373668 -0.35        0.125       0.125       0.        ]
 [ 0.3973189   1.75        0.125       0.125       0.        ]]
CPU times: user 50min 37s, sys: 3min 23s, total: 54min 1s
Wall time: 52min 1s


The overall feature extraction procedure takes about one hours.

#### Partition dataset

We will split the dataset into training datasets: validation dataset: test dataset = 6:2:2, and partition them to `x` and `y`, where `x` is the features from `labeled_features` and `y` is the music genre label.

Since the whole dataset is selected by using random indexes, there is no need to shuffle the dataset again.

In [60]:
# split the dataset: train_labeled_features, valid_labeled_features, test_labeled_features
num = len(labeled_features)
num_train = int(num * 0.6)
num_valid = int(num * 0.8)
train_labeled_features = labeled_features[:num_train]
valid_labeled_features = labeled_features[num_train:num_valid]
test_labeled_features = labeled_features[num_valid:]

# format to x and y
cols = train_labeled_features.shape[1] - 1
x_train = train_labeled_features[:, :cols]
x_valid = valid_labeled_features[:, :cols]
x_test = test_labeled_features[:, :cols]

# format features for multi-class classification
num_classes = len(label_list)
y_train = train_labeled_features[:, cols].astype(int)
y_valid = valid_labeled_features[:, cols].astype(int)
y_test = test_labeled_features[:, cols].astype(int)

print(test_labeled_features[:10])
print(y_test[:10])

[[ 0.23680891  0.31        0.125       0.125       0.        ]
 [ 0.25503977  0.31        0.125       0.125       0.        ]
 [ 0.09117949 -0.35        0.125       0.125      11.        ]
 [ 0.11965721 -0.17        0.125       0.125       0.        ]
 [ 0.18804874 -0.17        0.125       0.125       0.        ]
 [ 0.20735881 -0.17        0.125       0.125       0.        ]
 [ 0.23354232 -0.575       0.125       0.125       0.        ]
 [ 0.1413318  -0.17        0.125       0.125       0.        ]
 [ 0.14782987  0.55        0.125       0.125       0.        ]
 [ 0.14782987  0.55        0.125       0.125       0.        ]]
[ 0  0 11  0  0  0  0  0  0  0]


### Training

Now, we are going to train an SVM model genre classification using `scikit-learn`. Take the advantage of `CalibratedClassifierCV`, we can easily have the probabilty of each genre that the model predicts.

In [70]:
def get_accuracy(y_true, y_pred):
    return np.sum(np.equal(y_true, y_pred)) / len(y_true);

def train_model(x_train, y_train, x_valid, y_valid):
    clf_svm = LinearSVC()
    clf = CalibratedClassifierCV(clf_svm)   # outputs the probability
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_valid)
    print("Accuracy on validation dataset:", get_accuracy(y_valid, y_pred))
    return clf

classifier = train_model(x_train, y_train, x_valid, y_valid)

# svm = LinearSVC()
# clf = CalibratedClassifierCV(svm) 
# clf.fit(x_train, y_train)
# y_proba = clf.predict_proba(x_valid)



Accuracy: 0.7096645367412141




### Evaluation

Evaluate the model using test dataset.

In [73]:
y_pred = classifier.predict(x_test)
print("Accuracy on test dataset:", get_accuracy(y_test, y_pred))

Accuracy on test dataset: 0.7345309381237525


Make a prediction on a specific midi file.

In [83]:
def make_prediction_prob(clf, midi_path, label_list=label_list):
    x = get_features(midi_path)
    y_pred = clf.predict([x])
    y_pred_prob = clf.predict_proba([x])
    index = np.argmax(y_pred[0])
    label = label_list[y_pred[0]]
    return label, y_pred, y_pred_prob
    
# Make a Prediction
test_midi_path = "/content/drive/MyDrive/DLFinalProject/lmd_matched/L/L/Y/TRLLYMC128F146EB42/198c242dad2a442219463683abe602fd.mid"
label, y_pred, y_pred_prob = make_prediction_prob(classifier, test_midi_path)
print(label)
print("Predict index:", y_pred)
print("Predict probability:", y_pred_prob)

Pop_Rock
Predict index: [0]
Predict probability: [[0.74334592 0.08222873 0.00355892 0.03770512 0.02307281 0.01321319
  0.01109741 0.02731219 0.02011482 0.00104432 0.02143032 0.0049173
  0.01095896]]


## Summary
