# Deep Learning for the Auto-Generated Music Composition

Name: Jianqiao Li, Zhiying Cui

NetID: jl7136, zc2191

## Introduction
- This project is based on the DeepJ model from the [GitHub repository](https://github.com/calclavia/DeepJ).
- The reference paper is [Mao HH, Shin T, Cottrell G. DeepJ: Style-specific music generation. In 2018 IEEE 12th International Conference on Semantic Computing (ICSC) 2018 Jan 31 (pp. 377-382). IEEE.](https://arxiv.org/abs/1801.00887).

## Goal of This Work
Our model aims to auto-generate a 10 to 30 second polyphonic piece given a specific style and randomly formatted inputs. Here, polyphony is short for single-voice polyphony: a sequence of notes for a single instrument, where more than one note can be played at the same time.
- Rebuild the DeepJ model in our local environment and compare the results with the authors’ model.
- Replace the one-hot encoding strategy for style generation with a pre-trained model.
- Try to develop a sparse representation of music as the authors recommend.

## Development Environment

- Python version: Python 3.6
    - Because the `grpcio` version required by TensorFlow is incompatible with Python 3.5, we decided to use Python 3.6
- Framework: TensorFlow
- Environment: Google Cloud
- Set up a Jupyter server: [Running Jupyter Notebook on Google Cloud Platform in 15 min](https://towardsdatascience.com/running-jupyter-notebook-in-google-cloud-platform-in-15-min-61e16da34d52)
- Set up a kernel for a different Python version
    - [How to add python 3.6 kernel alongside 3.5 on jupyter](https://stackoverflow.com/questions/43759610/how-to-add-python-3-6-kernel-alongside-3-5-on-jupyter)
    - [Jupyter Notebook Kernels: How to Add, Change, Remove](https://queirozf.com/entries/jupyter-kernels-how-to-add-change-remove)
    - [Run Jupyter Notebook script from terminal](https://deeplearning.lipingyang.org/2018/03/28/run-jupyter-notebook-script-from-terminal/)
- Install the dependencies for DeepJ
```
pip install --ignore-installed -r requirements.txt
```
- Set up python-midi
    - The original [python-midi](https://github.com/vishnubob/python-midi) is no longer maintained, so we need an alternative that is compatible with Python 3 and also works with the DeepJ model
    - ✅ Candidate 1: https://github.com/louisabraham/python3-midi
    - ❓ Candidate 2: https://github.com/sniperwrb/python-midi
    - ❓ Candidate 3: https://github.com/jameswenzel/mydy
- Download the dataset to the `data/` folder
    - Piano-Midi: http://www.piano-midi.de/
- Directory of the project

In [None]:
import os

print('Project Directory')
os.chdir('/home/choi/DLProject/')
!tree -L 1

## Set Up the DeepJ Model

In [None]:
import tensorflow as tf
import numpy as np
from keras.layers import Input, LSTM, Dense, Dropout, Lambda, Reshape, Permute
from keras.layers import TimeDistributed, RepeatVector, Conv1D, Activation
from keras.layers import Embedding, Flatten
from keras.layers.merge import Concatenate, Add
from keras.models import Model
import keras.backend as K
from keras import losses
from keras.callbacks import ModelCheckpoint, LambdaCallback
from keras.callbacks import EarlyStopping, TensorBoard

import argparse
import midi
import os

from dataset import *              # load dataset
from generate import *             # generate music
from midi_util import midi_encode
from model import *                # utils used in building models
from util import *


### Model Description

To reduce the computational cost, we dropped the "baroque" genre.
- A summarized dataset for [Piano-Midi](http://www.piano-midi.de/): https://www.kaggle.com/soumikrakshit/classical-music-midi
- Other dataset sources
    - https://www.mfiles.co.uk/
    - https://www.kaggle.com/programgeek01/anime-music-midi


In [None]:
"""
Constant parameters
"""

# Define the musical styles
genre = [
    'classical',
    'romantic'
]

styles = [
    [
        'data/classical/beethoven',
        'data/classical/haydn',
        'data/classical/mozart'
    ],
    [
        'data/romantic/borodin',
        'data/romantic/brahms',
        'data/romantic/tschai'
    ]
]

NUM_STYLES = sum(len(s) for s in styles)

# MIDI Resolution
DEFAULT_RES = 96
MIDI_MAX_NOTES = 128  # MIDI note numbers 0 - 127
MAX_VELOCITY = 127    # 0 - 127

# Number of octaves supported
NUM_OCTAVES = 4
OCTAVE = 12

# Min and max note (in MIDI note number)
MIN_NOTE = 36
MAX_NOTE = MIN_NOTE + NUM_OCTAVES * OCTAVE
NUM_NOTES = MAX_NOTE - MIN_NOTE

# Number of beats in a bar
BEATS_PER_BAR = 4
# Notes per quarter note
NOTES_PER_BEAT = 4
# Time steps per bar; with 4 steps per quarter note, the shortest note is a sixteenth note
NOTES_PER_BAR = NOTES_PER_BEAT * BEATS_PER_BAR

# Training parameters
BATCH_SIZE = 16
SEQ_LEN = 8 * NOTES_PER_BAR

# Hyper Parameters
OCTAVE_UNITS = 64
STYLE_UNITS = 64
NOTE_UNITS = 3
TIME_AXIS_UNITS = 256
NOTE_AXIS_UNITS = 128

TIME_AXIS_LAYERS = 2
NOTE_AXIS_LAYERS = 2

# Output file locations
OUT_DIR = 'out'
MODEL_DIR = os.path.join(OUT_DIR, 'models')
MODEL_FILE = os.path.join(OUT_DIR, 'model.h5')
SAMPLES_DIR = os.path.join(OUT_DIR, 'samples')
CACHE_DIR = os.path.join(OUT_DIR, 'cache')
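
The constants above determine the tensor shapes used by the `Input` layers in `build_models`. The following sketch simply recomputes the derived values from the constants so they can be read at a glance. The dictionary `input_shapes` is a name introduced here for illustration only and is not part of the DeepJ code.

```python
# Constants copied from the cell above
NUM_OCTAVES = 4
OCTAVE = 12
MIN_NOTE = 36
BEATS_PER_BAR = 4
NOTES_PER_BEAT = 4
NOTE_UNITS = 3
styles = [
    ['data/classical/beethoven', 'data/classical/haydn', 'data/classical/mozart'],
    ['data/romantic/borodin', 'data/romantic/brahms', 'data/romantic/tschai'],
]

# Derived values, computed exactly as in the constants cell
MAX_NOTE = MIN_NOTE + NUM_OCTAVES * OCTAVE       # highest note (exclusive)
NUM_NOTES = MAX_NOTE - MIN_NOTE                  # notes the model can play
NOTES_PER_BAR = NOTES_PER_BEAT * BEATS_PER_BAR   # time steps per bar
SEQ_LEN = 8 * NOTES_PER_BAR                      # time steps per training window
NUM_STYLES = sum(len(s) for s in styles)         # one style per composer folder

# Per-sample input shapes (batch dimension omitted); illustrative name
input_shapes = {
    'notes_in': (SEQ_LEN, NUM_NOTES, NOTE_UNITS),
    'beat_in': (SEQ_LEN, NOTES_PER_BAR),
    'style_in': (SEQ_LEN, NUM_STYLES),
}

for name, shape in input_shapes.items():
    print(name, shape)
```

With these settings, each training window covers 8 bars (128 time steps) over a 48-note range starting at MIDI note 36, and there are 6 styles, one per composer.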


In [None]:
def build_models(time_steps=SEQ_LEN, input_dropout=0.2, dropout=0.5):
    """
    Build the LSTM model
    """
    notes_in = Input((time_steps, NUM_NOTES, NOTE_UNITS))
    beat_in = Input((time_steps, NOTES_PER_BAR))
    style_in = Input((time_steps, NUM_STYLES))
    # Target input for conditioning
    chosen_in = Input((time_steps, NUM_NOTES, NOTE_UNITS))

    # Dropout inputs
    notes = Dropout(input_dropout)(notes_in)
    beat = Dropout(input_dropout)(beat_in)
    chosen = Dropout(input_dropout)(chosen_in)

    # Distributed representations
    style_l = Dense(STYLE_UNITS, name='style')
    style = style_l(style_in)

    """ Time axis """
    time_out = time_axis(dropout)(notes, beat, style)

    """ Note Axis & Prediction Layer """
    naxis = note_axis(dropout)
    notes_out = naxis(time_out, chosen, style)

    model = Model([notes_in, chosen_in, beat_in, style_in], [notes_out])
    model.compile(optimizer='nadam', loss=[primary_loss])

    """ Generation Models """
    time_model = Model([notes_in, beat_in, style_in], [time_out])

    note_features = Input((1, NUM_NOTES, TIME_AXIS_UNITS), name='note_features')
    chosen_gen_in = Input((1, NUM_NOTES, NOTE_UNITS), name='chosen_gen_in')
    style_gen_in = Input((1, NUM_STYLES), name='style_in')

    # Dropout inputs
    chosen_gen = Dropout(input_dropout)(chosen_gen_in)
    style_gen = style_l(style_gen_in)

    note_gen_out = naxis(note_features, chosen_gen, style_gen)

    note_model = Model([note_features, chosen_gen_in, style_gen_in], note_gen_out)

    return model, time_model, note_model

In [None]:
def train(models):
    """
    Train the model
    """
    print('Loading data')
    train_data, train_labels = load_all(styles, BATCH_SIZE, SEQ_LEN)

    cbs = [
        ModelCheckpoint(MODEL_FILE, monitor='loss', save_best_only=True, save_weights_only=True),
        EarlyStopping(monitor='loss', patience=5),
        TensorBoard(log_dir='out/logs', histogram_freq=1)
    ]

    print('Training')
    models[0].fit(train_data, train_labels, epochs=2, callbacks=cbs, batch_size=BATCH_SIZE)

In [None]:
models = build_models()
models[0].summary()
models[1].summary() # time_model
models[2].summary() # note_model

In [None]:
"""
Loads all MIDI files as a piano roll. Prepare dataset
(For Keras)
"""

time_steps = SEQ_LEN
note_data = []
beat_data = []
style_data = []

note_target = []

# TODO: Can speed this up with better parallel loading. Order must be guaranteed.
stylesEnum = [y for x in styles for y in x]

for style_id, style in enumerate(stylesEnum):
    style_hot = one_hot(style_id, NUM_STYLES)
    # Parallel process all files into a list of music sequences
    seqs = Parallel(n_jobs=multiprocessing.cpu_count(), backend='threading')(delayed(load_midi)(f) for f in get_all_files([style]))

    for seq in seqs:
        if len(seq) >= time_steps:
            # Clamp MIDI to note range
            seq = clamp_midi(seq)
            # Create training data and labels
            train_data, label_data = stagger(seq, time_steps)
            note_data += train_data
            note_target += label_data

            beats = [compute_beat(i, NOTES_PER_BAR) for i in range(len(seq))]
            beat_data += stagger(beats, time_steps)[0]

            style_data += stagger([style_hot for i in range(len(seq))], time_steps)[0]

note_data = np.array(note_data)
beat_data = np.array(beat_data)
style_data = np.array(style_data)
note_target = np.array(note_target)

train_data = [note_data, note_target, beat_data, style_data]
train_labels = [note_target]
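
The loading cell relies on `stagger` to turn each sequence into training windows and their labels. Its implementation lives in the DeepJ repository and is not shown here, so the following is only a minimal sketch of what such a helper might do, assuming the labels are the inputs shifted one time step ahead. The function name `stagger_sketch`, the zero padding, and the `hop` parameter are assumptions made for illustration.

```python
import numpy as np


def stagger_sketch(seq, time_steps, hop):
    """Hypothetical stand-in for `stagger`: pair each window of `seq`
    with the same window shifted one time step into the future."""
    # Assumption: left-pad with silence so early steps have a full history
    padded = [np.zeros_like(seq[0])] * time_steps + list(seq)
    inputs, targets = [], []
    for start in range(0, len(padded) - time_steps, hop):
        inputs.append(padded[start:start + time_steps])
        targets.append(padded[start + 1:start + time_steps + 1])
    return inputs, targets


# Toy sequence with 6 time steps holding the values 1..6
seq = [np.array([t]) for t in range(1, 7)]
x, y = stagger_sketch(seq, time_steps=4, hop=2)

for window, label in zip(x, y):
    print([int(v[0]) for v in window], '->', [int(v[0]) for v in label])
```

Each label window is the input window advanced by one step, which matches the way the cell above uses the first return value as model input and the second as the prediction target.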

In [None]:
# Labels follow the order of train_data: [note_data, note_target, beat_data, style_data]
print("note_data:", train_data[0].shape)
print("note_target:", train_data[1].shape)
print("beat_data:", train_data[2].shape)
print("style_data:", train_data[3].shape)

In [None]:
cbs = [
    ModelCheckpoint(MODEL_FILE, monitor='loss', save_best_only=True, save_weights_only=True),
    EarlyStopping(monitor='loss', patience=3),
    TensorBoard(log_dir='out/logs', histogram_freq=1)
]

models[0].fit(train_data, train_labels, epochs=1000, callbacks=cbs, batch_size=BATCH_SIZE)

## Generate the Music

In [None]:
models[0].load_weights(MODEL_FILE)

In [None]:
# parser = argparse.ArgumentParser(description='Generates music.')
# parser.add_argument('--bars', default=32, type=int, help='Number of bars to generate')
# parser.add_argument('--styles', default=None, type=int, nargs='+', help='Styles to mix together')
# args = parser.parse_args()

models = build_or_load()

stylesGene = [compute_genre(i) for i in range(len(genre))]

# if args.styles:
#     # Custom style
#     styles = [np.mean([one_hot(i, NUM_STYLES) for i in args.styles], axis=0)]

write_file('output', generate(models, 32, stylesGene))
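
The commented-out custom-style option above mixes styles by averaging their one-hot vectors. The sketch below illustrates that idea on its own with NumPy. The helpers `one_hot_sketch` and `mix_styles` are hypothetical stand-ins written for this example; they are not the repository's `one_hot` or `compute_genre`.

```python
import numpy as np

NUM_STYLES = 6  # six composer folders, as configured in the constants cell


def one_hot_sketch(index, size):
    """Hypothetical stand-in for the repository's `one_hot` helper."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def mix_styles(style_ids, num_styles=NUM_STYLES):
    """Average the one-hot vectors of the chosen styles."""
    return np.mean([one_hot_sketch(i, num_styles) for i in style_ids], axis=0)


# Index 0 is the first composer folder and index 4 the fifth,
# following the flattened order of `styles`
mixed = mix_styles([0, 4])
print(mixed)
```

The resulting vector still sums to 1, so it can be fed to the model in place of a single one-hot style vector.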

### Some thoughts
- 2 genres: classical, jazz, EDM?
- Create a table to present the constant values
- Make a repository on GitHub
- Music classification
    - CNN wave images
    - https://www.analyticsvidhya.com/blog/2021/06/music-genres-classification-using-deep-learning-techniques/
    - http://cs229.stanford.edu/proj2018/report/21.pdf
    - MIDI
    - https://github.com/sandershihacker/midi-classification-tutorial/blob/master/midi_classifier.ipynb
    - ByteDance https://arxiv.org/abs/2010.14805
    - ByteDance dataset https://arxiv.org/abs/2010.07061 GitHub https://github.com/bytedance/GiantMIDI-Piano

### What are the next steps
- Change the DeepJ model to adapt to broader genres rather than one specific composer?
- Or pre-train a music classification model to initialize the input, replacing the one-hot representation
- Forget about the sparse input; a sufficient explanation is enough
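
As a way to reason about replacing the one-hot representation: the `style` layer in `build_models` is a linear `Dense` layer, so feeding it a one-hot vector is the same as looking up one row of its weight matrix. The NumPy sketch below checks that equivalence with random weights. It then shows that a feature vector from a pre-trained classifier could be fed through the same kind of projection instead. The `EMBED_DIM` size and the random "pre-trained" features are placeholders assumed for illustration, not results from any real model.

```python
import numpy as np

NUM_STYLES = 6
STYLE_UNITS = 64

rng = np.random.RandomState(0)

# Random stand-ins for the kernel and bias of Dense(STYLE_UNITS)
W = rng.normal(size=(NUM_STYLES, STYLE_UNITS))
b = rng.normal(size=STYLE_UNITS)

style_id = 2
one_hot = np.zeros(NUM_STYLES)
one_hot[style_id] = 1.0

# A linear Dense layer on a one-hot input selects one row of W
dense_out = one_hot @ W + b
lookup_out = W[style_id] + b
print(np.allclose(dense_out, lookup_out))

# Hypothetical alternative: a feature vector from a pre-trained classifier.
# EMBED_DIM and the random features are placeholders, not real outputs.
EMBED_DIM = 32
features = rng.normal(size=EMBED_DIM)
W_feat = rng.normal(size=(EMBED_DIM, STYLE_UNITS))
style_from_features = features @ W_feat + b
print(style_from_features.shape)
```

Under this view, only the input dimension of the style projection would change; the rest of the model would still receive a `STYLE_UNITS`-sized style vector.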
