#### Student Name:
#### Student ID:

# Assignment 7

### Mozart Dice Game RNN

Instructions: 

* This notebook is an interactive assignment; please read and follow the instructions in each cell. 

* Cells that require your input (in the form of code or written response) will have 'Question #' above.

* After completing the assignment, please submit this notebook and a copy as a PDF.



In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_formats = ['svg']
from scipy.io import wavfile
from numpy.linalg import svd
from scipy.stats.mstats import gmean
from matplotlib import rcParams
import scipy
import os
import sys
import glob
import pickle
from music21 import converter, instrument, note, chord, stream
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, ReLU, Activation, Lambda, Softmax
from keras.layers import BatchNormalization as BatchNorm
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint, ProgbarLogger
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()
tf.disable_v2_behavior()

2022-07-18 22:07:22.417606: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


Instructions for updating:
non-resource variables are not supported in the long term


# Generating Music with RNN

In this section, you will practice using Keras to build a generative model based on the Mozart Dice Game music that you and your classmates produced in Assignment 1. 

You will construct an RNN by filling in some missing lines of code and answering questions about Keras and model performance. 

The goal of this model is to predict the next note of a sequence, given the previous 4 notes. (This sequence length of 4 was chosen arbitrarily; please feel free to experiment with this number). 

First, let's define the RNN model we will use. 
A base LSTM layer has been included below.

##### Question 1 (30 points)

Define & compile the rest of the network as follows:

The additional layers of your network will be:
1. Another LSTM layer, with 512 units of output which drops 3/10 of the units. 
2. A batch normalization layer.
3. A layer which drops 3/10 of the units. 
4. A fully connected layer with 256 units of output.
5. A ReLU activation layer.
6. A batch normalization layer.
7. A layer which drops 3/10 of the units. 
8. A fully connected layer with number of units of output equal to the vocabulary space of the input. 
9. A softmax activation layer which uses a temperature of 0.6 
    (Note: you may need to define this as two separate layers in Keras, using the definition of temperature for softmax). 
    
After creating your network, compile the model with categorical cross entropy loss and an optimizer of your choice. 
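
For reference on item 9, softmax with temperature $T$ divides the logits by $T$ before normalizing, i.e. it computes softmax(z / T). The NumPy sketch below is only an illustration: the function name `softmax_with_temperature` and the sample logits are made up for this example and are not part of the assignment code. It shows that a temperature below 1 sharpens the distribution without changing which entry is most likely.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) along the last axis."""
    scaled = np.asarray(logits, dtype=float) / temperature
    # Subtracting the max improves numerical stability and leaves the result unchanged.
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])             # hypothetical scores for three notes
plain = softmax_with_temperature(logits)        # T = 1
sharp = softmax_with_temperature(logits, 0.6)   # T = 0.6

print(plain.round(3))
print(sharp.round(3))
```

In Keras, one way to get the same effect is to divide the logits by the temperature in one layer and apply a plain softmax in the next, which is what the note under item 9 hints at.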


In [2]:
def create_network(network_input, n_vocab):

    model = Sequential()
    model.add(LSTM(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        recurrent_dropout=0.3,
        return_sequences=True
    ))
    
    ''' Your Code Here '''
    model.add(LSTM(
        512,
        recurrent_dropout = 3/10
    ))
    model.add(BatchNorm())
    model.add(Dropout(3/10))
    model.add(Dense(256))
    model.add(ReLU())
    model.add(BatchNorm())
    model.add(Dropout(3/10))
    model.add(Dense(n_vocab))
    model.add(Lambda(lambda x: x / .6))
    model.add(Softmax())
    
    model.compile(
        loss = 'categorical_crossentropy', 
        optimizer = 'adam'
    )
    
    return model

Next, we will need to structure our input data in a way that makes sense. We can't pass a MIDI file directly to a network, so we must come up with an encoding. Read the code below:

In [3]:
def get_notes(verbose = False):

    notes = []
    for file in glob.glob("dice_songs/*.mid"):
        midi = converter.parse(file)
        if verbose:
            print("Parsing %s" % file)
        notes_to_parse = None
        try: # file has instrument parts
            s2 = instrument.partitionByInstrument(midi)
            notes_to_parse = s2.parts[0].recurse() 
        except Exception: # file has notes in a flat structure
            notes_to_parse = midi.flat.notes

        for element in notes_to_parse:
            if isinstance(element, note.Note):
                notes.append(str(element.pitch))
            elif isinstance(element, chord.Chord):
                notes.append('.'.join(str(n) for n in element.normalOrder))

    # Use a context manager so the file is flushed and closed
    with open('notes.p', 'wb') as f:
        pickle.dump(notes, f)

    return notes


def prepare_sequences(notes, n_vocab):
    """ Prepare the sequences used by the Neural Network """
    sequence_length = 4 

    pitchnames = sorted(set(item for item in notes))

    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

    network_input = []
    network_output = []

    for i in range(0, len(notes) - sequence_length, 1):
        sequence_in = notes[i:i + sequence_length]
        sequence_out = notes[i + sequence_length]
        network_input.append([note_to_int[char] for char in sequence_in])
        network_output.append(note_to_int[sequence_out])

    n_patterns = len(network_input)

    # reshape the input into a format compatible with LSTM layers
    network_input = np.reshape(network_input, (n_patterns, sequence_length, 1))
    # normalize input
    network_input = network_input / float(n_vocab)

    network_output = np_utils.to_categorical(network_output)

    return (network_input, network_output)


##### Question 2 (10 points)

How is the data from the MIDI file encoded as input to the network? Be specific in your explanation; make sure you address details such as which data type is used to represent a note in the input layer and how chords are handled, as well as what information is lost by using this encoding. 

[Hint: Try to print some of the variables to visualize their data.]

``` Your response here ```

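Each file is parsed with music21, and only the first instrument part (or the flat note stream, if the file has no parts) is read. A single note is stored as its pitch-name string (for example `'C4'`); a chord is stored as one string made of its normal-order pitch classes joined with dots (for example `'0.4.7'`), so every distinct chord is its own vocabulary entry. The sorted set of these strings defines the vocabulary, and each string is mapped to its integer index. The network input is a float array of shape `(n_patterns, 4, 1)` holding those indices divided by `n_vocab`, so each note is a single float in `[0, 1)`; the target is the one-hot encoding of the next note's index. This encoding discards all timing (durations, offsets, rests), dynamics, and every part except the first, and chords lose their octave and voicing because only pitch classes are kept.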
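
The encoding can be traced on a toy example. The sketch below mirrors the steps of `prepare_sequences` on a hand-written list of notes; the list itself is hypothetical, and `np.eye` stands in for `np_utils.to_categorical` with the number of classes fixed to `n_vocab`, so that the example runs without Keras or music21.

```python
import numpy as np

# Hypothetical output of get_notes(): pitch names for single notes,
# dot-joined pitch classes for chords.
notes = ["C4", "E4", "0.4.7", "G4", "C4", "0.4.7", "E4"]
sequence_length = 4

pitchnames = sorted(set(notes))
note_to_int = {name: i for i, name in enumerate(pitchnames)}
n_vocab = len(pitchnames)

# Sliding windows of 4 notes, each paired with the note that follows it.
n_patterns = len(notes) - sequence_length
windows = [[note_to_int[n] for n in notes[i:i + sequence_length]]
           for i in range(n_patterns)]
targets = [note_to_int[notes[i + sequence_length]] for i in range(n_patterns)]

# Indices become floats in [0, 1); targets become one-hot rows.
network_input = np.reshape(windows, (n_patterns, sequence_length, 1)) / float(n_vocab)
network_output = np.eye(n_vocab)[targets]

print(note_to_int)
print(network_input.shape, network_input.dtype)
print(network_output.shape)
```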

Now, we are ready to train the network.

##### Question 3 (10 points)

Add a line of code to begin the training of the model.
Please train for at least 50 epochs (you are welcome to experiment with the duration of training, batch size, and other hyperparameters). 
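
When experimenting with batch size, it helps to know how many weight updates one epoch contains, because some Keras callback options are measured in batches rather than epochs (for example, an integer `save_freq` in `ModelCheckpoint`). The arithmetic sketch below uses made-up dataset sizes and assumes Keras's default behaviour of keeping the final partial batch; the helper name `batches_per_epoch` is hypothetical.

```python
import math

def batches_per_epoch(n_samples, batch_size):
    """Number of batches (weight updates) in one epoch, counting a final partial batch."""
    return math.ceil(n_samples / batch_size)

# Hypothetical dataset sizes; the real value would be len(network_input).
for n_samples, batch_size in [(10_000, 64), (10_000, 65_536)]:
    steps = batches_per_epoch(n_samples, batch_size)
    print(f"{n_samples} samples, batch_size={batch_size}: {steps} step(s) per epoch")
```

When the batch size exceeds the dataset size, each epoch is a single batch, so a setting counted in batches coincides with the same number of epochs.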

In [4]:
checkpoint_filepath = 'models/model.hdf5'

def train_network():
    """ Train a Neural Network to generate music """
    notes = get_notes()

    n_vocab = len(set(notes))
    
    network_input, network_output = prepare_sequences(notes, n_vocab)
    
    model = create_network(network_input, n_vocab)
     
    # Your line of code here
    global checkpoint_filepath
    
    class _SelectiveProgbarLogger(ProgbarLogger):
        def __init__(self, verbose, epoch_interval, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.default_verbose = verbose
            self.epoch_interval = epoch_interval
        
        def on_epoch_begin(self, epoch, *args, **kwargs):
            self.verbose = (
                0 
                    if epoch % self.epoch_interval != 0 
                    else self.default_verbose
            )
            super().on_epoch_begin(epoch, *args, **kwargs)
    
    model.fit(
        network_input, network_output, 
        epochs = 1500, batch_size = 65536,
        verbose = 0,
        callbacks = [
            _SelectiveProgbarLogger(
                verbose = 1,
                epoch_interval = 100
            ),
            ModelCheckpoint(
                #"weights2-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5",
                checkpoint_filepath,
                monitor = 'loss',
                verbose = 1,
                save_best_only = True,
                save_freq = 500,  # an integer counts batches, not epochs
                mode = 'min'
            )
        ]
    )
    
    return model
    
_ = train_network()



2022-07-18 22:07:29.252838: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-18 22:07:29.835383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22349 MB memory:  -> device: 0, name: TITAN RTX, pci bus id: 0000:60:00.0, compute capability: 7.5


Instructions for updating:
Colocations handled automatically by placer.
Epoch 1/1500


2022-07-18 22:07:31.052644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22349 MB memory:  -> device: 0, name: TITAN RTX, pci bus id: 0000:60:00.0, compute capability: 7.5
2022-07-18 22:07:31.121668: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled


Epoch 101/1500
Epoch 201/1500
Epoch 301/1500
Epoch 401/1500

Epoch 500: loss improved from inf to 2.33458, saving model to models/model.hdf5
Epoch 501/1500
Epoch 601/1500
Epoch 701/1500
Epoch 801/1500
Epoch 901/1500

Epoch 1000: loss improved from 2.33458 to 1.95288, saving model to models/model.hdf5
Epoch 1001/1500
Epoch 1101/1500
Epoch 1201/1500
Epoch 1301/1500
Epoch 1401/1500

Epoch 1500: loss improved from 1.95288 to 1.62831, saving model to models/model.hdf5


Now that we have a trained network to make predictions, it's time to use the network to generate music!

##### Question 4 (10 points)

To make the predictions, you will need to complete the line in the generate_notes function below.

[Hint: what function does Keras use to make predictions?]

In [5]:
def prepare_sequences_prediction(notes, pitchnames, n_vocab):

    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
    sequence_length = 4
    network_input = []
    output = []
    for i in range(0, len(notes) - sequence_length, 1):
        sequence_in = notes[i:i + sequence_length]
        sequence_out = notes[i + sequence_length]
        network_input.append([note_to_int[char] for char in sequence_in])
        output.append(note_to_int[sequence_out])

    n_patterns = len(network_input)

    # reshape the input into a format compatible with LSTM layers
    normalized_input = np.reshape(network_input, (n_patterns, sequence_length, 1))
    # normalize input
    normalized_input = normalized_input / float(n_vocab)

    return (network_input, normalized_input)

def generate_notes(model, network_input, pitchnames, n_vocab):
    """ Generate notes from the neural network based on a sequence of notes """
    # Starts the melody by picking a random sequence from the input as a starting point
    start = np.random.randint(0, len(network_input))  # upper bound is exclusive

    int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

    pattern = list(network_input[start])  # copy, so append() below does not mutate network_input
    prediction_output = []
    
    for note_index in range(200):
        prediction_input = np.reshape(pattern, (1, len(pattern), 1))
        prediction_input = prediction_input / float(n_vocab)

        
        ### Complete the line below
        prediction = model.predict(
            prediction_input,
            batch_size = 65536
        )

        index = np.argmax(prediction)
        result = int_to_note[index]
        prediction_output.append(result)

        pattern.append(index)
        pattern = pattern[1:len(pattern)]

    return prediction_output

Our model is now set up, and we can create a sequence to use as a query for predicting the next note. We aren't ready to make predictions yet, though, because the model does not contain the trained weights!

##### Question 5 (10 points)

Add a line below to load the weights from your network training. 

[Hint: What Keras function is used to load weights?]

In [6]:
with open('notes.p', 'rb') as f:
    notes = pickle.load(f)
pitchnames = sorted(set(item for item in notes))
n_vocab = len(set(notes))

network_input, normalized_input = prepare_sequences_prediction(notes, pitchnames, n_vocab)
model = create_network(normalized_input, n_vocab)

### Add a line to load the weights here
model.load_weights(checkpoint_filepath)

def generate():
    global model, network_input, pitchnames, n_vocab
    prediction_output = generate_notes(model, network_input, pitchnames, n_vocab)
    return prediction_output



In [7]:
def create_midi(prediction_output):
    offset = 0
    output_notes = []
    for pattern in prediction_output:
        if ('.' in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split('.')
            notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note))
                new_note.storedInstrument = instrument.Piano()
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            output_notes.append(new_chord)
        else:
            new_note = note.Note(pattern)
            new_note.offset = offset
            new_note.storedInstrument = instrument.Piano()
            output_notes.append(new_note)
        offset += 0.5
    midi_stream = stream.Stream(output_notes)
    return midi_stream
    
midi_stream = create_midi(generate())
midi_stream.show('midi')
midi_stream.write('midi', fp='test_output.mid')



'test_output.mid'

##### Question 6 (10 points)

Listen to your MIDI output. You will probably notice that at some point the output falls into a cycle. Why is this happening? 

``` Your response here ```

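This happens because generation is deterministic. Each step feeds the current 4-note pattern to the model and takes the `argmax` of the prediction, so the same pattern always produces the same next note. Schematically, an ["RNN uses a `for` loop to iterate over the timesteps of a sequence"](https://www.tensorflow.org/guide/keras/rnn#introduction), but no state is carried from one `predict` call to the next, so the output depends only on the pattern itself. There are only finitely many patterns, so one eventually recurs, and from that point on the same notes repeat forever.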
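
The effect does not depend on the network at all, which a toy example can show. In the sketch below, `next_note` is a made-up deterministic rule standing in for "model plus argmax"; it is not the trained model. Because the next note depends only on the current window, the generator must revisit a window within at most 5^4 steps (5 notes, window of 4), and it loops from there.

```python
def next_note(window):
    """Toy deterministic 'model': the next note depends only on the window."""
    return (3 * window[-1] + window[0] + 1) % 5

def generate_greedy(seed, steps):
    """Generate notes until a window repeats or `steps` notes have been produced."""
    window = list(seed)
    seen = {}      # window -> step at which it first appeared
    output = []
    for step in range(steps):
        key = tuple(window)
        if key in seen:
            return output, seen[key], step   # the window at `step` was seen before
        seen[key] = step
        note = next_note(window)
        output.append(note)
        window = window[1:] + [note]
    return output, None, None

output, first_seen, repeated_at = generate_greedy([0, 1, 2, 3], steps=1000)
print(output)
print(f"window first seen at step {first_seen} recurs at step {repeated_at}")
```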

##### Question 7 (20 points)

The generate_notes function is copied below. Please add the same prediction line from above once more, and then modify the generate_notes function in a way that allows for a non-cyclic composition that still resembles the original input. 

[Hint: think about what we learned in HW 2 while exploring Markov Chains with the Beatles.]
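
A Markov chain picks its next state by sampling from a probability distribution rather than always taking the most likely state. The sketch below contrasts the two choices on a made-up probability vector (the values in `prediction` are hypothetical, not model output): `argmax` returns the same index every time, while sampling visits every index in proportion to its probability.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical softmax output over a 4-note vocabulary.
prediction = np.array([0.5, 0.3, 0.15, 0.05])

# Greedy choice: identical on every call.
greedy = [int(np.argmax(prediction)) for _ in range(10)]

# Sampled choice: index i is drawn with probability prediction[i].
sampled = rng.choice(len(prediction), size=10_000, p=prediction)
frequencies = np.bincount(sampled, minlength=len(prediction)) / sampled.size

print(greedy)
print(frequencies)   # approximately equal to `prediction`
```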

In [24]:
import collections

def generate_notes(model, network_input, pitchnames, n_vocab):
    """ Generate notes from the neural network based on a sequence of notes """
    # Starts the melody by picking a random sequence from the input as a starting point

    int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
    
    prediction_output = []

    pattern = None
    patterns = set()
    for _ in range(200):
        if (
            (pattern is None)
                #or (tuple(pattern) in patterns)
        ):
            pattern = collections.deque(network_input[
                np.random.randint(0, len(network_input))
            ])
        patterns.add(tuple(pattern))
        
        prediction_input = np.reshape(pattern, (1, len(pattern), 1))
        prediction_input = prediction_input / float(n_vocab)

        ### Copy the line below from your above implementation.
        prediction = np.reshape(
            model.predict(
                prediction_input,
                batch_size = 65536
            ),
            -1
        )

        #index = np.argmax(prediction)
        index = np.random.choice(np.arange(len(prediction)), p = prediction)
        
        result = int_to_note[index]
        prediction_output.append(result)

        pattern.popleft()
        pattern.append(index)

    return prediction_output

In [25]:
midi_stream = create_midi(generate())
midi_stream.show('midi')
midi_stream.write('midi', fp='test_output.mid')

'test_output.mid'

##### Bonus Question 8 (10 points, but your total will not exceed 100)

There are many other ways in which this model could be improved for the goal of creating music that sounds like the training set. Identify two shortcomings of the model performance, and propose an idea you would use to overcome each of the shortcomings. 

``` Your response here ```

- Underfitting/overfitting: only the training loss is monitored, so overfitting would go unnoticed. 
    Solution: 
    to reduce underfitting, increase model capacity or train for longer; 
    to reduce overfitting, train on a larger dataset, hold out a validation dataset, and introduce regularization.
- Generated music has repetitive patterns.
    Solution: use a larger dataset with more diverse patterns.
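
As a sketch of the validation idea, the arrays returned by `prepare_sequences` could be split once before training. The code below uses random stand-in arrays with made-up shapes, and the 20% hold-out fraction is an arbitrary choice. Note that neighbouring windows overlap, so a random split like this one is only a rough check.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-ins for the arrays returned by prepare_sequences().
network_input = rng.random((100, 4, 1))
network_output = np.eye(8)[rng.integers(0, 8, size=100)]

# Shuffle once, then hold out 20% of the windows for validation.
order = rng.permutation(len(network_input))
n_val = int(0.2 * len(order))
val_idx, train_idx = order[:n_val], order[n_val:]

x_train, y_train = network_input[train_idx], network_output[train_idx]
x_val, y_val = network_input[val_idx], network_output[val_idx]

print(x_train.shape, x_val.shape)
```

The held-out arrays could then be passed to `model.fit` through its `validation_data` argument, so that a validation loss is reported next to the training loss.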