<a href="https://colab.research.google.com/github/monko9j1/making_lofi/blob/main/making_lofi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# INSTALLS AND LIBRARIES

First, we need to make sure the notebook has our dataset (cloned from GitHub) and a few libraries that may not be installed by default.

Then we'll import the libraries we need, including pandas (a popular data-analysis library), Music21 (parses MIDI files into note and chord objects, which we can then turn into numbers a computer can process), and Keras (a high-level library for building neural networks).

In [None]:
! git clone https://github.com/nmtremblay/lofi-samples.git
! pip install music21
! pip install np_utils
! pip install pygame

fatal: destination path 'lofi-samples' already exists and is not an empty directory.


In [None]:
import glob
import pickle
import numpy as np
import pandas as pd
from music21 import *
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM, Bidirectional
from keras.layers import Activation
from keras.layers import BatchNormalization as BatchNorm
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint
import tensorflow as tf
from pygame import *


Using TensorFlow backend.


Now we'll use Music21 to parse our MIDI files and collect every note and chord as a string.

In [None]:
#creating an empty list to hold the notes in
notes = []

#this for loop goes through each midi file and flattens out the notes inside of it
for file in glob.glob("lofi-samples/samples/*.mid"):
    midi = converter.parse(file)
    notes_to_parse = midi.flat.notes

    for element in notes_to_parse:
        if isinstance(element, note.Note): #if it's a single note, we just store its pitch name (e.g. 'C4')
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord): #if it's a chord, we join its pitch classes into one dot-separated string
            notes.append('.'.join(str(n) for n in element.normalOrder))
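
Once the loop has run, `notes` is a flat list of string tokens: pitch names for single notes and dot-joined integers for chords. A quick way to sanity-check that list is to count the tokens. The sketch below uses a small hand-written stand-in (`sample_notes` is hypothetical and not taken from the dataset).

```python
from collections import Counter

# Hypothetical stand-in for the `notes` list built above.
sample_notes = ["C4", "E4", "0.4.7", "C4", "G4", "0.4.7", "C4"]

token_counts = Counter(sample_notes)

# Chords were stored as dot-joined integers, so a chord token either
# contains a '.' or consists only of digits (a one-note chord).
chord_tokens = [t for t in token_counts if "." in t or t.isdigit()]

print(len(sample_notes), "tokens,", len(token_counts), "distinct")
print(token_counts.most_common(2))
print(chord_tokens)
```

The number of distinct tokens is what later becomes the size of the network's output layer.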

# DATA FORMATTING

Here, we're going to turn the notes and chords from the dataset into fixed-length sequences. This matters because our network operates on sequential data: it looks at a run of previous notes and predicts the next one.

First, we have to translate all this categorical (string-based) data into numerical (integer-based) data. This can be accomplished with a mapping function.
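
As a minimal sketch of that mapping, here is the same idea on a tiny hand-written list (`sample_notes` is a hypothetical stand-in for the real `notes` list). Each distinct token gets an integer based on its position in the sorted vocabulary, and the inverse dictionary turns integers back into tokens.

```python
# Hypothetical stand-in for the parsed `notes` list.
sample_notes = ["E4", "C4", "0.4.7", "C4", "G4"]

# Sorted, de-duplicated vocabulary, so each token gets a stable index.
pitchnames = sorted(set(sample_notes))

# Forward mapping (token -> integer) and its inverse (integer -> token).
note_to_int = {name: number for number, name in enumerate(pitchnames)}
int_to_note = {number: name for name, number in note_to_int.items()}

encoded = [note_to_int[name] for name in sample_notes]
decoded = [int_to_note[number] for number in encoded]

print(pitchnames)  # ['0.4.7', 'C4', 'E4', 'G4']
print(encoded)     # [2, 1, 0, 1, 3]
```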

In [None]:
#this is the number of previous notes our algorithm will use to predict the next note
#mess around with this number to see how it impacts the accuracy
sequence_length = 20 #our chord progressions are pretty short so we might not need that many notes

# get all pitch names
pitchnames = sorted(set(item for item in notes))

# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

network_input = []
network_output = []

# create input sequences and the corresponding outputs
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i:i + sequence_length]
    sequence_out = notes[i + sequence_length]
    network_input.append([note_to_int[char] for char in sequence_in])
    network_output.append(note_to_int[sequence_out])

# number of training patterns (computed once, after the loop)
n_patterns = len(network_input)

# reshape the input vector into a format compatible with LSTM layers
network_input = np.reshape(network_input, (n_patterns, sequence_length, 1))
# normalizing the input values
n_vocab = len(set(notes))
network_input = network_input / float(n_vocab)

#one-hot encode the targets, with one column per entry in the vocabulary
#the try/except bypasses the error raised if network_output is empty (e.g. no MIDI files were found)
from keras.utils.np_utils import to_categorical
try:
    network_output = to_categorical(network_output, num_classes=n_vocab)
except ValueError:  #raised if `y` is empty.
    pass
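
The cell above does three things: it slices the melody into overlapping windows, scales the inputs to the range 0 to 1, and one-hot encodes the targets. The sketch below shows the same steps in plain Python on a short, made-up sequence. `make_windows` and `one_hot` are hypothetical helper names, not part of Keras or NumPy.

```python
def make_windows(encoded, sequence_length):
    """Return (inputs, targets): each input is `sequence_length` consecutive
    integers and its target is the integer that follows them."""
    inputs, targets = [], []
    for i in range(len(encoded) - sequence_length):
        inputs.append(encoded[i:i + sequence_length])
        targets.append(encoded[i + sequence_length])
    return inputs, targets


def one_hot(index, n_classes):
    """Return a list of zeros with a single 1 at position `index`."""
    return [1 if k == index else 0 for k in range(n_classes)]


encoded = [2, 1, 0, 1, 3, 2]  # hypothetical integer-encoded melody
n_vocab = 4
inputs, targets = make_windows(encoded, sequence_length=3)

# Scale the inputs as in the cell above, and one-hot encode the targets.
scaled = [[value / float(n_vocab) for value in window] for window in inputs]
targets_one_hot = [one_hot(t, n_vocab) for t in targets]

print(inputs)   # [[2, 1, 0], [1, 0, 1], [0, 1, 3]]
print(targets)  # [1, 3, 2]
```

A melody of six notes with a window of three yields three training patterns, which is why the real loop runs `len(notes) - sequence_length` times.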

# THE NETWORK!

We're working with a structure with three LSTM layers, three Dropout layers, two Dense layers, and one Activation layer.

Play around with the architecture to see if you can improve the quality of the predictions!

In [None]:
model = Sequential()

#each model.add command adds a new layer to our sequential model
#this one is our input layer :)
model.add(LSTM(
        256, #nodes
        input_shape=(network_input.shape[1], network_input.shape[2]), #unique input to tell the network the shape of our data
        return_sequences=True #this means we'll return sequences of data
    ))

#each one of these is a hidden layer:
#in theory they would all be together, but I've spaced them out to make room for comments

model.add(Dropout(0.3))
#these layers will set a fraction of inputs (in this case 3/10) to 0 at each update.
#it's a technique to prevent overfitting
#(the first parameter is the fraction of input units we're dropping during training)

model.add(LSTM(512, return_sequences=True))
#an LSTM layer takes a sequence as input and returns either the full sequence of outputs
#(return_sequences=True) or only the output for the last step (the default)
#here, the first parameter is how many nodes our layer will have.
#(same thing with all the non-dropout layers)

model.add(Dropout(0.3))
model.add(LSTM(256))
model.add(Dense(256))
#Dense layers are fully connected: every input node is attached to every output node

model.add(Dropout(0.3))
model.add(Dense(n_vocab))
#because this one's our last layer, it should have the same number of nodes as the number of different outputs our system has
#this makes sure the network's output maps directly onto our classes (the entries in the vocabulary)

model.add(Activation('softmax'))
#this layer applies the softmax activation function, which turns the outputs into probabilities over the possible notes

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
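#this configures the model for training (the training itself happens later, in model.fit)
#we're using categorical cross-entropy as the loss, with the RMSprop optimizer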





Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
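
To make the last two steps of the network concrete, here is a small sketch of what softmax and categorical cross-entropy compute, written in plain Python rather than with Keras. The scores are made up for illustration, and the function names are hypothetical helpers, not Keras APIs.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log of the probability assigned to the correct class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs of the last Dense layer
probs = softmax(scores)
loss = categorical_crossentropy([1, 0, 0], probs)

print(probs)
print(loss)
```

The loss shrinks toward 0 as the probability assigned to the correct note approaches 1, which is what training tries to achieve.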




In [None]:
model = Sequential()

model.add(LSTM(256, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences=False)) #input_shape is only needed on the first layer
model.add(Dropout(0.2))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

model.fit(network_input, network_output, epochs=20, batch_size=64)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where



Epoch 1/20





Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f67de92ecf8>
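
The training cell keeps the learned weights only in memory. If you want to reuse them in a freshly built model, one option is Keras's `ModelCheckpoint` callback (already imported at the top of the notebook). This is a minimal sketch, not what the original notebook does: the file name `lofi.weights.h5` and the helper `train_with_checkpoint` are assumptions made for illustration.

```python
# Hypothetical file name; newer Keras releases expect weight files
# to end in ".weights.h5", which older releases also accept.
WEIGHTS_PATH = "lofi.weights.h5"


def train_with_checkpoint(model, network_input, network_output,
                          epochs=20, batch_size=64):
    """Train `model`, keeping the weights from the epoch with the lowest loss."""
    # Imported here so that defining this helper does not itself require Keras.
    from keras.callbacks import ModelCheckpoint

    checkpoint = ModelCheckpoint(
        WEIGHTS_PATH,
        monitor="loss",          # training loss, since no validation split is used
        save_best_only=True,
        save_weights_only=True,
        mode="min",
    )
    return model.fit(
        network_input,
        network_output,
        epochs=epochs,
        batch_size=batch_size,
        callbacks=[checkpoint],
    )

# Later, on a model built with the same layers and layer sizes:
# model.load_weights(WEIGHTS_PATH)
```

Loading only works if the new model has exactly the same architecture as the one that was trained.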

# Making Music!
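Here, we're setting up a network with the same kinds of layers as before. This time, instead of training it, the idea is to use the weights generated by the network during the training phase. Note that the cell below builds a fresh model with different layer sizes from the one trained above and never loads any weights, so for meaningful output you should either reuse the trained `model` from the training cell or load saved weights into a network with a matching architecture.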

The first segment of code generates the music21 note values of our song.

In [None]:
model = Sequential()
model.add(LSTM(
    512,
    input_shape=(network_input.shape[1], network_input.shape[2]),
    return_sequences=True
))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(512))
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

Just like we used a mapping function to format the data to fit into our neural network, we'll have to use another mapping function to "un-format" it back into a form in which it can be played back. Eagle-eyed readers will notice how similar it looks to the mapping function we used before.

In [None]:
start = np.random.randint(0, len(network_input)-1)
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

#network_input was already divided by n_vocab, so the seed pattern is normalized too
pattern = network_input[start].ravel()

prediction_output = []

for note_index in range(100): #here, we're generating 100 notes
    prediction_input = np.reshape(pattern, (1, len(pattern), 1))

    prediction = model.predict(prediction_input, verbose=0)
    index = np.argmax(prediction)

    result = int_to_note[index]
    prediction_output.append(result)

    #slide the window: drop the oldest value and append the new (normalized) prediction
    pattern = np.append(pattern[1:], index / float(n_vocab))
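
The idea behind the generation loop is a sliding window: after each prediction, the oldest value is dropped from the pattern and the newly predicted one is appended, so the next prediction sees the note that was just generated. The sketch below shows that mechanism in plain Python. `toy_predictor` is a hypothetical stand-in for `model.predict` and simply favors the index after the last one.

```python
def generate(seed, predict_next, n_vocab, n_steps):
    """Greedy generation: repeatedly predict the next index and slide the window.

    `seed` is a list of already-normalized values (index / n_vocab) and
    `predict_next` is any callable returning one score per vocabulary index.
    """
    pattern = list(seed)
    generated = []
    for _ in range(n_steps):
        scores = predict_next(pattern)
        index = max(range(len(scores)), key=lambda k: scores[k])  # argmax
        generated.append(index)
        # Drop the oldest value and append the new one, normalized like the input.
        pattern = pattern[1:] + [index / float(n_vocab)]
    return generated


def toy_predictor(pattern, n_vocab=4):
    """Hypothetical stand-in for model.predict."""
    last_index = int(round(pattern[-1] * n_vocab))
    scores = [0.0] * n_vocab
    scores[(last_index + 1) % n_vocab] = 1.0
    return scores


print(generate([0.0, 0.25, 0.5], toy_predictor, n_vocab=4, n_steps=5))
# [3, 0, 1, 2, 3]
```

If the window is never updated, every step sees the same input and the model predicts the same note over and over.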

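Finally, we can turn our predicted values into music21 objects: specifically, Note objects and Chord objects.

If a value is a single note, we can store it in the corresponding Note object and play it with a piano sound. If it's a chord (a dot-separated string of numbers), we split it into its individual notes and combine them into a Chord object.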

In [None]:
offset = 0
output_notes = []

for pattern in prediction_output:
    #chords!
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        chord_notes = [] #the list where we'll store this chord's notes (a new name, so we don't overwrite the global notes list)
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes) #building the chord object from its notes
        new_chord.offset = offset #connecting it to the offset variable
        output_notes.append(new_chord) #adding it to the song
    #notes!
    else:
        new_note = note.Note(pattern) #storing it in the object
        new_note.offset = offset #connecting it to our offset command later on
        new_note.storedInstrument = instrument.Piano() #playing it with piano
        output_notes.append(new_note) #adding it to the song
    #make sure notes don't end up on top of each other by adding a 0.5 offset every time
    offset += 0.5
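
The branching above depends only on the shape of each token, so it can be tried out without music21. This sketch mirrors the same test (`'.' in token or token.isdigit()`) and the fixed 0.5 offset step. `decode_token` and `schedule` are hypothetical helper names, and the tokens are made up.

```python
def decode_token(token):
    """Classify a generated token the same way the cell above does.

    Returns ("chord", [ints]) for dot-joined or all-digit tokens,
    otherwise ("note", pitch_name).
    """
    if "." in token or token.isdigit():
        return "chord", [int(part) for part in token.split(".")]
    return "note", token


def schedule(tokens, step=0.5):
    """Pair each decoded token with its offset, advancing `step` per token."""
    return [(i * step, *decode_token(token)) for i, token in enumerate(tokens)]


for event in schedule(["C4", "0.4.7", "11", "F#3"]):
    print(event)
```

Because every token advances the offset by the same amount, the generated piece has a uniform rhythm. Varying `step` would be one way to experiment with that.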

Our final command aggregates all of our notes into a single Stream object, then uses the write function to convert it into a playable MIDI file.

Finally! We can play our song :)

In [None]:
us = environment.UserSettings()
us.getSettingsPath()

PosixPath('/root/.music21rc')

In [None]:
s = stream.Stream(output_notes)
mf = s.write('midi', fp="lofi-samples/testOutput.mid")

#from here, I opened the instrumental and the drum loop file in Audacity and played them together!