# Music Generator with LSTM

## About

Music resembles language: both are temporal sequences of articulated sounds, and both say something, often something human.

There are crucial differences between language and music, but in its simplest form music can still be described as a sequence of symbols. This translates something complex into something simpler that computational models can use.

The objective of this project is therefore to establish communication between the human, who understands music as intensely as the brain can interpret information, and the machine.

We'll create a model that generates music from its input, i.e., a sequence of sounds related in some way to the sounds passed as input.

We'll use Natural Language Processing (NLP) methods, treating music as if it were a language. This abstraction lets the machine recognize and process music as it would similar sequential data.

As a first step, we'll use text generation techniques based on Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. If training proves reasonably effective, we'll repeat the implementation with more specific methods such as Attention.



## Imports

In [1]:
# Basic libraries
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F

# Preprocessing data libraries
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset

# Model libraries
import torch
import torch.nn as nn
import torch.optim as optim

# Data visualization
# from torch.utils.tensorboard import SummaryWriter
from tqdm.notebook import tqdm #for loading bars

# Utils
import music21
import pickle

In [2]:
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

In [3]:
from encoder import *
from decoder import *

## Dataset

### Load data

In [4]:
data_path = '../data/'
out_encoded_path = '../encoded/'
out_decoded_path = '../decoded/'
file = 'All_I_Have_to_Do_Is_Dream'
in_file = data_path + file
out_encoded = out_encoded_path + file
out_decoded = out_decoded_path + file

N_FRAMES = 36
N_NOTES = 88
MIDI_OFFSET = 20

In [5]:
import os

# make sure the directories exist (os was not imported in the cells above)
for path in (data_path, out_decoded_path, out_encoded_path):
    os.makedirs(path, exist_ok=True)

In [6]:
# get encoded data and save encoded file
encoded_song = encode_data(in_file,
                           N_FRAMES,
                           N_NOTES,
                           MIDI_OFFSET, 
                           save_as=out_encoded
                           )

Encoding file All_i_have_to_do_is_dream
Encoding Voice
Encoding Piano
Encoding Sampler
Encoding Electric guitar
Encoding Stringinstrument
Encoding Voice_
Took 13.405985116958618


### Data visualization

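Each song is represented by a pandas DataFrame with one row per frame, indexed by instrument name. The first four columns hold song metadata (`inst_code`, `ks`, `bpm`, `ts`), and the remaining 88 columns, one per note from G#0 to B7, are boolean flags for that note in the frame.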
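To make the note columns concrete, here is a minimal sketch of how a chord could map onto one 88-element frame. It assumes that column `i` corresponds to MIDI pitch `MIDI_OFFSET + i`, which is inferred from the column names (G#0 is MIDI 20) and is not confirmed by the encoder source. The helper `frame_from_midi` is hypothetical and is not part of `encoder.py`.

```python
import numpy as np

N_NOTES = 88
MIDI_OFFSET = 20  # assumption: column 0 is MIDI pitch 20 (G#0)

def frame_from_midi(pitches, n_notes=N_NOTES, offset=MIDI_OFFSET):
    """Hypothetical helper: build one multi-hot frame from MIDI pitches."""
    frame = np.zeros(n_notes, dtype=bool)
    for pitch in pitches:
        idx = pitch - offset
        if 0 <= idx < n_notes:  # pitches outside the 88-note range are ignored
            frame[idx] = True
    return frame

# C major triad: C4 (60), E4 (64), G4 (67)
c_major = frame_from_midi([60, 64, 67])
print(np.flatnonzero(c_major))  # [40 44 47]
```

Several notes can be active in the same frame, which is why the representation is multi-hot rather than one-hot.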

In [7]:
encoded_song

inst,inst_code,ks,bpm,ts,G#0,A0,B-0,B0,C1,C#1,...,D7,E-7,E7,F7,F#7,G7,G#7,A7,B-7,B7
Voice,52,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Voice,52,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Voice,52,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Voice,52,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Voice,52,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Voice_,54,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Voice_,54,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Voice_,54,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Voice_,54,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [8]:
test_piano_df = encoded_song[encoded_song.index=='Piano']
print(f'Number of frames: {len(test_piano_df.index)}')
print(f'Number of bars: {len(test_piano_df.index)//N_FRAMES}')
test_piano_df

Number of frames: 360
Number of bars: 10


inst,inst_code,ks,bpm,ts,G#0,A0,B-0,B0,C1,C#1,...,D7,E-7,E7,F7,F#7,G7,G#7,A7,B-7,B7
Piano,5,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Piano,5,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Piano,5,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Piano,5,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Piano,5,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Piano,5,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Piano,5,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Piano,5,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
Piano,5,C,90.0,4/4,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


### Preprocess data

In [9]:
def get_stackframe(encoded_song, init_idx, final_idx):
    # drop the 4 metadata columns and keep only the note columns
    stackframe = encoded_song.iloc[init_idx:final_idx, 4:]
    # astype returns a new array, so its result has to be assigned
    stackframe = stackframe.to_numpy().astype(float)
    # print(f'Stackframe shape: {stackframe.shape}')
    return stackframe

In [10]:
get_stackframe(test_piano_df, init_idx=0, final_idx=36)

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [11]:
import os

def create_dataset(root_dir, instrument, N_FRAMES=36, N_NOTES=88):
    """Collect every complete N_FRAMES-long bar of `instrument` from the pickled songs in root_dir."""
    dataset = []
    for filename in os.listdir(root_dir):
        if not filename.endswith('pkl'):
            continue
        # each .pkl file holds one encoded song (a DataFrame)
        with open(os.path.join(root_dir, filename), 'rb') as infile:
            encoded_song = pickle.load(infile)
        encoded_part = encoded_song[encoded_song.index == instrument]
        # trailing frames that do not fill a whole bar are dropped
        for i in range(len(encoded_part.index) // N_FRAMES):
            init_idx = i * N_FRAMES         #  0, 36  , 36*2, 36*3, ..., 36*9
            final_idx = (i + 1) * N_FRAMES  # 36, 36*2, 36*3, 36*4, ..., 36*10
            stackframe = get_stackframe(encoded_part, init_idx=init_idx, final_idx=final_idx)
            dataset.append(stackframe)

    return dataset

In [12]:
root_dir = '../encoded/'
instrument = 'Piano'

dataset = create_dataset(root_dir, instrument)
print(f'Dataset length: {len(dataset)}')

Dataset length: 10
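The bar-splitting arithmetic can be checked in isolation. The sketch below uses a synthetic array instead of a real encoded song, and the helper name `split_into_bars` is hypothetical. It only illustrates the integer division that decides how many complete bars a part yields.

```python
import numpy as np

def split_into_bars(frames, n_frames=36):
    """Hypothetical helper: cut a (total_frames, n_notes) array into whole bars."""
    n_bars = len(frames) // n_frames  # incomplete trailing bars are dropped
    return [frames[i * n_frames:(i + 1) * n_frames] for i in range(n_bars)]

# 365 synthetic frames: 10 complete bars plus 5 leftover frames
frames = np.zeros((365, 88))
bars = split_into_bars(frames)
print(len(bars), bars[0].shape)  # 10 (36, 88)
```

With the 360 piano frames reported earlier, the same division gives exactly 10 bars and no leftover frames.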


In [13]:
def split_dataset(dataset):
    X = []
    y = []
    
    # create two arrays X, y with bars
    for bar in dataset:
        xa = bar[:-1]
        ya = bar[1:]
        X.append(xa)
        y.append(ya)
        
    X = np.array(X, dtype='float64')
    y = np.array(y, dtype='float64')
    X = torch.from_numpy(X)
    y = torch.from_numpy(y)
    print(f'X.shape, y.shape: {X.shape, y.shape}')
    train_ds = TensorDataset(X, y)
    return train_ds

In [14]:
train_ds = split_dataset(dataset)
train_dl = DataLoader(train_ds, batch_size=1, shuffle=False)

X.shape, y.shape: (torch.Size([10, 35, 88]), torch.Size([10, 35, 88]))
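Each bar of 36 frames becomes 35 input frames and 35 target frames because the target is the input shifted by one frame. A toy sketch of that shift, using a bar with 6 frames and a single value per frame so the pairing is easy to read:

```python
import numpy as np

# toy bar: 6 frames, 1 value per frame (real bars are 36 x 88)
bar = np.arange(6).reshape(6, 1)

x = bar[:-1]  # frames 0..4 are the inputs
y = bar[1:]   # frames 1..5 are the targets

print(x.ravel())  # [0 1 2 3 4]
print(y.ravel())  # [1 2 3 4 5]
```

At every position the model sees frame `t` and is trained to predict frame `t + 1`.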


## Model

The hyperparameters and dimensions used by the model and the training loop:

In [15]:
n_frames_input = 35
n_frames_output = 35
n_bars_input = len(train_dl.dataset.tensors[0]) # number of rows of the dataloader
bar_len = 35 # number of frames the model receives per bar (one sequence)
num_layers = 35 # number of stacked LSTM layers
frame_len = 88
hidden_size = 88
num_epochs = 3
batch_size = 1
lr = 0.003
print('Number of bars in the input dataset: {}'.format(n_bars_input))

Number of bars in the input dataset: 10


In [16]:
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers, output_size):
    super(RNN, self).__init__()
    self.hidden_size = hidden_size
    self.num_layers = num_layers

    #self.embed = nn.Embedding(input_size, hidden_size)
    self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=False)
    self.fc = nn.Linear(hidden_size, output_size)
    self.act = nn.Hardsigmoid()

  def forward(self, x, hidden, cell):

    # Passing in the input and hidden state into the model and obtaining outputs
    out, (hidden, cell) = self.lstm(x.unsqueeze(1), (hidden, cell))

    # Reshaping the outputs such that it can be fit into the fully connected layer
    out = self.fc(out.contiguous().view(-1, self.hidden_size))
    out = self.act(out)
    
    return out, (hidden, cell)

  def init_hidden(self, batch_size):
    # This method generates the first hidden state of zeros which we'll use in the forward pass
    # We'll send the tensor holding the hidden state to the device we specified earlier as well
    hidden = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device)
    cell = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device)
    return hidden, cell

In [17]:
# Instantiate the model with hyperparameters
# We'll also set the model to the device that we defined earlier (default is CPU)
model = RNN(input_size=frame_len,
                   output_size=frame_len,
                   hidden_size=hidden_size,
                   num_layers=num_layers).to(device)
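The tensor shapes in `forward` are easy to get wrong with `batch_first=False`, so here is a small shape check. It is a sketch only: it uses 2 layers instead of the notebook's `num_layers` to keep it fast, and all variable names are illustrative.

```python
import torch
import torch.nn as nn

# 2 layers is an illustrative value, not the notebook's setting
seq_len, batch, n_features, n_layers = 35, 1, 88, 2

lstm = nn.LSTM(n_features, n_features, n_layers, batch_first=False)

x = torch.zeros(seq_len, n_features)           # one bar: (35, 88)
h0 = torch.zeros(n_layers, batch, n_features)  # (layers, batch, hidden)
c0 = torch.zeros(n_layers, batch, n_features)

# unsqueeze(1) inserts the batch axis: (35, 88) -> (35, 1, 88)
out, (hn, cn) = lstm(x.unsqueeze(1), (h0, c0))
print(out.shape)  # torch.Size([35, 1, 88])
print(hn.shape)   # torch.Size([2, 1, 88])

# flatten to (seq_len * batch, hidden) before the linear layer
flat = out.contiguous().view(-1, n_features)
print(flat.shape)  # torch.Size([35, 88])
```

With `batch_first=False` the LSTM expects `(seq_len, batch, features)`, which is why the batch axis is inserted at position 1 rather than 0.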

## Train

In [18]:
# converts one frame into torch tensor
def multi_hot_tensor(frame):
  tensor = torch.from_numpy(frame)
  return tensor

In [19]:
# retrieve data from dataloader
def get_sample(dataloader):

  input = torch.zeros(n_bars_input, bar_len, frame_len)
  target = torch.zeros(n_bars_input, bar_len, frame_len)

  for sample, (xb, yb) in enumerate(dataloader): # gets the samples
    input[sample] = xb
    target[sample] = yb
  
  return input, target

In [20]:
def train(model, optimizer, loss_fn, dataloader, batch_size=1, num_epochs=3):

  print("\nStarting training...")

  for epoch in range(1, num_epochs + 1):
    training_loss = 0.0

    print('> EPOCH #', epoch)

    input, target = get_sample(dataloader)
    input = input.to(device)
    target = target.to(device)

    for bar in tqdm(range(n_bars_input)):
      # Initialize hidden and cells
      hidden, cell = model.init_hidden(batch_size)

      # Generate predictions
      output, (hidden, cell) = model(input[bar,:], hidden, cell)

      # Compute the loss and backpropagate
      loss_step = loss_fn(output, target[bar, :])
      loss_step.backward() # Does backpropagation and calculates gradients
      optimizer.step() # Updates the weights accordingly
      optimizer.zero_grad() # Clears existing gradients from previous frame
      
      training_loss += loss_step.item()
    
    training_loss /= len(dataloader.dataset) # use the dataloader argument, not the global train_dl
      
    if epoch%1 == 0:
      print('Epoch: {}/{}.............'.format(epoch, num_epochs), end=' ')
      print("Loss: {:.4f}".format(training_loss))

In [21]:
optimizer = optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.BCELoss()
train(model, optimizer, loss_fn, train_dl, num_epochs=3)


Starting training...
> EPOCH # 1


100%|██████████| 10/10 [00:08<00:00,  1.22it/s]

Epoch: 1/3............. Loss: 0.5440
> EPOCH # 2


100%|██████████| 10/10 [00:07<00:00,  1.26it/s]

Epoch: 2/3............. Loss: 0.3151
> EPOCH # 3


100%|██████████| 10/10 [00:06<00:00,  1.46it/s]

Epoch: 3/3............. Loss: 0.3728
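The loss above is binary cross-entropy computed independently for each of the 88 notes, applied to the output of `nn.Hardsigmoid`. The NumPy sketch below reproduces the two formulas on three values. It is an illustration, not the PyTorch implementation: the function names are hypothetical, and the floor of -100 on the logarithm mirrors the clamping described in the `BCELoss` documentation.

```python
import numpy as np

def hard_sigmoid(x):
    # piecewise-linear sigmoid: 0 for x <= -3, 1 for x >= 3, x/6 + 0.5 in between
    return np.clip(x / 6.0 + 0.5, 0.0, 1.0)

def bce(pred, target, log_floor=-100.0):
    # binary cross-entropy averaged over all elements;
    # log terms are floored so that log(0) does not yield -inf
    with np.errstate(divide="ignore"):
        log_p = np.maximum(np.log(pred), log_floor)
        log_1mp = np.maximum(np.log(1.0 - pred), log_floor)
    return float(np.mean(-(target * log_p + (1.0 - target) * log_1mp)))

logits = np.array([-4.0, 0.0, 4.0])
target = np.array([0.0, 1.0, 1.0])

probs = hard_sigmoid(logits)  # saturates to exactly 0 and 1 at the ends
loss = bce(probs, target)
print(probs)
print(round(loss, 4))
```

Unlike the ordinary sigmoid, the hard sigmoid reaches exactly 0 and 1, where its gradient is zero. Saturated notes therefore stop receiving updates, which may be worth keeping in mind when reading the loss curve.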





## Test

In [22]:
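@torch.no_grad()
def generate(model, initial_bar, predict_len=36, batch_size=1, temperature=0.85):
  # NOTE: `temperature` is accepted but not used yet
  output = []
  hidden, cell = model.init_hidden(batch_size)
  bar = initial_bar.to(device)
  for _ in range(predict_len + 1):
    # carry the LSTM state forward and feed each predicted bar back in,
    # otherwise every iteration returns an identical copy
    bar, (hidden, cell) = model(bar, hidden, cell)
    output.append(bar)

  return output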

In [23]:
initial_bar = train_dl.dataset.tensors[0][5]
initial_bar = initial_bar.to(torch.float)
print(f'Input: ')
torch.set_printoptions(threshold=10_000)
print(initial_bar)

output = generate(model, initial_bar, predict_len=1)
print(f'Output: ')
print(output)

Input: 
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         1., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         1., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         1., 0., 0., 0., 1., 0., 1

[tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0215, 0.0000, 0.0000, 0.0000, 0.0000,
         0.3121, 0.0000, 0.3180, 0.0000, 0.3375, 0.0000, 0.0521, 0.3464, 0.0000,
         0.0000, 0.0000, 0.3194, 0.1118, 0.0000, 0.2479, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0

In [24]:
test_piano_df = encoded_song[encoded_song.index=='Piano']
test_piano_df.iloc[:, :4]

inst,inst_code,ks,bpm,ts
Piano,5,C,90.0,4/4
Piano,5,C,90.0,4/4
Piano,5,C,90.0,4/4
Piano,5,C,90.0,4/4
Piano,5,C,90.0,4/4
...,...,...,...,...
Piano,5,C,90.0,4/4
Piano,5,C,90.0,4/4
Piano,5,C,90.0,4/4
Piano,5,C,90.0,4/4


In [25]:
len(output[0])
#len(output[0].detach().to(device).numpy())

35
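The generated values are activations between 0 and 1 rather than on/off flags like the encoded frames. A minimal sketch of binarizing them, using a few values from the output printed above. The threshold of 0.3 is a hypothetical choice, and whether `decode_data` expects thresholded input is an assumption that this notebook does not confirm.

```python
import numpy as np

THRESHOLD = 0.3  # hypothetical cut-off; would need tuning on real output

# a few activations copied from the generated output above
probs = np.array([0.3121, 0.0000, 0.3180, 0.0000, 0.3375, 0.0000, 0.0521, 0.3464])

# notes at or above the threshold are treated as active
active = probs >= THRESHOLD
print(active.astype(int))  # [1 0 1 0 1 0 0 1]
```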

In [163]:
data_out = decode_data(output, N_FRAMES, N_NOTES, save_as=out_decoded)

TypeError: 'builtin_function_or_method' object is not iterable