# Music Generator with LSTM

## About

Music resembles language: both are temporal sequences of articulated sounds that say something, often something human.

There are crucial differences between language and music, but in its simplest form music can still be described as a sequence of symbols. This translates something complex into something simpler that computational models can use.

Thus, the objective of this project is to establish communication between the human, who understands music in the richest way the brain can interpret information, and the machine.

We'll create a model that generates music from its input, i.e., a sequence of sounds that is related in some way to the sounds passed as input.

We'll use Natural Language Processing (NLP) methods, treating music as if it were a language. This abstraction lets the machine recognize and process music like other sequential data.

As a first step, we'll use text generation techniques based on Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. If training proves effective, even if only reasonably so, we'll repeat the implementation with more specific methods such as Attention.



## Imports

In [16]:
# Basic libraries
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F

# Preprocessing data libraries
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

# Model libraries
import torch
import torch.nn as nn
import torch.optim as optim

# Data visualization
from torch.utils.tensorboard import SummaryWriter
from tqdm.notebook import tqdm #for loading bars

# Music21
import music21

In [17]:
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

In [18]:
!git clone https://github.com/petcomputacaoufrgs/papagaio.git

Cloning into 'papagaio'...
remote: Enumerating objects: 207, done.
remote: Counting objects: 100% (207/207), done.
remote: Compressing objects: 100% (184/184), done.
remote: Total 207 (delta 108), reused 84 (delta 19), pack-reused 0
Receiving objects: 100% (207/207), 333.06 KiB | 1.64 MiB/s, done.
Resolving deltas: 100% (108/108), done.


In [19]:
from papagaio.src.encoder import *

## Dataset

In [20]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
%cd /content/gdrive/My Drive/Kaggle/papagaio

/content/gdrive/My Drive/Kaggle/papagaio


In [None]:
!ls

saves


## Preprocess data

In [35]:
%cd /content/papagaio/src

/content/papagaio/src


In [36]:
data_path = '../data/'
out_encoded_path = '../encoded/'
out_decoded_path = '../decoded/'
file = 'All_I_Have_to_Do_Is_Dream'
in_file = data_path + file
out_encoded = out_encoded_path + file
out_decoded = out_decoded_path + file

N_FRAMES = 36
N_NOTES = 88
MIDI_OFFSET = 20
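To make the constants above concrete, here is a minimal sketch of how one frame could be turned into a multi-hot vector of `N_NOTES` entries. The helper `frame_to_multi_hot` is hypothetical, and the index mapping (MIDI pitch 21, the lowest piano key, lands on index 0) is an assumption; the papagaio encoder may use `MIDI_OFFSET` differently.

```python
N_NOTES = 88      # number of piano keys, as in the notebook
MIDI_OFFSET = 20  # as in the notebook

def frame_to_multi_hot(midi_pitches, n_notes=N_NOTES, midi_offset=MIDI_OFFSET):
    """Hypothetical helper: mark every sounding pitch of one frame with a 1."""
    frame = [0.0] * n_notes
    for pitch in midi_pitches:
        index = pitch - midi_offset - 1  # assumed mapping: MIDI 21 (A0) -> index 0
        if 0 <= index < n_notes:         # skip pitches outside the piano range
            frame[index] = 1.0
    return frame

c_major = frame_to_multi_hot([60, 64, 67])  # C4, E4, G4 sounding together
active = [i for i, value in enumerate(c_major) if value == 1.0]
print(active)  # [39, 43, 46]
```

A song then becomes a sequence of such frames, which is what lets text generation techniques be applied to it.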

In [37]:
# make sure the directories exist
import os

for path in (data_path, out_decoded_path, out_encoded_path):
    os.makedirs(path, exist_ok=True)

In [39]:
# get encoded data and save encoded file
encoded_song = encode_data(in_file,
                           N_FRAMES,
                           N_NOTES,
                           MIDI_OFFSET, 
                           save_as=out_encoded
                           )

Encoding file All_i_have_to_do_is_dream
Encoding Acoustic bass
Encoding Electric guitar

IndexError: ignored

In [None]:
train_ds = load_dataset('/content/gdrive/My Drive/Kaggle/papagaio/saves/acdc.pt')

In [None]:
train_dl = load_dataloader('/content/gdrive/My Drive/Kaggle/papagaio/saves/acdc.pkl')

In [None]:
train_dl.dataset.tensors[0][200]

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], dtype=torch.float64)
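The `.tensors` attribute used above suggests that the saved dataloader wraps a `TensorDataset`. As a rough sketch of that structure, the snippet below builds a small one from random multi-hot bars. The sizes, the random data, and the choice of "next bar" as the target are all assumptions for illustration; the saved `acdc` files may have been built differently.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
n_bars, bar_len, frame_len = 6, 31, 88  # small stand-in sizes

# Random multi-hot bars standing in for an encoded song
bars = (torch.rand(n_bars, bar_len, frame_len) < 0.05).float()

# Assumption: the target of each bar is the bar that follows it
inputs, targets = bars[:-1], bars[1:]
toy_ds = TensorDataset(inputs, targets)
toy_dl = DataLoader(toy_ds, batch_size=1, shuffle=False)

xb, yb = next(iter(toy_dl))
print(len(toy_ds), xb.shape, yb.shape)
```

With this layout, `toy_dl.dataset.tensors[0][i]` returns input bar `i` as a `(bar_len, frame_len)` tensor, the same access pattern as in the cell above.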

## Model

Some important definitions

In [None]:
n_frames_input = 31
n_frames_output = 31
n_bars_input = len(train_dl.dataset.tensors[0]) # number of bars (samples) in the dataset
bar_len = 31 # number of frames in one bar, i.e. the sequence length fed to the LSTM
num_layers = 31
frame_len = 88
hidden_size = 88
num_epochs = 3
batch_size = 1
lr = 0.003
print('Number of bars in the input dataset: {}'.format(n_bars_input))

In [None]:
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers, output_size):
    super(RNN, self).__init__()
    self.hidden_size = hidden_size
    self.num_layers = num_layers

    #self.embed = nn.Embedding(input_size, hidden_size)
    self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=False)
    self.fc = nn.Linear(hidden_size, output_size)
    self.act = nn.Hardsigmoid()

  def forward(self, x, hidden, cell):

    # Passing in the input and hidden state into the model and obtaining outputs
    out, (hidden, cell) = self.lstm(x.unsqueeze(1), (hidden, cell))

    # Reshaping the outputs such that it can be fit into the fully connected layer
    out = self.fc(out.contiguous().view(-1, self.hidden_size))
    out = self.act(out)
    
    return out, (hidden, cell)

  def init_hidden(self, batch_size):
    # This method generates the first hidden state of zeros which we'll use in the forward pass
    # We'll send the tensor holding the hidden state to the device we specified earlier as well
    hidden = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device)
    cell = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device)
    return hidden, cell
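Since the class above uses `batch_first=False`, `nn.LSTM` expects input shaped `(seq_len, batch, features)`, which is why `forward` calls `x.unsqueeze(1)`. The sketch below traces the shapes for one bar. It uses 2 layers instead of the notebook's `num_layers` only to keep the demo fast.

```python
import torch
import torch.nn as nn

bar_len, frame_len, hidden_size, num_layers = 31, 88, 88, 2

lstm = nn.LSTM(frame_len, hidden_size, num_layers, batch_first=False)

bar = torch.zeros(bar_len, frame_len)         # one bar: (frames, notes)
x = bar.unsqueeze(1)                          # -> (seq_len, batch=1, features)
h0 = torch.zeros(num_layers, 1, hidden_size)  # (layers, batch, hidden)
c0 = torch.zeros(num_layers, 1, hidden_size)

out, (hn, cn) = lstm(x, (h0, c0))
flat = out.contiguous().view(-1, hidden_size)  # what the linear layer receives

print(x.shape)     # torch.Size([31, 1, 88])
print(out.shape)   # torch.Size([31, 1, 88])
print(hn.shape)    # torch.Size([2, 1, 88])
print(flat.shape)  # torch.Size([31, 88])
```

The flattened output has one row per frame, so the model's final output lines up with a `(bar_len, frame_len)` target bar.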

In [None]:
# Instantiate the model with hyperparameters
# We'll also set the model to the device that we defined earlier (default is CPU)
model = RNN(input_size=frame_len,
                   output_size=frame_len,
                   hidden_size=hidden_size,
                   num_layers=num_layers).to(device)

## Train

In [None]:
# converts one frame into torch tensor
def multi_hot_tensor(frame):
  tensor = torch.from_numpy(frame)
  return tensor
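One detail worth keeping in mind with `torch.from_numpy`: the resulting tensor keeps the NumPy dtype and shares memory with the array. NumPy defaults to `float64`, while `nn.LSTM` weights default to `float32`, so a conversion is usually needed before the forward pass. A small sketch with a made-up frame:

```python
import numpy as np
import torch

frame = np.zeros(88)              # NumPy defaults to float64
tensor = torch.from_numpy(frame)  # keeps the dtype and shares memory with `frame`
print(tensor.dtype)               # torch.float64

frame[12] = 1.0                   # the change is visible through the tensor
print(tensor[12].item())          # 1.0

as_float32 = tensor.float()       # converted copy in float32
print(as_float32.dtype)           # torch.float32
```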

In [None]:
# retrieve data from dataloader
def get_sample(dataloader):

  input = torch.zeros(n_bars_input, bar_len, frame_len)
  target = torch.zeros(n_bars_input, bar_len, frame_len)

  for sample, (xb, yb) in enumerate(dataloader): # gets the samples
    input[sample] = xb
    target[sample] = yb
  
  return input, target

In [None]:
def train(model, optimizer, loss_fn, dataloader, batch_size=1, num_epochs=3):

  print("\nStarting training...")

  for epoch in range(1, num_epochs + 1):
    training_loss = 0.0

    print('> EPOCH #', epoch)

    input, target = get_sample(dataloader)
    input = input.to(device)
    target = target.to(device)

    for bar in tqdm(range(n_bars_input)):
      # Initialize hidden and cells
      hidden, cell = model.init_hidden(batch_size)

      # Generate predictions
      output, (hidden, cell) = model(input[bar,:], hidden, cell)

      # Compute the loss and backpropagate
      loss_step = loss_fn(output, target[bar, :])
      loss_step.backward() # Does backpropagation and calculates gradients
      optimizer.step() # Updates the weights accordingly
      optimizer.zero_grad() # Clears the gradients before the next bar
      
      training_loss += loss_step.item()
    
    training_loss /= len(dataloader.dataset) # average loss per bar
      
    if epoch%1 == 0:
      print('Epoch: {}/{}.............'.format(epoch, num_epochs), end=' ')
      print("Loss: {:.4f}".format(training_loss))

In [None]:
optimizer = optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.BCELoss()
train(model, optimizer, loss_fn, train_dl)


Starting training...
> EPOCH # 1


Epoch: 1/3............. Loss: 0.8174
> EPOCH # 2


Epoch: 2/3............. Loss: 0.8162
> EPOCH # 3


Epoch: 3/3............. Loss: 0.8162
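To help read these numbers, here is the binary cross-entropy that `nn.BCELoss` computes with its default mean reduction, worked by hand on a toy frame of four notes. The probabilities and targets are made up for illustration.

```python
import math

def binary_cross_entropy(probs, targets):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all notes."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

targets = [0.0, 1.0, 0.0, 0.0]    # one note on out of four
confident = [0.1, 0.9, 0.1, 0.1]  # close to the targets
undecided = [0.5, 0.5, 0.5, 0.5]  # no information about any note

print(round(binary_cross_entropy(confident, targets), 4))  # 0.1054
print(round(binary_cross_entropy(undecided, targets), 4))  # 0.6931
```

A model that outputs 0.5 for every note scores ln 2 ≈ 0.693 regardless of the targets, which makes a useful reference point. The losses printed above stay above that value, which may indicate that the model is not yet learning useful structure.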


## Test

In [None]:
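def generate(model, initial_bar, predict_len=31, batch_size=1, temperature=0.85):
  # Note: `temperature` is accepted but not used yet
  output = []
  hidden, cell = model.init_hidden(batch_size)
  initial_bar = initial_bar.to(device)
  with torch.no_grad(): # no gradients are needed at inference time
    for _ in range(predict_len):
      # The same initial bar is fed at every step; only the hidden and cell states carry over
      out, (hidden, cell) = model(initial_bar, hidden, cell)
      output.append(out)
  
  return output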
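One possible variation of `generate` is to feed each predicted bar back in as the next input, so the sequence evolves bar by bar. The sketch below is an assumption about how that could look, not the notebook's method: `TinyLSTM` is a hypothetical, untrained stand-in with the same `forward` and `init_hidden` interface as the `RNN` class, and the 0.5 threshold used to binarize the output is an arbitrary choice.

```python
import torch
import torch.nn as nn

class TinyLSTM(nn.Module):
    """Hypothetical stand-in with the same interface as the notebook's RNN."""
    def __init__(self, frame_len=88, hidden_size=16, num_layers=1):
        super().__init__()
        self.hidden_size, self.num_layers = hidden_size, num_layers
        self.lstm = nn.LSTM(frame_len, hidden_size, num_layers)
        self.fc = nn.Linear(hidden_size, frame_len)

    def forward(self, x, hidden, cell):
        out, (hidden, cell) = self.lstm(x.unsqueeze(1), (hidden, cell))
        out = torch.sigmoid(self.fc(out.reshape(-1, self.hidden_size)))
        return out, (hidden, cell)

    def init_hidden(self, batch_size=1):
        shape = (self.num_layers, batch_size, self.hidden_size)
        return torch.zeros(shape), torch.zeros(shape)

def generate_autoregressive(model, initial_bar, predict_len=4, threshold=0.5):
    """Feed each predicted bar back in as the next input (assumed scheme)."""
    model.eval()
    bars, bar = [], initial_bar
    hidden, cell = model.init_hidden(1)
    with torch.no_grad():
        for _ in range(predict_len):
            probs, (hidden, cell) = model(bar, hidden, cell)
            bar = (probs > threshold).float()  # binarize: note on or off
            bars.append(bar)
    return bars

torch.manual_seed(0)
generated = generate_autoregressive(TinyLSTM(), torch.zeros(31, 88))
print(len(generated), generated[0].shape)
```

Each generated bar has the same `(bar_len, frame_len)` shape as the input, so it can be passed to a decoder in the same way as an encoded bar.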

In [None]:
initial_bar = train_dl.dataset.tensors[0][3847]
initial_bar = initial_bar.to(torch.float)
print('Input: ')
torch.set_printoptions(threshold=10_000)
print(initial_bar)

output = generate(model, initial_bar, predict_len=1)
print('Output: ')
print(output)

Input: 
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0