**<center><h1>Automatic music genaration using RNN</center>**


# **1. Problem Statement**

- **ABC music prod. pvt.ltd** is a reknowned **audio-video production house** based out of **Mumbai, India**

- As **COVID-19** cases are increasing day by day it is almost impossible for the musicians to cope up with real time studio work.

- Hence, the company wants you to make an **AI based music genaration system.**

- The goal of this project is to make an **AI based music genaration system.**

- The key contraint to the problem is **accuracy.**


<center><img src = "https://cdn.dribbble.com/users/316072/screenshots/10724786/laptop_music_animation_01_1600x1200.gif"></center>

### **Scenario**

- You have been hired as a **freelance data scientist** for **ABC music prod. pvt.ltd**

- The model should read a text file in **abc format**.

- The model should genarate the **corresponding music** framed out of that note sequence.


# **2. Importing libraries**

In [1]:
import numpy as np                                     # Importing numpy
from keras.models import Sequential, load_model        # Importing Sequential layers and load_model
from keras.layers import LSTM, Dropout                 # Importing LSTM and Dropout layers
from keras.layers import TimeDistributed               # Importing TimeDistributed layers
from keras.layers import Dense, Activation, Embedding  # Importing Dense, Activation and Embedding layers
import os                                              # Importing OS
import json                                            # Importing JSON
import argparse                                        # Importing Argsparse

# **3. Data Reading**

- In this step we will **read the data** from the **input text corpus**.
- We will use **UTF-8** encoding for making the data

In [2]:
with open('/content/input.txt','rb') as f:
      input_text = f.read()
input_text=str(input_text,'utf-8')

In [3]:
input_text

'X: 1\nT:A and D\n% Nottingham Music Database\nS:EF\nY:AB\nM:4/4\nK:A\nM:6/8\nP:A\nf|"A"ecc c2f|"A"ecc c2f|"A"ecc c2f|"Bm"BcB "E7"B2f|\n"A"ecc c2f|"A"ecc c2c/2d/2|"D"efe "E7"dcB| [1"A"Ace a2:|\n [2"A"Ace ag=g||\nK:D\nP:B\n"D"f2f Fdd|"D"AFA f2e/2f/2|"G"g2g ecd|"Em"efd "A7"cBA|\n"D"f^ef dcd|"D"AFA f=ef|"G"gfg "A7"ABc |[1"D"d3 d2e:|[2"D"d3 d2||\n\n\nX: 2\nT:Abacus\n% Nottingham Music Database\nS:By Hugh Barwell, via Phil Rowe\nM:6/8\nK:G\n"G"g2g B^AB|d2d G3|"Em"GAB "Am"A2A|"D7"ABc "G"BAG|\n"G"g2g B^AB|d2d G2G|"Em"GAB "Am"A2G|"D7"FGA "G"G3:||:\n"D7"A^GA DFA|"G"B^AB G3|"A7"^c=c^c A^ce|"D7"fef def|\n"G"g2g de=f|"E7"e2e Bcd|"Am"c2c "D7"Adc| [1"G"B2A G3:|\n [2"G"B2A G2F||"Em"E2E G2G|B2B e2e|"Am"c2A "B7"FBA|"Em"G2F E3|"Em"EFG "Am"ABc|\n"B7"B^c^d "Em"e2e|"F#7"f2f f2e|"B7"^def BAF|"Em"E2E G2G|B2B e2e|\n"Am"c2A "B7"FBA|"Em"G2F E3|"Em"EFG "Am"ABc|"B7"B^c^d "Em"e2e|\n"F#7"f2e "B7"^def |[1"Em"e3 "D7"d3:|[2"Em"e3 "E7"e3||\n\n\nX: 3\nT:The American Dwarf\n% Nottingham Music Database\nS:FTB, via EF\nM:6

In [4]:
len(input_text)

129665

In [5]:
input_text [:100]

'X: 1\nT:A and D\n% Nottingham Music Database\nS:EF\nY:AB\nM:4/4\nK:A\nM:6/8\nP:A\nf|"A"ecc c2f|"A"ecc c2f|"A"'

# **4. Data Preprocessing**


- In this step we will make the data into the desired format.
- We will be converting each and every character into a integer.
- Then we will create a dictionary of it.

In [6]:
def generate_keys(text):
    #Charecter to index dictionary
    char_to_idx = {ch:idx for idx,ch in enumerate(sorted(list(set(text))))}

    #Index to character dicrionary
    idx_to_char = {idx:ch for ch,idx in char_to_idx.items()}

    #Printing the charecter lengths
    print("length of the  charecter to index ",len(char_to_idx))
    print("length of the  index to charecter ",len(idx_to_char))
    return char_to_idx,idx_to_char

In [7]:
# Genarating the charecter to index and index to charecter values
char_to_idx, idx_to_char = generate_keys(input_text)

length of the  charecter to index  86
length of the  index to charecter  86


# **5. Batch genaration**


- In this section we will be genarating batches in order to train our model.
- We can see that the **length is 129,665**
- We will be using a **batch size of 16** and a **sequence length of 64**
- Hence, the **number of batches** are going to be **((129,665/16)/ 64)** = 126
- Now since it is a sequence data we will divide the 1st **8104 char in 126 batches** , each batches will have these char in the 1st row.
- Similarly from **8105 - 16209th** chars will be divided into 126 batches (each batch will have 64 sequence) and will be added at the 2nd row of each batches
- This is how at **8104th interval** we will take char and divide them into batches and put them in respective rows of batches
- So we will have the **row wise continuation of the sequence** for different batches.  
- In this way we can keep the sequence information in the text data.

In [8]:
import pandas as pd
pd.DataFrame({"keys":char_to_idx.keys(),"values":char_to_idx.values})

Unnamed: 0,keys,values
0,\n,<built-in method values of dict object at 0x7a...
1,,<built-in method values of dict object at 0x7a...
2,!,<built-in method values of dict object at 0x7a...
3,"""",<built-in method values of dict object at 0x7a...
4,#,<built-in method values of dict object at 0x7a...
...,...,...
81,x,<built-in method values of dict object at 0x7a...
82,y,<built-in method values of dict object at 0x7a...
83,z,<built-in method values of dict object at 0x7a...
84,|,<built-in method values of dict object at 0x7a...


In [9]:
# Here we are genarating respective batches to fit into the model
def generate_batchs(T, vocab_size):
    length = T.shape[0] # length = 129665
    batch_char = int(length / batch_size);
    for start in range(0, 126*64,64):
        X = np.zeros((batch_size, batch_sequence))
        Y = np.zeros((batch_size, batch_sequence, vocab_size))

        # Getting each batch index and column index
        for batch_index in range(0,batch_size):
            for col_index in range(0,batch_sequence):
              X[batch_index, col_index] = T[batch_char * batch_index + start + col_index]
              Y[batch_index, col_index, T[batch_char * batch_index + start + col_index+1]] = 1
        yield X,Y

In [10]:
# Checking the size of the vocabulary
vocab_size = len(char_to_idx)
vocab_size

86

# **6. Modelling**
- In this step we will be making the **deep learning model**.
- We will train the model for a **100 epochs.**
- After every **10** epochs we will save the respective weights into the **main directory**.

In [11]:
# Defining a batch size of 16 and sequence length of 64
batch_size = 16
seq_len = 64
batch_sequence = 64

# Getting the model directory
MODEL_DIR = '/content/'

# Defining a function for saving weights
def save_weights(epoch, model):
    if not os.path.exists(MODEL_DIR):
        os.makedirs(MODEL_DIR)
    model.save_weights(os.path.join(MODEL_DIR, 'weights.{}.h5'.format(epoch)))

# Defining a function for loading weights
def load_weights(epoch, model):
    model.load_weights(os.path.join(MODEL_DIR, 'weights.{}.h5'.format(epoch)))

# Defing a function for model development
def build_model(batch_size, seq_len, vocab_size):
    model = Sequential()
    model.add(Embedding(vocab_size,
                        512,
                        batch_input_shape=(batch_size, seq_len)))

    for i in range(3):
        model.add(LSTM(256, return_sequences=True, stateful=True))
        model.add(Dropout(0.2))
    ## Using Time Distributed Dense Layer for each return sequences
    model.add(TimeDistributed(Dense(vocab_size)))
    model.add(Activation('softmax'))
    return model

In [12]:
# Defining a train function to train the model such that for every 10 epochs the model will return respective weights
def train(text, epochs=100, save_freq=10):

    #model_architecture
    model = build_model(batch_size, batch_sequence, vocab_size)
    print(model.summary())
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


    #Train data generation
    T = np.asarray([char_to_idx[c] for c in text], dtype=np.int32) #convert complete text into numerical indices

    print("Length of text:" + str(T.size)) #129,665

    steps_per_epoch = (len(text) / batch_size - 1) / batch_sequence

    for epoch in range(epochs):
        print('\nEpoch {}/{}'.format(epoch + 1, epochs))
        losses, accs = [], []
        for i, (X, Y) in enumerate(generate_batchs(T, vocab_size)):
            loss, acc = model.train_on_batch(X, Y)
            losses.append(loss)
            accs.append(acc)
        print('epoch {}: loss = {}, acc = {}'.format(epoch + 1, np.mean(loss), np.mean(acc)))


        if (epoch + 1) % 10 == 0:
            save_weights(epoch + 1, model)
            print('Saved checkpoint to', 'weights.{}.h5'.format(epoch + 1))

In [13]:
train(input_text)

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (16, 64, 512)             44032     
                                                                 
 lstm (LSTM)                 (16, 64, 256)             787456    
                                                                 
 dropout (Dropout)           (16, 64, 256)             0         
                                                                 
 lstm_1 (LSTM)               (16, 64, 256)             525312    
                                                                 
 dropout_1 (Dropout)         (16, 64, 256)             0         
                                                                 
 lstm_2 (LSTM)               (16, 64, 256)             525312    
                                                                 
 dropout_2 (Dropout)         (16, 64, 256)             0

# **7. Music Genaration**
- In this step we will genarate the **abc notes**.
- Later on these notes will be put <a href="https://www.abcjs.net/abcjs-editor.html">here</a>
- Then we will be able to hear the **respective notes.**

In [14]:
# Setting the data and the model directories respectively
DATA_DIR = '/content/'
MODEL_DIR = '/content/'

In [15]:
# Save the weights into the respective model directory path with respective model names
def save_weights(epoch, model):
    if not os.path.exists(MODEL_DIR):
        os.makedirs(MODEL_DIR)
    model.save_weights(os.path.join(MODEL_DIR, 'weights.{}.h5'.format(epoch)))

In [16]:
# Loading the weights into the model
def load_weights(epoch, model):
    model.load_weights(os.path.join(MODEL_DIR, 'weights.{}.h5'.format(epoch)))

In [17]:
# Building the test model with the given vocab size = 86
def build_test_model(vocab_size):
    model = Sequential()
    model.add(Embedding(vocab_size,
                        512,
                        batch_input_shape=(1,1)))
    for i in range(3):
        model.add(LSTM(256,
                       return_sequences=(i != 2),
                       stateful=True))
        model.add(Dropout(0.2))

    model.add(Dense(vocab_size))
    model.add(Activation('softmax'))
    return model

In [18]:
# Genarating the text corpus upon the test model
def generate_music(epoch, num_char):
    with open(os.path.join(DATA_DIR, 'char_to_idx.txt')) as f:
        char_to_idx = json.load(f)

    idx_to_char = dict(zip(char_to_idx.values(), char_to_idx.keys()))

    # Feeding the pre-defined parameters
    vocab_size = len(char_to_idx)
    batch_size = 16
    seq_len = 64

    # calling the test model with vocab size = 86
    model =  build_test_model(vocab_size)
    load_weights(epoch, model)

    sampled = []
    batch = np.zeros((1, 1))
    batch[0, 0] = np.random.randint(vocab_size)
    for i in range(num_char):
        if i ==0:
            #Predicting for the respective batches and returning a flatten array
            result = model.predict_on_batch(batch).ravel()
            sample = np.random.choice(range(vocab_size), p=result)
            sampled.append(idx_to_char[sample])
        else:
            #Reassigning the batch size as of the sample size
            batch[0,0] = np.array([[sample]])

            #Predicting for the respective batches and returning a flatten array
            result = model.predict_on_batch(batch).ravel()
            sample = np.random.choice(range(vocab_size), p=result)
            sampled.append(idx_to_char[sample])
    return ''.join(sampled)

In [19]:
DATA_DIR

'/content/'

In [20]:
with open(os.path.join(DATA_DIR, 'char_to_idx.txt')) as f:
    char_to_idx = json.load(f)

In [21]:
generate_music(10, 512)

'"D7/d"c2B|"G"G3 G2::\n"D"Bcd "A7"aga|"D"dcd "G7"g2g|"A"e2B A2A|"A"A+m"ct"DEA d3|"Am"edc "A7"^AGF|"D"A3 "D"DAG|"Dm"A3 -A3:|\nP:B\ng/2e/2|"G"gaa d^fg|"A"=3g f2f/gc/2g/2A/2|eeB d2c/3|"G"gde d2d|\n"C7"ggf af^e|"A7"c2B ABA|"D"FFF GGG|"B7"BBB G3|\n"G"GBc "E7"ddd|"C"e3 d2:|\n\n\nX: 369\nT:Le Miergwrt\'hi Poekos\n% Nottingham Music Database\nY:ABBB\nS:Jio Bred. B9aly, via Phil Rowe\nM:6/8\nK:Am\nE|"D"FGm gga|"Am"e2c ccB|"Dm"A2c |"Bb"ecB "A"AFD|"G"BEA "A"dBB|"D"d3 a2A|"A7"c2c "G7"D2F|\n"/"AFA ee:|\n\n\nX: 131\nT:Ma\'y M3ve\'\n% Nottingham '

### **Observation:**

- This is the rendered output of the **input text corpus** that was given to the model.

- This output we will copy and paste <a href="https://www.abcjs.net/abcjs-editor.html">here!</a> and press the **Play button**


# **8. Applications**

- This model can be used for in-house **music production systems.**
- This can be widely used to automate **manual instruments.**
- This can be also used to make **automatic VST(virtual studio toolkit) plugins**

# **10. Conclusion**


- In this project an **Automatic music genaration system** was made from scratch.
- Here, we recieved a **validation accuracy of 90%**.
- This project can be widely used for **music production systems**
- The only limitation of this model is that it is being trained with **very less data.**
- However, while getting trained on various intrument data this model can be further enhanced for different instruments as well.
