# Music generation using Artificial Intelligence

## Problem: how to generate music automatically using AI?

This practical exercise is designed to teach how to automatically generate music using a Recurrent Neural Network (RNN).

We do not need to be music experts to create music. With an RNN, even someone without musical training can generate listenable compositions. Most people enjoy interesting music, so a way to generate music automatically, at an acceptable level of quality, would be valuable to the music industry.

Task:
Our task is to take existing music data, then train a model using this data. The model must learn the patterns of music, so that after training, it can generate completely new compositions. It cannot simply copy and paste from the training data. Instead, it needs to understand the structure of music in order to create new melodies. While we do not expect our model to generate professional-level music, we aim for it to produce coherent and melodious compositions that are pleasant to listen to.

Now, what exactly is music? Simply put, music is nothing more than a sequence of musical events (notes). The model will automatically generate a new sequence of notes, forming a complete composition. In this practical exercise, we focus only on single-instrument music, as this serves as a demonstration model. However, the model can be further improved to include multi-instrument compositions if necessary.

## Data files (music files in ABC format) are available at:
1. http://abc.sourceforge.net/NMD/

### For training, this script uses the following sample data file:
* Jigs (340 melodies)
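
Because ABC notation is plain text, a tune can be handled as a sequence of characters, each mapped to an integer. The sketch below illustrates that idea with the same mapping scheme the training code uses later. The ABC fragment is a made-up example for illustration, not a tune from the dataset.

```python
# Hypothetical ABC fragment, invented for illustration (not from the dataset).
tune = "X:1\nT:Example\nM:6/8\nK:D\nDFA dAF|GBd gdB|\n"

# One integer per distinct character, assigned in sorted order.
char_to_index = {ch: i for i, ch in enumerate(sorted(set(tune)))}
index_to_char = {i: ch for ch, i in char_to_index.items()}

# Encode the text as integers, then decode it back to check the round trip.
encoded = [char_to_index[ch] for ch in tune]
decoded = "".join(index_to_char[i] for i in encoded)

print(len(char_to_index), "distinct characters")
print(encoded[:10])
print(decoded == tune)
```

Note that the newline character sorts before all printable characters, so in this fragment it receives index 0.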

# Program script
Import required libraries

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
import json
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dropout, TimeDistributed, Dense, Activation, Embedding

## Select the data folder and data file for training


In [None]:
data_directory = "/content/drive/MyDrive/Data"  # data folder
data_file = "slip.txt"   # music file
charIndex_json = "char_to_index.json"
model_weights_directory = 'Data/Model_Weights/' # Folder for trained model weights
BATCH_SIZE = 18  # model hyperparameter
SEQ_LENGTH = 136  # model hyperparameter

## Data preparation procedure

In [None]:
def read_batches(all_chars, unique_chars):
    length = all_chars.shape[0]
    batch_chars = length // BATCH_SIZE  # characters available to each row of a batch

    # Step through the text in windows of SEQ_LENGTH characters; each window yields one batch.
    for start in range(0, batch_chars - SEQ_LENGTH, SEQ_LENGTH):
        X = np.zeros((BATCH_SIZE, SEQ_LENGTH))                # input character indices
        Y = np.zeros((BATCH_SIZE, SEQ_LENGTH, unique_chars))  # one-hot next-character targets
        for batch_index in range(BATCH_SIZE):  # each row of the batch
            for i in range(SEQ_LENGTH):        # each time step (character) in the row
                position = batch_index * batch_chars + start + i
                X[batch_index, i] = all_chars[position]
                # The correct label is the next character in the sequence, hence position + 1.
                Y[batch_index, i, all_chars[position + 1]] = 1
        yield X, Y
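
The batching scheme is easier to follow on toy data. The sketch below is a simplified stand-in for `read_batches`, written under the assumption that each window advances by exactly one sequence length. The function name `toy_batches`, the tiny batch size, and the synthetic data are illustrative only.

```python
import numpy as np

def toy_batches(all_chars, unique_chars, batch_size, seq_length):
    """Yield (X, Y) pairs where Y is X shifted one step ahead, one-hot encoded."""
    chars_per_row = len(all_chars) // batch_size
    for start in range(0, chars_per_row - seq_length, seq_length):
        X = np.zeros((batch_size, seq_length), dtype=np.int32)
        Y = np.zeros((batch_size, seq_length, unique_chars), dtype=np.float32)
        for row in range(batch_size):
            offset = row * chars_per_row + start
            X[row] = all_chars[offset:offset + seq_length]
            targets = all_chars[offset + 1:offset + seq_length + 1]
            Y[row, np.arange(seq_length), targets] = 1.0
        yield X, Y

# 40 synthetic "characters" drawn from a vocabulary of 5 symbols.
data = np.arange(40) % 5
batches = list(toy_batches(data, unique_chars=5, batch_size=2, seq_length=4))
X, Y = batches[0]
print(X)                   # input indices
print(Y.argmax(axis=-1))   # target indices: the inputs shifted by one position
```

Each row of a batch reads from its own contiguous region of the text, so row `r` of consecutive batches continues where row `r` of the previous batch stopped. That continuity is what a stateful LSTM relies on.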

## Define the architecture of the neural network model

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Input

In [None]:
def build_model(batch_size, seq_length, unique_chars):
    model = Sequential([
        # Define an explicit Input layer to enforce batch size
        Input(batch_shape=(batch_size, seq_length)),

        # Embedding layer. Masking is left off: index 0 in char_to_index is a real
        # character, not padding, so it must not be masked.
        Embedding(input_dim=unique_chars, output_dim=512),

        # LSTM layers (batch size is now inherited from Input layer)
        LSTM(256, return_sequences=True, stateful=True),
        LSTM(128, return_sequences=True, stateful=True),

        # Output layer
        Dense(unique_chars, activation='softmax')
    ])

    # Compile the model
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

    return model
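
To make the output layer and the loss concrete, the sketch below computes a softmax and the categorical cross-entropy for a single time step in plain NumPy. The logits and the three-symbol vocabulary are made-up values for illustration, and the helper functions are not part of Keras.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max()   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(one_hot_target, probs):
    """Negative log-probability assigned to the correct class."""
    return float(-np.sum(one_hot_target * np.log(probs)))

logits = np.array([2.0, 1.0, 0.1])   # illustrative scores for 3 characters
probs = softmax(logits)
target = np.array([1.0, 0.0, 0.0])   # the correct next character is index 0
loss = categorical_crossentropy(target, probs)

print(probs)
print(loss)
```

The loss shrinks toward zero as the probability assigned to the correct next character approaches 1.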

## Create and train the model
The model weights are saved to H5 weight files every 10 epochs (after epochs 10, 20, ..., 80).

In [None]:
import os
import json
import numpy as np

In [None]:
def training_model(data, epochs=80):
    # Mapping characters to indices
    char_to_index = {ch: i for i, ch in enumerate(sorted(set(data)))}
    print(f"Number of unique characters in the dataset: {len(char_to_index)}")  # 87

    # Ensure the directory exists before saving the character index JSON
    if not os.path.exists(data_directory):
        os.makedirs(data_directory)

    with open(os.path.join(data_directory, charIndex_json), "w") as f:
        json.dump(char_to_index, f)

    # Create reverse mapping (index to character)
    index_to_char = {i: ch for ch, i in char_to_index.items()}
    unique_chars = len(char_to_index)

    # Build the model (build_model also compiles it)
    model = build_model(BATCH_SIZE, SEQ_LENGTH, unique_chars)
    model.summary()

    # Convert characters to numeric indices
    all_characters = np.array([char_to_index[c] for c in data], dtype=np.int32)
    print(f"Total number of characters: {all_characters.shape[0]}")  # Expected output: 155222

    epoch_number, loss, accuracy = [], [], []

    for epoch in range(epochs):
        print(f"\nEpoch {epoch+1}/{epochs}")
        final_epoch_loss, final_epoch_accuracy = 0, 0
        epoch_number.append(epoch + 1)

        # Iterate over batches
        for i, (x, y) in enumerate(read_batches(all_characters, unique_chars)):
            final_epoch_loss, final_epoch_accuracy = model.train_on_batch(x, y)  # Train on each batch
            print(f"Batch: {i+1}, Loss: {final_epoch_loss:.4f}, Accuracy: {final_epoch_accuracy:.4f}")

        # Store loss and accuracy for this epoch
        loss.append(final_epoch_loss)
        accuracy.append(final_epoch_accuracy)

        # Save weights every 10 epochs
        if (epoch + 1) % 10 == 0:
            if not os.path.exists(model_weights_directory):
                os.makedirs(model_weights_directory)
            weights_path = os.path.join(model_weights_directory, f"Weights_{epoch+1}.weights.h5")
            model.save_weights(weights_path)
            print(f"✅ Saved weights at epoch {epoch+1} to {weights_path}")

    print("🎉 Training completed!")

    return model, epoch_number, loss, accuracy

### Model training
Training can take a long time.
If you have the weight files from a previous session, you can use the already trained model instead.

In [None]:
with open(os.path.join(data_directory, data_file), mode='r') as file:
    data = file.read()

if __name__ == "__main__":
    training_model(data)

## Using the model
Create the music generation model.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense, Input

In [None]:
def make_model(batch_size, unique_chars):
    model = Sequential([
        # Input Layer (Ensures Fixed Batch Size for Stateful LSTMs)
        Input(batch_shape=(batch_size, 1)),

        # Embedding Layer. Masking is left off: index 0 in char_to_index is a real
        # character, not padding, so it must not be masked.
        Embedding(input_dim=unique_chars, output_dim=512),

        # First LSTM Layer (Stateful, without batch_input_shape)
        LSTM(256, return_sequences=True, stateful=True),
        Dropout(0.2),

        # Second LSTM Layer (Stateful, not returning sequences)
        LSTM(128, stateful=True),
        Dropout(0.2),

        # Dense Output Layer (with activation)
        Dense(unique_chars, activation="softmax")
    ])

    return model

## Read the weights of the trained model
Run the model and return the results.

In [None]:
def generate_sequence(epoch_num, initial_index, seq_length):
    with open(os.path.join(data_directory, charIndex_json)) as f:
        char_to_index = json.load(f)
    index_to_char = {i:ch for ch, i in char_to_index.items()}
    unique_chars = len(index_to_char)

    model = make_model(1, unique_chars)
    model.load_weights(os.path.join(model_weights_directory, "Weights_{}.weights.h5".format(epoch_num)))


    sequence_index = [initial_index]

    for _ in range(seq_length):
        batch = np.zeros((1, 1))
        batch[0, 0] = sequence_index[-1]

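        predicted_probs = model.predict_on_batch(batch).ravel()
        # Renormalize in float64: float32 softmax output may not sum to 1 closely
        # enough for np.random.choice.
        predicted_probs = np.asarray(predicted_probs, dtype=np.float64)
        predicted_probs /= predicted_probs.sum()
        sample = np.random.choice(range(unique_chars), size = 1, p = predicted_probs)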

        sequence_index.append(sample[0])

    seq = ''.join(index_to_char[c] for c in sequence_index)

    cnt = 0
    for i in seq:
        cnt += 1
        if i == "\n":
            break
    seq1 = seq[cnt:]
    # The code above discards the start of the generated sequence, up to and including the first newline.
    # The seed character passed to the model is arbitrary, so the characters generated before the first "\n"
    # are usually meaningless. The model starts producing a proper tune from the next line onwards, and that
    # is the part we keep.

    cnt = 0
    for i in seq1:
        cnt += 1
        # Stop at the first pair of consecutive newlines; the bounds check avoids an IndexError at the end of seq1.
        if i == "\n" and cnt < len(seq1) and seq1[cnt] == "\n":
            break
    seq2 = seq1[:cnt]
    # Every tune in the training data is followed by three newline characters, and the model has learned that too.
    # The code above drops everything the model generated after the first run of consecutive newlines, so only
    # one tune is kept at a time and returned.

    return seq2
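
The sampling step inside the generation loop can be tried in isolation. The sketch below draws character indices from a probability vector instead of always taking the most likely one, which keeps the generated tunes varied. The helper name `sample_index` and the probability values are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def sample_index(probs, rng):
    """Draw one index at random, weighted by the given probabilities."""
    p = np.asarray(probs, dtype=np.float64)
    p = p / p.sum()   # renormalize so the values sum to 1 in float64
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)                         # fixed seed for repeatability
probs = np.array([0.7, 0.2, 0.1], dtype=np.float32)    # made-up model output

draws = [sample_index(probs, rng) for _ in range(2000)]
counts = np.bincount(draws, minlength=3)
print(counts / counts.sum())   # empirical frequencies, close to probs
```

Always picking the arg-max would make generation deterministic for a given seed character. Sampling trades some predictability for variety.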

## Music generation

In [None]:
ep = int(input("1. Which model to use (10, 20, 30, ..., 80)? Smaller number may mean less quality: "))
ar = int(input("\n2. Enter the generation seed (any number from 0 to 86): "))
ln = int(input("\n3. Enter the length of sequence (from 300 to 600). Smaller number means shorter musical sequence: "))

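# Fixed example values for a demo run. They override the answers entered above;
# delete these three lines to use your own input.
ep = 80
ar = 11
ln = 600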

In [None]:
music = generate_sequence(ep, ar, ln)

In [None]:
print("\nMusic sequence generated by AI: \n")

In [None]:
print(music)

## Listen to the generated music!
Copy the generated sequence into https://editor.drawthedots.com/ to play it.