## 1. Data Loading and Preparation

This section gets the music data ready. We load a dataset of Irish folk tunes in **ABC notation**, a text-based format for music, then shuffle the training split and keep a 5,000-tune subset.

### Key Concepts:
*   **ABC Notation**: A simple text format to represent music (notes, rhythms, etc.).
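*   **`datasets` library**: The Hugging Face library used to load the `sander-wood/irishman` dataset.
*   **Shuffling and Subsetting**: Shuffling with a fixed seed (`seed=42`) gives a reproducible random sample, and keeping 5,000 tunes keeps training within hardware limits.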
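
As a small illustration of the format, an ABC tune starts with `field:value` header lines (for example `X:` for the reference number, `L:` for the default note length, `M:` for the meter, and `K:` for the key), followed by the tune body. The sketch below splits a tune into header and body. `split_abc` is a hypothetical helper, not part of the notebook, and it assumes single-letter fields and that the `K:` line closes the header, as in the tunes shown in this notebook.

```python
def split_abc(tune: str):
    """Split an ABC tune into a header dict and the body text.

    Assumes single-letter header fields and that `K:` closes the header.
    """
    header = {}
    lines = tune.split("\n")
    for i, line in enumerate(lines):
        if len(line) >= 2 and line[0].isalpha() and line[1] == ":":
            header[line[0]] = line[2:].strip()
            if line[0] == "K":
                # Everything after the key field is the tune body.
                return header, "\n".join(lines[i + 1:])
        else:
            # First non-header line: treat the rest as the body.
            return header, "\n".join(lines[i:])
    return header, ""


# A shortened tune in the same style as the dataset entries.
example = "X:129761\nL:1/8\nM:2/4\nK:A\n E3/2 E/ E2 | E a e2 | e3/2 a/ c A | B4 |"
header, body = split_abc(example)
print(header)
print(body)
```

The model in this notebook never sees this structure explicitly; it has to learn the header layout character by character.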

In [1]:
from datasets import load_dataset
from music21 import converter
import torch
import torch.nn as nn
import torch.optim as optim
import os
import numpy as np
from IPython.display import Audio,display
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [2]:
from torch.cuda.amp import GradScaler, autocast

In [3]:
!pip install comet_ml > /dev/null 2>&1
import comet_ml


In [4]:
!apt-get install -y fluidsynth
!wget -q https://github.com/FluidSynth/fluidsynth/raw/master/sf2/FluidR3_GM.sf2

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  at-spi2-core fluid-soundfont-gm gsettings-desktop-schemas libatk-bridge2.0-0
  libatk1.0-0 libatk1.0-data libatspi2.0-0 libdouble-conversion3 libevdev2
  libfluidsynth3 libgtk-3-0 libgtk-3-bin libgtk-3-common libgudev-1.0-0
  libinput-bin libinput10 libinstpatch-1.0-2 libmd4c0 libmtdev1 libqt5core5a
  libqt5dbus5 libqt5gui5 libqt5network5 libqt5svg5 libqt5widgets5
  librsvg2-common libwacom-bin libwacom-common libwacom9 libxcb-icccm4
  libxcb-image0 libxcb-keysyms1 libxcb-render-util0 libxcb-util1
  libxcb-xinerama0 libxcb-xinput0 libxcb-xkb1 libxcomposite1
  libxkbcommon-x11-0 libxtst6 qsynth qt5-gtk-platformtheme
  qttranslations5-l10n session-migration timgm6mb-soundfont
Suggested packages:
  fluid-soundfont-gs gvfs qt5-image-formats-plugins qtwayland5 jackd
The following NEW packages will be installed:
  at-spi2-core fluid-soundfont

In [5]:
from google.colab import userdata

In [6]:
dataset = load_dataset("sander-wood/irishman")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]

train.json:   0%|          | 0.00/80.0M [00:00<?, ?B/s]



validation.json: 0.00B [00:00, ?B/s]

Generating train split:   0%|          | 0/214122 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/2162 [00:00<?, ? examples/s]

In [10]:
songs  = (dataset['train'].shuffle(seed=42).select(range(5000)))['abc notation'] # using only 5000 songs because of hardware limitations

In [11]:
songs

Column(['X:114452\nL:1/8\nQ:1/4=120\nM:4/4\nK:G\n"^Allegro"{B} dcBA G2 G2 | G2 BG FGAe |{e} dcBA G2 d2 | efge{e} dcBc | dcBA G2 G2 | G2 BG FGAe | \n dcBA G2 d2 | efge dcBd || g3 a bgfg | afdf afdf | g3 a bgfg | afdf g2{agf} g2 | g3 a bgfg | \n afdf afdf | gabg afdf | gfed efge ||', 'X:19015\nL:1/8\nQ:1/8=100\nM:4/4\nK:G\n"^moderato" Bc | d3 g edcB | dBdg edcB | cBAB GABc | d3 c B2 d2 |"^5" e2 eg e2 de | gfga b2 ab | \n gfed g2 B2 | A4 G2 || ga |"^9" b2 b2 b2 ag | a>gab a2 ga | b>age dgBc | d3 c B2 d2 | \n"^13" e2 eg e2 de | gfga b2 ab | gfed g2 B2 | A4 G2 |]', 'X:129761\nL:1/8\nM:2/4\nK:A\n E3/2 E/ E2 | E a e2 | e3/2 a/ c A | B4 | e3/2 a/ e2 | e a e d | B A B, D | E4 |]', 'X:8839\nL:1/8\nM:6/8\nK:D\n A | dA(f e)cA | BF(d c)AF | GD(B A)FD | CEE E2 A | dA(f e)cA | dF(d c)AF | GD(B A)dD |{F} E3 D2 :: \n (g2 e) (f2 d) | (c2 A) (d2 f) | (g2 e) (f2 d) | cee e2 e | (g2 e) (f2 d) | (c2 A) (d2 f) | \n (g2 e) (f2 d) | cee e2 A | dA(f e)cA | BF(d c)AF | GD(B A)FD | CEE E2 A | dA(f e)cA | BF(d c)A

In [19]:
song = songs[33]
print(song)

X:3276
L:1/8
M:4/4
K:Emin
 (G>F) | E2 (B>c) B2 AB | (cBAG) F2 (G>F) | E2 (e>f) g2 (f>e) | (^d>efd) B2 (AB) | 
 (cBAG) F2 (G>A) | (BGFE){e} ^d2 ef | (ge)(f^d) (eB) (A/c/B/A/) | G2 (TF2{EGF} E2) || z B | 
 e2 (e>f) g2 (f>e) | (b>a)(ga){g} f2 e^d | eBef g2 fe | (b>a)(ga){g} f2 AB | (cBAG) F2 (G>A) | 
 (BGFE){e} ^d2 ef | (ge)(f^d) (eB) (A/c/B/A/) | G2 (TF2{EGF} E2) ||


## 2. Music Playback

To hear a tune, we convert its ABC notation into an audio file. This involves:

*   **`music21`**: A Python library that parses ABC notation and converts it into a MIDI file.
*   **MIDI (Musical Instrument Digital Interface)**: A standard digital language for musical instruments to communicate.
*   **`fluidsynth`**: A software synthesizer that takes a MIDI file and a `SoundFont` (a collection of instrument sounds) to create an actual audio file (WAV).

Finally, `IPython.display.Audio` allows us to play this WAV file directly in the notebook.
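
The rendering step can be sketched as a small helper that builds the `fluidsynth` command line. `render_command` is a hypothetical name, and the SoundFont path is an assumption: it is where the `fluid-soundfont-gm` package typically installs the file on Debian and Ubuntu systems, so it may differ elsewhere.

```python
import shlex
from pathlib import Path


def render_command(soundfont, midi_path, wav_path, sample_rate=44100):
    """Build the fluidsynth argument list for offline MIDI-to-WAV rendering."""
    return [
        "fluidsynth",
        "-n",                      # do not open a MIDI input driver
        "-i",                      # do not start the interactive shell
        "-F", str(wav_path),       # render to this file instead of the sound card
        "-r", str(sample_rate),    # output sample rate in Hz
        str(soundfont),
        str(midi_path),
    ]


# Assumed location of the SoundFont installed by the fluid-soundfont-gm package.
soundfont = Path("/usr/share/sounds/sf2/FluidR3_GM.sf2")
cmd = render_command(soundfont, "output.mid", "output.wav")
print(shlex.join(cmd))
if not soundfont.exists():
    print(f"SoundFont not found at {soundfont}; rendering would fail.")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Checking that the SoundFont exists first is worthwhile because `fluidsynth` still writes a WAV file when the SoundFont fails to load.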

In [20]:
# Playing the music
score = converter.parse(song)
score.write('midi', 'output.mid')

'output.mid'

In [21]:
# Use the SoundFont installed by the fluid-soundfont-gm apt package
!fluidsynth -ni /usr/share/sounds/sf2/FluidR3_GM.sf2 output.mid -F output.wav -r 44100


FluidSynth runtime version 2.2.5
Copyright (C) 2000-2022 Peter Hanappe and others.
Distributed under the LGPL license.
SoundFont(R) is a registered trademark of Creative Technology Ltd.

Rendering audio to file 'output.wav'..


In [22]:
# Music from the dataset
display(Audio('output.wav'))


## 3. Text Processing: Vocabulary and Vectorization

Neural networks need numbers, not text. So, we convert our ABC notation into a numerical format:

*   **Joining Songs**: All ABC strings are combined into one large text.
*   **Vocabulary**: All unique characters from this text form our vocabulary (e.g., 'A', 'B', ':', '\n').
*   **`char2idx` & `idx2char`**: Two lookup tables are created:
    *   `char2idx`: Maps each character to a unique number (e.g., 'A' -> 0).
    *   `idx2char`: Maps each number back to its character (e.g., 0 -> 'A').
*   **Vectorization**: The entire joined text is converted into a sequence of numbers using `char2idx`.
*   **Batching (`get_batches`)**: For training, we create small chunks of this numerical sequence. Each chunk has an `x` (input sequence) and a `y` (the same sequence, shifted by one character, representing the target for prediction).
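
The steps above can be sketched on a toy string. This is a minimal illustration, not the notebook's code: it uses plain Python lists, and it sorts the vocabulary so the mapping is the same on every run, whereas the notebook builds the vocabulary from an unordered `set`.

```python
text = "X:1\nK:G\nGABc d2"

# Sorting makes the character-to-index mapping reproducible across runs.
vocab = sorted(set(text))
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = vocab  # list position -> character

# Vectorize the text, then map it back to check the round trip.
encoded = [char2idx[ch] for ch in text]
decoded = "".join(idx2char[i] for i in encoded)

# One training pair: the target is the input shifted ahead by one character.
seq_len = 5
start = 0
x = encoded[start:start + seq_len]
y = encoded[start + 1:start + seq_len + 1]

print(len(vocab), decoded == text)
print(repr("".join(idx2char[i] for i in x)), "->", repr("".join(idx2char[i] for i in y)))
```

Each position in `y` is the character that follows the same position in `x`, so one sequence of length `seq_len` yields `seq_len` next-character predictions.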

In [None]:
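# Join all tunes into one string, separated by a blank line
songs_joined = '\n\n'.join(songs)
# The vocabulary is the set of unique characters in the corpus
vocab = set(songs_joined)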

In [None]:
print('Length of the dataset',len(songs_joined))
print('Length of the vocabulary',len(vocab))


Length of the dataset 1474566
Length of the vocabulary 92


In [None]:
# creating lookup tables
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(list(vocab))

print(idx2char)

['2' ',' '\n' 'a' '3' 'r' 'X' 'b' '&' 'E' 'D' '_' 'h' 'm' '4' '}' 'k' 's'
 'C' '(' 'x' 'v' '>' 'J' 'I' 'M' 'c' 'u' 'd' 'T' '{' 'p' '7' '6' '<' 'Q'
 'w' '"' 'S' '1' 'K' '\\' 'j' 'o' '/' '8' 'V' '9' 'z' '!' 'i' 'L' '5' 'l'
 '+' 'n' 'N' '|' '#' "'" 'P' 'Z' 'g' ')' 'q' '.' 'B' '$' 'U' 'F' ' ' 'R'
 'y' '=' '[' ';' 't' '-' 'O' '0' '~' 'e' ']' 'f' '*' 'A' 'G' 'H' '^' 'W'
 '?' ':']


In [None]:
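import pickle
import os

# Directory on Google Drive where the lookup tables and checkpoints are stored
checkpoint_dir = '/content/drive/MyDrive/ML_Checkpoints/checkpoint_dir'
os.makedirs(checkpoint_dir, exist_ok=True)

# Paths for the pickled lookup tables
char2idx_path = os.path.join(checkpoint_dir, "char2idx.pkl")
idx2char_path = os.path.join(checkpoint_dir, "idx2char.pkl")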

# Save char2idx
with open(char2idx_path, 'wb') as f:
    pickle.dump(char2idx, f)
print(f"char2idx saved to {char2idx_path}")

# Save idx2char
with open(idx2char_path, 'wb') as f:
    pickle.dump(idx2char, f)
print(f"idx2char saved to {idx2char_path}")

char2idx saved to /content/drive/MyDrive/ML_Checkpoints/checkpoint_dir/char2idx.pkl
idx2char saved to /content/drive/MyDrive/ML_Checkpoints/checkpoint_dir/idx2char.pkl


In [None]:
def vectorize_string(string):
  return np.array([char2idx[u] for u in string])

In [None]:
vectorized_songs  = vectorize_string(songs_joined)

In [None]:
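def get_batches(vectorized_songs, batch_num = 10, seq_len = 30):
  # Highest valid index in the vectorized corpus
  n  = len(vectorized_songs) - 1
  # Random start positions; each window needs seq_len + 1 characters
  indices  = np.random.choice(n-seq_len, batch_num)
  # Inputs, and targets shifted one character ahead
  x = np.stack([vectorized_songs[i:i+seq_len] for i in indices])
  y = np.stack([vectorized_songs[i+1:i+seq_len+1] for i in indices])
  # Stacking first avoids the slow list-of-arrays path in torch.tensor
  return torch.tensor(x), torch.tensor(y)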

## 4. Model Definition: Long Short-Term Memory (LSTM) Network

Our model uses an **LSTM**, a type of Recurrent Neural Network (RNN) well suited to sequential data such as text or music because it carries hidden and cell states (its 'memory') from one time step to the next.

### `LTSM_Model` Components:

*   **`nn.Embedding`**: Converts numerical character IDs into dense vector representations (`embedding_size`). This helps the model understand character relationships.
*   **`nn.LSTM`**: The core layer. It processes sequences, maintaining internal 'hidden' and 'cell' states to remember information over time.
*   **`nn.Linear`**: Takes the LSTM's output and projects it back to the size of our vocabulary (`vocab_size`), giving us raw prediction scores (logits) for the next character.
*   **`init_hidden`**: Returns zero-filled hidden and cell states, the LSTM's internal memory at the start of a new sequence.
*   **`forward`**: Defines the data flow: embeddings -> LSTM -> linear layer to get predictions.
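
To get a feel for where the model's capacity sits, the parameter counts of the three layers can be worked out by hand. The sketch below is an illustrative calculation, assuming a single-layer `nn.LSTM` with PyTorch's standard layout (four gates, each with input weights, recurrent weights, and two bias vectors). `count_parameters` is a hypothetical helper, and the sizes are the ones used for the test instantiation in this notebook.

```python
def count_parameters(vocab_size, embedding_size, hidden_size):
    """Parameter counts for Embedding -> single-layer LSTM -> Linear."""
    # One embedding vector per vocabulary entry.
    embedding = vocab_size * embedding_size
    # Four gates, each with input weights, recurrent weights,
    # and two bias vectors (bias_ih and bias_hh).
    lstm = 4 * hidden_size * (embedding_size + hidden_size + 2)
    # Weight matrix plus bias for the projection back to the vocabulary.
    linear = hidden_size * vocab_size + vocab_size
    return {
        "embedding": embedding,
        "lstm": lstm,
        "linear": linear,
        "total": embedding + lstm + linear,
    }


counts = count_parameters(vocab_size=92, embedding_size=128, hidden_size=1024)
print(counts)
```

Almost all of the parameters are in the LSTM layer, and its recurrent weights grow with the square of `hidden_size`.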

In [None]:
class LTSM_Model(nn.Module):
  def __init__(self, vocab_size,embedding_size, hidden_size):
    super(LTSM_Model, self).__init__()
    self.vocab_size = vocab_size

    self.embedding_size =  embedding_size
    self.hidden_size = hidden_size
    self.embeddings = nn.Embedding(vocab_size, embedding_size)
    self.lstm = nn.LSTM(embedding_size,hidden_size, batch_first= True)
    self.linear   =  nn.Linear(hidden_size,vocab_size)
  def init_hidden(self,batch_size,device):
    # Zero-filled (hidden state, cell state), each of shape (1, batch_size, hidden_size)
    h0 = torch.zeros(1, batch_size, self.hidden_size, device=device)
    c0 = torch.zeros(1, batch_size, self.hidden_size, device=device)
    return h0, c0
  def forward(self, x,state = None, return_state = True):
    device = x.device
    batch_size = x.shape[0]
    if state is None:
      state = self.init_hidden(batch_size,device)
    x  = self.embeddings(x)
    x,state = self.lstm(x,state)
    x = self.linear(x)
    return (x,state) if return_state else x

In [None]:
vocab_size = len(vocab)

In [None]:
# testing LTSM_MODEL
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
embedding_size = 128
hidden_size = 1024
model = LTSM_Model(vocab_size, embedding_size, hidden_size)
model = model.to(device)
print(model) # seeing what the model looks like

LTSM_Model(
  (embeddings): Embedding(92, 128)
  (lstm): LSTM(128, 1024, batch_first=True)
  (linear): Linear(in_features=1024, out_features=92, bias=True)
)


In [None]:
x,y  = get_batches(vectorized_songs, 4, 32)
x = x.to(device)
y = y.to(device)
logits, _ = model(x)  # run the forward pass once and reuse the result
print(logits)
print(logits.shape)

tensor([[[ 0.0186,  0.0174, -0.0332,  ...,  0.0001, -0.0073, -0.0282],
         [ 0.0801,  0.0130, -0.0001,  ...,  0.0094, -0.0395,  0.0234],
         [ 0.0348, -0.0082, -0.0091,  ...,  0.0179, -0.0273,  0.0009],
         ...,
         [-0.0045, -0.0247, -0.0118,  ...,  0.0491, -0.0346, -0.0034],
         [-0.0182, -0.0485, -0.0024,  ...,  0.0404, -0.0455, -0.0300],
         [-0.0123, -0.0316, -0.0123,  ...,  0.0461, -0.0507, -0.0275]],

        [[ 0.0331, -0.0199,  0.0021,  ...,  0.0191, -0.0302,  0.0047],
         [ 0.0687, -0.0168,  0.0338,  ...,  0.0326, -0.0247,  0.0350],
         [ 0.0967,  0.0013,  0.0272,  ...,  0.0258, -0.0598,  0.0499],
         ...,
         [ 0.0114, -0.0387, -0.0105,  ...,  0.0521, -0.0477,  0.0027],
         [-0.0093, -0.0583, -0.0007,  ...,  0.0429, -0.0522, -0.0249],
         [-0.0063, -0.0362, -0.0128,  ...,  0.0489, -0.0557, -0.0235]],

        [[ 0.0094, -0.0050, -0.0199,  ...,  0.0288, -0.0310, -0.0265],
         [-0.0176, -0.0180, -0.0038,  ..., -0

## 5. Model Training Setup

This section prepares everything needed to train our LSTM model:

*   **Training Parameters (`params`)**: A dictionary of hyperparameters: `num_training_iterations`, `batch_size`, `seq_length`, `learning_rate`, `embedding_dim`, and `hidden_size`.
*   **Comet ML**: Used for tracking experiment progress, logging parameters and metrics.
*   **Checkpoint Directory**: A folder (`checkpoint_dir`) where the model's learned weights are saved. This allows us to resume training or use the model later.
*   **Loss Function (`nn.CrossEntropyLoss`)**: Measures how well the model's predictions match the actual next characters.
*   **Optimizer (`optim.Adam`)**: Adjusts the model's internal parameters to minimize the loss.
*   **Model Loading**: Before training, the notebook checks whether a previously saved checkpoint (`my_ckpt`) exists. If so, it loads those weights to continue training; otherwise, the model starts from randomly initialized weights.
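
To make the loss values easier to read, here is a minimal pure-Python sketch of cross-entropy for a single prediction. It is an illustration, not the notebook's code (which uses `nn.CrossEntropyLoss` over a whole batch); `cross_entropy` is a hypothetical helper and the logits are made up.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one prediction: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


vocab_size = 92

# Equal scores for every character: the model is guessing uniformly.
uniform = cross_entropy([0.0] * vocab_size, target=0)

# A higher score on the correct character lowers the loss.
confident = cross_entropy([5.0] + [0.0] * (vocab_size - 1), target=0)

print(f"uniform guess: {uniform:.3f}")
print(f"confident and correct: {confident:.3f}")
```

A uniform guess over 92 characters gives a loss of ln(92), roughly 4.52, which is a useful reference point: a loss below that means the model is doing better than chance on average.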

In [None]:
params = dict(
  num_training_iterations = 5000,
  batch_size = 64,
  seq_length = 400,
  learning_rate = 3e-4,
  embedding_dim = 256,
  hidden_size = 2048,
)

In [None]:
# Set up Comet ML experiment tracking
def create_experiment():
  # End any experiment left over from a previous run of this cell
  if "experiment" in globals() and globals()["experiment"] is not None:
    globals()["experiment"].end()
  experiment = comet_ml.Experiment(
    api_key = userdata.get('API_KEY'), project_name = "Irish_music_generator"
  )
  # Log the hyperparameters
  for key,value in params.items():
    experiment.log_parameter(key,value)
  experiment.flush()
  return experiment

In [None]:
checkpoint_dir = '/content/drive/MyDrive/ML_Checkpoints/checkpoint_dir'
checkpoint_prefix = os.path.join(checkpoint_dir, "my_ckpt")
os.makedirs(checkpoint_dir,exist_ok=True)

In [None]:
def loss(labels,logits):
  labels = labels.view(-1)
  logits = logits.view(-1,logits.shape[-1])
  return criterion(logits,labels)

In [None]:
model = LTSM_Model(vocab_size, params["embedding_dim"], params["hidden_size"]).to(device)
optimizer = optim.Adam(model.parameters(), lr = params["learning_rate"])
criterion = nn.CrossEntropyLoss()


# Path of the checkpoint to resume from (the same file the training loop saves to)
drive_model_path = checkpoint_prefix

# Load the model state_dict if a checkpoint exists
if os.path.exists(drive_model_path):
    print(f"Loading model from {drive_model_path}")
    checkpoint = torch.load(drive_model_path, map_location=device)
    # Directly load the model's state_dict from the checkpoint
    model.load_state_dict(checkpoint)
    print("Model loaded successfully.")
else:
    print(f"No checkpoint found at {drive_model_path}. Model will be initialized from scratch.")

Loading model from /content/drive/MyDrive/ML_Checkpoints/checkpoint_dir/my_ckpt
Model loaded successfully.


In [None]:
scalar = GradScaler()



## 6. Training Loop

This is where the model learns. We define a single training step and then repeat it many times:

### `one_train_step` Function:
1.  **`model.train()`**: Puts the model in training mode.
2.  **`optimizer.zero_grad()`**: Clears gradients left over from the previous step.
3.  **Data to Device**: Moves the input and target batches to the GPU (if available).
4.  **Fresh LSTM State**: No state is passed to the model, so `forward` starts every batch from zero hidden and cell states via `init_hidden`.
5.  **`autocast()`**: Runs the forward pass and loss in mixed precision (e.g., `float16`), which is faster and uses less GPU memory.
6.  **Forward Pass**: Input goes through the model to get predictions (`logits`).
7.  **Loss Calculation**: Computes the cross-entropy between the predictions and the actual next characters.
8.  **Backward Pass**: `GradScaler` scales the loss before `backward()` so that small `float16` gradients do not underflow to zero.
9.  **Optimizer Step**: `scalar.step(optimizer)` unscales the gradients and updates the model weights.
10. **Scale Update**: `scalar.update()` adjusts the scale factor for the next iteration.
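
Gradient scaling matters because `float16` cannot represent very small values. The sketch below shows the effect with the standard library only; it is an illustration of the idea, not how `GradScaler` is implemented. The gradient value and the scale factor of 2**16 are chosen for illustration, and `to_float16` is a hypothetical helper.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


grad = 1e-8       # a small gradient value, chosen for illustration
scale = 65536.0   # illustrative loss scale (2**16)

# Stored directly in float16, the gradient is lost.
unscaled = to_float16(grad)

# Scaled up first, it survives the float16 round trip.
scaled = to_float16(grad * scale)

print(unscaled)        # underflows to zero in float16
print(scaled / scale)  # close to the original value after unscaling
```

Because the loss is multiplied by the scale before `backward()`, every gradient is scaled by the same factor, and dividing by it afterwards recovers values that would otherwise have been rounded to zero.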

### Loop Execution:
*   The loop runs for `num_training_iterations` (5,000 here).
*   In each iteration, it samples a random batch, performs `one_train_step`, and logs the loss to Comet ML.
*   It saves a checkpoint every 100 iterations and prints the loss every 50.

In [None]:

def one_train_step(model,x, y):
  model.train()
  optimizer.zero_grad()
  x = x.to(device)
  y = y.to(device)
  # No state is passed in, so the model starts this batch from zero states
  with autocast():
    logits = model(x, return_state = False)
    loss_val = loss(y,logits)
  # Scale the loss before backprop, then unscale and apply the update
  scalar.scale(loss_val).backward()
  scalar.step(optimizer)
  scalar.update()
  # Return a plain Python float for logging and printing
  return loss_val.item()
experiment = create_experiment()
for i in range(params["num_training_iterations"]):
  x,y = get_batches(vectorized_songs, params["batch_size"], params["seq_length"])
  loss_val = one_train_step(model,x,y)
  experiment.log_metric("loss",loss_val,step = i)
  # Save a checkpoint every 100 iterations
  if i % 100 == 0:
    torch.save(model.state_dict(),checkpoint_prefix)
  # Report progress every 50 iterations
  if i % 50 == 0:
    print(f'iteration: {i} out of {params["num_training_iterations"]} , loss: {loss_val}')
# Save the final weights
torch.save(model.state_dict(),checkpoint_prefix)


COMET INFO: Experiment is live on comet.com https://www.comet.com/habib-ghulam-bheek-habib/irish-music-generator/c8df66faec954976a7dff412ea8608af

COMET INFO: The process of logging environment details (conda environment, git patch) is underway. Please be patient as this may take some time.
COMET INFO: Couldn't find a Git repository in '/content' nor in any parent directory. Set `COMET_GIT_DIRECTORY` if your Git Repository is elsewhere.
COMET INFO: Uploading 22 metrics, params and output messages


iteration: 0 out of 5000 , loss: 11.159259796142578
iteration: 50 out of 5000 , loss: 1.9176827669143677
iteration: 100 out of 5000 , loss: 1.6694402694702148
iteration: 150 out of 5000 , loss: 1.4426945447921753
iteration: 200 out of 5000 , loss: 1.3529574871063232
iteration: 250 out of 5000 , loss: 1.2737921476364136
iteration: 300 out of 5000 , loss: 1.2518799304962158
iteration: 350 out of 5000 , loss: 1.1747511625289917
iteration: 400 out of 5000 , loss: 1.1748855113983154
iteration: 450 out of 5000 , loss: 1.1162996292114258
iteration: 500 out of 5000 , loss: 1.0914480686187744
iteration: 550 out of 5000 , loss: 1.0977636575698853
iteration: 600 out of 5000 , loss: 1.0709339380264282
iteration: 650 out of 5000 , loss: 1.0419787168502808
iteration: 700 out of 5000 , loss: 0.9909631609916687
iteration: 750 out of 5000 , loss: 1.0091091394424438
iteration: 800 out of 5000 , loss: 0.9668369889259338
iteration: 850 out of 5000 , loss: 0.9759905934333801
iteration: 900 out of 5000 , lo

## 7. Music Generation

After training, we can use our model to create new music in ABC notation. The `generate_music` function works like this:

1.  **Start with a character**: We give the model an initial character (e.g., 'X' to signify a new tune).
2.  **Initialize LSTM state**: The model's internal memory (hidden and cell states) is reset.
3.  **Iterative Prediction**: The model predicts the next character one by one:
    *   The current character is fed into the model along with its internal state.
    *   The model outputs `logits` (prediction scores for all possible next characters) and updates its internal state.
    *   **Sampling**: Instead of picking the absolute most probable character (which can be repetitive), we randomly *sample* a character based on the probabilities. This adds variety to the generated music.
    *   The sampled character becomes the input for the next prediction step.
4.  **Build Sequence**: Each generated character is added to a list until the desired `seq_length` is reached.
5.  **Output**: The final list of characters is joined to form the complete ABC notation.
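
The difference between picking the most probable character and sampling can be sketched in pure Python. This is a toy illustration with made-up scores and a four-character vocabulary; the notebook itself uses `torch.softmax` and `torch.multinomial`. `softmax` and `sample_index` are hypothetical helpers.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, rng):
    """Draw one index with probability given by softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


chars = ["A", "B", "c", "|"]
logits = [2.0, 1.0, 0.5, 0.1]   # made-up scores for illustration
rng = random.Random(0)          # fixed seed so the run is repeatable

# Greedy decoding always returns the highest-scoring character.
greedy = chars[logits.index(max(logits))]

# Sampling returns different characters in proportion to their probability.
samples = [chars[sample_index(logits, rng)] for _ in range(20)]

print("greedy:", greedy)
print("sampled:", "".join(samples))
```

Greedy decoding would return the same character every time for these scores, while sampling still favors it but lets the less likely characters appear.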

In [None]:

def generate_music(model, char, seq_length):
  model.eval()  # inference mode
  input_seq = torch.tensor([[char2idx[char]]]).to(device)  # shape (1, 1)
  sequence = [char]
  curr_state =  model.init_hidden(input_seq.shape[0],device)

  with torch.no_grad():  # no gradients are needed for generation
    for _ in range(seq_length):
      logits,curr_state = model(input_seq,curr_state)
      logits = logits.view(-1, logits.shape[-1])
      # Sample the next character from the predicted distribution
      y = torch.multinomial(torch.softmax(logits,dim = -1),num_samples=1)
      # Feed the sampled character back in as the next input, shape (1, 1)
      input_seq = y
      sequence.append(idx2char[y.item()])
  return ''.join(sequence)
print(generate_music(model,'X',4000))

X:7378
L:1/8
Q:1/8=150
M:2/2
K:G
 D2 |: GABc d2 (cB) | GAGB A2 dc | BdcA GFDC | DB,A,G, B,2 D2 | cdef dcAc | dBAc BGED | 
 !slide!c2 fedc | dcBA G2 |]/n/nX:125178
L:1/8
M:4/4
K:G
 GABd Beef | geaf gece | d2 ef BEEF | GBec BA (3Bcd | efed ed (3Bcd | gedc BAGF | GE (3EFG DEFA | 
 GBAF GEFG | A2 FA DAFA | EADC (3DEF | ECB,A, | A,2 CD | EA,A,B, B,CA,G,| 
 [A,A]3 B [Ae]2 ||/n/nX:40848
L:1/8
Q:1/8=160
M:6/8
K:D
 E2 E F2 A | D2 D DED | F2 F FED | E2 A d2 e | fdB AFE | F2 D DED | F2 D D2 :: F | 
 AFA dcB | AFA A3 | dAA A2 g | fed fdB | def A2 d | ABA AFA | dcd efg | fdB Bcd :|2 d3 dcd | 
 fed edA | B3 dBf | dfa afd | cAe e3 | dfd ecA | BAF A2 d | faa afe | Adf d2 :|/n/nX:178539
L:1/8
Q:1/8=232
M:4/4
K:G
|: dB GB dB cA | dB gB Ac Ag | fa dg ef d | c A2 G FE :|/n/nX:10342
L:1/8
M:4/4
K:F
 (GA) (cF) | F4 G2 | g2 f2 (f2 || F2) (GF) (FE) | (F2 G2) .G2 | A2 (Bc) d(Bd) | (ce) (dB) (cA) F2 | 
 (GB) (dB) (ce) de | (d6 c) | (Bd) e2 f2 g2 | (ga) b2 a2 a2 | (gf)(ge) (dB)(AG) | (FA) (d2 A2) (g/) | 
 (fe) (