# Generating Samples

In this notebook we show how to generate samples using a trained model.

Before running the notebook make sure to upload the ```dataset.zip``` file, the ```denoising_diffusion_pytorch.py``` file (can be found [here](https://github.com/Lilac-code/music-diffusion/tree/main)), and the ```checkpoint.pth``` file.

If using Kaggle, then upload these as datasets named 'dataset', 'unetfile' and 'checkpoint' respectively.

Our pre-trained model (named ```checkpoint.pth```) is also available in our GitHub repository.

## Loading the model

First we unzip the ```dataset.zip``` file (we need it to calculate the ratio of active cells in the training data, which is used to initialize the noise), and install all the dependencies.

In [None]:
!unzip dataset.zip

In [None]:
!pip install ema_pytorch
!pip install einops
!pip install accelerate

In [None]:
from PIL import Image
import os
from torchvision import transforms as T, utils
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torch.optim import Adam
import numpy as np
from numba import cuda
# the unet implementation from https://github.com/lucidrains/denoising-diffusion-pytorch
from denoising_diffusion_pytorch import Unet

transform = T.Compose([T.ToTensor()])

In [None]:
segments=[]
for img in os.listdir('dataset'):
  f = os.path.join('dataset', img)
  image=Image.open(f)
  image=transform(image)
  segments.append(image)

In [None]:
def calc_ratio():
  ratio=0
  for im in segments:
    ratio+=torch.sum(im).item()
  ratio/=len(segments)
  ratio/=len(segments[0][0])
  ratio/=len(segments[0][0][0])
  return ratio
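
The value returned by calc_ratio() is the average fraction of active cells (notes) per segment. As a minimal sketch, assuming every segment is a single-channel binary array of the same shape, the same quantity can be computed on toy data with NumPy alone. The name `mean_active_ratio` and the toy rolls are illustrative and not part of the repository.

```python
import numpy as np

def mean_active_ratio(rolls):
    """Average fraction of active cells over equally sized binary rolls."""
    stacked = np.stack([np.asarray(r, dtype=np.float64) for r in rolls])
    return float(stacked.mean())

# Toy data (hypothetical): one roll with a single active cell, one full roll.
sparse_roll = np.zeros((4, 3))
sparse_roll[0, 0] = 1
full_roll = np.ones((4, 3))

toy_ratio = mean_active_ratio([sparse_roll, full_roll])
print(toy_ratio)  # (1 + 12) active cells out of 24
```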

The following function will come in handy later as an easy way to evaluate a generated sample.

In [None]:
def calc_sample_ratio(sample):
  ratio=0
  ratio+=np.sum(sample)
  ratio/=sample.size
  return ratio

In [None]:
checkpoint_path = './checkpoint.pth'

Load the model (if the model was trained on 2 GPUs, skip this cell and execute the next one instead).

In [None]:
unet = Unet(dim=48, channels=1, resnet_block_groups=3, dim_mults=(1, 2, 4, 4))
unet = unet.cuda()
checkpoint = torch.load(checkpoint_path)
unet.load_state_dict(checkpoint['model_state_dict'])
# the optimizer has to exist before its saved state can be restored
optimizer = Adam(unet.parameters(), lr=5e-5)
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
print('model loaded')

Loading model that was trained on 2 GPUs

In [None]:
unet = Unet(dim=48, channels=1, resnet_block_groups=3, dim_mults=(1, 2, 4, 4))
unet = unet.cuda()
checkpointm = torch.load('./checkpoint.pth')

from collections import OrderedDict
checkpoint = OrderedDict()
for k, v in checkpointm.items():
    if k!='model_state_dict':
        checkpoint[k]=v
        continue
    checkpoint[k]=OrderedDict()
    for k1,v1 in v.items():
        name = k1[7:] # remove `module.`
        checkpoint[k][name] = v1

unet.load_state_dict(checkpoint['model_state_dict'])
params = list(unet.parameters())
optimizer = Adam(params, lr=5e-5)
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
print('model loaded')
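
The slice `k1[7:]` above assumes that every key starts with `module.`, the prefix that `torch.nn.DataParallel` adds when it wraps a model. As a hedged sketch, the prefix could be removed only where it is present, so that the same code handles both kinds of checkpoint. The helper name `strip_module_prefix` and the key names below are made up for illustration, and plain strings stand in for tensors.

```python
from collections import OrderedDict

def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that have it."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned

# Placeholder entries (hypothetical key names, strings instead of tensors).
multi_gpu_state = OrderedDict([("module.layer.weight", "w"), ("module.layer.bias", "b")])
single_gpu_state = OrderedDict([("layer.weight", "w"), ("layer.bias", "b")])

print(list(strip_module_prefix(multi_gpu_state).keys()))
```

With such a helper, a call like `unet.load_state_dict(strip_module_prefix(checkpoint['model_state_dict']))` would work for checkpoints saved with or without the wrapper.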

This function displays a binary 2D array. We use it to view the generated binary piano rolls.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def show_image(tens):
  tens=np.asarray(dtype=np.dtype('uint8'),a=tens)
  temp=((tens))*255
  img = Image.fromarray(temp,mode='L').convert('1')
  img.save('temp.png')
  plt.figure()
  plt.imshow(mpimg.imread('temp.png'))
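
show_image() casts its input to uint8, which cannot represent the -1 markers used for infilling later in this notebook. The sketch below is one possible way to make those positions visible, assuming the roll contains only 0, 1 and -1. The helper name `roll_to_gray` and the gray level 128 are our own choices for illustration.

```python
import numpy as np

def roll_to_gray(roll, unknown_value=128):
    """Map 1 -> 255 (white), 0 -> 0 (black) and -1 -> unknown_value (gray)."""
    roll = np.asarray(roll)
    gray = np.where(roll > 0, 255, 0).astype(np.uint8)
    gray[roll == -1] = unknown_value
    return gray

# Toy roll (hypothetical): two notes, one empty cell, one cell to be generated.
toy = np.array([[0, 1], [-1, 1]], dtype=np.float32)
gray = roll_to_gray(toy)
print(gray)
```

The resulting array could be displayed directly with `plt.imshow(gray, cmap='gray', vmin=0, vmax=255)`, without writing a temporary file.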

## The get_sample() Function

This function is the one that generates samples. It performs 2 tasks:
- **Unconditional Generation:** By calling the function without any parameters.
- **Infilling:** By calling the function with task='inf' and 'given' set to an array that contains the fixed values of the piano roll segment, with -1 in the positions that we want our model to generate.

The function can also generate segments of other lengths through the 'length' parameter, but this is not recommended.
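
As a small illustration of the 'given' convention, the sketch below builds such an array with NumPy slicing. It assumes the default shape of 192 time steps by 88 pitches; the toy segment and the variable names are hypothetical.

```python
import numpy as np

length, n_pitches = 192, 88

# Toy segment (hypothetical): a single pitch held for the whole segment.
roll = np.zeros((length, n_pitches), dtype=np.float32)
roll[:, 40] = 1

given = roll.copy()
given[48:144, :] = -1  # let the model generate the middle 96 time steps

n_free = int((given == -1).sum())
print(n_free)  # 96 * 88 cells are left to the model
```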

In [None]:
def get_sample(length=192,task='gen',given=[0]):
    total_num_steps = 100
    ratio = calc_ratio()
    # start from Bernoulli noise with the same density of active cells as the dataset
    noisy_initial = np.random.binomial(1, ratio,size=(length,88))
    noisy = np.copy(noisy_initial)
    if task=='inf':
      # copy the fixed cells into the noise (-1 marks cells to be generated)
      for r in range(length):
        for c in range(88):
          if given[r][c]!=-1:noisy[r][c]=given[r][c]
    for i in range(total_num_steps):
        noisy_input = noisy.reshape(1, 1, length, 88)  # Add batch and channel dimensions
        noisy_tensor = torch.from_numpy(noisy_input.astype(np.float32)).cuda('cuda')
        time_tensor = torch.unsqueeze(torch.tensor(total_num_steps - i - 1, dtype=torch.float32).cuda('cuda'),0)

        predicted_x0 = unet(noisy_tensor, time_tensor).cpu().detach().numpy()

        # binarize the prediction
        threshold = 0.5
        predicted_x0 = predicted_x0 >= threshold

        # probability of reverting a changed cell to its initial noise value
        beta = (total_num_steps-i)/total_num_steps

        delta = predicted_x0 ^ noisy_initial
        mask = np.random.binomial(1, delta*beta)
        noisy = predicted_x0*(1-mask) + noisy_initial * mask
        noisy=noisy[0][0]

        if task=='inf':
          # restore the fixed cells after every step (r, c avoid shadowing the step index i)
          for r in range(length):
            for c in range(88):
              if given[r][c]!=-1:noisy[r][c]=given[r][c]
    return noisy
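
The mixing step inside the loop above can be looked at in isolation. Every cell where the binarized prediction differs from the initial noise is reverted to its initial value with probability beta, which in get_sample() decreases linearly from 1 to 1/total_num_steps. The sketch below reproduces only this step on toy arrays; the function name `renoise_step` and the use of a seeded NumPy generator are our assumptions for illustration.

```python
import numpy as np

def renoise_step(predicted_x0, noisy_initial, beta, rng):
    """Revert cells where the prediction differs from the initial noise, each with probability beta."""
    delta = predicted_x0 ^ noisy_initial   # 1 where the two arrays disagree
    mask = rng.binomial(1, delta * beta)   # 1 where a cell is reverted
    return predicted_x0 * (1 - mask) + noisy_initial * mask

rng = np.random.default_rng(0)
predicted = np.array([[1, 0, 1], [0, 0, 1]])  # toy binarized prediction
initial = np.array([[0, 0, 1], [1, 0, 0]])    # toy initial noise

kept = renoise_step(predicted, initial, beta=0.0, rng=rng)      # nothing is reverted
reverted = renoise_step(predicted, initial, beta=1.0, rng=rng)  # every changed cell is reverted
mixed = renoise_step(predicted, initial, beta=0.5, rng=rng)     # random mix of the two
```

With beta=0 the result equals the prediction, and with beta=1 it equals the initial noise. Intermediate values only ever differ from the prediction in the cells where the two inputs disagree.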

We noticed that fairly often the model generates almost empty samples (with almost no notes). Generating samples until the ratio of active cells reaches a threshold is a simple but sufficient way to prevent this.

In [None]:
def generate_sample(length=192,task='gen',given=[0],thres=0.01):
  it=1
  while(1):
    sample=get_sample(length=length,task=task,given=given)
    ratio=calc_sample_ratio(sample)
    print("iteration:",it,"sample ratio:",ratio)
    it=it+1
    if ratio>=thres:
      return sample
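
The loop above runs until a sample passes the threshold, so it never stops if the model only produces empty samples. A bounded variant is sketched below under our own assumptions. The name `generate_with_retry`, the `max_tries` parameter and the fallback to the densest sample seen are not part of the repository, and a stub sampler replaces the model.

```python
import numpy as np

def generate_with_retry(sampler, thres=0.01, max_tries=20):
    """Call sampler() until the active-cell ratio reaches thres or max_tries is hit.

    Returns (sample, tries). If no sample passes, the densest one seen is returned.
    """
    best, best_ratio = None, -1.0
    for tries in range(1, max_tries + 1):
        sample = sampler()
        ratio = float(np.mean(sample))
        if ratio >= thres:
            return sample, tries
        if ratio > best_ratio:
            best, best_ratio = sample, ratio
    return best, max_tries

# Stub sampler (hypothetical): two empty rolls, then one with notes.
queue = iter([np.zeros((4, 4)), np.zeros((4, 4)), np.eye(4)])
sample, tries = generate_with_retry(lambda: next(queue))
print(tries, float(sample.mean()))
```

In this notebook the sampler could be `lambda: get_sample()`, or a lambda that passes task and given for infilling.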

## Unconditional Generation

In [None]:
sample=generate_sample()
show_image(sample)

This is a function that generates a number of *good* samples and saves them in a zip file.

In [None]:
def generate_samples(n=10):
  if not os.path.isdir('./samples'):
    os.makedirs('./samples')
  for i in range(n):
    sample=generate_sample()
    sample=np.asarray(dtype=np.dtype('uint8'),a=sample)
    temp=((sample))*255
    img = Image.fromarray(temp,mode='L').convert('1')
    img.save('samples/sample'+str(i)+'.png')
  !zip -r '/content/samples.zip' './samples'

In [None]:
generate_samples(n=1)

## Infilling

```image.png``` is the piano roll that we want to infill. You can experiment with your own files, or with the files provided in the ```infilling``` folder of the GitHub repository.

In [None]:
image=Image.open('image.png')
image=transform(image)[0].numpy()
show_image(image)
given=image.copy()

#infill middle section of the segment
for i in range(48,144):
  for j in range(88):
    given[i][j]=-1

#infill voices from top voice
for i in range(192):
  for j in range(88):
    if given[i][j]==0:
      given[i][j]=-1
    else:break

show_image(given)

In [None]:
infsample=generate_sample(task='inf',given=given)
show_image(infsample)
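
Since get_sample() restores the fixed cells after every step, the infilled sample should agree with 'given' wherever 'given' is not -1. The sketch below is a simple check of that property, assuming 'given' contains only 0, 1 and -1. The helper name `fixed_cells_preserved` and the toy arrays are hypothetical.

```python
import numpy as np

def fixed_cells_preserved(given, sample):
    """True if sample equals given at every position where given is not -1."""
    given = np.asarray(given)
    sample = np.asarray(sample)
    fixed = given != -1
    return bool(np.array_equal(sample[fixed], given[fixed]))

# Toy data (hypothetical): the right column is left to the model.
toy_given = np.array([[1, -1], [0, -1]])
ok_sample = np.array([[1, 1], [0, 0]])   # left column unchanged
bad_sample = np.array([[0, 1], [0, 0]])  # a fixed cell was altered

print(fixed_cells_preserved(toy_given, ok_sample))
print(fixed_cells_preserved(toy_given, bad_sample))
```

In the notebook, the corresponding call would be `fixed_cells_preserved(given, infsample)`.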