<a href="https://colab.research.google.com/github/littlejacinthe/Audio_Generation/blob/main/Gendyn_GAN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Gendyn GAN is a combination of Dynamic Stochastic Synthesis and audio GAN for sound generation. 

References:

https://csound.com/docs/manual/gendy.html : Csound documentation for Gendy

https://csound.com/docs/ctcsound/cookbook.html : Csound Python API, ctcsound

https://github.com/chrisdonahue/wavegan : WaveGAN, raw audio GAN model

https://www.kaggle.com/mrhippo/generating-bird-sound-with-simple-gans : GAN application to audio

https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html : PyTorch DCGAN tutorial

First, install Csound in your environment.

Build and installation instructions are available on GitHub: [Csound-Github](https://github.com/csound/csound/blob/develop/BUILD.md#fedora)


Note: If your environment has no audio device, install a virtual one such as PulseAudio.

In [None]:
#install dependencies
! pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html 
! pip install torchaudio==0.6.0

# make sure you have the right version of kaggle installed (you can skip this if you already have version 1.5.6)
! pip uninstall -y kaggle
! pip install --upgrade pip
! pip install kaggle==1.5.6

# python library for csound
!pip install ctcsound

In [None]:
import numpy
from audio import *
from fastai.basics import *
import torch
import ctcsound
import time
import librosa
import soundfile # used by gendy_noise() to read the recorded wav file

In [None]:
#make sure you're on gpu
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

Next, we need to install [csoundmagics](https://github.com/csound/ctcsound/blob/master/cookbook/05-installingCsoundmagics.ipynb), the Csound extension for Jupyter. This requires files from the ctcsound repository.

In [None]:
! git clone https://github.com/csound/ctcsound.git

The following cell prints the locations where the csoundmagics files need to be copied.

In [None]:
import notebook
import os.path
import site
import shutil

# Copy csoundmagics in user site-packages dir
dest_magics = site.getsitepackages()[0]
print('Location for csoundmagics.py:\n%s' % dest_magics)
#shutil.copy("csoundmagics/csoundmagics.py", dest_magics)

# Copy csound mode in codemirror
dest_csmode = os.path.join(notebook.DEFAULT_STATIC_FILES_PATH, "components", "codemirror", "mode", "csound")
print('Location for csoundmode.js:\n%s' % dest_csmode)
if not os.path.exists(dest_csmode):
    os.mkdir(dest_csmode)
#shutil.copy("csoundmagics/csound.js", dest_csmode)

# Copy custom.js in jupyter dir
dest_custom = os.path.join(notebook.extensions.jupyter_config_dir(), "custom")
print('Location for custom.js:\n%s' % dest_custom)
if not os.path.exists(dest_custom):
    os.mkdir(dest_custom)
#shutil.copy("csoundmagics/custom.js", dest_custom)

Csound score

In [None]:
# This function creates a Csound instance and generates a one-second audio file with the Gendy opcode.
# We use its output as the generator's input when training our deep learning model, instead of the Gaussian noise used in most generators.

def gendy_noise():
    
    # reload every time we call the function to make sure the buffers are empty
    %reload_ext csoundmagics 
    c = ICsound(port=12894)
    c.startEngine()
    # Our Orchestra for our project
    orc = """
    sr=44100
    ksmps=32
    nchnls=2
    0dbfs=1

    instr 1 

    aout gendy 0.7, 1, 1, 1, 1, 20, 1000, 0.5, 0.5, 40
    outs aout, aout
    endin"""

    #c = ctcsound.Csound()    # create an instance of Csound
    c.setOption("-odac")  # Set option for Csound
    c.setOption("-m7")  # Set option for Csound
    c.compileOrc(orc)     # Compile Orchestra from String

    sco = "i1 0 1\n"
    

    c.sendScore(sco)
    c.readScore(sco)     # Read in Score generated from notes 
    
    c.startRecord('gendyn.wav')
    c.start()
    #c.perform()
    
    time.sleep(1) #wait one second as our score is one second long
    c.stopRecord()
    c.stopEngine()
    c.reset()
     
    #x, sr = librosa.load('gendyn.wav')
    x, sr = soundfile.read('gendyn.wav') # read the file we just created; a stereo file gives an array of shape (frames, 2)
    torch_noise = torch.from_numpy(x).to(device, dtype=torch.float) # transform it into a float torch tensor, on gpu if available
    size = torch_noise.numel() # total number of values across both channels (the channels are flattened together, not averaged)
    shape = size // bs # values per batch item; relies on the global batch size `bs`, so run the cell that defines it first
    torch_noise = torch_noise.flatten()[:bs * shape] # drop the few leftover values so the reshape below always fits
    new_noise = torch.reshape(torch_noise, (bs, 1, shape)) # reshape the tensor so it's ready for the Generator

    del c # delete the csound instance
    return new_noise, shape


In [None]:
noise, shape = gendy_noise() #test if it works
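
The end of `gendy_noise()` splits the recorded values evenly across the batch. The sketch below works through that arithmetic for a one-second stereo recording at 44100 Hz with a batch size of 16. `samples_per_item` is a hypothetical helper that is not part of the notebook, and the number of frames Csound actually records in one second may differ slightly.

```python
def samples_per_item(n_frames, n_channels, batch_size):
    """Return (values per batch item, leftover values) for one recording."""
    total = n_frames * n_channels            # every value in the file, all channels
    per_item = total // batch_size           # length of the last tensor dimension
    leftover = total - per_item * batch_size # values that do not fit evenly
    return per_item, leftover

# One second of stereo audio at 44100 Hz, batch size 16 (assumed values)
per_item, leftover = samples_per_item(44100, 2, 16)
print(per_item, leftover)
```

A non-zero leftover means the values cannot be reshaped to `(batch_size, 1, per_item)` unless the extra values are dropped first.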

In [None]:
# the dataset is stored on Kaggle so first let's make a directory for the kaggle key
!mkdir ~/.kaggle/ 

In [None]:
# move the .json file with your kaggle api key to the directory we just made, you have to upload your own !
! mv kaggle.json ~/.kaggle/

In [None]:
# Download the dataset
# This dataset is made of 718 audio files (mp3) from the Free Music Archive website. All files from this dataset have both tags free-jazz and improv. 
# Dataset is around 10GB so make sure you have the disk space required.
! kaggle datasets download -d jacinthecarlier/improvisation

In [None]:
# unzip the dataset to the appropriate directory
! unzip /content/improvisation.zip -d /content/jazz

In [None]:
tfms=None #no transforms needed
data_folder = '/content/jazz/Jazz' #path of the folder containing sound files
bs = 16 #batch size, adapt if necessary

In [None]:
# Configure the audio:
# - experiment with different values of segment_size; the bigger the segment, the bigger the network
# - resample to 44100 Hz: 16 kHz gives smaller files that are easier to process, but it didn't work well, so use 44100 Hz to get better results
# - downmix from stereo to mono so the files have one channel only = 1 dimension
config_segment = AudioConfig(segment_size = 1000, resample_to=44100, downmix=True)

In [None]:
db_audio = (AudioList.from_folder(data_folder, config=config_segment) #load them in a list, downmix, resample and segment 
                .split_none().label_empty() #no split no label, this is not a classification model
                .transform(tfms=tfms) #no transforms necessary here, we're using the wave samples as is
                .databunch(bs=bs)) #adapt batch size if necessary

In [None]:
train_set = db_audio.dl(ds_type = DatasetType.Train) #transforms the fastai audio databunch in a pytorch dataloader

In [None]:
#Generator Model
ngf = 128 #our input can be divided by 128 so we can use it as reference just like it's done for image models
nc = 1 #input dim = 1 channel as we are using one dimensional waveforms

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu 
        self.main = nn.Sequential(
            
            nn.ConvTranspose1d(nc, ngf * 8, 2, 2, 1),  
            nn.Dropout(p=0.2), #dropout helps stabilize training
            #nn.BatchNorm1d(ngf * 8), #batchnorm is not recommended for waveform models, and it didn't make the model more efficient during experiments
            nn.LeakyReLU(0.2, inplace=True),
            
            nn.ConvTranspose1d(ngf * 8, ngf * 4, 2, 2, 4),  
            nn.Dropout(p=0.2),
           # nn.BatchNorm1d(ngf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # 
            nn.ConvTranspose1d(ngf * 4, ngf * 2, 2, 2, 4),
            nn.Dropout(p=0.2),
           # nn.BatchNorm1d(ngf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            #
            nn.ConvTranspose1d(ngf * 2, ngf, 2, 2, 4),
            nn.Dropout(p=0.2),
          #  nn.BatchNorm1d(ngf),
            nn.LeakyReLU(0.2, inplace=True),
            #
            nn.ConvTranspose1d(ngf, 1, 2, 2, 4),
            nn.Tanh(),

        )

    def forward(self, input):
        return self.main(input)

In [None]:
ngpu=1
dev = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu") # getting gpu
torch.cuda.is_available() #checking if gpu is available 

# Create the generator
netG = Generator(ngpu).to(dev)

# Print the model
print(netG)

from torchsummary import summary
summary(netG,(1, 5504))
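
To see how much the generator stretches its input, the sketch below applies the standard `ConvTranspose1d` output-length formula (dilation 1, no output padding) to the five layers defined above. `conv_transpose1d_out_len`, `GENERATOR_LAYERS` and `generator_out_len` are names introduced here for illustration only. The layer tuples are copied by hand from the `Generator` class and have to be updated if the model changes.

```python
def conv_transpose1d_out_len(length, kernel_size, stride, padding):
    """Output length of a 1D transposed convolution (dilation=1, output_padding=0)."""
    return (length - 1) * stride - 2 * padding + (kernel_size - 1) + 1

# (kernel_size, stride, padding) for each ConvTranspose1d layer of the Generator
GENERATOR_LAYERS = [(2, 2, 1), (2, 2, 4), (2, 2, 4), (2, 2, 4), (2, 2, 4)]

def generator_out_len(length):
    """Push an input length through every generator layer in order."""
    for kernel_size, stride, padding in GENERATOR_LAYERS:
        length = conv_transpose1d_out_len(length, kernel_size, stride, padding)
    return length

print(generator_out_len(5504))  # same input length as the summary() call
```

Each layer roughly doubles the length, so the output is about 32 times longer than the input.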

In [None]:
ndf = 128
nc = 1

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # 
            nn.Conv1d(nc, ndf, 2, 8, 2),
            nn.LeakyReLU(0.2, inplace=True),
            # 
            nn.Conv1d(ndf, ndf * 2, 2, 8, 2),
            nn.LeakyReLU(0.2, inplace=True),
            # 
            nn.Conv1d(ndf * 2, ndf * 4, 2, 8, 2),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv1d(ndf * 4, ndf * 8, 2, 8, 2),
            nn.LeakyReLU(0.2, inplace=True),
            # 
            nn.Conv1d(ndf * 8, 1, 2, 9, 2),
            nn.Sigmoid(),


        )

    def forward(self, input):
        return self.main(input)

In [None]:
# Create the Discriminator
netD = Discriminator(ngpu).to(dev)

# Print the model
print(netD)

summary(netD,(1,22144)) #22144 = size of input
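
The same kind of check can be done for the discriminator. This sketch uses the standard `Conv1d` output-length formula (dilation 1) with the layer settings copied by hand from the `Discriminator` class. `conv1d_out_len`, `DISCRIMINATOR_LAYERS` and `discriminator_out_len` are hypothetical names used only for this example.

```python
def conv1d_out_len(length, kernel_size, stride, padding):
    """Output length of a 1D convolution (dilation=1)."""
    return (length + 2 * padding - (kernel_size - 1) - 1) // stride + 1

# (kernel_size, stride, padding) for each Conv1d layer of the Discriminator
DISCRIMINATOR_LAYERS = [(2, 8, 2), (2, 8, 2), (2, 8, 2), (2, 8, 2), (2, 9, 2)]

def discriminator_out_len(length):
    """Push an input length through every discriminator layer in order."""
    for kernel_size, stride, padding in DISCRIMINATOR_LAYERS:
        length = conv1d_out_len(length, kernel_size, stride, padding)
    return length

print(discriminator_out_len(22144))  # number of scores per input of length 22144
```

Longer inputs produce more than one score per item, so the number of labels has to match the length of the flattened discriminator output rather than the batch size alone.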

In [None]:
# Initialize BCELoss function
criterion = nn.BCELoss()

# Create batch of latent vectors that we will use to visualize
#  the progression of the generator
fixed_noise = torch.randn(bs, 1, 22144, device=dev)

# Establish convention for real and fake labels during training
real_label = 1.
fake_label = 0.

lr = 0.0002 #learning rate

# Set up the optimizers: SGD for D and Adam for G
optimizerD = optim.SGD(netD.parameters(), lr=lr, weight_decay=0.2) 
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(0.9, 0.999), weight_decay=0.2)
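
As a reminder of what the criterion measures, here is a minimal pure-Python sketch of mean binary cross-entropy against a single target label, which is how `real_label` and `fake_label` are used in the training loop. The function `bce` and the `eps` clamp are assumptions made for this illustration; the clamp is a simple guard against `log(0)` and is not necessarily how PyTorch handles that case.

```python
import math

def bce(predictions, target, eps=1e-12):
    """Mean binary cross-entropy of probabilities against one target label (0.0 or 1.0)."""
    total = 0.0
    for p in predictions:
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total / len(predictions)

real_label, fake_label = 1.0, 0.0
scores = [0.9, 0.8, 0.7]          # made-up discriminator outputs
print(bce(scores, real_label))    # low: scores agree with the "real" label
print(bce(scores, fake_label))    # high: scores disagree with the "fake" label
```

The discriminator is trained to lower this value for both labels, while the generator is trained to lower it for its fake batch labelled as real.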

In [None]:
# Training Loop

# Lists to keep track of progress
samples_list = [] # the loop will generate samples while training and put them in this list
G_losses = []
D_losses = []
iters = 0
num_epochs = 10
b_size = 16

print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(train_set, 0):
        
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        new_cpu = torch.reshape(real_cpu, ([bs, 1, 128 * 173])).to(dev) #fastai_audio prepares data as 2 dimensional so we put it back to 1D
        b_size = new_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=dev) # labelling our dataset as real data
        # Forward pass real batch through D
        output = netD(new_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise, shape = gendy_noise()
        if noise.shape[2] < 1000: # if gendy_noise returns an audio file that is too short, it won't go through the model
            noise, shape = gendy_noise() # so generate it again
        # Generate fake audio batch with G
        fake = netG(noise)
        # Classify all fake batch with D
        output = netD(fake.detach()).view(-1)
        label_size = int(len(output))
        label = torch.full((label_size,), fake_label, dtype=torch.float, device=dev) #labelling data as fake
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Compute D's total loss as the sum over the all-real and all-fake batches (the gradients were already accumulated by the two backward() calls)
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()
        
        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(train_set),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
        
        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(train_set)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise) #testing our model on random numbers
            samples_list.append(fake) #putting generated samples in a list

        
        iters += 1
            

In [None]:
a = torch.stack(samples_list) # list -> one tensor of shape (number of saved batches, bs, 1, length)

size = a.numel() # total number of audio samples across every dimension
ready = torch.reshape(a, [1, size]) # flatten everything into a single mono waveform

b = ready.detach().cpu() 
c = b.numpy() #transform from torch tensor to numpy array

In [None]:
import IPython.display as ipd
ipd.Audio(c, rate=44100) # play the audio from the numpy array
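
To keep the result outside the notebook, the waveform can also be written to disk. The following is a minimal sketch using only the Python standard library, assuming the values lie in [-1, 1] (which the generator's final `Tanh` should guarantee). `write_wav` and the file name are hypothetical, and a sine tone stands in for the generated samples so the example runs on its own.

```python
import math
import os
import struct
import tempfile
import wave

def write_wav(path, samples, rate=44100):
    """Write floats in [-1, 1] to a 16-bit mono PCM wav file; return the frame count."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))           # clip anything out of range
        frames += struct.pack("<h", int(s * 32767)) # little-endian signed 16-bit
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes = 16 bits per sample
        f.setframerate(rate)
        f.writeframes(bytes(frames))
    return len(frames) // 2

# Stand-in signal: one second of a 440 Hz sine tone
tone = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
out_path = os.path.join(tempfile.gettempdir(), "gendyn_example.wav")
print(write_wav(out_path, tone))
```

With the array from the cell above, the call would look like `write_wav("generated.wav", c[0])`.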