### Connect to gdrive

In [None]:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive/MyDrive

### Install mimikit

In [None]:
!pip install git+https://github.com/ktonal/mimikit@experiment/new-data-again

### cd into project's repo

In [None]:
%cd AIMC-Submission-2021

## Imports and setup neptune

In [None]:
import matplotlib.pyplot as plt
import torch 

from mimikit.models.sample_rnn import SampleRNN
from mimikit import get_trainer, audio, NeptuneConnector
from mimikit.audios import transforms as A

nc = NeptuneConnector(user="k-tonal",
                      setup={
                          "model": "experiment-Stimme",
                          "trained": "experiment-Stimme/EXS-1"
                      })

## Make db from list of files

In [None]:
from dbs import DBS

# Add your file combinations in dbs.py and load them here

DBS

In [None]:
##########  Options for creating DBS ###########

# for instance :

files = DBS["throat"]

# or when you downloaded a model :

# files = net.hparams.files

# or directly so :

# files = ["Laura Newton.m4a", "Perotin.mp3", "Stimmung.mp3"]

###################################################

files_paths = ["./data/" + file for file in files]

db = SampleRNN.db_class.make("/content/tmp-db.h5", files=files_paths, sr=16000, mu=255)

db

### download and load model

In [None]:
nc.download_experiment("trained", destination="/content/sample_rnn/", artifacts="states/")

net = SampleRNN.load_from_checkpoint("/content/sample_rnn/" + nc.setup["trained"].split("/")[-1] + "/states/epoch=2.ckpt", db=db)

net

## About the model's args


- the [original paper](https://arxiv.org/pdf/1612.07837.pdf) has some recommendations, and the [Dadabots](http://dadabots.com/nips2017/generating-black-metal-and-math-rock.pdf) did a pretty good job of describing their experiments.


- the most important arg is `frame_sizes` (see [the original repo](https://github.com/soroushmehr/sampleRNN_ICLR2017) for a visual aid to this)

    - `SampleRNN` doesn't have "layers", it has "tiers" : small models that process `frame_size` inputs at a time and combine their outputs with the outputs of the previous tier.
    
    - the `frame_sizes` argument determines how many samples each tier (from top to bottom!) processes at a time. The repo's image corresponds to `frame_sizes=(16, 4, 4)` for tiers 3, 2, 1, which does, in fact, work pretty well...
    
    - **IMPORTANT!** you can have as many tiers as you want, but :
    
        1. the two last tiers must have the same `frame_size`
        
        2. dividing a tier's frame_size by the next tier's frame_size should always result in an exact integer. e.g. 
            - (128, 4, 4) => **yes** because 128 / 4 == 32
            - (12, 11, 11) => **no** because 12 / 11 == 1.0909090909090908
            
        3. The first frame_size has to be smaller than or equal to another arg : `batch_seq_len`

    - the original paper says `(8, 2, 2)` worked best. Dadabots used only 2 tiers, probably `(4, 4)` or similar. With this implementation you could go wild and do `(256, 128, 64, 32, 16, 8, 4, 2, 2)` or even more...
    

- Each tier but the last has a Recurrent Network with 1 or more layers. The `n_rnn` argument specifies how many layers **per tier**. It seems to me that it starts working when the whole model has a total of at least 4 rnns : e.g. `frame_sizes=(8, 2, 2)` & `n_rnn=2` corresponds to 4 rnns total (last tier always has 0 rnns). Dadabots made their streams with 2 tiers and the top tier had between 5 and 9 rnns...


- `*_dim` arguments are very similar to `model_dim` in `FreqNet`. 

    - `net_dim` is the most important and will greatly influence the trade-off between speed & expressivity. It could have been named `model_dim` because most of the network's parameters will have sizes proportional to `net_dim`. `512` works well. Maybe you can go down to `256` for more speed or up to `1024` for more expressivity... Definitely worth playing with!
    
    - `emb_dim` is just for a few parameters and might not be very important. `256` works, but I would expect so would `128` or `64`, maybe even `32`... More than `256` could be too much but, honestly, I don't know!... :)
    
    - `mlp_dim` is for the tip of the model (which makes the prediction). `512` works. Once again, I'm not sure how relevant this `dim` is...


- `max_epochs` : it seems SampleRNN generates quite well very early! Loss values that resulted in cool outputs for me were around 1.6 to 1.9, and this was after just a few epochs! It even seems that training for too long results in long silent outputs: this happened to me after 100 epochs and a loss around 1.4.


- `max_lr` : it also seems that SampleRNN withstands high learning rates, which also means faster training! As a comparison, `FreqNet` starts to diverge with `max_lr > 1e-3` but here `5e-3` works, even if it's probably already at the limit... If the loss starts to increase, fall back to `max_lr=1e-3` and you should be fine.
    
    
- the values used in the next cell seem to work quite well. When in doubt, use them.
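
The three `frame_sizes` rules and the rnn count described above can be checked before building a model. The sketch below is only an illustration: `check_frame_sizes` and `total_rnns` are hypothetical helpers, not part of mimikit, and the "at least 2 tiers" check is an assumption derived from rule 1.

```python
def check_frame_sizes(frame_sizes, batch_seq_len):
    """Return a list of rule violations (empty if the configuration looks valid).

    Hypothetical helper, not part of mimikit.
    """
    problems = []
    # Assumption: rule 1 only makes sense with at least two tiers.
    if len(frame_sizes) < 2:
        return ["need at least 2 tiers"]
    # Rule 1: the two last tiers must have the same frame_size.
    if frame_sizes[-1] != frame_sizes[-2]:
        problems.append("the two last tiers must have the same frame_size")
    # Rule 2: each frame_size must be an exact multiple of the next one.
    for upper, lower in zip(frame_sizes[:-1], frame_sizes[1:]):
        if upper % lower != 0:
            problems.append(f"{upper} is not an exact multiple of {lower}")
    # Rule 3: the first frame_size must not exceed the batch sequence length.
    if frame_sizes[0] > batch_seq_len:
        problems.append(f"{frame_sizes[0]} is larger than batch_seq_len={batch_seq_len}")
    return problems


def total_rnns(frame_sizes, n_rnn):
    """Count the rnns in the whole model: n_rnn per tier, none in the last tier."""
    return (len(frame_sizes) - 1) * n_rnn


print(check_frame_sizes((128, 4, 4), batch_seq_len=512))
print(check_frame_sizes((12, 11, 11), batch_seq_len=512))
print(total_rnns((8, 2, 2), n_rnn=2))
```

Running such a check first is cheap compared to discovering an invalid combination after the db and the model have been created.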

In [None]:
net = SampleRNN(db=db, 
                frame_sizes=(16, 4, 4),
                net_dim=512,
                emb_dim=256,
                mlp_dim=512,
                n_rnn=2,
                max_lr=1e-3,
                div_factor=2.,
                betas=(.9, .9),
                batch_size=128,
                batch_seq_len=512,
                ##### params for monitoring : #######
                test_every_n_epochs=2,
                n_test_warmups=10,
                n_test_prompts=2,
                n_test_steps=16000,
                test_temp=0.5,
               )

net.hparams.files = files
net.hparams.model_class = "SampleRNN"

net.hparams

### trainer

In [None]:
trainer = get_trainer(root_dir="/content/sample_rnn",
                      max_epochs=100,
                      epochs=[10, 25, 49],
                     # comment these if you don't want to track with neptune :
                     model=net,
                     neptune_connector=nc,
                     )

## train

In [None]:
trainer.fit(net)

nc.upload_model("model", net, artifacts=("states", ))

## Generation

Generating with SampleRNN is quite flexible!

1. The model has an internal state that is supposed to encode what the model has seen _until now_. So before we let it run loose, we can "warm up" the model with a prompt of `n_warmups` batches. (I'm not sure it changes much in the outputs, but it's worth playing with...)


2. The generation method has 2 modes : deterministic and probabilistic. In the first, the output will always be the same for the same prompt/warm-up. But the second mode samples the outputs from probabilities that can be modified with a very interesting parameter : `temperature`.
    - `temperature` must be greater than 0. and, although it could theoretically be greater than 1., values above 1. might not be so interesting because :
    - the higher the `temperature`, the "noisier" the output. The lower the temp, the more "frozen" the output. It is called "temperature" by analogy with thermodynamics : more heat = particles move faster, less heat = particles stop moving. Musically, it means : hotter = more contrasts, cooler = longer sounds.
    - Concretely, I recommend starting around `temperature=0.5` and going tiny bits up & down....
    
    
3. Because generating in the time domain is much slower than in the frequency domain, **generation is split into 2 cells**:
    - the first gets a **new prompt** and does some warmups
    - the second generates and **appends** the results to what has been previously generated.
This way, you can evaluate the first once and evaluate the 2nd several times. This beats waiting 30min to discover that it generated 30 seconds of silence...


4. You can generate `n_prompts` at the same time (like in redundance rate). This is much faster than generating one prompt at a time.


5. Because feeding data to SampleRNN is a bit complex, you'll have to stick to random prompts from the training data for now...
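
To make the effect of `temperature` more concrete, here is a minimal sketch of the usual way a temperature is applied when sampling from a model's outputs: the scores (logits) are divided by the temperature before the softmax. This is an assumption about the general technique, not a copy of what mimikit does internally. `softmax_with_temperature` and `sample` are hypothetical names, and treating the deterministic mode as "take the most likely value" is also an assumption.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, temperature=None):
    """Pick an output index; temperature=None is assumed to mean deterministic."""
    if temperature is None:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return random.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # cooler: mass concentrates on the top score
print(softmax_with_temperature(logits, 2.0))  # hotter: probabilities get closer to uniform
```

With a low temperature the most likely value dominates (the "frozen" outputs), while a high temperature spreads probability over unlikely values (the "noisy" outputs).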

### initialize generation

In [None]:
# WARM-UP

n_prompts = 8
n_warmups = 20


new = net.warm_up(n_warmups, n_prompts)
        
new.size()

### re-evaluate the next cell for generating further with the same prompts

In [None]:
########### PLAY WITH THOSE : ###################

# to use the deterministic mode, set temperature to None

temperature = 0.5

# 1 second = 16000 steps !

n_steps = 32000

###############################################

## HERE WE GO!

new = net.generate(new, n_steps, decode_outputs=True, temperature=temperature)


for i in range(new.size(0)):

    y = new[i].squeeze().cpu().numpy()

    print("prompt number", i)
    plt.figure(figsize=(20, 2))
    plt.plot(y)
    plt.show()

    audio(y, sr=16000)

### log selected prompts to neptune

In [None]:
# ...
from random import randint

numbers_to_log = [0, 3, 5]

for i in numbers_to_log:
    y = new[i].squeeze().cpu().numpy()
    
    # randint needs integer bounds: a float such as 1e5 raises a TypeError on recent Python versions
    net.log_audio("output_id%i" % randint(0, 10**5), y, sample_rate=16000)