# MMK Snippets <a id="TOP" name="TOP"></a>

This is a collection of small, self-contained code snippets that you can copy & paste into your notebook.


### **This notebook is only for reference and not to be evaluated as is!**

For an example of how to build your own notebook, check out the [base `FreqNet` notebook]().

For documentation on the freqnet package and its classes, check out [this link]().


## Table of Contents

[Installation](#O.)

- 0. [Variables](#0.)
- I. [Database](#I.)
    - a) [Make a Database](#Ia)
    - b) [get data from a database](#Ib)
- II. [Load & Download Model & Data](#II.)
    - a) [connect to Gdrive](#IIa)
    - b) [download db from neptune](#IIb)
    - c) [download model from neptune](#IIc)
    - d) [load model from checkpoint](#IId)
    - e) [resume training from checkpoint](#IIe)
- III. [Upload & Save stuff](#III.)
    - a) [Upload a db](#IIIa)
    - b) [Upload a model](#IIIb)
    - c) [Manually Save Checkpoint](#IIIc)
    - d) [log audio](#IIId)
    - e) [access neptune experiment object](#IIIe)
    - f) [Log all kinds of stuff to your experiment](#IIIf)
- IV. [Look at the logs](#IV.)
- V. [Sampling](#V.)
    - a) [Generate](#Va)
    - b) [Manual Prompt](#Vb)
    - c) [Automatic Prompt](#Vc)
    - d) [Random prompt](#Vd)
    - e) [Same prompt(s) for several models](#Ve)
- VI. [Training Tricks](#VI.)
- VII. [Model attributes](#VII.)
- VIII. [Build your own Model](#VIII.)

# INSTALL `mmk`<a id="O."></a>

## In Colab and on your own PC

In [None]:
# in a terminal :
pip install mmk

# or in a notebook :
!pip install mmk

## Development Version

In [None]:
git clone https://github.com/k-tonal/mmk
# this makes an editable package install (very handy when developing)
pip install -e mmk/

## Load Your neptune API Token

`get_token()` will find your token in the OS environment
if you added it to your `.bashrc` or `.bash_config`.

Otherwise it will ask you to paste it from your clipboard and save it in
the environment.

By default it looks for `"NEPTUNE_API_TOKEN"` and stores the token under this key if you had to paste it.

If your token is stored under another key, or you wish to store another token in your environment, pass the name of the key like this: `get_token(key="MY_NEPTUNE_KEY")`

In [None]:
from mmk.kit import get_token

neptune_token = get_token()
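
The lookup described above can be sketched in a few lines of plain Python. This is only an illustration of the "environment first, ask otherwise" logic, not the code of `mmk.kit.get_token`: the function name `get_token_sketch` and its `ask` parameter are made up for this example.

```python
import os


def get_token_sketch(key="NEPTUNE_API_TOKEN", ask=input):
    """Return the token stored under `key`, asking for it only if it is missing."""
    token = os.environ.get(key)
    if not token:
        # `ask` stands in for the interactive prompt (hypothetical parameter)
        token = ask("Paste your token for %s: " % key).strip()
        # os.environ only affects the current process and its children,
        # it does not edit your .bashrc
        os.environ[key] = token
    return token


# first call has to ask, second call finds the token in the environment
os.environ.pop("MY_SKETCH_KEY", None)
first = get_token_sketch("MY_SKETCH_KEY", ask=lambda _: " abc123 ")
second = get_token_sketch("MY_SKETCH_KEY", ask=lambda _: "never used")
```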

# 0. Variables <a id="0."></a> <a name="0."></a>

### Throughout the snippets we use consistent variable names.
### Here we pseudo-exemplify them for reference.
### It is _strongly advised_ to have a similar cell at the beginning of your notebook with your own values and to "build" your notebook by copying, pasting & adapting the snippets listed here.

This will protect you against typos and mistakes (and possibly against your own chaos...) if you're not an experienced developer.

In [None]:
path_to_audio_files = "MyPc/home/my_audios"
path_to_db = ""
# where files are stored :
path_to_model = ""
path_to_ckpt = path_to_model + "/states/epoch=X.ckpt"

# neptune experiment path
full_exp_path = "account/project/EXP-9"
# sometimes we need just the project path:
neptune_project = "account/project"

db = Database(path_to_db)
model = ModelClass(....)
trainer = get_trainer(root_dir=path_to_model, ....)


[Back to the top](#TOP)

# I. BUILD AND USE `Database`<a id="I."></a><a name="I."></a>

## a) Make a DB <a id="Ia"></a><a name="Ia"></a>

#### Option 1 : with a python function

In [None]:
from mmk.freqnet import freqnet_db

freqnet_db(path_to_db,
           roots=["directory/"],
           files=["file1", "file2"],
           # those are the defaults and can be omitted :
           n_fft=2048,
           hop_length=512,
           sample_rate=22050,
           # specifying a neptune_project ("account/project")
           # automatically uploads the db to a new experiment
           # in the project
           neptune_project=None)

#### Option 2 : from the command line (once you have pip-installed mmk)

In [None]:
# this takes exactly the same arguments as the function in Option 1 :

!freqnet-db "new_db.h5" --roots "directory/" --files "file1" "file2" --n-fft 1024 --hop-length 256

# for usage and flag shorthands check out :

!freqnet-db --help

#### Option 3 : manually build a FileWalker and an extract function! Check out mmk.freqnet.freqnet_db for an example!

[Back to the top](#TOP)

## b) Get File / Get Indices <a id="Ib"></a><a name="Ib"></a>

`Database` keeps the files flattened in one large array under a feature attribute (in `freqnet_db`s the fft is in the `db.fft` attribute), which can be indexed like a `numpy` array.

In addition, the position of each file in the flattened array is stored in `db.metadata`, a `pandas.DataFrame` with columns `"name", "start", "stop", "duration"`.

In [None]:
from mmk.data import Database

db = Database(path_to_db)

# retrieve from frame 5 to frame 10 :

frames = db.fft[5:10]


# retrieve a whole file :

file_array = db.fft.get(db.metadata.iloc[[0]])
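
To make the layout more concrete, here is a toy version of the same bookkeeping. It is a sketch only: plain Python lists stand in for the feature array and for the `db.metadata` table, the file names are invented, and nothing here is taken from the `Database` implementation.

```python
from itertools import accumulate

# three toy "files" of 4, 2 and 3 frames; every frame is a short list of features
files = {
    "a.wav": [[0.0] * 5 for _ in range(4)],
    "b.wav": [[1.0] * 5 for _ in range(2)],
    "c.wav": [[2.0] * 5 for _ in range(3)],
}

# all frames end to end, like the flattened feature array
flat = [frame for frames in files.values() for frame in frames]

# one row per file, with the columns "name", "start", "stop", "duration"
durations = [len(frames) for frames in files.values()]
stops = list(accumulate(durations))
metadata = [
    {"name": name, "start": stop - duration, "stop": stop, "duration": duration}
    for name, stop, duration in zip(files, stops, durations)
]

# retrieving the second file is then a plain slice of the flat array
row = metadata[1]
file_frames = flat[row["start"]:row["stop"]]
```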

[Back to the top](#TOP)

# II. Load & Download Model & Data <a id="II."></a><a name="II."></a>

## a) Connect to GDrive <a id="IIa"></a><a name="IIa"></a>
This is pasted from the Colab snippets.

In [None]:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive

[Back to the top](#TOP)

## b) Download DB from neptune <a id="IIb"></a><a name="IIb"></a>

In [None]:
from mmk.data import download_database

database_name = "mydb.h5"
db = download_database(neptune_token, full_exp_path, database_name)

[Back to the top](#TOP)

## c) Download Model <a id="IIc"></a><a name="IIc"></a>
By default this downloads all the model's subdirectories into the current working directory of the IPython kernel.
To download the files somewhere else, just pass `destination="where/I/want/the/subdirectories/"`.

In [None]:
from mmk.kit import download_model

download_model(full_exp_path, destination=path_to_model)

[Back to the top](#TOP)

## d) Load Checkpoint <a id="IId"></a><a name="IId"></a>

In [None]:
# IMPORTANT : all models in mmk.freqnet need a data_object argument when reloading

# import the class you want to load and Database

from mmk.freqnet import FreqNet
from mmk.data import Database

db = Database(path_to_db)
model = FreqNet.load_from_checkpoint(path_to_ckpt, data_object=db.fft)
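
If you don't want to type the checkpoint name by hand, something like the following can pick the most recent one. This is a sketch that assumes the files in `states/` are named `epoch=<number>.ckpt`, as in the `path_to_ckpt` variable at the top of this notebook; `list_checkpoints` is a helper written for this example and is not part of `mmk`. Adapt the pattern if your checkpoints are named differently.

```python
import os
import re
import tempfile


def list_checkpoints(states_dir):
    """Return (epoch, path) pairs for files named 'epoch=<number>.ckpt', sorted by epoch."""
    found = []
    for name in os.listdir(states_dir):
        match = re.fullmatch(r"epoch=(\d+)\.ckpt", name)
        if match:  # anything else (e.g. last_optim_state.pt) is skipped
            found.append((int(match.group(1)), os.path.join(states_dir, name)))
    return sorted(found)


# toy demonstration in a temporary directory
with tempfile.TemporaryDirectory() as states_dir:
    for name in ["epoch=2.ckpt", "epoch=10.ckpt", "last_optim_state.pt"]:
        open(os.path.join(states_dir, name), "w").close()
    checkpoints = list_checkpoints(states_dir)
    latest_epoch, latest_path = checkpoints[-1]
```

Sorting on the parsed integer (and not on the file name) is what puts `epoch=10.ckpt` after `epoch=2.ckpt`.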

[Back to the top](#TOP)

## e) Resume Training from Checkpoint <a id="IIe"></a><a name="IIe"></a>

In [None]:
# import the class you want to load

from mmk.freqnet import FreqNet
from mmk.kit import get_trainer


# FreqNetModel needs a valid data_object when loading :

model = FreqNet.load_from_checkpoint(path_to_ckpt, data_object=db.fft)

# this resumes training from a model checkpoint (path_to_ckpt) and the last_optim_state.pt in the states/ directory

trainer = get_trainer(model=model,
                      resume_from_checkpoint=path_to_ckpt,
                      # if ckpt was made at epoch=2, 
                      # the following will train for 2 more epochs:
                      max_epochs=4,
                      epochs=1,
                      root_dir=path_to_model,
                      neptune_api_token=neptune_token,
                      neptune_project=neptune_project,
                      # passing an id lets you continue your experiment 
                      # in neptune too!
                      neptune_exp_id=neptune_exp_id
                      )

# trainer.fit(model)

[Back to the top](#TOP)

# III. UPLOAD & SAVE STUFF <a id="III."></a><a name="III."></a>

## a) Upload DB <a id="IIIa"></a><a name="IIIa"></a>

This creates a new Experiment in the specified project and uploads the db to its `artifacts`.

In [None]:
from mmk.data import upload_database, Database

db = Database(path_to_db)
upload_database(db, neptune_token, neptune_project, "My Experiment Name")

[Back to the top](#TOP)

## b) Upload Model (Checkpoints, Audios, Logs) to neptune <a id="IIIb"></a><a name="IIIb"></a>

In [None]:
# if `model` has just been trained with a neptune Experiment :

model.upload_to_neptune()

# if `model` hasn't just been trained :

model.upload_to_neptune(root_dir=path_to_model,
                       experiment=neptune_exp_object)

[Back to the top](#TOP)

## c) Manually Save Checkpoint <a id="IIIc" name="IIIc"></a>

In [None]:
import os

# add a checkpoint in the model's states/ directory (where they belong...)

trainer.save_checkpoint(os.path.join(trainer.default_root_dir, "states", "test.ckpt"))

[Back to the top](#TOP)

## d) Log Audio <a id="IIId" name="IIId"></a>

This method will always save an audio file in `path_to_model/audios`
and in any neptune or TestTube loggers / experiments the model is bound to, or that you passed through
the `experiments=[...]` keyword argument.

In [None]:
import torch

# some audio tensor as returned by FreqNet.generate(...)
audio_tensor = torch.randn(1, 10000)
audio_filename = "name_of_the_saved_file"

# if the model has just been trained with some loggers:
model.log_audio(audio_filename, audio_tensor)

# if the data was made with a non-default sample rate :
model.log_audio(audio_filename, audio_tensor, sample_rate=db.fft.attrs["sr"])

# if you want to log to specific Experiment objects (neptune or testtube)
model.log_audio(audio_filename, audio_tensor, experiments=[neptune_exp_object])

[Back to the top](#TOP)

## e) Access Experiment Object <a id="IIIe" name="IIIe"></a>

In [None]:
# if you want the experiment object without having trained a model :

from mmk.kit import get_token, get_neptune_experiment

neptune_token = get_token()
experiment = get_neptune_experiment(neptune_token,
                                    "account/project/EXP-0")

# if you have a model :

# each logger has its own Experiment object.

experiment =  model.logger.experiment 
# can then be a list of several Experiment objects....

# if you just want the neptune one :
experiment = model.neptune_experiment


[Back to the top](#TOP)

## f) Log all kinds of stuff to your experiment <a id="IIIf" name="IIIf"></a>

All loggers / experiment objects offer similar logging possibilities, often under varying method names.

Check out the documentation for the [`TestTubeLogger`](https://pytorch-lightning.readthedocs.io/en/stable/generated/pytorch_lightning.loggers.TestTubeLogger.html#pytorch_lightning.loggers.TestTubeLogger) and its experiment object: the [`tensorboard summary_writer`](https://pytorch.org/docs/stable/tensorboard.html).
Or the one for the [`NeptuneLogger`](https://pytorch-lightning.readthedocs.io/en/stable/generated/pytorch_lightning.loggers.NeptuneLogger.html#pytorch_lightning.loggers.NeptuneLogger) and its [`Experiment`](https://docs.neptune.ai/api-reference/neptune/experiments/index.html) object.

[Back to the top](#TOP)

# IV. LOOK AT THE LOGS <a id="IV." name="IV."></a>

## Listen to logged Audios

If you logged audios to tensorboard or neptune, you can listen to them there. Otherwise:

In [None]:
import os
import librosa
from mmk.utils import audio

path_to_audios = os.path.join(path_to_model, "audios")

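for audio_file in os.listdir(path_to_audios):
    # os.listdir returns bare file names, so join them with the directory
    signal, _ = librosa.load(os.path.join(path_to_audios, audio_file))
    print(audio_file)
    audio(signal)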

## Open Tensorboard

assuming `path_to_model="./"`

In [None]:
!tensorboard --logdir ./logs

[Back to the top](#TOP)

# V. SAMPLING <a id="V." name="V."></a>

## a) Generate <a id="Va" name="Va"></a>

In [None]:
output = model.generate(prompt, n_steps=2048,
                        hop_length=db.fft.attrs["hop_length"])
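
To get a feeling for what `n_steps` means in seconds, a back-of-the-envelope conversion helps. The sketch below assumes that each generated step is one fft frame and that consecutive frames are `hop_length` samples apart (window edge effects are ignored); the defaults are the ones listed for `freqnet_db` above, and both helper functions are made up for this example.

```python
import math


def generated_duration(n_steps, hop_length=512, sample_rate=22050):
    """Approximate length in seconds of `n_steps` frames spaced `hop_length` samples apart."""
    return n_steps * hop_length / sample_rate


def steps_for_duration(seconds, hop_length=512, sample_rate=22050):
    """Smallest number of steps that covers at least `seconds` of audio."""
    return math.ceil(seconds * sample_rate / hop_length)


seconds = generated_duration(2048)   # roughly 47.6 seconds with the defaults
n_steps = steps_for_duration(10)     # 431 steps for 10 seconds with the defaults
```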

## b) Manual Prompt <a id="Vb" name="Vb"></a>

`Database` stores all files in a flat array. Information about the files you extracted is stored in `db.metadata`, which is a `pandas.DataFrame`.
You can then pick prompts anywhere in the db or in specific files.

In [None]:
db = Database(path_to_db)

# in all cases we get the length of the prompt from the model 

prompt_length = model.receptive_field

# anywhere in the db :

# this needs to be smaller than db.fft.shape[0] - prompt_length
start_index = 1234

# in a specific file :

file = db.metadata.iloc[[3]]
# start index relative to the beginning of the file
start_index = 1234 + file["start"].item()

# finally 

prompt = db.fft[start_index:start_index+prompt_length]
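
When picking a prompt inside a specific file, it is easy to choose an offset that runs over the end of the file and into the next one. A small check like the following catches that early. It is a sketch: `absolute_start` is a helper invented for this example, and the numbers are arbitrary.

```python
def absolute_start(offset, prompt_length, file_start, file_stop):
    """Turn a file-relative offset into an index in the flat array.

    Raises ValueError if the prompt would start before or end after the file.
    """
    start = file_start + offset
    if offset < 0 or start + prompt_length > file_stop:
        raise ValueError(
            "a prompt of %i frames at offset %i does not fit in a file of %i frames"
            % (prompt_length, offset, file_stop - file_start))
    return start


# a file occupying frames 5000 to 8000 of the flat array
start_index = absolute_start(1234, prompt_length=512, file_start=5000, file_stop=8000)
```

With the variables of the cell above, the call would presumably look like `absolute_start(1234, prompt_length, file["start"].item(), file["stop"].item())`.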

## c) Automatic Prompts <a id="Vc" name="Vc"></a>
A simple trick to sample prompts throughout a db:

In [None]:
db = Database(path_to_db)
N = db.fft.shape[0]
prompt_length = model.receptive_field

# how many prompts you want
n_prompts = 40

for i in range(0, N, N // n_prompts):
    prompt = db.fft[i:i+prompt_length]

#     then most likely :
#     output = model.generate(prompt, 1000, hop_length=db.fft.attrs["hop_length"])
#     model.log_audio("prompt=%i" % i, output, sample_rate=db.fft.attrs["sr"])
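
One thing to watch with a fixed step over the whole db: the last prompts can start so close to the end that they come out shorter than `prompt_length`, and the loop may yield a few more prompts than `n_prompts`. A possible variant, sketched here with a made-up helper `evenly_spaced_starts`, spreads exactly `n_prompts` starts between the first and the last valid index.

```python
def evenly_spaced_starts(n_frames, prompt_length, n_prompts):
    """Return `n_prompts` start indices spread over the db so that every prompt has full length."""
    last_start = n_frames - prompt_length
    if last_start < 0:
        raise ValueError("the db is shorter than one prompt")
    if n_prompts == 1:
        return [0]
    # integer arithmetic keeps the first start at 0 and the last one at `last_start`
    return [i * last_start // (n_prompts - 1) for i in range(n_prompts)]


starts = evenly_spaced_starts(n_frames=100_000, prompt_length=512, n_prompts=40)
```

In the loop above you would then iterate over `evenly_spaced_starts(N, prompt_length, n_prompts)` instead of `range(0, N, N // n_prompts)`.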

## d) Random Prompt <a id="Vd" name="Vd"></a>

In [None]:
# all freqnets have methods to get random train and val examples :

# train :

prompt = model.random_train_example()

# validation :

prompt = model.random_val_example()

## e) Same Prompt(s) for several Models <a id="Ve" name="Ve"></a>

In [None]:
# just as an example :

from random import randint
import os

from mmk.data import Database
from mmk.freqnet import FreqNet

db = Database(path_to_db)
n_prompts = 10
prompts_indices = [randint(0, db.fft.shape[0]) for _ in range(n_prompts)]
# how many steps we want to generate :
n_steps = 1000

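# states/ also holds last_optim_state.pt, so keep only the checkpoints
all_checkpoints = [f for f in os.listdir(os.path.join(path_to_model, "states"))
                   if f.endswith(".ckpt")]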

for ckpt in all_checkpoints:
    # load the model
    path_to_ckpt = os.path.join(path_to_model, "states", ckpt)
    # models in mmk.freqnet need a data_object argument when reloading
    model = FreqNet.load_from_checkpoint(path_to_ckpt, data_object=db.fft)

    # loop through the prompts 
    for i in prompts_indices:
        prompt = db.fft[i:i+model.receptive_field]
        output = model.generate(prompt, n_steps, hop_length=db.fft.attrs["hop_length"])
        model.log_audio("prompt=%i" % i, output, sample_rate=db.fft.attrs["sr"])
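
If you want to be able to redo the comparison later with exactly the same prompts, it helps to draw the indices from a seeded generator and to leave room for a full prompt at the end of the db. The following is a sketch with a made-up helper `sample_prompt_indices`; the sizes are arbitrary.

```python
import random


def sample_prompt_indices(n_frames, prompt_length, n_prompts, seed=0):
    """Draw `n_prompts` reproducible start indices that all leave room for a full prompt."""
    rng = random.Random(seed)  # private generator, the global random state is untouched
    # randint includes both bounds, so the largest valid start is n_frames - prompt_length
    return [rng.randint(0, n_frames - prompt_length) for _ in range(n_prompts)]


indices = sample_prompt_indices(n_frames=100_000, prompt_length=512, n_prompts=10, seed=42)
same_again = sample_prompt_indices(n_frames=100_000, prompt_length=512, n_prompts=10, seed=42)
```

If the checkpoints you compare have different receptive fields, passing the largest one as `prompt_length` keeps every index valid for every model.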
        

[Back to the top](#TOP)

# VI. TRAINING TRICKS <a id="VI." name="VI."></a>

## Train several Models (ParameterGrid)

## Select Subset of the Files of a DB

## Add Data-Augmentations

## 16-Bit Precision (not available on all GPUs)

[Back to the top](#TOP)

# VII. MODEL ATTRIBUTES <a id="VII." name="VII."></a>

## hparams

## logger

## datamodule

[Back to the top](#TOP)

# VIII. BUILD YOUR OWN MODEL <a id="VIII." name="VIII."></a>

## Add `mmk` Hooks to your LightningModule

## Bypass `get_trainer`

## Build a `DataModule` with `DataObject` and a `DatasetWrapper`