Getting Started
=============

Mimikit is a Python package that lets composers and musicians explore sound generation and manipulation with machine learning tools. This guide shows you how to create a neural network that can learn from and generate audio data. You do not need strong Python coding skills to use Mimikit, though you will need to learn (or already know) some basic principles of the language. Using deep learning tools to synthesize sound requires a workflow quite different from that of current state-of-the-art synthesis platforms. It is also worth noting that it is not necessarily easy (or even possible) to control the process to get exactly the results you imagine. If you are in the mood for experiments, or you just want a better understanding of neural audio synthesis, this is the right tool for you.

For this guide, we will use IPython notebooks, as they are the easiest and most convenient way to work with mimikit. It is important to understand that deep learning cannot be done efficiently on a CPU; it is mostly done on a GPU (graphics processing unit). While it is not impossible to run models on a CPU, training any meaningful network would simply take too long. For anybody who does not own a GPU (or does not know how to set up an environment to run computations on one), we recommend Google Colab (you will need a Google account). Colab is an online platform that lets users create and run IPython notebooks on a GPU, free of charge. The amount of time and memory you can use on Colab is limited, but it is enough to get started with mimikit and train some models. We will now open this guide as a notebook in Colab.

1.  Sign up for a Google account if you don't already have one.
2.  Go to https://colab.research.google.com/notebooks/intro.ipynb and sign in with your Google account.
3.  Select "Open notebook" from the "File" menu.
4.  Select "GitHub" and enter "k-tonal/mimikit-notebooks" into the textbox (as seen in the image below).
5.  Find the file "Getting_Started.ipynb" in the list and click on it.

![](imgs/find-mmk-notebooks.jpg "get notebooks from github")



Using mimikit on Colab
---------------------------------

IPython notebooks are structured into "cells". A cell contains either text or executable code. You can recognize code cells by their formatting and by the two square brackets [ ] to the left of the cell. Clicking on the square brackets executes the code in the cell. Before doing so, let's make sure that we are set up to use a GPU. Go to the menu and select "Runtime" -> "Change runtime type". A popup will appear in which you can select "GPU" as your hardware accelerator.
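
To confirm that the runtime really has a GPU attached, a small check like the one below can help. It is only a sketch: it assumes an NVIDIA GPU and that the `nvidia-smi` command-line tool is on the PATH, which is typical for GPU runtimes. The helper name `gpu_visible` is made up for this guide and is not part of mimikit.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if `nvidia-smi` is on the PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tools found: most likely a CPU-only runtime.
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU it can see.
        result = subprocess.run([exe, "-L"], capture_output=True,
                                text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` on Colab, the runtime type has probably not been switched to "GPU" yet.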

Now we are ready to run some code. To use mimikit in a notebook, it first needs to be installed into the Colab runtime. This is done by running the following cell.


In [None]:
# install the mimikit package

!pip install mimikit
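
After the install has finished, you may want to check that the package is actually available in the runtime. The following sketch uses only Python's standard library; `installed_version` is a hypothetical helper written for this guide, not a mimikit function.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print("mimikit version:", installed_version("mimikit"))
```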

Import statements
---------------------------

Python uses a module system. To use classes and functions defined in a package, they need to be imported. We import a couple of things here and use them in the coming steps of this guide.

In [None]:
from mimikit import get_trainer
from mimikit.data import Database, freqnet_db
from mimikit.freqnet import FreqNet
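
If import statements are new to you, the short example below shows the common forms using only Python's standard library, so it runs even without mimikit installed. The file and folder names are placeholders.

```python
import math                      # import a whole module; use it as math.<name>
from math import sqrt            # import a single name directly
from os import path as os_path   # import a name under a different alias

print(math.pi)                   # access through the module name
print(sqrt(16))                  # access the imported name directly
print(os_path.join("musicdata", "cat_sounds.wav"))
```

The mimikit imports above use the second form: they pull individual classes and functions out of the package so they can be used by their short names.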

Connect your Gdrive (Google Drive)
---------------------------------------------------
We need a place to upload sound files and to save results. Colab itself provides some storage for saving and exchanging data, but it is volatile and disappears as soon as the Colab runtime is disconnected. We strongly recommend not saving anything there and using Google Drive instead. Everything you upload to your Gdrive, as well as everything mimikit saves there, will remain until you decide to delete it.

When you run the following cell, you will be given a link to authorize Colab to access your Gdrive.


In [None]:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive/My\ Drive
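
It can be handy to keep everything related to this guide in one folder on your Gdrive. The sketch below creates such a folder if it does not exist yet. The helper `ensure_workdir` and the folder name `mimikit_projects` are assumptions made for illustration; the demo runs in a temporary folder so that it also works outside Colab.

```python
import tempfile
from pathlib import Path


def ensure_workdir(base, name="mimikit_projects"):
    """Create (if needed) and return a working folder inside `base`."""
    base = Path(base)
    if not base.is_dir():
        # On Colab this usually means the drive has not been mounted yet.
        raise FileNotFoundError(f"{base} not found; is the drive mounted?")
    workdir = base / name
    workdir.mkdir(exist_ok=True)
    return workdir


# Demo in a temporary folder so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = ensure_workdir(tmp)
    demo_created = demo_dir.is_dir()
    print(demo_dir.name, demo_created)
```

On Colab, after mounting, you would pass the mounted folder instead, for example `ensure_workdir("/gdrive/My Drive")`.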

Prepare audio data for training a model
--------------------------------------------------------

We are going to train a FreqNet model. FreqNet is a neural network that is trained on audio data in the frequency domain. After training, the model can generate data that should resemble the data it was trained on. Sometimes it recreates parts of the data unaltered, and sometimes it might go into loops, jump around, or stretch and compress material in time.
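
To give a feel for what "frequency domain" means, here is a tiny illustration in plain Python. It is not mimikit's actual preprocessing, only a sketch of the underlying idea: a discrete Fourier transform turns a list of samples into a list of strengths, one per frequency. The signal, its length, and all names are made up for the example.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        mags.append(abs(total))
    return mags


# A sine wave that completes exactly 8 cycles within 64 samples.
n_samples = 64
cycles = 8
signal = [math.sin(2 * math.pi * cycles * i / n_samples)
          for i in range(n_samples)]

mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
print("strongest frequency bin:", peak_bin)
```

The strongest bin matches the number of cycles in the sine wave, which is the kind of information a frequency-domain representation makes explicit.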

To train the model, we need to convert some sound files into a format that FreqNet can read. Let's assume you have a sound file called 'cat_sounds.wav' that you want to use as training data (but of course you can select any files you want).

First, we upload our sound file with the first cell below. Then we run the second cell to create an H5 file (a convenient format that FreqNet can read).


In [None]:
from google.colab import files

uploaded = files.upload()
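
Colab's `files.upload()` stores the uploaded files in the current working directory and returns a dictionary keyed by file name, whereas the next cell looks for audio in a folder called `musicdata`. A small helper along these lines can move the uploads there. It is a sketch: `collect_audio` is a hypothetical helper, and the list of audio extensions is an assumption, not a statement about which formats mimikit supports. The demo uses a temporary folder and empty placeholder files so it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: file extensions treated as audio for this example.
AUDIO_EXTENSIONS = {".wav", ".aif", ".aiff", ".mp3", ".flac"}


def collect_audio(filenames, target_dir="musicdata"):
    """Move the given audio files into `target_dir` and return their new paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in filenames:
        src = Path(name)
        if src.is_file() and src.suffix.lower() in AUDIO_EXTENSIONS:
            dest = target / src.name
            shutil.move(str(src), str(dest))
            moved.append(dest)
    return moved


# Demo with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_wav = Path(tmp) / "cat_sounds.wav"
    fake_wav.write_bytes(b"")
    fake_txt = Path(tmp) / "notes.txt"
    fake_txt.write_text("not audio")
    moved = collect_audio([fake_wav, fake_txt], Path(tmp) / "musicdata")
    moved_names = [p.name for p in moved]
    print(moved_names)
```

On Colab, you would call `collect_audio(uploaded.keys())` right after the upload cell.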

In [None]:
# Create an H5 file called cat_sounds.h5 from all audio files found in the musicdata folder

db = freqnet_db('cat_sounds.h5', roots=['musicdata'])

Create a FreqNet object and prepare for training
---------------------------------------------------------------------

FreqNet has parameters that specify the details of the model architecture.  The speed at which the model learns and the characteristics of the outputs it will produce depend on the given parameters.  To learn more about the FreqNet architecture look [here]

In [None]:
model = FreqNet(data_object=db.fft,
                model_dim=1024,
                groups=1,
                n_layers=(4,),
                with_skip_conv=False,
                with_residual_conv=False,
                accum_outputs=0,
                concat_outputs=0,
                pad_input=0)

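# path_to_model must be defined before it is used: it is the folder
# passed to the trainer as its root directory.
# 'cat_sounds_model' is only an example name; change it as you like.
path_to_model = 'cat_sounds_model'

trainer = get_trainer(max_epochs=100,
                      epochs=[30, 60, 99],
                      root_dir=path_to_model)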

Start the training
-------------------------

This will take quite some time (depending on the size of the input data and the model parameters).
For five minutes of audio data, training could take anywhere from roughly 4 to 10 hours. You can spend this time staring at the progress bar or find something else to do. See you in a couple of hours for the next part of the guide.

In [None]:
trainer.fit(model)
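
Once the first epoch or two have finished, you can get a rough idea of the total training time with simple arithmetic. The sketch below assumes that every epoch takes about the same time, which will not hold exactly in practice; `estimate_training_hours` is a hypothetical helper, and the per-epoch timings are example values.

```python
def estimate_training_hours(seconds_per_epoch, max_epochs=100):
    """Rough total training time in hours, assuming equally long epochs."""
    return seconds_per_epoch * max_epochs / 3600.0


# Example: epochs that take 2.5 or 6 minutes each, with max_epochs=100.
for seconds in (150, 360):
    hours = estimate_training_hours(seconds)
    print(seconds, "s/epoch ->", round(hours, 1), "h")
```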