## GRUV - A Modified Implementation

As the name implies, this is my work on turning the GRUV repository from Stanford into a usable music-making utility for music producers. Contributions are welcome: the goal is to turn a research project into a practical tool for a music maker's toolkit. The original `README.md` contents are interleaved below for reference.

## GRUV

GRUV is a Python project for algorithmic music generation using recurrent neural networks.

Note: This code works with Keras v0.1.0; later versions of Keras may not work.

For a demonstration of our project on raw audio waveforms (as opposed to the standard MIDI), see here: https://www.youtube.com/watch?v=0VTI1BBLydE

Copyright (C) 2015 Matt Vitelli matthew.vitelli@gmail.com and Aran Nayebi aran.nayebi@gmail.com



## Dependencies


In order to use GRUV, you will first need to install the following dependencies:

Theano: http://deeplearning.net/software/theano/#download

Keras: https://github.com/fchollet/keras.git

NumPy: http://www.numpy.org/

SciPy: http://www.scipy.org/

LAME (for MP3 source files): http://lame.sourceforge.net/ 

SoX (for FLAC source files): http://sox.sourceforge.net/

Once that's taken care of, you can try training a model of your own as follows:


## Step 1. Prepare the data

Copy your music into ./datasets/YourMusicLibrary/ and type the following command into Terminal:
>    python convert_directory.py

This will convert all MP3s in ./datasets/YourMusicLibrary/ into WAVs, then convert the WAVs into a frequency-domain representation (normalized to mean 0 and standard deviation 1) that the deep learning algorithms can train on.
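For readers who want to see roughly what the conversion step does, here is a minimal NumPy sketch for a single clip. It assumes, consistent with the `out_shape` width of 22050 (2 × 11025) printed by the conversion cell, that each block's FFT is stored as its real parts followed by its imaginary parts, and it assumes a 44.1 kHz sampling frequency. `audio_to_fft_blocks` and `pad_sequence` are hypothetical helpers, not GRUV functions, and the mean-0/std-1 normalization step is left out.

```python
import numpy as np

def audio_to_fft_blocks(signal, block_size):
    """Split a mono signal into fixed-size blocks and map each block to the
    frequency domain as [real parts | imaginary parts] (hypothetical helper)."""
    # Zero-pad the tail so the signal divides evenly into blocks.
    remainder = len(signal) % block_size
    if remainder:
        signal = np.concatenate([signal, np.zeros(block_size - remainder)])
    blocks = signal.reshape(-1, block_size)
    spectra = np.fft.fft(blocks, axis=1)
    # Concatenating real and imaginary parts doubles the width to 2 * block_size.
    return np.concatenate([spectra.real, spectra.imag], axis=1)

def pad_sequence(blocks, max_seq_len):
    """Zero-pad (or truncate) a sequence of blocks to exactly max_seq_len rows."""
    out = np.zeros((max_seq_len, blocks.shape[1]))
    n = min(len(blocks), max_seq_len)
    out[:n] = blocks[:n]
    return out

freq = 44100              # assumed sampling frequency in Hz
block_size = freq // 4    # 11025 samples per block
max_seq_len = 40          # (freq * 10 s) / block_size
# Three seconds of noise as stand-in audio (12 blocks of 11025 samples).
signal = np.random.RandomState(0).standard_normal(freq * 3)
seq = pad_sequence(audio_to_fft_blocks(signal, block_size), max_seq_len)
print(seq.shape)  # (40, 22050)
```

Because the FFT is invertible, applying `np.fft.ifft` to `real + 1j * imag` recovers each block, which is what makes this representation usable for generating audio later.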


In [1]:
```python
# Imports from convert_directory.py
from data_utils.parse_files import *
from pprint import pprint
import config.nn_config as nn_config
```

SyntaxError: invalid syntax (parse_files.py, line 113)

In [2]:
```python
def convert_directory():
    config = nn_config.get_neural_net_configuration()
    input_directory = config['dataset_directory']
    output_filename = config['model_file']

    freq = config['sampling_frequency']  # sampling frequency in Hz
    clip_len = 10  # length of training clips, in seconds
    block_size = freq // 4  # block size used for training; defines the size of our input state
    max_seq_len = int(round((freq * clip_len) / block_size))  # used later for zero-padding song sequences
    # Step 1 - convert MP3s to WAVs
    new_directory = convert_folder_to_wav(input_directory, freq)
    # Step 2 - convert WAVs to the frequency domain with mean 0 and standard deviation 1
    convert_wav_files_to_nptensor(new_directory, block_size, max_seq_len, output_filename)
    print('Files converted!')
```

In [3]:
```python
convert_directory()
```

out_shape: (0, 40, 22050)
Flushing to disk...
Done!
Files converted!
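The printed `out_shape` can be checked by hand. Assuming a sampling frequency of 44.1 kHz (the real value comes from `config['sampling_frequency']`, which is not shown here) and a real-plus-imaginary FFT layout per block, the last two dimensions work out to exactly 40 blocks of 22050 values:

```python
# Reproduce the sizes computed inside the conversion cell.
freq = 44100                    # assumed config['sampling_frequency'], in Hz
clip_len = 10                   # seconds per training clip
block_size = freq // 4          # samples per block
max_seq_len = int(round((freq * clip_len) / block_size))  # blocks per clip
feature_dims = 2 * block_size   # assumed: real + imaginary FFT parts per block

print(block_size, max_seq_len, feature_dims)  # 11025 40 22050
```

If, as the shape suggests, the first axis counts clips, the leading 0 means no clips were written at all, so it is worth confirming that `./datasets/YourMusicLibrary/` actually contains audio files before moving on to training.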




## Step 2. Train your model


At this point, you should have four files named YourMusicLibraryNP_x.npy, YourMusicLibraryNP_y.npy, YourMusicLibraryNP_var.npy, and YourMusicLibraryNP_mean.npy.

- `YourMusicLibraryNP_x` contains the input sequences for training.
- `YourMusicLibraryNP_y` contains the output sequences for training.
- `YourMusicLibraryNP_mean` contains the mean for each feature, computed from the training set.
- `YourMusicLibraryNP_var` contains the variance for each feature, computed from the training set.
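The mean and variance files exist so the network can train on normalized features and the generated output can be mapped back afterwards. A minimal sketch, assuming the statistics are taken per frequency feature over every clip and timestep. GRUV's exact axes, and whether the `_var` file stores a variance or a standard deviation, are assumptions here; if it is a variance, take its square root as below.

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-in for X_train: (num_examples, num_timesteps, num_frequency_dims)
X = rng.standard_normal((8, 40, 6)) * 3.0 + 5.0

# Per-feature statistics over all examples and timesteps (assumed axes).
mean = X.mean(axis=(0, 1))
var = X.var(axis=(0, 1))
std = np.maximum(np.sqrt(var), 1e-8)  # floor guards against division by zero

X_norm = (X - mean) / std          # what the network would be trained on
X_restored = X_norm * std + mean   # undo before converting back to audio
```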

You can train your very first model by typing the following command into Terminal:
>    python train.py

Training will take a while, depending on the length and number of songs used.

If you get an error of the following form:

> Error allocating X bytes of device memory (out of memory). Driver report Y bytes free and Z bytes total

you must adjust the parameters in train.py: specifically, decrease `batch_size` to something smaller. If you still get out-of-memory errors, you can also decrease the `hidden_dims` parameter in train.py and generate.py (in this notebook it is read from `config['hidden_dimension_size']`), although this will have a significant impact on the quality of the generated music.
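To see why these two settings matter, a back-of-the-envelope estimate helps. The sketch below assumes a single standard LSTM layer reading the 22050-wide frequency vectors directly (GRUV's `network_utils` may wrap it with other layers) and float32 storage. The hidden sizes are hypothetical values, and activations, gradients, and optimizer state are not counted.

```python
def lstm_param_count(input_dim, hidden_dim):
    """Parameters in one standard LSTM layer: 4 gates, each with input
    weights, recurrent weights, and a bias vector."""
    return 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim)

def batch_bytes(batch_size, timesteps, dims, bytes_per_value=4):
    """Bytes for one float32 input batch (activations and gradients excluded)."""
    return batch_size * timesteps * dims * bytes_per_value

freq_dims = 22050   # width of the tensors produced in Step 1
timesteps = 40      # max_seq_len from Step 1

for hidden in (1024, 512):  # hypothetical hidden_dims values
    params = lstm_param_count(freq_dims, hidden)
    print(f"hidden_dims={hidden}: {params:,} params, {params * 4 / 2**20:.0f} MiB of weights")

for bs in (5, 2):
    print(f"batch_size={bs}: {batch_bytes(bs, timesteps, freq_dims) / 2**20:.1f} MiB per input batch")
```

Under these assumptions, halving `hidden_dims` from 1024 to 512 roughly halves the weights (about 94.5M to 46.2M parameters), because the 22050-wide input term dominates, while halving `batch_size` halves the size of each input batch.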

### Setup Notes

I found that my Jupyter session was using `/usr/bin/python2.7` instead of the expected path at `/home/ubuntu/src/anaconda3/envs/gruv2/bin/python`.

With the `gruv2` environment activated, the following commands helped solve this issue by registering that environment's Python as a Jupyter kernel:

```bash
conda install ipykernel
python -m ipykernel install --user
```

In [3]:
```python
import sys
print(sys.executable)
import keras
```


/home/ubuntu/src/anaconda3/envs/gruv2/bin/python


In [9]:
```python
from __future__ import absolute_import
from __future__ import print_function
import numpy as np
import os
import nn_utils.network_utils as network_utils
import config.nn_config as nn_config
```

ModuleNotFoundError: No module named 'keras'
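An import error like the one above usually means the kernel is running a different interpreter than the one Keras was installed into. A quick standard-library check shows the active interpreter and whether each dependency can be found without importing it; the module list is an assumption based on the Dependencies section.

```python
import importlib.util
import sys

def report_environment(modules=("numpy", "scipy", "theano", "keras")):
    """Print the running interpreter and whether each top-level module
    can be found, without actually importing it."""
    print("Interpreter:", sys.executable)
    status = {}
    for name in modules:
        found = importlib.util.find_spec(name) is not None
        status[name] = found
        print(f"{name:>8}: {'found' if found else 'MISSING'}")
    return status

status = report_environment()
```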

In [None]:
```python
def train():
    # TODO: Config can be global for this notebook.
    config = nn_config.get_neural_net_configuration()
    inputFile = config['model_file']
    cur_iter = 0
    model_basename = config['model_basename']
    model_filename = model_basename + str(cur_iter)

    # Load up the training data
    print('Loading training data')
    # X_train is a tensor of size (num_train_examples, num_timesteps, num_frequency_dims)
    # y_train is a tensor of size (num_train_examples, num_timesteps, num_frequency_dims)
    X_train = np.load(inputFile + '_x.npy')
    y_train = np.load(inputFile + '_y.npy')
    print('Finished loading training data')

    # Figure out how many frequencies we have in the data
    freq_space_dims = X_train.shape[2]
    hidden_dims = config['hidden_dimension_size']

    # Create an LSTM network
    model = network_utils.create_lstm_network(num_frequency_dimensions=freq_space_dims, num_hidden_dimensions=hidden_dims)
    # You could also substitute this with an RNN or GRU
    # model = network_utils.create_gru_network()

    # Load existing weights if available
    if os.path.isfile(model_filename):
        model.load_weights(model_filename)

    num_iters = 50        # total number of training epochs (cur_iter counts epochs)
    epochs_per_iter = 25  # number of epochs between checkpoints
    batch_size = 5        # number of training examples pushed to the GPU per batch.
                          # Larger batches need more memory, but training will be faster
    print('Starting training!')
    while cur_iter < num_iters:
        print('Iteration: ' + str(cur_iter))
        # We set cross-validation to 0, as cross-validation would run on
        # different data if we reload our model between runs. The proper way
        # to handle this is to manually split your data into two sets and run
        # cross-validation after you've trained the model for some number of epochs
        history = model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=epochs_per_iter, verbose=1, validation_split=0.0)
        cur_iter += epochs_per_iter
        # Checkpoint the weights every epochs_per_iter epochs
        model.save_weights(model_basename + str(cur_iter))
    print('Training complete!')
```
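The comments in `train()` recommend splitting the data manually instead of relying on `validation_split`. A minimal sketch of such a split, using a fixed random seed so the same examples stay in the validation set every time the notebook is rerun. `train_validation_split` is a hypothetical helper, and the 10% default is an arbitrary choice.

```python
import numpy as np

def train_validation_split(X, y, validation_fraction=0.1, seed=0):
    """Shuffle examples once with a fixed seed and split them into
    training and validation sets (hypothetical helper)."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of examples")
    order = np.random.RandomState(seed).permutation(len(X))
    n_val = int(round(len(X) * validation_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Stand-in tensors shaped (num_examples, num_timesteps, num_frequency_dims)
X = np.arange(20 * 4 * 3, dtype=float).reshape(20, 4, 3)
y = X + 1.0
X_tr, y_tr, X_val, y_val = train_validation_split(X, y, validation_fraction=0.2)
```

The held-out `X_val` and `y_val` could then be scored with the model's `evaluate` method after each block of epochs; check the Keras 0.1.0 documentation for the exact signature, since the API has changed across versions.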

## Step 3. Generation


After you've finished training your model, it's time to generate some music!
Type the following command into Terminal:
>    python generate.py

After a while, you should have a file called `generated_song.wav`.
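Going the other way, generated frequency blocks have to be turned back into audio. A minimal sketch, assuming the same [real | imaginary] block layout as the training tensors and that the network's output has already been de-normalized with the saved mean and variance. It writes 16-bit PCM with the standard-library `wave` module, whereas `generate.py` may use a different writer; `fft_blocks_to_audio` and `write_wav_16bit` are hypothetical helpers.

```python
import io
import wave
import numpy as np

def fft_blocks_to_audio(blocks):
    """Invert a (num_blocks, 2 * block_size) array of [real | imaginary]
    FFT blocks back into a 1-D time-domain signal (assumed layout)."""
    half = blocks.shape[1] // 2
    spectra = blocks[:, :half] + 1j * blocks[:, half:]
    # For spectra of real audio the imaginary part of the inverse is ~0; drop it.
    return np.fft.ifft(spectra, axis=1).real.reshape(-1)

def write_wav_16bit(fileobj, signal, rate):
    """Write a mono float signal in [-1, 1] as 16-bit PCM."""
    pcm = (np.clip(signal, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())

rate = 44100
t = np.arange(rate) / rate
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s test tone as stand-in output
spectra = np.fft.fft(tone.reshape(-1, rate // 4), axis=1)
blocks = np.concatenate([spectra.real, spectra.imag], axis=1)
buffer = io.BytesIO()
write_wav_16bit(buffer, fft_blocks_to_audio(blocks), rate)
```

Passing a filename such as `'generated_song.wav'` instead of the `BytesIO` buffer writes the result to disk.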

Future work:
Improve the generation algorithms. Our current generation scheme uses the training/testing data as a seed sequence, which tends to produce verbatim copies of the original songs. One might improve these results by taking linear combinations of the hidden states for different songs, projecting those combinations back into the frequency space, and using them as seed sequences. You can find the core components of the generation algorithms in `gen_utils/seed_generator.py` and `gen_utils/sequence_generator.py`.
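The hidden-state idea can be made concrete with a toy sketch: mix two hidden-state vectors linearly, then map the mix back to frequency features with a linear read-out. Everything here is hypothetical: the hidden states are random, the sizes are toy values, and `W` and `b` stand in for whatever output layer a trained model uses. Extracting real hidden states from the Keras model is not shown.

```python
import numpy as np

def interpolate_hidden_states(h_a, h_b, alphas):
    """Linear combinations (1 - alpha) * h_a + alpha * h_b, one row per alpha."""
    alphas = np.asarray(alphas, dtype=float)[:, None]
    return (1.0 - alphas) * h_a[None, :] + alphas * h_b[None, :]

def project_to_frequency_space(hidden, W, b):
    """Hypothetical linear read-out from hidden units to frequency features."""
    return hidden @ W + b

rng = np.random.RandomState(0)
hidden_dims, freq_dims = 16, 8               # toy sizes, not GRUV's
h_song_a = rng.standard_normal(hidden_dims)  # e.g. a final hidden state for song A
h_song_b = rng.standard_normal(hidden_dims)  # e.g. a final hidden state for song B
W = rng.standard_normal((hidden_dims, freq_dims))
b = np.zeros(freq_dims)

mixed = interpolate_hidden_states(h_song_a, h_song_b, [0.0, 0.25, 0.5, 0.75, 1.0])
seed_frames = project_to_frequency_space(mixed, W, b)  # shape (5, freq_dims)
```

Because this toy read-out is linear, projecting the midpoint equals averaging the two projections; a real output layer with a nonlinearity would not have that property, which is part of what makes the idea worth testing on a trained model.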
