# General Structure

## test vs training
* Is it even possible to predict a test set with this little data? The notes and chords tell us what the test set should contain. If those notes or chords don't appear anywhere in the training data, do we have any chance of succeeding?
* Or should we just say fuck it and try? If we get reasonably far, we'll add a few more training inputs (from either Mingus or another band playing the Real Book), so we know how the test set could sound.
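One rough way to put a number on the first question: measure what fraction of the test set's notes or chords also occur somewhere in the training data. This is a minimal sketch that assumes notes and chords are stored as plain hashable tokens (strings, or `(note, duration)` tuples like the transcription below); `vocabulary_coverage` is a hypothetical helper, not something in `utilities`.

```python
def vocabulary_coverage(train_tokens, test_tokens):
    """Fraction of test tokens (notes, chords, or (note, duration) pairs)
    that also appear somewhere in the training tokens."""
    if not test_tokens:
        return 1.0  # nothing to cover
    seen = set(train_tokens)
    hits = sum(1 for token in test_tokens if token in seen)
    return float(hits) / len(test_tokens)


# A few chord symbols from the transcription below.
train_chords = ["F7#9", "Db7", "GbMaj7"]
test_chords = ["Eb7sus4", "Db7"]
print(vocabulary_coverage(train_chords, test_chords))  # 0.5
```

A low score doesn't prove failure, but it flags test material the model has never been shown.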

## inputs/outputs
* amplitude vs. time, or frequency vs. time (what kind of FFT?)
* Real Book vs. time
    * notes on treble, chords on bass?
    * how closely do we have to map it to the song (I think pretty fucking closely)
    * how much data per note/chord (there are modifiers, pauses, etc.)
    * Magenta's NoteSequence actually looks pretty good for this. Why not use their helper functions to create sequences of notes and chords for input?
* are we predicting the STFT or the waveform directly?
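To make the "frequency vs. time" option concrete, here is a minimal pure-Python short-time Fourier transform (a naive DFT per frame, no FFT speedup) showing what the model would see at each time step. The tiny frame size, hop, and Hann window are assumptions for illustration; `librosa.stft`, used later in this notebook, defaults to `n_fft=2048`, a hop of `n_fft // 4`, and a Hann window, returning a `[1 + n_fft/2, n_frames]` complex array.

```python
import cmath
import math


def stft_magnitude(signal, frame_size=8, hop=4):
    """Naive STFT magnitude: Hann-windowed frames, O(frame_size^2) DFT each.

    Returns a list of frames; each frame holds |X[k]| for k = 0..frame_size//2.
    """
    # periodic Hann window
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# A sine completing 2 cycles per 8-sample frame peaks in bin 2.
sig = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
frames = stft_magnitude(sig)
print(len(frames), len(frames[0]))  # 7 frames x 5 frequency bins
```

Predicting magnitudes like these is a smaller target per step than raw samples, but turning them back into audio needs phase (the complex STFT plus `istft`, or a phase-reconstruction method).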

## model RNN
* We are trying to model a band's interpretation of the Real Book.
* Does it make sense to build a GAN that goes from note input to sound output, and one that goes from sound output to which band? Absofuckinglutely.
* What are we trying to predict: the spectrogram or the waveform?
* simple RNN structures to try:
    * bidirectional
    * GRU vs LSTM
    * model that encodes all of the notes first, then produces the output all at once.
    * model that produces part of the waveform one time step at a time (as it reads the input). How does a bidirectional layer fit into this?
        * would the model need its output from the previous step as input?
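A toy sketch of the "one time step at a time" option: a single scalar recurrent cell that, at every step, sees the current input and its own previous output. Everything here (the cell equations, the placeholder weights, the name `generate_autoregressive`) is a hypothetical illustration, not the notebook's LSTM.

```python
import math


def generate_autoregressive(notes, w_x=0.8, w_y=0.5, w_h=0.3, w_o=1.0):
    """Toy scalar recurrent cell run one step at a time.

    Each step combines the current input, the previous hidden state, and the
    previous *output* (fed back in). Weights are arbitrary, not trained.
    """
    h, y_prev = 0.0, 0.0
    outputs = []
    for x in notes:
        h = math.tanh(w_x * x + w_y * y_prev + w_h * h)
        y = math.tanh(w_o * h)  # squashed to (-1, 1), like a waveform sample
        outputs.append(y)
        y_prev = y  # feed this step's output into the next step
    return outputs


# Identical inputs give different outputs, because each step also sees
# the previous output and hidden state.
print(generate_autoregressive([1.0, 1.0, 1.0]))
```

A bidirectional layer doesn't fit this streaming mode directly: its backward pass needs the inputs after step t before it can emit step t. It fits the "encode all of the notes first" variant instead, where the whole input is available up front.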

### Measures. Format is: note, duration (as a fraction of a whole note, so each 4/4 measure sums to 1)

* #### Pre
    * C (low) 1/8
* #### 1
    * notes
    * F 1/8, A 1/8, A 1/12, F 1/12, A 1/12, B 1/8, A 1/4, F 1/16, D 1/16
    * chords
    * F7#9 (F - A - C - Eb - G#)
    * Db7 (Db - F - Ab - B)
   
* #### 2
    * notes
    * F 1/8, A 1/4, F 1/16, E 1/16, F 3/8, C 1/8
    * chords
    * GbMaj7 (Gb - Bb - Db - F)
    * B#11 7 (B - A - D# - E#)???
* #### 3
    * notes
    * F 1/8, A 1/8, A 1/12, F 1/12, A 1/12, Cb 1/8, B 1/4, F 1/16, E 1/16
    * chords
    * Eb7sus4 (Eb - Ab - Bb - Db) 
    * Db7 (Db - F - Ab - B)
* #### 4
    * notes
    * F 1/8, A 1/4, F 1/16, E 1/16, F 3/8, F 1/8
    * chords
    * Eb7sus4 (Eb - Ab - Bb - Db)
    * F7 (F - A - C - Eb)
* #### 5
    * notes
    * C 1/8, E (octave up) 1/8, E (octave up) 1/12, F 1/12, A 1/12, B 1/8, A 1/4, F 1/16, A 1/16
    * chords
    * Bbm7 (Bb - Db - F - Ab)
    * Db7 (Db - F - Ab - B)
* #### 6
    * notes
    * C 1/8, F (octave up) 1/8, F (octave up) 1/12, F 1/12, B 1/12, E (octave up) 2/12, C 1/12, A 1/12, E# 1/12, D (low) 1/12
    * chords
    * Gm7 (G - Bb - D - F)
    * C7#5 (C - E - G# - Bb)
* #### 7 ?
    * notes
    * A 1/6, Cb 1/6, F 1/6, E# 1/6, C 1/6, G 1/6
    * chords
    * D7 (D - F# - A - C)
    * G7 (G - B - D - F)
* #### 8
    * notes
    * A 1/2, F 3/8, Cb 1/16, B 1/16
    * chords
    * Db7 (Db - F - Ab - B)
    * GbMaj7 (Gb - Bb - Db - F)
* #### 9
    * notes
    * A 1/2, F 1/2
    * chords
    * B^7_6 ?
    * Bb7 (Bb - D - F - Ab)
* #### 10
    * notes
    * B 1/8, A 1/8, F 1/8, E 1/8, Cb 1/8, B 1/8, A 1/8, F 1/8
    * chords
    * C7 (C - E - G - Bb)
    * Eb7 (Eb - G - Bb - Db)
* #### 11
    * notes
    * A 1/2, F 1/2
    * chords
    * F7#9 (F - A - C - Eb - G#)
    * Db7 (Db - F - Ab - B)
* #### 12
    * notes
    * C 1/6, B 1/6, A 1/6, F 1/6, E 1/6, C (low) 1/6
    * chords
    * GbMaj#11 G Bb C D Gb (Gb Bb B Db F?)
    * B7b5 (B - Eb - F - A)
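The transcription above can be kept as data and sanity-checked: every measure's durations should sum to exactly one whole note, and converting durations to sample counts shows how tightly the notes must line up with the recording. A minimal sketch covering measures 1 and 2; the `MEASURES` layout, the helper names, and the 4-beats-per-whole-note tempo conversion are assumptions for illustration.

```python
from fractions import Fraction as frac

# Measures 1 and 2 of the transcription above as (note, duration) tuples.
# Durations are fractions of a whole note, so each 4/4 measure sums to 1.
MEASURES = {
    1: [("F", frac(1, 8)), ("A", frac(1, 8)), ("A", frac(1, 12)), ("F", frac(1, 12)),
        ("A", frac(1, 12)), ("B", frac(1, 8)), ("A", frac(1, 4)), ("F", frac(1, 16)),
        ("D", frac(1, 16))],
    2: [("F", frac(1, 8)), ("A", frac(1, 4)), ("F", frac(1, 16)), ("E", frac(1, 16)),
        ("F", frac(3, 8)), ("C", frac(1, 8))],
}


def measure_length(notes):
    """Total duration of a measure, in whole notes (exact arithmetic)."""
    return sum((duration for _, duration in notes), frac(0))


def to_sample_counts(notes, bpm, sample_rate=44100):
    """Convert durations to sample counts, assuming 4 beats per whole note.

    Each note is rounded independently, so long sequences can drift by a few samples.
    """
    seconds_per_whole = 4 * 60.0 / bpm
    return [(name, int(round(float(d) * seconds_per_whole * sample_rate)))
            for name, d in notes]


for number, notes in sorted(MEASURES.items()):
    assert measure_length(notes) == 1, "measure %d is not one whole note" % number
```

For example, `to_sample_counts(MEASURES[1], bpm=60)` gives 22050 samples per eighth note and 14700 per triplet eighth. At 60 bpm a measure lasts 4 s, the same length as the `total_duration=4` passed to `append_array` below (assuming that argument is in seconds).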

In [None]:
from datetime import datetime
from utilities import array_statistics, read_wav, append_array
import IPython.display as ipd
import librosa, librosa.display
import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf


input_data = []  # [(filename, rb_seq, sr), ...]
measure_1 = np.zeros((1, ))
#x, sr = append_array(x, 22050, total_duration=5, octave=3, note_array=[('C', 1/8.0)])
measure_1, sr = append_array(measure_1, 44100, total_duration=4, octave=4, note_array=[('F', 1/8.0), ('A', 1/8.0), ('A', 1/12.0), ('F', 1/12.0), ('A', 1/12.0), 
                                ('B', 1/8.0), ('A', 1/4.0), ('F', 1/16.0), ('D', 1/16.0)]) 

input_data.append(('pork-pie1.wav', measure_1, sr))


GPPH_DATA_DIRECTORY = '/Users/pbatra/projects/lil_wayne/data/10_17_2018'

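# sorted so the file order (and the index-based input/output pairing below) is deterministic;
# only .wav files are read, which skips stray files such as .DS_Store
gpph_files = sorted(f for f in os.listdir(GPPH_DATA_DIRECTORY)
                    if f.endswith('.wav') and os.path.isfile(os.path.join(GPPH_DATA_DIRECTORY, f)))
print(gpph_files)
output_data = []  # [(filename, seq, sr), ...]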

for file_ in gpph_files:
    filename = os.path.join(GPPH_DATA_DIRECTORY, file_)
    x, sr = read_wav(filename)
    output_data.append((file_, x, sr))
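The training cell further down pairs `input_data[b]` with `output_data[b]` by position, which relies on the matching recording landing at the same index in both lists. A minimal sketch of pairing on filename instead, assuming both lists use the `(filename, samples, sr)` layout from the comments above; `pair_by_filename` is a hypothetical helper, not part of `utilities`.

```python
def pair_by_filename(inputs, outputs):
    """Match (filename, samples, sr) tuples from two lists on their filename.

    Returns [(filename, input_samples, output_samples, sr), ...] in the order of
    `inputs`, skipping filenames that appear in only one list.
    """
    by_name = {name: (samples, sr) for name, samples, sr in outputs}
    pairs = []
    for name, in_samples, in_sr in inputs:
        if name not in by_name:
            continue
        out_samples, out_sr = by_name[name]
        if in_sr != out_sr:
            raise ValueError("sample rates differ for %s: %s vs %s" % (name, in_sr, out_sr))
        pairs.append((name, in_samples, out_samples, in_sr))
    return pairs
```

With the data above this would be called as `pair_by_filename(input_data, output_data)`; the synthesized measure is registered as `'pork-pie1.wav'`, so it only pairs if the recording in the data directory has exactly that name.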


    


In [None]:
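def sample_array_dumbly(x, window):
    """Replace each block of `window` samples with the block mean.
    Samples in a trailing partial block are left at zero."""
    total_steps = len(x)
    y = np.zeros((total_steps,))
    for idx in range(total_steps // window):  # integer division, also valid in Python 3
        y[(idx * window):((idx + 1) * window)] = np.mean(x[(idx * window):((idx + 1) * window)])
    return y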

def sample_array_tft(x, window):
    """Zero every STFT bin whose magnitude is below `window` times the mean
    magnitude of its frame, then invert back to a waveform."""
    array_statistics(x, 44100)
    tft_x = librosa.stft(x)  # complex, shape [1 + n_fft/2, n_frames]
    print(tft_x.shape)
    mean_values = np.mean(np.abs(tft_x), axis=0)  # per-frame mean magnitude
    print("original non-zero amplitudes: %s" % np.count_nonzero(tft_x))
    tft_x[(np.abs(tft_x) - window * mean_values) < 0] = 0
    print("reduced non-zero amplitudes: %s" % np.count_nonzero(tft_x))
    itft_tft_x = librosa.istft(tft_x)
    array_statistics(itft_tft_x, 44100)
    return tft_x, itft_tft_x

#print filepath
#ipd.Audio(filepath)
#tft_x, itft_tft_x = sample_array_tft(output_data[0][1], 0.5)
#ipd.Audio(itft_tft_x, rate=sr)
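The masking step in `sample_array_tft` keeps a bin only if its magnitude is at least `window` times the mean magnitude of its frame (the mean over axis 0 of the `[freq, time]` STFT). A plain-Python restatement of that rule on a tiny hand-made matrix, useful for checking the logic by hand; `threshold_per_frame` is a hypothetical name, and the input is assumed to be a non-empty list of rows.

```python
def threshold_per_frame(magnitudes, factor):
    """Zero every bin whose magnitude is below factor * (mean magnitude of its frame).

    `magnitudes` is a list of rows (frequency bins) by columns (frames),
    matching the [freq, time] layout of librosa.stft output.
    """
    n_bins = len(magnitudes)
    n_frames = len(magnitudes[0])
    frame_means = [sum(magnitudes[f][t] for f in range(n_bins)) / float(n_bins)
                   for t in range(n_frames)]
    return [[m if m >= factor * frame_means[t] else 0.0 for t, m in enumerate(row)]
            for row in magnitudes]


mags = [[1.0, 4.0],
        [3.0, 0.0]]  # 2 frequency bins x 2 frames; each frame's mean is 2.0
print(threshold_per_frame(mags, 1.0))  # [[0.0, 4.0], [3.0, 0.0]]
```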

In [None]:
ipd.Audio(output_data[0][1], rate=output_data[0][2])

In [None]:
ipd.Audio(input_data[0][1], rate=input_data[0][2])

In [None]:
def create_placeholders(hyperparameters):
    """
    create placeholders for input data
    """
    x = tf.placeholder(tf.float32, shape=[hyperparameters['batch_size'], 
                                          hyperparameters['time_steps'], 
                                          hyperparameters['input_features']], 
                                          name = 'x') 
    y_true = tf.placeholder(tf.float32, shape=[hyperparameters['batch_size'], 
                                          hyperparameters['time_steps'], 
                                          hyperparameters['input_features']], 
                                          name = 'y_true')  # target wav in [-1, 1]; last dim must match the LSTM's num_proj=1
    return x, y_true

In [None]:
def forward_propagation(x, hyperparameters):
    """
    returns -1 to 1 output
    """
    cell = tf.nn.rnn_cell.LSTMCell(hyperparameters['num_units'],
                                   use_peepholes=False,
                                   cell_clip=None,
                                   initializer=tf.contrib.layers.xavier_initializer(),
                                   num_proj=1,
                                   proj_clip=1.0,
                                   num_unit_shards=None,
                                   num_proj_shards=None,
                                   forget_bias=1.0,
                                   state_is_tuple=True,
                                   activation=tf.nn.tanh,
                                   reuse=None,
                                   name=None,
                                   dtype=tf.float32
                                  )
    outputs, state = tf.nn.dynamic_rnn(cell=cell,
                                      inputs=x,
                                      sequence_length=None,
                                      initial_state=None,
                                      dtype=tf.float32,
                                      parallel_iterations=None,
                                      swap_memory=False,
                                      time_major=False,
                                      scope=None
                                  )
                                   
    return outputs, state

In [None]:
def compute_cost(outputs, y_true):
    """
    Computes the cost
    """
    cost = tf.losses.mean_squared_error(labels=y_true, predictions=outputs)
    return cost

In [None]:
def run_model(training_seq, true_seq):
    """Train the LSTM to map training_seq to true_seq (both [batch, time_steps, features]).
    Returns the per-epoch costs and predictions."""
    hyperparameters = {}
    hyperparameters['input_features'] = training_seq.shape[2]  # raw input wav for now; later maybe chords, Hz, etc.
    hyperparameters['batch_size'] = training_seq.shape[0]
    hyperparameters['time_steps'] = training_seq.shape[1]
    hyperparameters['num_units'] = 100
    hyperparameters['learning_rate'] = 0.009
    hyperparameters['training_epochs'] = 100
    # print about 10 times per run; max() avoids modulo-by-zero with fewer than 10 epochs
    hyperparameters['display_step'] = max(1, hyperparameters['training_epochs'] // 10)
    output_results = {}
    model_start_time = datetime.now()

    costs  = []
    predictions = []


    tf.reset_default_graph()
    tf.set_random_seed(1)
    x, y_true = create_placeholders(hyperparameters)
    outputs, state = forward_propagation(x, hyperparameters)
    cost = compute_cost(outputs, y_true)
    optimizer = tf.train.AdamOptimizer(learning_rate = hyperparameters['learning_rate']).minimize(cost)
    #variable_saver = tf.train.Saver() #in case we just want to reload variables at some point

    with tf.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)
        #writer = tf.summary.FileWriter(folder + '_logs', sess.graph)
        #writer.close()
        # Train
        for epoch in range(hyperparameters['training_epochs']):
            epoch_start_time = datetime.now()
            #randomly shuffle training_indices
            #np.random.shuffle(randomized_training_indices)
            #for mb_index in range(0, training_size, hyperparameters['minibatch_size']):
            #    mb_indices = randomized_training_indices[mb_index:mb_index + hyperparameters['minibatch_size']]
            #    mb_training_input = training_input[mb_indices]
            #    mb_training_output=  training_output[mb_indices]
            _, cost_step, prediction_step = sess.run(
                (optimizer, cost, outputs),
                feed_dict={x: training_seq, y_true: true_seq})
            costs.append(cost_step)
            predictions.append(prediction_step)
            if epoch % hyperparameters['display_step'] == 0:
                print("epoch: %s, %s, %s" % (epoch, cost_step, datetime.now() - epoch_start_time))

        # final epoch
        print("epoch: %s, %s, %s" % (epoch, cost_step, datetime.now() - epoch_start_time))
        # DONE
        print("\ttotal_time: %s" % (datetime.now() - model_start_time))
        return costs, predictions

In [None]:
batches = 1
time_steps = 50000
training_batches = np.zeros([batches, time_steps,1])
output_batches = np.zeros([batches, time_steps, 1])
for b in range(batches):
    training_batches[b,:,0] = input_data[b][1][:time_steps]
    output_batches[b,:,0] = output_data[b][1][:time_steps]
costs, predictions = run_model(training_batches, output_batches)
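The cell above trains on only the first 50000 samples of one recording (about 1.1 s at 44.1 kHz). A minimal sketch of turning a whole recording into several training examples instead, assuming the input and target waveforms are aligned sample-for-sample; `split_into_windows` is a hypothetical helper.

```python
def split_into_windows(samples, time_steps, hop=None):
    """Cut a 1-D sequence into windows of exactly `time_steps` samples.

    `hop` defaults to `time_steps` (no overlap) and must be positive.
    A trailing partial window is dropped.
    """
    if hop is None:
        hop = time_steps
    return [samples[start:start + time_steps]
            for start in range(0, len(samples) - time_steps + 1, hop)]


# 10 samples, windows of 4 with no overlap -> [[0, 1, 2, 3], [4, 5, 6, 7]]
windows = split_into_windows(list(range(10)), 4)
```

Slicing works the same on NumPy arrays, so `np.stack(split_into_windows(x, time_steps))[:, :, np.newaxis]` would give a `[batches, time_steps, 1]` array. Apply the same call to the target waveform so row `b` of the input still lines up with row `b` of the output.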


In [None]:
array_statistics(training_batches[0,1000:2000,0], sr=44100)

In [None]:
print(len(predictions))
epoch = 0  # index into the per-epoch predictions; -1 is the last epoch
print(predictions[epoch].shape)
playable = np.squeeze(predictions[epoch][0,:,0])
delta = np.squeeze(predictions[epoch][0,:,0] - output_batches[0,:,0])
array_statistics(playable[1000:2000], 44100)
array_statistics(delta, 44100)
ipd.Audio(playable, rate = 44100)
#ipd.Audio(delta, rate = 44100)

In [None]:
playable = np.squeeze(output_batches[0,:,0])
array_statistics(playable, 44100)
ipd.Audio(playable, rate = 44100)