# Pitch detection

* The next case is to reconstruct which note is being played from the raw signal
* Experimental setup:
    1. Generate a sequence of notes using the music generator
    2. Render it to a wave using the synthesizer
    3. Chop the wave data into small batches of 1/10 of a second
    4. Calculate frequency features using Fourier analysis on these batches
    5. For each of those batches, predict which note is being played
        * Time resolution equals the batch size, so a single batch may overlap more than one note
    6. Treat it as a multi-label classification problem
        * Each class corresponds to a note being played; multiple notes can be played together, so more than one class can be active per batch
    7. Train a model with a Gated Recurrent Unit as classifier
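
Steps 3 and 4 can be sketched with plain NumPy. This is a minimal illustration, not the setup's actual code: the 44100 Hz sample rate, the function name `frequency_features`, and the sine wave standing in for synthesizer output are all assumptions.

```python
import numpy as np

SAMPLE_RATE = 44100    # assumed sample rate in Hz (not stated in the text)
WINDOW_SECONDS = 0.1   # batch length from step 3


def frequency_features(wave, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Split a 1-D wave into non-overlapping windows and return FFT magnitudes."""
    window_size = int(round(sample_rate * window_seconds))
    n_windows = len(wave) // window_size  # the incomplete tail is dropped
    frames = wave[:n_windows * window_size].reshape(n_windows, window_size)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))  # one spectrum per window
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    return spectrum, freqs


# One second of a 440 Hz sine as a stand-in for the rendered wave.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
wave = np.sin(2 * np.pi * 440.0 * t)

features, freqs = frequency_features(wave)
peak_hz = freqs[features[0].argmax()]  # strongest frequency in the first window
print(features.shape, peak_hz)
```

With a 0.1 s window the frequency bins are 10 Hz apart, which is coarse for low notes; a longer window gives finer frequency resolution at the cost of time resolution.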

# How does a Gated Recurrent Unit (GRU) work?

Before we can explain a GRU, we need to understand an RNN.

## What is an RNN?

A **recurrent** neural network. Normal feed-forward neural networks predict based only on an input vector $x$, but an RNN also keeps a hidden state that is updated by each input $x_t$ and carried over to the next timestep.

* This is particularly useful for time-series where the data can have **arbitrary length**.

* Output vector $o_t$ at timestep $t$
* Input vector $x_t$ at timestep $t$
* Hidden state vector $h_t$ at timestep $t$
* The matrices $U$, $V$ and $W$ are subject to optimization (learning)

<img src="images/rnn.png">
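
A minimal NumPy sketch of the recurrence may help here. The roles of the matrices are an assumption based on the common convention ($U$: input to hidden, $W$: hidden to hidden, $V$: hidden to output); the `tanh` activation, the sizes, and the name `rnn_forward` are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2  # hypothetical sizes

# Assumed roles: U maps input to hidden, W hidden to hidden, V hidden to output.
U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
V = rng.normal(size=(output_dim, hidden_dim))


def rnn_forward(xs, h0):
    """Run the recurrence over a sequence of any length."""
    h = h0
    outputs = []
    for x_t in xs:
        h = np.tanh(U @ x_t + W @ h)  # new hidden state from input and old state
        outputs.append(V @ h)         # output o_t at this timestep
    return np.array(outputs), h


xs = rng.normal(size=(7, input_dim))  # a sequence of 7 timesteps
outputs, h_final = rnn_forward(xs, np.zeros(hidden_dim))
print(outputs.shape, h_final.shape)
```

The same three matrices are reused at every timestep, which is why the sequence length does not have to be fixed in advance.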

Other than the dimensionality of the hidden state, we have more decisions to make:
* How many samples do we put in the input vector $x$: just one, or a batch of samples?
* How large are we going to make our batch? We can only update our weights after running one full batch.
* What are we going to do with incomplete batches?

In Keras the input to an RNN should be a tensor of rank 3, with shape `(batch_size, timesteps, input_dim)`:
* `batch_size`: number of sequences per batch
* `timesteps`: number of timesteps per sequence
* `input_dim`: dimensionality of the input vector at each timestep
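
The rank-3 layout can be illustrated with plain NumPy. The sizes below are hypothetical, and dropping the incomplete trailing sequence is only one possible answer to the question about incomplete batches.

```python
import numpy as np

timesteps, input_dim = 5, 16  # hypothetical sizes
# 103 consecutive feature vectors, e.g. one per 0.1 s window.
features = np.random.rand(103, input_dim).astype("float32")

n_sequences = len(features) // timesteps  # 103 // 5 = 20, the last 3 vectors are dropped
x = features[:n_sequences * timesteps].reshape(n_sequences, timesteps, input_dim)

print(x.shape)  # (20, 5, 16)
```

The first axis here holds all sequences; during training Keras slices it into batches of `batch_size` sequences.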

### Possible ways of designing our experiments

* $x$ is one sample: we train the GRU on batches of ~4000 timesteps (0.1 s) with `input_dim` = 1
* $x$ is an STFT frame: `input_dim` = STFT dimensionality, and we have only a couple of timesteps
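* In `Keras` there is the parameter `return_sequences` when instantiating an RNN layer. The default is `False`, so only the output of the last timestep is returned. When set to `True` it corresponds to the picture above, where for every timestep $t$ there is an output prediction $o_t$.
* In `Keras` we can also set `stateful=True`. The hidden state is then not reset at the end of a batch: the final state of each sequence in a batch is used as the initial state of the sequence at the same index in the next batch, until the state is reset explicitly.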
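
The effect of `return_sequences` on the output shape can be checked directly. This sketch assumes TensorFlow 2.x with its bundled Keras; the layer width of 8, the 12 note classes, and the sigmoid `Dense` head are illustrative choices, not the model used in the experiment.

```python
import tensorflow as tf

batch_size, timesteps, input_dim = 2, 5, 16  # hypothetical sizes
n_notes = 12                                 # hypothetical number of note classes

x = tf.zeros((batch_size, timesteps, input_dim))

# Default (return_sequences=False): only the output at the last timestep.
last_only = tf.keras.layers.GRU(8)(x)

# return_sequences=True: one output per timestep, as in the picture above.
per_step = tf.keras.layers.GRU(8, return_sequences=True)(x)

# Independent per-note scores at every timestep, since notes can sound together.
scores = tf.keras.layers.Dense(n_notes, activation="sigmoid")(per_step)

print(last_only.shape, per_step.shape, scores.shape)
```

A `Dense` layer applied to a rank-3 tensor acts on the last axis, so the same weights score every timestep.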

## Gated Recurrent Unit

<img src="images/gru.png">

A GRU is an RNN with the following components:
* A hidden state: this is the recurrent vector, i.e. memory
* A reset operation, `r[t]`, with a sigmoid (sigmoid has values between 0 and 1). Implemented as element-wise multiplication.
* A _proposed_ hidden state, $\hat{h}_t$, computed via the `tanh` branch
* A set operation, `z[t]`. Here too, the sigmoid with values between 0 and 1 acts as an element-wise decision

The set operation in detail:

$$h_t = (1 - z_t) \cdot h_{t-1} + z_t \cdot \hat{h}_t$$

which is 

* $h_{t-1}$ for $z_t = 0$, i.e. keep the old state
* $\hat{h}_{t}$ for $z_t = 1$, i.e. switch to the proposed state

N.B. In the formula the symbols represent vectors and the product operator operates element-wise.
Each component of the hidden state can therefore be set or kept independently of the others.
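
The components listed above can be combined into a single GRU step in NumPy. This is a sketch under assumptions: the parameter names (`W_r`, `U_r`, `b_r`, and so on) are hypothetical, and the reset gate is applied to the previous hidden state before the matrix product, which is one common variant. The final line follows the set operation given above.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, p):
    """One GRU step; `p` is a dict of hypothetical parameter names."""
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])  # reset gate, in (0, 1)
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])  # set gate, in (0, 1)
    # Proposed state: the reset gate scales the old state element-wise.
    h_hat = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])
    # Set operation: z = 0 keeps the old state, z = 1 switches to the proposal.
    return (1.0 - z) * h_prev + z * h_hat


rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # hypothetical sizes
params = {}
for gate in "rzh":
    params["W_" + gate] = rng.normal(size=(hidden_dim, input_dim))
    params["U_" + gate] = rng.normal(size=(hidden_dim, hidden_dim))
    params["b_" + gate] = np.zeros(hidden_dim)

x_t = rng.normal(size=input_dim)
h_prev = rng.normal(size=hidden_dim)
h_next = gru_step(x_t, h_prev, params)
print(h_next.shape)
```

Pushing the bias `b_z` strongly negative drives $z_t$ towards 0, so the step returns the old state almost unchanged; a strongly positive bias makes it return the proposed state.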

## Backup: sigmoid & hyperbolic tangent

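```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Assumes TensorFlow 2.x, where eager execution replaces `tf.Session`.
x = tf.range(-10.0, 10.0, delta=0.1)  # float bounds give a float32 tensor
sigmoid = tf.sigmoid(x)               # values in (0, 1), used for the gates
tanh = tf.tanh(x)                     # values in (-1, 1), used for the proposed state

plt.plot(x.numpy(), sigmoid.numpy(), label='sigmoid')
plt.plot(x.numpy(), tanh.numpy(), label='tanh')
plt.legend()
```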