# Implementing the `MiniBach` model

## Part 2: From music data to one-hot-encoded arrays

In this step, we take the pre-processed 4-measure chunks of the chorales and encode them into the input representation of the neural network.

Originally, the chunks specify whether there is a note, *hold* (encoded as `--`), or rest symbol at any given sixteenth note. When the event is a note, the pitch and octave (e.g., `C4`) are specified.

We turn those events into numbers that will eventually be **one-hot-encoded** in the final input vector representation.

At the end of this notebook, the one-hot-encoded vectors of the input (soprano voice) and output (alto, tenor, and bass) are saved as the `numpy` arrays `input.npy` and `output.npy`.


In [None]:
import music21
import pandas as pd
import os
import numpy as np

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 100)

We read the `dataset.csv` generated in the first part of this tutorial.

In [2]:
df = pd.read_csv('dataset.csv')

The 4-measure chunks are named in the following way:

`<name_of_the_chorale>_chunk_<number_of_chunk>`

For example, `chor002.krn_chunk_0`.

This name is encoded in the column `file` of the dataset. Therefore, we can iterate over each `chunk` in the dataset and encode its `soprano` as `x` and `(alto, tenor, bass)` as `y` values for the neural network.

In [3]:
chunks = list(sorted(set(df.file.to_list())))
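
One detail worth knowing: `sorted` compares the names as strings, so `chunk_10` is placed before `chunk_2`. This does not affect the encoding, but if you want the chunks in their musical order, a minimal sketch could look like the following. The helper `split_chunk_name` and the sample names are hypothetical and only assume the naming scheme described above.

```python
def split_chunk_name(name):
    """Split '<chorale>_chunk_<n>' into (chorale, n). Hypothetical helper."""
    chorale, number = name.rsplit('_chunk_', 1)
    return chorale, int(number)

# Sample names following the naming scheme of the dataset (illustrative only).
names = ['chor005.krn_chunk_10', 'chor005.krn_chunk_2', 'chor005.krn_chunk_1']

# Plain sorted() compares strings, so 'chunk_10' sorts before 'chunk_2'.
lexicographic = sorted(names)

# Sorting on the parsed (chorale, number) tuple gives the numeric chunk order.
numeric = sorted(names, key=split_chunk_name)
```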

The `MiniBach` architecture considers a different range of notes for each part. The book does not specify the actual ranges, only the number of notes in the range of each part:


| Part    |  Range   |
|---------|----------|
| Soprano | 20 notes |
| Alto    | 20 notes |
| Tenor   | 20 notes |
| Bass    | 27 notes |

Nevertheless, in my experience, it proved impossible to find a collection of note ranges that satisfies these constraints and works for all Bach chorales with a 4/4 time signature.

A possible explanation for this is that the book used a smaller set of chorales (it doesn't mention how many chorales were used for training).

The narrowest ranges I could come up with that work for all of these chorales are the following:

In [4]:
SOPRANO_MIN = 57
SOPRANO_MAX = 81

ALTO_MIN = 52
ALTO_MAX = 74

TENOR_MIN = 48
TENOR_MAX = 69

BASS_MIN = 36
BASS_MAX = 64

ranges = {
    'soprano': {midinumber: (midinumber - SOPRANO_MIN + 1) for midinumber in range(SOPRANO_MIN, SOPRANO_MAX + 1)},
    'alto': {midinumber: (midinumber - ALTO_MIN + 1) for midinumber in range(ALTO_MIN, ALTO_MAX + 1)},
    'tenor': {midinumber: (midinumber - TENOR_MIN + 1) for midinumber in range(TENOR_MIN, TENOR_MAX + 1)},
    'bass': {midinumber: (midinumber - BASS_MIN + 1) for midinumber in range(BASS_MIN, BASS_MAX + 1)},
}

for part, notes in ranges.items():
    print(f'the {part} has a range of {len(notes)} notes (plus the "hold" symbol)')

the soprano has a range of 25 notes (plus the "hold" symbol)
the alto has a range of 23 notes (plus the "hold" symbol)
the tenor has a range of 22 notes (plus the "hold" symbol)
the bass has a range of 29 notes (plus the "hold" symbol)


Using these ranges of course has implications for the neural network, as the input and output vectors will be larger. For this experiment, I decided to compromise and use a larger network in order to keep all the training examples.

> Alternatively, you can also try to adjust the voice ranges to the sizes described in `MiniBach` in order to have a smaller network. For example, you can ignore outlier examples that exceed the ranges.
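
As a rough illustration of that alternative, the sketch below drops every chunk that contains a note outside a narrower range. The bounds `NARROW_MIN` and `NARROW_MAX`, the helper `fits`, and the toy data are all made up for this example; they are not taken from the book or from the dataset.

```python
# Hypothetical narrower soprano range with exactly 20 notes (not from the book).
NARROW_MIN, NARROW_MAX = 60, 79

# Toy data: chunk name -> soprano MIDI numbers (made up for illustration).
soprano_midi = {
    'chunk_a': [60, 62, 64, 65],
    'chunk_b': [57, 60, 64, 67],  # 57 lies below the narrow range
    'chunk_c': [72, 74, 76, 81],  # 81 lies above the narrow range
}

def fits(notes, low, high):
    """Return True when every note lies inside the closed interval [low, high]."""
    return all(low <= n <= high for n in notes)

# Keep only the chunks whose notes all fit in the narrow range.
kept = [name for name, notes in soprano_midi.items()
        if fits(notes, NARROW_MIN, NARROW_MAX)]
```

In the real dataset, the same check would have to be applied to all four voices of a chunk before deciding whether to keep it.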

The ranges used in this implementation are the following:

| Part    |        Range        |
|---------|---------------------|
| Soprano | A3 to A5 (25 notes) |
| Alto    | E3 to D5 (23 notes) |
| Tenor   | C3 to A4 (22 notes) |
| Bass    | C2 to E4 (29 notes) |
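
The note names in this table can be cross-checked against the MIDI limits defined earlier. The snippet below is a small sketch that assumes the common convention in which middle C (MIDI 60) is called C4; `midi_to_name` is a hypothetical helper written for this check, not a `music21` function.

```python
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def midi_to_name(midi):
    """Name a MIDI number, assuming the convention where MIDI 60 is C4."""
    return f'{NOTE_NAMES[midi % 12]}{midi // 12 - 1}'

# Same limits as the *_MIN / *_MAX constants defined earlier in the notebook.
limits = {'soprano': (57, 81), 'alto': (52, 74), 'tenor': (48, 69), 'bass': (36, 64)}

# part -> (lowest note, highest note, number of notes in the range)
table = {part: (midi_to_name(lo), midi_to_name(hi), hi - lo + 1)
         for part, (lo, hi) in limits.items()}
```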

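Each time step needs one slot per note in the range plus one slot for the *hold* symbol, and each chunk has 16 sixteenth notes per measure over 4 measures. The size of the input vector is therefore

$$
(25 + 1) \times 16 \times 4 = 1664
$$

The size of the output vector is

$$
((23 + 1) + (22 + 1) + (29 + 1)) \times 16 \times 4 = 4928
$$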
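
The same arithmetic can be written as a short check, which is handy if you decide to change the ranges. The names `STEPS`, `range_sizes`, and `vector_size` are introduced here only for illustration.

```python
STEPS = 16 * 4  # sixteenth notes per 4/4 measure, times measures per chunk

# Number of notes in each range, as listed in the table above.
range_sizes = {'soprano': 25, 'alto': 23, 'tenor': 22, 'bass': 29}

def vector_size(parts):
    """Each time step holds one slot per note plus one slot for "hold"."""
    return sum(range_sizes[p] + 1 for p in parts) * STEPS

input_size = vector_size(['soprano'])
output_size = vector_size(['alto', 'tenor', 'bass'])
```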

The function `encode_note` translates the notes and symbols in the dataset into the indices we will use in our one-hot encoding. The `hold` symbol has the special index `0`. Rests do not get a symbol of their own (as in the description of the book); in this implementation they are mapped to index `0` as well. The function `one_hot_encode` then turns an index into a vector with a single `1` at that position.

In [5]:
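def encode_note(n, rang):
    """Map a dataset symbol to its index within the range of the part `rang`."""
    if n == '--' or n == 'Rest':
        # Holds and rests share the special index 0
        ret = 0
    else:
        # Note names such as 'C4' are converted to MIDI numbers by music21
        note = music21.note.Note(n)
        ret = ranges[rang][note.pitch.midi]
    return ret

def one_hot_encode(idx, rang):
    """Return a list with a single 1 at position `idx`."""
    # One slot per note in the range, plus slot 0 for the hold symbol
    length = len(ranges[rang])
    ret = [0] * (length + 1)
    ret[idx] = 1
    return ret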
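
If building Python lists one index at a time becomes slow, one possible alternative is to index into an identity matrix, which one-hot encodes a whole sequence of indices at once. This is only a sketch of that technique, not the approach used in this notebook; `one_hot_rows` and its parameters are hypothetical names.

```python
import numpy as np

def one_hot_rows(indices, n_notes):
    """Return one row per index, each of width n_notes + 1 (slot 0 is "hold")."""
    # Row i of the identity matrix is exactly the one-hot vector for index i.
    return np.eye(n_notes + 1, dtype=int)[np.asarray(indices)]

# Three soprano time steps: hold, lowest note, highest note.
rows = one_hot_rows([0, 1, 25], n_notes=25)
```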

In [8]:
x = []
y = []
for chunk in chunks:
    print(f'Processing {chunk}...')
    dfchunk = df[df.file == chunk]
    # Input: one one-hot vector per sixteenth note of the soprano, flattened
    s = dfchunk.soprano.apply(encode_note, args=('soprano',))
    xi = np.array([one_hot_encode(idx, 'soprano') for idx in s]).reshape(-1)
    # Output: alto, tenor, and bass, each encoded with its own range
    a = dfchunk.alto.apply(encode_note, args=('alto',))
    t = dfchunk.tenor.apply(encode_note, args=('tenor',))
    b = dfchunk.bass.apply(encode_note, args=('bass',))
    ya = np.array([one_hot_encode(idx, 'alto') for idx in a])
    yt = np.array([one_hot_encode(idx, 'tenor') for idx in t])
    yb = np.array([one_hot_encode(idx, 'bass') for idx in b])
    # axis=None flattens each array first: all alto steps, then tenor, then bass
    yi = np.concatenate((ya, yt, yb), axis=None)
    x.append(xi)
    y.append(yi)

Processing chor002.krn_chunk_0...
Processing chor002.krn_chunk_1...
Processing chor002.krn_chunk_2...
...

In [9]:
x = np.array(x)
y = np.array(y)

In [10]:
np.save('input.npy', x)
np.save('output.npy', y)

We have now stored the given melody (soprano) and the accompanying voices (alto, tenor, and bass) as one-hot-encoded vectors.

These vectors can be used directly as training data for the neural network.

At this point, the values of those one-hot encoded representations are barely recognizable as music information. They are mostly large arrays of 1s and 0s.
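
The encoding is reversible, though. As a minimal sketch, a flat soprano vector can be turned back into MIDI numbers by reshaping it into time steps and taking the position of the `1` in each step. The function `decode_soprano` and the toy vector are illustrative assumptions; the constants mirror the soprano range used above.

```python
import numpy as np

SOPRANO_MIN = 57  # same lower bound as in the ranges above
WIDTH = 25 + 1    # 25 soprano notes plus the slot for index 0

def decode_soprano(vector):
    """Turn a flat one-hot soprano vector into MIDI numbers (None for index 0)."""
    indices = np.asarray(vector).reshape(-1, WIDTH).argmax(axis=1)
    # Index 0 is the hold/rest slot; index k >= 1 is MIDI number SOPRANO_MIN + k - 1.
    return [None if i == 0 else int(i) + SOPRANO_MIN - 1 for i in indices]

# Toy vector with three time steps: A3 (57), hold, A5 (81).
toy = np.zeros((3, WIDTH), dtype=int)
toy[0, 1] = toy[1, 0] = toy[2, 25] = 1
decoded = decode_soprano(toy.reshape(-1))
```

The same idea applies to the output vector, except that it first has to be split into its alto, tenor, and bass segments, each with its own width.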

In the next part, we use them to train the MiniBach network.