## Import dependencies

In [1]:
import ddsp
import tensorflow as tf
import numpy
import matplotlib
import os
import glob
from data.abc import ABCPreProcessor

## Overview of Preprocessing steps

In this notebook, we preprocess two types of data: **ABC notation** data and **audio** data.

### ABC Notation Data

- Extract the **tune body**, **key** and **meter** of each ABC tune, and store all other fields as metadata
- Use the key and meter as conditioning symbols when generating a tune
- Tokenize the tune body using a vocabulary of musical transcription tokens

- Create a TFRecord dataset of sequence examples of the form **[one-hot encoded tune body, meter, key]**
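The ABC steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `ABCPreProcessor` implementation: the helpers `split_abc_tune`, `tokenize_abc` and `one_hot_encode` are hypothetical, and the token regex is an assumed vocabulary covering only common ABC constructs (notes with accidentals, octave marks and lengths, rests, bar lines, repeats, numbered endings, tuplets, ties, slurs, grace braces and a few ornaments).

```python
import re

import numpy as np

# Header lines look like "X:1", "T:Title", "M:6/8", "K:D".
HEADER_RE = re.compile(r"^([A-Za-z]):\s*(.*)$")

# Hypothetical token vocabulary: a simplified subset of ABC syntax.
TOKEN_RE = re.compile(
    r"\|\]|\|\||:\|\d?|\|:|\|\d?|\[\d"  # bar lines, repeats, numbered endings
    r"|\(\d"                            # tuplet markers such as "(3"
    r"|[\^_=]?[A-Ga-gz][,']*\d*/?\d*"   # notes and rests with accidental, octave, length
    r"|[~.<>\-(){}]"                    # rolls, staccato, broken rhythm, ties, slurs, graces
)


def split_abc_tune(tune_text):
    """Return (key, meter, body, metadata) for a single ABC tune."""
    key, meter, metadata, body_lines = None, None, {}, []
    in_body = False
    for raw_line in tune_text.strip().splitlines():
        line = raw_line.strip()
        match = None if in_body else HEADER_RE.match(line)
        if match:
            field, value = match.group(1), match.group(2).strip()
            if field == "K":
                key, in_body = value, True  # K: closes the header in ABC
            elif field == "M":
                meter = value
            else:
                metadata[field] = value  # a repeated field keeps its last value
        elif line:
            body_lines.append(line)
    return key, meter, " ".join(body_lines), metadata


def tokenize_abc(body):
    """Split a tune body into tokens; characters outside the vocabulary are skipped."""
    return TOKEN_RE.findall(body)


def one_hot_encode(tokens, vocab):
    """Map tokens to rows of an identity matrix, shape (len(tokens), len(vocab))."""
    index = {token: i for i, token in enumerate(vocab)}
    ids = np.array([index[token] for token in tokens], dtype=np.int64)
    return np.eye(len(vocab), dtype=np.float32)[ids]


tune = "X:1\nT:Example reel\nR:reel\nM:4/4\nK:D\n|: ~A3 D2d :|2\n"
key, meter, body, metadata = split_abc_tune(tune)
tokens = tokenize_abc(body)
vocab = sorted(set(tokens))
encoded = one_hot_encode(tokens, vocab)
print(key, meter, metadata)  # D 4/4 {'X': '1', 'T': 'Example reel', 'R': 'reel'}
print(tokens)  # ['|:', '~', 'A3', 'D2', 'd', ':|2']
```

In practice the vocabulary would be built once over the whole corpus rather than per tune, so that every one-hot vector has the same width.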

### Audio Data
- Split the full audio into short examples (4 seconds by default, adjustable with flags)
- Infer the fundamental frequency (or "pitch") with CREPE
- Compute the loudness features

- Create a TFRecord dataset of sequence examples of the form **[audio, f0_feature, loudness_feature]**
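The slicing and loudness steps can be illustrated with NumPy alone. This is a sketch under stated assumptions: `split_into_examples` and `frame_loudness_db` are hypothetical helpers, the 16 kHz sample rate and frame/hop sizes are illustrative, the trailing remainder shorter than one example is simply dropped, and the loudness is a plain per-frame RMS level in dB rather than the A-weighted loudness computed by DDSP's own preprocessing. The CREPE pitch step is not reproduced here.

```python
import numpy as np


def split_into_examples(audio, sample_rate=16000, example_secs=4.0):
    """Cut a 1-D waveform into non-overlapping examples of example_secs each.

    Samples left over after the last full example are dropped.
    """
    example_len = int(round(sample_rate * example_secs))
    n_examples = len(audio) // example_len
    return audio[: n_examples * example_len].reshape(n_examples, example_len)


def frame_loudness_db(example, frame_size=1024, hop_size=256, eps=1e-10):
    """Per-frame RMS level in dB (a simple stand-in for perceptual loudness).

    Assumes len(example) >= frame_size.
    """
    n_frames = 1 + (len(example) - frame_size) // hop_size
    # Index matrix: row i holds the sample indices of frame i.
    idx = hop_size * np.arange(n_frames)[:, None] + np.arange(frame_size)[None, :]
    rms = np.sqrt(np.mean(example[idx] ** 2, axis=1))
    return 20.0 * np.log10(rms + eps)


sr = 16000
t = np.arange(10 * sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)      # 10 s unit-amplitude test tone
examples = split_into_examples(audio, sr)  # shape (2, 64000): the last 2 s are dropped
loudness = frame_loudness_db(examples[0])  # roughly -3 dB per frame for a unit sine
```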

#### Each tune should be indexed so that its ID can be used to find both its ABC notation and its related audio files
- A tune can be associated with more than one audio file, and these files may differ in length
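One way to build such an index is sketched below. The file-naming convention (`<tune_id>_<anything>.wav`) and the helper `index_audio_by_tune` are hypothetical; the notebook does not specify how audio files encode their tune ID. Because a tune can have several recordings, the index maps each ID to a list of paths.

```python
import os
import re
from collections import defaultdict

# Hypothetical naming convention: "<tune_id>_<take>.wav", e.g. "tune042_take1.wav".
AUDIO_NAME_RE = re.compile(r"^(?P<tune_id>[^_]+)_.+\.wav$")


def index_audio_by_tune(audio_paths):
    """Map each tune ID to the sorted list of audio files recorded for it."""
    index = defaultdict(list)
    for path in sorted(audio_paths):
        match = AUDIO_NAME_RE.match(os.path.basename(path))
        if match:  # files that do not follow the convention are skipped
            index[match.group("tune_id")].append(path)
    return dict(index)


paths = ["/audio/tune042_take2.wav", "/audio/tune042_take1.wav",
         "/audio/tune007_live.wav", "/audio/notes.txt"]
audio_index = index_audio_by_tune(paths)
```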

At the end of the notebook, we should merge the two datasets into a single TFRecord file that contains the preprocessed ABC data and the preprocessed audio, indexed by tune.
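The merge can be thought of as a join on tune ID. A minimal sketch, assuming the ABC side yields one record per tune and the audio side yields `(tune_id, features)` pairs, one per short example; `merge_by_tune_id` and the record layout are hypothetical, and serializing the merged records to TFRecord is left out.

```python
def merge_by_tune_id(abc_records, audio_records):
    """Join audio examples with the ABC record of the tune they belong to.

    abc_records:   dict mapping tune_id -> ABC features (one entry per tune)
    audio_records: iterable of (tune_id, audio_features) pairs (many per tune)
    Audio examples whose tune has no ABC record are skipped.
    """
    merged = []
    for tune_id, audio_features in audio_records:
        abc = abc_records.get(tune_id)
        if abc is None:
            continue
        merged.append({"tune_id": tune_id, "abc": abc, "audio": audio_features})
    return merged


abc_records = {"tune042": {"key": "D", "meter": "6/8"}}
audio_records = [("tune042", "example_0"), ("tune042", "example_1"),
                 ("tune999", "example_0")]
merged = merge_by_tune_id(abc_records, audio_records)
```

Emitting one merged record per audio example duplicates the ABC features across a tune's examples, which keeps each record self-contained; the alternative is one record per tune holding a variable-length list of audio examples.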

## Initialize common variables

In [2]:
# Path to the root of the datastore (adjust for your machine)
BASE_DIR = "/home/richhiey/Desktop/workspace/projects/AI_Music_Challenge_2020/"
ABC_DATA_DIR = os.path.join(BASE_DIR, "datasets", "abc_data")
AUDIO_DATA_DIR = os.path.join(BASE_DIR, "datasets", "audio")
ABC_TFRECORD_DIR = os.path.join(BASE_DIR, "tfrecords", "abc.tfrecord")
AUDIO_TFRECORD_DIR = os.path.join(BASE_DIR, "tfrecords", "audio.tfrecord")
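Note that `ABC_TFRECORD_DIR` and `AUDIO_TFRECORD_DIR` point to files, not directories. If the `tfrecords` folder might not exist yet, a small helper like the hypothetical `prepare_paths` below can create it and list the input `.abc` files with `glob`. This is an assumed convenience step; the cells here do not show whether `ABCPreProcessor` already handles it.

```python
import glob
import os


def prepare_paths(abc_data_dir, output_files):
    """Create the parent directory of each output file and list the input .abc files."""
    for path in output_files:
        os.makedirs(os.path.dirname(path), exist_ok=True)
    return sorted(glob.glob(os.path.join(abc_data_dir, "*.abc")))


# Usage with the variables defined above:
# abc_files = prepare_paths(ABC_DATA_DIR, [ABC_TFRECORD_DIR, AUDIO_TFRECORD_DIR])
```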

## Preprocessing - ABC Notation Dataset

In [3]:
preprocessor = ABCPreProcessor(ABC_DATA_DIR)
preprocessor.process()
preprocessor.calculate_statistics()
preprocessor.save_as_tfrecord_dataset()

------ 0. /home/richhiey/Desktop/workspace/projects/AI_Music_Challenge_2020/datasets/abc_data/hnj2.abc ------
103
------------------- Extracted Tune --------------------

------------------- Extracted Tune --------------------

------------------- Extracted Tune --------------------

------------------- Extracted Tune --------------------
ABA DED | dcA AGE | ABA ABc | dAB cde |ABA ~D3 | dcA AGE | GFG Ade |1 fdc d2c :|2 fdc d2e |||: f2d edc | Add cde | fed edc | ABc def |~g3 age | dcA AGE | GFG Ade |1 fdc d2e :|2 fdc d2c |||: ~A3 D2d | d=cA AGE | ~A3 ABc | dAB cde |~A3 D2d | d=cA AGE | ~G3 Ade |1 fdc d2c :|2 fdc d2e ||fed edc | AdB cde | fed efg | ABc def |g2a age | d=cA AGE | GFG Ade | fdc d2e |fed edc | AdB cde | ~f3 gec | ABc def |g2a age | d=cA AGE | ~G3 Ade | fdc d2c |||: ~A3 DED | dcA AGE | ~A3 A2B | =c2A BAG |~A3 DED | dcA AGE | ~G3 Ade |1 fdc d2c :|2 fdc d2e |||: fed edc | ~A3 cde | fed edc | ABc def |~g3 age | dcA AGE | ~G3 Ade |1 fdc d2e :|2 fdc d2c ||

------------------- Extracted Tune --------------------
B|AF~F2 ~E3F|~F2DE FA~A2|Be~e2 fedB|ABBA d2AD|EFED ~E3F|~F2DE FA~A2|Be~e2 fedB|AFED ~E3A||~F3D ~E3F|~F2DE FA~A2|Be~e2 fedB|ABBA d2FD|EF~F2 ~E3F|~F2DE FA~A2|Be~e2 fedB|AFED ~E3z||fe~e2 fedB|fede fbaf|fe~e2 fedB|~A3B dABd|fe~e2 dB~B2|~f3e fa~a2|bf~f2 afed|Bd~d2 de~e2|fe~e2 fedB|~f3e fbaf|fe~e2 fedB|AB~B2 d2 (3Bcd|fe~e2 dB~B2|~f3e fa~a2|bf~f2 ~a3b|afed e3||
------------------- Extracted Tune --------------------
AF~F2 AFdB | AF~F2 ~G3B | AF~F2 ABdf |1 efge fedB :|2 efge fddc |||: d2de faaf | eA{c}BA eA{c}BA |1 dcde faaf | gfeg fddc :|2 ~f3a g2fg | afge fedB |||: AF~F2 AFdB | ADFA ~G3B | A2FA ABdf |1 efge fedB :|2 efge fedc |||: d2df ~a3f | eA{c}BA eA{c}BA |1 dcdf ~a3g | fgeg fedc :|2 ~f3g ~g2fg | gffe fedB ||
------------------- Extracted Tune --------------------
eA~A2 e2dc|BAGA Bcdg|eA~A2 e2dc|BAGA BA~A2|eA~A2 e2dc|BAGA Bcd2|~e3f gaaf|gedB A2 (3Bcd:|eaag egdc|BAGA Bcdg|eaag egdc|BAGA BA~A2|eaag egdc|BAGA Bcd2|~e3f gaaf|gedB A2 (3Bc

~d3 edc|dAF DFA|~G3 EFG|ABc dfe|~d3 edc|dAF DFA|~G3 EAG|FDD D3:||:~d3 ~a3|dag afd|B2g gfg|Beg bge|~d3 ~a3|dag a2b|afd gec|dAF D3:|
------------------- Extracted Tune --------------------
fAA fAA|BAG FGE|~D3 AFA|Bcd ede|fAA fAA|BAG FGE|~D3 AFA|dfd e2d:||:~f3 gfg|afd ede|fef gfg|afd e2d|~f3 gfg|afd edB|~A3 AFA|dfd e2d:|
------------------- Extracted Tune --------------------
FED EFG|AdB cAG|~A3 BAG|FAF GED|FED EFG|A2d cAG|FAF GEA|1 ~D3 D2E:|2 ~D3 D3|||:d2e fed|efd cAG|~A3 BAG|FAF GED|d2e fed|efd cAG|FAF GEA|~D3 D3:||:~D3 c3|AdB cAG|AB^c ded|ded cAF|~D3 c3|AdB cAG|FAF GEA|~D3 D3:||:d2e fdd|Add fdd|^c2d eAA|fAA eAA|d2e fdd|Add ^cde|faf ge^c|1 ded d2A:|2 ded d2e|||:fed ed^c|ded =cAG|~A3 BAG|FAF GED|1 fed ed^c|ded =cAG|FAF GEA|~D3 D2e:|2 fef gfg|afd =cAG|FAF GEA|~D3 D2E||
------------------- Extracted Tune --------------------
GBG FAF|GEE EFE|DFA dAG|FDD DEF|GFG AGA|BAB ~g3|edB BAF|GEE E2F:||:~G3 dBG|GAB dBG|~A3 ecA|AB/c/d ecA|GFG dBG|efg gfg|edB BAF|1 GEE E2F:|2 GEE E3|||:efe edB|def g2e|fd

------------------- Extracted Tune --------------------
DFA BFA|Bcd AFA|DFA BFA|B2c d2D|DFA BFA|Bcd AFA|fed edc|1 B2c d2D:|2 B2c d2e|||:fed edc|Bcd AFA|fed edc|B2e e2g|[1 fed edc|Bcd AFA|~D3 AFA|B2c d2e:|[2 afd gec|dcB AFA|~D3 AFA|B2c d2D||
------------------- Extracted Tune --------------------
FED A2D|B2D A2D|F2A A2F|E3 E2G|FED A2D|B2D A2A|Bcd e2c|1 d2B A2G:|2 d3 d2B||A2d f2d|e2d f2d|A2d f2d|efe d2B|A2d f2d|e2d e2f|gfe f2d|B3 A2F|A2d f2d|e2d f2d|A2d f2d|e2d e2f|gfe f2d|e2d B2A|Bcd e2c|d2B A2G||
------------------- Extracted Tune --------------------
A2G|:F2D ~D3|A2G F2D|~A3 B2G|A3 A2G|F2D ~D3|A2G F2D|G2B A2F|1 G3 A2G:|2 G3 G2g|||:f2d d2e|f2d cAG|~A3 B2G|A3 A2g|f2d d2e|1 f2d c2A|G2B A2F|G3 G2g:|2 f2g a2g|f2d c2A|G3||
------------------- Extracted Tune --------------------
f2a g2e|fed B2A|def agf|e3 a2g|f2a g2e|fed B2A|dcd f2e|1 d3 d2e:|2 d3 d2f|||:e3 efg|a3 agf|e2e efe|d2B BAB|e3 efg|a3 agf|efe dcB|1 A3 A2f:|2 A3 a2g||
------------------- Extracted Tune --------------------
ABA A2F|DE

------------------- Extracted Tune --------------------
F2 | E4D2 | E4F2 | B4A2 | F4F2 | E4D2 | B,4A,2 | B,6- | B,4A2 | B4c2 | d4c2 | (B3A)F2 | A3(Bc2) | B3(AF2) | D4E2 | F6- | F4A2 | B4c2 | d4c2 | B3AF2 | A3(Bc2) | B3(AF2) | D4E2 | F6- | F4F2 | E4D2 | E4F2 | B4A2 | F4F2 | E4D2 | B,4A,2 | B,6- | B,4 ||
------------------- Extracted Tune --------------------
DE | FF FF | F<A AF | FE EE | E2A>A | BG dc | BA FD | ED D>E | D2DE | FF FF | FA AF | FE EE | E2A>A | BG d>c | BA FD | ED D>E | D2 ||DE | FF F>F | (FA) A>F | FE E>E | E2AA | BG d>c | BA FD | ED D>E | D2 ||
------------------- Extracted Tune --------------------
(DE) | G2 G2 (AG) | (GE) D4- | D4 (Bd) | e2 e2 (dB) | d4(Bd) | e2 (ed) (BA) | G2 E2 (DE) | (GB) (AG) (EG) | D4 :|DE | G2 G2 (AG) | (GE) D4- | D4 Bd | e2 e2 dB | d4Bd | e2 (ed) (BA) | G2 E2 DE | (GB) (AG) (EG) | D4 ||
------------------- Extracted Tune --------------------
FG | A3 GFE | A,/D/ D3Ac | d2 dA ce | d3AAB | c3 BAD | FG2FFG | A3 (G/F/) E(C/E/) | D4 ||

------------------- Extracted Tune --------------------
E<G | A2A2 B>AG<A | B2A2 A2e>d | B2A>G e2d<B | d2G2 G2E<G |A2A2 B>AG<A | B2A2 A2e>d | B2A>G e2d<B | e2A2 A2 :||: B<d | e2d<e g2e<g | a2e<a g3f | e2d<e g2f>g |1 e>dB>A G>AB<d |e2d<e g2e<g | a2e<a g3f | g>ef>d g>ed<B | e2A2 A2 :|[2 e>dB<A G>AB<G | c2B<c d2c<d | e2d<e a3f | g>ef>d g>ed<B | e2A2 A2 ||
------------------- Extracted Tune --------------------
D2E F2A | BAF E2F | D2E F2D | FEE EFE |D2E F2A | BAF E2F | D2E F2D |1 EDD D2D :|2 EDD DFA |||: B3 B3 | BcB BAF | A2A ABc | BAF ABc |de/f/d Bc/d/B | AFD E2F | D2E F2D |1 EDD DFA :|2 EDD D2D ||
------------------- Extracted Tune --------------------
A2e2 e2dB | d2ef gea2 | A2e2 e2dB | g2ed B2AG |A2e2 e2dB | d2ef gea2 | gedB g2ed | B2A2 A2BG :||: A2a2 a2ge | d2ef gea2 | A2a2 a2ge | dged B2AG |A2a2 a2ge | d2ef gea2 | gedB g2ed | B2A2 A2BG :||: A2e2 e2dB | d2ef gea2 | A2e2 e2dB | (3efg ed BAGB |A2e2 e2dB | d2ef gea2 | eged (3efg ed | B2A2 A2BG :||: A2a2 a2ge | d2ef gea2 | A2a2 a2ge | dge

103
------------------- Extracted Tune --------------------

------------------- Extracted Tune --------------------

------------------- Extracted Tune --------------------

------------------- Extracted Tune --------------------
gdBd edBA|G2Bd edBd|gdBd edef|gefd efga|bggd edBA|G2Bd edBd|efgb agef|1 gbaf g2gf:|2 gbaf g2ga||be~e2 edBd|e2Be dega|be~e2 edBd|efgb a2ef|~g3d edBA|G2Bd edBd|efgb agef|1 gbaf g2ga:|2 gbaf g2gf||
------------------- Extracted Tune --------------------
D2DE FDFA | DFAF EDB,E | D2DE FDFA |1 (3Bcd AF EFDE :|2 (3Bcd AF EFD2 || |: d2dc dAFA | DFAd egfe | d2dc dAFA | (3Bcd AF EFD2 :| a2fa g2eg | fedf edBd | a2af g2ge | fedf ecdf | ~a3f g2eg | fedf edBe | dfaf g2eg | fdec dcdB || |: Adfd efge | fedf edBd | Adfd efge |1 fedf e2dB :|2 agec dBAF || |: D3 FDFA | DFAF EDB,A, | D2DE FDFA |1 (3Bcd AF EFDE :|2 (3Bcd AF EFD2 || |: d2dc dAFA | DFAd egfe | d2dc dAFA | (3Bcd AF EFD2 :| ~a3f ~g3e | fedf edBd | a2af g2ge | fedf eAdf | ~a3f ~g3e | fedf edBe | dfaf g2ag | fdec dcdB 

------------------- Extracted Tune --------------------
D2 | D2 G2 (GF) | D4 FG | A2 (AG) FG | A4FG | A2 (AB) ^c2 | d2 D2 DE | F2 (GF) D2 | C4DE | F2 E2 D2 | d4 fd | (cA) (FG) (AB) | c4FG | A2 G2 A2 | (GF) D2 (FG) | A2 (CD) E2 | D4 ||
------------------- Extracted Tune --------------------
D D2E | G2G A2A | A2G- G2d | d2B d2e | e2d- dBd | e2e g2f | e2d B2G | AAB A2G | E3D2E | G2G A2A | A2G- G2d | d2B d2e | e2d- dBd | e2e g2f | e2d B2G | ABA G2F | G4- G ||D | GGG A2A | A2G- G2d | d2B d2e | e2d- dBd | eee g2f | e2d B2G | AAB A2G | E3D2E | G2G A2A | A2G- Gdd | d2B d2e | e2d- dBd | e2e g2f | e2d B2G | ABA G2F | G2 ||
------------------- Extracted Tune --------------------
F2D2 D2C2 D4 | F2D2 D2F2 (EF)G2 |F2D2 D2C2 D4 | E3D C2G2 (EF)G2 |D2d2 d2c2 d4 | A3d d2e2 (fe)d2 |A2d2 (dc)B2 c4 | G3E C2G2 E2G2 :|
------------------- Extracted Tune --------------------
(dc) | A3 G F2 | D D3 ((3DEF) | G4 ((3AGF) | G4FG | A2 B2 (cB) | A2 G2 GA | F D3-D2 | C4(dc) | A3 G F2 | D4 ((3DEF) | G2 A2 (GF) | G4(FG

------------------- Extracted Tune --------------------
Bc|dBGB cAFA|~G3F GABc|dg~g2 eg~g2|dBGB A2Bc|dBGB cAFA|~G3F GABc|dg~g2 ecAF|G2GF G2:||:Bc|dg~g2 eg~g2|dg~g2 edBc|dg~g2 eg~g2|dBGB A2Bc|dBGB cAFA|~G3F GABc|dg~g2 ecAF|G2GF G2:||:z2|dBGD cAFD|~G3D GABc|dg~g2 effe|dGGB A2Bc|dB~B2 cBBA|G2GD GABc|d2g2 ecAF|GBAF G2:||:z2|dg~g2 effe|dg~g2 effe|dg~g2 effe|d^c=cB A2Bc|dB~B2 cBBA|~G3D GABc|d2~g2 ecAF|GBAF G2:|
------------------- Extracted Tune --------------------
(3ABc|dAFA GBAG|FEFA dFGE|FAdc dcBA|BGEF GABc|dAFA GBAG|FEFA dFGE|FAdf gecd|(3efe dc d2:||:de|fddc dfaf|edcd efge|fddc dfaf|(3gfe (3dcB ADFA|BG~G2 BG (3Bcd|AF (3ABc defd|Afed Bgec|dfec d2:|
------------------- Extracted Tune --------------------
EG |: A2AB A2Bd | efed B2AG | EGGF G2GB | BAGB dBAG |A2AB A2Bd | efed B2d2 | (3efg fa gedB |1 BAAB AGEG :|2 BAAG A2AB |||: cBcd c2AG | cdef gedB | cBAc e2ed | cdec dBGB |cBcd c2AG | cdef gedg | eaag (3efg dB | BAAG A2AB :||: cdcB ABAG | ABcd (3efg dc | BG~G2 dG~G2 | cdec dBGB |cdcB ABAG |

D2FA D2FA|dfed cABc|d2cA (3Bcd AF|GBAF GFEF|D2FA D2FA|dfed cABc|d2cA (3Bcd AF|GFEG FDD2:||:a2fd faaf|gfed (3Bcd ef|a2fd faaf|bg (3efg fddf|a2fd faaf|gfed cdef|afge fded|cAGE EDD2:||:D2FA DAFA|dfed cABc|d2cA (3Bcd AF|~G3F GBAF|D2FA DAFA|dfed cABc|d2cA (3Bcd AF|GBAG FDD2:||:a2fa dafa|gfed (3Bcd A2|a2fa dafa|bg (3efg fddf|a2fa dafa|gfed BdA2|agec dBAF|GBAG FDD2:|
------------------- Extracted Tune --------------------
G2DG BGBd|g2dg fdcA|BG~G2 FGAF|GBAG FDEF|G2DG BGBd|g2dg fdcA|BG~G2 FGAg|1 fdcA GDEF:|2 fdcA G2Bd|||:g2dg bgaf|g2dg fdcA|BG~G2 AGFD|1 FGAg fdef|g2dg bgaf|g2dg fdcA|BG~G2 FGAg|fdcA G2Bd:|2 FGAg fdcA|BG~G2 FGAf|g2eg fgag|bg~g2 agfg|fdcA GDEF||
------------------- Extracted Tune --------------------
EAAG A2Bd|eA~A2 BAGE|G2DG EGDE|GABd edBG|EAAG A2Bd|eA~A2 BAGE|GABd gedB|1 (3ABc BG ABAG:|2 (3ABc BG ABcd|||:eaag egfg|eA~A2 edBe|d2BG AGEF|GABd g2fg|eaag egfg|eA~A2 BAGE|GABd gedB|(3ABc BG ABcd:||:eaag a2ga|bg~g2 abge|dggf g2eg|dB~B2 GABd|eaag a2ga|bg~g2 abge|dB~B2 gedB|1 (3ABc BG AB

(E/F/G).E FDD|FDd cAG|(E/F/G).E FDD|FDd (A2G)|(E/F/G).E FDD|FDd cde|({g}f)ed efg|({g}f)ed cAG:||:Ecc Add|Acc cAG|Ecc Add|edc (TA2G)|Ecc Add|Acc cAG|Ecc efg|edc (TA2G):|
------------------- Extracted Tune --------------------
ABc|dcd ABc|d2A AGE|dcd ABc|def gfe|dcd ABc|d2A AGE|GAG EDE|c3:||:ABc|d2e fed|ecA AcA|d2e fed|(f<a)f gfe|d2e fed|ecA AGF|GFG EDE|c3:||:ABc|(Td2A) (Tc2A)|(Td2A) ABc|(Td2A) (Tc2A)|def gfe|(Td2B) (T=c2A)|(TB2G) (TA2F)|GAG EDE|c3:|
------------------- Extracted Tune --------------------
EAA cAA|eAA cAA|EAA cAA|GAB BGE|EAA cAA|eAA cAA|aga ecA|BAA A3:||:aga ecA|cAc efg|aga ecA|BAF Ace|aga ecA|cAc efg|aga ecA|BAA A3:||:EAA cee|dff cee|EAA cee|BAF AFE|EAA cee|dff cee|aga ecA|BAA A3:|
------------------- Extracted Tune --------------------
D3 FDF|ECE (TG2E)|D>ED F2G|ABc BGE|D3 FDF|ECE (TG2E)|AGE ({d}c2)A|[1GEC CDE:|[2GEC EFG|||:A2G FEF|DEF (TG2E)|A2G FEF|ABc BGE|A2G FEF|DEF(TG2E)|AGE ({d}c2)A|[1GEC CDE:|[2GEC EFG|||:dAG FDF|ECE (TG2E)|dAG F2G|ABc BGE|dAG FDF|ECE (TG2E)|AGE 

------------------- Extracted Tune --------------------
AB | c2 e3 c | B2 d3 B | A2 G4- | G4 AB |c2 e3 c | B2 d3 B | A6- | A4 AB | c2 e3 c | B2 g3 B | A2 G4- | G4 gf |ed cB AG | E2 A3 B | A6- | A4 :||: AB | c3 def | g3 edB | A2 G4- | G4 AB |c3 def | g2 e2 g2 | a6- | a4 ba | g3 edB | d3 BdB | A2 G4- | G4 gf |ed cB AG | E2 A3 B | A6- | A4 :|
------------------- Extracted Tune --------------------
A2 A2 Bc | d2 d2 d/e/f | e2 cABc | d4 f>d |c2 BA GF | G2 AB AG | F2 D2 DF | D4 ||DE | F2 G2 AB | A/G/F D2 d/e/f | e2 cABc | d4 f>d |c2 BA GF | G2 AB AG | F2 D2 DF | D6 ||
------------------- Extracted Tune --------------------
GA | B>A BD G3G | Gg gd g3 d/c/ | B>D GA B2 AG | E6 GA |B>A BD G3G | Gg g>d g3 d/c/ | BD GA B2 A>G | G6 d2 ||dB de g3e | ed ge d3 d/c/ | BD GA B2 AG | E6 GA |B>A BD G3G | Gg gd g3 d/c/ | BD GA B2 AG | G6 ||
------------------- Extracted Tune --------------------
B=c | B2EF G2AG | F2EF D2B,A, | B,EEF GABc | d2cB ABcd |B2EF G2AG | F2EF D2B,A, | B,EEF GABA |1 GEFD E2 :|2 GEFD