Load PrIMuS Dataset

https://github.com/OMR-Research/tf-end-to-end
https://grfia.dlsi.ua.es/primus/
https://mdpi-res.com/d_attachment/applsci/applsci-08-00606/article_deploy/applsci-08-00606-v3.pdf End-to-End Neural Optical Music Recognition of Monophonic Scores
https://mediatum.ub.tum.de/doc/1292048/file.pdf Labelling Unsegmented Sequence Data with RNNs
https://apacha.github.io/OMR-Datasets/ Optical Music Recognition curated datasets

Plan:
1) Get the PrIMuS dataset
2) ETL and get the semantic model working
3) Try it on sample inputs
4) Get labelled data, or create an expert system that labels inputs with fingering
5) Train a model (BiLSTM/transformer) on the labelled inputs
6) Test the model
7) Add functionality to flatten images photographed at an angle
8) Create an app that scans an image with a phone and feeds it into the whole system
9) Test the whole system
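
Step 7 amounts to a perspective correction. The sketch below is one possible approach, not something the notebook implements yet: it solves for the 3x3 homography that maps four corners of a photographed page onto a flat rectangle. The function names and the corner coordinates are hypothetical, and finding the corners in a real photo is assumed to be handled elsewhere.

```python
import numpy as np


def homography_from_corners(src, dst):
    """Solve for the 3x3 matrix H with H @ [x, y, 1] ~ [u, v, 1] for four point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1), rearranged the same way
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)  # the last entry is fixed to 1


def apply_homography(H, points):
    """Map (x, y) points through H, dividing by the homogeneous coordinate."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical corners of a skewed page and the flat rectangle they should map to.
photo_corners = [(30, 40), (620, 10), (640, 300), (10, 330)]
flat_corners = [(0, 0), (600, 0), (600, 280), (0, 280)]
H = homography_from_corners(photo_corners, flat_corners)
```

With OpenCV already imported in this notebook, `cv2.getPerspectiveTransform` and `cv2.warpPerspective` cover the same ground and also resample the pixels.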

TODO:
1) Makefile to download:
    the semantic model into the SEMANTIC_MODEL_FOLDER folder (https://grfia.dlsi.ua.es/primus/models/PrIMuS/Semantic-Model.zip)
    vocabulary_semantic.txt into the same folder (https://github.com/OMR-Research/tf-end-to-end/raw/master/Data/vocabulary_semantic.txt)
2) Change the tfc.reset_default_graph() call in SemanticGenerator to accommodate loading a second model in TensorFlow (https://stackoverflow.com/questions/41990014/load-multiple-models-in-tensorflow)
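
For TODO item 2, one option is to give every model its own graph and session instead of resetting the shared default graph. The sketch below illustrates the idea under that assumption; `open_isolated_session` and `restore_into` are hypothetical helpers, not part of tf-end-to-end.

```python
import tensorflow.compat.v1 as tfc

tfc.disable_eager_execution()


def open_isolated_session():
    """Return a fresh (graph, session) pair that leaves the default graph alone."""
    graph = tfc.Graph()
    session = tfc.Session(graph=graph)
    return graph, session


def restore_into(graph, session, meta_path):
    """Import a .meta graph and its checkpoint weights into `graph` only."""
    with graph.as_default():
        saver = tfc.train.import_meta_graph(meta_path)
        # The checkpoint prefix is the .meta path without its extension.
        saver.restore(session, meta_path[:-len('.meta')])
    return saver
```

With this layout, tensors and collections would be looked up on the model's own graph (`graph.get_tensor_by_name(...)`, `graph.get_collection("logits")`) rather than through `tfc.get_default_graph()`.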

In [41]:
import cv2
import os
import numpy as np
import tensorflow as tf
import tensorflow.compat.v1 as tfc

from PIL import Image, ImageFont, ImageDraw

tfc.disable_eager_execution()

In [14]:
SEMANTIC_MODEL_FOLDER = '../models/Semantic-Model'
SEMANTIC_MODEL_PATH = SEMANTIC_MODEL_FOLDER + '/semantic_model.meta'
SEMANTIC_VOCABULARY_PATH = SEMANTIC_MODEL_FOLDER + '/vocabulary_semantic.txt'
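
Until the Makefile from the TODO list exists, the two files could also be fetched from Python. This is a minimal sketch using only the standard library; `fetch` and `fetch_semantic_model` are hypothetical helpers, the URLs are the ones listed in the TODO, and the internal layout of the zip archive is an assumption (it may unpack into a nested folder).

```python
import os
import urllib.request
import zipfile

# URLs copied from the TODO list.
SEMANTIC_MODEL_URL = 'https://grfia.dlsi.ua.es/primus/models/PrIMuS/Semantic-Model.zip'
SEMANTIC_VOCABULARY_URL = 'https://github.com/OMR-Research/tf-end-to-end/raw/master/Data/vocabulary_semantic.txt'


def fetch(url, dest_path):
    """Download url to dest_path, skipping the download if the file already exists."""
    if os.path.exists(dest_path):
        return dest_path
    os.makedirs(os.path.dirname(dest_path) or '.', exist_ok=True)
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def fetch_semantic_model(folder):
    """Fetch and unpack the model archive, then fetch the vocabulary next to it."""
    archive = fetch(SEMANTIC_MODEL_URL, os.path.join(folder, 'Semantic-Model.zip'))
    with zipfile.ZipFile(archive) as zf:
        # Assumption: the checkpoint files sit at the top level of the archive.
        zf.extractall(folder)
    fetch(SEMANTIC_VOCABULARY_URL, os.path.join(folder, 'vocabulary_semantic.txt'))
```

Calling `fetch_semantic_model(SEMANTIC_MODEL_FOLDER)` once before constructing the generator should be enough, provided the extracted files end up where `SEMANTIC_MODEL_PATH` expects them.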

In [56]:
# Utility functions
# Copied from ctc_utils on https://github.com/OMR-Research/tf-end-to-end/blob/master/ctc_utils.py
def normalize(image):
    return (255. - image) / 255.

def resize(image, height):
    width = int(float(height * image.shape[1]) / image.shape[0])
    sample_img = cv2.resize(image, (width, height))
    return sample_img

def sparse_tensors_to_strs(sparse_tensor):
    indices = sparse_tensor[0][0]
    values = sparse_tensor[0][1]
    dense_shape = sparse_tensor[0][2]

    strs = [[] for _ in range(dense_shape[0])]

    string = []
    ptr = 0
    b = 0

    for idx in range(len(indices)):
        if indices[idx][0] != b:
            strs[b] = string
            string = []
            b = indices[idx][0]

        string.append(values[ptr])
        ptr += 1

    strs[b] = string
    return strs
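
To make the input layout of `sparse_tensors_to_strs` concrete: the decoder returns a list of sparse tensors, each a triple of `(indices, values, dense_shape)`, where every index is a `(batch, position)` pair. The block below is a compact restatement of the same grouping, written so that it runs on its own. The name `group_sparse_values` and the numbers are made up for illustration.

```python
def group_sparse_values(sparse_tensor):
    """Group the values of the first sparse tensor by their batch index."""
    indices, values, dense_shape = sparse_tensor[0]
    grouped = [[] for _ in range(dense_shape[0])]
    for (batch, _position), value in zip(indices, values):
        grouped[batch].append(value)
    return grouped


# Two sequences in one batch: the first decodes to [5, 7], the second to [2].
decoded = [(
    [[0, 0], [0, 1], [1, 0]],  # indices: (batch, position)
    [5, 7, 2],                 # values: vocabulary indices
    [2, 2],                    # dense_shape: (batch size, max length)
)]
grouped = group_sparse_values(decoded)  # [[5, 7], [2]]
```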

In [96]:
class SemanticGenerator:
    def __init__(self):
        tfc.reset_default_graph()
        self.session = tfc.InteractiveSession()
        self.vocab_list = None
        with open(SEMANTIC_VOCABULARY_PATH, 'r') as vocab_file:
            self.vocab_list = vocab_file.read().splitlines()
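        # Import the graph definition, then restore the weights. The checkpoint
        # prefix is the .meta path without its extension.
        saver = tfc.train.import_meta_graph(SEMANTIC_MODEL_PATH)
        saver.restore(self.session, SEMANTIC_MODEL_PATH[:-len('.meta')])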
        
        graph = tfc.get_default_graph()
        
        self.input = graph.get_tensor_by_name("model_input:0")
        self.seq_len = graph.get_tensor_by_name("seq_lengths:0")
        self.rnn_keep_prob = graph.get_tensor_by_name("keep_prob:0")
        self.height_tensor = graph.get_tensor_by_name("input_height:0")
        self.width_reduction_tensor = graph.get_tensor_by_name("width_reduction:0")
        self.logits = tfc.get_collection("logits")[0]
        
        # Constants that are saved inside the model itself
        self.WIDTH_REDUCTION, self.HEIGHT = self.session.run([self.width_reduction_tensor, self.height_tensor])
        
        self.decoded, _ = tf.nn.ctc_greedy_decoder(self.logits, self.seq_len)
    
    def map_output(self, vec):
        # Translate vocabulary indices back into semantic tokens
        return [self.vocab_list[x] for x in vec]
    
    def predict(self, img_file):
        image = Image.open(img_file).convert('L')
        image = np.array(image)
        image = resize(image, self.HEIGHT)
        image = normalize(image)
        image = np.asarray(image).reshape(1, image.shape[0], image.shape[1], 1)
        
        # The sequence length is a whole number of frames, so use integer division
        seq_lengths = [image.shape[2] // self.WIDTH_REDUCTION]
        
        prediction = self.session.run(self.decoded, feed_dict = {
            self.input: image,
            self.seq_len: seq_lengths,
            self.rnn_keep_prob: 1.0,
        })
        
        # predictions holds one list of vocabulary indices per batch item (batch size is 1 here)
        predictions = sparse_tensors_to_strs(prediction)
        return self.map_output(predictions[0])
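
The shape bookkeeping in `predict` can be traced without loading the model. In the sketch below, `HEIGHT = 128` and `WIDTH_REDUCTION = 16` are assumed stand-ins for the constants stored in the checkpoint, the image is synthetic, and a crude 2x subsampling replaces `cv2.resize` so that only NumPy is needed.

```python
import numpy as np

HEIGHT = 128          # assumed value; the real one is read from the model
WIDTH_REDUCTION = 16  # assumed value; the real one is read from the model


def normalize(image):
    # Invert and scale: white (255) becomes 0.0, black (0) becomes 1.0
    return (255. - image) / 255.


def target_width(shape, height):
    # Same aspect-ratio arithmetic as resize() in the utility cell
    return int(float(height * shape[1]) / shape[0])


raw = np.full((256, 2000), 255, dtype=np.uint8)  # blank white "scan"
raw[100:110, :] = 0                              # one black stripe
width = target_width(raw.shape, HEIGHT)          # 128 * 2000 / 256 = 1000

resized = raw[::2, ::2]                          # stand-in for cv2.resize, shape (128, 1000)
batch = normalize(resized).reshape(1, resized.shape[0], resized.shape[1], 1)
seq_len = batch.shape[2] // WIDTH_REDUCTION      # 1000 // 16 = 62
```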
    

In [94]:
def main():
    semantic_generator = SemanticGenerator()
    
    img = '../images/test1.png'
    print(semantic_generator.predict(img))

In [97]:
main()

'model_variables' collection should be of type 'byte_list', but instead is of type 'node_list'.
INFO:tensorflow:Restoring parameters from ./models/Semantic-Model/semantic_model
['clef-G2', 'keySignature-FM', 'timeSignature-3/2', 'note-A4_quarter', 'note-Bb4_quarter', 'note-A4_quarter', 'note-G4_quarter', 'note-F4_eighth', 'note-G4_eighth', 'barline', 'note-F4_quarter.', 'note-F4_quarter', 'note-F4_eighth', 'barline', 'note-F4_quarter', 'note-Bb4_quarter', 'note-Bb4_quarter', 'note-Bb4_quarter', 'note-C5_quarter', 'note-D5_quarter', 'barline', 'note-C5_half.', 'barline']
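
For step 4 of the plan (labelling with fingering), the token strings will probably need to be split into fields first. The helpers below are a hypothetical sketch based only on the token shapes visible in the output above (`kind-value`, and `note-pitch_duration` for notes). Other token types in the vocabulary may need extra handling.

```python
def parse_token(token):
    """Split a semantic token into its kind and fields."""
    kind, _, value = token.partition('-')
    if kind == 'note':
        pitch, _, duration = value.partition('_')
        return {'kind': kind, 'pitch': pitch, 'duration': duration}
    return {'kind': kind, 'value': value}


def split_measures(tokens):
    """Group tokens into measures, using 'barline' as the separator."""
    measures, current = [], []
    for token in tokens:
        if token == 'barline':
            measures.append(current)
            current = []
        else:
            current.append(token)
    if current:  # keep a trailing measure that has no closing barline
        measures.append(current)
    return measures


sample = ['clef-G2', 'note-A4_quarter', 'barline', 'note-F4_quarter.', 'barline']
measures = split_measures(sample)
notes = [parse_token(t) for t in sample if t.startswith('note-')]
```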
