
Datasets in dataset_cache #63

Closed
alisawuffles opened this issue Nov 21, 2019 · 3 comments

@alisawuffles

We noticed that only the dataset in dataset_cache/tensor_datasets/ is required to train the model and generate new chorales. However, the dataset provided in tensor_datasets/ in the zip file is named ChoraleDataset([0],bach_chorales,['fermata', 'tick', 'key'],8,4), where the voice list [0] indicates that it contains only the soprano voice.

If this dataset is used for training, shouldn't it contain all four voices? Alternatively, if it is used to fix the soprano part at generation time, our manual inspection of the generated chorales suggests that all notes are being sampled and the soprano parts are not real Lutheran melodies.

Also, what is the difference in purpose between the datasets in the datasets/ and tensor_datasets/ folder?

Thank you so much!

@Ghadjeres (Owner)

Hi,

Yes, you're right. It seems I put the wrong ChoraleDataset in tensor_datasets/; sorry about that. You'll have to recreate it if you want to train a new model (this takes some time because of all the transpositions combined with the key analyzer of music21).
The difference between datasets/ and tensor_datasets/ is that datasets/ contains metadata about the dataset (sequence sizes, voices used, etc.), while tensor_datasets/ contains only the tensor of size (num_examples, num_voices, chorale_length) plus the metadata tensor (which holds the fermata indications). The idea is that when you want to generate, you don't necessarily need to load the whole dataset (which takes time and space), so you can rely on the information in datasets/ alone.
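This split could be sketched roughly as follows (a minimal illustration with hypothetical function names and a pickle-based on-disk layout; the actual DeepBach code may store things differently):

```python
import os
import pickle


def save_dataset(root, name, metadata, chorale_tensor, metadata_tensor):
    """Save small metadata to datasets/ and large tensors to tensor_datasets/."""
    os.makedirs(os.path.join(root, "datasets"), exist_ok=True)
    os.makedirs(os.path.join(root, "tensor_datasets"), exist_ok=True)
    # lightweight metadata: sequence sizes, voices used, etc.
    with open(os.path.join(root, "datasets", name + ".pkl"), "wb") as f:
        pickle.dump(metadata, f)
    # heavyweight part: the (num_examples, num_voices, chorale_length) tensor
    # plus the metadata tensor with fermata indications
    with open(os.path.join(root, "tensor_datasets", name + ".pkl"), "wb") as f:
        pickle.dump((chorale_tensor, metadata_tensor), f)


def load_metadata_only(root, name):
    """At generation time, load only the small metadata file, not the tensors."""
    with open(os.path.join(root, "datasets", name + ".pkl"), "rb") as f:
        return pickle.load(f)
```

The point of the design is that generation only touches `load_metadata_only`, so the large tensor files never need to be read or kept in memory.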

You may be able to find the correct ChoraleDataset in the Docker image, but I'm not sure.

Best,

@alisawuffles (Author)

alisawuffles commented Nov 22, 2019

No worries! I recreated the dataset, which was very straightforward with the provided code. :) By the way, a bunch of KeyError messages for chorale 309 were printed, but I assume this is an unimportant problem with the key analyzer, as a comment in your code indicates. Do you remember seeing something similar? Training a model on this dataset and then generating chorales still produced good results, so I assume the dataset was constructed correctly.

Oh, the reason for saving two different datasets in datasets/ and tensor_datasets/ makes complete sense. Thank you so much for explaining that!

@Ghadjeres (Owner)

Yes, I get exactly the same error with one of the chorales because of the key analyzer. That particular chorale is simply skipped and won't appear in the dataset, so no worries!
Best,
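The skip-on-failure behavior described here could be sketched as follows (a minimal illustration with hypothetical names; the real code relies on music21's key analyzer, for which `analyze_key` is a stand-in):

```python
def build_examples(chorales, analyze_key):
    """Build dataset examples, skipping any chorale whose key analysis fails.

    analyze_key is a hypothetical stand-in for the key analyzer; a chorale
    for which it raises KeyError is reported and left out of the dataset.
    """
    examples = []
    for index, chorale in enumerate(chorales):
        try:
            key = analyze_key(chorale)
        except KeyError as exc:
            # mirror the behavior described above: report and skip
            print(f"KeyError for chorale {index}: {exc}; skipping")
            continue
        examples.append((chorale, key))
    return examples
```

With this structure, a single problematic chorale (like chorale 309 mentioned above) costs one printed warning and one missing example, rather than aborting the whole dataset construction.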
