
Datasets in dataset_cache #63

Closed
alisawuffles opened this issue Nov 21, 2019 · 3 comments

@alisawuffles

We noticed that only the dataset in dataset_cache/tensor_datasets/ is required to train the model and generate new chorales. However, the dataset provided in tensor_datasets/ in the zip file is named ChoraleDataset([0],bach_chorales,['fermata', 'tick', 'key'],8,4), where the voice list [0] indicates that it contains only the soprano voice.

If this dataset is used for training, shouldn't it contain all four voices? Alternatively, if it is used to fix the soprano part at generation time, our manual inspection of the generated chorales suggests that all notes are being sampled and the soprano parts are not real Lutheran melodies.

Also, what is the difference in purpose between the datasets in the datasets/ and tensor_datasets/ folder?

Thank you so much!

@Ghadjeres (Owner)

Hi,

Yes, you're right. It seems I put the wrong ChoraleDataset in tensor_datasets/; sorry about that. You'll have to recreate it if you want to train a new model (this takes some time because of all the transpositions combined with the key analyzer of music21).
The difference between datasets/ and tensor_datasets/ is that datasets/ contains metadata about the dataset (sequence sizes, voices used, etc.), while tensor_datasets/ contains only the tensor of size (num_examples, num_voices, chorale_length) plus the metadata tensor (which holds the fermata indications). The idea is that when you want to generate, you don't necessarily need to load the whole dataset (which takes time and space), so you can rely on the information in datasets/ alone.
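This split could be sketched roughly as follows (a minimal illustration with hypothetical function names and a pickle-based on-disk layout; the actual DeepBach code may store things differently):

```python
import os
import pickle


def save_dataset(root, name, metadata, chorale_tensor, metadata_tensor):
    """Save small metadata to datasets/ and large tensors to tensor_datasets/."""
    os.makedirs(os.path.join(root, "datasets"), exist_ok=True)
    os.makedirs(os.path.join(root, "tensor_datasets"), exist_ok=True)
    # lightweight metadata: sequence sizes, voices used, etc.
    with open(os.path.join(root, "datasets", name + ".pkl"), "wb") as f:
        pickle.dump(metadata, f)
    # heavyweight part: the (num_examples, num_voices, chorale_length) tensor
    # plus the metadata tensor with fermata indications
    with open(os.path.join(root, "tensor_datasets", name + ".pkl"), "wb") as f:
        pickle.dump((chorale_tensor, metadata_tensor), f)


def load_metadata_only(root, name):
    """At generation time, load only the small metadata file, not the tensors."""
    with open(os.path.join(root, "datasets", name + ".pkl"), "rb") as f:
        return pickle.load(f)
```

The point of the design is that generation only touches `load_metadata_only`, so the large tensor files never need to be read or kept in memory.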

You may be able to find the correct ChoraleDataset in the Docker image, but I'm not sure.

Best,

@alisawuffles (Author)

alisawuffles commented Nov 22, 2019

No worries! I recreated the dataset, which was very straightforward with the provided code. :) By the way, a bunch of KeyError messages for chorale 309 were printed, but I assume this is an unimportant problem with the key analyzer, as a comment in your code indicates. Do you remember seeing something similar? Training a model on this dataset and then generating chorales still produced good results, so I assume the dataset was constructed correctly.

Oh, the reason for saving two different datasets in datasets/ and tensor_datasets/ makes complete sense. Thank you so much for explaining that!

@Ghadjeres (Owner)

Yes, I get exactly the same error with one of the chorales because of the key analyzer. That particular chorale is simply skipped and won't appear in the dataset, so no worries!
Best,
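The skip-on-failure behavior described here could be sketched as follows (a minimal illustration with hypothetical names; the real code relies on music21's key analyzer, for which `analyze_key` is a stand-in):

```python
def build_examples(chorales, analyze_key):
    """Build dataset examples, skipping any chorale whose key analysis fails.

    analyze_key is a hypothetical stand-in for the key analyzer; a chorale
    for which it raises KeyError is reported and left out of the dataset.
    """
    examples = []
    for index, chorale in enumerate(chorales):
        try:
            key = analyze_key(chorale)
        except KeyError as exc:
            # mirror the behavior described above: report and skip
            print(f"KeyError for chorale {index}: {exc}; skipping")
            continue
        examples.append((chorale, key))
    return examples
```

With this structure, a single problematic chorale (like chorale 309 mentioned above) costs one printed warning and one missing example, rather than aborting the whole dataset construction.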
