Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding loader for jamendo moodtheme #505

Merged
merged 39 commits into from Sep 7, 2021
Merged

Adding loader for jamendo moodtheme #505

merged 39 commits into from Sep 7, 2021

Conversation

PRamoneda
Copy link
Collaborator

@PRamoneda PRamoneda commented Apr 28, 2021

Title

Please use the following title: "Adding loader for MyDATASET". If your pull request is work in progress, change your title to "[WIP] Adding loader for MyDATASET" to avoid reviews while the loader is not ready.

Description

Please include the following information at the top level docstring for the dataset's module mydataset.py:

  • Describe annotations included in the dataset
  • Indicate the size of the datasets (e.g. number files and duration, hours)
  • Mention the origin of the dataset (e.g. creator, institution)
  • Describe the type of music included in the dataset
  • Indicate any relevant papers related to the dataset
  • Include a description about how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file.
  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json.
  • Create a module in mirdata, e.g. mirdata/my_dataset.py
  • Create tests for your loader in tests/datasets/, e.g. test_my_dataset.py
  • Add your module to docs/source/mirdata.rst and docs/source/quick_reference.rst
  • Run tests/test_full_dataset.py on your dataset.

@PRamoneda
Copy link
Collaborator Author

hey!! ready to review!

@codecov
Copy link

codecov bot commented Apr 28, 2021

Codecov Report

Merging #505 (5a69b50) into master (6e087cf) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #505      +/-   ##
==========================================
+ Coverage   99.00%   99.01%   +0.01%     
==========================================
  Files          45       46       +1     
  Lines        5415     5491      +76     
==========================================
+ Hits         5361     5437      +76     
  Misses         54       54              

Copy link

@philtgun philtgun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The .download() description tells to put the audio under mtg_jamendo_autotagging_moodtheme/audio, but the .validate() is checking under audios directory - an inconsistency.
  • The output of .validate() is huge when there are no tracks, not sure if such behavior is static across mirdata, but for such a big dataset is a bit too much output. It also doesn't say that the tracks are missing.
  • Not sure when is the metadata downloaded - I put the audio files manually, successfully ran .validate(), but still can't access the tags. Not sure if the metadata should be downloaded with .download(partial_download=['metadata']) which still just shows the instructions for downloading the audio and does nothing.

mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
docs/source/table.rst Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
@magdalenafuentes
Copy link
Collaborator

Wow thanks @philtgun for the detailed review!

@PRamoneda
Copy link
Collaborator Author

Thank you so much for your review!!! @philtgun . I am going to add your suggestions and errors now :)

@PRamoneda
Copy link
Collaborator Author

Hi Philip! @philtgun

The output of .validate() is huge when there are no tracks, not sure if such behavior is static across mirdata, but for such a big dataset is a bit too much output. It also doesn't say that the tracks are missing.

This is the usual behaviour... but you are right maybe a tiny general description would be better

@PRamoneda
Copy link
Collaborator Author

hi @philtgun good catch!

Not sure when is the metadata downloaded - I put the audio files manually, successfully ran .validate(), but still can't access the tags. Not sure if the metadata should be downloaded with .download(partial_download=['metadata']) which still just shows the instructions for downloading the audio and does nothing.

@PRamoneda
Copy link
Collaborator Author

It is caused because i havent run the full_dataset test. I am running it now!

@PRamoneda PRamoneda requested a review from rabitt May 1, 2021 12:18
@PRamoneda
Copy link
Collaborator Author

four hours later....

 % pytest -s tests/test_full_dataset.py --local --dataset mtg_jamendo_autotagging_moodtheme
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pep8.py:110: FutureWarning: Possible nested set at position 1
  EXTRANEOUS_WHITESPACE_REGEX = re.compile(r'[[({] | []}),;:]')
============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.7.9, pytest-6.2.2, py-1.9.0, pluggy-0.13.1
rootdir: /Users/pedroramonedafranco/PycharmProjects/mirdata3
plugins: localserver-0.5.0, mock-3.3.1, cov-2.10.1, pep8-1.0.6
collected 4 items                                                                                                                                                                

tests/test_full_dataset.py If this dataset does not have openly downloadable data, follow the instructions printed by the download message and rerun this test.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18486/18486 [11:33<00:00, 26.67it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18486/18486 [4:27:18<00:00,  1.15it/s]
..

================================================================================ warnings summary ================================================================================
tests/test_full_dataset.py: 18486 warnings
  /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/librosa/core/audio.py:162: UserWarning: PySoundFile failed. Trying audioread instead.
    warnings.warn("PySoundFile failed. Trying audioread instead.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html
================================================================ 4 passed, 18486 warnings in 16739.36s (4:38:59) =================================================================
(base) 

@PRamoneda
Copy link
Collaborator Author

Ready to merge! @rabitt @magdalenafuentes

@PRamoneda
Copy link
Collaborator Author

hi! I would like to remove the dataset from my laptop. What is your opinion about this loader?

@rabitt
Copy link
Collaborator

rabitt commented Aug 5, 2021

Hey @PRamoneda I'll try to take a look tomorrow!

Copy link
Collaborator

@rabitt rabitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New loader checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py
  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json
  • Create a module in mirdata, e.g. mirdata/my_dataset.py
  • Create tests in tests/, e.g. test_my_dataset.py
  • Add the module to docs/source/mirdata.rst
  • Add the module to docs/source/table.rst
  • Run test_full_dataset.py
  • Make sure the docs build properly

mirdata/datasets/good_sounds.py Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
mirdata/datasets/mtg_jamendo_autotagging_moodtheme.py Outdated Show resolved Hide resolved
docs/source/table.rst Show resolved Hide resolved
@rabitt
Copy link
Collaborator

rabitt commented Sep 7, 2021

Looks great @PRamoneda ! Merging now

@rabitt rabitt merged commit 1c0eeda into master Sep 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants