Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding loader for 3D-MARCo #71

Merged
merged 22 commits into from
Oct 15, 2021
Merged

Conversation

iranroman
Copy link
Contributor

3D-MARCo

Description

Please include the following information at the top level docstring for the dataset's module mydataset.py:

  • Describe annotations included in the dataset
  • Indicate the total duration of the dataset in hours, and (optionally) also list the number of individual files
  • Mention the origin of the dataset (e.g. creator, institution)
  • Describe the type of audio included in the dataset
  • Indicate any relevant papers related to the dataset
  • Include a description about how the data can be accessed and the license it uses (if applicable)

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file.
  • Run the script on the canonical version of the dataset and save the index in soundata/indexes/ e.g. my_dataset_index.json.
  • Create a module in soundata, e.g. soundata/my_dataset.py
  • Create tests for your loader in tests/, e.g. test_my_dataset.py
  • Add your module to docs/source/soundata.rst and docs/source/quick_reference.rst
  • Run tests/test_full_dataset.py on your dataset.

@iranroman
Copy link
Contributor Author

Adding my work toward the 3D-MARCo dataset loader. The dataset does not have a metadata file. The metadata lives in the filenames. As a result, the metadata is obtained from the index.json file that is specific to the dataset.

Please advice on how to properly process and test this kind of metadata. By default, soundata checks that the properties and annotations are loaded from a metadata file, not from the index.json. Hence I do not think the tests in this loader are as thorough as they need to be.

@codecov
Copy link

codecov bot commented Oct 13, 2021

Codecov Report

Merging #71 (24c6504) into master (2fa3aca) will increase coverage by 0.12%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #71      +/-   ##
==========================================
+ Coverage   97.02%   97.14%   +0.12%     
==========================================
  Files          17       18       +1     
  Lines        1377     1437      +60     
==========================================
+ Hits         1336     1396      +60     
  Misses         41       41              

Copy link
Collaborator

@magdalenafuentes magdalenafuentes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iranroman looks good to me! Just left minor comments.

As part of this PR we could reduce the size of the eigenscape test file from 2.5s to 1s, since now it's quite heavy (>8MB).

time (str): time when the audio signal was recorded
date (str): date when the audio signal was recorded
additional information (str): notes included by the dataset
authors with otherdetails relevant to the specific clip
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo


source_label = self._clip_metadata.get("source_label")
if source_label is None:
self.source_label = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line not covered by tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing since there will always be a source_label.

)

if not os.path.exists(json_path):
raise FileNotFoundError("Metadata not found")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line not covered by tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also removing. Since the metadata is extracted from the index.json and that comes built-in with soundata, this file should be there if soundata was installed properly.

Comment on lines 226 to 244
for path_filename in all_paths_filenames:

clip_id = path_filename

path, filename = path_filename.split("/")

source_label = path

clip_metadata = filename.split("_")

# remove arbitrary clip numbering used by dataset authors
clip_metadata = [
data for data in clip_metadata if data != "" and data[0] != "0"
]

microphone_info = clip_metadata[1:]

if "deg" in clip_metadata[0]:
source_angle = "".join(clip_metadata[0].partition("deg")[:2])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe remove in-between empty lines?

tests/datasets/test_marco.py Show resolved Hide resolved
# validate metadata
assert jam.file_metadata.duration == 1.0
assert jam.sandbox.microphone_info == ["OCT3D", "2", "FR"]
print(jam.annotations)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -17,6 +17,7 @@ def run_clip_tests(clip, expected_attributes, expected_property_types):

# test clip attributes
for attr in clip_attr["attributes"]:
print(attr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@iranroman
Copy link
Contributor Author

Thanks a lot for your comments @magdalenafuentes. I have addressed them and pushed my new commits.

Copy link
Collaborator

@magdalenafuentes magdalenafuentes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iranroman great! Ready to merge!

@magdalenafuentes magdalenafuentes merged commit f9fedd8 into soundata:master Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants