Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Luigi pipeline update #78

Merged
merged 29 commits into from
Jul 22, 2021
Merged

[WIP] Luigi pipeline update #78

merged 29 commits into from
Jul 22, 2021

Conversation

jorshi
Copy link
Contributor

@jorshi jorshi commented Jul 21, 2021

Idea for updating the config and creating a more generic pipeline that can run based on the config files. To help keep the config files consistent I made them python classes -- these can be instantiated / modified by json in the future if we want.

@jorshi jorshi requested review from khumairraj and turian July 21, 2021 09:01
heareval/tasks/config/speech_commands.py Outdated Show resolved Hide resolved
heareval/tasks/config/speech_commands.py Outdated Show resolved Hide resolved
heareval/tasks/config/__init__.py Outdated Show resolved Hide resolved
heareval/tasks/util/dataset_builder.py Outdated Show resolved Hide resolved
heareval/tasks/util/dataset_builder.py Outdated Show resolved Hide resolved
@jorshi
Copy link
Contributor Author

jorshi commented Jul 21, 2021

Added NSynth pitch dataset with the new builder method


# Filter out pitches that are not within the range of a standard piano
metadata = metadata[metadata["pitch"] >= 21]
metadata = metadata[metadata["pitch"] <= 108]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make these config per @khumairraj's suggestion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Maybe we can define it in the NsynthPitchConfig after the super(). Basically, everything in the super is something that is required across all datasets. Any config specific to the dataset can be after the super. Mostly such a kind of config attribute will be affecting the ConfigureProcessMetadata like this pitch thing.

Copy link
Contributor

@turian turian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome

Comment on lines +87 to +91
task_class = new_class(
name=name,
bases=(base,),
exec_body=partial(self.add_requirements, requirements),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is v cool.

Copy link
Contributor

@khumairraj khumairraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These refactors are great.

# train: 85,111; valid: 10,102; test: 4890. These values will be a bit less
# after filter the pitches to be only within the piano range.
# To subsample a partition, set the max_files to an integer.
# TODO: Should we subsample NSynth?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full nsynth is quite large. Should we subsample it?

@jorshi jorshi mentioned this pull request Jul 22, 2021
@jorshi jorshi merged commit 979bdba into main Jul 22, 2021
@jorshi jorshi deleted the luigi-pipeline-update branch July 22, 2021 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants