Luigi cleanup #8
Merged
31 commits
373c730 Added coughvid.py pipeline (turian)
f2c19a3 Black (turian)
3f2be27 Convert to mono wav corpus (turian)
1223598 Added precommit info (turian)
f8b7f1d Small bugfix (turian)
109789d Another comment (turian)
8bb8886 Updated TODO (turian)
f579231 Fix resampling (turian)
aa2ef62 Starting to refactor (turian)
1aee316 Refactoring (turian)
b149d74 Refactor works (turian)
f0f915f Tar file (turian)
a455bf3 S3 code works (turian)
594bc5f Luigi pipeline requirements (jorshi)
3c68890 Starting config file. s3utils (jorshi)
fc5a24e Creating subdirectories for config and utils (jorshi)
14fedbe Moving all config over to config file (jorshi)
62dfb78 Move utils over to luigi util (jorshi)
a575f93 Creating audio utils (jorshi)
396def3 Moving out s3 code (jorshi)
2c28afc Check output on unzip (jorshi)
b7d0720 Cleaning up extract method (jorshi)
e9d5472 Progress bar for downloads (jorshi)
2ffb18a Wrong slugigy package (jorshi)
b866faa gitignore for eval tasks (jorshi)
92e8a93 S3 caching config (jorshi)
a871e64 Change S3 config back to defaults (jorshi)
51625fc Added a couple todos (jorshi)
bfb636a Updating S3 config and pulling out from coughvid (jorshi)
9055cbc Removing S3 from coughvid (jorshi)
f9a6eb1 Moving gitignore and requirements into top-level folder (jorshi)
@@ -0,0 +1,17 @@
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.3.9
    hooks:
      - id: nbstripout
  - repo: https://github.com/mwouts/jupytext
    rev: v1.11.2
    hooks:
      - id: jupytext
        args: [--sync, --pipe, black]
        additional_dependencies:
          - black==21.5b0 # Matches hook
  - repo: https://github.com/psf/black
    rev: 21.5b0
    hooks:
      - id: black
        language_version: python3
@@ -1,2 +1,10 @@
# hear2021-eval-kit

Evaluation kit for HEAR 2021 NeurIPS competition


If you are pushing code to this repo, please make sure you have
pre-commit hooks installed:
```
pre-commit install
```
@@ -0,0 +1,77 @@
evaluation-tasks
================

This folder contains Luigi pipelines to download and preprocess
evaluation tasks into a common format. Luigi checkpoints are saved
into the directory .checkpoints so preprocessing can be resumed if
interrupted. After preprocessing, tar'ed outputs are saved to your
S3 bucket. This avoids hitting dataset providers repeatedly.

For each evaluation task, the directory structure is:

    taskname/
        task.json
        README
        LICENSE
        train.csv
            [filename],...
        test.csv
            [filename],...
        audio/[sr]/train/[filename]
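
As a rough illustration of the layout above, the following sketch builds the path of one audio file. The helper name `audio_path`, the task name, and the file name are hypothetical; only the `audio/[sr]/train/[filename]` pattern comes from the listing, and using the same pattern for other splits is an assumption.

```python
from pathlib import PurePosixPath


def audio_path(task_dir: str, sample_rate: int, split: str, filename: str) -> PurePosixPath:
    """Return the location of one audio file inside a task directory.

    Follows the audio/[sr]/[split]/[filename] pattern; the listing only
    shows the "train" split, so other split names are an assumption.
    """
    return PurePosixPath(task_dir) / "audio" / str(sample_rate) / split / filename


# Hypothetical task directory and file name, for illustration only.
print(audio_path("coughvid-v2.0.0", 16000, "train", "example.wav"))
```

Keeping one subdirectory per sample rate means a model that expects 16 kHz input and one that expects 48 kHz input can read from the same task folder without resampling at evaluation time.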

## More details

task.json also specifies the hop_size that we will use for the
evaluation.

If this is a task involving multiple classes or labels, the max
number of classes/labels will be provided. We might have two versions
of label files, ones with strings and ones converted to ints for
convenience.

## train.csv and test.csv

For classification/multi-classification of the entire sound:
```
filename, non-negative integer class
```

For tagging (multilabel sound event classification) of the entire sound:
```
filename, list of string labels
```

For frame-based temporal multilabel (e.g. transcription and sound event detection):
```
filename, float timestamp in seconds, list of string labels
```

For ranking tasks:
```
list of filenames in ranked order
```

For JND tasks:
```
filename1, filename2, 0/1 indicates whether the audio is perceptually different to human listeners.
```
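
A minimal sketch of reading the simplest of these formats (`filename, non-negative integer class`) might look like the following. It assumes there is no header row and that whitespace after the comma should be ignored; neither is stated above. The function name and the file names are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def read_classification_rows(text: str) -> List[Tuple[str, int]]:
    """Parse `filename, non-negative integer class` rows.

    Assumes no header row; whitespace around each field is stripped.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        filename, label = (field.strip() for field in record)
        class_id = int(label)
        if class_id < 0:
            raise ValueError(f"negative class for {filename}: {class_id}")
        rows.append((filename, class_id))
    return rows


# Hypothetical file names, for illustration only.
sample = "a.wav, 0\nb.wav, 3\n"
print(read_classification_rows(sample))
```

The other formats carry lists of labels in a single row, so they would need an agreed list encoding before a parser like this could be extended to them.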

If the dataset provides a validation.csv, that will be included
too. Otherwise, participants can partition train into train/val
however they like.
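
One possible way to do such a partition, shown only as a sketch, is to assign each file by hashing its name so the split is reproducible without a random seed. The function name, the 10% default, and the file names are all assumptions, not part of this kit.

```python
import hashlib
from typing import Iterable, List, Tuple


def split_train_val(filenames: Iterable[str], val_percent: int = 10) -> Tuple[List[str], List[str]]:
    """Deterministically split filenames into (train, val) lists.

    Each file is assigned by hashing its name, so the result does not
    depend on file order or on a random seed.
    """
    train, val = [], []
    for name in filenames:
        digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100  # stable value in [0, 100)
        (val if bucket < val_percent else train).append(name)
    return train, val


# Hypothetical file names, for illustration only.
names = [f"clip{i:03d}.wav" for i in range(20)]
train, val = split_train_val(names)
print(len(train), len(val))
```

Because the assignment depends only on the file name, a file stays in the same partition even if more files are added to train later. The validation share is only approximately `val_percent` on small corpora.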

## Caching with S3

1. Download and configure the AWS CLI if you haven't done that already:
    * [Installation](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
    * [Configuration](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html)

2. Update the S3 config file: `config/s3.py`
    * `S3_CACHE = True` enables S3 caching. Set this to False if you want to disable
      caching for all tasks.
    * `HANDLE` is a string that is used to create an S3 bucket for all the evaluation
      tasks. Every S3 bucket must have a unique name, so you should use this to create
      one for yourself. The value of `HANDLE` is appended to `hear2021-`. For example,
      if I set `HANDLE=jordie` then all my tasks will be cached in a bucket named
      `hear2021-jordie`.
    * `S3_REGION_NAME` sets the region for your S3 buckets. You can set this to `None`
      to use the default value set during CLI configuration.
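
To make the `HANDLE` rule concrete, here is a small sketch that builds the bucket name and checks it against a subset of S3's general bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit). The `bucket_name` helper is hypothetical and is not part of `config/s3.py`.

```python
import re

BUCKET_PREFIX = "hear2021-"  # prefix stated in the README text above


def bucket_name(handle: str) -> str:
    """Build the cache bucket name and check a subset of S3 naming rules."""
    name = f"{BUCKET_PREFIX}{handle}"
    # 3-63 characters; lowercase letters, digits, dots and hyphens;
    # must start and end with a letter or digit.
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name):
        raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return name


print(bucket_name("jordie"))  # prints hear2021-jordie
```

A check like this catches handles with uppercase letters or underscores before any request is sent to AWS. It cannot tell whether the name is already taken by another account.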
Empty file.
@@ -0,0 +1,33 @@
"""
Configuration for the coughvid task
"""

Review comment: Some of these could probably be moved to a global / luigi config file (similar to the s3 config stuff that I pulled out)

# TODO: move some of these to a global config and import that here
# See: https://github.com/neuralaudio/hear2021-eval-kit/issues/10

TASKNAME = "coughvid-v2.0.0"

# Number of CPU workers for Luigi jobs
NUM_WORKERS = 4
# NUM_WORKERS = 1
# If you only use one sample rate, you should have an array with
# one sample rate in it.
# However, if you are evaluating multiple embeddings, you might
# want them all.
SAMPLE_RATES = [48000, 44100, 22050, 16000]
# TODO: Pick the 75th percentile length?
SAMPLE_LENGTH_SECONDS = 8.0
# TODO: Do we want to call this FRAME_RATE or HOP_SIZE
FRAME_RATE = 4
# Set this to None if you want to use ALL the data.
# NOTE: In expectation, this will be only 225 test files :\
# NOTE: You can make this smaller during development of this
# preprocessing script, to keep the pipeline fast.
# WARNING: If you change this value, you *must* delete _workdir
# or working dir.
# Most of the tasks iterate over every audio file present,
# except for the one that downsamples the corpus.
# (This is why we should have one working directory per task)
MAX_FRAMES_PER_CORPUS = 20 * 3600

MAX_FILES_PER_CORPUS = int(MAX_FRAMES_PER_CORPUS / FRAME_RATE / SAMPLE_LENGTH_SECONDS)
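
For reviewers checking the numbers, the arithmetic behind `MAX_FILES_PER_CORPUS` can be worked through as below. The constants are copied from the config above; `frames_per_file` and `total_hours` are names introduced here only for illustration.

```python
FRAME_RATE = 4                      # frames per second, from the config above
SAMPLE_LENGTH_SECONDS = 8.0         # length of each audio file in seconds
MAX_FRAMES_PER_CORPUS = 20 * 3600   # 72,000 frames

# Each 8-second file contributes 4 * 8 = 32 frames.
frames_per_file = FRAME_RATE * SAMPLE_LENGTH_SECONDS
max_files = int(MAX_FRAMES_PER_CORPUS / FRAME_RATE / SAMPLE_LENGTH_SECONDS)
total_hours = max_files * SAMPLE_LENGTH_SECONDS / 3600

print(frames_per_file, max_files, total_hours)  # prints 32.0 2250 5.0
```

So the cap works out to 2250 files, or 5 hours of audio. The 225 test files mentioned in the comment would match a 10% test share of 2250, though the split ratio itself is not shown in this file.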
@@ -0,0 +1,12 @@
"""
Configuration specific to AWS S3
"""

# You should pick a unique handle, since this determines the S3 path
# (which must be globally unique across all S3 users).
HANDLE = "hear"
S3_BUCKET = f"hear2021-{HANDLE}"

# If this is None, boto will use whatever is in your
# ~/.aws/config or AWS_DEFAULT_REGION environment variable
S3_REGION_NAME = "eu-central-1"
What does caching mean here?
This basically just turns on and off the S3 caching. Won't need this if we pull out the S3 stuff into a separate script.