Explore the lengths of the noisy dataset and test dataset #25

Closed
JoshVarty opened this issue May 2, 2019 · 1 comment

Comments

@JoshVarty
Owner

We should take a look at the noisy and test datasets in our exploratory data analysis. My understanding is that the test dataset is from the same source as the curated dataset:

The test set is used for system evaluation and consists of manually-labeled data from FSD. Since most of the train data come from YFCC, some acoustic domain mismatch between the train and test set can be expected. All the acoustic material present in the test set is labeled, except human error, considering the vocabulary of 80 classes used in the competition.

Are the clips in our test set the same length as the ones in the curated set?
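
One way to answer this is to read each clip's duration from its WAV header and compare summary statistics per dataset. The snippet below is only a minimal sketch: it assumes the clips are uncompressed PCM WAV files sitting directly inside one directory per dataset, and the function names are made up for illustration.

```python
import wave
from pathlib import Path
from statistics import mean, median


def clip_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds, read from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize_durations(directory):
    """Summarize clip durations for every .wav file directly inside `directory`."""
    wav_paths = sorted(Path(directory).glob("*.wav"))
    durations = [clip_duration_seconds(p) for p in wav_paths]
    if not durations:
        # No clips found (or the directory is empty); nothing to summarize.
        return {"count": 0}
    return {
        "count": len(durations),
        "min": min(durations),
        "median": median(durations),
        "mean": mean(durations),
        "max": max(durations),
    }
```

Calling `summarize_durations` once per dataset directory (for example hypothetical `train_curated`, `train_noisy`, and `test` folders) gives numbers that can be compared side by side before plotting full histograms.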

@JoshVarty
Owner Author

JoshVarty commented May 2, 2019

Curated Training Set

image

Noisy Training Set

image

Almost all clips are 15 seconds long.

Test Set

image

Judging by clip length, it looks to me like the test set is drawn from roughly the same distribution as the curated training set.
