Explore the lengths of the noisy dataset and test dataset #25

JoshVarty · 2019-05-02T03:06:47Z

We should take a look at the noisy and test datasets in our exploratory data analysis. My understanding is that the test dataset is from the same source as the curated dataset:

The test set is used for system evaluation and consists of manually-labeled data from FSD. Since most of the train data come from YFCC, some acoustic domain mismatch between the train and test set can be expected. All the acoustic material present in the test set is labeled, except human error, considering the vocabulary of 80 classes used in the competition.

Are the clips in our test set the same length as the ones in the curated set?
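One way to answer this is to measure every clip's duration directly. Below is a minimal sketch that assumes the clips are uncompressed PCM `.wav` files that Python's standard-library `wave` module can read. The folder names `train_curated`, `train_noisy` and `test` are hypothetical placeholders for wherever the competition data was unpacked, and `clip_durations` is an illustrative helper, not part of any competition tooling.

```python
import statistics
import wave
from pathlib import Path


def clip_durations(directory):
    """Return {filename: duration in seconds} for every .wav file in `directory`.

    Assumes uncompressed PCM WAV files; other formats need a different reader.
    """
    durations = {}
    for path in sorted(Path(directory).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration = number of frames / frames per second
            durations[path.name] = wav.getnframes() / wav.getframerate()
    return durations


if __name__ == "__main__":
    # Hypothetical folder names: adjust to wherever the data was unpacked.
    for split in ("train_curated", "train_noisy", "test"):
        if not Path(split).is_dir():
            continue
        values = list(clip_durations(split).values())
        if values:
            print(f"{split}: {len(values)} clips, min {min(values):.2f}s, "
                  f"median {statistics.median(values):.2f}s, max {max(values):.2f}s")
```

The resulting per-split duration lists can then be plotted as histograms to compare the three sets side by side.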


JoshVarty · 2019-05-02T18:20:45Z

Curated Training Set

Noisy Training Set

Almost all clips are 15 seconds long.
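To put a number on "almost all", one could count the share of clips whose length is within a small tolerance of 15 seconds. This is a minimal sketch assuming a plain list of durations in seconds; `fraction_near` is a hypothetical helper, and the 0.1 s default tolerance is an arbitrary choice.

```python
def fraction_near(durations, target=15.0, tolerance=0.1):
    """Fraction of durations (in seconds) within `tolerance` of `target`."""
    if not durations:
        return 0.0
    hits = sum(1 for d in durations if abs(d - target) <= tolerance)
    return hits / len(durations)


# Example with made-up durations: three of the four are close to 15 s.
print(fraction_near([15.0, 15.05, 14.95, 7.2]))
```

A tolerance is used instead of exact equality because durations computed from frame counts are floats and may differ from 15.0 by a fraction of a frame.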

Test Set

So it looks to me like the test set is taken from roughly the same distribution as the curated training set.
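The comparison here is made by eye from the histograms. If a single number is wanted, one option (swapped in here, not something done in this thread) is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical CDFs of two duration samples. A minimal pure-Python sketch follows; `ks_statistic` is a hypothetical helper, and `scipy.stats.ks_2samp` computes the same statistic together with a p-value if SciPy is available.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute difference between the empirical CDFs of the
    two samples: 0.0 means identical ECDFs, 1.0 means completely separated.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The ECDF difference only changes at observed values, so checking every
    # value from either sample is enough to find the maximum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

A small statistic between the curated and test durations would support the "same distribution" reading; a large one between the noisy set and the others would reflect its pile-up at 15 seconds.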

JoshVarty closed this as completed May 2, 2019

JoshVarty mentioned this issue on May 2, 2019 in Investigate Validation Set #7 (closed)
