Issue 617 clipdf #631

Merged
merged 14 commits on Jan 10, 2023

Collaborator

@sammlapp sammlapp commented Jan 7, 2023

resolves #617, #589, and #630 by refactoring make_clip_df and related code

  • make_clip_df now returns only a clip_df, unless the user requests that invalid samples also be returned as a second return value
  • 'unsafe' renamed to 'invalid' in variable names and arguments (e.g., unsafe_samples is now invalid_samples)
  • improved documentation
  • invalid_samples and _invalid_samples collections are now set() rather than list, so that they contain only unique values
  • when a dataframe has a multi-index, the invalid_sample value (for a sample = pd.Series) is just the file path (first item in the name tuple) rather than the entire name tuple
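
The last two bullets can be illustrated with a minimal sketch. It is not the library's code: `invalid_sample_id` is a hypothetical helper, and the `(file, start_time, end_time)` tuple layout is an assumption about what a multi-index row name looks like.

```python
def invalid_sample_id(name):
    """Return the file path that identifies a sample (hypothetical helper).

    For a multi-index row, `name` is assumed to be a tuple such as
    (file, start_time, end_time); only the first item, the path, is kept.
    """
    return name[0] if isinstance(name, tuple) else name


# A set keeps each failing path once, even if several clips from it fail.
invalid_samples = set()
for name in [
    ("a.wav", 0.0, 5.0),   # multi-index row name
    ("a.wav", 5.0, 10.0),  # second clip from the same file
    "b.wav",               # single-index row name
]:
    invalid_samples.add(invalid_sample_id(name))

print(sorted(invalid_samples))  # ['a.wav', 'b.wav']
```

With a list, `a.wav` would have been recorded twice; the set reports each problem file once.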

Also contains a bug fix that should have been in another branch: a workaround for librosa/soundfile trying to load float32 from an mp3 and getting an empty sample array

move binary 0/1 prediction to functions in `metrics` module
by default, make_clip_df now returns only clip_df. If the user specifies `return_unsafe_samples=True`, it also returns the unsafe samples as a second return value.

Also, now supports labels. If passed a dataframe with file paths as index and labels as values (one column per class, 0/1 values), it will copy the values from each file to all clips belonging to that file in the resulting clip_df. Note that if the label df passed has duplicated paths in the index, only the _first_ row for any unique path is used as the labels in the resulting clip_df.
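
A rough pandas sketch of this label-copying behavior, for illustration only. The data, the `(file, start_time, end_time)` index names, and the variable names are assumptions; this is not how make_clip_df is implemented.

```python
import pandas as pd

# File-level labels: paths as index, one 0/1 column per class (made-up data).
label_df = pd.DataFrame(
    {"bird": [1, 0, 0], "frog": [0, 1, 0]},
    index=["a.wav", "b.wav", "a.wav"],  # "a.wav" is duplicated on purpose
)

# Clip index with assumed level names (file, start_time, end_time).
clip_index = pd.MultiIndex.from_tuples(
    [("a.wav", 0.0, 5.0), ("a.wav", 5.0, 10.0), ("b.wav", 0.0, 5.0)],
    names=["file", "start_time", "end_time"],
)

# Keep only the first row for each unique path.
first_labels = label_df[~label_df.index.duplicated(keep="first")]

# Look up each clip's file, then attach the clip-level index.
clip_files = clip_index.get_level_values("file")
clip_df = first_labels.loc[clip_files].copy()
clip_df.index = clip_index

print(clip_df)
```

Both `a.wav` clips receive `bird=1, frog=0` from the first `a.wav` row; the later duplicate row is ignored.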
for safe dataset and other variable names, use 'invalid' rather than 'unsafe' to indicate samples that failed to preprocess. Similarly, use 'valid' rather than 'safe'.
Makes sense to only track unique values of paths that caused preprocessing errors. Resolves "unsafe samples should be set" (#630)
- remove comments referring to returned predictions
- remove argument `threshold` in .predict()
previously allowed attempts to write metadata with non-allowed formats because of a logical error in the if statement
also remove outdated `threshold` args in tests
pass return_invalid_samples=True to .predict() to return a set of paths for files that failed to preprocess. Includes a test.
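
The optional second return value can be sketched as follows. `predict_sketch` and `fake_preprocess` are hypothetical stand-ins written for this example; only the `return_invalid_samples` flag name comes from the commit message above, and the real `.predict()` may differ in every other respect.

```python
def predict_sketch(paths, preprocess, return_invalid_samples=False):
    """Hypothetical stand-in for a predict-style method (not the library's code)."""
    results = {}
    invalid_samples = set()  # unique paths that failed to preprocess
    for path in paths:
        try:
            results[path] = preprocess(path)
        except Exception:
            invalid_samples.add(path)
    if return_invalid_samples:
        return results, invalid_samples
    return results


def fake_preprocess(path):
    # Pretend that anything other than audio fails to preprocess.
    if path.endswith(".txt"):
        raise ValueError(f"not an audio file: {path}")
    return len(path)


paths = ["a.wav", "notes.txt", "notes.txt"]
only_results = predict_sketch(paths, fake_preprocess)
results, invalid = predict_sketch(paths, fake_preprocess, return_invalid_samples=True)
print(invalid)  # {'notes.txt'}
```

Callers that do not pass the flag get a single return value, so existing code that expects one result does not need to unpack a tuple.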
@sammlapp sammlapp merged commit 7716260 into develop Jan 10, 2023
@sammlapp sammlapp deleted the issue_617_clipdf branch January 10, 2023 19:52