Issue 617 clipdf #631

sammlapp · 2023-01-07T21:11:25Z

resolves #617, #589, and #630 by refactoring make_clip_df and related code

- make_clip_df now returns only a clip_df, unless the user requests that invalid samples are also returned as a second return value
- 'unsafe' renamed to 'invalid' in variable names and arguments (e.g., unsafe_samples is now invalid_samples)
- improved documentation
- invalid_samples and _invalid_samples collections are now set() rather than list, so that they contain only unique values
- when a dataframe has a multi-index, the invalid_sample value (for a sample = pd.Series) is just the file path (first item in the name tuple) rather than the entire name tuple
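
The set and multi-index behavior described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `collect_invalid_paths` and the `is_valid` callback are hypothetical names, and the index level names are assumed.

```python
import pandas as pd


def collect_invalid_paths(samples, is_valid):
    """Return the set of file paths for rows that fail `is_valid` (hypothetical helper)."""
    invalid = set()  # a set keeps each failing path only once
    for name, row in samples.iterrows():
        if not is_valid(row):
            # With a multi-index, `name` is a tuple; keep only the file path.
            path = name[0] if isinstance(name, tuple) else name
            invalid.add(path)
    return invalid


# Assumed index layout: (file, start_time, end_time)
index = pd.MultiIndex.from_tuples(
    [("a.wav", 0.0, 5.0), ("bad.wav", 0.0, 5.0), ("bad.wav", 5.0, 10.0)],
    names=["file", "start_time", "end_time"],
)
clips = pd.DataFrame({"label": [1, 0, 0]}, index=index)

# Stand-in validity check: pretend every clip from "bad.wav" fails to preprocess.
invalid_samples = collect_invalid_paths(
    clips, lambda row: not row.name[0].startswith("bad")
)
print(invalid_samples)  # {'bad.wav'}
```

Two clips from the same file fail here, but the path appears once because the collection is a set.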

Also contains a bug fix that should have been in another branch: a workaround for librosa/soundfile returning an empty sample array when trying to load float32 samples from an mp3.

make_clip_df now returns only clip_df by default. If the user specifies `return_unsafe_samples=True`, it returns the unsafe samples as a second return value. It also now supports labels: if passed a dataframe with file paths as the index and labels as values (one column per class, 0/1 values), it copies the values from each file to all clips belonging to that file in the resulting clip_df. Note that if the label dataframe has duplicated paths in its index, only the _first_ row for any unique path is used as the labels in the resulting clip_df.
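
A minimal pandas sketch of the label-copying rule described above, including the first-row-wins handling of duplicated paths. `copy_labels_to_clips` is a hypothetical helper written for illustration, and the index level names are assumptions; the actual implementation in make_clip_df may differ.

```python
import pandas as pd


def copy_labels_to_clips(label_df, clip_index):
    """Copy per-file 0/1 labels onto every clip of that file (hypothetical helper)."""
    # Keep only the first row for each unique file path.
    first_rows = label_df[~label_df.index.duplicated(keep="first")]
    # Look up the file-level labels once per clip, then re-index by clip.
    files = clip_index.get_level_values("file")
    clip_labels = first_rows.loc[list(files)]
    clip_labels.index = clip_index
    return clip_labels


# "a.wav" appears twice; only its first row (bird=1, frog=0) should be used.
labels = pd.DataFrame(
    {"bird": [1, 0, 0], "frog": [0, 1, 1]},
    index=["a.wav", "b.wav", "a.wav"],
)
clip_index = pd.MultiIndex.from_tuples(
    [("a.wav", 0.0, 5.0), ("a.wav", 5.0, 10.0), ("b.wav", 0.0, 5.0)],
    names=["file", "start_time", "end_time"],
)
clip_labels = copy_labels_to_clips(labels, clip_index)
print(clip_labels)
```

Both clips of "a.wav" receive bird=1, frog=0, and the second "a.wav" row in the label dataframe is ignored.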

sammlapp added 14 commits January 4, 2023 19:11

modify predict to only return score df

abf1487

move binary 0/1 prediction to functions in `metrics` module
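
As a rough idea of what a standalone 0/1 prediction function looks like once it is separated from `.predict()`, here is a sketch. `scores_to_binary` is a hypothetical name, not a function confirmed to exist in the `metrics` module, and the threshold rule (score >= threshold) is an assumption.

```python
import pandas as pd


def scores_to_binary(scores, threshold):
    """Convert a dataframe of continuous scores to 0/1 predictions (hypothetical helper)."""
    # Assumed rule: a class is predicted present when its score meets the threshold.
    return (scores >= threshold).astype(int)


scores = pd.DataFrame(
    {"bird": [0.9, 0.2], "frog": [0.4, 0.7]},
    index=["a.wav", "b.wav"],
)
binary = scores_to_binary(scores, threshold=0.5)
print(binary)
```

Keeping this step outside `.predict()` lets the same score dataframe be thresholded in different ways without re-running inference.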

round clip times to avoid floating point mismatch

b5dbdf7

resolves #529
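
The kind of floating point mismatch that rounding avoids can be shown with a short sketch. `clip_start_times` is a hypothetical helper, and the number of decimals is an assumption, not the value used in this PR.

```python
def clip_start_times(n_clips, step, decimals=6):
    """Return raw and rounded clip start times (hypothetical helper)."""
    raw = [i * step for i in range(n_clips)]
    # Rounding removes the tiny representation error so times compare equal.
    rounded = [round(t, decimals) for t in raw]
    return raw, rounded


raw, rounded = clip_start_times(4, 0.1)
print(raw[3] == 0.3)      # False: 3 * 0.1 is 0.30000000000000004
print(rounded[3] == 0.3)  # True
```

Without rounding, a clip time computed one way may fail to match the same time computed another way when used as an index value.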

rename 'unsafe' to 'invalid'

5f7d405

for safe dataset and other variable names, use 'invalid' rather than 'unsafe' to indicate samples that failed to preprocess. Similarly, use 'valid' rather than 'safe'.

invalid samples as set() rather than list

a88296c

Makes sense to only track unique values of paths that caused preprocessing errors. Resolves "unsafe samples should be set" (#630)

remove outdated args/comments

9ea64f3

- remove comments referring to returned predictions
- remove argument `threshold` in .predict()

temp. workaround for mp3 loads empty sample array

554ffd9

see librosa/librosa#1622, bastibe/python-soundfile#349

fix logical error in if/else structure

81c0f93

Previously allowed an attempt to write metadata to non-allowed formats because of a logical error in the if statement.

Merge branch 'develop' into issue_573_preds

e5c1741

Merge branch 'issue_573_preds' into issue_617_clipdf

c0d4d1b

fix reference to ._invalid_indices

f7663c9

also remove outdated `threshold` args in tests

add optional invalid_samples return in .predict()

90bd13d

pass return_invalid_samples=True to .predict() to return a set of paths for files that failed to preprocess. Includes a test.

add dtype workaround for load_channels_as_audio

c58a7a8

fix: mono=False and sample_rate for load channels

b961864

sammlapp merged commit 7716260 into develop Jan 10, 2023

sammlapp deleted the issue_617_clipdf branch January 10, 2023 19:52
