
Add WandB logging and Support training on a dataframe of clip labels #609

Merged: 30 commits into develop on Dec 20, 2022

Conversation

sammlapp (Collaborator)

This PR adds two features to training CNNs in OpenSoundscape:

  1. monitor training and prediction progress in Weights and Biases (WandB), where you can interact with sample previews, view charts of training and validation metrics, and compare metadata, hyperparameters, and performance across model training runs
  2. train a CNN using a dataframe describing labeled audio segments, without splitting the audio files into clips
  • for instance, load annotations from a Raven file, map them onto a clip dataframe, and train without generating short audio clips
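A clip-label dataframe of this shape can be sketched with plain pandas; the file names and species columns below are hypothetical, not part of the PR:

```python
import pandas as pd

# Hypothetical clip labels: each row is one labeled segment of a longer file.
# The multi-index (file, start_time, end_time) tells the preprocessor which
# portion of the audio to load, so no short clip files need to be written.
clip_labels = pd.DataFrame(
    {
        "file": ["field_recording.wav", "field_recording.wav", "pond.wav"],
        "start_time": [0.0, 3.0, 12.5],
        "end_time": [3.0, 6.0, 15.5],
        "wood_thrush": [1, 0, 0],
        "spring_peeper": [0, 0, 1],
    }
).set_index(["file", "start_time", "end_time"])

print(list(clip_labels.index.names))  # ['file', 'start_time', 'end_time']
```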

Logs training and validation data separately; also logs epochs and batches.
Added code to set up wandb with init(config) and to log samples and metrics during training.

Still need to test this and add it to .predict().
Full working version of wandb integration for training and prediction. Instead of calling wandb.init() inside train, the user starts a wandb session and passes it to train or predict.

The train and predict functions log sample-preview tables and progress/metrics to the wandb session, and update the config.
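The design choice here (the caller owns the wandb session and passes it in, rather than train/predict calling wandb.init internally) can be illustrated with a minimal stdlib stand-in. `WandbSession` and `train` below are illustrative stand-ins, not the actual OpenSoundscape or wandb API; a real caller would do `session = wandb.init(...)` and then `model.train(..., wandb_session=session)`:

```python
class WandbSession:
    """Stand-in for a wandb run: collects config updates and logged metrics."""

    def __init__(self):
        self.config = {}   # hyperparameters / metadata for this run
        self.history = []  # sequence of logged metric dicts

    def log(self, metrics):
        self.history.append(dict(metrics))


def train(epochs, wandb_session=None):
    """Sketch of a train() that logs to a session it did not create."""
    if wandb_session is not None:
        wandb_session.config.update({"epochs": epochs})
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # placeholder training loss
        if wandb_session is not None:
            wandb_session.log({"epoch": epoch, "loss": loss})


session = WandbSession()
train(epochs=3, wandb_session=session)
print(len(session.history))  # 3
```

Because the session outlives any single call, the same run can accumulate logs from both train and predict, which is what the PR describes.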
To train on clips from longer files, pass a training dataframe with a multi-index of (file, start_time, end_time).

The preprocessor will first extract clips based on the specified start_time and end_time. It will then further trim the clips (randomly or from the beginning, depending on preprocessor/augmentation settings) to match the necessary sample duration.
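The two-stage loading described above might be sketched as follows; the function name and the exact random-offset rule are assumptions for illustration, not the actual preprocessor code:

```python
import random


def choose_trim_window(start_time, end_time, sample_duration,
                       random_trim=False, rng=None):
    """Pick a sub-window of [start_time, end_time] matching sample_duration.

    First the clip (start_time, end_time) is extracted from the long file;
    if it is longer than the model's required sample_duration, it is trimmed
    either from the beginning or from a random offset (augmentation).
    """
    clip_len = end_time - start_time
    if clip_len <= sample_duration:
        return start_time, end_time  # too short to trim; padding handled elsewhere
    if random_trim:
        rng = rng or random.Random()
        offset = rng.uniform(0, clip_len - sample_duration)
    else:
        offset = 0.0  # trim from the beginning
    return start_time + offset, start_time + offset + sample_duration


print(choose_trim_window(3.0, 8.0, 2.0))  # (3.0, 5.0)
```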

I think the same behavior applies to validation without any issues, but we should check that it's working as expected.
Needed to change AudioSplittingDataset to return a df with multi-index (file,start_time,end_time). That df is used to set up the prediction dataframe for predicting on clips. Seems to be working now, but deserves tests.
Needed file, start_time, and end_time as columns.
CNN.predict() can receive a list of files, a dataframe with files as the index, or (new) a dataframe which already contains clip information (multi-index of file, start_time, end_time).
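Dispatching on these three accepted input types could look roughly like this (a sketch of the idea, not the actual CNN.predict implementation):

```python
import pandas as pd

CLIP_INDEX = ["file", "start_time", "end_time"]


def classify_predict_input(samples):
    """Decide how predict-style code would treat its input (sketch)."""
    if isinstance(samples, list):
        return "file_list"  # paths; files will be split into clips
    if isinstance(samples, pd.DataFrame):
        if list(samples.index.names) == CLIP_INDEX:
            return "clip_df"  # clips already specified; skip splitting
        return "file_df"      # files as index; generate clips by splitting
    raise TypeError("samples must be a list of files or a DataFrame")
```

Checking the multi-index names lets an already-prepared clip dataframe bypass AudioSplittingDataset entirely, which matches the behavior this PR adds.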
AudioClipLoader now takes _end_time rather than _clip_duration. Updated tests with the correct arguments.
This wasn't implemented correctly; it was just initializing an empty dataset. Now it creates an empty dataset, then assigns the clip_df as dataset.label_df.
- predicting from a list of audio files
- predicting from a clip_df specifying clip start and end times
The previous implementation used unnecessarily complex logic. It now correctly uses the base class AudioFileDataset, with better comments describing the logic.
@sammlapp sammlapp merged commit 7629e8d into develop Dec 20, 2022