Add WandB logging and Support training on a dataframe of clip labels #609
Merged
logs training and validation data separately. Also logs epochs and batches.
added code to set up wandb with init(config) and log samples + metrics during training. Still need to test and add it to .predict()
Full working version of wandb integration for training and prediction. Instead of wandb.init() inside train, the user starts a wandb session and passes it to train or predict. The train and predict functions log sample-preview tables and progress/metrics to the wandb session, and update the config.
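A minimal sketch of the design described above: the caller creates the wandb session and passes it in, and the training loop only logs to it and updates its config. `train_sketch`, its parameters, and `RecordingSession` are hypothetical and are not OpenSoundscape's API. The loss and score values are placeholders. The sketch calls only `log()` and `config.update()` on the session, both of which a run returned by `wandb.init()` provides. Sample-preview tables are omitted.

```python
from typing import Any, Dict, List, Optional


class RecordingSession:
    """Hypothetical offline stand-in for a wandb run: records what would be logged."""

    def __init__(self) -> None:
        self.config: Dict[str, Any] = {}  # dict.update mirrors run.config.update
        self.logged: List[Dict[str, Any]] = []

    def log(self, data: Dict[str, Any]) -> None:
        self.logged.append(dict(data))


def train_sketch(
    epochs: int,
    batches_per_epoch: int,
    wandb_session: Optional[Any] = None,
) -> List[Dict[str, float]]:
    """Hypothetical training loop that logs to a caller-provided session.

    The session is created outside this function (e.g. with wandb.init())
    instead of being initialized inside it.
    """
    history = []
    if wandb_session is not None:
        # record the training settings on the session's config
        wandb_session.config.update(
            {"epochs": epochs, "batches_per_epoch": batches_per_epoch}
        )
    for epoch in range(epochs):
        for batch in range(batches_per_epoch):
            train_loss = 1.0 / (1 + epoch * batches_per_epoch + batch)  # placeholder
            if wandb_session is not None:
                # training metrics are logged per batch, under a "train/" prefix
                wandb_session.log({"epoch": epoch, "batch": batch, "train/loss": train_loss})
        val_score = epoch / max(epochs - 1, 1)  # placeholder validation metric
        history.append({"epoch": epoch, "val/score": val_score})
        if wandb_session is not None:
            # validation metrics are logged separately, once per epoch
            wandb_session.log({"epoch": epoch, "val/score": val_score})
    return history
```

With a real account, the session would come from something like `session = wandb.init(project="my-project")` and be passed as `train_sketch(10, 100, wandb_session=session)`. Passing the session in means one run can span several train and predict calls.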
to train on clips from longer files, pass a training dataframe with a multi-index of (file, start_time, end_time). The preprocessor will first extract clips based on the specified start_time and end_time. It will then further trim the clips (randomly or from the beginning, depending on preprocessor/augmentation settings) to match the necessary sample duration. I think the same behavior applies to validation without any issues, but I should check that it's working as expected.
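A sketch of the clip-label dataframe this commit describes, plus a hypothetical helper showing the two-step logic: take the labeled [start_time, end_time] clip, then trim it to the sample duration either from the beginning or at a random offset. `trim_to_sample_duration` and the column names are illustrative only. Raising an error for clips shorter than the sample duration is an assumption of this sketch, not documented behavior.

```python
import random
from typing import Optional, Tuple

import pandas as pd

# Hypothetical labels for clips cut from longer recordings,
# indexed by (file, start_time, end_time) with times in seconds
clip_labels = pd.DataFrame(
    {"species_a": [1, 0, 1], "species_b": [0, 1, 1]},
    index=pd.MultiIndex.from_tuples(
        [("rec1.wav", 0.0, 5.0), ("rec1.wav", 5.0, 10.0), ("rec2.wav", 12.0, 17.0)],
        names=["file", "start_time", "end_time"],
    ),
)


def trim_to_sample_duration(
    start_time: float,
    end_time: float,
    sample_duration: float,
    random_trim: bool = False,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Pick a window of length sample_duration inside [start_time, end_time].

    random_trim=False keeps the beginning of the clip; random_trim=True uses
    a random offset. Raising on too-short clips is an assumption.
    """
    available = end_time - start_time
    if available < sample_duration:
        raise ValueError("clip is shorter than sample_duration")
    rng = rng if rng is not None else random.Random()
    offset = rng.uniform(0.0, available - sample_duration) if random_trim else 0.0
    return start_time + offset, start_time + offset + sample_duration


# Trim each labeled clip to a 3-second sample taken from the start of the clip
windows = [
    (file, *trim_to_sample_duration(start, end, 3.0))
    for (file, start, end) in clip_labels.index
]
```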
Needed to change AudioSplittingDataset to return a df with multi-index (file,start_time,end_time). That df is used to set up the prediction dataframe for predicting on clips. Seems to be working now, but deserves tests.
needed file, start_time, end_time as columns
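A hypothetical sketch of the clip table that splitting produces. Each file is cut into back-to-back clips indexed by (file, start_time, end_time), and `reset_index()` / `set_index()` convert between that multi-index and plain file, start_time, end_time columns. `make_clip_df` is illustrative and is not AudioSplittingDataset's code. Dropping a final partial clip is an assumption.

```python
from typing import Dict

import pandas as pd

CLIP_LEVELS = ["file", "start_time", "end_time"]


def make_clip_df(file_durations: Dict[str, float], clip_duration: float) -> pd.DataFrame:
    """Hypothetical helper: split each file into back-to-back clips.

    Returns an (empty-column) DataFrame indexed by (file, start_time, end_time).
    A trailing clip shorter than clip_duration is dropped (assumption).
    """
    rows = []
    for file, duration in file_durations.items():
        start = 0.0
        while start + clip_duration <= duration:
            rows.append((file, start, start + clip_duration))
            start += clip_duration
    index = pd.MultiIndex.from_tuples(rows, names=CLIP_LEVELS)
    return pd.DataFrame(index=index)


clips = make_clip_df({"a.wav": 12.0, "b.wav": 5.0}, clip_duration=5.0)
flat = clips.reset_index()        # multi-index levels -> ordinary columns
restored = flat.set_index(CLIP_LEVELS)  # columns -> multi-index again
```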
CNN.predict() can receive a list of files, a dataframe with files as index, or (new) a dataframe that already contains clip information (multi-index of file, start_time, end_time)
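A hypothetical sketch of how the three accepted input types could be told apart. A DataFrame whose index levels are named (file, start_time, end_time) is treated as a clip table. Any other DataFrame is treated as having file paths as its index, and a list is treated as file paths. `normalize_predict_input` is illustrative only and is not the actual CNN.predict() implementation.

```python
from typing import Any, Tuple

import pandas as pd

CLIP_LEVELS = ["file", "start_time", "end_time"]


def normalize_predict_input(samples: Any) -> Tuple[str, Any]:
    """Classify predict() input as ("clips", clip_df) or ("files", list_of_paths)."""
    if isinstance(samples, pd.DataFrame):
        if list(samples.index.names) == CLIP_LEVELS:
            # already a clip table: predict on each (file, start, end) clip
            return "clips", samples
        # otherwise the index is assumed to hold file paths
        return "files", list(samples.index)
    if isinstance(samples, (list, tuple)):
        return "files", list(samples)
    raise TypeError("expected a list of files or a pandas DataFrame")
```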
AudioClipLoader now takes _end_time rather than _clip_duration. Update tests with correct arguments.
wasn't implemented correctly; it was just initializing an empty dataset. Now it creates an empty dataset, then adds the clip_df as dataset.label_df.
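The fix pattern in this commit, shown in minimal form: create an empty dataset, then attach the clip table as its label_df, so the dataset is no longer left empty. `SketchDataset` and `from_clip_df` are hypothetical stand-ins and are not OpenSoundscape's dataset classes.

```python
from typing import Optional

import pandas as pd


class SketchDataset:
    """Hypothetical dataset that stores its samples in label_df."""

    def __init__(self, label_df: Optional[pd.DataFrame] = None) -> None:
        self.label_df = label_df if label_df is not None else pd.DataFrame()

    def __len__(self) -> int:
        # one sample per row of label_df
        return len(self.label_df)

    @classmethod
    def from_clip_df(cls, clip_df: pd.DataFrame) -> "SketchDataset":
        dataset = cls()             # start from an empty dataset
        dataset.label_df = clip_df  # then attach the clip table
        return dataset
```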
- predicting from a list of audio files
- predicting from a clip_df specifying clip start and end times
was using unnecessarily complex logic. Now correctly uses the base class AudioFileDataset and provides better comments describing the logic
This PR adds two features to training CNNs in OpenSoundscape: