Add dataset for audio tagging #1241

Merged (4 commits) on Mar 27, 2024

marcoyang1998
Contributor

This PR adds a dataset for audio tagging. It can be used to train an audio tagging model that predicts the sound events in an audio clip.

It requires a new custom field named audio_event in the supervision of each cut. For example:

{"id": "balanced/-1TLtjPtnms_10.000.wav", "start": 0.0, "duration": 10.0, "channel": 0, "supervisions": [{"id": "balanced/-1TLtjPtnms_10.000.wav", "recording_id": "balanced/-1TLtjPtnms_10.000.wav", "start": 0.0, "duration": 10.0, "channel": 0, "custom": {"audio_event": "220;137;519"}}], "features": {"type": "kaldi-fbank", "num_frames": 1000, "num_features": 80, "frame_shift": 0.01, "sampling_rate": 16000, "start": 0.0, "duration": 10.0, "storage_type": "lilcom_chunky", "storage_path": "data/fbank_audioset/balanced_balanced_feats/feats-0.lca", "storage_key": "77756,38006,38749", "channels": 0}, "recording": {"id": "balanced/-1TLtjPtnms_10.000.wav", "sources": [{"type": "file", "channels": [0], "source": "downloads/audioset/balanced/-1TLtjPtnms_10.000.wav"}], "sampling_rate": 16000, "num_samples": 160000, "duration": 10.0, "channel_ids": [0]}, "type": "MonoCut"}
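As a minimal sketch of how that field could be read back, the snippet below parses a trimmed copy of the cut above with the standard `json` module. The helper name `read_audio_events` is hypothetical, and the assumption that `audio_event` holds `;`-separated integer class IDs is inferred from the example only, not confirmed by the PR.

```python
import json

# Trimmed copy of the example cut: only the fields needed to reach the label.
cut_json = """
{"id": "balanced/-1TLtjPtnms_10.000.wav",
 "supervisions": [{"id": "balanced/-1TLtjPtnms_10.000.wav",
                   "custom": {"audio_event": "220;137;519"}}],
 "type": "MonoCut"}
"""


def read_audio_events(cut: dict) -> list:
    """Return one list of event IDs per supervision (hypothetical helper).

    Assumes audio_event is a ';'-separated string of integer class IDs.
    """
    events = []
    for sup in cut["supervisions"]:
        # "custom" may be missing or null on supervisions without labels.
        raw = (sup.get("custom") or {}).get("audio_event", "")
        events.append([int(tok) for tok in raw.split(";") if tok])
    return events


cut = json.loads(cut_json)
print(read_audio_events(cut))  # [[220, 137, 519]]
```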

Collaborator

@pzelasko pzelasko left a comment

Could you add a unit test for this dataset? Thanks.

- multi-channel: currently not supported
'supervisions': [
{
# For audio event, which can be mapped to a multi-hot tensor
Collaborator

Don't you prefer to construct the multi-hot tensor inside the dataset instead?

Contributor Author

Sorry for getting back so late.

My intention was that users might need the name of the audio event for other tasks, for example, audio captioning.
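The trade-off discussed here can be sketched as follows: the raw `audio_event` string stays available to the user, and the multi-hot target is derived from it only when needed. This is an illustration, not the code merged in the PR. The function `multi_hot` is hypothetical, the `;`-separated integer format is inferred from the example cut, and `NUM_CLASSES = 527` assumes the AudioSet label set. Plain Python lists stand in for tensors to keep the sketch dependency-free.

```python
NUM_CLASSES = 527  # assumption: number of classes in the AudioSet label set


def multi_hot(audio_event: str, num_classes: int = NUM_CLASSES) -> list:
    """Map a ';'-separated string of class IDs to a multi-hot vector.

    Hypothetical helper; returns a list of floats with 1.0 at each
    labelled class index and 0.0 elsewhere.
    """
    vec = [0.0] * num_classes
    for tok in audio_event.split(";"):
        if not tok:
            continue  # tolerate empty strings and trailing separators
        idx = int(tok)
        if not 0 <= idx < num_classes:
            raise ValueError(f"class index {idx} out of range [0, {num_classes})")
        vec[idx] = 1.0
    return vec


raw = "220;137;519"       # kept as-is for tasks that need the original labels
target = multi_hot(raw)   # derived training target for audio tagging

active = [i for i, v in enumerate(target) if v == 1.0]
print(active)  # [137, 220, 519]
```

Keeping both forms costs little: the string is cheap to carry in the batch, and the vector can be rebuilt from it at any point.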

@marcoyang1998
Contributor Author

Added the unit test.

Collaborator

@pzelasko pzelasko left a comment

Thanks

@pzelasko pzelasko merged commit 1c2a1b5 into lhotse-speech:master Mar 27, 2024
11 checks passed
@pzelasko pzelasko added this to the v1.23.0 milestone Mar 27, 2024