Add dataset for audio tagging #1241

marcoyang1998 · 2023-12-19T08:45:49Z

This PR adds a dataset for audio tagging. It can be used to train an audio tagging model to predict the sound event of an audio clip.

It requires a new custom field named audio_event in the supervision of each cut. For example:

{"id": "balanced/-1TLtjPtnms_10.000.wav", "start": 0.0, "duration": 10.0, "channel": 0, "supervisions": [{"id": "balanced/-1TLtjPtnms_10.000.wav", "recording_id": "balanced/-1TLtjPtnms_10.000.wav", "start": 0.0, "duration": 10.0, "channel": 0, "custom": {"audio_event": "220;137;519"}}], "features": {"type": "kaldi-fbank", "num_frames": 1000, "num_features": 80, "frame_shift": 0.01, "sampling_rate": 16000, "start": 0.0, "duration": 10.0, "storage_type": "lilcom_chunky", "storage_path": "data/fbank_audioset/balanced_balanced_feats/feats-0.lca", "storage_key": "77756,38006,38749", "channels": 0}, "recording": {"id": "balanced/-1TLtjPtnms_10.000.wav", "sources": [{"type": "file", "channels": [0], "source": "downloads/audioset/balanced/-1TLtjPtnms_10.000.wav"}], "sampling_rate": 16000, "num_samples": 160000, "duration": 10.0, "channel_ids": [0]}, "type": "MonoCut"}
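For illustration, the audio_event field of a cut like the one above can be read back with nothing but the standard library. This is a minimal sketch, not code from this PR: the helper name read_audio_events is hypothetical, and the manifest line is a trimmed copy of the example that keeps only the fields used here.

```python
import json
from typing import List

# Trimmed copy of the example cut; only the fields needed below are kept.
CUT_LINE = (
    '{"id": "balanced/-1TLtjPtnms_10.000.wav", "start": 0.0, "duration": 10.0, '
    '"channel": 0, "supervisions": [{"id": "balanced/-1TLtjPtnms_10.000.wav", '
    '"recording_id": "balanced/-1TLtjPtnms_10.000.wav", "start": 0.0, '
    '"duration": 10.0, "channel": 0, "custom": {"audio_event": "220;137;519"}}], '
    '"type": "MonoCut"}'
)


def read_audio_events(cut: dict) -> List[str]:
    """Return the raw audio_event string of every supervision in a cut dict.

    Hypothetical helper: raises KeyError if a supervision has no
    audio_event entry in its custom field.
    """
    events = []
    for supervision in cut.get("supervisions", []):
        custom = supervision.get("custom") or {}
        events.append(custom["audio_event"])
    return events


cut = json.loads(CUT_LINE)
print(read_audio_events(cut))
```

The value stays a plain string at this stage; splitting it on ";" is left to whatever consumes the label.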

pzelasko

Could you add a unit test for this dataset? Thanks.

pzelasko · 2023-12-20T19:39:55Z

lhotse/dataset/audio_tagging.py

+                      - multi-channel: currently not supported
+            'supervisions': [
+                {
+                    # For audio event, which can be mapped to a multi-hot tensor


Don't you prefer to construct the multi-hot tensor inside the dataset instead?

Sorry for getting back so late.

My intention was that users might need the name of the audio event for other tasks, for example, audio captioning.
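For readers who do want the multi-hot target, the mapping discussed here can be sketched outside the dataset. This is only an illustration under stated assumptions, not the PR's implementation: the helper name to_multi_hot is hypothetical, the entries are assumed to be integer class indices separated by ";", and the default of 527 classes assumes the AudioSet label set.

```python
from typing import List


def to_multi_hot(audio_event: str, num_classes: int = 527, sep: str = ";") -> List[float]:
    """Map a string such as "220;137;519" to a multi-hot vector.

    Hypothetical helper. Assumes every entry is an integer class index
    in the range [0, num_classes).
    """
    target = [0.0] * num_classes
    for token in audio_event.split(sep):
        index = int(token)
        if not 0 <= index < num_classes:
            raise ValueError(f"class index {index} is outside [0, {num_classes})")
        target[index] = 1.0
    return target


target = to_multi_hot("220;137;519")
# Indices of the active classes, in ascending order.
print([i for i, value in enumerate(target) if value == 1.0])
```

The resulting list can be wrapped in a tensor by the training code, while the original string remains available for tasks that need the event names.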

marcoyang1998 · 2024-03-21T04:27:32Z

Added the unit test.

pzelasko

Thanks

marcoyang1998 added 2 commits December 19, 2023 16:27

add audio tagging dataset

f8109ac

minor fix

e32f6ee

pzelasko reviewed Dec 20, 2023

add test for audio tagging

89958e9

Merge branch 'master' into audio_tagging

086f6b1

pzelasko approved these changes Mar 27, 2024

pzelasko merged commit 1c2a1b5 into lhotse-speech:master Mar 27, 2024
11 checks passed

pzelasko added this to the v1.23.0 milestone Mar 27, 2024
