Create Audio feature #2324

albertvillanova · 2021-05-05T15:55:22Z

Create Audio feature to handle raw audio files.

Some decisions to be further discussed:

- I have chosen soundfile as the audio library. Another interesting library is librosa, but it requires soundfile (see here). If we need more advanced functionality, we could eventually switch libraries.
- I have implemented the audio feature as an extra: pip install datasets[audio]. For the moment, the typical datasets user works only with text datasets, so there is no need to impose additional audio/image package requirements on users who do not need them.
- For tests, I require the audio dependencies (so that all audio functionality is checked by our CI test suite). I exclude Linux platforms, which require an additional library to be installed with the distribution's package manager.
  - I also require pytest-datadir, which allows having (audio) data files for tests.
- The audio data contain: array and sample_rate.
- The array is reshaped as a 1D array (the expected input for Wav2Vec2).
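
To make the decoded structure concrete, here is a minimal sketch that turns a WAV file into the array and sample_rate pair listed above. It uses the standard-library `wave` module instead of soundfile so that it runs without extra dependencies. The helper names `decode_wav` and `make_wav`, and the choice to collapse channels by averaging, are assumptions for illustration and not necessarily what this PR does.

```python
import io
import struct
import wave


def decode_wav(data: bytes) -> dict:
    """Decode 16-bit PCM WAV bytes into a 1D list of floats plus the sample rate.

    Hypothetical helper: the PR itself relies on soundfile for decoding.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)

    # Samples are interleaved per frame: [ch0, ch1, ch0, ch1, ...].
    samples = struct.unpack(f"<{n_frames * n_channels}h", raw)

    # Assumption: collapse channels by averaging so the result is 1D,
    # then scale 16-bit integers to the range [-1.0, 1.0).
    array = [
        sum(samples[i : i + n_channels]) / n_channels / 32768.0
        for i in range(0, len(samples), n_channels)
    ]
    return {"array": array, "sample_rate": sample_rate}


def make_wav(frames, sample_rate: int, n_channels: int) -> bytes:
    """Build a small in-memory WAV file from interleaved 16-bit samples."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(n_channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(frames)}h", *frames))
    return buffer.getvalue()


# Three stereo frames at 16 kHz, decoded into a 1D array of three samples.
stereo = make_wav([0, 0, 16384, 16384, -16384, 16384], sample_rate=16_000, n_channels=2)
audio = decode_wav(stereo)
print(audio["sample_rate"], len(audio["array"]))
```

The returned dictionary has the same two keys as the decoded audio data described in this PR, which is what a processor such as the Wav2Vec2 one would consume.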

Note that to install soundfile on Linux, you need to install libsndfile using your distribution’s package manager, for example sudo apt-get install libsndfile1.
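
Because the audio library is an optional extra, it makes sense to import it only when audio is actually decoded and to fail with an actionable message otherwise. The following is a sketch of that late-import pattern; `require_audio_backend` is a hypothetical helper, not the code from this PR, and catching `OSError` for a missing system library is an assumption about how the backend reports that failure.

```python
import importlib


def require_audio_backend(module_name: str = "soundfile"):
    """Import the audio backend lazily and fail with an actionable message.

    Hypothetical helper, not the code from this PR.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, OSError) as err:
        # Assumption: OSError covers the case where the Python package is
        # installed but the system library (libsndfile) cannot be loaded.
        raise ImportError(
            f"Decoding audio requires '{module_name}'. "
            "Install it with: pip install datasets[audio]. "
            "On Linux, also install libsndfile with your package manager, "
            "for example: sudo apt-get install libsndfile1."
        ) from err


# Demonstrate the error path with a module name that does not exist.
try:
    require_audio_backend("a_module_that_is_not_installed")
except ImportError as err:
    message = str(err)
    print(message)
```

Users who only work with text datasets never hit this code path, so they pay no import cost and need no extra packages.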

Requirements Specification

Access example with audio loading and resampling:
```
ds[0]["audio"]
```

Map with audio loading and resampling:
```
def preprocess(batch):
    batch["input_values"] = processor(batch["audio"]).input_values
    return batch

ds = ds.map(preprocess)
```

Map without audio loading and resampling:
```
def preprocess(batch):
    batch["labels"] = processor(batch["target_text"]).input_values
    return batch

ds = ds.map(preprocess)
```
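
The two map cases differ only in whether the function reads the audio column. One way to get that behavior is to decode lazily on key access, so a function that only touches target_text never triggers loading. The sketch below illustrates the idea with a hypothetical `LazyExample` mapping and a stand-in `fake_decode` function; neither is part of datasets, and the actual mechanism in this PR may differ.

```python
from collections.abc import Mapping


class LazyExample(Mapping):
    """Hypothetical read-only example that decodes a column on first access."""

    def __init__(self, raw: dict, decoders: dict):
        self._raw = raw              # stored values, e.g. a file path
        self._decoders = decoders    # column name -> decoding function
        self._cache = {}
        self.decoded_columns = []    # bookkeeping for the demonstration

    def __getitem__(self, key):
        if key in self._decoders:
            if key not in self._cache:
                self._cache[key] = self._decoders[key](self._raw[key])
                self.decoded_columns.append(key)
            return self._cache[key]
        return self._raw[key]

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


def fake_decode(path: str) -> dict:
    # Stand-in for real audio loading and resampling.
    return {"array": [0.0, 0.1, 0.0], "sample_rate": 16_000}


raw = {"audio": "clip.wav", "target_text": "hello"}

# Only the text column is read, so nothing is decoded.
text_only = LazyExample(raw, {"audio": fake_decode})
labels = text_only["target_text"].upper()
print(text_only.decoded_columns)

# Reading the audio column triggers decoding exactly once.
with_audio = LazyExample(raw, {"audio": fake_decode})
n_samples = len(with_audio["audio"]["array"])
print(with_audio.decoded_columns)
```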

Additional requirement specification (see the review on Create Audio feature #2324): cast the audio column to change the sampling rate:
```
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
```
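
To show what changing the sampling rate means for the decoded data, here is a rough sketch that resamples a 1D signal by linear interpolation. This is a deliberately simple technique chosen so the example stays dependency-free; a production resampler would also apply an anti-aliasing filter, and nothing here reflects how this PR resamples. The names `resample_linear` and `cast_audio` are hypothetical.

```python
def resample_linear(array, orig_rate: int, target_rate: int):
    """Resample a 1D signal by linear interpolation (illustrative only)."""
    if orig_rate == target_rate or len(array) < 2:
        return list(array)
    n_out = max(1, round(len(array) * target_rate / orig_rate))
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis.
        pos = i * orig_rate / target_rate
        # Clamp so that left + 1 is always a valid index.
        left = min(int(pos), len(array) - 2)
        # Positions past the last input sample hold the last value.
        frac = min(pos - left, 1.0)
        out.append(array[left] * (1.0 - frac) + array[left + 1] * frac)
    return out


def cast_audio(example: dict, sampling_rate: int) -> dict:
    """Hypothetical equivalent of decoding with a new target sampling rate."""
    return {
        "array": resample_linear(example["array"], example["sample_rate"], sampling_rate),
        "sample_rate": sampling_rate,
    }


example = {"array": [0.0, 1.0, 2.0, 3.0], "sample_rate": 8_000}
upsampled = cast_audio(example, sampling_rate=16_000)
print(upsampled["sample_rate"], upsampled["array"])
```

Doubling the rate doubles the number of samples while the duration of the clip stays the same.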

tests/features/test_audio.py

albertvillanova · 2021-10-12T12:34:30Z

I think the last thing we need to do is make sure that cast_column changes the fingerprint of the dataset. Feel free to use the fingerprint_transform decorator, as for remove_columns.

(note that cast currently doesn't use the decorator since it's based on map that already updates the fingerprint).

@lhoestq note that cast_column may call cast in some cases, and the decorator would not be necessary for these cases...

I did it by setting inplace=False, and updating fingerprint explicitly only when cast is not called.
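
For context on why the fingerprint matters: cached results are looked up by fingerprint, so a transform that changes how a column is decoded must also produce a new fingerprint. The sketch below shows the general idea of deriving a new fingerprint from the previous one plus a description of the transform. `update_fingerprint` and `TinyDataset` are hypothetical stand-ins and do not reproduce the fingerprint_transform decorator or the datasets hashing scheme.

```python
import hashlib
import json


def update_fingerprint(fingerprint: str, transform: str, transform_args: dict) -> str:
    """Derive a new fingerprint from the old one and the applied transform.

    Hypothetical helper: hashes a canonical JSON description of the step.
    """
    payload = json.dumps(
        {"previous": fingerprint, "transform": transform, "args": transform_args},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


class TinyDataset:
    """Toy dataset holding only a feature description and a fingerprint."""

    def __init__(self, features: dict, fingerprint: str = "0" * 16):
        self.features = features
        self.fingerprint = fingerprint

    def cast_column(self, column: str, feature: str) -> "TinyDataset":
        # Return a new object (not in place) with an updated fingerprint.
        new_features = {**self.features, column: feature}
        new_fingerprint = update_fingerprint(
            self.fingerprint, "cast_column", {"column": column, "feature": feature}
        )
        return TinyDataset(new_features, new_fingerprint)


ds = TinyDataset({"audio": "Audio(sampling_rate=48000)"})
ds_16k = ds.cast_column("audio", "Audio(sampling_rate=16000)")
ds_16k_again = ds.cast_column("audio", "Audio(sampling_rate=16000)")
print(ds.fingerprint, ds_16k.fingerprint)
```

The same transform applied to the same starting point yields the same fingerprint, which is the property that makes cache reuse possible.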

albertvillanova · 2021-10-12T14:29:45Z

I think the current state of this PR could be included in our next release as an experimental feature, so that we can stress-test it and try to find all potential issues. What do you think?

CC: @lhoestq @patrickvonplaten @anton-l

anton-l · 2021-10-13T08:12:46Z

Looks great! Ready to try it out on the transformers examples after the release :)

lhoestq

This is awesome, good job @albertvillanova !

src/datasets/arrow_dataset.py

patrickvonplaten · 2021-10-13T09:59:59Z

Think we are good to merge here no? :-)

albertvillanova added 8 commits May 5, 2021 13:24

Refactor features as a package

c27da03

Move translation features into own module

4d99ff9

Create Audio feature

e8a7457

Fix style

5110c68

Make late import for audio library

f69a7a9

Fix imports

fcdb110

Ignore flake8 errors

e6e8ff1

Fix imports

2d2adcd

albertvillanova added this to the 1.7 milestone May 5, 2021

albertvillanova added 5 commits May 5, 2021 19:09

Fix imports

e7d8904

Fix imports of private classes/functions

585bba7

Fix imports of private classes/functions

827b263

Fix test patch

dc5b582

Merge remote-tracking branch 'upstream/master' into features-audio

d8e7441

albertvillanova modified the milestones: 1.7, 1.8 May 31, 2021

albertvillanova modified the milestones: 1.8, 1.9 Jun 8, 2021

albertvillanova added 2 commits June 21, 2021 17:18

Mimic features package for tests

76cb67d

Merge remote-tracking branch 'upstream/master' into features-audio

7b72de8

albertvillanova mentioned this pull request Jun 22, 2021

Use Audio features for AutomaticSpeechRecognition task template #2536

Closed

albertvillanova added 9 commits June 25, 2021 14:14

Merge remote-tracking branch 'upstream/master' into features-audio

e0b77c1

Add required Feature attributes to Audio

3c28a61

Implement __call__

496a1fb

Add coding_format attribute

158c545

Validate audio coding format

f13b7e8

Add Audio docs

7b658f9

Test Audio instantiation

22f4131

Add audio test data

e95d3f9

Add Audio dependency requirements

100ceb1

patrickvonplaten reviewed Oct 11, 2021

tests/features/test_audio.py

albertvillanova added 7 commits October 12, 2021 14:05

Test decode column in dataset with Audio feature

9c56ee9

Implement Features.decode_column

2afa6b6

Implement PythonFeaturesDecoder.decode_column

7088e17

Make PythonFormatter.format_column use PythonFeaturesDecoder

4dd3160

Add type hints and rename column to column_name

377cfcb

Merge remote-tracking branch 'upstream/master' into features-audio

c0515f2

Fix typo

a4a6f25

albertvillanova added 7 commits October 12, 2021 14:52

Update fingerprint within cast_column

7854ea9

Improve docstring

bf94ac9

Test decode column in formatted dataset with Audio feature

eb7b8c5

Make NumpyFormatter.format_column use PythonFeaturesDecoder

0678af2

Implement PandasFeaturesDecoder.decode_column

b1ed47e

Make PandasFormatter.format_column use PandasFeaturesDecoder

a05c9a5

Rename variable in PandasFeaturesDecoder.decode_row

97bf4ed

Merge remote-tracking branch 'upstream/master' into features-audio

583b24d

lhoestq approved these changes Oct 13, 2021

src/datasets/arrow_dataset.py

albertvillanova added 3 commits October 13, 2021 11:45

Add type hint to cast_column

98bcbc8

Implement cast_column for DatasetDict

d29cfee

Add cast_column to API docs

556a7a1

albertvillanova added 2 commits October 13, 2021 12:06

Add example of cast_column to docs how-to guide

50cbe21

Fix Sphinx role name

87943c2

albertvillanova merged commit 92a3ee5 into huggingface:master Oct 13, 2021

patrickvonplaten mentioned this pull request Oct 14, 2021

[Audio datasets] Adapting all audio datasets #3081

Merged

mariosasko mentioned this pull request Oct 25, 2021

Add Image feature #3163

Merged

1 task

albertvillanova mentioned this pull request Dec 23, 2021

Iterating over a vision dataset doesn't decode the images #3473

Closed
