Create Audio feature #2324

Merged
merged 180 commits into from Oct 13, 2021
Changes from 2 commits
Commits
c27da03
Refactor features as a package
albertvillanova May 5, 2021
4d99ff9
Move translation features into own module
albertvillanova May 5, 2021
e8a7457
Create Audio feature
albertvillanova May 5, 2021
5110c68
Fix style
albertvillanova May 5, 2021
f69a7a9
Make late import for audio library
albertvillanova May 5, 2021
fcdb110
Fix imports
albertvillanova May 5, 2021
e6e8ff1
Ignore flake8 errors
albertvillanova May 5, 2021
2d2adcd
Fix imports
albertvillanova May 5, 2021
e7d8904
Fix imports
albertvillanova May 5, 2021
585bba7
Fix imports of private classes/functions
albertvillanova May 5, 2021
827b263
Fix imports of private classes/functions
albertvillanova May 5, 2021
dc5b582
Fix test patch
albertvillanova May 7, 2021
d8e7441
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova May 7, 2021
76cb67d
Mimic features package for tests
albertvillanova Jun 21, 2021
7b72de8
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Jun 21, 2021
e0b77c1
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Jun 25, 2021
3c28a61
Add required Feature attributes to Audio
albertvillanova Jun 25, 2021
496a1fb
Implement __call__
albertvillanova Jun 25, 2021
158c545
Add coding_format attribute
albertvillanova Jun 25, 2021
f13b7e8
Validate audio coding format
albertvillanova Jun 28, 2021
7b658f9
Add Audio docs
albertvillanova Jun 28, 2021
22f4131
Test Audio instantiation
albertvillanova Jun 28, 2021
e95d3f9
Add audio test data
albertvillanova Jun 28, 2021
100ceb1
Add Audio dependency requirements
albertvillanova Jun 28, 2021
1d29232
Fix Audio call
albertvillanova Jun 28, 2021
e51a8a9
Test Audio encode example
albertvillanova Jun 28, 2021
b16bc65
Fix style
albertvillanova Jun 28, 2021
977fa33
Add audio dependency requirements to tests
albertvillanova Jun 28, 2021
e0672e4
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Jun 28, 2021
e5d87a7
Skip test for linux
albertvillanova Jun 28, 2021
e02e387
Fix import of private function
albertvillanova Jun 28, 2021
340989a
Return 1D array
albertvillanova Jun 29, 2021
bd98f78
Encode example for Audio feature
albertvillanova Jun 29, 2021
4230843
Test dataset with Audio feature
albertvillanova Jun 29, 2021
bfe7517
Replace Audio encode_example with decode_example
albertvillanova Jun 29, 2021
1a230bd
Implement Features decode_example
albertvillanova Jun 29, 2021
c153e6d
Fix Audio __call__
albertvillanova Jun 29, 2021
13a1341
Test decoding of dataset with Audio feature
albertvillanova Jun 29, 2021
9501298
Replace soundfile with librosa
albertvillanova Aug 18, 2021
e9e6879
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Aug 18, 2021
d25b1d7
Remove array reshape
albertvillanova Aug 18, 2021
34d5b7b
Fix tests
albertvillanova Aug 18, 2021
85fa108
Flatten decode_nested_example
albertvillanova Aug 18, 2021
1342eba
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Sep 2, 2021
401327f
Implement Features.decode_batch
albertvillanova Sep 8, 2021
bd312d1
Decode features in _getitem
albertvillanova Sep 8, 2021
a848928
Refactor test_dataset_with_audio_feature
albertvillanova Sep 8, 2021
f91cf6b
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Sep 8, 2021
5dc0ba8
Fix style
albertvillanova Sep 8, 2021
42ea474
Implement PythonFeaturesDecoder
albertvillanova Sep 13, 2021
8afdb25
Compose Formatter with PythonFeaturesDecoder
albertvillanova Sep 13, 2021
e4eef07
Refactor PythonFormatter.format_row to use PythonFeaturesDecoder
albertvillanova Sep 13, 2021
80a3d06
Pass features to instantiate formatter
albertvillanova Sep 13, 2021
c314ec5
Fix style
albertvillanova Sep 13, 2021
a122552
Refactor decode_nested_example with default for the rest of features
albertvillanova Sep 13, 2021
c0de3aa
Fix missing pass features to instantiate formatter
albertvillanova Sep 13, 2021
b3214e1
Revert flatten of decode_nested_example to return nested examples
albertvillanova Sep 13, 2021
42426e8
Fix test_dataset_with_audio_feature for nested output
albertvillanova Sep 13, 2021
2067083
Return also audio path in decode_example
albertvillanova Sep 13, 2021
af5fc26
Add path to audio tests
albertvillanova Sep 13, 2021
1499f3e
Fix Formatter and NumpyFormatter init
albertvillanova Sep 13, 2021
23973f2
Fix format_table with python_formatter without features
albertvillanova Sep 13, 2021
56e0b7d
Fix PythonFeaturesDecoder.decode_row only if features
albertvillanova Sep 13, 2021
a8d836e
Fix all Formatter subclasses init
albertvillanova Sep 13, 2021
f5b1d13
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Sep 13, 2021
caf2153
Fix typo
albertvillanova Sep 13, 2021
6c1ea4b
Test batch in dataset with Audio feature
albertvillanova Sep 14, 2021
5ec9031
Implement PythonFeaturesDecoder.decode_batch
albertvillanova Sep 14, 2021
ead8001
Use PythonFeaturesDecoder.decode_batch in PythonFormatter.format_batch
albertvillanova Sep 14, 2021
1f1f730
Add docstrings
albertvillanova Sep 14, 2021
8930573
Add mono attribute to Audio feature
albertvillanova Sep 15, 2021
c4e7905
Test formatted dataset with Audio feature
albertvillanova Sep 21, 2021
4623cdd
Implement ArrowFeaturesDecoder
albertvillanova Sep 21, 2021
a7071cc
Compose Formatter with ArrowFeaturesDecoder
albertvillanova Sep 21, 2021
90c5873
Make NumpyFormatter.format_row use SimpleArrowExtractor and ArrowFeat…
albertvillanova Sep 21, 2021
217782c
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Sep 21, 2021
8587ea1
Fix decode_nested_example to decode only keys present in example
albertvillanova Sep 21, 2021
6a916e4
Refactor NumpyFormatter.format_row
albertvillanova Sep 21, 2021
b648074
Test pandas formatted dataset with Audio feature
albertvillanova Sep 21, 2021
5f274e1
Implement PandasFeaturesDecoder
albertvillanova Sep 21, 2021
7dac12a
Compose Formatter with PandasFeaturesDecoder
albertvillanova Sep 21, 2021
7af69e6
Make PandasFormatter.format_row use PandasFeaturesDecoder
albertvillanova Sep 21, 2021
4e34d21
Fix PandasFeaturesDecoder.decode_row for None features and keys not i…
albertvillanova Sep 21, 2021
77f67a8
Fix PandasFeaturesDecoder.decode_row to call transform only for featu…
albertvillanova Sep 21, 2021
2d87e96
Fix unused imports
albertvillanova Sep 21, 2021
8ede53b
Remove ArrowFeaturesDecoder and _nest
albertvillanova Sep 21, 2021
53e18bd
Fix typo
albertvillanova Sep 21, 2021
fdb050e
Remove test skip if linux
albertvillanova Sep 21, 2021
329abc5
Revert "Remove test skip if linux"
albertvillanova Sep 21, 2021
da7edc5
Fix PandasFeaturesDecoder.decode_row to transform and assign transfor…
albertvillanova Sep 22, 2021
149dc51
Make Audio instances hashable
albertvillanova Sep 22, 2021
109c186
Make Audio.decode_example return original value if dependencies not i…
albertvillanova Sep 22, 2021
0c66689
Fix style
albertvillanova Sep 22, 2021
b3939b1
Test audio resampling
albertvillanova Sep 22, 2021
f2e29ac
Test Audio feature decode mp3
albertvillanova Sep 22, 2021
9c09f4c
Refactor Audio.decode_example to support mp3 with torchaudio
albertvillanova Sep 22, 2021
0fd87d7
Fix style
albertvillanova Sep 22, 2021
4c08a52
Fix logic in Audio.decode_example
albertvillanova Sep 23, 2021
7c16d8a
Require torchaudio dependency for tests
albertvillanova Sep 23, 2021
fa3e068
Require torch to test audio mp3
albertvillanova Sep 23, 2021
423dbba
Refactor decoding with torchaudio with mono and librosa resampling
albertvillanova Sep 23, 2021
8174fc2
Set sox_io backend when decoding with torchaudio
albertvillanova Sep 23, 2021
5881d9a
Fix test_audio with more specific pytest markers
albertvillanova Sep 23, 2021
7c0bd9d
Fix unused torchaudio.functional
albertvillanova Sep 23, 2021
443d4e0
Fix requirement of sndfile
albertvillanova Sep 24, 2021
b0e345f
Fix require_sox
albertvillanova Sep 24, 2021
6a23bdb
Refactor import of find_spec
albertvillanova Sep 24, 2021
09ac25f
Revert torchaudio resampling using librosa
albertvillanova Sep 24, 2021
0b36596
Simplify torchaudio resampling
albertvillanova Sep 24, 2021
15cea1e
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Sep 24, 2021
799b138
Fix require_sndfile
albertvillanova Sep 24, 2021
e64b30a
Implement conditionally decoding
albertvillanova Sep 24, 2021
22984b0
Implement decoded param in _getitem
albertvillanova Sep 24, 2021
4069c82
Pass decoded=False when iterating in map
albertvillanova Sep 24, 2021
86d33a1
Rename sampling_rate
albertvillanova Sep 24, 2021
2e5dae9
Fix NumpyFormatter
albertvillanova Sep 24, 2021
f58f802
Test dataset with not decoded Audio feature
albertvillanova Sep 24, 2021
3597d44
Test map dataset with Audio feature is decoded
albertvillanova Sep 27, 2021
d51f1a6
Use lazy dict to decorate arg of mapped function
albertvillanova Sep 27, 2021
e9188ba
Fix decorator in map
albertvillanova Sep 27, 2021
e108697
Fix sampling_rate by torchaudio
albertvillanova Sep 27, 2021
13ff8aa
Fix test decoding mp3
albertvillanova Sep 27, 2021
bacf5d1
Add GitHub Action for audio CI tests
albertvillanova Sep 27, 2021
3169d0f
Remove audio dependencies from test dependencies
albertvillanova Sep 27, 2021
1548bca
Comment unused audio pytest marker
albertvillanova Sep 27, 2021
3ce50cd
Refactor LazyDict
albertvillanova Sep 27, 2021
9d7c3e8
Fix tests
albertvillanova Sep 27, 2021
ac7ef25
Run audio tests in parallel
albertvillanova Sep 28, 2021
46cdebd
Rename audio test job
albertvillanova Sep 28, 2021
4a76773
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Sep 28, 2021
902d173
Call parallel test on Linux as on Windows
albertvillanova Sep 28, 2021
02b1572
Implement Audio decode_batch
albertvillanova Sep 28, 2021
b7cf206
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Sep 28, 2021
9a1b63d
Test batched map dataset with Audio feature is decoded
albertvillanova Sep 29, 2021
835a4d4
Fix _map_single to avoid decoding of batches
albertvillanova Sep 29, 2021
87cb4fd
Implement Example and Batch from LazyDict
albertvillanova Sep 29, 2021
0c46754
Decorate mapped function with Example/Batch lazy dict
albertvillanova Sep 29, 2021
137b29e
Fix PythonFormatter for batch conditional decoding
albertvillanova Sep 29, 2021
58e6a01
Fix typo
albertvillanova Sep 29, 2021
c68505a
Refactor iter
albertvillanova Sep 29, 2021
955bd01
Fix _map_single for batched
albertvillanova Sep 29, 2021
07b6bc5
Refactor _get_item
albertvillanova Sep 29, 2021
0f80e6e
Fix style
albertvillanova Sep 29, 2021
cd5311b
Refactor resampling using torchaudio
albertvillanova Sep 29, 2021
98d962d
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Oct 6, 2021
c26a60e
Add docstring and comments to decorate function
albertvillanova Oct 6, 2021
c22cb6b
Remove comment
albertvillanova Oct 6, 2021
c0d6eec
Test batch numpy formatted dataset with Audio feature
albertvillanova Oct 6, 2021
efba528
Make NumpyFormatter.format_batch use decoder
albertvillanova Oct 6, 2021
71891f0
Test batch pandas formatted dataset with Audio feature
albertvillanova Oct 6, 2021
57720c7
Implement PandasFeaturesDecoder.decode_batch
albertvillanova Oct 6, 2021
da5d138
Make PandasFormatter.format_batch use decoder
albertvillanova Oct 6, 2021
a8ffee8
Fix style
albertvillanova Oct 6, 2021
ec32d2f
Make CustomFormatter use decoder
albertvillanova Oct 6, 2021
3aee1f1
Change Features.decode_example/batch
albertvillanova Oct 7, 2021
f5bd62a
Fix CustomFormatter
albertvillanova Oct 7, 2021
0ce0832
Test resampling when loading a dataset
albertvillanova Oct 8, 2021
e85c61b
Test resampling after loading a dataset
albertvillanova Oct 8, 2021
53d6d73
Implement cast a column to a feature for decoding w/o caching
albertvillanova Oct 8, 2021
633ef09
Make cast_column call cast if not decoding
albertvillanova Oct 8, 2021
9c56ee9
Test decode column in dataset with Audio feature
albertvillanova Oct 12, 2021
2afa6b6
Implement Features.decode_column
albertvillanova Oct 12, 2021
7088e17
Implement PythonFeaturesDecoder.decode_column
albertvillanova Oct 12, 2021
4dd3160
Make PythonFormatter.format_column use PythonFeaturesDecoder
albertvillanova Oct 12, 2021
377cfcb
Add type hints and rename column to column_name
albertvillanova Oct 12, 2021
c0515f2
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Oct 12, 2021
a4a6f25
Fix typo
albertvillanova Oct 12, 2021
7854ea9
Update fingerprint within cast_column
albertvillanova Oct 12, 2021
bf94ac9
Improve docstring
albertvillanova Oct 12, 2021
eb7b8c5
Test decode column in formatted dataset with Audio feature
albertvillanova Oct 12, 2021
0678af2
Make NumpyFormatter.format_column use PythonFeaturesDecoder
albertvillanova Oct 12, 2021
b1ed47e
Implement PandasFeaturesDecoder.decode_column
albertvillanova Oct 12, 2021
a05c9a5
Make PandasFormatter.format_column use PandasFeaturesDecoder
albertvillanova Oct 12, 2021
97bf4ed
Rename variable in PandasFeaturesDecoder.decode_row
albertvillanova Oct 12, 2021
583b24d
Merge remote-tracking branch 'upstream/master' into features-audio
albertvillanova Oct 13, 2021
98bcbc8
Add type hint to cast_column
albertvillanova Oct 13, 2021
d29cfee
Implement cast_column for DatasetDict
albertvillanova Oct 13, 2021
556a7a1
Add cast_column to API docs
albertvillanova Oct 13, 2021
50cbe21
Add example of cast_column to docs how-to guide
albertvillanova Oct 13, 2021
87943c2
Fix Sphinx role name
albertvillanova Oct 13, 2021
13 changes: 13 additions & 0 deletions src/datasets/arrow_dataset.py
@@ -1332,6 +1332,19 @@ def cast(
        dataset = dataset.with_format(**format)
        return dataset

    def cast_column(self, column, feature):
        """Cast column to feature for decoding.

        Args:
            column: Column name.
            feature: Target feature.

        Returns:
            :class:`Dataset`
        """
        self.features[column] = feature
        return self

    @deprecated(help_message="Use Dataset.remove_columns instead.")
    @fingerprint_transform(inplace=True)
    def remove_columns_(self, column_names: Union[str, List[str]]):
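
As shown in this hunk, `cast_column` assigns the new feature into `self.features` and returns `self`. Below is a minimal sketch of what that implies for callers. `ToyDataset` and `ToyAudio` are hypothetical stand-ins invented for illustration, not the real `Dataset` and `Audio` classes, and only the two statements in the method body are taken from the diff.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ToyAudio:
    """Hypothetical stand-in for the Audio feature; it only stores a target rate."""

    sampling_rate: Optional[int] = None


@dataclass
class ToyDataset:
    """Hypothetical stand-in for Dataset; it only keeps a column-to-feature mapping."""

    features: Dict[str, Any] = field(default_factory=dict)

    def cast_column(self, column: str, feature: Any) -> "ToyDataset":
        # Same two statements as in the hunk: update the mapping in place,
        # then return the same object so the call can be reassigned or chained.
        self.features[column] = feature
        return self


dset = ToyDataset(features={"audio": ToyAudio()})
recast = dset.cast_column("audio", ToyAudio(sampling_rate=16000))

# The returned object is the original one, so the original sees the new feature too.
same_object = recast is dset
new_rate = dset.features["audio"].sampling_rate
```

Because the object is modified in place, `dset = dset.cast_column(...)` and a bare `dset.cast_column(...)` leave `dset` in the same state in this sketch. Later commits in the list above (for example "Update fingerprint within cast_column") touch this method again, so the merged behavior may differ from this snapshot.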
24 changes: 24 additions & 0 deletions tests/features/test_audio.py
@@ -110,6 +110,30 @@ def test_resampling_at_loading_dataset_with_audio_feature(shared_datadir):
    assert batch["audio"][0]["sampling_rate"] == 16000


@require_sndfile
def test_resampling_after_loading_dataset_with_audio_feature(shared_datadir):
    audio_path = str(shared_datadir / "test_audio_44100.wav")
    data = {"audio": [audio_path]}
    features = Features({"audio": Audio()})
    dset = Dataset.from_dict(data, features=features)
    item = dset[0]
    assert item["audio"]["sampling_rate"] == 44100
    dset = dset.cast_column("audio", Audio(sampling_rate=16000))
    item = dset[0]
    assert item.keys() == {"audio"}
    assert item["audio"].keys() == {"path", "array", "sampling_rate"}
    assert item["audio"]["path"] == audio_path
    assert item["audio"]["array"].shape == (73401,)
    assert item["audio"]["sampling_rate"] == 16000
    batch = dset[:1]
    assert batch.keys() == {"audio"}
    assert len(batch["audio"]) == 1
    assert batch["audio"][0].keys() == {"path", "array", "sampling_rate"}
    assert batch["audio"][0]["path"] == audio_path
    assert batch["audio"][0]["array"].shape == (73401,)
    assert batch["audio"][0]["sampling_rate"] == 16000


@require_sndfile
def test_dataset_with_audio_feature_map_is_not_decoded(shared_datadir):
    audio_path = str(shared_datadir / "test_audio_44100.wav")
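
`test_resampling_after_loading_dataset_with_audio_feature` expects 73401 samples once the 44100 Hz file is decoded at 16000 Hz. The sketch below works through the arithmetic behind such a number. The helper `resampled_length` is hypothetical, and its round-up rule is an assumption about how a resampler might size its output, not something this diff confirms. Only the values 73401, 16000 and 44100 come from the test.

```python
def resampled_length(n_samples: int, orig_sr: int, target_sr: int) -> int:
    """Hypothetical helper: output length when the sample count is rounded up.

    Assumes the resampler keeps the duration fixed and rounds up, i.e.
    ceil(n_samples * target_sr / orig_sr). Computed with integers to avoid
    floating-point rounding.
    """
    return -(-n_samples * target_sr // orig_sr)


# One second of audio at 44100 Hz becomes 16000 samples at 16000 Hz.
one_second = resampled_length(44100, orig_sr=44100, target_sr=16000)

# Duration implied by the shape and rate asserted in the test.
duration_s = 73401 / 16000

# The same duration expressed in samples at the file's original 44100 Hz rate.
approx_source_samples = duration_s * 44100
```

The asserted shape corresponds to a clip of roughly 4.59 seconds, which is about 202 thousand samples at the original rate.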