[SequenceFeatureExtractor] Rewrite padding logic from pure python to numpy #13650
Conversation
     x = np.subtract(x, mean)
     if normalize_vars:
-        var = square_sums / x[:input_length].shape[0] - mean ** 2
-        std = np.sqrt(np.maximum(var, 1e-10))
+        std = x[:input_length].std(axis=0)
Switched this logic to pure numpy to squeeze out a bit more precision when working with `np.float32`, instead of Python `float`s cast to `np.float64`.
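For context, here is a minimal sketch (hypothetical helper, not the actual feature extractor code) of the normalization the diff above is about: statistics are computed over the valid (unpadded) region, staying in `float32` throughout, and the two-pass `np.std` replaces the less precise `E[x^2] - E[x]^2` formulation.

```python
import numpy as np

def zero_mean_unit_var(x: np.ndarray, input_length: int) -> np.ndarray:
    # Hypothetical helper mirroring the diff: normalize a padded waveform
    # using statistics from its valid (unpadded) region only.
    mean = x[:input_length].mean(axis=0)
    x = np.subtract(x, mean)
    # Two-pass std on the centered data replaces the old
    # square_sums / n - mean**2 computation.
    std = x[:input_length].std(axis=0)
    return np.divide(x, std + 1e-10)

signal = np.sin(np.linspace(0, 8 * np.pi, 100)).astype(np.float32)
padded = np.concatenate([signal, np.zeros(20, dtype=np.float32)])
normed = zero_mean_unit_var(padded, input_length=100)
```

The valid region of `normed` comes out with mean approximately 0 and standard deviation approximately 1, without ever leaving `float32`.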
-    if is_batched and not isinstance(raw_speech[0], np.ndarray):
-        raw_speech = [np.asarray(speech) for speech in raw_speech]
+    if is_batched:
+        raw_speech = [np.asarray(speech, dtype=np.float32) for speech in raw_speech]
Forcing float32 here for consistency with other feature extractors
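As a rough illustration of the cast (assumed inputs, not the exact extractor code): batched speech may arrive as plain Python lists or `float64` arrays, and the changed line funnels everything into `np.float32` in one place.

```python
import numpy as np

# Batched input can be lists of floats or float64 arrays; the new code
# casts every element of the batch to float32 for consistency.
raw_speech = [[0.1, 0.2, 0.3], np.zeros(5, dtype=np.float64)]

is_batched = bool(raw_speech) and isinstance(raw_speech[0], (list, np.ndarray))
if is_batched:
    raw_speech = [np.asarray(speech, dtype=np.float32) for speech in raw_speech]
```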
yes!
Thanks for fixing!
LGTM!
@@ -724,7 +724,7 @@ def map_to_array(batch):
         return batch

     ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
-    ds = ds.select(range(num_samples)).map(map_to_array)
+    ds = ds.sort("id").select(range(num_samples)).map(map_to_array)
Unrelated to this PR: the new `datasets` version re-shuffled this dataset, so sorting is needed for reproducibility.
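The ordering issue can be illustrated with plain Python (a toy stand-in for the `datasets` objects, with made-up ids): selecting the first N rows is only deterministic across re-shuffled releases after sorting on a stable key, which is what `ds.sort("id")` provides.

```python
# Toy stand-in for the dataset: the same rows shipped in two different orders.
rows_v1 = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
rows_v2 = [rows_v1[2], rows_v1[0], rows_v1[1]]  # re-shuffled release

def first_ids(rows, n):
    # Mirrors ds.sort("id").select(range(n)): sort on the stable key, then take n.
    return [r["id"] for r in sorted(rows, key=lambda r: r["id"])][:n]
```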
     output = speech_recognizer(waveform)
     self.assertEqual(output, {"text": ""})

     from datasets import load_dataset

-    ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
+    ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation").sort("id")
     filename = ds[0]["file"]
Unrelated to this PR: the new `datasets` version re-shuffled this dataset, so sorting is needed for reproducibility.
@@ -42,9 +42,9 @@ def test_torch_small(self):
         tokenizer="facebook/s2t-small-mustc-en-fr-st",
         framework="pt",
     )
-    waveform = np.zeros((34000,))
+    waveform = np.linspace(0, 1, 34000, dtype=np.float32)
Now that the slight variability due to type casting is eliminated, this input needs to be something specific: the model becomes unstable (non-reproducible across environments) with all-zero inputs.
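A quick numpy check (using the shapes from the diff above) shows why the all-zero input is degenerate for zero-mean/unit-variance normalization while the ramp is well-conditioned:

```python
import numpy as np

flat = np.zeros((34000,), dtype=np.float32)
ramp = np.linspace(0, 1, 34000, dtype=np.float32)

# Zero variance means unit-variance normalization divides by ~0,
# amplifying whatever float noise each environment happens to produce.
flat_std = flat.std()
# A linear ramp on [0, 1] has std ~ 1/sqrt(12) ~ 0.289: stable to normalize.
ramp_std = ramp.std()
```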
Looks good to me! Left a couple of nits:

- Think we don't have to use `np.int64` -> we are using `tf.int32` everywhere
- Would be nice to add `jax` to the `to_py_obj` and `to_numpy_obj` functions
- Do we still need a higher variance for the `no-padding` test for `Speech2Text`?

Also, did you notice a speed-up for larger inputs?
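A sketch of what the suggested `jax` addition to the conversion helpers might look like (hypothetical names mirroring `to_py_obj`/`to_numpy_obj`; the real transformers helpers use their own framework-availability checks):

```python
import numpy as np

def _is_jax_array(obj):
    # Only import jax if it is installed; otherwise nothing is a jax array.
    try:
        import jax.numpy as jnp
    except ImportError:
        return False
    return isinstance(obj, jnp.ndarray)

def to_numpy_obj(obj):
    # Hypothetical converter: jax arrays (like torch/tf tensors in the
    # real helpers) are materialized into plain numpy arrays.
    if _is_jax_array(obj):
        return np.asarray(obj)
    if isinstance(obj, (list, tuple)):
        return np.array(obj)
    return obj
```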
Super cool, LGTM @anton-l!
@patrickvonplaten the benchmarking results are pretty promising:
Great job @anton-l - feel free to merge!
…numpy (huggingface#13650)

* Test np padding
* Pass feature extraction tests
* Update type hints
* Fix flaky integration tests
* Try a more stable waveform
* Add to_numpy jax support
* int32 attention masks
* Refactor normalization tests
What does this PR do?

Resolves #13539

Since speech models universally use NumPy `float32` arrays as input features (the standard way of representing waveforms), it was decided to rewrite `SequenceFeatureExtractor` from pure Python lists (akin to traditional tokenizers) to NumPy arrays. This will also help with solving some inconsistent normalization issues (#13538, #13585) caused by `float` -> `np.float32` conversions.

The feature extractor itself is still dtype-agnostic (it can pad `np.float64` in the future if needed), while the model-specific feature extractors were updated to only work with `np.float32`.
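A toy sketch of the dtype-agnostic numpy padding described above (illustrative names, not the actual `SequenceFeatureExtractor` code): the padded features keep whatever float dtype the inputs carry, and the attention mask comes out as `int32`.

```python
import numpy as np

def pad_batch(batch, padding_value=0.0):
    # Toy sketch: pad a batch of 1-D waveforms to the longest length,
    # preserving the input float dtype (dtype-agnostic), and return an
    # int32 attention mask marking the valid samples.
    max_len = max(len(x) for x in batch)
    padded = np.stack(
        [np.pad(x, (0, max_len - len(x)), constant_values=padding_value) for x in batch]
    )
    mask = np.stack(
        [np.pad(np.ones(len(x), dtype=np.int32), (0, max_len - len(x))) for x in batch]
    )
    return padded, mask

batch = [np.ones(3, dtype=np.float32), np.ones(5, dtype=np.float32)]
padded, mask = pad_batch(batch)
```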