SEW-D

Overview

SEW-D (Squeezed and Efficient Wav2Vec with Disentangled attention) was proposed in "Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition" by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, and Yoav Artzi.

The abstract from the paper is the following:

This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.

This model was contributed by anton-l.

Usage tips

SEW-D is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
SEWDForCTC is fine-tuned using connectionist temporal classification (CTC), so the model output must be decoded using [Wav2Vec2CTCTokenizer].
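
To make the CTC decoding step concrete, here is a minimal sketch in plain Python of what greedy decoding does to per-frame scores: take the best token per frame, merge consecutive repeats, then drop the blank token. The vocabulary, the blank id, the `|` word delimiter, and the helper names (`ctc_greedy_decode`, `one_hot`) are all hypothetical and chosen only for illustration. This is not the tokenizer's actual implementation; in practice [Wav2Vec2CTCTokenizer] handles this step for you.

```python
from itertools import groupby

# Hypothetical toy vocabulary; real checkpoints ship their own vocabulary file.
VOCAB = {0: "<pad>", 1: "|", 2: "h", 3: "e", 4: "l", 5: "o"}
BLANK_ID = 0  # assumption: the padding token doubles as the CTC blank


def argmax(scores):
    """Return the index of the largest score in one frame."""
    return max(range(len(scores)), key=scores.__getitem__)


def ctc_greedy_decode(frame_scores, vocab=VOCAB, blank_id=BLANK_ID):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_ids = [argmax(frame) for frame in frame_scores]
    # Merge runs of identical ids; a blank between two equal ids keeps both.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    tokens = [vocab[i] for i in collapsed if i != blank_id]
    # "|" marks a word boundary in this toy vocabulary.
    return "".join(tokens).replace("|", " ").strip()


def one_hot(index, size=len(VOCAB)):
    """Build a fake score vector whose maximum sits at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Ten fake frames standing in for model output of shape (frames, vocab_size).
frame_ids = [2, 2, 0, 3, 4, 4, 0, 4, 5, 5]
logits = [one_hot(i) for i in frame_ids]
print(ctc_greedy_decode(logits))  # prints "hello"
```

The blank between the two runs of `l` is what lets a doubled letter survive the merge step, which is why raw argmax ids cannot simply be mapped to characters one by one.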

Resources

Audio classification task guide
Automatic speech recognition task guide

SEWDConfig

[[autodoc]] SEWDConfig

SEWDModel

[[autodoc]] SEWDModel
    - forward

SEWDForCTC

[[autodoc]] SEWDForCTC
    - forward

SEWDForSequenceClassification

[[autodoc]] SEWDForSequenceClassification
    - forward
