
can't allocate memory error with wav2vec2 #10366

@kleekaai

Description

I am trying out the wav2vec2 ASR model from the Hugging Face transformers library. I am passing a 7-minute (~15 MB) WAV file of an English conversation to the wav2vec2 model and getting a "can't allocate memory" error. I found that the model uses all 64 GB of the available RAM. Can anyone help with this?

  • transformers version: 4.3.2
  • Platform: Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.8.3
  • PyTorch version (GPU?): 1.7.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: (NA)
  • Using distributed or parallel set-up in script?: (NA)

Code

import os
import librosa
import nltk
import soundfile as sf
import torch
from pydub import AudioSegment
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

def convert_audio_segment(fp, upload_dir_path):
    """Convert an uploaded .m4a file to WAV; pass WAV files through unchanged."""

    formats_to_convert = ['.m4a']

    if fp.endswith(tuple(formats_to_convert)):
        path, file_extension = os.path.splitext(fp)
        file_extension_final = file_extension.replace('.', '')
        file_handle = ''

        try:
            track = AudioSegment.from_file(fp, file_extension_final)
            print("track", track)
            wav_path = path + '.wav'
            file_handle = track.export(wav_path, format='wav')
        except Exception:
            print("ERROR CONVERTING " + str(fp))
        return file_handle
    else:
        print("No file format conversion required " + str(fp))
        return fp

def load_wav2vec_100h_model():
    tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-100h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-100h")    
    return tokenizer, model

def correct_sentence(input_text):
    """Capitalize the first letter of each sentence (needs nltk's punkt data)."""
    sentences = nltk.sent_tokenize(input_text)
    return ' '.join(s[0].capitalize() + s[1:] for s in sentences)

def asr_transcript(tokenizer, model, input_file):

    speech, fs = sf.read(input_file)

    # Downmix stereo to mono by averaging the channels
    if len(speech.shape) > 1:
        speech = speech.mean(axis=1)

    # wav2vec2 expects 16 kHz input
    if fs != 16000:
        speech = librosa.resample(speech, orig_sr=fs, target_sr=16000)

    input_values = tokenizer(speech, return_tensors="pt").input_values
    with torch.no_grad():  # inference only; avoids keeping activations for backprop
        logits = model(input_values).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = tokenizer.decode(predicted_ids[0])

    return correct_sentence(transcription.lower())

if __name__ == "__main__":

    data_dir = '.'  # assumed upload directory; this was undefined in the original script
    tokenizer_100h, model_100h = load_wav2vec_100h_model()
    wav_input = 'Recording_biweu.wav'
    fp = wav_input

    processed_file = convert_audio_segment(str(fp), str(data_dir))
    text = asr_transcript(tokenizer_100h, model_100h, processed_file)
    print(text)
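
The script above pushes the whole 7-minute file through a single forward pass, which is what exhausts memory (see the size estimate under the error below). A minimal chunked variant as a sketch: the 30-second window and the asr_transcript_chunked name are my own choices, not part of the original script.

import torch

CHUNK_SECONDS = 30   # assumed window; attention memory grows with the square of its length
TARGET_SR = 16000    # wav2vec2 checkpoints expect 16 kHz input

def asr_transcript_chunked(tokenizer, model, speech):
    """Transcribe a long 16 kHz mono array in fixed-size slices to cap attention memory."""
    chunk_len = CHUNK_SECONDS * TARGET_SR
    pieces = []
    with torch.no_grad():  # inference only; skip autograd bookkeeping
        for start in range(0, len(speech), chunk_len):
            chunk = speech[start:start + chunk_len]
            if len(chunk) < 640:  # skip a tail too short for the conv feature extractor
                break
            input_values = tokenizer(chunk, return_tensors="pt").input_values
            logits = model(input_values).logits
            predicted_ids = torch.argmax(logits, dim=-1)
            pieces.append(tokenizer.decode(predicted_ids[0]))
    return ' '.join(pieces)

Hard cuts can split a word across a chunk boundary; splitting on silence (for example with librosa.effects.split) would be gentler, but this is enough to keep the attention matrices small.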

I am adding more details about my WAV file here:

General
Complete name                            : Recording_biweu.wav
Format                                   : Wave
File size                                : 13.8 MiB
Duration                                 : 7 min 30 s
Overall bit rate mode                    : Constant
Overall bit rate                         : 256 kb/s
Track name                               : Recording_biweu
Recorded date                            : 2021
Writing application                      : Lavf57.83.100

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 7 min 30 s
Bit rate mode                            : Constant
Bit rate                                 : 256 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 16.0 kHz
Bit depth                                : 16 bits
Stream size                              : 13.8 MiB (100%)

Error

Some weights of the model checkpoint at facebook/wav2vec2-base-100h were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.mask_time_emb_vector']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "asr_wav2vec2.py", line 130, in <module>
    text = asr_transcript(tokenizer_100h,model_100h,processed_file)
  File "asr_wav2vec2.py", line 96, in asr_transcript
    logits = model(input_values).logits
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 795, in forward
    outputs = self.wav2vec2(
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 646, in forward
    encoder_outputs = self.encoder(
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 457, in forward
    hidden_states, attn_weights = layer(hidden_states, output_attentions=output_attentions)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 392, in forward
    hidden_states, attn_weights, _ = self.attention(hidden_states, output_attentions=output_attentions)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 286, in forward
    attn_weights = torch.bmm(query_states, key_states.transpose(1, 2))
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 24373495488 bytes. Error code 12 (Cannot allocate memory)
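
For what it's worth, the failed allocation lines up with the self-attention weight matrix. wav2vec2's convolutional front end downsamples raw audio by a stride of roughly 320, so 7 min 30 s at 16 kHz (~7.2 million samples) becomes about 22,500 frames, and each of the base model's 12 attention heads materializes a frames × frames float32 matrix inside that torch.bmm. A rough check (my numbers, derived from the file details above):

samples = 450 * 16000              # 7 min 30 s at 16 kHz -> 7,200,000 samples
frames = samples // 320            # conv feature extractor stride ~320 -> 22,500 frames
heads = 12                         # wav2vec2-base attention heads
attn_bytes = heads * frames * frames * 4   # float32 attention weights
print(attn_bytes)                  # 24,300,000,000 -> ~24.3 GB, close to the 24,373,495,488 bytes in the error

This is why the 64 GB of RAM disappears on a 7-minute file: memory grows with the square of the audio length, not linearly with file size.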
