
can't allocate memory error with wav2vec2 #10366

@kleekaai

Description

I am trying out the wav2vec2 ASR model from the Hugging Face transformers library. I am passing a 7-minute (~15 MB) WAV file of an English conversation to the wav2vec2 model and getting a "can't allocate memory" error. I found that the model uses all 64 GB of the available RAM. Can anyone help with this?

  • transformers version: 4.3.2
  • Platform: Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.8.3
  • PyTorch version (GPU?): 1.7.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: (NA)
  • Using distributed or parallel set-up in script?: (NA)

Code

import os
import librosa
import nltk
import soundfile as sf
import torch
from pydub import AudioSegment
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

def convert_audio_segment(fp, upload_dir_path):
    """Convert an uploaded .m4a file to WAV; pass WAV files through unchanged."""

    formats_to_convert = ['.m4a']

    if fp.endswith(tuple(formats_to_convert)):
        path, file_extension = os.path.splitext(fp)
        file_extension_final = file_extension.replace('.', '')
        file_handle = ''

        try:
            track = AudioSegment.from_file(fp, file_extension_final)
            print("track", track)
            wav_path = path + '.wav'
            file_handle = track.export(wav_path, format='wav')
        except Exception:
            print("ERROR CONVERTING " + str(fp))
        return file_handle
    else:
        print("No file format conversion required " + str(fp))
        return fp

def load_wav2vec_100h_model():
    tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-100h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-100h")    
    return tokenizer, model

def correct_sentence(input_text):
    """Capitalize the first letter of each sentence (needs nltk's punkt data)."""
    sentences = nltk.sent_tokenize(input_text)
    return ' '.join(s[0].capitalize() + s[1:] for s in sentences)

def asr_transcript(tokenizer, model, input_file):

    speech, fs = sf.read(input_file)

    # Downmix stereo to mono by averaging the channels
    if len(speech.shape) > 1:
        speech = speech.mean(axis=1)

    # wav2vec2 expects 16 kHz input
    if fs != 16000:
        speech = librosa.resample(speech, orig_sr=fs, target_sr=16000)

    input_values = tokenizer(speech, return_tensors="pt").input_values
    with torch.no_grad():  # inference only; avoids keeping activations for backprop
        logits = model(input_values).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = tokenizer.decode(predicted_ids[0])

    return correct_sentence(transcription.lower())

if __name__ == "__main__":

    data_dir = '.'  # assumed upload directory; this was undefined in the original script
    tokenizer_100h, model_100h = load_wav2vec_100h_model()
    wav_input = 'Recording_biweu.wav'
    fp = wav_input

    processed_file = convert_audio_segment(str(fp), str(data_dir))
    text = asr_transcript(tokenizer_100h, model_100h, processed_file)
    print(text)
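
The script above pushes the whole 7-minute file through a single forward pass, which is what exhausts memory (see the size estimate under the error below). A minimal chunked variant as a sketch: the 30-second window and the asr_transcript_chunked name are my own choices, not part of the original script.

import torch

CHUNK_SECONDS = 30   # assumed window; attention memory grows with the square of its length
TARGET_SR = 16000    # wav2vec2 checkpoints expect 16 kHz input

def asr_transcript_chunked(tokenizer, model, speech):
    """Transcribe a long 16 kHz mono array in fixed-size slices to cap attention memory."""
    chunk_len = CHUNK_SECONDS * TARGET_SR
    pieces = []
    with torch.no_grad():  # inference only; skip autograd bookkeeping
        for start in range(0, len(speech), chunk_len):
            chunk = speech[start:start + chunk_len]
            if len(chunk) < 640:  # skip a tail too short for the conv feature extractor
                break
            input_values = tokenizer(chunk, return_tensors="pt").input_values
            logits = model(input_values).logits
            predicted_ids = torch.argmax(logits, dim=-1)
            pieces.append(tokenizer.decode(predicted_ids[0]))
    return ' '.join(pieces)

Hard cuts can split a word across a chunk boundary; splitting on silence (for example with librosa.effects.split) would be gentler, but this is enough to keep the attention matrices small.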

I am adding more details about my WAV file here:

General
Complete name                            : Recording_biweu.wav
Format                                   : Wave
File size                                : 13.8 MiB
Duration                                 : 7 min 30 s
Overall bit rate mode                    : Constant
Overall bit rate                         : 256 kb/s
Track name                               : Recording_biweu
Recorded date                            : 2021
Writing application                      : Lavf57.83.100

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 7 min 30 s
Bit rate mode                            : Constant
Bit rate                                 : 256 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 16.0 kHz
Bit depth                                : 16 bits
Stream size                              : 13.8 MiB (100%)

Error

Some weights of the model checkpoint at facebook/wav2vec2-base-100h were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.mask_time_emb_vector']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "asr_wav2vec2.py", line 130, in <module>
    text = asr_transcript(tokenizer_100h,model_100h,processed_file)
  File "asr_wav2vec2.py", line 96, in asr_transcript
    logits = model(input_values).logits
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 795, in forward
    outputs = self.wav2vec2(
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 646, in forward
    encoder_outputs = self.encoder(
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 457, in forward
    hidden_states, attn_weights = layer(hidden_states, output_attentions=output_attentions)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 392, in forward
    hidden_states, attn_weights, _ = self.attention(hidden_states, output_attentions=output_attentions)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/joel/pyvenv/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 286, in forward
    attn_weights = torch.bmm(query_states, key_states.transpose(1, 2))
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 24373495488 bytes. Error code 12 (Cannot allocate memory)
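
For what it's worth, the failed allocation lines up with the self-attention weight matrix. wav2vec2's convolutional front end downsamples raw audio by a stride of roughly 320, so 7 min 30 s at 16 kHz (~7.2 million samples) becomes about 22,500 frames, and each of the base model's 12 attention heads materializes a frames × frames float32 matrix inside that torch.bmm. A rough check (my numbers, derived from the file details above):

samples = 450 * 16000              # 7 min 30 s at 16 kHz -> 7,200,000 samples
frames = samples // 320            # conv feature extractor stride ~320 -> 22,500 frames
heads = 12                         # wav2vec2-base attention heads
attn_bytes = heads * frames * frames * 4   # float32 attention weights
print(attn_bytes)                  # 24,300,000,000 -> ~24.3 GB, close to the 24,373,495,488 bytes in the error

This is why the 64 GB of RAM disappears on a 7-minute file: memory grows with the square of the audio length, not linearly with file size.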
