wave2vec OOM while doing inference #3359
Comments
Seems to be an issue with the Hugging Face implementation. I just ran your example on an XLSR model on a CPU server and didn't have any problems at all. Except for the output, as it is a German model and not very good with Indian English :-)
Hi, thanks for the help @olafthiele. Would you please tell me, did you use Hugging Face for the XLSR model?
Ah, sorry, should have made that clearer. We don't use Hugging Face's implementation, just pure wav2vec 2.0. So I guess it is something within their code ...
@olafthiele would you please share the snippet for doing the inference using fairseq's wav2vec 2.0? Thanks.
Inference is built upon this great repo by @mailong25. Don't know whether that is compatible, though.
Hey @olafthiele - make sure to wrap your code into a `torch.no_grad()` context:

```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# resample to the 16 kHz rate the model was trained on
input_audio, _ = librosa.load(filename, sr=16000)

input_values = tokenizer(input_audio, return_tensors="pt").input_values

# no_grad() stops PyTorch from keeping activations for backprop,
# which sharply reduces peak memory during inference
with torch.no_grad():
    logits = model(input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)
text = tokenizer.batch_decode(predicted_ids)[0]
```
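For clips as long as the 52-second one in this issue, the encoder's attention matrices grow quadratically with length, so even inference under `no_grad()` can run out of CPU memory. A common workaround (a sketch of mine, not from this thread; `transcribe_chunked` is a hypothetical helper name) is to transcribe fixed-length chunks and join the results:

```python
import torch

def transcribe_chunked(model, tokenizer, audio, sr=16000, chunk_s=10):
    # Transcribe a long 1-D waveform in fixed-length chunks so the
    # encoder's attention matrices stay small. Chunks are decoded
    # independently, so a word straddling a boundary may be split.
    chunk_len = int(chunk_s * sr)
    texts = []
    for start in range(0, len(audio), chunk_len):
        piece = audio[start:start + chunk_len]
        input_values = tokenizer(piece, return_tensors="pt").input_values
        with torch.no_grad():  # no autograd buffers during inference
            logits = model(input_values).logits
        ids = torch.argmax(logits, dim=-1)
        texts.append(tokenizer.batch_decode(ids)[0])
    return " ".join(t for t in texts if t)
```

With the model and tokenizer from the snippet above, this would be called as `transcribe_chunked(model, tokenizer, input_audio)`; smaller `chunk_s` trades transcript continuity for lower peak memory.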
Here's a colab to run it successfully: https://colab.research.google.com/drive/1m54QPo07ptp_GRdTLztuc0OdCHk7j28C?usp=sharing Actually, I forgot to put the
Thanks @patrickvonplaten, that should help @abhinavsp0730
Thanks, @patrickvonplaten, for the help. I'm closing this issue as it has been resolved.
❓ Questions and Help
What is your question?
When I'm trying to do inference on an audio clip of around 52 seconds, I get this error:

```
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 326730288 bytes. Error code 12 (Cannot allocate memory)
```

So the inference would need almost 326.730288 MB. When I ran `free -h`, I had this much free space. Would you please help me regarding this issue, @patrickvonplaten?
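As a back-of-envelope check (my assumption, not stated in this thread), the failed allocation matches the size of a single float32 self-attention score tensor of shape `(heads, frames, frames)` in the wav2vec2-base encoder, taking the usual wav2vec2-base figures of 12 attention heads and a 320-sample total stride (20 ms hop) in the conv feature extractor:

```python
SR = 16000     # input sample rate expected by the model
STRIDE = 320   # total stride of the conv feature extractor (20 ms hop)
HEADS = 12     # attention heads in wav2vec2-base
FP32 = 4       # bytes per float32 element

samples = int(52.18 * SR)   # "around 52 sec" of audio
frames = samples // STRIDE  # encoder sequence length after the conv stack
attn_bytes = HEADS * frames * frames * FP32
print(frames, attn_bytes)   # 2609 326730288
```

The result equals the 326730288 bytes in the error exactly, which supports the idea that peak memory here grows quadratically with clip length — and hence that splitting long recordings into chunks avoids the OOM.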
code
Sample audio file in WAV format: https://github.com/abhinavsp0730/video-to-text-ap/blob/main/sample_audio_1.wav