Memory spike at the end of transcription #53

Closed
vackosar opened this issue Mar 18, 2023 · 1 comment

@vackosar

Hello, great work! I experimented a bit with this and came across an anomaly. While transcribing the George W. Bush Columbia speech with compute_type="int8", memory stays around 2.5 GB, but right at the end, after all spoken text has already come out of the model, it suddenly spikes beyond 3.5 GB (VRAM when running on GPU, RAM when running on CPU). Is it due to silence at the end or some additional operations? Would you know why this happens and how to prevent it?

!wget https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg

from faster_whisper import WhisperModel

model_path = "whisper-large-v2-ct2/"
model = WhisperModel(model_path, device="cuda", compute_type="int8")
segments, info = model.transcribe("./George_W_Bush_Columbia_FINAL.ogg", beam_size=1, language="en", condition_on_previous_text=False)
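The printing loop in run.py (line 16 in the traceback below) looks roughly like this; the exact print format is an assumption based on the usual faster-whisper example:

# Iterating the generator is what actually drives the decoding,
# so the out-of-memory error below is raised inside this loop.
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))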

The output:

...
[183.06s -> 185.82s]  are safely home.
[185.82s -> 192.62s]  May God bless the grieving families and may God continue to bless America.
['transcribe /home/ubuntu/src/faster-whisper/run.py:10', 'time_delta', 44.965]
Traceback (most recent call last):
  File "/home/ubuntu/src/faster-whisper/run.py", line 16, in <module>
    for segment in segments:
  File "/home/ubuntu/src/faster-whisper/faster_whisper/transcribe.py", line 285, in generate_segments
    result, avg_log_prob, temperature = self.generate_with_fallback(
  File "/home/ubuntu/src/faster-whisper/faster_whisper/transcribe.py", line 461, in generate_with_fallback
    result = self.model.generate(
RuntimeError: CUDA failed with error out of memory
@guillaumekln
Contributor

guillaumekln commented Mar 18, 2023

Hi,

The last segment triggers the "temperature fallback", which runs with best_of=5 by default: in sampling mode the model decodes five candidate sequences in parallel, which needs noticeably more memory than the initial beam_size=1 pass. Since you are using beam_size=1, you may want to use the same value for best_of:

segments, info = model.transcribe(
    "./George_W_Bush_Columbia_FINAL.ogg",
    beam_size=1,
    best_of=1,
    language="en",
    condition_on_previous_text=False,
)

Alternatively, you can disable the temperature fallback entirely with temperature=0:

segments, info = model.transcribe(
    "./George_W_Bush_Columbia_FINAL.ogg",
    beam_size=1,
    temperature=0,
    language="en",
    condition_on_previous_text=False,
)
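For context, the fallback works roughly like this; a minimal sketch, where passes_quality_checks and the exact parameter names are illustrative assumptions rather than the actual faster-whisper code:

# Rough sketch of the temperature fallback in generate_with_fallback.
# passes_quality_checks and the parameter names are assumptions, not
# the actual faster-whisper implementation.
def generate_with_fallback(model, prompt, options):
    result = None
    for temperature in options.temperatures:  # e.g. (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
        if temperature > 0:
            # Sampling pass: decodes best_of candidates in parallel,
            # so memory grows with best_of (5 by default).
            result = model.generate(
                prompt,
                sampling_temperature=temperature,
                num_hypotheses=options.best_of,
            )
        else:
            # Initial pass: memory grows with beam_size instead.
            result = model.generate(prompt, beam_size=options.beam_size)
        # Accept the result if it passes the compression ratio and
        # average log probability checks; otherwise retry hotter.
        if passes_quality_checks(result, options):
            return result
    return result  # every temperature failed; keep the last attempt

With beam_size=1 the initial pass is cheap, so the best_of=5 sampling pass in the fallback is what causes the spike; matching best_of=1 or disabling the fallback avoids it.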
