Hello, great work! I experimented a bit with this and came across an anomaly. While transcribing the George Bush Columbia speech with int8, memory usage stays around 2.5 GB, but then suddenly spikes beyond 3.5 GB (VRAM on GPU, RAM on CPU), even though all the spoken text had already been emitted by the model. Is it due to silence at the end or some additional operations? Do you know why this happens and how to prevent it?
```shell
wget https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
```

```python
from faster_whisper import WhisperModel

model_path = "whisper-large-v2-ct2/"
model = WhisperModel(model_path, device="cuda", compute_type="int8")
segments, info = model.transcribe(
    "./George_W_Bush_Columbia_FINAL.ogg",
    beam_size=1,
    language="en",
    condition_on_previous_text=False,
)
```
The output:
```
...
[183.06s -> 185.82s] are safely home.
[185.82s -> 192.62s] May God bless the grieving families and may God continue to bless America.
['transcribe /home/ubuntu/src/faster-whisper/run.py:10', 'time_delta', 44.965]
Traceback (most recent call last):
  File "/home/ubuntu/src/faster-whisper/run.py", line 16, in <module>
    for segment in segments:
  File "/home/ubuntu/src/faster-whisper/faster_whisper/transcribe.py", line 285, in generate_segments
    result, avg_log_prob, temperature = self.generate_with_fallback(
  File "/home/ubuntu/src/faster-whisper/faster_whisper/transcribe.py", line 461, in generate_with_fallback
    result = self.model.generate(
RuntimeError: CUDA failed with error out of memory
```
The last segment triggers the "temperature fallback", which runs with `best_of=5` by default. Since you are using `beam_size=1`, you may want to set `best_of` to the same value:
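A minimal sketch of that suggestion, reusing the model path and audio file from the original snippet (both are assumptions from this thread, not defaults). With `best_of=1`, the fallback decodes a single sampled candidate instead of five, which avoids the batch expansion that caused the memory spike:

```python
from faster_whisper import WhisperModel

# Same local CTranslate2 conversion of whisper-large-v2 as in the original snippet.
model = WhisperModel("whisper-large-v2-ct2/", device="cuda", compute_type="int8")

# best_of=1 keeps the temperature fallback from sampling 5 candidates at once,
# so the final (mostly silent) segment no longer allocates extra memory.
segments, info = model.transcribe(
    "./George_W_Bush_Columbia_FINAL.ogg",
    beam_size=1,
    best_of=1,
    language="en",
    condition_on_previous_text=False,
)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

Note that `best_of` only applies when decoding falls back to a non-zero temperature; at temperature 0 the `beam_size` setting governs the search.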