Hello, great work! I experimented a bit with this and came across an anomaly. While transcribing the George Bush Columbia speech with int8, memory usage stays around 2.5 GB, but then suddenly spikes beyond 3.5 GB (VRAM on GPU, RAM on CPU), even though all the spoken text had already been emitted by the model. Is it due to silence at the end or some additional operations? Do you know why this happens and how to prevent it?
```shell
wget https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
```

```python
from faster_whisper import WhisperModel

model_path = "whisper-large-v2-ct2/"
model = WhisperModel(model_path, device="cuda", compute_type="int8")
segments, info = model.transcribe(
    "./George_W_Bush_Columbia_FINAL.ogg",
    beam_size=1,
    language="en",
    condition_on_previous_text=False,
)
```
The output:
```
...
[183.06s -> 185.82s] are safely home.
[185.82s -> 192.62s] May God bless the grieving families and may God continue to bless America.
['transcribe /home/ubuntu/src/faster-whisper/run.py:10', 'time_delta', 44.965]
Traceback (most recent call last):
  File "/home/ubuntu/src/faster-whisper/run.py", line 16, in <module>
    for segment in segments:
  File "/home/ubuntu/src/faster-whisper/faster_whisper/transcribe.py", line 285, in generate_segments
    result, avg_log_prob, temperature = self.generate_with_fallback(
  File "/home/ubuntu/src/faster-whisper/faster_whisper/transcribe.py", line 461, in generate_with_fallback
    result = self.model.generate(
RuntimeError: CUDA failed with error out of memory
```
The last segment triggers the "temperature fallback", which runs with `best_of=5` by default. Since you are using `beam_size=1`, you may want to set `best_of` to the same value:
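A minimal sketch of that suggestion, reusing the model path and audio file from the original snippet (both are assumptions from this thread, not defaults). With `best_of=1`, the fallback decodes a single sampled candidate instead of five, which avoids the batch expansion that caused the memory spike:

```python
from faster_whisper import WhisperModel

# Same local CTranslate2 conversion of whisper-large-v2 as in the original snippet.
model = WhisperModel("whisper-large-v2-ct2/", device="cuda", compute_type="int8")

# best_of=1 keeps the temperature fallback from sampling 5 candidates at once,
# so the final (mostly silent) segment no longer allocates extra memory.
segments, info = model.transcribe(
    "./George_W_Bush_Columbia_FINAL.ogg",
    beam_size=1,
    best_of=1,
    language="en",
    condition_on_previous_text=False,
)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

Note that `best_of` only applies when decoding falls back to a non-zero temperature; at temperature 0 the `beam_size` setting governs the search.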