I am using a T4 GPU to run Whisper medium.en. The code sometimes gives different results across inference runs, and the inference time also varies from run to run, which makes benchmarking difficult. Is there any way to decrease inference time?
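A minimal sketch of a steadier timing harness, assuming a Python setup like openai-whisper's. A few untimed warm-up calls absorb one-off costs such as CUDA context creation. An optional `sync` hook (for example `torch.cuda.synchronize`) waits for asynchronously launched GPU work to finish before the clock is read. Reporting the median and spread over several runs is less sensitive to outliers than a single measurement. The `benchmark` function and its parameters are hypothetical and are not part of Whisper.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 2,
    runs: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Time fn() several times and summarize the distribution (hypothetical helper).

    warmup: untimed calls that absorb one-off costs (CUDA context setup,
            allocator growth, first-call kernel selection).
    sync:   optional barrier called before reading the clock, e.g.
            torch.cuda.synchronize, because CUDA kernels launch asynchronously.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work is done before timing starts

    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        times.append(time.perf_counter() - start)

    return {
        "median_s": statistics.median(times),
        "min_s": min(times),
        "max_s": max(times),
        "stdev_s": statistics.stdev(times) if len(times) > 1 else 0.0,
    }


# Hypothetical usage with openai-whisper on a CUDA device (not run here;
# "sample.wav" is a placeholder file name):
#   import torch, whisper
#   model = whisper.load_model("medium.en", device="cuda")
#   stats = benchmark(lambda: model.transcribe("sample.wav", temperature=0.0),
#                     sync=torch.cuda.synchronize)
```

Passing `temperature=0.0` to `transcribe` disables the sampling-based temperature fallback, which is one source of run-to-run differences in the output. It does not remove nondeterminism coming from the GPU kernels themselves.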
Answered by
jongwook
Jan 31, 2023
Please see #81 for the nondeterminism. Some avenues for optimization include quantization and CUDA kernel fusion, which have been discussed in #454 and #115 but are not yet included in this repo.
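To illustrate what quantization means here, a minimal sketch of symmetric per-tensor int8 weight quantization in plain Python: each weight is stored as an integer q in [-127, 127] plus one shared float scale, so that w ≈ scale · q. This is a generic illustration of the idea, not the approach discussed in #454, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [scale * v for v in q]


weights = [0.5, -1.27, 0.0, 0.3]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Each restored weight is within scale / 2 of the original, which is the precision traded for 4x smaller storage than float32. In PyTorch, `torch.ao.quantization.quantize_dynamic` applies a similar idea to `nn.Linear` layers, but its int8 kernels target the CPU rather than a CUDA device such as the T4.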
Answer selected by
jongwook