Because the command line has to launch a fresh process, which then reads the model from disk (it also runs a SHA256 checksum on the file, which can take a few seconds). Loading the model onto the GPU should be faster, but the CUDA runtime also takes some time to initialize. If you have multiple files, running `whisper *.mp3` loads the model only once and is faster than running the command separately for each file (see #153).
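To make the batching point concrete, here is a minimal Python sketch that contrasts one `whisper` process per file with a single process for all files. The helper names `per_file_commands` and `batched_command` are hypothetical, not part of Whisper; the sketch assumes the `whisper` CLI is on `PATH` and relies only on its positional audio-file arguments and the `--model` option.

```python
import subprocess
import sys


def per_file_commands(audio_files, model_name):
    """One CLI invocation per file: every process pays the startup,
    model-load and SHA256-check cost again."""
    return [["whisper", path, "--model", model_name] for path in audio_files]


def batched_command(audio_files, model_name):
    """A single CLI invocation covering all files: the model is loaded once."""
    return ["whisper", *audio_files, "--model", model_name]


if __name__ == "__main__":
    # Usage (hypothetical script name): python batch.py a.mp3 b.mp3 ...
    files = sys.argv[1:]
    if files:
        # "base" is just an example model name; pick whichever you need.
        subprocess.run(batched_command(files, "base"), check=True)
```

With N files, the per-file approach starts N processes and loads the model N times, while the batched command does both once.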

In Python, you can reuse the model returned by `load_model()` to avoid the startup delay. If you want to go further (arguably overkill), keeping an API server running, as in #132, lets you send requests from either the command line or Python without the loading delay.
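A minimal sketch of reusing one loaded model in Python, assuming the openai-whisper package (`whisper.load_model()` and `model.transcribe()`, whose result dict has a `"text"` key). The helper names `get_model` and `transcribe_all` are hypothetical. `functools.lru_cache` keeps the loaded model for the lifetime of the process, so later calls skip the disk read and checksum.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_model(name="base"):
    """Load a Whisper model once per process; later calls with the same
    name return the cached object instead of reloading from disk."""
    import whisper  # openai-whisper; imported lazily so it is only needed when loading
    return whisper.load_model(name)


def transcribe_all(paths, model):
    """Transcribe every file with the same, already-loaded model."""
    return {path: model.transcribe(path)["text"] for path in paths}


if __name__ == "__main__":
    import sys

    paths = sys.argv[1:]
    if paths:
        model = get_model("base")  # loaded once, reused for every file
        for path, text in transcribe_all(paths, model).items():
            print(f"{path}: {text}")
```

Passing the model into `transcribe_all` (rather than loading it inside) keeps the expensive step in one place and makes the function easy to reuse from a long-running process such as a server.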


Answer selected by jongwook
Category: Q&A