Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

main : check if input files exist before proceeding #1872

Merged
merged 1 commit into from Feb 19, 2024

Conversation

Theldus
Copy link
Contributor

@Theldus Theldus commented Feb 16, 2024

Until the most recent commit (3d42463), the main.cpp sample file does not check whether the input files exist or not. Consequently, the model is loaded first before reporting whether there was a failure or not when processing a file. In environments with HDD, this can take about 50 seconds or more, depending on the loaded model.

This PR addresses this by checking in advance whether the input files exist or not.

Until the most recent commit (3d42463), the main.cpp sample file does
not check whether the input files exist or not. Consequently, the
model is loaded first before reporting whether there was a failure or
not when processing a file. In environments with HDD, this can take
about 50 seconds or more, depending on the loaded model.

This commit addresses this issue by checking in advance whether the
input files exist or not.
@bobqianic
Copy link
Collaborator

A more graceful approach might be to first load the file and then the model. This way, we can leverage dr_wav to verify the existence of the file.

@Theldus
Copy link
Contributor Author

Theldus commented Feb 19, 2024

@bobqianic Yes, I do agree with you, but I thought about not doing that in this PR, since main.cpp allows an arbitrary number of wav files, and the user could even run out of memory* before loading the model.

Although perhaps this might not be exactly an issue for whisper.cpp users, I'm not sure.

(*From what I understood, in the current code, each wav is loaded and then processed, one at a time.)

@ggerganov ggerganov merged commit dda4b0e into ggerganov:master Feb 19, 2024
39 checks passed
@Theldus Theldus deleted the check_file_exist branch February 19, 2024 16:47
@tamo tamo mentioned this pull request Feb 21, 2024
@mcdallas
Copy link

This needs to account for reading from stdin with -

Theldus added a commit to Theldus/whisper.cpp that referenced this pull request Feb 22, 2024
In commit dda4b0e of PR ggerganov#1872, I've introduced a check for the
existence of files before loading the model. However, I haven't
considered the case where whisper.cpp might read from stdin as well,
and in such cases, the checks should ignore the "-" argument as it
does not represent a regular file.

Additionally, this commit removes the usage of 'stat()' in favor of
the recently introduced function 'is_file_exist()' in common.cpp from
PR ggerganov#1871.

Apologies for the bug introduced in the previous PR and any
inconvenience it may have caused.
@Theldus
Copy link
Contributor Author

Theldus commented Feb 22, 2024

@mcdallas You're right; I apologize for introducing this bug in my last PR. I did not consider the stdin case.

Although simple, I have created a new PR (#1889) addressing this issue.

ggerganov pushed a commit that referenced this pull request Feb 22, 2024
In commit dda4b0e of PR #1872, I've introduced a check for the
existence of files before loading the model. However, I haven't
considered the case where whisper.cpp might read from stdin as well,
and in such cases, the checks should ignore the "-" argument as it
does not represent a regular file.

Additionally, this commit removes the usage of 'stat()' in favor of
the recently introduced function 'is_file_exist()' in common.cpp from
PR #1871.

Apologies for the bug introduced in the previous PR and any
inconvenience it may have caused.
jiahansu pushed a commit to OOPRY/whisper.cpp that referenced this pull request Apr 17, 2024
Until the most recent commit (3d42463), the main.cpp sample file does
not check whether the input files exist or not. Consequently, the
model is loaded first before reporting whether there was a failure or
not when processing a file. In environments with HDD, this can take
about 50 seconds or more, depending on the loaded model.

This commit addresses this issue by checking in advance whether the
input files exist or not.
jiahansu pushed a commit to OOPRY/whisper.cpp that referenced this pull request Apr 17, 2024
In commit dda4b0e of PR ggerganov#1872, I've introduced a check for the
existence of files before loading the model. However, I haven't
considered the case where whisper.cpp might read from stdin as well,
and in such cases, the checks should ignore the "-" argument as it
does not represent a regular file.

Additionally, this commit removes the usage of 'stat()' in favor of
the recently introduced function 'is_file_exist()' in common.cpp from
PR ggerganov#1871.

Apologies for the bug introduced in the previous PR and any
inconvenience it may have caused.
viktor-silakov pushed a commit to viktor-silakov/whisper_node_mic.cpp that referenced this pull request May 11, 2024
Until the most recent commit (3d42463), the main.cpp sample file does
not check whether the input files exist or not. Consequently, the
model is loaded first before reporting whether there was a failure or
not when processing a file. In environments with HDD, this can take
about 50 seconds or more, depending on the loaded model.

This commit addresses this issue by checking in advance whether the
input files exist or not.
viktor-silakov pushed a commit to viktor-silakov/whisper_node_mic.cpp that referenced this pull request May 11, 2024
In commit dda4b0e of PR ggerganov#1872, I've introduced a check for the
existence of files before loading the model. However, I haven't
considered the case where whisper.cpp might read from stdin as well,
and in such cases, the checks should ignore the "-" argument as it
does not represent a regular file.

Additionally, this commit removes the usage of 'stat()' in favor of
the recently introduced function 'is_file_exist()' in common.cpp from
PR ggerganov#1871.

Apologies for the bug introduced in the previous PR and any
inconvenience it may have caused.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants