Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whisper transcription engine bug fixing #5436

Merged
merged 1 commit into from
Jan 4, 2024

Conversation

lkiesow
Copy link
Member

@lkiesow lkiesow commented Nov 30, 2023

This patch fixes a number of minor bugs and inconveniences in the whisper transcription engine. This includes loading of default values when the configuration has been modified which incorrectly kept their old values in some cases. It also does proper debug logging, not filtering out the actual output, which is what you want for debugging.

Your pull request should…

@lkiesow lkiesow added the bug label Nov 30, 2023
@lkiesow lkiesow requested a review from marwyg November 30, 2023 22:48
Copy link
Member

@marwyg marwyg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested several cases. It seems to work. I have just 2 little things.

@@ -187,10 +174,10 @@ public Map<String, Object> generateSubtitlesFile(File mediaFile,
transcriptionCommand.add(language);
}

if (quantization.isSome()) {
logger.debug("Using quantization {}", quantization.get());
if (quantization != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is important. If you comment the configuration out, the variable will be null at this point. But if you comment it in and no value is configured (like this whisper.quantization=), it will become an empty string here and an error will be thrown later. Would an additional check for the empty string be beneficial at this point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the argument for both sides:

  • Setting a key to an empty string should be a valid input. The admin set this to "" after all. Why should we discard their input. What if a key is e.g. for displaying additional text? Shouldn't you be able to set an empty text?
  • Setting a key to an empty string is basically setting this to nothing. If nothing is configured, the default value should apply.

Unfortunately, Opencast is not very consistent when it comes to evaluating empty strings :-(
That also means, to be sure any given setting uses the default, you must comment it out right now. Otherwise, it might be the default… or not.

That's why I think that evaluating it like this isn't a problem unless we change all of Opencast's settings to consistently evaluate "" as null and fall back to the default value (or throw an exception if the setting is required).

I also actually prefer this for two reasons:

  • Evaluating this is very easy since you can usually just use Java's built-in Objects::toString(prop_value, default) to parse the configuration value, handle null values and set defaults. No need to build more complex code or load any additional libraries.
  • You actually handle the case of an empty string being valid quite easily and don't have to worry about that. If an admin wants to set that as a value, they can.

That said, I don't have a strong opinion and while I don't think it will really help, I can make sure an empty string is turned into null as well if you want me to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let us keep it as it is then :)

isVADEnabled = OsgiUtil.getOptCfgAsBoolean(cc.getProperties(), WHISPER_VAD);
logger.debug("Whisper Voice Activity Detection set to {}", isVADEnabled);
isVADEnabled = OsgiUtil.getOptCfgAsBoolean(prop, WHISPER_VAD);
logger.debug("Whisper Voice Activity Detection set to {}", isVADEnabled);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log outputs something like that: "Whisper Voice Activity Detection set to Some(true)" 😂 But it works, so I think it doesn't matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that this was different before… but it's fixed now. Instead of printing the Opt value, it is now printing the boolean value. So, now you should get something like:

Whisper Voice Activity Detection set to true

This patch fixes a number of minor bugs and inconveniences in the
whisper transcription engine. This includes loading of default values
when the configuration has been modified which incorrectly kept their
old values in some cases. It also does proper debug logging, not
filtering out the actual output, which is what you want for debugging.
@lkiesow
Copy link
Member Author

lkiesow commented Dec 12, 2023

@marwyg, are you happy with the change and explanation? Can we merge this into r/15.x before the 15.0 release tomorrow?

@marwyg
Copy link
Member

marwyg commented Dec 14, 2023

Sorry for the late reply. Everything looks fine to me.

@lkiesow
Copy link
Member Author

lkiesow commented Jan 4, 2024

Thanks for the review → merging

@lkiesow lkiesow merged commit 58fefa3 into opencast:r/15.x Jan 4, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants