Is your feature request related to a problem? Please describe.
When calling the transcription endpoint with the format verbose_json, localai currently only returns a list of segments with text, start & end. The words attribute is always None.
Describe the solution you'd like
Localai should also provide word level timestamps when requested with timestamp_granularities=["word"]
Describe alternatives you've considered
Additional context
I'm trying to generate subtitles for videos with hardware acceleration (Vulkan). The data returned with format as srt works, but the timestamps are inaccurate. I would like to use stable-ts to improve the timestamps, but without word level timestamps the results are still suboptimal.
Is your feature request related to a problem? Please describe.
When calling the transcription endpoint with the format
verbose_json, localai currently only returns a list of segments withtext,start&end. Thewordsattribute is always None.Describe the solution you'd like
Localai should also provide word level timestamps when requested with
timestamp_granularities=["word"]Describe alternatives you've considered
Additional context
I'm trying to generate subtitles for videos with hardware acceleration (Vulkan). The data returned with format as srt works, but the timestamps are inaccurate. I would like to use stable-ts to improve the timestamps, but without word level timestamps the results are still suboptimal.