I can get word-level timings, but not the utterance-level (segment) timings that the OpenAI Whisper model provides. Is there a way to generate those?
I thought adding {"without_timestamps": False} would do it, but setting it changed nothing when I tested, and the only other timing-related setting I can see is {"word_timestamps": True}.
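
If nothing built in exists, one fallback would be to rebuild utterance-level timings from the word-level ones. Below is a minimal sketch of that idea, not how Whisper itself forms its segments. It assumes each word is a dict with "word", "start" and "end" keys (hypothetical field names; the real output format may differ) and that a pause longer than an arbitrary 0.5 s marks an utterance boundary:

```python
from typing import Dict, List


def group_words_into_utterances(words: List[Dict], max_gap: float = 0.5) -> List[Dict]:
    """Merge word-level timings into utterance-level segments.

    A new utterance starts whenever the silence between two consecutive
    words exceeds max_gap seconds. The 0.5 s default is an arbitrary guess.
    """
    utterances: List[Dict] = []
    for w in words:
        text = w["word"].strip()
        if utterances and w["start"] - utterances[-1]["end"] <= max_gap:
            # Close enough to the previous word: extend the current utterance.
            utterances[-1]["text"] += " " + text
            utterances[-1]["end"] = w["end"]
        else:
            # First word, or a long pause: start a new utterance.
            utterances.append({"text": text, "start": w["start"], "end": w["end"]})
    return utterances


# Made-up word timings, in the assumed format, just to exercise the function.
words = [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "there", "start": 0.5, "end": 0.9},
    {"word": "how", "start": 2.0, "end": 2.2},
    {"word": "are", "start": 2.25, "end": 2.4},
    {"word": "you", "start": 2.45, "end": 2.7},
]
segments = group_words_into_utterances(words)
for seg in segments:
    print(seg["start"], seg["end"], seg["text"])
```

This only splits on pauses, so it ignores punctuation and may not match the segment boundaries the original model would produce.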