
No release notes? What is "MultiHeadAttention to return qk as well" #822

Answered by glangford
turnkit asked this question in Q&A

It looks like the "MultiHeadAttention to return qk as well" change enables the demo code in the Multilingual_ASR notebook. This is mentioned (briefly) in #332.
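
For anyone wondering what "return qk as well" means concretely, here is a minimal sketch (the class name `MultiHeadAttentionWithQK` and the exact shapes are illustrative assumptions, not Whisper's actual code) of an attention module that returns the pre-softmax attention matrix `qk` alongside its usual output:

```python
import torch
import torch.nn.functional as F
from torch import nn
from typing import Optional


class MultiHeadAttentionWithQK(nn.Module):
    """Sketch: attention that also returns the pre-softmax attention logits ("qk"),
    so callers can inspect cross-attention weights afterwards."""

    def __init__(self, n_state: int, n_head: int):
        super().__init__()
        self.n_head = n_head
        self.query = nn.Linear(n_state, n_state)
        self.key = nn.Linear(n_state, n_state, bias=False)
        self.value = nn.Linear(n_state, n_state)
        self.out = nn.Linear(n_state, n_state)

    def forward(self, x: torch.Tensor, xa: Optional[torch.Tensor] = None):
        # x: decoder states; xa: encoder states (cross-attention) or None (self-attention)
        q = self.query(x)
        k = self.key(x if xa is None else xa)
        v = self.value(x if xa is None else xa)

        n_batch, _, n_state = q.shape
        head_dim = n_state // self.n_head
        scale = head_dim ** -0.25
        q = q.view(n_batch, -1, self.n_head, head_dim).permute(0, 2, 1, 3) * scale
        k = k.view(n_batch, -1, self.n_head, head_dim).permute(0, 2, 3, 1) * scale
        v = v.view(n_batch, -1, self.n_head, head_dim).permute(0, 2, 1, 3)

        qk = q @ k                      # (batch, head, n_ctx_q, n_ctx_kv) attention logits
        w = F.softmax(qk, dim=-1)
        out = (w @ v).permute(0, 2, 1, 3).flatten(start_dim=2)

        # the second return value is the "qk as well" part
        return self.out(out), qk.detach()
```

Callers that only need the attention output can ignore the second return value; exposing it is what lets notebook-style code collect the cross-attention weights it needs for alignment.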

A comment in the notebook describes the purpose as:

> ...we use the cross-attention weights to determine more granular, word-level timestamps. It uses a set of heuristics and dynamic time warping (DTW) to find the alignment between the audio and the transcript.
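
The dynamic time warping step can be sketched roughly as follows (a toy illustration, not the notebook's actual implementation; the `dtw_path` helper and the negated-attention cost matrix are assumptions): given a token-by-frame cost matrix, DTW finds the cheapest monotonic path, which maps each text token onto a span of audio frames and hence onto timestamps:

```python
import numpy as np


def dtw_path(cost: np.ndarray) -> list[tuple[int, int]]:
    """Find a monotonic alignment path through an (n_tokens x n_frames) cost
    matrix, e.g. negated cross-attention weights, from (0, 0) to (n-1, m-1)."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    trace = np.zeros((n + 1, m + 1), dtype=np.int8)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            moves = (acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])  # diag, up, left
            best = int(np.argmin(moves))
            acc[i, j] = cost[i - 1, j - 1] + moves[best]
            trace[i, j] = best

    # backtrace from the bottom-right corner to recover the path
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        if trace[i, j] == 0:
            i, j = i - 1, j - 1
        elif trace[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]  # list of (token_index, frame_index) pairs


# toy usage: random "attention" matrix, negated so higher attention = lower cost
rng = np.random.default_rng(0)
attn = rng.random((5, 20))
print(dtw_path(-attn))
```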

The accuracy of timestamps has been discussed in #3 and #435.

Answer selected by turnkit