Getting time offsets of beginning and end of each word #3

Answered by jianfch
shashanoid asked this question in Q&A
Update: the full script is now available here:
https://github.com/jianfch/stable-ts

You can get a timestamp prediction for each word because timestamps are already part of the model's predictions; they are just filtered out and reserved for the segment start-time and end-time tokens. That means you can clone the logits before they are filtered, then return the clone along with the other results.

Add the lines marked with "# <----add this" to decoding.DecodingTask._main_loop:

    def _main_loop(self, audio_features: Tensor, tokens: Tensor):
        assert audio_features.shape[0] == tokens.shape[0]
        n_batch = tokens.shape[0]
        sum_logprobs: Tensor = torch.zeros(n_batch, device=audio_features.device)
        no_caption_probs = [np.nan] * n_batch

    …
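
To illustrate the idea of cloning the logits before filtering, here is a minimal standalone sketch. It is not the actual patch to `_main_loop`: plain Python lists stand in for tensors, and `TIMESTAMP_BEGIN`, `TIME_PRECISION`, `suppress_timestamps`, and `decode_step` are hypothetical names chosen for this example. It assumes timestamp tokens occupy the tail of the vocabulary and that each timestamp step corresponds to 0.02 seconds.

```python
import copy

TIMESTAMP_BEGIN = 8    # hypothetical id of the first timestamp token
TIME_PRECISION = 0.02  # assumed seconds per timestamp step


def suppress_timestamps(logits):
    """Stand-in for a logit filter: masks timestamp tokens in place."""
    for row in logits:
        for i in range(TIMESTAMP_BEGIN, len(row)):
            row[i] = float("-inf")


def decode_step(logits):
    """Pick the next token per batch row and also report the timestamp guess."""
    # Clone first, because the filter below mutates `logits` in place.
    raw = copy.deepcopy(logits)
    suppress_timestamps(logits)

    # Normal decoding: argmax over the filtered logits.
    next_tokens = [max(range(len(row)), key=row.__getitem__) for row in logits]

    # Extra output: argmax restricted to timestamp tokens in the unfiltered copy.
    ts_seconds = []
    for row in raw:
        best = max(range(TIMESTAMP_BEGIN, len(row)), key=row.__getitem__)
        ts_seconds.append((best - TIMESTAMP_BEGIN) * TIME_PRECISION)
    return next_tokens, ts_seconds


if __name__ == "__main__":
    row = [0.1] * 12
    row[3] = 2.0    # strongest text token
    row[10] = 5.0   # strongest timestamp token (offset 2)
    print(decode_step([row]))
```

The key design point is the order of operations: the copy has to be taken before any filter runs, otherwise the timestamp scores are already masked by the time you read them.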

Answer selected by jongwook