How does subword regularization calculate the sampling score returned by the function `sample_encode_and_score`?
Does the sentence sampling score rely on the token scores recorded in the vocabulary file?
If a token's score is its log probability under the unigram model, how does the model compute the sentence sampling score?
Also, in the original paper, tokens are sorted according to the loss in likelihood when that token is removed from the corpus. That loss seems to be a different score. Where can I see it?
Yes, the sampling score relies on the scores (log probabilities) stored in the vocab file.
Given one possible segmentation W = w_1, w_2, ..., w_n, the generation probability of W is computed as P(W) = exp(\sum_k logprob(w_k)). We sample a segmentation W with probability proportional to P(W).
There are several sampling modes (e.g., n-best sampling, include-best, sampling without replacement), but the forward-filtering-and-backward-sampling algorithm is the basic one.
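To make the two ideas above concrete, here is a minimal self-contained sketch of forward-filtering and backward-sampling for a unigram model. It is not the actual SentencePiece implementation; the toy vocabulary, its logprob scores, and the function name mirroring `sample_encode_and_score` are made up for illustration. The forward pass accumulates, for each prefix, the log of the total probability of all segmentations of that prefix; the backward pass then samples token boundaries right-to-left in proportion to that mass, and the returned score is log P(W) = sum of the sampled tokens' logprobs.

```python
import math
import random

# Toy unigram vocabulary: token -> logprob score (hypothetical values,
# standing in for the scores stored in a SentencePiece vocab file).
VOCAB = {
    "a": math.log(0.1),
    "b": math.log(0.1),
    "c": math.log(0.1),
    "ab": math.log(0.2),
    "bc": math.log(0.2),
    "abc": math.log(0.3),
}

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def sample_encode_and_score(text, rng=random):
    """Sample one segmentation of `text` with probability proportional
    to P(W) = exp(sum_k logprob(w_k)); return (tokens, log P(W))."""
    n = len(text)
    # Forward filtering: alpha[i] = log total probability of all
    # segmentations of the prefix text[:i].
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0
    for i in range(1, n + 1):
        cands = [alpha[j] + VOCAB[text[j:i]]
                 for j in range(i)
                 if text[j:i] in VOCAB and alpha[j] > -math.inf]
        if cands:
            alpha[i] = logsumexp(cands)
    if alpha[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")

    # Backward sampling: from the end of the text, pick the last token
    # with probability proportional to exp(alpha[start] + logprob(token)).
    tokens, i, score = [], n, 0.0
    while i > 0:
        cands = [(j, alpha[j] + VOCAB[text[j:i]])
                 for j in range(i)
                 if text[j:i] in VOCAB and alpha[j] > -math.inf]
        total = logsumexp([lp for _, lp in cands])
        r, acc, choice = rng.random(), 0.0, cands[-1][0]
        for j, lp in cands:
            acc += math.exp(lp - total)
            if r < acc:
                choice = j
                break
        token = text[choice:i]
        tokens.append(token)
        score += VOCAB[token]  # sentence score = sum of token logprobs
        i = choice
    tokens.reverse()
    return tokens, score
```

For "abc" under this toy vocab, the candidate segmentations are ["abc"], ["ab","c"], ["a","bc"], and ["a","b","c"], and each call returns one of them with probability proportional to the product of its token probabilities, together with the log of that product as the score.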