can the pre-trained model be used as a language model? #35

wangwang110 opened this Issue Nov 2, 2018 · 2 comments



wangwang110 commented Nov 2, 2018

How can we use the pre-trained model to get the probability of a sentence?



jacobdevlin-google commented Nov 2, 2018

It can't; you can only use it to get the probability of a single missing word in a sentence (or a small number of missing words). This is one of the fundamental trade-offs: masked LMs give you deep bidirectionality, but you no longer have a well-formed probability distribution over the sentence (which, in general, we don't care about).
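To make the "probability of a single missing word" idea concrete, here is a minimal sketch. `toy_masked_lm` is a hypothetical stand-in, not BERT's actual API; a real implementation would tokenize the sentence and run the pre-trained model at the `[MASK]` position instead.

```python
import math

# Tiny illustrative vocabulary; a real masked LM uses a WordPiece vocab.
VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_masked_lm(tokens, mask_index):
    """Hypothetical masked LM: returns a probability distribution over
    VOCAB for the [MASK] position. A real implementation would run the
    pre-trained BERT model here."""
    counts = {w: 1.0 for w in VOCAB}
    for i, t in enumerate(tokens):
        # Purely illustrative scoring: prefer words seen elsewhere in the
        # sentence, just so the distribution depends on the context.
        if i != mask_index and t in counts:
            counts[t] += 1.0
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def masked_word_log_prob(tokens, mask_index, true_word):
    """Log-probability the masked LM assigns to `true_word` when that
    position is replaced by [MASK] -- the one quantity a masked LM
    gives you directly."""
    masked = list(tokens)
    masked[mask_index] = "[MASK]"
    dist = toy_masked_lm(masked, mask_index)
    return math.log(dist[true_word])

lp = masked_word_log_prob(["the", "cat", "sat"], 1, "cat")
```

Note that this scores one position conditioned on the full left and right context; it does not, by itself, give a probability for the whole sentence.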



xu-song commented Nov 20, 2018

What about masking each word sequentially, then scoring the sentence by the sum of the word scores?
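This suggestion can be sketched as follows: mask each position in turn, ask the model for the log-probability of the true word, and sum. The toy model below is a hypothetical stand-in for a real masked LM; only the masking loop and the sum are the point. Note that, as the previous comment explains, the resulting score is not a normalized sentence probability.

```python
import math

# Tiny illustrative vocabulary; a real masked LM uses a WordPiece vocab.
VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_masked_lm(tokens, mask_index):
    """Hypothetical stand-in for a masked LM (e.g. BERT): returns a
    distribution over VOCAB for the masked position."""
    counts = {w: 1.0 for w in VOCAB}
    for i, t in enumerate(tokens):
        if i != mask_index and t in counts:
            counts[t] += 1.0
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sequential_mask_score(tokens):
    """Mask each word in turn and sum the log-probabilities the model
    assigns to the true words (the suggestion above). This needs one
    forward pass per token, so it is O(n) model calls for n tokens."""
    score = 0.0
    for i, word in enumerate(tokens):
        masked = list(tokens)
        masked[i] = "[MASK]"
        dist = toy_masked_lm(masked, i)
        score += math.log(dist[word])
    return score

score = sequential_mask_score(["the", "cat", "sat"])
```

Such a score can still be useful for comparing or reranking sentences, even though it is not a proper probability.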
