Can the pre-trained model be used as a language model? #35
It can't. You can only use it to get the probability of a single missing word in a sentence (or of a small number of missing words). This is one of the fundamental ideas: masked LMs give you deep bidirectionality, but you no longer have a well-formed probability distribution over the sentence (which, in general, we don't care about).
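To see why per-word masked probabilities don't add up to a sentence distribution, here is a minimal sketch on a toy two-word vocabulary. It assumes a hypothetical "true" joint distribution over two-token sentences (the numbers are made up) and an *ideal* masked LM that returns exactly P(token | all other tokens). It is not BERT's code. Multiplying each token's masked conditional gives a score per sentence, but those scores summed over all possible sentences do not equal 1. The left-to-right chain-rule factorization that a standard LM uses does sum to 1.

```python
from fractions import Fraction
from itertools import product

# Hypothetical "true" joint distribution over two-token sentences (made-up numbers).
VOCAB = ["a", "b"]
JOINT = {
    ("a", "a"): Fraction(4, 10),
    ("a", "b"): Fraction(1, 10),
    ("b", "a"): Fraction(2, 10),
    ("b", "b"): Fraction(3, 10),
}

def masked_prob(sentence, position):
    """What an ideal masked LM would return: P(token at `position` | all other tokens)."""
    # Sum the joint over every token that could fill the masked slot.
    denom = sum(
        JOINT[tuple(v if i == position else t for i, t in enumerate(sentence))]
        for v in VOCAB
    )
    return JOINT[sentence] / denom

def masked_score(sentence):
    """Multiply each token's masked-LM conditional into one per-sentence score."""
    score = Fraction(1)
    for pos in range(len(sentence)):
        score *= masked_prob(sentence, pos)
    return score

def chain_rule_score(sentence):
    """Left-to-right factorization P(w1) * P(w2 | w1), as a standard LM computes it."""
    w1, _ = sentence
    p_w1 = sum(JOINT[(w1, v)] for v in VOCAB)
    return p_w1 * (JOINT[sentence] / p_w1)

sentences = list(product(VOCAB, repeat=2))
masked_total = sum(masked_score(s) for s in sentences)
chain_total = sum(chain_rule_score(s) for s in sentences)
print("sum of masked-LM scores:", masked_total)   # 7/6
print("sum of chain-rule scores:", chain_total)   # 1
```

Each individual `masked_prob` is a proper conditional distribution over the missing word. Only their product fails to be a normalized distribution over whole sentences.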