
can the pre-trained model be used as a language model? #35

Closed

wangwang110 opened this issue Nov 2, 2018 · 6 comments

Comments

@wangwang110 commented Nov 2, 2018

How can we use the pre-trained model to get the probability of one sentence?

@jacobdevlin-google (Collaborator) commented Nov 2, 2018

It can't; you can only use it to get the probability of a single missing word in a sentence (or a small number of missing words). This is one of the fundamental ideas: masked LMs give you deep bidirectionality, but you no longer have a well-formed probability distribution over the sentence (which, in general, we don't care about).
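For reference, here is a minimal sketch of the thing the model *can* do: score a single masked position. It assumes the Hugging Face `transformers` package and the `bert-base-uncased` checkpoint (neither is part of this repo), so treat the API details as illustrative:

```python
# Sketch: probability of a single masked word under a pre-trained masked LM.
# Assumes the Hugging Face `transformers` package (not this repo's code).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The cat sat on the [MASK] ."
inputs = tokenizer(text, return_tensors="pt")
# Position of the single [MASK] token in the input.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [1, seq_len, vocab_size]

# Distribution over the vocabulary at the masked position.
probs = torch.softmax(logits[0, mask_pos], dim=-1)
mat_id = tokenizer.convert_tokens_to_ids("mat")
print(f"P(mat | context) = {probs[mat_id].item():.4f}")
```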

@xu-song commented Nov 20, 2018

What about masking each word sequentially, then scoring the sentence by the sum of the per-word scores?
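A rough sketch of that idea, again assuming the Hugging Face `transformers` API: mask each position in turn and sum the log-probability of the true token. Note that, as the replies below point out, this yields a pseudo-log-likelihood rather than a true sentence probability:

```python
# Sketch: sequential masking, summing log-probabilities of the true tokens.
# This is a pseudo-log-likelihood, not P(sentence); see the replies below.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    total = 0.0
    # Skip [CLS] at position 0 and [SEP] at the last position.
    for i in range(1, ids.size(1) - 1):
        masked = ids.clone()
        masked[0, i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total += log_probs[ids[0, i]].item()
    return total

print(pseudo_log_likelihood("The cat sat on the mat."))
```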

@hscspring commented Apr 10, 2019

using BERT as a language Model · Issue #37 · huggingface/pytorch-pretrained-BERT

It's actually as @jacobdevlin-google said: BERT is really not a language model.

@WolfNiu commented Apr 18, 2019

What about masking each word sequentially, then scoring the sentence by the sum of the per-word scores?

That way your calculation won't be correct.

Say the sentence has only two tokens, x1 and x2. Your calculation will give P(x1 | x2) * P(x2 | x1), which is not the probability of the whole sentence. Note that this is not to say what you intended isn't doable -- it's just that this particular way probably won't work.
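To spell out the mismatch (my notation, not from the comment above):

```latex
% Chain rule: the true joint probability of a two-token sentence
P(x_1, x_2) = P(x_1)\,P(x_2 \mid x_1)

% What masking each token in turn computes (a pseudo-likelihood)
\mathrm{PL}(x_1, x_2) = P(x_1 \mid x_2)\,P(x_2 \mid x_1)
```

The two agree when x1 and x2 are independent, but differ in general.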

@Bachstelze commented Apr 30, 2019

In "BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model", Alex Wang and Kyunghyun Cho use the unnormalized log-probabilities to rank a set of sentences. For that purpose it seems to work.
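As a rough illustration of that ranking use, reusing the hypothetical pseudo_log_likelihood() helper sketched earlier in this thread (this is not Wang and Cho's actual code):

```python
# Rank candidate sentences by their (unnormalized) pseudo-log-likelihood,
# reusing the pseudo_log_likelihood() sketch from the earlier comment.
candidates = [
    "The cat sat on the mat.",
    "The cat mat on the sat.",
]
ranked = sorted(candidates, key=pseudo_log_likelihood, reverse=True)
print(ranked)  # the better-formed sentence should come first
```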

@Shujian2015 commented May 31, 2019

You can fine-tune BERT to be an LM: https://arxiv.org/abs/1904.09408
