How to generate the vocab file that the BERT model was trained on? #59

Closed
allenzhang010 opened this issue Nov 6, 2018 · 3 comments
Comments

@allenzhang010

I was wondering how to generate the vocab file that is passed via --vocab_file to create_pretraining_data.py.

I noticed that the released BERT models include a vocab file. How did you generate it from, for instance, an English Wikipedia dump? I am going to do the pre-training from scratch. I'd appreciate your help!

Thanks,
Allen Zhang

@jacobdevlin-google
Contributor

We couldn't include that code; see this section of the README for alternatives.
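For context on what `--vocab_file` expects: as far as I can tell from the released checkpoints, the vocab file is plain UTF-8 text with one token per line, and a token's id is simply its 0-based line number (the `load_vocab` function in `tokenization.py` assigns ids in file order). Word-continuation pieces carry a `##` prefix, and special tokens such as `[PAD]`, `[UNK]`, `[CLS]`, `[SEP]` and `[MASK]` are listed in the file. The Python sketch below writes and reads back a file in that format; `write_vocab`, `read_vocab` and the tiny token list are hypothetical illustrations, not code from the repository.

```python
import collections
import os
import tempfile


def write_vocab(tokens, path):
    """Write one token per line; a token's id is its 0-based line number."""
    with open(path, "w", encoding="utf-8") as f:
        for token in tokens:
            f.write(token + "\n")


def read_vocab(path):
    """Map each token to its 0-based line number, keeping file order."""
    vocab = collections.OrderedDict()
    with open(path, encoding="utf-8") as f:
        for index, line in enumerate(f):
            vocab[line.strip()] = index
    return vocab


if __name__ == "__main__":
    # Hypothetical tiny vocabulary: special tokens, whole words,
    # and a "##"-prefixed piece that continues a word.
    tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "the", "play", "##ing"]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.txt")
        write_vocab(tokens, path)
        vocab = read_vocab(path)
    print(vocab["[CLS]"], vocab["##ing"])  # prints: 2 7
```

Because ids come from line order, inserting or reordering lines in an existing vocab file shifts every later id, so the file has to stay fixed once pre-training starts.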

@zihaolucky

zihaolucky commented Jan 1, 2019

@allenzhang010 have you worked out a solution for this?

@arunzz

arunzz commented Jun 20, 2019

You can use this to create your vocabulary:
https://github.com/kwonmha/bert-vocab-builder
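As a rough illustration of what building a vocabulary from scratch involves, here is a toy Python sketch. It is not WordPiece training (the README's alternatives or the tool linked above are the real options); it simply keeps every character seen, as both `c` and `##c`, plus the most frequent whole words, and then splits words with greedy longest-match-first, which is the strategy BERT's `WordpieceTokenizer` uses. `basic_split`, `build_toy_vocab`, `wordpiece_tokenize` and the two-line corpus are hypothetical; a real run would feed in lines of plain text extracted from a Wikipedia dump.

```python
import collections
import re

# Special tokens assumed for this toy version; the released BERT vocabularies
# also reserve [unused*] slots, which are omitted here.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def basic_split(text):
    """Lowercase, then split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_toy_vocab(lines, max_words):
    """Toy frequency-based vocabulary (NOT the WordPiece training algorithm).

    Every character seen is kept both as a word-initial piece ("c") and as a
    continuation piece ("##c"), so any word made of seen characters can be
    split; then up to `max_words` of the most frequent whole words are added.
    """
    word_counts = collections.Counter()
    for line in lines:
        word_counts.update(basic_split(line))

    chars = sorted({ch for word in word_counts for ch in word})
    vocab = SPECIAL_TOKENS + chars + ["##" + ch for ch in chars]
    seen = set(vocab)

    added = 0
    for word, _count in word_counts.most_common():
        if added >= max_words:
            break
        if word not in seen:  # single characters are already present
            vocab.append(word)
            seen.add(word)
            added += 1
    return vocab


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no known piece: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    corpus = ["The player is playing.", "Players play the game."]
    vocab = build_toy_vocab(corpus, max_words=3)
    vocab_set = set(vocab)
    print(wordpiece_tokenize("players", vocab_set))  # prints: ['player', '##s']
    print(wordpiece_tokenize("xyz", vocab_set))      # prints: ['[UNK]']
```

The returned list would then be written out one token per line, in order, to produce a file for `--vocab_file`. Keeping every single character guarantees that the training text can always be tokenized, at the cost of heavily fragmenting rare words; real WordPiece or BPE training instead learns from the corpus which multi-character pieces are worth keeping.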
