
Load BioBERT pre-trained weights into a BERT model with the Hugging Face PyTorch run_classifier.py code #5

Closed
sheetalsh456 opened this issue Apr 8, 2019 · 1 comment

Comments

@sheetalsh456

These are the steps I followed to get BioBERT working with the existing Hugging Face BERT PyTorch code.

  1. I downloaded the pre-trained weights 'biobert_pubmed_pmc.tar.gz' from the Releases page and unpacked it (a sketch of the unpacking step follows).
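A minimal sketch of the unpacking in Python (the extracted directory layout is an assumption based on the paths used later in this issue):

```python
import tarfile

# Unpack the downloaded BioBERT release archive.
# Assumption: it extracts to the biobert/pubmed_pmc_470k/ layout used below.
with tarfile.open("biobert_pubmed_pmc.tar.gz", "r:gz") as archive:
    archive.extractall("biobert/")
```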

  2. I ran this command to convert the TensorFlow checkpoint to a PyTorch model:

python pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py --tf_checkpoint_path="biobert/pubmed_pmc_470k/biobert_model.ckpt.index" --bert_config_file="biobert/pubmed_pmc_470k/bert_config.json" --pytorch_dump_path="biobert/pubmed_pmc_470k/Pytorch/biobert.model"

This created a file 'biobert.model' in the specified path.
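For reference, here is a rough sketch of what the conversion script does internally, assuming the pytorch-pretrained-BERT 0.6.x layout (where load_tf_weights_in_bert lives in pytorch_pretrained_bert.modeling; older versions inlined this logic, so treat the exact names as assumptions):

```python
import torch
from pytorch_pretrained_bert.modeling import (
    BertConfig,
    BertForPreTraining,
    load_tf_weights_in_bert,
)

# Build an empty BERT model from BioBERT's config.
config = BertConfig.from_json_file("biobert/pubmed_pmc_470k/bert_config.json")
model = BertForPreTraining(config)

# Copy the TensorFlow checkpoint tensors into the PyTorch model.
load_tf_weights_in_bert(model, "biobert/pubmed_pmc_470k/biobert_model.ckpt")

# Serialize the state dict to the path given as --pytorch_dump_path.
torch.save(model.state_dict(), "biobert/pubmed_pmc_470k/Pytorch/biobert.model")
```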

  3. As mentioned in this link, I compressed the 'biobert.model' file created above together with 'biobert/pubmed_pmc_470k/bert_config.json' into a biobert_model.tar.gz (see the packaging sketch below).
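A sketch of that packaging step with the standard library, assuming from_pretrained() extracts the archive and looks for members named pytorch_model.bin and bert_config.json (the file names pytorch-pretrained-BERT expects inside the tarball):

```python
import tarfile

# Bundle the converted weights and the config under the expected member names.
with tarfile.open("biobert_model.tar.gz", "w:gz") as archive:
    archive.add("biobert/pubmed_pmc_470k/Pytorch/biobert.model",
                arcname="pytorch_model.bin")
    archive.add("biobert/pubmed_pmc_470k/bert_config.json",
                arcname="bert_config.json")
```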

  4. I then ran Hugging Face BERT's run_classifier.py with the following command, using the tar.gz created above:

python pytorch-pretrained-BERT/examples/run_classifier.py --data_dir="Data/" --bert_model="biobert_model.tar.gz" --task_name="qqp" --output_dir="OutputModels/Pretrained/" --do_train --do_eval --do_lower_case

I get the error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

at the line

tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
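One clue: 0x8b in position 1 matches the second byte of the gzip magic number (0x1f 0x8b), so the tokenizer seems to be decoding the compressed .tar.gz itself as a plain-text vocab file. A possible workaround (an assumption on my part, not a verified fix) is to point the tokenizer at BioBERT's vocab.txt directly:

```python
from pytorch_pretrained_bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained(
    "biobert/pubmed_pmc_470k/vocab.txt",  # assumed location of the released vocab file
    do_lower_case=False,  # BioBERT is built on cased BERT, so lower-casing may also be wrong
)
```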

Am I doing something wrong?

I just want to run the run_classifier.py code provided by Hugging Face with BioBERT pre-trained weights, in the same way that we run it with BERT. Is there a way to do this?

@jhyuklee
Collaborator

jhyuklee commented Apr 8, 2019

Duplicate of dmis-lab/biobert#26.
You don't have to post the same issue in different repositories.

@jhyuklee jhyuklee closed this as completed Apr 8, 2019