
Load BioBERT pre-trained weights into BERT model with PyTorch Hugging Face run_classifier.py code #457

Closed
sheetalsh456 opened this issue Apr 8, 2019 · 12 comments

Comments

@sheetalsh456

These are the steps I followed to get BioBERT working with the existing Hugging Face PyTorch BERT code.

  1. I downloaded the pre-trained weights 'biobert_pubmed_pmc.tar.gz' from the Releases page.

  2. I ran this command to convert the TF checkpoint to a PyTorch model:

python pytorch-pretrained-BERT/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py --tf_checkpoint_path="biobert/pubmed_pmc_470k/biobert_model.ckpt.index" --bert_config_file="biobert/pubmed_pmc_470k/bert_config.json" --pytorch_dump_path="biobert/pubmed_pmc_470k/Pytorch/biobert.model"

This created a file 'biobert.model' in the specified path.

  3. As mentioned in this link, I compressed the 'biobert.model' created above and 'biobert/pubmed_pmc_470k/bert_config.json' together into a biobert_model.tar.gz

  4. I then ran the run_classifier.py of the Hugging Face BERT with the following command, using the tar.gz created above.

python pytorch-pretrained-BERT/examples/run_classifier.py --data_dir="Data/" --bert_model="biobert_model.tar.gz" --task_name="qqp" --output_dir="OutputModels/Pretrained/" --do_train --do_eval --do_lower_case

I get the error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

at the line

tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)

(Byte 0x8b at position 1 is the second byte of the gzip magic number 0x1f 0x8b, which suggests the tokenizer is reading the compressed archive itself as if it were a plain-text vocab file.)

Am I doing something wrong?

I just want to run the run_classifier.py code provided by Hugging Face with BioBERT pre-trained weights, in the same way that we run it with BERT. Is there a way to do this?

@thomwolf
Member

Have you tried the solutions discussed in the other issues on this topic:

@stale

stale bot commented Jun 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@nikhilsid

Have you tried the solutions discussed in the other issues on this topic:

Hi @thomwolf ,

I followed the instructions here to convert the checkpoint, then placed the files (pytorch_model.bin, bert_config.json, and vocab.txt) in one folder and compressed it. I did not need to ignore any weights as mentioned in the solutions you referenced above (#312 or #239).

I copied the compressed folder to the home folder of 'pytorch-transformers', then tried to run the example code ('examples/run_glue.py') on my data from there with the following command:

python ./examples/run_glue.py \
  --model_type bert \
  --model_name_or_path biobert.gz \
  --task_name=sts-b \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir=$DIR \
  --max_seq_length 128 \
  --per_gpu_eval_batch_size=8 \
  --per_gpu_train_batch_size=8

But I get the same error as mentioned in the main discussion:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

at this location:

File "./examples/run_glue.py", line 424, in main config = config_class.from_pretrained(args.config_name if args.config_name else args.model_name_or_path, num_labels=num_labels, finetuning_task=args.task_name)

Can you please tell me what to change?

@stefan-it
Collaborator

Try to pass the extracted folder of your converted bioBERT model to the --model_name_or_path :)

Here's a short example:

  • Download the BioBERT v1.1 (+ PubMed 1M) model (or any other model) from the bioBERT repo
  • Extract the downloaded file, e.g. with tar -xzf biobert_v1.1_pubmed.tar.gz
  • Convert the bioBERT model TensorFlow checkpoint to a PyTorch and PyTorch-Transformers compatible one: pytorch_transformers bert biobert_v1.1_pubmed/model.ckpt-1000000 biobert_v1.1_pubmed/bert_config.json biobert_v1.1_pubmed/pytorch_model.bin
  • Move the config: mv biobert_v1.1_pubmed/bert_config.json biobert_v1.1_pubmed/config.json

Then pass the folder name to the --model_name_or_path argument. You can run this simple script to check if everything works:

from pytorch_transformers import BertModel
model = BertModel.from_pretrained('biobert_v1.1_pubmed')
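
If the vocab file is in the same folder, the tokenizer can be checked the same way. A minimal sketch, assuming vocab.txt was extracted into biobert_v1.1_pubmed alongside the converted weights:

from pytorch_transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('biobert_v1.1_pubmed')  # reads biobert_v1.1_pubmed/vocab.txt
print(tokenizer.tokenize('The patient was given ibuprofen.'))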

@nt-sumit

How do we load a tensor as a pretrained model in BERT?

@nipunsadvilkar

@stefan-it As per the new transformers-cli, the third command would change as follows:

transformers-cli convert --model_type bert \
--tf_checkpoint biobert_v1.1_pubmed/model.ckpt-1000000 \
--config biobert_v1.1_pubmed/bert_config.json \
--pytorch_dump_output biobert_v1.1_pubmed/pytorch_model.bin
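
Under the renamed transformers package, the sanity check from above becomes the following (a small sketch, assuming the converted files live in biobert_v1.1_pubmed/):

from transformers import BertModel
model = BertModel.from_pretrained('biobert_v1.1_pubmed')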

@JessicaLopezEspejel

JessicaLopezEspejel commented Apr 13, 2020

Hello!
To complement @stefan-it's instructions in step 3, the following code works for me:

import os
from pytorch_pretrained_bert.convert_tf_checkpoint_to_pytorch import convert_tf_checkpoint_to_pytorch

path_bert = 'my_bert_directory/'  # folder containing the TF checkpoint and bert_config.json
path_bin = os.path.join(path_bert, 'pytorch_model.bin')  # converted PyTorch weights

# Convert only if the PyTorch weights do not already exist
if not os.path.exists(path_bin):
    convert_tf_checkpoint_to_pytorch(
        path_bert + 'biobert_model.ckpt',
        path_bert + 'bert_config.json',
        path_bin,
    )
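
Once converted, the folder can be loaded directly with pytorch_pretrained_bert. A quick check, assuming vocab.txt and bert_config.json are also in my_bert_directory/:

from pytorch_pretrained_bert import BertModel
model = BertModel.from_pretrained('my_bert_directory/')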

My folder (biobert_v1.0_pmc) originally contained 5 files:

  • 3 TensorFlow checkpoint files
  • A vocab file (vocab.txt)
  • A config file (bert_config.json)

@tokarev-i-v

tokarev-i-v commented Apr 16, 2020

  • Move the config: mv biobert_v1.1_pubmed/bert_config.json biobert_v1.1_pubmed/config.json

Thank you very much! It did help me!

@abkds

abkds commented May 1, 2020

Thank you everyone, this works fine now!

@mariaBio

Dear all,
Thank you very much for the suggestions on how to prepare the model for use with Hugging Face. I am trying to use the Hugging Face BertTokenizer to perform NER on biomedical data with the pre-trained weights.
It seems to work fine up to the point where I want to map annotated tokens to entity labels. I have token ids and prediction ids, but I cannot figure out how/where to get the label_list to follow the mapping example at https://huggingface.co/transformers/usage.html#named-entity-recognition
Thank you very much for any help you can provide!
Maria
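
For anyone hitting the same question: the label_list is not stored in the base BioBERT weights; it comes from whatever dataset the token-classification head was fine-tuned on, in the same order used during training. A minimal sketch of the mapping, assuming a fine-tuned NER model in the placeholder folder 'my_biobert_ner' and the CoNLL-style label list from the linked usage example:

import torch
from transformers import BertForTokenClassification, BertTokenizer

# Must match the label order the model was fine-tuned with;
# this is the CoNLL-2003 list used in the linked usage example.
label_list = ['O', 'B-MISC', 'I-MISC', 'B-PER', 'I-PER',
              'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

tokenizer = BertTokenizer.from_pretrained('my_biobert_ner')  # placeholder path
model = BertForTokenClassification.from_pretrained('my_biobert_ner')

inputs = tokenizer.encode('The patient was treated at the Mayo Clinic.', return_tensors='pt')
with torch.no_grad():
    predictions = torch.argmax(model(inputs)[0], dim=2)

# Pair each wordpiece token with its predicted entity label
tokens = tokenizer.convert_ids_to_tokens(inputs[0].tolist())
print([(t, label_list[p]) for t, p in zip(tokens, predictions[0].tolist())])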

@TamMinhVo

Dear all,
I am a newbie and I do not have much experience. Does anyone have a full tutorial or code for a regression task? Please share it with me! I would greatly appreciate it! Thank you.
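
As a starting point: run_glue.py with --task_name=sts-b (as in the command earlier in this thread) is already a regression fine-tune. Under the hood, a BERT regression head is just BertForSequenceClassification with num_labels=1, which makes it use an MSE loss. A minimal sketch with placeholder model name and target value:

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# num_labels=1 turns the classification head into a regression head (MSE loss)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=1)

inputs = tokenizer.encode('A sentence to score.', return_tensors='pt')
labels = torch.tensor([2.5])  # a float target, e.g. an STS-B similarity score
loss, logits = model(inputs, labels=labels)[:2]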

@varunp2k

varunp2k commented Jun 3, 2020

@stefan-it @nipunsadvilkar Thank you for your solutions.
