
pretrain/PyTorch/dataprep/create_pretraining.py fails to locate BertTokenizer #18

Closed
kaiidams opened this issue Jul 29, 2019 · 6 comments


@kaiidams

We don't have pretrain/PyTorch/pytorch_pretrained_bert.py, so the import fails:

$ python create_pretraining.py --input_dir=~/Data/bert_data/out3 --output_dir=/home/kaiida/Data/bert_data/out4
Traceback (most recent call last):
  File "create_pretraining.py", line 29, in <module>
    from pytorch_pretrained_bert.tokenization import BertTokenizer
ModuleNotFoundError: No module named 'pytorch_pretrained_bert'
kaiidams commented Jul 29, 2019

With BertTokenizer imported from Hugging Face's pytorch-transformers, the script stops complaining, though I'm not sure whether it is compatible with Azure ML's Transformer:

from pytorch_transformers.tokenization_bert import BertTokenizer
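
For a script that has to run against either install, a small compatibility shim along these lines should work (a sketch, assuming both package generations expose the same BertTokenizer API; the model name below is only an example):

# Try the old package name that create_pretraining.py expects,
# then fall back to the renamed pytorch-transformers package.
try:
    from pytorch_pretrained_bert.tokenization import BertTokenizer    # <= 0.6.2
except ModuleNotFoundError:
    from pytorch_transformers.tokenization_bert import BertTokenizer  # >= 1.0.0

# Both versions expose the same entry point, e.g.:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
print(tokenizer.tokenize("Hello, world!"))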

@kaiidams

The original package path was pytorch_pretrained_bert; Hugging Face changed it in this commit:

huggingface/transformers@0bab55d#diff-415debafd5265b334603c02768a29b0d


kaiidams commented Aug 2, 2019

pretrain/PyTorch/docker/Dockerfile has this line. It seems pytorch-pretrained-bert v0.6.2 is expected rather than pytorch-transformers v1.0.0; README.md should specify which version to use.

RUN /opt/miniconda/envs/amlbert/bin/pip install --no-cache-dir pytorch-pretrained-bert==0.6.2 && \
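
Until the README pins a version, a quick check like this (a sketch using only the standard library) shows which package generation an environment actually has:

# Report which of the two package generations is importable.
import importlib.util

for name in ("pytorch_pretrained_bert", "pytorch_transformers"):
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'installed' if found else 'missing'}")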

@xiaoyongzhu

Hi @kaiidams, it looks like you are right... Hugging Face has changed their folder structure.


skaarthik commented Aug 31, 2019

@kaiidams if you are interested, feel free to send a PR to update the README file.

@skaarthik

Merged the PR.
