
Unable to download community models #2392

Closed
2 of 4 tasks
cbowdon opened this issue Jan 3, 2020 · 6 comments
@cbowdon
cbowdon commented Jan 3, 2020

🐛 Bug

Model I am using (Bert, XLNet....): bert-base-cased-finetuned-conll03-english

Language I am using the model on (English, Chinese....): English

The problem arises when using:

  • the official example scripts: running a small snippet from docs (see below)
  • my own modified scripts: (give details)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: just trying to load the model at this stage

To Reproduce

Steps to reproduce the behavior:

I'm following the instructions at https://huggingface.co/bert-large-cased-finetuned-conll03-english but failing at the first hurdle. This is the snippet from the docs that I've run:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-cased-finetuned-conll03-english")
model = AutoModel.from_pretrained("bert-large-cased-finetuned-conll03-english")

It fails with this message:

OSError: Model name 'bert-base-cased-finetuned-conll03-english' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-japanese, bert-base-japanese-whole-word-masking, bert-base-japanese-char, bert-base-japanese-char-whole-word-masking, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english/config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

The message mentions looking at https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english/config.json and finding nothing.

I also tried the CLI: transformers-cli download bert-base-cased-finetuned-conll03-english, but that failed with a similar message. However, both methods work for namespaced models, e.g. dbmdz/bert-base-italian-cased (see the snippet below).
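
For reference, a minimal sketch of the namespaced case that does work for me (assuming transformers 2.3.0 and network access to the S3 bucket):

from transformers import AutoModel, AutoTokenizer

# Namespaced community models resolve fine, e.g. dbmdz/bert-base-italian-cased
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-italian-cased")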

Expected behavior

The community model should download. :)

Environment

  • OS: openSUSE Tumbleweed 20200101
  • Python version: 3.7
  • PyTorch version: 1.3.1
  • PyTorch Transformers version (or branch): 2.3.0
  • Using GPU? n/a
  • Distributed or parallel setup? n/a
  • Any other relevant information:

Additional context

I browsed https://s3.amazonaws.com/models.huggingface.co/ and see that the model is there, but paths are like:

https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english-config.json

rather than:

https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english/config.json

(note -config.json vs /config.json)

If I download the files manually, rename them, and point from_pretrained at the local directory, the model loads (rough sketch below). So it looks like just a naming problem.
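
A hedged sketch of that manual workaround; the local directory name is my own choice, and I'm assuming vocab.txt and pytorch_model.bin follow the same -&lt;filename&gt; pattern as the config:

import os
import urllib.request

from transformers import AutoModel, AutoTokenizer

# The S3 layout is flat ("<model>-<file>"), so fetch each file and save it
# under the name that from_pretrained expects inside a local directory.
base = "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english"
local_dir = "bert-base-cased-finetuned-conll03-english"  # hypothetical local folder
os.makedirs(local_dir, exist_ok=True)

for filename in ("config.json", "vocab.txt", "pytorch_model.bin"):
    urllib.request.urlretrieve(base + "-" + filename, os.path.join(local_dir, filename))

tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModel.from_pretrained(local_dir)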

@mandubian

I can confirm what you're seeing. In the current master code, bert-large-cased-finetuned-conll03-english has no mapping in the tokenizer or model shortcut lists, so it can't be resolved the same way as, for example, bert-base-uncased.

But it works if you target the files directly:

AutoTokenizer.from_pretrained("https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english-config.json")

AutoModel.from_pretrained("https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-conll03-english-pytorch_model.bin")

@julien-c
Member

Hmm, I think I see the issue. @stefan-it @mfuntowicz we could either:

  • move bert-large-cased-finetuned-conll03-english to dbmdz/bert-large-cased-finetuned-conll03-english
  • or add shortcut model names inside the codebase (config, model, tokenizer)

What do you think?

(also kinda related to #2281)

@stefan-it
Collaborator

@julien-c I think it would be better to move the model under the dbmdz namespace, as it is not an "official" model!

@mfuntowicz
Member

@julien-c moving to dbmdz is fine. We need to update the default NER pipeline's model provider to reflect the new path.
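
For context, a rough user-facing sketch of what the call would look like once the model moves; the explicit model/tokenizer arguments are only for illustration, the actual default mapping lives in the pipelines code:

from transformers import pipeline

# Point the NER pipeline at the relocated, namespaced model explicitly.
ner = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    tokenizer="dbmdz/bert-large-cased-finetuned-conll03-english",
)
print(ner("Hugging Face Inc. is based in New York City."))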

@julien-c
Member

Model now lives at https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english

Let me know if everything works correctly!
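
A quick way to verify, assuming transformers 2.3.x:

from transformers import AutoModel, AutoTokenizer

# The model now resolves via the dbmdz namespace.
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
model = AutoModel.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")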

@cbowdon
Author

cbowdon commented Jan 17, 2020

Works perfectly now, thanks!
