Support for private models from huggingface.co #9141

Merged
merged 5 commits into from
Dec 16, 2020

julien-c
Member

Add a use_auth_token flag (or string) to all from_pretrained entry points, to specify the token to use as Bearer authorization for remote files.

  • If it's a string, use it as the token.
  • If it's True, read the token from ~/.huggingface/token (throws if no token is there).

You can test this with:

from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("pierric/hf-private", use_auth_token=True)
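
As a rough sketch of the behaviour described above (not the code in this PR), the two cases could be turned into request headers along these lines. The helper name `build_auth_headers` and its `token_path` parameter are hypothetical and exist only for illustration; the token location is the one given in the description.

```python
from pathlib import Path
from typing import Dict, Union

# Token location as stated in the PR description (assumed here, not read from the library).
DEFAULT_TOKEN_PATH = Path.home() / ".huggingface" / "token"


def build_auth_headers(
    use_auth_token: Union[bool, str, None] = None,
    token_path: Path = DEFAULT_TOKEN_PATH,
) -> Dict[str, str]:
    """Hypothetical helper: map a use_auth_token value to extra HTTP headers."""
    if isinstance(use_auth_token, str):
        # A string is used directly as the token.
        return {"authorization": f"Bearer {use_auth_token}"}
    if use_auth_token:
        # True: read the stored token and fail loudly if there is none.
        if not token_path.is_file():
            raise EnvironmentError(
                "use_auth_token=True was given, but no stored token was found."
            )
        token = token_path.read_text().strip()
        return {"authorization": f"Bearer {token}"}
    # None or False: no token is sent, even if the user is logged in.
    return {}
```

With this shape, a call without `use_auth_token` sends no credentials at all, which matches the opt-in default chosen in this PR.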

We'll add unit tests down the line but need to think about which environment those tests are going to hit.

⚠️ For now, I decided against adding token by default to all calls if user is logged in. Let's discuss though!

Collaborator

@sgugger sgugger left a comment

Looking good, though the documentation needs tweaking IMO.

Comment on lines 321 to 322
Specify token to use as Bearer authorization for remote files. If true, will get token from
~/.huggingface.
Collaborator

@sgugger sgugger Dec 15, 2020

Suggested change
- Specify token to use as Bearer authorization for remote files. If true, will get token from
- ~/.huggingface.
+ The token to use as bearer authorization for the remote files. If :obj:`True`, will use the token
+ generated when running :obj:`transformers-cli login` (stored in :obj:`~/.huggingface`).

Think it's better to tell the user what to do to get that token in the right place than where the token is stored (unless I misunderstood where that token is coming from).

This might also warrant a note like:

            .. note::

                Passing a token is required when you want to use a private model.

if the default is to not use the token when a user is logged in.

Member Author

makes sense @sgugger. I'll wait for other potential comments and batch them tomorrow.

Re. the note, where would I add it? e.g. https://huggingface.co/transformers/philosophy.html#main-concepts ?

Thoughts on your side on the "should make it enabled by default" question?

Collaborator

I was thinking in the doc string of from_pretrained, but it can also go there. The more the merrier since people don't read all the doc.

Collaborator

As for "should make it enabled by default", I have no strong feeling either way. It makes sense to me to require some added code for something that won't work out of the box for all users. It also makes sense to want logged-in users to be able to use it automatically. I think we can start with your approach and see what users think.

Member

@Pierrci Pierrci left a comment

⚠️ For now, I decided against adding token by default to all calls if user is logged in. Let's discuss though!

I think that for a more seamless experience it makes sense to add the token automatically for logged-in users, but otherwise LGTM!

Contributor

@patrickvonplaten patrickvonplaten left a comment

I think I'd vote for use_auth_token to be just Optional[str] and to automatically allow access to private models by default if the user is authorized. I don't see a reason why someone would not want to have access to their own model if authorized.

IMO, this is both cleaner in terms of design (don't like Union[str, bool]) and a nicer UX.

Contributor

@Narsil Narsil left a comment

LGTM.

On the default approach, I am also in favor of using the token by default.
The only drawback I can see is that the code then depends on the token file, which might confuse people running the same code somewhere else. We should probably make sure the error they receive is a 403 Forbidden rather than a 404 Not Found, so users can figure out what the problem is.

What do you think?

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@julien-c
Member Author

@patrickvonplaten I don't follow here. In this PR we want to have a way to either pass a token directly, or to opt in to using the one that's stored in ~/. I don't see how I can do that with just an optional string?

@patrickvonplaten
Contributor

patrickvonplaten commented Dec 16, 2020

@patrickvonplaten I don't follow here. In this PR we want to have a way to either pass a token directly, or to opt in to using the one that's stored in ~/. I don't see how I can do that with just an optional string?

I might have misunderstood what constraints there are on the functionality. I thought the following logic was possible and made sense here:

  • If the user passes a string as use_auth_token, use it as the token
  • Else look for a token in ~/.huggingface:
    • if there is no token and the model is private -> throw an error
    • if there is no token and the model is not private -> load the model as usual
    • if there is a token -> use this one

Not sure if there is something I am completely overlooking in the logic though, e.g. if we cannot know beforehand whether the model is private or not

Ok never mind - as discussed offline this would require more features to add which is out-of-scope for this PR -> so LGTM!
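
For reference, the fallback order proposed above (and set aside as out of scope for this PR) could be sketched roughly as follows. The function names `pick_token` and `check_access` are hypothetical, and the sketch simply assumes the caller already knows whether the model is private, which is exactly the open question raised in the comment.

```python
from typing import Optional


def pick_token(use_auth_token: Optional[str], stored_token: Optional[str]) -> Optional[str]:
    """Hypothetical helper: an explicit string wins, else fall back to the stored token."""
    if use_auth_token is not None:
        return use_auth_token
    # May still be None when the user never logged in.
    return stored_token


def check_access(model_is_private: bool, token: Optional[str]) -> None:
    """Hypothetical helper: raise if a private model is requested without any token."""
    # Assumption: privacy of the model is known up front.
    if model_is_private and token is None:
        raise PermissionError("This model is private and no token is available.")
```

Public models load as usual under this scheme because `check_access` only raises when the model is private and no token could be found.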

@julien-c
Member Author

Also cc'ing @borisdayma as this PR adds an exist_ok param to HfApi.create_repo()

Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Member

@LysandreJik LysandreJik left a comment

Cool, very nice functionality. I like that it doesn't log anything more than the user agent in the default case.

LGTM

@julien-c julien-c merged commit fb650df into master Dec 16, 2020
@julien-c julien-c deleted the hf_private_models branch December 16, 2020 15:09
guyrosin pushed a commit to guyrosin/transformers that referenced this pull request Jan 15, 2021
* minor wording tweaks

* Create private model repo + exist_ok flag

* file_utils: `use_auth_token`

* Update src/transformers/file_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Propagate doc from @sgugger

Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>