Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add model checkpointing to push_to_hub and PushToHubCallback #14492
Add model checkpointing to push_to_hub and PushToHubCallback #14492
Changes from 8 commits
44dfe00
e551bba
f019466
9968de0
11e8f7c
ca7d863
62d9cc0
f2f2689
62a066b
9350d65
be97cb3
dbe5ce5
f8fd2be
5a98153
485e331
dfc70c3
0a7d621
87e41f4
4b19d37
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we said to use
Repository
here to avoid creating a new repo?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this, but it resulted in some code duplication in order to make everything work - I kept it as-is but added a comment to explain instead. If you'd still prefer to avoid the call, let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see what @LysandreJik thinks then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure it won't actually create a remote repository if it doesn't exist? If you're certain of it it's fine for me to go with
self._create_or_get_repo
but I'm quite unsure that it won't create a remote repo.Also which method are you using, is it this method here?
transformers/src/transformers/file_utils.py
Lines 2338 to 2367 in 956a483
Because if so, don't call it with positional arguments like you do here as you're passing the
organization
parameter torepo_url
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot on the
organization
positional arg, fixed!Also, the
_create_or_get_repo
method just does this, after sanity-checking the arguments:So I think as long as we don't
push
it afterwards, we won't create a new remote repo if one doesn't exist already.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this discussion is real proof we need to just put the code used here and not try to use some magic of the Mixin. The library always prefers an explicit approach at the cost of duplicate code, and I think this is an instance where we should just apply the code needed.
The Mixin might also change in the feature, following the RFC on huggingface_hub, so those private methods might actually introduce some behavior we don't want in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the
Repository(repo_path_or_name, clone_from=repo_url, use_auth_token=use_auth_token)
line actually creates the repo if you try to clone it but it doesn't exist:https://github.com/huggingface/huggingface_hub/blob/10c69146969ad7f9e1add075c1ef4ec15e42e85f/src/huggingface_hub/repository.py#L542-L549
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
push
event is unrelated to the creation of the repository. No push to a remote repository can be done if the repository does not exist. The repository creation happens at the repository creation, not at the repository update (push).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I'm sorry, I didn't realize that's how it worked! And yes, you're right, this is probably a sign that we need to be explicit. I'll try to figure out some code that just checks without accidentally creating a new repo.