-
Notifications
You must be signed in to change notification settings - Fork 25.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add model checkpointing to push_to_hub and PushToHubCallback #14492
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this to the callback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few last comments!
"Checkpoint loading failed as no optimizer is attached to the model. " | ||
"This is most likely caused by the model not being compiled." | ||
) | ||
repo = self._create_or_get_repo(repo_path_or_name, organization) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we said to use Repository
here to avoid creating a new repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this, but it resulted in some code duplication in order to make everything work - I kept it as-is but added a comment to explain instead. If you'd still prefer to avoid the call, let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see what @LysandreJik thinks then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure it won't actually create a remote repository if it doesn't exist? If you're certain of it it's fine for me to go with self._create_or_get_repo
but I'm quite unsure that it won't create a remote repo.
Also which method are you using, is it this method here?
transformers/src/transformers/file_utils.py
Lines 2338 to 2367 in 956a483
def _create_or_get_repo( | |
cls, | |
repo_path_or_name: Optional[str] = None, | |
repo_url: Optional[str] = None, | |
organization: Optional[str] = None, | |
private: bool = None, | |
use_auth_token: Optional[Union[bool, str]] = None, | |
) -> Repository: | |
if repo_path_or_name is None and repo_url is None: | |
raise ValueError("You need to specify a `repo_path_or_name` or a `repo_url`.") | |
if use_auth_token is None and repo_url is None: | |
use_auth_token = True | |
if repo_path_or_name is None: | |
repo_path_or_name = repo_url.split("/")[-1] | |
if repo_url is None and not os.path.exists(repo_path_or_name): | |
repo_name = Path(repo_path_or_name).name | |
repo_url = cls._get_repo_url_from_name( | |
repo_name, organization=organization, private=private, use_auth_token=use_auth_token | |
) | |
# Create a working directory if it does not exist. | |
if not os.path.exists(repo_path_or_name): | |
os.makedirs(repo_path_or_name) | |
repo = Repository(repo_path_or_name, clone_from=repo_url, use_auth_token=use_auth_token) | |
repo.git_pull() | |
return repo |
Because if so, don't call it with positional arguments like you do here as you're passing the organization
parameter to repo_url
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot on the organization
positional arg, fixed!
Also, the _create_or_get_repo
method just does this, after sanity-checking the arguments:
repo = Repository(repo_path_or_name, clone_from=repo_url, use_auth_token=use_auth_token)
repo.git_pull()
return repo
So I think as long as we don't push
it afterwards, we won't create a new remote repo if one doesn't exist already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this discussion is real proof we need to just put the code used here and not try to use some magic of the Mixin. The library always prefers an explicit approach at the cost of duplicate code, and I think this is an instance where we should just apply the code needed.
The Mixin might also change in the feature, following the RFC on huggingface_hub, so those private methods might actually introduce some behavior we don't want in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the Repository(repo_path_or_name, clone_from=repo_url, use_auth_token=use_auth_token)
line actually creates the repo if you try to clone it but it doesn't exist:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The push
event is unrelated to the creation of the repository. No push to a remote repository can be done if the repository does not exist. The repository creation happens at the repository creation, not at the repository update (push).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I'm sorry, I didn't realize that's how it worked! And yes, you're right, this is probably a sign that we need to be explicit. I'll try to figure out some code that just checks without accidentally creating a new repo.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adjusting!
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…face#14492) * Add checkpointing to push_to_hub and PushToHubCallback * Add checkpoint loading * Add missing default value * Correct method name * make style * Moving everything to the right location * make style * Revert changes to file_utils.py * Update src/transformers/keras_callbacks.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/keras_callbacks.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Adding docstrings and comments to clarify code * make style * Fix organization positional arg * Fix load_repo_checkpoint to no longer accidentally create empty repos * make style * Remove unnecessary 'organization' argument in load_repo_checkpoint * Avoid private `_create_or_get_repo` method * make style * Update src/transformers/modeling_tf_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
No description provided.