Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non blocking git push #315

Merged
merged 4 commits into from Sep 13, 2021
Merged

Non blocking git push #315

merged 4 commits into from Sep 13, 2021

Conversation

LysandreJik
Copy link
Member

@LysandreJik LysandreJik commented Sep 6, 2021

This PR introduces a non-blocking git push as asked in #169 and proposed by @cccntu in #171.

It is currently painful to have to wait until a git push is over before continuing to something else. Since this uses nothing else than bandwidth, it is fine for it to happen behind the scenes.

This implementation:

  • Implements a blocking parameter to git_push and the commit context manager; these variables default to True.
  • In case of blocking=False, outputs a variable that can be monitored to follow the status of the git push command
  • In case of blocking=False, outputs a variable that can be monitored to ensure the git push command has terminated
  • These variables are added to the instance variable command_queue to be managed. More information about this can be found below.
  • This implementation leverages the atexit official lib to ensure that all processes have completed before exiting the Python runtime.
  • The progress bars are not managed when using a blocking=False parameter.
  • Implements a warning when several commits are about to be pushed. The progress bars may be unreliable as a single progress bar may monitor the upload of several versions of a single file, therefore resulting in a jump between the two uploads on the progress bars.

It is recommended to monitor the progress of the current push before attempting a second one; however, several pushes may be launched concurrently. See below for the expected behaviors:

What happens when two git pushes are done simultaneously?

If two identical git pushes are done, then no error in the workflow will happen. The remote correctly registers the updated version.

Two times more bandwidth is consumed as both pushes will upload all files.

What happens when two git pushes are done simultaneously, with the second git push having one or several additional commits?

All commits are correctly registered on the remote. More bandwidth is consumed as both pushes will upload all files.

What happens if one of the pushes errors out?

Given that the pushes are independent of each other as the remote isn't updated until a push is complete, the workflow should complete.


The command_queue

This implementation implements a command_queue so that users may be aware of the current status of all commands launched.

It can be accessed as follows:

# Access to command queue
command_queue = repo.command_queue

# Accessing a single command
last_command = repo.command_queue[-1]

# Vizualizing the command
print(last_command)
# [push command, status code: running, in progress. PID: 299167]

# Accessing the command's properties
last_command.status
last_command.is_done
last_command.title
last_command.stderr
last_command.stdout

# Accessing the commands that are currently in progress
commands_in_progress = repo.commands_in_progress

# Accessing the commands that failed
commands_failed = repo.commands_failed

Copy link
Contributor

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks a lot for working on this!

src/huggingface_hub/repository.py Show resolved Hide resolved
src/huggingface_hub/repository.py Show resolved Hide resolved
Copy link
Member

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for the PR!

I mostly have some questions for my own learning/understanding :)

src/huggingface_hub/README.md Outdated Show resolved Hide resolved
src/huggingface_hub/README.md Show resolved Hide resolved
@@ -344,6 +404,10 @@ def __init__(
if revision is not None:
self.git_checkout(revision, create_branch_ok=True)

# This ensures that all commands exit before exiting the Python runtime.
# This will ensure all pushes register on the hub, even if other errors happen in subsequent operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if user tries to kill the process (for example ctrl + d) while things are getting pushed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user tries to kill the process, it will try to continue pushing:

>>> repo.command_queue
[[push command, status code: running, in progress. PID: 95581]]
>>> (CTRL+C)
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 95581]].

If you try to exit again:

^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/lysandre/Workspaces/Python/huggingface_hub/src/huggingface_hub/repository.py", line 1141, in wait_for_commands
    time.sleep(1)
KeyboardInterrupt

and then you're out!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's perfect!

def clone_from(self, repo_url: str, use_auth_token: Union[bool, str, None] = None):
def clone_from(
self, repo_url: str, use_auth_token: Union[bool, str, None] = None
) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but where/why are we returning a bool here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice catch, that's a mistake from my side

Comment on lines 277 to 281
try:
url, result = repo.git_push(blocking=False)
except subprocess.CalledProcessError as exc:
print(exc.stderr)
raise exc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we having a try/except here? This seems off for a test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I suspect it was in an old test which got copy-pasted around. Fixed that in the last commit.

Add test & typings
Implement status code visualization
context manager has access to commands

Better warning
Traditional push should also contribute to queue
Add push_to-hub
Docs and tests
Update src/huggingface_hub/README.md

Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.github.com>
Apply Omar's review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants