Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Asynchroneous network API #5366

Closed
wants to merge 31 commits into from
Closed

Conversation

thiloschulz
Copy link

I have begun work on extending the API to support non-blocking network operations. So far, this is only a proof of concept to demonstrate that making libgit2 able to perform network operations asynchroneously is indeed feasible.

To enable async I/O, one only needs to set the set_fd_events member in the git_remote_callbacks structure. In this case, functions that wait for network activity return with the newly introduced error code GIT_EAGAIN
The set_fd_events() callback is used to signal to the application which type of events to wait for, and the application may call git_remote_perform() to continue processing with the new events that arrived.

If the set_fd_events is set to NULL (default), you get the old synchroneous behaviour.

For now, all asynchroneous networking functions are stubs only. Please take a look at git_remote_connect(), which demonstrates how the synchroneous API is wrapped around the async API.
To breathe life into the async API, the backends need considerable structural change to be made completely asynchroneous.

I have added a few members to the internal git_remote structure - some may be unnecessary, as they're encoded inside their respective transport structure as well. This is just work in progress, so they may just as well be removed again if it shows we don't need them.
Also, adding new callback function set_fd_events breaks ABI compatibility, so I bumped the version number of the .so to 29

…roneous I/O. Replaces previously internal GIT_SOCKET type
- Introduce GIT_ECANCEL for operations that were cancelled by the application, e.g. from callbacks
- Replace GIT_EUSER with GIT_ECANCEL in smart transport protocol. As per documentation, GIT_EUSER is never generated from inside libgit2 - as such we may not use that error code.
- Add GIT_EMIN to define the minimum error code ever used by libgit2
… legacy software, however, the ABI will not.

- Bump library version number as ABI is incompatible
…Most of these are still stubs and do nothing. Note how API should stay compatible.
@ethomson
Copy link
Member

Thanks for opening this PR, I'm looking forward to the discussion. I'm going to make a first pass to provide some insight into our style and provide some of the thinking behind the way we work in libgit2 that are not necessarily obvious to new contributors. These are things that we should document better in a new contributor guide. There's no need to respond to these immediately; I think that there's more value in discussing the architecture and the overall direction that we want to take.

But I need to meditate more upon this before I have feedback, so I wanted to make some notes on the stylistic stuff first, while I think about the other aspects.

@@ -31,6 +31,8 @@ typedef enum {
* GIT_EUSER is a special error that is never generated by libgit2
* code. You can return it from a callback (e.g to stop an iteration)
* to know that it was generated by the callback and not by libgit2.
*
* You may also use error codes below GIT_EMIN for the same purpose.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally don't suggest that end-users use our exit codes from callbacks. Obviously, there's nothing stopping them, but our general rule is that a user callback should return GIT_EUSER to cancel the function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libgit2 only uses a very small amount of error codes - I think it might be beneficial for applications to be able to return other codes than just GIT_EUSER. After all, a callback may have many different reasons for triggering an abort. It is only a suggestion. I'm not particularly insistent on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They definitely can but we expect them to store that state themselves. We don’t make any guarantees that - if a user returned a -3 that we wouldn’t try to do something based on that result, thinking that we had returned it down in the stack. This isn’t something that I expect that we do anywhere but the only guarantees we make are around GIT_EUSER

@@ -58,6 +60,14 @@ typedef enum {
GIT_EMISMATCH = -33, /**< Hashsum mismatch in object */
GIT_EINDEXDIRTY = -34, /**< Unsaved changes in the index would be overwritten */
GIT_EAPPLYFAIL = -35, /**< Patch application failed */
GIT_EAGAIN = -36, /**< Operation would block */
GIT_ECANCEL = -37, /**< Operation was cancelled by application */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is GIT_ECANCEL different from GIT_EUSER?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for GIT_EUSER says:

GIT_EUSER is a special error that is never generated by libgit2
code. You can return it from a callback (e.g to stop an iteration)
to know that it was generated by the callback and not by libgit2.

However, if you look at mainline src/transports/smart_protocol.c, you can see that GIT_EUSER is in fact generated by libgit2 after a call to git_remote_stop().
This contradicts documented behaviour. Semantically, yes, the operation is cancelled by the user and in this way returning GIT_EUSER may make some sense, but the user never returned this error code from any callback so this behaviour is confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we should clarify this in the documentation.

* Add new error codes above this comment and update GIT_EMIN
*/

GIT_EMIN = GIT_ECANCEL /**< Minimum error code ever generated by libgit2 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this useful?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sure if the user does want to use other return codes than just GIT_EUSER, that these don't collide with libgit2 internal ones. However, as I said, I'm not particularly insistent on this.

include/git2/remote.h Outdated Show resolved Hide resolved
include/git2/remote.h Outdated Show resolved Hide resolved
include/git2/remote.h Outdated Show resolved Hide resolved
@@ -7,12 +7,12 @@
#ifndef INCLUDE_git_version_h__
#define INCLUDE_git_version_h__

#define LIBGIT2_VERSION "0.28.0"
#define LIBGIT2_VERSION "0.29.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to update this yet. We update the soversion on every release that has breaking ABI changes which, realistically, is every release. Practically speaking, this will be in conflict with our next release soon.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a new field to git_remote_callbacks which is a member of git_push_options and git_fetch_options.
From the perspective of applications compiled against the old API, member "prune"
and all subsequent ones in git_fetch_options have an offset of 8 bytes on 64-bit systems.
I am pretty sure this breaks the ABI and would make a version bump necessary for the soversion.

If you kindly would consider reserving space for this one callback in this place for your newer releases, we might not need to bump version another time if we ever get this so far that this can be merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely breaks the ABI but the rhythm is to update that when we release a new version. We’ll definitely bump the soversion at that time.

@ethomson
Copy link
Member

One thing that might be useful is an example of what the caller will see with this API. If I'm reading this correctly, it looks to me like the caller will invoke connect, then get back a file descriptor, and call select / epoll / kqueue / etc on that, and then call git_remote_perform back with the results of that poll?

Is the reason that you are putting the onus on the end-user here because you want to do a select on our file descriptors and your own?

@thiloschulz
Copy link
Author

thiloschulz commented Jan 20, 2020

Thank you so much for reviewing this so quickly. I realize that this is work on your part as well, and this is not to be taken for granted.

One thing that might be useful is an example of what the caller will see with this API. If I'm reading this correctly, it looks to me like the caller will invoke connect, then get back a file descriptor, and call select / epoll / kqueue / etc on that, and then call git_remote_perform back with the results of that poll?

Yes, you have read this exactly right. You can see the perform_all() function in remote.c, which would be used by libgit2's blocking API internally. A user would have to do this himself in the non-blocking case. So you can look at perform_all() already as an example of what the user would have to do. Just wait for the events libgit2 tells the user to wait for, and then call git_remote_perform() as long as it returns GIT_EAGAIN.
After that, the operation, be it connect, push or pull, is finished.

I have only shown how the git_remote_connect() call can be made to exhibit non-blocking features. Please undestand that I hacked this together yesterday over the course of a few hours and was meant as a first example. What is missing from this version so far, is converting git_remote_fetch() / _push(), and all other blocking API calls in a similar manner as I have demonstrated for git_remote_connect().
Thus, if the user never sets the file descriptor callback in git_remote_callbacks which is the default, nothing ever changes for programmers using of libgit2, and these calls block for activity just as they've always done.

Is the reason that you are putting the onus on the end-user here because you want to do a select on our file descriptors and your own?

Yes, this is exactly correct. Only a non-blocking API enables the end-user to block for other events than just the ones that libgit2 is interested in.
As I have said: With things in libgit2 as they are, I see no way to cleanly abort a networking operation that happens to be stuck because a connection momentarily has 100% packet loss. This may happen at any point of a network operation. Say, you're blocking to receive 100 bytes and after getting 40 bytes the connection freezes. This freezes your whole application until the operating system's ip stack mercifully returns an error code. These timeouts can be quite long and may range in minutes.
In my case, I have a connection window as a modal dialog with a "Cancel" button. Yes, libgit2 supports callbacks at various points during a remote push or pull operation. But if libgit2 happens to be stuck in a system call like read() or write() my users are out of luck and (on windows) must kill their GUI via Ctrl+Alt+Del; instead of having a working cancel button they can click whenever the hell they like. In 2020, freezing during network i/o may be acceptable for command line tools that can be killed via CTRL+C, i.e. SIGINT, but certainly not for GUI applications.
We can work around this by maybe spawning another process we can kill at any time and communicating via pipes or whatever. But this solution is highly operating system dependent, and we're placing the onus on the user of libgit2 to work around something that libgit2 could do better.

@ethomson
Copy link
Member

Yes, this is exactly correct. Only a non-blocking API enables the end-user to block for other events than just the ones that libgit2 is interested in

Sure, I understand that - my question is why you’re exposing file descriptors to the end user and making them call poll. That’s not the sort of API that we usually build - it feels like it’s leaking the abstractions from below - and so I’m trying to intuit whether you built this because you already had a select loop that you were extending or because it felt natural, etc.

The reason I ask is because my thinking is that there are probably other APIs besides network I/O that could benefit from a polling style api - I could imagine packbuilding could be asynchronous for example. And it might be nice to move to something more abstract than file descriptors.

@thiloschulz
Copy link
Author

thiloschulz commented Jan 21, 2020

That’s not the sort of API that we usually build - it feels like it’s leaking the abstractions from below - and so I’m trying to intuit whether you built this because you already had a select loop that you were extending or because it felt natural, etc.

Well, first of all, file descriptors are only exposed if the end-user explicitly requests non-blocking i/o.
And in this case, exposing file descriptors is actually standard for all libraries out there that do networking. Libcurl does it, so does libssh, libssh2 or any other library from the POSIX world.

See:
https://curl.haxx.se/libcurl/c/CURLMOPT_SOCKETFUNCTION.html
http://api.libssh.org/stable/group__libssh__poll.html#ga41d63ffe950a48e8b2c513877e0cd6b4
https://www.libssh2.org/libssh2_session_startup.html

my question is why you’re exposing file descriptors to the end user and making them call poll.

I'm not sure whether you are concerned about file descriptors themselves, or about the concept that a user needs to call select/poll himself.
To the latter: This is not about "libgit2 forces the end-user to use select()/poll()/WaitForMultipleObjects() et. al." but rather empowering the end-user to use it, if he so wishes. This is a conscious choice the end-user makes. If the user says "I want to do event handling myself, because I must process other events just than those from libgit2", at some point, he needs to call one of those syscalls to give back control to the operating system if he does not wish to busywait.
To the former: Thus, at some point, an end-user using the non-blocking API needs some kind of handle that the operating system understands what to do with when yielding control back to the task scheduler. And those are file descriptors in POSIX, and HANDLE on Windows. On Windows, it looks like you have a 1:1 mapping between the two:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/get-osfhandle?view=vs-2019
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/open-osfhandle?view=vs-2019

And it might be nice to move to something more abstract than file descriptors.

Yes, I have thought about this as well, as Windows' natural data type for its event subsystem is HANDLE, but then file descriptors are a universal concept that even Windows understands. So really it's a choice between forcing ALL users to break libgit2's abstraction down to file descriptors/HANDLEs again for the syscalls, as opposed to only making users on Windows call _get_osfhandle().

The reason I ask is because my thinking is that there are probably other APIs besides network I/O that could benefit from a polling style api - I could imagine packbuilding could be asynchronous for example.

This is a different thing than networking I/O. It does not depend on any outside events to occur but it is just about yielding control to the library using application.

…h() support asynchroneous networking operations

- Add GIT_EBUSY error code that is returned if git_remote_fetch() or _upload() etc is called on a remote where an operation is already in progress
- Remove git_remote_connection_opts structure and keep a copy of its members directly in git_remote structure
- Add a few missing NULL pointer assertions for git_remote parameters
…nt_cb and its callback reference all the time
Base automatically changed from master to main January 7, 2021 10:09
@ethomson ethomson dismissed a stale review via abe81a2 July 27, 2021 14:18
@ethomson
Copy link
Member

We should now support non blocking operations on sockets, and support timeouts on connect, read, and write, as of #6535 - closing this but please do let me know if there's more work that we should consider.

@ethomson ethomson closed this Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants