[WIP] Asynchroneous network API #5366

thiloschulz · 2020-01-19T23:25:51Z

I have begun work on extending the API to support non-blocking network operations. So far, this is only a proof of concept to demonstrate that making libgit2 able to perform network operations asynchronously is indeed feasible.

To enable async I/O, one only needs to set the set_fd_events member in the git_remote_callbacks structure. In this case, functions that wait for network activity return with the newly introduced error code GIT_EAGAIN.
The set_fd_events() callback signals to the application which type of events to wait for, and the application then calls git_remote_perform() to continue processing once those events have arrived.

If set_fd_events is set to NULL (the default), you get the old synchronous behaviour.

For now, all asynchronous networking functions are stubs only. Please take a look at git_remote_connect(), which demonstrates how the synchronous API is wrapped around the async API.
To breathe life into the async API, the backends need considerable structural changes before they can be made completely asynchronous.
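The wrapping described above can be sketched in isolation. What follows is a minimal, hypothetical model and not code from this PR: `sketch_op`, `sketch_start()`, `sketch_perform()`, `sketch_perform_all()` and `sketch_note_events()` are invented stand-ins, the "operation" merely consumes bytes from a descriptor, and only the value -36 is taken from the proposed GIT_EAGAIN. It assumes a POSIX system with poll().

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* All names are hypothetical stand-ins, not libgit2 API. */
#define SKETCH_OK      0
#define SKETCH_ERROR  (-1)
#define SKETCH_EAGAIN (-36) /* mirrors the GIT_EAGAIN value proposed in this PR */

/* Callback through which the library announces what to wait for. */
typedef void (*sketch_fd_events_cb)(int fd, short events, void *payload);

typedef struct {
	int fd;                            /* descriptor the operation reads from */
	size_t remaining;                  /* bytes still needed to finish */
	sketch_fd_events_cb set_fd_events; /* NULL selects blocking mode */
	void *payload;
} sketch_op;

/* Example callback: remember the requested events in a `short`. */
static void sketch_note_events(int fd, short events, void *payload)
{
	(void)fd;
	*(short *)payload = events;
}

/* One step of progress; the caller must have waited for readability. */
static int sketch_perform(sketch_op *op)
{
	char byte;

	assert(op != NULL && op->remaining > 0);
	if (read(op->fd, &byte, 1) != 1)
		return SKETCH_ERROR;
	if (--op->remaining == 0)
		return SKETCH_OK;
	if (op->set_fd_events)
		op->set_fd_events(op->fd, POLLIN, op->payload);
	return SKETCH_EAGAIN;
}

/* Blocking wrapper: wait, perform, repeat until no longer EAGAIN. */
static int sketch_perform_all(sketch_op *op)
{
	struct pollfd pfd;
	int rc;

	pfd.fd = op->fd;
	pfd.events = POLLIN;
	do {
		pfd.revents = 0;
		if (poll(&pfd, 1, -1) < 0)
			return SKETCH_ERROR;
		rc = sketch_perform(op);
	} while (rc == SKETCH_EAGAIN);
	return rc;
}

/* Entry point: blocking without a callback, EAGAIN with one. */
static int sketch_start(sketch_op *op)
{
	if (op->set_fd_events == NULL)
		return sketch_perform_all(op);
	op->set_fd_events(op->fd, POLLIN, op->payload);
	return SKETCH_EAGAIN;
}
```

With a NULL callback, `sketch_start()` only returns once the operation is finished. With a callback set, it returns SKETCH_EAGAIN immediately and the application drives `sketch_perform()` itself after each wait.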

I have added a few members to the internal git_remote structure - some may be unnecessary, as they're encoded inside their respective transport structure as well. This is just work in progress, so they may just as well be removed again if it turns out we don't need them.
Also, adding the new callback function set_fd_events breaks ABI compatibility, so I bumped the version number of the .so to 29.

- Introduce GIT_ECANCEL for operations that were cancelled by the application, e.g. from callbacks
- Replace GIT_EUSER with GIT_ECANCEL in smart transport protocol. As per documentation, GIT_EUSER is never generated from inside libgit2 - as such we may not use that error code.
- Add GIT_EMIN to define the minimum error code ever used by libgit2

ethomson · 2020-01-20T11:21:51Z

Thanks for opening this PR, I'm looking forward to the discussion. I'm going to make a first pass to provide some insight into our style and some of the thinking behind the way we work in libgit2 that is not necessarily obvious to new contributors. These are things that we should document better in a new contributor guide. There's no need to respond to these immediately; I think that there's more value in discussing the architecture and the overall direction that we want to take.

But I need to meditate more upon this before I have feedback, so I wanted to make some notes on the stylistic stuff first, while I think about the other aspects.

ethomson · 2020-01-20T11:24:19Z

include/git2/errors.h

@@ -31,6 +31,8 @@ typedef enum {
 	 * GIT_EUSER is a special error that is never generated by libgit2
 	 * code.  You can return it from a callback (e.g to stop an iteration)
 	 * to know that it was generated by the callback and not by libgit2.
+	 *
+	 * You may also use error codes below GIT_EMIN for the same purpose.


We generally don't suggest that end-users use our exit codes from callbacks. Obviously, there's nothing stopping them, but our general rule is that a user callback should return GIT_EUSER to cancel the function.

thiloschulz · 2020-01-20

libgit2 only uses a very small number of error codes - I think it might be beneficial for applications to be able to return codes other than just GIT_EUSER. After all, a callback may have many different reasons for triggering an abort. It is only a suggestion. I'm not particularly insistent on this.

ethomson · 2020-01-21

They definitely can, but we expect them to store that state themselves. We don't make any guarantee that, if a user returned -3, we wouldn't try to do something based on that result, thinking that we had returned it ourselves further down the stack. This isn't something that I expect we do anywhere, but the only guarantees we make are around GIT_EUSER.
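A minimal sketch of the pattern suggested in this thread, under stated assumptions: the callback always returns the one reserved code and records its actual reason in the payload. `sketch_foreach()`, `walk_state` and `stop_reason` are hypothetical and only model a callback-driven iteration; SKETCH_EUSER uses the same value as GIT_EUSER (-7).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EUSER (-7) /* same value as libgit2's GIT_EUSER */

/* Application-defined reasons for stopping (hypothetical). */
typedef enum { STOP_NONE, STOP_LIMIT_REACHED, STOP_USER_CLICKED } stop_reason;

/* State the application keeps for itself and passes as payload. */
typedef struct {
	int seen;
	int limit;
	stop_reason reason;
} walk_state;

typedef int (*sketch_item_cb)(int item, void *payload);

/* Hypothetical iterator: stops at the first non-zero callback result
 * and hands that result back to the caller unchanged. */
static int sketch_foreach(const int *items, size_t n,
	sketch_item_cb cb, void *payload)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = cb(items[i], payload);
		if (rc != 0)
			return rc;
	}
	return 0;
}

/* Callback: the return value only says "the callback stopped this";
 * the detailed reason lives in the payload. */
static int stop_after_limit(int item, void *payload)
{
	walk_state *st = payload;

	(void)item;
	if (st->seen == st->limit) {
		st->reason = STOP_LIMIT_REACHED;
		return SKETCH_EUSER;
	}
	st->seen++;
	return 0;
}
```

After the call returns SKETCH_EUSER, the application inspects `walk_state.reason` instead of trying to encode the reason in the return value.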

ethomson · 2020-01-20T11:24:44Z

include/git2/errors.h

@@ -58,6 +60,14 @@ typedef enum {
 	GIT_EMISMATCH       = -33,	/**< Hashsum mismatch in object */
 	GIT_EINDEXDIRTY     = -34,	/**< Unsaved changes in the index would be overwritten */
 	GIT_EAPPLYFAIL      = -35,	/**< Patch application failed */
+	GIT_EAGAIN          = -36,	/**< Operation would block */
+	GIT_ECANCEL         = -37,	/**< Operation was cancelled by application */


How is GIT_ECANCEL different from GIT_EUSER?

thiloschulz · 2020-01-20

The documentation for GIT_EUSER says:

GIT_EUSER is a special error that is never generated by libgit2
code. You can return it from a callback (e.g to stop an iteration)
to know that it was generated by the callback and not by libgit2.

However, if you look at mainline src/transports/smart_protocol.c, you can see that GIT_EUSER is in fact generated by libgit2 after a call to git_remote_stop().
This contradicts the documented behaviour. Semantically, yes, the operation is cancelled by the user, and in that sense returning GIT_EUSER makes some sense, but the user never returned this error code from any callback, so the behaviour is confusing.

ethomson · 2020-01-21

Indeed, we should clarify this in the documentation.

ethomson · 2020-01-20T11:24:54Z

include/git2/errors.h

+         * Add new error codes above this comment and update GIT_EMIN
+         */
+
+	GIT_EMIN            = GIT_ECANCEL /**< Minimum error code ever generated by libgit2 */


Why is this useful?

thiloschulz · 2020-01-20

This makes sure that, if the user does want to use return codes other than just GIT_EUSER, they don't collide with libgit2's internal ones. However, as I said, I'm not particularly insistent on this.
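The idea behind the sentinel can be sketched as follows. This is an illustration of the proposal only; as noted earlier in the review, libgit2 makes guarantees solely around GIT_EUSER. The enum is a cut-down hypothetical copy whose -36 and -37 values mirror the diff above, and `sketch_is_app_code()` is an invented helper.

```c
#include <assert.h>

/* Cut-down, hypothetical copy of the error enum from the diff. */
typedef enum {
	SKETCH_OK      = 0,
	SKETCH_EUSER   = -7,
	SKETCH_EAGAIN  = -36,
	SKETCH_ECANCEL = -37,
	/* Add new codes above this comment and update SKETCH_EMIN. */
	SKETCH_EMIN    = SKETCH_ECANCEL
} sketch_error_code;

/* Non-zero if `code` lies below every code the library defines,
 * i.e. it cannot collide with a library-generated value. */
static int sketch_is_app_code(int code)
{
	return code < SKETCH_EMIN;
}
```

An application following the proposal would pick its private codes below SKETCH_EMIN, for example -1000.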

include/git2/remote.h

ethomson · 2020-01-20T11:32:25Z

include/git2/version.h

@@ -7,12 +7,12 @@
 #ifndef INCLUDE_git_version_h__
 #define INCLUDE_git_version_h__

-#define LIBGIT2_VERSION "0.28.0"
+#define LIBGIT2_VERSION "0.29.0"


There's no need to update this yet. We update the soversion on every release that has breaking ABI changes which, realistically, is every release. Practically speaking, this will be in conflict with our next release soon.

thiloschulz · 2020-01-20

I have added a new field to git_remote_callbacks, which is a member of git_push_options and git_fetch_options.
From the perspective of applications compiled against the old API, the member "prune" and all subsequent ones in git_fetch_options are shifted by 8 bytes on 64-bit systems.
I am pretty sure this breaks the ABI and would make a version bump necessary for the soversion.

If you would kindly consider reserving space for this one callback at this position in your newer releases, we might not need to bump the version another time if this ever gets far enough to be merged.
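The offset shift can be reproduced with a small sketch. The struct layouts below are hypothetical and much smaller than the real git_remote_callbacks and git_fetch_options, and appending the new callback at the end of the callbacks struct is an assumption made for illustration; the point is only that growing an embedded struct moves every member that follows it.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_cb)(void *payload);

/* Callbacks struct as an older release might lay it out (hypothetical). */
typedef struct {
	unsigned int version;
	sketch_cb transfer_progress;
	void *payload;
} callbacks_v1;

/* The same struct with one more callback appended. */
typedef struct {
	unsigned int version;
	sketch_cb transfer_progress;
	void *payload;
	sketch_cb set_fd_events;
} callbacks_v2;

/* Options structs embedding the callbacks by value. */
typedef struct { int version; callbacks_v1 callbacks; int prune; } fetch_options_v1;
typedef struct { int version; callbacks_v2 callbacks; int prune; } fetch_options_v2;

/* How far `prune` moved between the two layouts, in bytes. */
static size_t sketch_prune_shift(void)
{
	return offsetof(fetch_options_v2, prune)
	     - offsetof(fetch_options_v1, prune);
}
```

On a typical 64-bit ABI the shift equals the size of one function pointer, so a binary compiled against the v1 layout would read and write `prune` at the wrong address inside a v2 library.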

ethomson · 2020-01-21

Definitely breaks the ABI, but the rhythm is to update that when we release a new version. We'll definitely bump the soversion at that time.

ethomson · 2020-01-20T11:37:59Z

One thing that might be useful is an example of what the caller will see with this API. If I'm reading this correctly, it looks to me like the caller will invoke connect, then get back a file descriptor, and call select / epoll / kqueue / etc on that, and then call git_remote_perform back with the results of that poll?

Is the reason that you are putting the onus on the end-user here because you want to do a select on our file descriptors and your own?

thiloschulz · 2020-01-20T18:55:19Z

Thank you so much for reviewing this so quickly. I realize that this is work on your part as well, and this is not to be taken for granted.

One thing that might be useful is an example of what the caller will see with this API. If I'm reading this correctly, it looks to me like the caller will invoke connect, then get back a file descriptor, and call select / epoll / kqueue / etc on that, and then call git_remote_perform back with the results of that poll?

Yes, you have read this exactly right. You can see the perform_all() function in remote.c, which would be used by libgit2's blocking API internally. A user would have to do this himself in the non-blocking case. So you can look at perform_all() already as an example of what the user would have to do. Just wait for the events libgit2 tells the user to wait for, and then call git_remote_perform() as long as it returns GIT_EAGAIN.
After that, the operation, be it connect, push or pull, is finished.

I have only shown how the git_remote_connect() call can be made to exhibit non-blocking features. Please understand that I hacked this together yesterday over the course of a few hours and that it was meant as a first example. What is still missing from this version is converting git_remote_fetch() / _push() and all other blocking API calls in the same manner as I have demonstrated for git_remote_connect().
Thus, if the user never sets the file descriptor callback in git_remote_callbacks, which is the default, nothing changes for programmers using libgit2, and these calls block for activity just as they've always done.

Is the reason that you are putting the onus on the end-user here because you want to do a select on our file descriptors and your own?

Yes, this is exactly correct. Only a non-blocking API enables the end-user to block for other events than just the ones that libgit2 is interested in.
As I have said: With things in libgit2 as they are, I see no way to cleanly abort a networking operation that happens to be stuck because a connection momentarily has 100% packet loss. This may happen at any point of a network operation. Say you're blocking to receive 100 bytes and after getting 40 bytes the connection freezes. This freezes your whole application until the operating system's IP stack mercifully returns an error code. These timeouts can be quite long and may run to several minutes.
In my case, I have a connection window as a modal dialog with a "Cancel" button. Yes, libgit2 supports callbacks at various points during a remote push or pull operation. But if libgit2 happens to be stuck in a system call like read() or write(), my users are out of luck and (on Windows) must kill their GUI via Ctrl+Alt+Del instead of having a working cancel button they can click whenever the hell they like. In 2020, freezing during network I/O may be acceptable for command line tools that can be killed via Ctrl+C, i.e. SIGINT, but certainly not for GUI applications.
We can work around this by maybe spawning another process we can kill at any time and communicating via pipes or whatever. But this solution is highly operating system dependent, and we're placing the onus on the user of libgit2 to work around something that libgit2 could do better.
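The cancel-button scenario can be sketched with plain POSIX calls, independent of libgit2. In this hypothetical example the GUI thread writes a byte to a pipe when "Cancel" is clicked, and the networking code waits on the network descriptor and the read end of that pipe together. `sketch_wait()` and `wait_result` are invented names; the network descriptor would be whichever one the library asked the application to watch.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h> /* pipe() and write() for the usage described below */

typedef enum {
	WAIT_NETWORK_READY,
	WAIT_CANCELLED,
	WAIT_TIMED_OUT,
	WAIT_FAILED
} wait_result;

/* Wait until the network descriptor is readable, the cancel
 * descriptor is readable, or the timeout (milliseconds) expires. */
static wait_result sketch_wait(int net_fd, int cancel_fd, int timeout_ms)
{
	struct pollfd fds[2];
	int n;

	fds[0].fd = net_fd;
	fds[0].events = POLLIN;
	fds[0].revents = 0;
	fds[1].fd = cancel_fd;
	fds[1].events = POLLIN;
	fds[1].revents = 0;

	n = poll(fds, 2, timeout_ms);
	if (n < 0)
		return WAIT_FAILED;
	if (n == 0)
		return WAIT_TIMED_OUT;

	/* A pending cancel request wins over pending network data. */
	if (fds[1].revents & POLLIN)
		return WAIT_CANCELLED;
	if (fds[0].revents & POLLIN)
		return WAIT_NETWORK_READY;
	return WAIT_FAILED; /* POLLERR, POLLHUP or POLLNVAL */
}
```

Only on WAIT_NETWORK_READY would the application hand control back to the library; on WAIT_CANCELLED it can abandon the operation without ever getting stuck inside read() or write().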

ethomson · 2020-01-21T00:40:50Z

Yes, this is exactly correct. Only a non-blocking API enables the end-user to block for other events than just the ones that libgit2 is interested in

Sure, I understand that - my question is why you’re exposing file descriptors to the end user and making them call poll. That’s not the sort of API that we usually build - it feels like it’s leaking the abstractions from below - and so I’m trying to intuit whether you built this because you already had a select loop that you were extending or because it felt natural, etc.

The reason I ask is because my thinking is that there are probably other APIs besides network I/O that could benefit from a polling style api - I could imagine packbuilding could be asynchronous for example. And it might be nice to move to something more abstract than file descriptors.

thiloschulz · 2020-01-21T08:04:11Z

That’s not the sort of API that we usually build - it feels like it’s leaking the abstractions from below - and so I’m trying to intuit whether you built this because you already had a select loop that you were extending or because it felt natural, etc.

Well, first of all, file descriptors are only exposed if the end-user explicitly requests non-blocking i/o.
And in this case, exposing file descriptors is actually standard for all libraries out there that do networking. Libcurl does it, so does libssh, libssh2 or any other library from the POSIX world.

See:
https://curl.haxx.se/libcurl/c/CURLMOPT_SOCKETFUNCTION.html
http://api.libssh.org/stable/group__libssh__poll.html#ga41d63ffe950a48e8b2c513877e0cd6b4
https://www.libssh2.org/libssh2_session_startup.html

my question is why you’re exposing file descriptors to the end user and making them call poll.

I'm not sure whether you are concerned about file descriptors themselves, or about the concept that a user needs to call select/poll himself.
To the latter: This is not about "libgit2 forces the end-user to use select()/poll()/WaitForMultipleObjects() et al." but rather about empowering the end-user to use them, if he so wishes. This is a conscious choice the end-user makes. If the user says "I want to do event handling myself, because I must process other events than just those from libgit2", at some point he needs to call one of those syscalls to give control back to the operating system if he does not wish to busy-wait.
To the former: Thus, at some point, an end-user using the non-blocking API needs some kind of handle that the operating system understands what to do with when yielding control back to the task scheduler. And those are file descriptors in POSIX, and HANDLE on Windows. On Windows, it looks like you have a 1:1 mapping between the two:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/get-osfhandle?view=vs-2019
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/open-osfhandle?view=vs-2019

And it might be nice to move to something more abstract than file descriptors.

Yes, I have thought about this as well, as Windows' natural data type for its event subsystem is HANDLE, but then file descriptors are a universal concept that even Windows understands. So really it's a choice between forcing ALL users to break libgit2's abstraction down to file descriptors/HANDLEs again for the syscalls, as opposed to only making users on Windows call _get_osfhandle().

The reason I ask is because my thinking is that there are probably other APIs besides network I/O that could benefit from a polling style api - I could imagine packbuilding could be asynchronous for example.

This is a different thing from network I/O. It does not depend on any outside events occurring; it is just about yielding control to the application using the library.
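The distinction can be illustrated with a sketch of a purely CPU-bound job that yields after a bounded amount of work. Nothing here is libgit2 code: `sketch_job` and `sketch_job_step()` are hypothetical, the "work" is just summing bytes, and SKETCH_EAGAIN reuses the value proposed for GIT_EAGAIN. No descriptor is involved because there is nothing external to wait for.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_OK      0
#define SKETCH_EAGAIN (-36) /* mirrors the GIT_EAGAIN value proposed in this PR */

/* A resumable computation: all progress is kept in the struct. */
typedef struct {
	const unsigned char *data;
	size_t len;
	size_t pos;        /* next byte to process */
	unsigned long sum; /* result accumulated so far */
} sketch_job;

/* Process at most `budget` bytes, then return to the caller. */
static int sketch_job_step(sketch_job *job, size_t budget)
{
	size_t done;

	assert(job != NULL && budget > 0);
	for (done = 0; done < budget && job->pos < job->len; done++)
		job->sum += job->data[job->pos++];

	return job->pos == job->len ? SKETCH_OK : SKETCH_EAGAIN;
}
```

The caller simply invokes `sketch_job_step()` again whenever it has time, for instance from an idle handler, until it returns SKETCH_OK.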

…h() support asynchroneous networking operations - Add GIT_EBUSY error code that is returned if git_remote_fetch() or _upload() etc is called on a remote where an operation is already in progress - Remove git_remote_connection_opts structure and keep a copy of its members directly in git_remote structure - Add a few missing NULL pointer assertions for git_remote parameters

ethomson · 2024-02-20T08:08:19Z

We should now support non-blocking operations on sockets, and support timeouts on connect, read, and write, as of #6535. Closing this, but please do let me know if there's more work that we should consider.
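For readers unfamiliar with the general technique, a read with a timeout can be sketched with poll(). This is a generic POSIX illustration and makes no claim about how #6535 implements timeouts; `sketch_read_timeout()` and SKETCH_ETIMEOUT are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_ETIMEOUT (-2) /* hypothetical "timed out" result */

/* Read up to `len` bytes, giving up after `timeout_ms` milliseconds
 * if nothing arrives. Returns the byte count, 0 on end of file,
 * -1 on error, or SKETCH_ETIMEOUT. */
static ssize_t sketch_read_timeout(int fd, void *buf, size_t len,
	int timeout_ms)
{
	struct pollfd pfd;
	int n;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	n = poll(&pfd, 1, timeout_ms);
	if (n < 0)
		return -1;
	if (n == 0)
		return SKETCH_ETIMEOUT;

	/* Data, end of file or an error is pending: read() will not block. */
	return read(fd, buf, len);
}
```

A caller that receives SKETCH_ETIMEOUT can report the stalled connection or retry, rather than staying blocked until the operating system gives up.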

thiloschulz added 6 commits January 19, 2020 19:19

- Introduce new API type git_socket in preparation for work on asynch…

17ff272

…roneous I/O. Replaces previously internal GIT_SOCKET type

- Start work on API and document changes. API will be compatible with…

58675af

… legacy software, however, the ABI will not. - Bump library version number as ABI is incompatible

Create skeleton for asynchroneous API and synchroneous API wrappers. …

3a7a0dd

…Most of these are still stubs and do nothing. Note how API should stay compatible.

Fix CI build errors

a24422f

More CI error fixes

1cd8c19

ethomson requested changes Jan 20, 2020

thiloschulz added 3 commits January 20, 2020 23:43

Fix a few more CI bugs, and a NULL pointer dereference

b9c1d94

Add winsock2.h include for MinGW as well

6c4718d

Return value of select() is an integer

f583c2d

thiloschulz added 15 commits February 1, 2020 22:33

Merge branch 'master' into async

5a96ccf

Merge branch 'master' of https://github.com/libgit2/libgit2 into async

b720c5c

Merge branch 'master' of https://github.com/libgit2/libgit2 into async

092eafc

Merge branch 'master' of https://github.com/libgit2/libgit2 into async

43db1d0

Merge remote-tracking branch 'libgit2' into async

6e59443

Fix missing parentheses in ARRAY_SIZE() macro

92c915a

Add support for asynchroneous connect (not supported by any transport…

f217631

…s yet)

Make git_remote_download() and git_remote_fetch() asynchroneous opera…

c9ff7fb

…tions

Make git_remote_upload() and git_remote_push() asynchroneous operations

b62d081

Fix return code not < 0 in clone online test

f249527

Add fd_events_cb to transport callbacks which will be used for config…

3a82619

…uring states for file descriptors

Add async I/O support to git_streams API (not implemented yet, though)

ed019c1

Pass pointer to remote structure instead of always passing the fd_eve…

04cb756

…nt_cb and its callback reference all the time

Merge branch 'master' of https://github.com/libgit2/libgit2 into async

660bc57

thiloschulz added 7 commits May 10, 2020 22:35

Set fd event callback to NULL after perform_all is finished

9c1ed26

Add code for setting sockets blocking/nonblocking

df86c09

Make git_stream connect() operation asynchroneous (nameserver lookup …

627d72b

…still uses synchroneous getaddrinfo(), however)

Merge branch 'master' of https://github.com/libgit2/libgit2 into async

3d42a77

Rename deprecated git_strarray_free() to git_strarray_dispose()

5982048

Make smart connect asynchronous and more work on async ssh connect

898a190

Fix double free()

abe81a2

Base automatically changed from master to main January 7, 2021 10:09

ethomson dismissed a stale review via abe81a2 July 27, 2021 14:18

ethomson closed this Feb 20, 2024
