Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support libuv pipes for custom socket on platforms with UDS #18435

Closed
wants to merge 1 commit into from

Conversation

jkryl
Copy link

@jkryl jkryl commented Mar 19, 2019

Unix domain socket cannot be used with libuv custom socket implementation, although that libuv supports named pipes which on unix-like platforms translate to unix domain sockets (UDS). It is not that difficult to extend current libuv tcp backend to handle UDS in addition to TCP sockets.

Although that libuv provides platform independent implementation of named pipes, grpc does not. So the code added in this pull request has to be ifdef'd in GRPC_HAVE_UNIX_SOCKET blocks, which are omitted when grpc is built on windows. On the other platforms the way how the type of a socket, which should be used, is determined is simple. If the host name begins with slash, it is treated as a path to UDS. In all other cases the original code path gets executed (implicit socket type is tcp).

I have tested the code using UDS client and server in nodejs (nodejs uses bindings to grpc C lib with libuv custom socket). The only OS where I have tested the changes is Linux.

@nicolasnoble
Copy link
Member

I'd rather see whole new tests instead of having half of the code defined out. We try hard not to have portions of our code be disabled using #ifdef, as it increases complexity and decreases readability. Most of the time when we have an #ifdef in the middle of the code, it's because we really had no other option. In the other cases, what we do is create a new file, and protect the whole file using an ifdef, so that it's basically all of the code or nothing.

See for instance this code:

https://github.com/grpc/grpc/blob/master/src/core/lib/gpr/tmpfile_windows.cc

uv_socket_t* uv_socket = (uv_socket_t*)socket->impl;
int err = uv_tcp_getsockname(uv_socket->handle,
(struct sockaddr*)IGNORE_CONST(addr), addr_len);
if (uv_socket->handle_type == HANDLE_TYPE_PIPE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe split this out to a helper function. You're already having a duplicate of this a few lines above, and the readability is seriously decreasing due to the added #ifdef in the middle.

Please consider defining two helper functions that are mutually exclusive with an #ifdef around them.

Copy link
Member

@murgatroid99 murgatroid99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, is it necessary to disable this code on Windows? The uv_pipe_t type exists across platforms. I expect that on Windows it will simply reject the UDS name and fail to construct the socket.

#define HANDLE_TYPE_TCP 0
#define HANDLE_TYPE_PIPE 1
#ifdef GRPC_HAVE_UNIX_SOCKET
#include <sys/un.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you using this header? From a quick look through the file I don't see anything that I would expect to be left out of the libuv headers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys/un.h holds the definition of struct sockaddr_un, which is not defined in libuv headers.

@@ -27,6 +27,7 @@
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you changing this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unistd.h declares unlink() which is used in this file.

@nicolasnoble
Copy link
Member

Overall, is it necessary to disable this code on Windows? The uv_pipe_t type exists across platforms. I expect that on Windows it will simply reject the UDS name and fail to construct the socket.

That's a valid point, and overall, there's a lot of complexity added with #ifdef that seem to simply try to disable the feature under Windows, whereas libuv should simply be able to accept these calls under Windows.

@jkryl
Copy link
Author

jkryl commented Mar 20, 2019

@nicolasnoble and @murgatroid99 thanks for looking at the changes! I understand the concern about scattered ifdef's throughout the files. I can try to minimise their number by putting all UDS specific code to a new file wrapped in ifdef as @nicolasnoble has suggested.

As for the second suggestion of removing all ifdefs entirely because uv_pipe_t works on all platforms - that's unlikely going to work. libuv is fine, but grpc is not. I cannot use socket_un structure and AF_UNIX constant on windows and grpc lib does not provide abstraction layer to hide differences between UDS and named pipes on windows (unlike libuv). Also I know nothing about windows and cannot even test such code. So I think that the most safe way is to simply disable this code on windows unless there is a volunteer who would like to help with this part.

If you are fine with putting UDS specific code to a new file specific to unix-like OS's and creating yet another file with mostly just stubs returning errors for windows, then let me know and I will try to refactor the code. thanks

@nicolasnoble
Copy link
Member

nicolasnoble commented Mar 20, 2019

The only moment you have to actually interact with the socket_un structure is the getpeername code, right?

@nicolasnoble
Copy link
Member

nicolasnoble commented Mar 20, 2019

Digging a bit, the getpeername API is horrendous. The only two locations of code where we actively use it is to transform it immediately into a string:

peer_name_string = grpc_sockaddr_to_uri(&peer_name);

and in

grpc_string_to_sockaddr(&c_addr, hostname, peer[1])

Maybe we should refactor this to always return a string instead? That would avoid exposing system-dependent structures into a system-agnostic API.

@jkryl
Copy link
Author

jkryl commented Mar 20, 2019

Indirect use of socket_un is also in uv_resolve_async() and uv_resolve(), but it's not apparent because we make use of grpc_resolve_unix_domain_address() which hides the manipulation with socket_un. On windows the function is just a stub returning "not supported" error. That's actually the approach I thought we would be taking with libuv UDS as well. Another use of socket_un is in uv_socket_bind() btw.

And AF_UNIX? I guess that neither that one is defined on windows. So we would need to eliminate use of that one too if we want to have exactly the same code on all platforms without ifdef's.

@nicolasnoble
Copy link
Member

AF_UNIX is supposed to be defined in the windows headers, interestingly. Even if it's not supported, it's defined.

@jkryl
Copy link
Author

jkryl commented Mar 21, 2019

I have updated the fix. I removed all ifdefs in tcp_uv code. I make use of existing unix_sockets_posix.c and unix_sockets_posix_noop.c files for platform dependent code related to UDS.

@nicolasnoble nicolasnoble added release notes: no Indicates if PR should not be in release notes kokoro:run labels Mar 21, 2019
@jkryl
Copy link
Author

jkryl commented Mar 26, 2019

@nicolasnoble : there was an issue with building grpc without libuv which I think that should be fixed now. Is there a way how to trigger the test run again? thanks

@adon-at-work
Copy link

This feature will be very useful for us. Please re-take a look at this PR again. Thanks :)

@jkryl
Copy link
Author

jkryl commented May 7, 2019

There were some compiler warnings (on some platforms) last time when I checked. I will try to fix those, when I have some time for that. btw the fact that I always need to update PR and bother someone from google to run the tests for me, makes the dev cycle really slow. It is pity that the internal CI system is not a bit more friendly to external contributers.

I have created a npm package with my changes, which is used by us as a temporary workaround until this fix is pushed and it works great so far. You may want to give it a try: https://www.npmjs.com/package/grpc-uds .

@nicolasnoble
Copy link
Member

It's unfortunately due to the security policy here where we're not allowed to run code written by an external contributor automatically, sorry...

return error;
*port = nullptr;
// all hosts beginning with / are treated as unix domain sockets
if (name[0] == '/') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for determining whether a name is a UDS name and for handling those names should be the same as in the other resolver implementation(s). See https://github.com/grpc/grpc/blob/master/src/core/lib/iomgr/resolve_address_posix.cc#L57.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. thanks 👍

@jkryl
Copy link
Author

jkryl commented May 15, 2019

I was trying to fix following error:

[CXX]     Compiling src/core/lib/iomgr/tcp_client_custom.cc
src/core/lib/iomgr/tcp_client_custom.cc: In function ‘void tcp_connect(grpc_closure*, grpc_endpoint**, grpc_pollset_set*, const grpc_channel_args*, const grpc_resolved_address*, grpc_millis)’:
src/core/lib/iomgr/tcp_client_custom.cc:137:52: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
       socket, ((grpc_sockaddr*)resolved_addr->addr)->sa_family);
                                                    ^
cc1plus: all warnings being treated as errors
make: *** [/tmpfs/src/github/grpc/objs/opt/src/core/lib/iomgr/tcp_client_custom.o] Error 1

and I can't reproduce it. I tried gcc-7.4 and gcc-8.3 with explicit -Wstrict-aliasing -Werror flags but this particular file compiles without any warnings or errors. Any ideas what I need to do? Maybe the CI/CD is using clang?

@adon-at-work
Copy link

@jkryl i was able to reproduce the error with the official Node docker image (image: node). Please try:

# docker-compose up

version: '3'
services:
  grpc-uds:
    image: node
    tty: true
    volumes:
      - ./:/home/node
    entrypoint:
      - /bin/bash
      - -c
      - "npm i && ..."

@jkryl
Copy link
Author

jkryl commented Jun 26, 2019

@murgatroid99 I rebased on top of latest master, so should be ok now. As for the test failures, most of them happened on iOS platform. I'm not sure how I can run these tests and verify them. Thanks for helping out with evaluating the test results!

@jkryl
Copy link
Author

jkryl commented Jul 3, 2019

I did another rebase because there was a new conflict.

@jkryl
Copy link
Author

jkryl commented Jul 8, 2019

@murgatroid99 can we run kokoro tests again and evaluate the failures before there is another merge conflict? Thanks

@jkryl
Copy link
Author

jkryl commented Jul 9, 2019

Thanks @jtattermusch for triggering the tests!

@murgatroid99 : It looks I was lucky this time. I got just one failure again on macos platform, test suite grpc_basictests_node, with this being written at the end of the log:

rsync warning: some files vanished before they could be transferred (code 24) at main.c(1677) [generator=3.1.3]
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
[06:24:38] Collecting build artifacts from build VM
Build script failed with exit code: 1

this looks again like a problem with rsync, but I don't want to be too eager when ignoring the error.

@jkryl
Copy link
Author

jkryl commented Jul 18, 2019

Hey guys, it looks we are not moving anywhere, so what's next?

My first question would be, is there anything fundamentally wrong with the PR why you are not willing to accept it? Or maybe the new version of native grpc for node which has no C++ bindings is going to solve this problem soon? Those could be valid reasons to reject the PR among others.

If the approach I have taken is ok and there are no comments (which seems to be the case), then I would appreciate your help with analysing the test case failure. in a sense if it can be ignored and if not what are the steps to reproduce it in my local setup or suggestions what I could do when trying to find the root cause.

thanks!

@linux-foundation-easycla
Copy link

CLA Check
One or more committers are not authorized under a signed CLA as indicated below. Please click here to be authorized.

@travisghansen
Copy link

UDS support is a must have for me..what's lacking here to get it over the hill? Anywhere we can help?

@jkryl
Copy link
Author

jkryl commented Nov 22, 2019

@travisghansen: I can rebase my PR on top of the latest master commit and ask them to run the tests again like I did a couple of times before. But without a real interest from maintainers of this project, it would be just a waste of time. So I guess that's the real blocker in this case.

@travisghansen
Copy link

@jkryl understood. I noticed you have a fork on npm, can you share example bind/connect strings/syntax?

@jkryl
Copy link
Author

jkryl commented Dec 1, 2019

Hi @travisghansen , I'm sorry for late reply. Basically you just replace the IP address or hostname by a path to the socket. taking an example from the project where we use it:

function createCsiClient(service) {
  const pkgDef = grpc.loadPackageDefinition(
    protoLoader.loadSync(
      path.join(__dirname, '..', 'csi', 'proto', 'csi.proto'),
      {
        // this is to load google/descriptor.proto
        includeDirs: ['./node_modules/protobufjs'],
        keepCase: true,
        longs: String,
        enums: String,
        defaults: true,
        oneofs: true,
      }
    )
  );
  const proto = pkgDef.csi.v1;
  return new proto[service]('/tmp/mayastor.sock', grpc.credentials.createInsecure());
}

And IIRC the same holds for the server. If you want to see the whole source code here it is https://github.com/openebs/MayaStor/blob/master/mayastor-test/test_csi.js.

@travisghansen
Copy link

Interesting, I'm doing csi as well and just met up with some of your co-workers in San Diego about collaborating on a couple things.

Thanks for the info, I did figure it out but think I went with the unix:// prefix. How do you clean up the socket file when process is dead? I couldn't seem to find much documentation about available events/etc

@jkryl
Copy link
Author

jkryl commented Dec 1, 2019

@travisghansen : nice! :-) We currently don't clean up the socket file although it would not be probably difficult to unlink the socket file in the process exit handler. This would work only if the process was shut down cleanly. Instead, we remove any stale socket file when the server is started which works in 100% cases (https://github.com/openebs/MayaStor/blob/master/csi/moac/csi.js#L150). btw you are more than welcomed to join our OpenEBS slack (openebs-community.slack.com, MayaStor channel) to discuss the implementation in greater depth if interested. cheers

@travisghansen
Copy link

Ah, I'm in sig-storage. Interesting that it's written in node (the server). My project is more of a framework and may make sense to review. Basically it's meant to be a generic infrastructure to write drivers/plugins with so most of the boiler plate stuff is done.

Where we're looking to collaborate mostly is on the node (csi) side of the equation to create uniform driver that can handle all the iscsi, mount, whatever garbage without everyone rewriting all that all the time. That's done and it should be capable of deploying with params being injected to dynamically set whatever capabilities you want to broadcast and that sort of thing.

I've got libs for zfs (both local and over ssh), ssh clients, http clients, and ansible all imported. So you can leverage those to handle whatever is needed.

I initially wrote it for connecting to FreeNAS (which is done) but shortly intend to write a plugin that will make it work with any zfs based generic server (ie: ubuntu with zol installed, or pure freebsd) and share over (nfs/iscsi/samba/whatever). The logic for zfs is all split from the FreeNAS specific bits so that'll be trivial. After that I want to do an 'aggregate' driver which would work with any combination of plugins in the project. After that I'll be looking at a plugin which handles ephemeral/inline volumes with zfs.

I've also put together a helm chart which should be able to handle pretty much any deployment scenario.

@faern
Copy link

faern commented Jan 17, 2020

I could really use UDS + named pipes support. So +1 on this PR from me!

Ultimately I don't think it scales to add one transport at a time really. To be truly flexible a library must be opened up to be used for anything. The tonic library in Rust does a great job of that. It accepts any transport object that implements a given read+write interface. I'm currently using it to try to talk gRPC over a virtual serial port (never assume how your users will use something :D )

@lightsofapollo
Copy link

This would be very useful in some of our test cases where we're using grpc more like an IPC bridge than anything else.

@stale
Copy link

stale bot commented May 6, 2020

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 day. Thank you for your contributions!

@tfar
Copy link

tfar commented Oct 29, 2020

@jkryl What's the state of your PR? I wonder why it was not merged but just went stale?

@jkryl
Copy link
Author

jkryl commented Oct 29, 2020

It was decided to fix this in new JS-native grpc library and leave the old one based on C++ library as it is. So that's the reason why PR has never been merged. If you use the native JS library, beware that there is an interoperability issue between go grpc lib and JS native lib. go grpc client lib was violating http2 standard (authority header) when using UDS as a transport protocol. That has been already fixed in go grpc lib.

@ArtGangsta
Copy link

AF_UNIX is supposed to be defined in the windows headers, interestingly. Even if it's not supported, it's defined.

It's supported under Windows: https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition/stale kokoro:run release notes: no Indicates if PR should not be in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet