Support AF_NETLINK address family #2980
Comments
Okay, it looks like this is triggered by the Go standard library call that gets the addresses of network interfaces. For the client and proxy, this happens before they bind to a UDP address for candidate creation. Peers will automatically try to bind to all available addresses, and apparently they do that by looking up the addresses assigned to each interface.
Would you be able to run the client or proxy under strace outside of Shadow and include the related syscalls? Probably starting with something like
This is probably something that would be good for Shadow to support, since other applications use it as well. I don't think Shadow would need any architectural changes or rewrites to support it, but it might take a fair amount of work depending on exactly which syscalls and which netlink messages are used. This is indirectly tied to #2900, which also requires netlink support.
I think we'd have to write our own parsers for the protocol, similar to the existing libc macros. Or maybe we could write C wrappers around the libc macros and just use those, since I think the libc macros write the wire format directly, so there shouldn't be any difference between the libc format and the kernel format.
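To make the "own parsers, similar to the libc macros" idea concrete, here is a minimal sketch of the netlink message framing in Python. It is not Shadow code. The helper names (`nlmsg_align`, `pack_nlmsg`, `parse_nlmsgs`) are hypothetical; the 16-byte header layout and 4-byte alignment follow `struct nlmsghdr` and `NLMSG_ALIGN` from `linux/netlink.h`, and native byte order is assumed because netlink uses host endianness.

```python
import struct

# struct nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
# nlmsg_seq (u32), nlmsg_pid (u32), in host byte order with no padding.
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_ALIGNTO = 4


def nlmsg_align(length):
    """Round a length up to the 4-byte netlink alignment (like NLMSG_ALIGN)."""
    return (length + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1)


def pack_nlmsg(msg_type, flags, seq, pid, payload=b""):
    """Serialize one message: header, payload, then zero padding to alignment."""
    length = NLMSG_HDR.size + len(payload)
    msg = NLMSG_HDR.pack(length, msg_type, flags, seq, pid) + payload
    return msg + b"\x00" * (nlmsg_align(length) - length)


def parse_nlmsgs(buf):
    """Split a receive buffer into (type, flags, seq, pid, payload) tuples."""
    msgs = []
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, flags, seq, pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            raise ValueError("truncated or malformed netlink message")
        payload = buf[offset + NLMSG_HDR.size:offset + length]
        msgs.append((msg_type, flags, seq, pid, payload))
        # The next header starts at the next aligned offset (like NLMSG_NEXT).
        offset += nlmsg_align(length)
    return msgs
```

The framing itself is small; most of the work mentioned above would be in the per-message-type payloads and attributes, which this sketch treats as opaque bytes.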
It would definitely be valuable to add netlink support, as it's come up repeatedly.
This looks like functionality implemented in libc by It might be worth trying to reproduce, and seeing if it works again with an older version of golang (or shadow, in case something broke there). |
Hmm, I get a full bootstrap without any calls to I'll do some more digging to see if I can figure out why this is different inside shadow. |
It might be helpful to enable strace-tracing in Shadow and take a look at that output.
Oh, my bad. I forgot to run |
I tracked down the "breaking" change to a change in the webrtc library network stack. Modifying snowflake's
diff --git a/go.mod b/go.mod
index 8f18d6f..2ef577f 100644
--- a/go.mod
+++ b/go.mod
@@ -5,10 +5,11 @@ go 1.15
require (
github.com/clarkduvall/hyperloglog v0.0.0-20171127014514-a0107a5d8004
github.com/gorilla/websocket v1.5.0
- github.com/pion/ice/v2 v2.3.1
+ github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
+ github.com/pion/ice/v2 v2.2.16
github.com/pion/sdp/v3 v3.0.6
github.com/pion/stun v0.4.0
- github.com/pion/webrtc/v3 v3.1.57
+ github.com/pion/webrtc/v3 v3.1.53
github.com/prometheus/client_golang v1.10.0
github.com/prometheus/client_model v0.2.0
github.com/refraction-networking/utls v1.0.0
This change comes with some settings for the library that might allow us to get around needing the
Thanks for tracking that down! Agreed that ultimately it's desirable to support netlink for this use case, but glad you found a workaround in the meantime.
Documenting the workaround here: https://gitlab.torproject.org/cohosh/snowflake/-/commit/ec14599c22fd7dd61ae2191c8b9001a61b7b0fe2
It was easier than I thought. It turns out that both the older working version of the library and the new versions were calling the same functions under the hood; it's just that the old version silently ignored an error when
From the strace:
(new
(bind to a random port ID)
(send a
(get the assigned port ID)
(read the Then a new
(send a Then another This cycle continues with more I may have missed some things since I don't know much about netlink, but this gives a general overview of what parts of the netlink api are being used. |
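The cycle summarized above (new socket, bind, send a request, look up the assigned port ID, read the reply) can be sketched roughly as follows. This is an illustration, not the traced program: the choice of an `RTM_GETADDR` dump request is an assumption, since the exact message types are not preserved in the summary, and the function names `build_dump_request` and `dump_addresses` are hypothetical. The numeric constants are the values from `linux/netlink.h` and `linux/rtnetlink.h`.

```python
import socket
import struct

NETLINK_ROUTE = 0
RTM_GETADDR = 22       # assumed request type, for illustration only
NLMSG_ERROR = 2
NLMSG_DONE = 3
NLM_F_REQUEST = 0x001
NLM_F_DUMP = 0x300     # NLM_F_ROOT | NLM_F_MATCH
NLMSG_HDR = struct.Struct("=IHHII")  # len, type, flags, seq, pid


def build_dump_request(msg_type, seq, family=socket.AF_UNSPEC):
    """Build a dump request: nlmsghdr followed by a 1-byte struct rtgenmsg."""
    payload = struct.pack("=B", family)
    length = NLMSG_HDR.size + len(payload)
    flags = NLM_F_REQUEST | NLM_F_DUMP
    return NLMSG_HDR.pack(length, msg_type, flags, seq, 0) + payload


def dump_addresses():
    """Run one request/response cycle against the kernel (Linux only).

    Returns the kernel-assigned port ID and the reply message types seen.
    """
    reply_types = []
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ROUTE) as s:
        s.bind((0, 0))                       # port ID 0: let the kernel pick one
        s.send(build_dump_request(RTM_GETADDR, seq=1))
        port_id, _groups = s.getsockname()   # read back the assigned port ID
        while True:
            data = s.recv(65536)
            offset = 0
            while offset + NLMSG_HDR.size <= len(data):
                length, msg_type, _f, _s, _p = NLMSG_HDR.unpack_from(data, offset)
                if length < NLMSG_HDR.size:
                    raise ValueError("malformed netlink message")
                if msg_type == NLMSG_DONE:   # end of a multipart dump
                    return port_id, reply_types
                if msg_type == NLMSG_ERROR:
                    raise OSError("netlink error reply")
                reply_types.append(msg_type)
                offset += (length + 3) & ~3  # advance to next aligned message
```

Calling `dump_addresses()` on a Linux host would exercise the same socket, bind, send, getsockname, and receive calls that a simulator would need to emulate for this kind of lookup.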
If/when we implement this, the |
FYI, I implemented AF_NETLINK in Shadow a long time ago. See ppopth@4fd34
Currently, it has no tests and I use it only for my own use case, i.e. to simulate the Ethereum network. See https://github.com/ppopth/ethereum-shadow
I'm happy to clean it up and write tests so it can be merged into the upstream repo. Are you willing to take it? If so, I will start working on it.
Cool! This implementation looks plausible to me at first glance, though it might conflict with @stevenengler's current work on the networking stack. @stevenengler wdyt?
There shouldn't be any conflicts with other networking code, and it looks like this code doesn't access the network interface at all (the interface names are hardcoded into the netlink socket impl). There will be some small merge conflicts when rebasing on shadow@HEAD and possibly with the new TCP code (PR should hopefully be up this week), but I think all of the conflicts would be small.
As for upstreaming it as a PR, it sounds good to me! There might be some small comments, but overall I think it's good. If you make a PR, it would be good to have some tests, a description of which netlink features are supported by the PR, and maybe what some of the limitations are. (For example, bind() is only partially supported, which I think is okay, but it would be good to have that documented.) I don't know much about netlink, and netlink encompasses a lot of different features, so the more documentation the better.
How about using
The other one that we can consider is |
@ppopth at first glance either of I dropped the note about |
For details, see shadow/shadow#2980
Describe the issue
I'm trying out Snowflake simulations on the newest version of Shadow and ran into some trouble with Go library calls again. It seems Snowflake's requirements have changed since I last tried a year ago :)
When I run a minimal Snowflake example, all of the tgen streams fail, and upon inspecting the logs from the Snowflake client, I see the following log messages:
These presumably correspond to the following warnings in shadow.log:
The same thing appears to be happening with the proxy (which also uses WebRTC).
To Reproduce
Run a minimal Snowflake experiment: https://github.com/cohosh/shadow-snowflake-minimal
All of the tgen streams will fail
Operating System (please complete the following information):
Debian GNU/Linux 12 (bookworm)
Linux 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux
go1.19.8 linux/amd64
Shadow (please complete the following information):
Version: Shadow v3.0.0-52-g787ed885 2023-05-25--19:22:30
Which plug-ins you are using: None
Additional context
I realize this might be a big ask; I'm still trying to figure out why the webrtc library needs this.