-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UDP sockets created by create_datagram_endpoint() allow by default multiple processes to bind the same port #81409
Comments
When using loop.create_datagram_endpoint(), the default for reuse_address=True, which sets the SO_REUSEADDR sockopt for the UDP socket. This is a dangerous and unreasonable default for UDP, because in Linux it allows multiple processes to create listening sockets for the same UDP port and the kernel will randomly give incoming packets to these processes. I discovered this by accidentally starting two Python asyncio programs with the same UDP port and instead of getting an exception that the address is already in use, everything looked to be working except half my packets went to the wrong process. The documentation also refers to behavior with TCP sockets: "reuse_address tells the kernel to reuse a local socket in TIME_WAIT state, without waiting for its natural timeout to expire. If not specified will automatically be set to True on Unix." This is true for TCP but not UDP. Either the documentation should reflect the current (dangerous) behavior or the behavior should be changed for UDP sockets. Workaround is of course to create your own socket without SO_REUSEADDR and pass it to create_datagram_endpoint(). |
There is no chance to change the flag for 3.7. Yuri, what do you think? |
Can we still consider this for 3.9? I still think this is an easy way to accidentally blow your foot off with something that will probably only show up in production. |
I see two proposals:
What is the suggestion? Pick one, please :) |
I definitely propose changing the default for UDP sockets. Having multiple processes binding to a the same port and load-balancing incoming UDP traffic intentionally is a rare scenario which should be supported but not the default. For TCP the default is reasonable because everyone creating server sockets will usually want to set it. |
I'd be -1 on changing the default of an existing method, at least without consulting with a wider audience. We can add a new method to the loop and deprecate create_datagram_endpoint. I suggest to post to python-dev and discuss this before making any decisions. |
Sure, I fully appreciate implications of changing default behaviour and will post on python-dev. |
I agree that this is a bad default (and whoever wrote it, probably me, didn't know what this does for UDP). I think the right solution is to change the default, not to introduce a new method. Maybe we can deprecate the default somehow? There currently is a line the computes the default based on platform: if reuse_address is None:
reuse_address = os.name == 'posix' and sys.platform != 'cygwin' We could change this to issue a deprecation warning stating that in the future the default will change to False everywhere. Then in the next release we could change the default to False and remove the warning. |
Ouch, that's nasty. It also has security implications. For example, suppose you have a multi-user computer, where one user is has a video call going, which transfers packets over UDP. Another user could check what port they're using, bind to the same port, and steal half the packets, letting them effectively listen in on the first user's phone call. Well, hopefully most conferencing tools use end-to-end encryption. But the point is that we don't normally let unprivileged users steal network traffic intended for other users. |
How much harm would there be in bringing the DeprecationWarning into the next patch of existing (3.6, 3.7, 3.8) releases? The security implications are significant enough that I'd want to be notified of it in my software ASAP. Users can (and should!) squelch the warning by passing the setting explicitly. |
I had a quick search through github for calls to create_datagram_endpoint() and the reuse_address is either not set or set explicitly to True, probably due to the error in the documentation. Only in one case (of my admittedly small sample) did it seem like the parameter was set to True intentionally to get the behavior of binding multiple processes to the same port. Due to the nature of the function, it is also often buried inside packages implementing higher level protocols (SIP, STUN, DNS, RTP) which want to create a convenience function to create asyncio servers and clients. The deprecation warning and getting it into 3.6 and later sounds very reasonable to me. |
I'm working on patches for the deprecation bits (targeting 3.6 for now; will work my way up from there) for review, including documentation. Unless someone tells me to stop. :-) In an attempt to make this not-so-Linux-specific, I'm reviewing how this works on non-Linux platforms (MacOS, FreeBSD) as well. Reading Linux's socket(7) man page makes it seem like reuse_port is the issue (but that actually has protections to ensure you're only doing it across the same UID at least). I had to write my own test jig to (re-)convince myself that, indeed, reuse_addr is the problem. |
FreeBSD 12.1 and MacOS 10.15.1 (Catalina) appear to have saner and safer behavior. Both require the use of SO_REUSEPORT for this behavior to happen as well. FreeBSD also requires the UID to be the same or 0 for subsequent processes to make the bind() call. I'll call this out as being Linux-specific in the deprecation message for now. (I don't have an AIX, HP-UX, or Solaris system handy to test, nor do I really want one if I'm being honest. :-) .) |
Alright -- my first stab at the DeprecationWarning in 3.6. Please critique away, and don't fret about bruising my ego. :-) Is there a more idiomatic way of getting a warning to show up against the first callee that's not in the current module? I'm not enamored of the monstrosity I'm put together around line 918 of base_events.py (but less enamored about displaying warnings that the user is going to tear their hair out over trying to find the offending source line): |
David, in terms of documentation changes and the emitted deprecation warning itself, I think it would be appropriate to instruct that please set the parameter explicitly to True or False to silence the warning AND point out that setting it to True has significant security and previously incorrectly documented functional implications. Now your updated docs and warning read more like we are working around a Linux security bug which is not really the case - this behavior was intentionally added to the kernels and some of the code I do for a living relies on it to work properly. Admittedly the restriction of having the same UID wouldn't hurt. And browsing again through the hits to my github searches, it makes me cringe how many people are already explicitly setting reuse_address=True in their code because the current documentation mistakenly makes it seem harmless and desirable. Makes me wonder if we need to put out a CVE? At the very least, I will be putting in PRs to the asyncio packages that I myself use and understand. |
I think you can use SO_REUSEPORT instead, and for UDP sockets it's identical to SO_REUSEADDR except with the same-UID restriction added? If that's right then it might make sense to unconditionally switch SO_REUSEADDR -> SO_REUSEPORT, even in existing Python releases – on the theory that it fixes the main security hole, while being back-compatible enough to be acceptable for a point release. |
+1, I think this is the best proposal so far. From my reading of http://man7.org/linux/man-pages/man7/socket.7.html (specifically the SO_REUSEPORT section) this seems correct. Emitting a The way I see it, changing This would also allow us to apply the change to more previous Python versions, since it doesn't affect asyncio's public API. If we're just changing cpython/Lib/asyncio/base_events.py Line 1321 in 7eee5be
We should be able to apply to change starting from Python 3.5+, no? I should have the time to open a PR with backports within the next day if needed, particularly since it's a fairly small code change. I'll leave that up to Nathaniel though since he came up with the idea, let me know either way. |
Going to SO_REUSEPORT will fix the security issue and emitting a deprecation warning for default value invocation will catch the eyes of some maintainers but it will not prevent what caused me to catch this issue in the first place - starting two processes (with same UID) accidentally listening on the same UDP port (in my case it was port 5060 as a default SIP port). Since there already is a reuse_port parameter to create_datagram_endpoint(), I assume the proposal is to set default value for reuse_addresss=False and reuse_port=True? But if reuse_address is explicitly set to True, are we going to just set SO_REUSEPORT instead and always leave SO_REUSEADDR unset? This would leave the reuse_address parameter completely useless and still allow accidental port reuse. What if I really do want SO_REUSEADDR? Ok I can create a socket separately, call setsockopt() on it and pass it as the sock parameter to create_datagram_endpoint(). Maybe I'm not fully grasping the proposal.. or maybe we should just deprecate reuse_port from both create_datagram_endpoint() and create_server() + reuse_addr from create_datagram_endpoint()? This would leave the TCP create_server() with the reuse_addr parameter, defaulting reasonably to True. To use TCP/UDP SO_REUSEPORT or UDP SO_REUSEADDR, the docs would tell you to bake your own socket with socket.socket(). Those few (like me) who really need the functionality can survive without all-in-one convenience functions on asyncio.loop |
(previous message deleted, I hadn't noticed the "reuse_port" parameter) My preference for create_datagram_endpoint() would be:
This way we 1) solve the security issue 2) ensure that anyone that passed "reuse_address=True" explicitly is notified that they were doing something dangerous unwillingly. And, yes, someone who really wants SO_REUSEADDR can set it manually, for example by calling |
Jukka -- Fair enough; will reword this a bit. I'm trying to keep the DeprecationWarning short enough so people's eyes don't glaze over; will see what wordsmithing I can do here. (Once you get beyond a certain length, the number of folks who actually read the message goes down the longer it is.) |
On the completely deprecate reuse_address and rewrite/force folks to use reuse_port proposals, I'm a bit dubious of this approach. Right now, we have two knobs that directly correspond to (potential) kernel-level socket parameters, SO_REUSEADDR and SO_REUSEPORT. We just chose an unfortunate default for one of them. Otherwise, the behavior is nicely explicit. Rewriting or forbidding it, even when the user has (essentially) acknowledged they're accepting risks by passing reuse_address=True, seems to fly in the face of this. Users get bitten when we start convoluting logic here (I was just bitten by fcntl.lockf() not actually calling lockf() last month and ended up having to call lockf() via ctypes.). There are some platforms (Linux pre-3.9 kernels) that don't have SO_REUSEPORT. I wish I could say I don't care about such platforms; alas, I just had to compile Python 3.7 on a system running a 2.6 kernel last month at a client site. Finally, I've only scratched the surface with my test cases, on single-homed systems, IPv4, and binding to 0.0.0.0. There are a lot of parameters to try here before we decree that reuse_address has no purpose existing, and an OS could later reimplement saner defaults on SO_REUSEADDR. Changing a default already has some controversy (Yury was -1 on this change). This seems much bigger and I'm not convinced we fully understand its ramifications. That said, I'll happily implement whatever the consensus ends up being, or consensus as designated by the vBDFL, or wherever that decision needs to come from. :-) |
Based on https://www.kernel.org/releases.html, pre-3.9 Linux kernels are no longer supported upstream, the oldest LTS version is 3.16. There are likely a number of systems running older unsupported kernel versions, but I don't think that we can be reasonably expected to maintain indefinite backwards compatibility to versions that aren't supported upstream, especially not at the expense of the ones that are supported. In those cases, I think the responsibility ultimately falls upon the owners of the system or third party group providing service. This particular fix would be fairly straightforward:
|
I'd like to point out that it is also documented completely wrong up to this point in time and thus people who chose True are most likely to be unaware of the actual consequences. A user's explicit choice based on misinformation is not really demonstrating intent to shoot oneself in the foot. So after changing my mind a couple of times I'm still attached to the idea that in 3.10 reuse_address and reuse_port are both removed and the tiny portion of developers who need them will just create a suitably setsockopt'd socket and pass it to the function. 2 lines of additional code is all it takes. If the documentation had been correct all along and just the default bad, this mess would be much smaller. |
(Provisionally marking this as a security-related deferred blocker issue for backporting) |
I support Antoine's proposal. |
Where are we with this? The deadline for 3.8.1 and 3.7.6 is coming up in a few days. |
I believe we're just waiting on review and additional feedback on #61513, which implements Antoine's proposal. The only remaining component I can think of is the "What's New" entry, but I was planning on doing that in a separate PR so that we could get the security fix out there first (and simplify the merge conflicts for the backports). |
Kyle, I'm releasing 3.8.1rc1 now. Please add the What's New entry before next Monday (3.8.1). |
Thanks for taking care of merging to 3.x (master) and 3.8, Łukasz!
No problem, I'll definitely have time to do that before 3.8.1 final, likely in the next few days. I'll add you as a reviewer the to PR. After adding the What's New to 3.x and 3.8, I'll work on the manual backport PRs to 3.7, 3,6, and 3.5 via cherry-pick. This was suggested by Ned in the comments of #61513, for having a separate versionchanged for each of those branches. |
The backport to 3.7 seems straightforward so I did it to unblock 3.7.6rc1. The backport to 3.6 is a bit more complicated and 3.6.10rc1 can wait a bit longer so I'll leave that for Kyle along with the various What's New entries. |
Thanks, Ned. I'll prioritize working on the What's New entries for 3.7 and 3.8 (likely in the same PR to master) before working on the backports to 3.6 and 3.5, since the release for 3.7.1 and 3.7.6 are coming up soon. |
Clarification: should be "since the release for 3.8.1 and 3.7.6 are coming up soon", that was a typo. |
Actually, 3.6.10rc1 is currently blocked by this so if you do have time to work on it first, that would be great. |
Oh okay, I'll work on the 3.6 backport first then. |
Now that the backports for 3.6-3.8 are merged, I'll work on the What's New entries next. Waiting on feedback from Larry Hastings regarding the potential 3.5 backport, I'll add him to the nosy list. |
The fix seems to generate few DeprecationWarning while running test suite with -Wall. ➜ cpython git:(master) git checkout ab513a3~1 Lib/asyncio == Tests result: SUCCESS == 1 test OK. Total duration: 133 ms ➜ cpython git:(master) ✗ git checkout ab513a3 Lib/asyncio == Tests result: SUCCESS == 1 test OK. Total duration: 135 ms |
Thanks Karthikeyan, I'm guessing that I missed an assertWarns() or left an outdated test somewhere that explicitly sets |
Looks like this was the issue, I left a [aeros:~/repos/cpython]$ ./python -Wall -m test test_asyncio -m test_create_datagram_endpoint_nosoreuseport -m test_create_datagram_endpoint_ip_addr == Tests result: SUCCESS == 1 test OK. Total duration: 130 ms I also used git grep to ensure But while doing that, I did notice that I'll fix them accordingly and open a new PR (as well as the backports). |
One more resource warning about unclosed resource being garbage collected. As per the other tests I think transport and protocol need to be closed as per below patch but someone can verify if it's the right approach. ./python.exe -X tracemalloc -Wall -m test test_asyncio -m test_create_datagram_endpoint_reuse_address_warning == Tests result: SUCCESS == 1 test OK. Total duration: 373 ms diff --git Lib/test/test_asyncio/test_base_events.py Lib/test/test_asyncio/test_base_events.py with self.assertWarns(DeprecationWarning):
- self.loop.run_until_complete(coro)
+ transport, protocol = self.loop.run_until_complete(coro)
+ transport.close()
+ self.loop.run_until_complete(protocol.done) @patch_socket
def test_create_datagram_endpoint_nosoreuseport(self, m_socket): After patch there are no resource warnings for the test. ./python.exe -X tracemalloc -Wall -m test test_asyncio -m test_create_datagram_endpoint_reuse_address_warning == Tests result: SUCCESS == 1 test OK. Total duration: 349 ms |
Nevermind. Upon further inspection, the other occurrences of Thanks again for pointing it out Karthikeyan, as well as mentioning the specific tests the DeprecationWarnings were coming from. That made it a lot easier to figure out the source of the warnings. |
Ah, good catch. It's not needed when the ValueError occurs because it prevents the transport and protocol from being created in the first place, but I forgot that it would still be needed for the DeprecationWarning test. In addition to your changes, I'll also add a self.assertEqual('CLOSED', protocol.state). This is done in most of the other tests and it doesn't hurt to make sure the protocol properly closed. I'll add your fix to the PR and add you to the "Co-authored-by" for that commit, thanks. |
Opened a PR that adds the whatsnew entries to master, 3.8, 3.7, and 3.6: #61795. |
Thanks for taking care of merging the remaining backport PRs for 3.6-3.8, Ned. Now, the only branch left is (potentially) 3.5. |
Downgrading priority since it's released everywhere except for 3.5. |
Since 3.5 has now reached end-of-life, this issue will not be fixed there so it looks like it can be closed. |
Thanks, Ned <3 |
(For following up and closing the issue) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: