Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket binding implemented properly for IPv6 and UNIX sockets. #1641

Merged
merged 36 commits into from Jun 29, 2020

Conversation

Tronic
Copy link
Member

@Tronic Tronic commented Jul 23, 2019

  • app.run("::1") for IPv6
  • app.run("unix:/tmp/server.sock") app.run(unix="/tmp/server.sock") for UNIX sockets
  • app.run("localhost") retains old functionality (randomly either IPv4 or IPv6, or with workers=1 might be both)

Do note that IPv6 and UNIX sockets are not fully supported by other Sanic facilities.
In particular, request.server_name and request.server_port are currently unreliable.

- app.run("::1") for IPv6
- app.run("unix:/tmp/server.sock") for UNIX sockets
- app.run("localhost") retains old functionality (randomly either IPv4 or IPv6)

Do note that IPv6 and UNIX sockets are not fully supported by other Sanic facilities.
In particular, request.server_name and request.server_port are currently unreliable.
@codecov
Copy link

codecov bot commented Jul 23, 2019

Codecov Report

Merging #1641 into master will increase coverage by 0.55%.
The diff coverage is 96.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1641      +/-   ##
==========================================
+ Coverage   91.70%   92.26%   +0.55%     
==========================================
  Files          27       27              
  Lines        3001     3075      +74     
  Branches      544      552       +8     
==========================================
+ Hits         2752     2837      +85     
+ Misses        171      165       -6     
+ Partials       78       73       -5     
Impacted Files Coverage Δ
sanic/server.py 79.61% <96.38%> (+5.28%) ⬆️
sanic/request.py 99.23% <96.77%> (-0.40%) ⬇️
sanic/__main__.py 77.50% <100.00%> (+0.57%) ⬆️
sanic/app.py 93.47% <100.00%> (+0.02%) ⬆️
sanic/asgi.py 93.01% <100.00%> (+2.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4aba74d...19898d0. Read the comment docs.

@Tronic
Copy link
Member Author

Tronic commented Jul 23, 2019

Compatibility with create_server (used when workers=1) remains an issue. uvloop uses quite different approach to binding than Sanic's implementation, making this difficult to manage.

UNIX sockets are useful (among other things) for nginx proxy_pass, where they perform much better than 127.0.0.1.

# Nginx proxying to Sanic via unix socket
$ wrk http://127.0.0.1/unix/ -c 125 -t 4
Running 10s test @ http://127.0.0.1/unix/
  4 threads and 125 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.14ms    2.89ms  44.44ms   75.97%
    Req/Sec     4.41k   254.81     4.97k    74.50%
  175471 requests in 10.01s, 37.64MB read
Requests/sec:  17531.19
Transfer/sec:      3.76MB

# Nginx proxying to Sanic via TCP is much slower (errors caused by high load)
$ wrk http://127.0.0.1/8000/ -c 125 -t 4
Running 10s test @ http://127.0.0.1/8000/
  4 threads and 125 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   252.47ms  342.04ms   1.13s    80.36%
    Req/Sec     1.25k   838.94     2.44k    55.00%
  16525 requests in 10.08s, 3.67MB read
  Non-2xx or 3xx responses: 751
Requests/sec:   1640.18
Transfer/sec:    372.55KB

# For comparison, the app directly served by Sanic (Goin' Fast!)
$ wrk http://127.0.0.1:8000/ -c 125 -t 4
Running 10s test @ http://127.0.0.1:8000/
  4 threads and 125 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.93ms    1.16ms   8.42ms   71.71%
    Req/Sec    10.59k   404.58    11.79k    68.00%
  421511 requests in 10.00s, 63.92MB read
Requests/sec:  42146.70
Transfer/sec:      6.39MB

sanic/server.py Outdated
name = host[5:]
sock = socket.socket(socket.AF_UNIX)
if os.path.exists(name) and os.stat(name) == stat.S_ISSOCK:
os.unlink(name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this because we can't clean up the socket on server close?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlinking of existing socket? Standard practice that I didn't pay much attention to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have a preference, its inconsistent with what happens when you try to bind to a port that's in use though so I figured I'd mention it

Copy link
Member Author

@Tronic Tronic Aug 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sockets are left over e.g. after kill -9, so removing them is practical, and apparently asyncio's and uvloop's create_unix_server both will overwrite any existing socket this way. Also, neither of them removes the socket on close.

I've added the previously missing unlink on exit to serve() as well, for unix sockets created within this function. As a result, both modes of Sanic (workers=1 and serve_multiple) now (1) remove any existing unix socket before bind and (2) remove the socket after normal shutdown.

Copy link
Member Author

@Tronic Tronic Aug 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abuckenheimer Thanks for your notes/reviews! I'm off keyboard until Monday. Have a nice weekend :)

@raphendyr
Copy link

👍 for UNIX sockets. In addition, the implementation looked good.

@raphendyr
Copy link

Request.server_port fails with unix sockets :(

https://github.com/huge-success/sanic/blob/master/sanic/request.py#L401

@Tronic
Copy link
Member Author

Tronic commented Aug 28, 2019

Request.server_port fails with unix sockets :(

This would conflict with #1638, which I expect to be merged first. Normally, however, even unix socket requests should have a Host header, which would then be used. E.g. try

curl -v --unix-socket /tmp/sanic.sock http://fake.domain:1234/

With #1638 the existence of Host or other forwarding headers imply scheme's default port if not explicitly mentioned, and I think this would be the best fallback if no such headers are present: return 443 for SSL connections (https, wss) and 80 for anything else.

@Tronic
Copy link
Member Author

Tronic commented Aug 28, 2019

New unix sockets are created with temporary names, set to listen and then moved over the existing socket to atomically replace it. This allows zero-downtime restarts, no connection attempts may fail.

A new server may be started while the old one is still running. The old instance will not see any new requests but it may finish its pending requests while at the same time the new instance is already servicing new requests.

sanic/server.py Outdated
sock = socket.socket(socket.AF_UNIX)
try:
# Atomic zero-downtime socket replace
tmp_path = f"{path}.{uuid4().hex[:8]}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would os.getpid() be a simpler solution? It is already ensured to be unique identifier (at least inside a pid namespace (e.g. single container)). If the file would exists, it's would be from an already dead process.

Copy link
Member Author

@Tronic Tronic Aug 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale sockets might still be left from before reboot, causing services fail to start. There are apparently also some security concerns, in particular that if another user creates a symlink in place of the socket, bind may follow that symlink and create a socket at symlink's target. Eight random hex digits is far less likely to be hit by these issues than a pid, although it might be necessary to use even more.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=47492 (Linux has fixed the bug long ago but MacOS is still vulnerable)

A minor issue with this approach is that the initial random name sticks, and is returned as sockname on the server side and as peername on the clients.

@Tronic
Copy link
Member Author

Tronic commented Sep 3, 2019

Is anyone aware of why request checks peername and sockname from protocol instead of protocol filling these into requests? The only reasons I can think of are (1) it was the fastest way to hack it, or (2) not to consume runtime when these values are not used. Since the values need to be read from socket only once per connection, not per request, I plan to change this to happen right after connection.

…ket.

- Would be a good idea to remove request.transport entirely but I didn't dare to touch it yet.
@Tronic
Copy link
Member Author

Tronic commented Sep 5, 2019

Quite many potentially breaking changes again. @abuckenheimer can you have a look and comment?

@Tronic
Copy link
Member Author

Tronic commented Sep 5, 2019

Current implementation is balancing between backward compatibility and sensible implementation. And I didn't benchmark at all, in particular the ASGI server that creates new ConnInfo for every request. Probably no slowdown still, but needs to be tested.

In any case, this will require further refining before merging.

@Tronic
Copy link
Member Author

Tronic commented Sep 5, 2019

Also, it'd be great if someone could write tests for unix sockets (I have no idea how to fit that into the testing framework which afaik doesn't even test live servers).

@stale
Copy link

stale bot commented Dec 4, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is incorrect, please respond with an update. Thank you for your contributions.

@stale stale bot added the stale label Dec 4, 2019
@sjsadowski sjsadowski removed the stale label Dec 5, 2019
@Tronic
Copy link
Member Author

Tronic commented Mar 27, 2020

I'm done with this. Requesting review.

@Tronic
Copy link
Member Author

Tronic commented Mar 29, 2020

This doesn't need to go in 20.3 but the ConnInfo structure is something I'll be using in the streaming branch as soon as this gets merged.

@Tronic
Copy link
Member Author

Tronic commented Jun 4, 2020

@huge-success/sanic-core-devs Anyone care to review?

@Tronic Tronic removed the needs tests PR that needs additional tests added label Jun 21, 2020
sanic/request.py Show resolved Hide resolved
@Tronic
Copy link
Member Author

Tronic commented Jun 28, 2020

@ahopkins Are you still including this one?

@Tronic
Copy link
Member Author

Tronic commented Jun 28, 2020

Hmm, looks like some unix socket tests got broken now. I suppose that'll need more investigation.

@ahopkins
Copy link
Member

Tests were failing. If you can take a look and solve easily, let's push another release.

@Tronic
Copy link
Member Author

Tronic commented Jun 28, 2020

Tests were failing. If you can take a look and solve easily, let's push another release.

#1853 made workers use spawn rather than fork on all OSes, and consequently doesn't allow using function-local app objects or even route decorators. However, I would expect that to break far more things than only this...

Do you know what the current behaviour regards pickling should be? Perhaps something from master was not correctly merged to this? I'll try to investigate.

@ahopkins
Copy link
Member

@ashleysommer ?

@Tronic
Copy link
Member Author

Tronic commented Jun 28, 2020

This sets "spawn" mode, earlier "fork" was used on Linux and MacOS (Windows only supports "spawn"):
https://github.com/huge-success/sanic/blob/master/sanic/server.py#L930

As a result, with current master, this doesn't work:

from sanic import Sanic

app = Sanic(__name__)

@app.get("/")
def hello(req):
    pass

app.run(workers=2)
_pickle.PicklingError: Can't pickle <function hello at 0x10ff18160>: it's not the same object as __main__.hello

@Tronic
Copy link
Member Author

Tronic commented Jun 28, 2020

The multiple worker test is now made compatible with either mode of multiprocessing, so that this can be merged. Let's move further discussion about the pickling to issue #1883 or related PR.

@ahopkins ahopkins merged commit a62c84a into sanic-org:master Jun 29, 2020
@ahopkins
Copy link
Member

ahopkins commented Apr 7, 2021

This pull request has been mentioned on Sanic Community Discussion. There might be relevant details there:

https://community.sanicframework.org/t/multiprocessing-is-currently-not-supported-on-nt/827/2

@ahopkins ahopkins removed the needs review the PR appears ready but requires a review label Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants