Keep-alive connections keep threads occupied infinitely #368

Closed
andor44 opened this Issue Mar 11, 2015 · 25 comments


@andor44
andor44 commented Mar 11, 2015

As discussed in rustless/rustless#35 (and as @reem suggested), Hyper seems to be too "literal" about keep-alive: it parks threads in a keep-alive loop, so a handful of idle connections can essentially DDoS a Hyper server. My temporary workaround is to use many threads to delay running out of them.

@andor44 andor44 referenced this issue in rustless/rustless Mar 11, 2015
Closed

Server stops accepting connections after a while #35

@s-panferov
Contributor

I think that Hyper must have something like KeepAliveTimeout in Apache.

@s-panferov
Contributor

@andor44 I think you could try using nginx as a reverse proxy in front of Rustless/Hyper to work around the issue. Nginx can handle a large number of connections without any trouble, and it has usable keep-alive settings. I think nginx is required for blocking servers anyway.
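For anyone reaching for this workaround, a minimal front-end might look like the sketch below. The backend address, port, and timeout values are placeholders for illustration; the directives themselves (proxy_pass, proxy_http_version, proxy_read_timeout) are standard nginx:

```nginx
# Hypothetical nginx front-end for a hyper/Rustless backend on port 3000.
# nginx manages client keep-alive itself and, by default, proxies to the
# backend with HTTP/1.0 and `Connection: close`, so hyper's worker threads
# are released after every request.
server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:3000;
        # proxy_http_version defaults to 1.0; set explicitly for clarity.
        proxy_http_version 1.0;
        proxy_read_timeout 30s;
    }
}
```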

@andor44
andor44 commented Mar 11, 2015

@s-panferov I'll give that a shot.

Here's the relevant part of the code. I'm not sure how deep down the rabbit hole this should be handled, though. Probably this deep, but I'm not sure how easily you can specify a timeout in the current Hyper architecture.

@seanmonstar
Member

The current IO doesn't provide a way to specify a timeout, which makes this harder.

@s-panferov
Contributor

@andor44 have you tried using a reverse proxy yet? Did it help?

@andor44
andor44 commented Mar 15, 2015

Yes. I'm using nginx now; it proxies requests with HTTP/1.0, which doesn't include keep-alive in the spec, and it works perfectly.

@doomsplayer

When will this bug be fixed? Timeouts should be an urgent feature.

@seanmonstar
Member

Timeouts won't be supported by TcpStreams until Rust 1.1, which lands around mid-June. To support timeouts in hyper earlier, we'd need to explore using an additional thread, but its performance may suffer too much.

In the meantime, I suggest putting nginx in front of your server, which handles this well.

@doomsplayer

All right, thanks. But what about the client? Are there any good ways to add a timeout to an HttpClient?

@seanmonstar seanmonstar modified the milestone: Rust 1.0 Apr 27, 2015
@tilpner tilpner added a commit to tilpner/sersve that referenced this issue May 12, 2015
@tilpner tilpner Fix for hyperium/hyper#368 bcc686c
@miguelmartin75

This error is still present in v0.6. My server hangs quite frequently, and I know it's not my code: the Handler's handle isn't even called when I send an HTTP request. It can take on the order of minutes for handle to be called, or it never gets called at all. It doesn't seem to be an issue when hosting and requesting locally.

I've switched to rust-http and my code works fine. This really needs to get fixed, IMHO, as this library has a nice API.

@seanmonstar seanmonstar modified the milestone: 1.0, Rust 1.0 Aug 6, 2015
@juanibiapina

Is this issue the place to follow for updates on this bug?

@juanibiapina juanibiapina added a commit to juanibiapina/zas that referenced this issue Aug 27, 2015
@juanibiapina juanibiapina feature: Increase number of threads to 20
To overcome this bug: hyperium/hyper#368
1f6f22a
@mikedilger
Contributor

FYI, for those who need a workaround for this issue while we wait for async:

  1. Keep a count of how many worker threads are busy or waiting in the keep-alive loop. The Handler trait now has hooks (on_connection_start and on_connection_end) that let you increment/decrement your count of busy threads. You'll need a Mutex, AtomicUsize, or another sync primitive to update the count.
  2. Have your handler check the count, and if it is high enough (e.g. all but 4 threads are occupied), set the Connection::close header (which hyper will notice and respect, closing the connection after the response).
  3. I still recommend using the 'timeouts' feature and setting read and write timeouts on your server; but now those timeouts don't need to be super-short anymore.

This way you can still support keep-alive most of the time, but when things get really busy, your server won't "hang". I've tested this and it works well.
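The counting logic above can be sketched with std primitives alone. The hook names mirror the on_connection_start/on_connection_end hooks mentioned above, but the pool size, headroom, and wiring are hypothetical values for illustration, not a drop-in hyper handler:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sizes: a 16-thread pool that starts shedding keep-alive
// once only 4 worker threads remain free.
const MAX_WORKERS: usize = 16;
const HEADROOM: usize = 4;

static BUSY: AtomicUsize = AtomicUsize::new(0);

// Wire these to hyper's on_connection_start / on_connection_end hooks.
fn connection_start() { BUSY.fetch_add(1, Ordering::SeqCst); }
fn connection_end() { BUSY.fetch_sub(1, Ordering::SeqCst); }

// Called from the handler: should this response carry `Connection: close`?
fn should_close() -> bool {
    BUSY.load(Ordering::SeqCst) >= MAX_WORKERS - HEADROOM
}

fn main() {
    for _ in 0..11 { connection_start(); }
    assert!(!should_close()); // 11 busy: 5 threads still free
    connection_start();
    assert!(should_close()); // 12 busy: only 4 free, shed keep-alive
    connection_end();
    assert!(!should_close()); // back to 11 busy
    println!("busy = {}", BUSY.load(Ordering::SeqCst));
}
```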

@seanmonstar
Member

I'm thinking of turning keep-alive off by default for hyper 0.6 servers, which is probably the right thing to do, and let people who have a solution turn it back on. However, that's a logical breaking change, even though code will continue to compile...

Thinking of adding something like this to keep the current behavior:

Server::http(addr).unwrap().keep_alive().listen(handler)
@juanibiapina

No solution for the problem itself?

@seanmonstar
Member

@juanibiapina the best solution is asyncio, tracked in #395. With synchronous IO, your only options are 1) set a timeout, which can be done currently, but still means the connection could block to the full length of the timeout, or 2) don't try to read on kept-alive connections.
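For illustration, option 1 can be demonstrated with nothing but std: a read guarded by TcpStream::set_read_timeout gives up after the configured duration instead of parking the thread forever. The addresses and durations below are arbitrary:

```rust
use std::io::{ErrorKind, Read};
use std::net::{TcpListener, TcpStream};
use std::time::{Duration, Instant};

fn main() {
    // The listener never accepts: the connection completes in the kernel
    // backlog, so the client connects but no data ever arrives, much like
    // an idle kept-alive client that never sends another request.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    let mut stream = TcpStream::connect(addr).unwrap();
    // Without this, read() would block the worker thread indefinitely.
    stream
        .set_read_timeout(Some(Duration::from_millis(200)))
        .unwrap();

    let start = Instant::now();
    let mut buf = [0u8; 512];
    let err = stream.read(&mut buf).unwrap_err();
    // Unix reports WouldBlock for a timed-out read, Windows reports TimedOut.
    assert!(matches!(err.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut));
    assert!(start.elapsed() >= Duration::from_millis(150));
    println!("read gave up after {:?}", start.elapsed());
}
```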

@Ogeon
Contributor
Ogeon commented Oct 8, 2015

It feels a bit like sweeping it under the rug, but it's about as good as any other workaround. It may become a gotcha in its own way, though. Probably less obvious, but still...

@reem
Member
reem commented Oct 8, 2015

In Iron, at least, I can attempt to implement the AtomicUsize-tracked-by-connection_{start,end} solution, which may eliminate a lot of these issues.

@seanmonstar
Member

I'm going to try the way Apache does it: if you want keep-alive, a timeout will be used when trying to read on the kept-alive connection. This is really the only thing that is possible on blocking sockets.

server.keep_alive(Duration::from_secs(5));
server.listen(...)

Of course, this will require activating the timeouts feature in hyper, and requires rustc v1.4 or greater.

@mikedilger
Contributor

I concur. I think the number of people who find hyper "locks up" is sufficiently high to justify disabling keep-alive by default for 0.6. It could be considered a DoS security vulnerability as it currently stands. The performance benefit of keep-alive can't really be achieved under blocking I/O anyhow without an enormous number of threads, or being under very low load.

@juanibiapina

You could make a new release with keep-alive disabled, like you suggested, since this is still 0.x, and then implement the timeout once Rust 1.4 comes out.

As a user of the lib, I don't mind, since I'd be actively upgrading a pre-1.0 dependency anyway.

@juanibiapina

@mikedilger you beat me to it xD

@Ogeon
Contributor
Ogeon commented Oct 8, 2015

Doing the connection-counting trick at least lets users choose a balance, but tune it the wrong way and your thread pool may fill up with idle favicon connections, and your server basically reverts to single-threaded mode. I implemented it in Rustful and it prevents lock-ups, but not much more. Disabling keep-alive may even be the better alternative after all, since it treats every connection equally. It's at least better than lock-ups.

@mikedilger
Contributor

A co-worker of mine wrote a tool to benchmark hyper-based servers, which is annoying in just the right ways to inspect keep-alive-related behaviour: http://github.com/alanstockwell/hyperspank (shout-out to @alanstockwell)

@seanmonstar seanmonstar added a commit that referenced this issue Oct 8, 2015
@seanmonstar seanmonstar fix(server): use a timeout for Server keep-alive
Server keep-alive is now **off** by default. In order to turn it on, the
`keep_alive` method must be called on the `Server` object.

Closes #368
9482614
@seanmonstar seanmonstar self-assigned this Oct 8, 2015
@seanmonstar
Member

Proposed fix is in #661

@seanmonstar seanmonstar added a commit that referenced this issue Oct 9, 2015
@seanmonstar seanmonstar fix(server): use a timeout for Server keep-alive
Server keep-alive is now **off** by default. In order to turn it on, the
`keep_alive` method must be called on the `Server` object.

Closes #368
cdaa254
@seanmonstar seanmonstar closed this in #661 Oct 9, 2015
@seanmonstar seanmonstar removed the in progress label Oct 9, 2015
@seanmonstar
Member

On IRC, @reem pointed out to me that it's possible to do a little more in this area. With the timeout solution, it's still possible for every thread to be serving a kept-alive connection: even though none of them is blocked on a stalled connection, no other connections can be handled while they're all active.

I'm not sure whether this scenario should still be considered part of this bug or more of a general improvement. Either way, once non-blocking IO is merged, this issue goes away automatically.

A solution here would be an acceptor thread plus a work queue. The acceptor would push all TcpStreams into the queue, and each worker would process one request/response pair. If keep-alive is true, the stream would be placed at the back of the work queue.
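That queue can be sketched with std's mpsc channel. The Conn struct below stands in for a real TcpStream (its counter plays the role of "requests the client will still send"), and a single worker is used for determinism; both are assumptions of the sketch, not hyper code:

```rust
use std::sync::mpsc;

// Stand-in for an accepted TcpStream: how many requests this kept-alive
// client will send before it is done.
struct Conn {
    requests_left: u32,
}

fn main() {
    let (tx, rx) = mpsc::channel::<Conn>();
    let requeue = tx.clone();

    // The acceptor thread would push every accepted stream here.
    for n in 2..=4 {
        tx.send(Conn { requests_left: n }).unwrap();
    }
    drop(tx);

    // A worker handles exactly ONE request/response pair per dequeue.
    // Kept-alive connections go to the back of the queue instead of
    // monopolizing a thread. A real pool would have several workers
    // blocking on the queue; one worker keeps this sketch deterministic.
    let mut handled = 0;
    while let Ok(mut conn) = rx.try_recv() {
        conn.requests_left -= 1;
        handled += 1;
        if conn.requests_left > 0 {
            requeue.send(conn).unwrap(); // keep-alive: requeue, don't block
        }
    }

    assert_eq!(handled, 2 + 3 + 4); // all 9 requests served, interleaved
    println!("handled {} requests", handled);
}
```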
