
Teach the http service how to enforce connection limits #6601

Merged 1 commit into master on May 12, 2016

Conversation

jsternberg (Contributor)

The http connection limit is for any HTTP operation and is independent
of the other connection limits. It should be set to a higher value than
the query limit or the write limit. The difference between this and the
other connection limits is it will close out the connection immediately
without any further processing.

A max concurrent write limit has been added. This will prevent writes to
the underlying store if the number of writers exceeds the threshold.

Also removes some unused config options from the cluster config.

Fixes #6559.

@mention-bot

By analyzing the blame information on this pull request, we identified @gunnaraasen, @joelegasse and @mark-rushakoff to be potential reviewers.

@jsternberg jsternberg force-pushed the js-6559-http-connection-limits branch from 72e0bbf to c901aa1 Compare May 11, 2016 16:52
```diff
@@ -196,6 +202,13 @@ func (w *PointsWriter) WritePoints(database, retentionPolicy string, consistency
 	w.statMap.Add(statWriteReq, 1)
 	w.statMap.Add(statPointWriteReq, int64(len(points)))

+	if w.MaxConcurrentWrites > 0 {
```
Contributor
This isn't quite what #6559 was intended to solve. #6559 is about returning an error on the HTTP connection before getting into PointsWriter or parsing points. It needs to drop the connection before doing all the work to get down to this level.

Contributor (Author)

The reason I did it this way was to avoid putting this into the http handler. We currently have the query limit inside of the query executor, so I thought to put this in the points writer.

I think it would be good to refactor the points writer to read from a stream though so it would be easier to separate out this functionality into the points writer. Putting it in the points writer also allows us to restrict the number of maximum concurrent writers from other sources such as udp, collectd, graphite, and continuous queries rather than only through the httpd service.
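The idea of enforcing the limit inside the points writer can be sketched with a buffered channel used as a counting semaphore. This is a hypothetical simplification, not the PR's actual code; the type, field, and error names are illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTooManyWrites is an illustrative error name, not the PR's actual one.
var ErrTooManyWrites = errors.New("max concurrent writes reached")

// PointsWriter is a stripped-down stand-in for the real type.
type PointsWriter struct {
	sem chan struct{} // capacity = max concurrent writes
}

func NewPointsWriter(maxConcurrentWrites int) *PointsWriter {
	return &PointsWriter{sem: make(chan struct{}, maxConcurrentWrites)}
}

// startWrite reserves a write slot, failing fast when the limit is hit.
func (w *PointsWriter) startWrite() error {
	select {
	case w.sem <- struct{}{}:
		return nil
	default:
		return ErrTooManyWrites
	}
}

// endWrite releases the slot reserved by startWrite.
func (w *PointsWriter) endWrite() {
	<-w.sem
}

func main() {
	w := NewPointsWriter(1)
	fmt.Println(w.startWrite()) // <nil>
	fmt.Println(w.startWrite()) // max concurrent writes reached
	w.endWrite()
	fmt.Println(w.startWrite()) // <nil>
}
```

Because the limit lives behind the writer's own API, any ingest path that calls it (udp, collectd, graphite, continuous queries) is covered, which is the trade-off being discussed here.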

Contributor

As I understand it, the issue is with the underlying TCP connections in HTTP, and not really a problem with limiting the points written internally, #6559 is specific to HTTP.

This also doesn't solve the problem of new connections being created. They will still be created, and then return an error, rather than the connection being closed immediately when the limit has been exceeded. All this does (including startWrite) is add further points of lock contention in the write path...

Contributor

Makes sense, but this won't solve the issue that #6559 was intended to address though. For example, a bad client sending thousands of huge /write payloads could still overwhelm the server by causing the server to use CPU to parse points as well as allocate memory for those points even if this limit is set in the PointsWriter. Doing it in the handler allows us to drop the connection at the periphery and minimize damage.

Contributor (Author)

It may be best to only include the http connection limit then since I don't think a write limit would solve the underlying issue for #6559, while the generic max connection limit solves that problem perfectly.

Do we still want the /write, /query, and generic http handlers to have separate connection limits?

I'll start working on a change to address the other feedback. It seems like we should not rely on the max-concurrent-queries option to address this issue and /query should have its own connection limit if we're going to do that for the others too. It may also be better to try implementing this at the socket level, but I don't think Go provides the ability to perform an action when an HTTP connection starts/stops.

Contributor

/query and /write have different performance characteristics, so I think it makes sense to have them separately tunable. Many open /query calls are not as problematic as many /write ones: /query has a very small payload, whereas /write payloads can be quite large.

The max-concurrent-queries seems ok where it's at but moving it up to HTTP shouldn't hurt.

Contributor (Author)

I was actually saying that there would be two levels. The query executor would enforce a maximum number of running queries, while the httpd service would have its own enforcement to prevent DDOS attacks. One would act on the listener and the other on the service, and they would have different performance characteristics and error messages.

Although I'm not sure if both would be needed in all honesty. I think just a single connection limit on the httpd listener would probably work best.

@joelegasse (Contributor)

Rather than continuing commenting on a specific line, I'll continue here. Just for clarification, is the issue the number of active /write requests being processed, or the underlying TCP connection count? Are we moving forward with considering unclosed connections to be a "bad client" and just limiting the number of actively processing /write requests? If so, it seems like a simple "channel as a semaphore" would be enough to limit that without outright rejecting requests that happen in quick succession or when the server is experiencing high load.

Rejecting the connections puts the burden back on the client to retry, using a semaphore to limit concurrent processing of requests would allow requests to "build up" and then get processed as the server is able to.
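The "channel as a semaphore" idea described above can be sketched as follows. This is a hypothetical illustration (the function name and counters are made up): over-limit requests block and queue rather than being rejected, which is the behavioral difference being debated:

```go
package main

import (
	"fmt"
	"sync"
)

// runWithLimit processes n fake /write requests with at most `limit`
// running concurrently, and reports the peak concurrency observed.
func runWithLimit(limit, n int) int {
	sem := make(chan struct{}, limit) // channel as a counting semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	active, peak := 0, 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire: blocks once limit is reached
			defer func() { <-sem }() // release when done

			mu.Lock()
			active++
			if active > peak {
				peak = active
			}
			mu.Unlock()

			// ... parse points and write to the store here ...

			mu.Lock()
			active--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(runWithLimit(2, 100) <= 2) // true: never more than 2 at once
}
```

Requests beyond the limit simply park on the channel send, so they "build up" and drain as the server catches up, at the cost of holding their connections open.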

@jsternberg (Contributor, Author)

I think limiting the connections through a semaphore, as you suggest, would make us vulnerable to a DDOS from clients that don't behave correctly. The reason to close the connections immediately is to avoid further damage from a DDOS; with a blocking semaphore, nobody new could connect and stalled client connections could freeze the server.

But to be fair, that's how PostgreSQL connections work so we may be in good company.

@jsternberg jsternberg force-pushed the js-6559-http-connection-limits branch from c901aa1 to cbc878f Compare May 12, 2016 12:55
@jsternberg (Contributor, Author)

Latest version with numbers from influx_stress for three different configurations:

```text
# With a connection limit of 5 (intentionally to show a DDOS situation with retries)
$ influx_stress
Total Queries: 250
Average Query Response Time: 11.29319ms

Total Requests: 2127
        Success: 2000
        Fail: 127
Average Response Time: 56.393977ms
Points Per Second: 366954

# With no connection limit
$ influx_stress
Total Requests: 2000
        Success: 2000
        Fail: 0
Average Response Time: 85.688198ms
Points Per Second: 399314

Total Queries: 250
Average Query Response Time: 25.941744ms

# With a connection limit of 5000
$ influx_stress
Total Requests: 2000
        Success: 2000
        Fail: 0
Average Response Time: 86.180241ms
Points Per Second: 400393

Total Queries: 250
Average Query Response Time: 24.379521ms
```

I wouldn't trust the benchmarking results completely; the average write response time in the 5-connection-limit run looks very wrong. It doesn't appear to take retries into account and likely counted a write whose connection was closed immediately as successful. Average query time, which I know does retry, reflects the results better: the connection limit severely slowed queries down because of the need to continuously retry.

Benchmarking:

```text
$ go test -run=LimitListener -bench=LimitListener ./services/httpd/
PASS
BenchmarkLimitListener-8         5000000               349 ns/op
ok      github.com/influxdata/influxdb/services/httpd   2.170s
```

I used channels since they seemed to be faster than mutexes, and using the primitives in sync/atomic directly felt error-prone: the check-then-increment logic doesn't map cleanly onto atomic operations. I can try it that way anyway, as long as we're all agreed this is the correct direction to go.

@jsternberg jsternberg force-pushed the js-6559-http-connection-limits branch from cbc878f to 6936017 Compare May 12, 2016 13:05
@jwilder (Contributor) commented May 12, 2016

LGTM. @joelegasse should take a look as well.

@joelegasse (Contributor)

👍 Clean and simple, LGTM. I'm not too worried about extreme performance in accepting a network connection, since the round-trip latency is several orders of magnitude greater than waiting on a channel operation.

@jsternberg (Contributor, Author)

Then I'll update this with a changelog, wait for green again, and then merge it.

The http connection limit is for any HTTP operation and is independent
of the other connection limits. It should be set to a higher value than
the query limit. The difference between this and the query limit is it
will close out the connection immediately without any further
processing.

This is the equivalent of the `max_connections` option in PostgreSQL.

Also removes some unused config options from the cluster config.

Fixes #6559.
@jsternberg jsternberg force-pushed the js-6559-http-connection-limits branch from 6936017 to 4fab68b Compare May 12, 2016 20:25
@jsternberg jsternberg merged commit 51b3f46 into master May 12, 2016
@jsternberg jsternberg deleted the js-6559-http-connection-limits branch May 12, 2016 20:51
Merging this pull request closes issue: HTTP Connection Limits