Teach the http service how to enforce connection limits #6601
Conversation
By analyzing the blame information on this pull request, we identified @gunnaraasen, @joelegasse and @mark-rushakoff to be potential reviewers.
(force-pushed from 72e0bbf to c901aa1)
```diff
@@ -196,6 +202,13 @@ func (w *PointsWriter) WritePoints(database, retentionPolicy string, consistency
 	w.statMap.Add(statWriteReq, 1)
 	w.statMap.Add(statPointWriteReq, int64(len(points)))

+	if w.MaxConcurrentWrites > 0 {
```
The reason why I did it this way was to avoid putting this into the http handler. We currently have the query limit inside of the query executor so I thought to put this in the points writer.
I think it would be good to refactor the points writer to read from a stream though so it would be easier to separate out this functionality into the points writer. Putting it in the points writer also allows us to restrict the number of maximum concurrent writers from other sources such as udp, collectd, graphite, and continuous queries rather than only through the httpd service.
As I understand it, the issue is with the underlying TCP connections in HTTP, not really with limiting the points written internally; #6559 is specific to HTTP.
This also doesn't solve the problem of new connections being created. They will still be created and then return an error, rather than the connection being closed immediately when the limit has been exceeded. All this does (including `startWrite`) is add further points of lock contention in the write path...
Makes sense, but this won't solve the issue that #6559 was intended to address. For example, a bad client sending thousands of huge `/write` payloads could still overwhelm the server by forcing it to burn CPU parsing points and to allocate memory for those points, even if this limit is set in the `PointsWriter`. Doing it in the handler allows us to drop the connection at the periphery and minimize the damage.
It may be best to only include the http connection limit then, since I don't think a write limit would solve the underlying issue for #6559, while the generic max connection limit solves that problem perfectly.
Do we still want the `/write`, `/query`, and generic http handlers to have separate connection limits?
I'll start working on a change to address the other feedback. It seems like we should not rely on the `max-concurrent-queries` option to address this issue, and `/query` should have its own connection limit if we're going to do that for the others too. It may also be better to try implementing this at the socket level, but I don't think Go provides the ability to perform an action when an HTTP connection starts/stops.
`/query` and `/write` have different performance characteristics, so I think it makes sense to have them separately tunable. Many open `/query` calls are not as problematic as many `/write` ones: `/query` has a very small payload, whereas `/write` can be quite large.
The `max-concurrent-queries` option seems fine where it is, but moving it up to HTTP shouldn't hurt.
I was actually saying that there would be two levels. The query executor would have a maximum number of queries that it would run that the query executor would enforce while the httpd service would have its own enforcement to prevent DDOS attacks. One would act on the listener while the other would act on the service and would have different performance characteristics and error messages.
Although I'm not sure if both would be needed in all honesty. I think just a single connection limit on the httpd listener would probably work best.
Rather than continuing to comment on a specific line, I'll continue here. Just for clarification, is the issue the number of active connections? Rejecting the connections puts the burden back on the client to retry; using a semaphore to limit concurrent processing of requests would allow requests to "build up" and then get processed as the server is able to.
I think limiting the connections through a semaphore, as you say, would leave us vulnerable to a DDOS from clients that don't behave correctly. The reason to close the connections immediately would be to avoid further damage from a DDOS; with a semaphore, nobody new could connect and the existing client connections could freeze the server. But to be fair, that's how PostgreSQL connections work, so we may be in good company.
(force-pushed from c901aa1 to cbc878f)
Latest version with numbers from
I wouldn't trust the benchmarking results completely, and the average write response time in the 5-connection-limit case looks very wrong. It doesn't look like it takes retries into account, and it likely counted a write whose connection was closed immediately as successful. Average query time, which I know does retries, shows the results better: the connection limit severely slowed down the result because of the need to continuously retry. Benchmarking:
I used channels since they seemed to be faster than mutexes and using the primitives in
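For contrast with the channel-based limiter, the mutex-guarded-counter alternative mentioned here might look like the following. This is a hypothetical sketch for comparison only, not code from the PR:

```go
package main

import "sync"

// mutexLimiter is the mutex-based equivalent of a channel semaphore:
// a counter guarded by a sync.Mutex. Sketch for comparison only.
type mutexLimiter struct {
	mu    sync.Mutex
	count int
	max   int
}

// tryAcquire reports whether a slot was free, incrementing the count if so.
func (l *mutexLimiter) tryAcquire() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.count >= l.max {
		return false
	}
	l.count++
	return true
}

// release frees one slot.
func (l *mutexLimiter) release() {
	l.mu.Lock()
	l.count--
	l.mu.Unlock()
}
```

One practical difference: the non-blocking try falls out naturally from `select`/`default` on a channel, while a blocking acquire on the mutex version would need an additional `sync.Cond` to wait on.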
(force-pushed from cbc878f to 6936017)
LGTM. @joelegasse should take a look as well.
👍 Clean and simple, LGTM. I'm not too worried about extreme performance in accepting a network connection, since the round-trip latency is several orders of magnitude greater than waiting on a channel operation.
Then I'll update this with a changelog, wait for green again, and then merge it.
The http connection limit is for any HTTP operation and is independent of the other connection limits. It should be set to a higher value than the query limit. The difference between this and the query limit is it will close out the connection immediately without any further processing. This is the equivalent of the `max_connections` option in PostgreSQL. Also removes some unused config options from the cluster config. Fixes #6559.
(force-pushed from 6936017 to 4fab68b)
The http connection limit is for any HTTP operation and is independent
of the other connection limits. It should be set to a higher value than
the query limit or the write limit. The difference between this and the
other connection limits is it will close out the connection immediately
without any further processing.
A max concurrent write limit has been added. This will prevent writes to
the underlying store if the number of writers exceeds the threshold.
Also removes some unused config options from the cluster config.
Fixes #6559.
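Based on this description, the resulting settings might appear in the HTTP section of the configuration roughly as follows. The option names below are guesses inferred from the commit message, not verified against the shipped config file:

```toml
[http]
  # Close any HTTP connection beyond this count immediately,
  # before any parsing or further processing (0 disables the limit).
  max-connection-limit = 0

  # Reject writes to the underlying store once this many writers
  # are already in flight (0 disables the limit).
  max-concurrent-write-limit = 0
```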