Add rate limiting of /api/annotations (#5423)
* Add rate limiting per user & POST /api/annotations

  Rate limit endpoints that are known to cause issues when over-requested, on a per-authorization-token basis. Use the authorization token instead of the IP address, since users from a university may share the same IP address and the bulk of h's day-to-day users are students.

  The main endpoint known to cause issues is /api/annotations:create. When over-requested it can overstress the db with create requests, causing long query times that ultimately hog the gunicorn worker time, which would otherwise be <1%.

* Add /api/badge and /assets rate limiting

  /api/badge accounts for a large portion of our traffic, but it's not a very valuable endpoint. Rather than prioritizing these requests by sending them to the server right away, deprioritize them by queueing them and sending 1 per second.

  /assets is an endpoint that is rarely touched, but when it is, it's hit a lot. Because of this, give it a larger-than-usual burst limit.

* Adjust the bursts towards allowing more requests

* Remove inherited response stat and add exact match

* Add custom 429 response & multiple zones

  Add a custom 429 JSON response for API requests. Add multiple zones so that quotas on one request don't impact another request. Re-using a zone means the queues are shared, which is not what we want, so make a different zone for each endpoint. Re-order the rate limits so that they follow an if/else format that's easier to read.

* Replace comments w/ calcs w/ general statements

  Replace the previous comments, which contained detailed calculations that may be system-specific, with more general statements about how each number was chosen at a high level. The following are the detailed calculations that were replaced:

  - The 95th percentile time for a badge request is .042s. 7.6% of worker time is spent handling these requests. Typical usage per user is around 50 rpm. Queue up badge requests rather than sending them directly to the server; this will allow other requests to take priority.

  - The 95th percentile time for an asset is .013s. The maximum burst of requests from a single page (https://hypothes.is/docs/help) is 20 requests. <1% of worker time is spent handling these requests. The maximum expected request rate is 25 rpm. Assume that in frustration the user hammers the refresh button 7 times in a row; worst case this results in a burst of 140 requests.

  - Each /api/annotations:create request has a 95th percentile response time of .56s, and there are 12 gunicorn workers per host. Create requests account for <1% of the traffic on the host, so assume .12 workers are allocated to /api/annotations:create requests. .12 workers * 1 request / .56s = .21 requests/s, or roughly 1 rps. If too many of these requests happen back to back they can overwhelm the database, so instead of letting a burst of requests pass to the server, queue them and only send 1 each second. Allow a user to queue up to 8 requests (8 times the expected rate).

  - A bot may burst up to 50 rpm, and the client issues 5 requests upon loading the sidebar. Assume a max request rate of 15 rps in a burst; this means the queue size would be 14 requests. Allow a user to sustain the max bursty request rate for 3 seconds. This means they can make up to 45 requests in one second, but only 1 new request each second after that.
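For readers unfamiliar with nginx's `limit_req` machinery, the mechanics described above (per-token keys, one zone per endpoint, queueing via `burst` without `nodelay`, and a custom 429 JSON body) might look roughly like the sketch below. The zone names, rates, burst sizes, and upstream name are illustrative assumptions, not the exact values shipped in this commit:

```nginx
http {
    # One zone per endpoint so quotas on one request don't impact another.
    # Key on the Authorization header rather than $binary_remote_addr,
    # since many users (e.g. a university) can share one IP address.
    limit_req_zone $http_authorization zone=api_annotations:1m rate=1r/s;
    limit_req_zone $http_authorization zone=api_badge:1m       rate=1r/s;
    limit_req_zone $http_authorization zone=assets:1m          rate=25r/m;

    server {
        location = /api/annotations {
            # No "nodelay": excess requests are queued (up to "burst")
            # and released at 1r/s, protecting the db from create bursts.
            limit_req zone=api_annotations burst=8;
            limit_req_status 429;
            error_page 429 = @rate_limited;   # custom JSON 429 for the API
            proxy_pass http://app;            # hypothetical upstream name
        }

        location = /api/badge {
            # Deprioritize badge requests: queue them, send 1 per second.
            limit_req zone=api_badge burst=14;
            limit_req_status 429;
            error_page 429 = @rate_limited;
            proxy_pass http://app;
        }

        location /assets/ {
            # Rarely touched, but bursty when it is: allow a large burst
            # through immediately ("nodelay") instead of queueing it.
            limit_req zone=assets burst=140 nodelay;
            proxy_pass http://app;
        }

        location @rate_limited {
            default_type application/json;
            return 429 '{"status": "failure", "reason": "too many requests"}';
        }
    }
}
```

The key design point is reusing `burst` for two different purposes: with `nodelay` it acts as a pure allowance for legitimate spikes (assets), while without it the excess requests sit in a queue and trickle to the upstream at the zone's rate (annotations, badge).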