Add rate limiting of /api/annotations (#5423)
* Add rate limiting per user & POST /api/annotations

  Rate limit endpoints that are known to cause issues when over-requested, on a per-authorization-token basis. Use the authorization token instead of the IP address, since users from a university may share the same IP address and the bulk of h's day-to-day users are students.

  The main endpoint known to cause issues is /api/annotations:create. When over-requested it can overstress the db with create requests, causing long query times that ultimately hog the gunicorn worker time, which would otherwise be <1%.

* Add /api/badge and /assets rate limiting

  /api/badge accounts for a large portion of our traffic, but it's not a very valuable endpoint. Rather than prioritizing these requests by sending them to the server right away, deprioritize them by queueing them and sending 1 per second.

  /assets is an endpoint that is rarely touched, but when it is, it's hit a lot. Because of this, give it a larger-than-usual burst limit.

* Adjust the bursts towards allowing more requests

* Remove inherited response stat and add exact match

* Add custom 429 response & multiple zones

  Add a custom 429 JSON response for API requests. Add multiple zones so that quotas on one request don't impact another request. Re-using a zone means the queues are shared, which is not what we want, so make a different zone for each endpoint. Re-order the rate limits so that they follow an if/else format that's easier to read.

* Replace comments w/ calcs w/ general statements

  Replace the previous comments, which contained detailed calculations that may be system-specific, with more general statements about how each number was chosen at a high level. The following are the detailed calculations that were replaced:

  - The 95th percentile time for a badge request is .042s. 7.6% of worker time is spent handling these requests. Typical usage per user is around 50 rpm. Queue up badge requests rather than sending them directly to the server; this will allow other requests to take priority.

  - The 95th percentile time for an asset is .013s. The maximum burst of requests from a single page (https://hypothes.is/docs/help) is 20 requests. <1% of worker time is spent handling these requests. The maximum expected request rate is 25 rpm. Assume that in frustration the user hammers the refresh button 7 times in a row; worst case this results in a burst of 140 requests.

  - Each /api/annotations:create request has a 95th percentile response time of .56s, and there are 12 gunicorn workers per host. Create requests account for <1% of the traffic on the host, so assume .12 workers are allocated to /api/annotations:create requests. .12 workers * 1 request / .56s = .21 requests/s, or roughly 1 rps. If too many of these requests happen back to back they can overwhelm the database, so instead of letting a burst of requests pass to the server, queue them and only send 1 each second. Allow a user to queue up to 8 requests (8 times the expected rate).

  - A bot may burst up to 50 rpm, and the client issues 5 requests upon loading the sidebar. Assume a max request rate of 15 rps in a burst; this means the queue size would be 14 requests. Allow a user to sustain the max bursty request rate for 3 seconds. This means they can make up to 45 requests in one second, but only 1 new request each second after that.
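For readers unfamiliar with nginx's `limit_req` machinery, the mechanics described above (per-token keys, one zone per endpoint, queueing via `burst` without `nodelay`, and a custom 429 JSON body) might look roughly like the sketch below. The zone names, rates, burst sizes, and upstream name are illustrative assumptions, not the exact values shipped in this commit:

```nginx
http {
    # One zone per endpoint so quotas on one request don't impact another.
    # Key on the Authorization header rather than $binary_remote_addr,
    # since many users (e.g. a university) can share one IP address.
    limit_req_zone $http_authorization zone=api_annotations:1m rate=1r/s;
    limit_req_zone $http_authorization zone=api_badge:1m       rate=1r/s;
    limit_req_zone $http_authorization zone=assets:1m          rate=25r/m;

    server {
        location = /api/annotations {
            # No "nodelay": excess requests are queued (up to "burst")
            # and released at 1r/s, protecting the db from create bursts.
            limit_req zone=api_annotations burst=8;
            limit_req_status 429;
            error_page 429 = @rate_limited;   # custom JSON 429 for the API
            proxy_pass http://app;            # hypothetical upstream name
        }

        location = /api/badge {
            # Deprioritize badge requests: queue them, send 1 per second.
            limit_req zone=api_badge burst=14;
            limit_req_status 429;
            error_page 429 = @rate_limited;
            proxy_pass http://app;
        }

        location /assets/ {
            # Rarely touched, but bursty when it is: allow a large burst
            # through immediately ("nodelay") instead of queueing it.
            limit_req zone=assets burst=140 nodelay;
            proxy_pass http://app;
        }

        location @rate_limited {
            default_type application/json;
            return 429 '{"status": "failure", "reason": "too many requests"}';
        }
    }
}
```

The key design point is reusing `burst` for two different purposes: with `nodelay` it acts as a pure allowance for legitimate spikes (assets), while without it the excess requests sit in a queue and trickle to the upstream at the zone's rate (annotations, badge).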