feat(api): Add API rate limiting NestJS guard #4910

rifont · 2023-11-28T18:29:53Z

What change does this PR introduce?

Adds an API Rate Limit guard to throttle inbound API requests:
- Extends @nestjs/throttler
- Implements the guard as an interceptor, to access Auth guard context in the execution flow
- Adds a Cost decorator to use on controllers and methods
- Adds a Category decorator to use on controllers and methods
- Adds unit tests for single-cost endpoint use and multi-category use
Specifies the guard as an App Interceptor (rate limiting is currently toggled off by default).
Adds strong environment variable typings.

Why was this change needed?

API Rate Limiting increases API resiliency by limiting the number of requests that a single API client can make.
The guards will be used to implement variable-cost endpoints and variable-category rate limits.

Other information (Screenshots)

Note
E2E tests for bulk cost endpoints are included in a subsequent PR: #4911


rifont · 2023-11-28T18:52:19Z

apps/api/src/app.module.ts

+  {
+    provide: APP_INTERCEPTOR,
+    useClass: ApiRateLimitInterceptor,
+  },


Placing before the idempotency interceptor so that idempotent requests are still subject to rate limiting.

rifont · 2023-11-28T18:52:47Z

apps/api/src/app/rate-limiting/guards/throttler.decorator.ts

+export const ThrottlerCategory = Reflector.createDecorator<ApiRateLimitCategoryEnum>();
+
+// eslint-disable-next-line @typescript-eslint/naming-convention
+export const ThrottlerCost = Reflector.createDecorator<ApiRateLimitCostEnum>();


Custom decorators to specify custom costs and categories on both controllers and methods.
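To illustrate how controller-level and method-level values might combine, here is a minimal sketch in plain TypeScript. It does not use the NestJS Reflector; the `Category` and `Cost` unions, the default values, and the "method wins over controller" precedence are assumptions for illustration, not something this PR confirms.

```typescript
// Hypothetical stand-ins for ApiRateLimitCategoryEnum / ApiRateLimitCostEnum.
type Category = 'global' | 'trigger';
type Cost = 'single' | 'bulk';

interface ThrottleMetadata {
  category?: Category;
  cost?: Cost;
}

// Assumed defaults when neither the controller nor the method sets a value.
const DEFAULT_METADATA: Required<ThrottleMetadata> = { category: 'global', cost: 'single' };

// Assumed precedence: a method-level value overrides a controller-level value,
// which in turn overrides the default.
function resolveThrottleMetadata(
  method: ThrottleMetadata,
  controller: ThrottleMetadata
): Required<ThrottleMetadata> {
  return {
    category: method.category ?? controller.category ?? DEFAULT_METADATA.category,
    cost: method.cost ?? controller.cost ?? DEFAULT_METADATA.cost,
  };
}
```

With this shape, a controller could declare a category once while an individual bulk method only overrides the cost.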

rifont · 2023-11-28T18:53:24Z

apps/api/src/app/rate-limiting/guards/throttler.guard.e2e.ts

+        expectedRetryAfter: 5,
+        expectedThrottledRequests: 151, // Upstash algorithm currently limits 1 more request than it should
+      },
+    ];


Scenarios testing combinations of different endpoints being requested.

rifont · 2023-11-28T18:56:14Z

apps/api/src/app/rate-limiting/guards/throttler.guard.ts

@@ -0,0 +1,182 @@
+import {


This guard is adapted from the NestJS rate limit guard to support per-client rate limiting.

This allows us to use all the existing decorators for the guard, like @SkipThrottle for endpoints that don't need rate limiting.

rifont · 2023-11-28T18:57:54Z

apps/api/src/app/testing/rate-limiting.controller.ts

@@ -0,0 +1,94 @@
+import { ApiRateLimitCategoryEnum, ApiRateLimitCostEnum } from '@novu/shared';


Test controllers for rate limiting.

rifont · 2023-11-28T18:59:45Z

apps/api/src/types/env.d.ts

+      IS_API_IDEMPOTENCY_ENABLED: `${boolean}`;
+      FRONT_BASE_URL: string;
+      SENTRY_DSN: string;
+    }


Strongly typed process.env generated from TypeScript template literals!
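As a small sketch of what the template literal typing buys: `${boolean}` expands to the union `'true' | 'false'`, so a flag can only hold one of those two strings. The `TypedEnv` interface, the `isEnabled` helper, and the example values below are hypothetical and only mirror the fields shown in the diff.

```typescript
// `${boolean}` is a template literal type equal to 'true' | 'false'.
type BooleanString = `${boolean}`;

// Hypothetical subset of the typed environment, mirroring the fields in the diff.
interface TypedEnv {
  IS_API_IDEMPOTENCY_ENABLED: BooleanString;
  FRONT_BASE_URL: string;
  SENTRY_DSN: string;
}

// Converts the string flag into a real boolean.
function isEnabled(flag: BooleanString): boolean {
  return flag === 'true';
}

// Example values are placeholders, not real configuration.
const exampleEnv: TypedEnv = {
  IS_API_IDEMPOTENCY_ENABLED: 'true',
  FRONT_BASE_URL: 'http://localhost:4200',
  SENTRY_DSN: '',
};

// Assigning 'yes' to IS_API_IDEMPOTENCY_ENABLED would be rejected at compile time.
const idempotencyEnabled = isEnabled(exampleEnv.IS_API_IDEMPOTENCY_ENABLED);
```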

rifont · 2023-11-28T19:00:59Z

libs/dal/src/repositories/environment/environment.repository.ts

+    );
+
+    return await this.getApiKeys(environmentId);
+  }


This will be used to add custom rate limits per environment when enterprise customers need custom limits. It's also used for testing.

rifont · 2023-11-28T19:01:31Z

libs/dal/src/repositories/organization/organization.repository.ts

+      }
+    );
+  }
+


This will be used to set a custom API service level when customers upgrade to higher product tiers. It's only used for testing now.

rifont · 2023-11-28T19:01:56Z

libs/testing/src/user.session.ts

+
+  public async updateEnvironmentApiRateLimits(apiRateLimits: Partial<IApiRateLimitMaximum>) {
+    await this.environmentService.updateApiRateLimits(this.environment._id, apiRateLimits);
+  }


Adding methods to customise the organization service level and environment rate limit in tests.

rifont · 2023-11-28T19:02:56Z

packages/application-generic/src/services/auth/auth.guard.ts

+      const authScheme = authorizationHeader.split(' ')[0];
+      request.authScheme = authScheme;
+    }
+


This is a little ugly right now. It's required because we currently override the existing auth header when an API key is used to authenticate. Once we remove this override, we can let the downstream execution flow read the auth header again as normal.

It will be fixed in https://linear.app/novu/issue/NV-3055/🔐-fix-apikey-auth-guard-performance

I will make sure to prioritize NV-3055 for the next cycle

djabarovgeorge

Looks like an amazing job done. I left a couple of comments regarding things I was not sure about.

djabarovgeorge · 2023-11-28T21:23:17Z

apps/api/src/app/rate-limiting/guards/throttler.guard.ts

+}
+
+export const THROTTLED_EXCEPTION_MESSAGE = 'API rate limit exceeded';
+export const ALLOWED_AUTH_SCHEMES = ['ApiKey'];


Should we add Bearer to ALLOWED_AUTH_SCHEMES for JWT tokens? If I understand correctly, ApiRateLimitInterceptor is a global interceptor, meaning shouldSkip will be true for web clients?

Right now we are not planning to rate limit requests with a Bearer token, because those requests will come from the browser (unless someone spoofs an auth flow outside the browser) and browser-based activity is unlikely to cause extreme server load.

Keen to hear your thoughts on whether you think we should indeed be rate limiting Bearer Auth requests.
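For readers following the thread, a minimal sketch of the skip decision being discussed. Only `ALLOWED_AUTH_SCHEMES` and the `split(' ')[0]` scheme extraction come from the diff; the function names and the choice to skip requests without an Authorization header are assumptions.

```typescript
const ALLOWED_AUTH_SCHEMES = ['ApiKey'];

// Extracts the scheme (the first space-delimited token) from an Authorization header.
function getAuthScheme(authorizationHeader: string | undefined): string | undefined {
  if (!authorizationHeader) return undefined;
  return authorizationHeader.split(' ')[0];
}

// Hypothetical skip check: rate limiting applies only to allowed schemes,
// so Bearer (browser) requests and unauthenticated requests are skipped.
function shouldSkipRateLimit(authorizationHeader: string | undefined): boolean {
  const scheme = getAuthScheme(authorizationHeader);
  return scheme === undefined || !ALLOWED_AUTH_SCHEMES.includes(scheme);
}
```

Under these assumptions, adding `'Bearer'` to the list is the only change needed to bring JWT requests into scope, which is why the bucket-sharing question matters.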

@djabarovgeorge and I discussed this point. We agreed that unless there is a good reason not to limit Bearer token requests, we should limit their consumption.

However, we will need to add support for a custom cache prefix to be passed to the evaluate rate limit use-case to support multiple token buckets to consume from - it's not desirable for both authentication schemes to share the same bucket as this may be confusing to the user.

Given the additional work required and deviation from the initial scope, we should address this in a separate PR if we think we should limit Bearer token requests.
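A rough in-memory sketch of the "separate bucket per authentication scheme" idea, using a cache key prefix and a variable cost per request. This is not the evaluate rate limit use-case from the PR (which the e2e test comments suggest is backed by Upstash); the class, method names, and key format are hypothetical.

```typescript
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

class TokenBucketLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number; // maximum burst size
  private readonly refillPerSecond: number; // sustained refill rate

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // Tries to take `cost` tokens from the bucket keyed by prefix and client.
  // The prefix keeps ApiKey and Bearer consumption in separate buckets.
  tryConsume(prefix: string, clientId: string, cost: number, nowMs: number): boolean {
    const key = `${prefix}:${clientId}`;
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, lastRefillMs: nowMs };

    // Refill in proportion to the elapsed time, capped at capacity.
    const elapsedSeconds = Math.max(0, nowMs - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.lastRefillMs = nowMs;
    this.buckets.set(key, bucket);

    if (bucket.tokens < cost) return false; // throttled
    bucket.tokens -= cost;
    return true;
  }
}
```

Passing the current time in as `nowMs` keeps the sketch deterministic; a real implementation would read a shared clock and store buckets in a shared cache rather than process memory.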

@djabarovgeorge and @rifont I do not disagree that we should consider rate limiting Bearer requests, but implementing this in the full app will require a decent amount of time investment from the frontend and design side to ensure quality UX when this does happen.

One of the reasons we decided not to rate limit the UI in the research phase was that it could create a perception that Novu is not able to handle a user simply testing the system.
In addition, no user should ever be able to click the test button fast enough to even get close to our limits of 60 RPS, and if they are using an auto clicker, that is a whole different problem that could be solved in the UI instead of in the API.

I am happy to discuss this more; however, for the purposes of this PR, I think we should take this onto Notion to start writing next steps.

I think we should add a client rate limit with a separate bucket, as Richard said, to guard our system against things like:

- malicious actors that could retrieve the JWT token and automate requests
- misconfigured systems: for example, someone implementing the notification center (a JWT token user) could cause high-traffic requests through a bug or misconfiguration
- potentially other cases

But this should not be part of this PR/Cycle. We could benefit a lot from the first stage of the rate limit and prioritize the next stage accordingly.


djabarovgeorge · 2023-11-28T22:15:00Z

...t-api-rate-limit-service-maximum-config/get-api-rate-limit-service-maximum-config.usecase.ts

+          _environmentId: '*',
+          apiRateLimitCategory: '*',


What is the motivation behind the * character?

The intention here is to cater to the following scenario:

A Novu platform administrator makes changes to the default API rate limit service config environment variables. All cached maximum environment limits for all Organizations, for all Environments need to be invalidated. This cached entity supports fast lookups for an Environment's max limit.

When a change is detected (via the modified hash), invalidate all cached entities.

Update the hash of the service config.

This pattern could be used to support other platform service configuration variables that require cache invalidation. This will typically be the case when entities are cached for performance reasons, but still depend on platform defaults that can change. A good example here is for pulling platform configuration from an external source, like AWS SSM or Hashicorp Vault.
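The scenario above can be sketched roughly as follows. This is an in-memory illustration, not the use-case from the PR: the `ServiceConfigCache` class, its methods, and the FNV-1a fingerprint are all hypothetical, and clearing the whole map stands in for invalidating by the `*` wildcard key.

```typescript
// Small non-cryptographic FNV-1a hash, used only to fingerprint the config.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

class ServiceConfigCache {
  private configHash: string | undefined;
  private readonly maximums = new Map<string, number>();

  // Caches the computed maximum limit for one environment.
  setMaximum(environmentId: string, value: number): void {
    this.maximums.set(environmentId, value);
  }

  getMaximum(environmentId: string): number | undefined {
    return this.maximums.get(environmentId);
  }

  // Compares the config fingerprint with the stored one. On a change, drops
  // every cached maximum and records the new hash. Returns true if invalidated.
  syncConfig(config: Record<string, number>): boolean {
    const nextHash = fnv1a(JSON.stringify(config));
    if (nextHash === this.configHash) return false;
    this.maximums.clear();
    this.configHash = nextHash;
    return true;
  }
}
```

Note that `JSON.stringify` is sensitive to key order, so a real fingerprint would need a stable serialization of the config.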


Cliftonz · 2023-11-29T20:29:50Z

apps/api/src/app/rate-limiting/guards/throttler.guard.ts

+/**
+ * An interceptor is used instead of a guard to ensure that Auth context is available.
+ * This is currently necessary because we do not currently have a global guard configured for Auth,
+ * therefore the Auth context is not guaranteed to be available in the guard.


What would it take to switch this to a guard, and should we?

We should switch it to a guard, as a guard is a better descriptor of the effect that a rate limiter has on the API - a guard executes only at the beginning of the request execution flow and shouldn't run after, whereas an interceptor can run again on the response execution flow.

The following steps are required to enable guaranteed auth context, to switch this to a guard:

- Create a single, global auth guard that uses decorators to:
  - handle multiple Auth provider strategies (e.g. JWT, Google, Subscriber In-App read/seen use-case Auth)
  - handle controllers/paths that don't need Auth
  - create a contract of what downstream providers can expect from Auth context:
    - Auth security scheme (None, Bearer, ApiKey)
    - Strategy (JWT, Google, Subscriber)
    - User context (userId, environmentId, organizationId)
- Implement new Auth guard at a global level to guarantee Auth context is available to Throttler guard
- Modify ThrottlerGuard to be a guard, move ThrottlerGuard to execute after Auth guard
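One possible shape for the Auth context contract described in the steps above, as a sketch only. The type names, the optional fields, and the `rateLimitKey` helper are hypothetical; only the listed scheme, strategy, and user context values come from the comment.

```typescript
type AuthScheme = 'None' | 'Bearer' | 'ApiKey';
type AuthStrategy = 'JWT' | 'Google' | 'Subscriber';

interface UserContext {
  userId: string;
  environmentId: string;
  organizationId: string;
}

// What a downstream provider (such as a throttler guard) could rely on
// once a global auth guard has run.
interface AuthContext {
  scheme: AuthScheme;
  strategy?: AuthStrategy; // absent when scheme is 'None'
  user?: UserContext; // absent for unauthenticated paths
}

// Hypothetical consumer: derives a per-client rate limit key from the contract,
// or undefined when the request carries no auth context to key on.
function rateLimitKey(context: AuthContext): string | undefined {
  if (context.scheme === 'None' || !context.user) return undefined;
  return `${context.scheme}:${context.user.environmentId}`;
}
```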

After making these changes, we can remove a considerable number of individual controller-level and method-level use of the AuthGuard, providing better maintainability.

Given the sizeable scope and many controllers/methods touched, I think we should make these changes in a separate PR.

I agree. Can you please create a set of tasks for this in Linear so that I can prioritize these follow-up tasks?


rifont added 30 commits November 3, 2023 13:36

feat(dal, shared, api): Add DAL fields for rate limiting

748b194

test(api): Add tests for rate limit fields

4b47cce

test(api): Use enum for apiServiceLevel test assertion

75febc1

Merge branch 'next' into nv-3058-rate-limiting-dtos

1e06a6d

fix(dal): Use api prefix for rate limits to differentiate from other …

15035d0

…future rate limited protocols

fix(dal): Update category enum to also include api prefix

81ef5b7

fix(dal): Make apiRateLimits subdocument optional

d9d86d9

feat(shared): Add API rate limiting cache key builder

05962ea

fix(dal): Add fallback unlimited tier

975df32

Merge branch 'nv-3058-rate-limiting-dtos' into nv-3059-get-rate-limit…

e17ef8e

…-use-case

feat(shared): Add rate limiting constants

7618b62

feat(api): Add get rate limit use case

1d5d7db

fix(api): Fix import path

15324a5

test(application-generic): Refactor mock cache service into separate …

642b046

…file

test(api): Add unit tests for get-api-rate-limit use-case

acab424

fix(api): Remove unused LOG_CONTEXT declaration in get-api-rate-limit…

167631f

… use case

feat(api): Add rate limiting module, add get-default-api-rate-limit u…

9cde897

…se-case

feat(shared): Add types for env var format and platform rate limit map

5b8a128

refactor(api): Refactor get-api-rate-limit use-case to use the get-de…

9f9cd55

…fault-api-rate-limits use-case

Merge branch 'next' into nv-3059-get-rate-limit-use-case

72cf83b

refactor(api, shared): Rename api rate limiting interface for descrip…

e5e7357

…tiveness

refactor(api): Rename get-api-rate-limit use-case helper method for c…

e2a40cd

…onsistency

Merge branch 'nv-3059-get-rate-limit-use-case' of ssh://github.com/no…

17a3bfb

…vuhq/novu into nv-3059-get-rate-limit-use-case

feat(api): Add logging to get-api-rate-limit use case

de803e3

fix(shared): Add missing newline

d8de11e

fix(api): Typo

e746b41

fix(api): Removed unused import in get-api-rate-limit use-case

ad7e156

refactor(api, application-generic): Rename max api rate limit cache key

06ce24b

feat(application-generic): Add evaluate api rate limit cache key builder

c065cf0

fix(shared): Remove redundant import rename

310b877

rifont added 2 commits November 28, 2023 18:51

fix(api): Use rate limiter before idempotency interceptor

b625403

fix(api): Add comment on nestjs throttler config

ef45dea

rifont commented Nov 28, 2023


rifont added 3 commits November 28, 2023 19:18

test(api): update test

69360ed

test(api): Add tolerance for throttled count

ac45a0e

test(api): Fix tolerance for upstash

3dc6a9d

Base automatically changed from nv-3060-token-bucket-rate-limiting-use-case to next November 28, 2023 19:29

fix(api): Typo

fab4042

github-actions bot added the @novu/shared label Nov 28, 2023

djabarovgeorge reviewed Nov 28, 2023


Merge branch 'next' into nv-3061-rate-limiting-nestjs-guard

ff165e3

github-actions bot removed the @novu/shared label Nov 29, 2023

rifont added 4 commits November 29, 2023 09:38

fix(dal): Fix updateApiRateLimits return value

7edb9eb

fix(api): Auto-generate name prefix

c9ae745

fix(api): Use invalidate by key instead of query

650eb3a

fix(api): Remove redundant import

963a041

rifont commented Nov 29, 2023


rifont added 3 commits November 29, 2023 11:04

fix(api): Fix cache invalidation test

8781404

Merge branch 'next' into nv-3061-rate-limiting-nestjs-guard

7d63579

fix(api): Fix typo

6af53c1

rifont mentioned this pull request Nov 29, 2023

feat(api): Apply rate limit decorators to api controllers and methods #4915

Merged

rifont added 4 commits November 29, 2023 14:26

fix(api): Add separate before statements for unit and e2e tests

fd05a99

test(api): Use regex for variable policy header values

8d53cf2

fix(api): Toggle launch darkly off to allow test to define FF state

588933c

fix(api): Fix launch darkly toggle off

d987263

Cliftonz approved these changes Nov 29, 2023


rifont added 2 commits December 4, 2023 23:17

Merge branch 'next' into nv-3061-rate-limiting-nestjs-guard

001a40a

fix(api): Increase error tolerance on rate limiting to reduce test fl…

828b9ad

…akiness

rifont merged commit d785623 into next Dec 5, 2023
25 of 26 checks passed

rifont deleted the nv-3061-rate-limiting-nestjs-guard branch December 5, 2023 07:16
