Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): Add API rate limiting NestJS guard #4910

Merged
merged 191 commits into from Dec 5, 2023

Conversation

rifont
Copy link
Contributor

@rifont rifont commented Nov 28, 2023

What change does this PR introduce?

  • Adds an API Rate Limit guard to throttle inbound API requests
    • Extended from @nestjs/throttler
    • Implements the guard as an interceptor, to access Auth guard context in execution flow
    • Add Cost decorator to use on controllers and methods
    • Add Category decorator to use on controllers and methods
    • Unit test for single-cost endpoint use and multi-category
  • Specifies the guard as an App Interceptor (Rate Limiting is currently toggled off by default)
  • Add strong environment variable typings

Why was this change needed?

API Rate Limiting increases API resiliency by limiting the number of requests that a single API client can make.
The guards will be used to implement variable-cost endpoints and variable-category rate limits.

Other information (Screenshots)

Note
E2E tests for bulk cost endpoints are included in a subsequent PR: #4911

Tests
image

…vuhq/novu into nv-3059-get-rate-limit-use-case
{
provide: APP_INTERCEPTOR,
useClass: ApiRateLimitInterceptor,
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placing before the idempotency interceptor so that idempotent requests are still subject to rate limiting.

export const ThrottlerCategory = Reflector.createDecorator<ApiRateLimitCategoryEnum>();

// eslint-disable-next-line @typescript-eslint/naming-convention
export const ThrottlerCost = Reflector.createDecorator<ApiRateLimitCostEnum>();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom decorators to specify custom costs and categories on both controllers and methods.

expectedRetryAfter: 5,
expectedThrottledRequests: 151, // Upstash algorithm currently limits 1 more request than it should
},
];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scenarios testing combinations of different endpoints being requested.

@@ -0,0 +1,182 @@
import {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guard is adapted from the NestJS rate limit guard to support per-client rate limiting.

This allows us to use all the existing decorators for the guard, like @SkipThrottle for endpoints that don't need rate limiting.

@@ -0,0 +1,94 @@
import { ApiRateLimitCategoryEnum, ApiRateLimitCostEnum } from '@novu/shared';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test controllers for rate limiting.

IS_API_IDEMPOTENCY_ENABLED: `${boolean}`;
FRONT_BASE_URL: string;
SENTRY_DSN: string;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly typed process.env generated from Typescript template literals!
image
image

);

return await this.getApiKeys(environmentId);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be used to add custom rate limits per environment when enterprise customers need custom limits. It's also used for testing.

}
);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be used to set a custom API service level when customers upgrade to higher product tiers. It's only used for testing now.


public async updateEnvironmentApiRateLimits(apiRateLimits: Partial<IApiRateLimitMaximum>) {
await this.environmentService.updateApiRateLimits(this.environment._id, apiRateLimits);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding methods to customise the organization service level and environment rate limit in tests.

const authScheme = authorizationHeader.split(' ')[0];
request.authScheme = authScheme;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little ugly right now. It's required because we currently override the existing auth header when an API key is used to authenticate. Once we remove this override, we can let the downstream execution flow read the auth header again as normal.

It will be fixed in https://linear.app/novu/issue/NV-3055/🔐-fix-apikey-auth-guard-performance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make sure to prioritize NV-3055 for the next cycle

Base automatically changed from nv-3060-token-bucket-rate-limiting-use-case to next November 28, 2023 19:29
Copy link
Contributor

@djabarovgeorge djabarovgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an amazing job done, left a couple of comments regarding stuff i was not sure about.

}

export const THROTTLED_EXCEPTION_MESSAGE = 'API rate limit exceeded';
export const ALLOWED_AUTH_SCHEMES = ['ApiKey'];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add Bearer to ALLOWED_AUTH_SCHEMES for JWT token? If i got it right ApiRateLimitInterceptor is a global interceptor meaning shouldSkip will be true for web clients?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we are not planning to rate limit requests with a Bearer token, because those requests will come from the browser (unless someone spoofs an auth flow outside the browser) and browser-based activity is unlikely to cause extreme server load.

Keen to hear your thoughts on whether you think we should indeed be rate limiting Bearer Auth requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djabarovgeorge and I discussed this point, we agreed that unless there is a good reason not to limit Bearer token requests, we should limit their consumption.

However, we will need to add support for a custom cache prefix to be passed to the evaluate rate limit use-case to support multiple token buckets to consume from - it's not desirable for both authentication schemes to share the same bucket as this may be confusing to the user.

Given the additional work required and deviation from the initial scope, we should address this in a separate PR if we think we should limit Bearer token requests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djabarovgeorge and @rifont I do not disagree that we should consider rate limiting the Bearer implementing this in the full app will require a decent amount of time investment from the Frontend and the design side to insure quality UX when this does happen.

One of the reasons that we decided to not rate limit the UI in the research phase of this was that it could create a perception that Novu is not able to handle a user symply testing the system.
In addition, no user should ever be able to click the test button fast enough to even get close to our limits of 60RPS and if they are using an auto clicker then that is a whole different problem that could be solved on the UI instead of on the api.

I am happy to discuss this more, however for the proposes of this PR. I think we should take this onto Notion to start writing next steps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a client rate limit with a separate bucket as Richard said in order to guard our system from things like:

  • malicious actors that could retrieve the jwt token and automate requests.
  • misconfigured systems - could be someone that implements the notification center (jwt token user), which could cause high-traffic requests by bug or misconfigured.

there could be potential other cases.

But this should not be part of this PR/Cycle, we could benefit a lot from the first stage of the rate limit and prioritize the next stage accordingly.

Comment on lines 44 to 45
_environmentId: '*',
apiRateLimitCategory: '*',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the motivation behind the * character?

Copy link
Contributor Author

@rifont rifont Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here is to cater to the following scenario:

  1. A Novu platform administrator makes changes to the default API rate limit service config environment variables. All cached maximum environment limits for all Organizations, for all Environments need to be invalidated. This cached entity supports fast lookups for an Environment's max limit.
  2. When a change is detected (via the modified hash), invalidate all cached entities.
  3. Update the hash of the service config

This pattern could be used to support other platform service configuration variables that require cache invalidation. This will typically be the case when entities are cached for performance reasons, but still depend on platform defaults that can change. A good example here is for pulling platform configuration from an external source, like AWS SSM or Hashicorp Vault.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing!

/**
* An interceptor is used instead of a guard to ensure that Auth context is available.
* This is currently necessary because we do not currently have a global guard configured for Auth,
* therefore the Auth context is not guaranteed to be available in the guard.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would it take to switch this to a guard? and should we?

Copy link
Contributor Author

@rifont rifont Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should switch it to a guard, as a guard is a better descriptor of the effect that a rate limiter has on the API - a guard executes only at the beginning of the request execution flow and shouldn't run after, whereas an interceptor can run again on the response execution flow.

The following steps are required to enable guaranteed auth context, to switch this to a guard:

  1. Create a single, global auth guard that uses decorators to:
    1. handle multiple Auth provider strategies (e.g. JWT, Google, Subscriber In-App read/seen use-case Auth)
    2. handle controllers/paths that don't need Auth
    3. create a contract of what downstream providers can expect from Auth context:
      1. Auth security scheme (None, Bearer, ApiKey)
      2. Strategy (JWT, Google, Subscriber)
      3. User context (userId, environmentId, organizationId)
  2. Implement new Auth guard at a global level to guarantee Auth context is available to Throttler guard
  3. Modify ThrottlerGuard to be a guard, move ThrottlerGuard to execute after Auth guard

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After making these changes, we can remove a considerable number of individual controller-level and method-level use of the AuthGuard, providing better maintainability.

Given the sizeable scope and many controllers/methods touched, I think we should make these change in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, Can you please create a set of tasks to do this in linear so that way I can prioritize this follow-up tasks?

@rifont rifont merged commit d785623 into next Dec 5, 2023
25 of 26 checks passed
@rifont rifont deleted the nv-3061-rate-limiting-nestjs-guard branch December 5, 2023 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants