feat(api): performance service #2878

p-fernandez · 2023-02-21T13:36:49Z

What change does this PR introduce?

Implements a service that gathers performance data and Node.js internals data for certain areas of the API.
Also adds a performance test to measure the duration of different executions under different loads.

Why was this change needed?

It will help us study the performance of the sensitive parts of Trigger Event and the overhead of the distributed locking mechanism. It can also lay the groundwork for load performance test suites (in a way).

Other information (Screenshots)

SebastianStehle · 2023-02-21T14:36:05Z

Why are you not just using OTLP?

p-fernandez · 2023-02-23T08:28:30Z

Why are you not just using OTLP?

I guess you are referring to OpenTelemetry; if not, I don't know what you might be talking about. Acronyms are hard.

To be honest, I didn't think about it because this had the clear purpose of measuring execution times and was not meant to export data. It is mostly a quick tool for @tsssdev and his request for a way to measure Trigger Event execution performance.
@Cliftonz is working on general analytics, so I don't want to step on his work. If he decides to integrate OpenTelemetry, this perf_hooks implementation could easily be replaced by it downstream.

SebastianStehle · 2023-02-23T13:09:19Z

Yes, I am talking about OpenTelemetry.

The most important primitive of OpenTelemetry is the span. It represents a unit of work, and you annotate your code to track spans: for example method calls, DB calls, HTTP requests and so on. Spans carry correlation IDs that reference the parent span, and there is a protocol for adding these correlation IDs to HTTP requests for distributed tracing.

Most implementations work like this:

You have a way to create spans.
You define one or many outputs (exporters), e.g. NewRelic, OTLP (Open Telemetry protocol) and so on.
You define one or more processors, e.g. to enrich the spans or to filter them.

So what I would do:

Configure Open Telemetry.
Annotate your code with custom spans.
Write a custom exporter that provides the performance data (e.g. for tests).

You can then get rid of a lot of code and the performance data can also be used for self hosting, e.g. in Azure Application Insights or Google Stackdriver and so on.
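To make the span, processor and exporter roles described above concrete, here is a minimal sketch. It is a hand-rolled model, not the OpenTelemetry API: `SpanRecord`, `SpanProcessor`, `SpanExporter`, `InMemoryExporter` and `SketchTracer` are all hypothetical names chosen for illustration, and the injectable `now` clock is an assumption to keep the example deterministic.

```typescript
// Illustrative model only. Every type and class here is hypothetical and is
// NOT the OpenTelemetry API; it only mirrors the span/processor/exporter roles.
interface SpanRecord {
  id: string;
  parentId: string | undefined; // correlation ID of the parent span, if any
  name: string;
  startMs: number;
  endMs: number;
  attributes: Record<string, string>;
}

// A processor may enrich a span, or return undefined to filter it out.
type SpanProcessor = (span: SpanRecord) => SpanRecord | undefined;

interface SpanExporter {
  exportSpan(span: SpanRecord): void;
}

// Exporter that keeps spans in memory, e.g. so a test can read the durations.
class InMemoryExporter implements SpanExporter {
  readonly spans: SpanRecord[] = [];
  exportSpan(span: SpanRecord): void {
    this.spans.push(span);
  }
}

class SketchTracer {
  private nextId = 1;
  private readonly processors: SpanProcessor[];
  private readonly exporters: SpanExporter[];
  private readonly now: () => number;

  constructor(processors: SpanProcessor[], exporters: SpanExporter[], now: () => number = Date.now) {
    this.processors = processors;
    this.exporters = exporters;
    this.now = now;
  }

  // Runs fn inside a span, then hands the finished span to the pipeline.
  withSpan<T>(name: string, fn: (spanId: string) => T, parentId?: string): T {
    const id = String(this.nextId++);
    const startMs = this.now();
    try {
      return fn(id);
    } finally {
      this.finish({ id, parentId, name, startMs, endMs: this.now(), attributes: {} });
    }
  }

  // Applies processors in order, then sends the surviving span to every exporter.
  private finish(span: SpanRecord): void {
    let current: SpanRecord | undefined = span;
    for (const processor of this.processors) {
      current = processor(current);
      if (current === undefined) return; // span was filtered out
    }
    for (const exporter of this.exporters) exporter.exportSpan(current);
  }
}

// Usage: one enriching processor, one filtering processor, one exporter.
const exporter = new InMemoryExporter();
const enrich: SpanProcessor = (span) => ({ ...span, attributes: { ...span.attributes, service: "api" } });
const dropHealthChecks: SpanProcessor = (span) => (span.name === "healthcheck" ? undefined : span);
const tracer = new SketchTracer([enrich, dropHealthChecks], [exporter]);

tracer.withSpan("triggerEvent", (parentId) => {
  tracer.withSpan("db.query", () => undefined, parentId); // child span
});
tracer.withSpan("healthcheck", () => undefined); // dropped by the filter
```

The child span finishes first, so it reaches the exporter before its parent, and the `healthcheck` span never reaches it. A real setup would use the OpenTelemetry SDK's own tracer, processors and exporters rather than these stand-ins.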

Cliftonz · 2023-02-23T23:49:13Z

@p-fernandez I agree, I think this will be fine for now.

However, in the long run I agree with @SebastianStehle that we should adopt OpenTelemetry. Doing this in the trigger engine and mail parser is much harder than in the Nest-based apps, for which there are easy open-source modules:

https://github.com/pragmaticivan/nestjs-otel

@SebastianStehle I will add OpenTelemetry to our backlog.
This would also be accompanied by setting up Prometheus metrics.

LetItRock · 2023-02-24T11:57:29Z

apps/api/src/app/events/services/performance-service/index.ts

+  public buildDigestFilterStepsMark(
+    transactionId: string,
+    templateId: string,
+    notificationId: string,
+    subscriberId: string
+  ): IMark {
+    const mark = {
+      id: `${MarkFunctionNameEnum.DIGEST_FILTER_STEPS}:event:${transactionId}:template:${templateId}:notification:${notificationId}:subscriber:${subscriberId}:steps`,
+    };


Why do we need to pass the arguments here if we only calculate the averages by the MarkFunctionNameEnum?
Couldn't we just generate a random ID, if concurrent requests are the problem here?

p-fernandez · 2023-02-24

Those are needed to have individual marks and to help debug different executions if needed. Because the data is accumulated, I needed to make each mark unique; otherwise it would be overwritten every time the function was executed. I saw your mention of generating a random UUID, but to be honest that would make it harder to debug any of the executions.
Right now the publishing of the individual marks is disabled, as it was cluttering the log output and we don't need it.

LetItRock · 2023-03-06

Wouldn't a request-scoped random ID be the solution here?
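The trade-off in this thread can be sketched as follows: marks get a unique, descriptive ID so concurrent executions do not overwrite each other, while averages are still grouped by function name. This is a sketch under stated assumptions, not the PR's code: `MarkRecord` and `MarkRegistry` are hypothetical names, and an injectable clock stands in for the perf_hooks timing the PR actually uses.

```typescript
// Hypothetical sketch; the PR's real service is built on Node's perf_hooks.
interface MarkRecord {
  id: string; // unique per execution, readable when debugging
  functionName: string; // key used for aggregation
  startMs: number;
}

class MarkRegistry {
  private readonly open = new Map<string, MarkRecord>();
  private readonly durations = new Map<string, number[]>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // The ID embeds the function name plus request-specific parts, so two
  // concurrent executions of the same function get different entries.
  start(functionName: string, ...parts: string[]): MarkRecord {
    const mark: MarkRecord = {
      id: [functionName, ...parts].join(":"),
      functionName,
      startMs: this.now(),
    };
    this.open.set(mark.id, mark);
    return mark;
  }

  // Closes the mark and accumulates its duration under the function name.
  end(mark: MarkRecord): number {
    const durationMs = this.now() - mark.startMs;
    this.open.delete(mark.id);
    const samples = this.durations.get(mark.functionName) ?? [];
    samples.push(durationMs);
    this.durations.set(mark.functionName, samples);
    return durationMs;
  }

  openIds(): string[] {
    return Array.from(this.open.keys());
  }

  average(functionName: string): number | undefined {
    const samples = this.durations.get(functionName);
    if (samples === undefined || samples.length === 0) return undefined;
    return samples.reduce((sum, d) => sum + d, 0) / samples.length;
  }
}

// Usage with a fake clock so the numbers are deterministic.
let clockMs = 0;
const registry = new MarkRegistry(() => clockMs);

const first = registry.start("digestFilterSteps", "event", "tx-1", "subscriber", "s-1");
clockMs = 5;
const second = registry.start("digestFilterSteps", "event", "tx-2", "subscriber", "s-2");
const openWhileRunning = registry.openIds().length; // both marks coexist

clockMs = 10;
const firstDuration = registry.end(first); // started at 0, ended at 10
clockMs = 25;
const secondDuration = registry.end(second); // started at 5, ended at 25
const averageMs = registry.average("digestFilterSteps");
```

Swapping the descriptive parts for a random or request-scoped ID would keep the uniqueness property; what is lost is the ability to tell from the ID alone which event and subscriber an individual mark belongs to.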

LetItRock · 2023-02-24T12:00:43Z

apps/api/src/app/events/events.controller.ts

@@ -69,6 +71,8 @@ export class EventsController {
    @UserSession() user: IJwtPayload,
    @Body() body: TriggerEventRequestDto
  ): Promise<TriggerEventResponseDto> {
+    const mark = this.performanceService.buildEndpointTriggerEventMark(body.transactionId as string);


Maybe we should hold all this logic in a decorator? I can imagine it like this:

@MeasurePerformance('EventsController.trackEvent') async trackEvent(

Then I don't have to deal with the service.

Cliftonz · 2023-03-06

@LetItRock This is something nest-otel can cover for us.
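The idea of moving the timing out of the controller body can be sketched with a plain wrapper function, which a decorator such as the suggested `@MeasurePerformance` could delegate to. Everything here is hypothetical: `measurePerformance`, the `Recorder` callback and the stand-in handler are illustrative names, and this is not how nest-otel or the merged service implements it.

```typescript
// Hypothetical sketch of a generic timing wrapper; not the PR's implementation.
type Recorder = (label: string, durationMs: number) => void;

// Wraps an async function so every call is timed and reported to `record`,
// whether the call resolves or rejects.
function measurePerformance<A extends unknown[], R>(
  label: string,
  fn: (...args: A) => Promise<R>,
  record: Recorder,
  now: () => number = Date.now
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const startMs = now();
    try {
      return await fn(...args);
    } finally {
      record(label, now() - startMs);
    }
  };
}

// Usage: the handler body no longer deals with marks or the service.
const recorded: Array<{ label: string; durationMs: number }> = [];

const trackEvent = measurePerformance(
  "EventsController.trackEvent",
  async (transactionId: string) => ({ acknowledged: true, transactionId }), // stand-in handler
  (label, durationMs) => {
    recorded.push({ label, durationMs });
  }
);
```

Calling `trackEvent("tx-1")` resolves to the handler's result and appends one entry to `recorded`. The wrapper keeps the call site free of measurement code, which is the main benefit the decorator suggestion is after.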

LetItRock · 2023-02-24T12:08:10Z

apps/api/src/app/events/services/performance-service/index.ts

+}
+
+@Injectable()
+export class EventsPerformanceService {


I don't think we need to create an events-specific performance service. I feel like all of this could be generic code that can be reused in any context. That way we won't end up writing a performance service per context. So I see it working with the decorator, as I mentioned before.

The publishResults function can be defined as a helper function that is called from anywhere we need to get the stats.

p-fernandez · 2023-02-24

I try to think of the different modules as if each were its own microservice, so making this a domain-related service allows us to:

Keep the specific mark generation isolated in the domain.

Have its own kinds of calculations and decide what to publish and what not (in a different domain we might want different calculations: another algorithm instead of the average, different published results, etc.).

That doesn't mean it is going to happen or that it will be reused elsewhere, so I can understand why you are suggesting something more generic.

LetItRock · 2023-03-06

I don't know; to me it would be more to maintain... but yeah, maybe let's start with something simple and then adjust to new requirements in the future...

LetItRock · 2023-02-24T12:19:18Z

apps/api/src/app/events/services/performance-service/index.ts

+    return this.setStart(mark);
+  }
+
+  public buildEndpointTriggerEventMark(transactionId: string): IMark {


None of these will be needed if we create a generic function with a mark like:

const mark = { id: `${functionName}:${uuid}`, };

p-fernandez · 2023-02-24

Addressing this comment here: #2878 (comment)

tsssdev · 2023-02-25T05:47:31Z

@p-fernandez How do we generate load? Do we have load testing infrastructure in place? Only under decent to high load can we see real issues with or without locking. If we have a Dev/QA type of environment that can take a decent load, then we can implement quick load tests with k6 (https://github.com/grafana/k6) or a similar tool.

Cliftonz · 2023-02-27T18:57:06Z

@tsssdev Our test bed is dev.web.novu.co, and we are looking to add k6 load tests to the API e2e test folder. If you want to create a PR for that, we would be more than happy to look at it and merge it.

Cliftonz

I think this is a good start; let's get it merged and we can go from here.

Co-authored-by: Paweł Tymczuk <LetItRock@users.noreply.github.com>

p-fernandez requested review from davidsoderberg, LetItRock, ainouzgali, scopsy, BiswaViraj and djabarovgeorge February 21, 2023 13:36

p-fernandez self-assigned this Feb 21, 2023

github-actions bot added the @novu/api label Feb 21, 2023

LetItRock reviewed Feb 24, 2023

p-fernandez force-pushed the feat-performance-service branch from 77f524f to 70cc09c Compare February 24, 2023 15:16

LetItRock self-requested a review March 6, 2023 17:23

LetItRock approved these changes Mar 6, 2023

Cliftonz approved these changes Mar 6, 2023

p-fernandez and others added 4 commits March 9, 2023 16:58

feat(api): performance service

e4043f7

Update apps/api/src/app/events/services/performance-service/index.ts

51c56f3

Co-authored-by: Paweł Tymczuk <LetItRock@users.noreply.github.com>

Update apps/api/src/app/events/services/performance-service/index.ts

fc263fa

Co-authored-by: Paweł Tymczuk <LetItRock@users.noreply.github.com>

feat(api): pr fixes

e890ef6

p-fernandez force-pushed the feat-performance-service branch from 9198951 to e890ef6 Compare March 9, 2023 16:58

p-fernandez added this pull request to the merge queue Mar 9, 2023

Merged via the queue into next with commit f827a89 Mar 9, 2023

p-fernandez deleted the feat-performance-service branch March 9, 2023 17:06

Cliftonz mentioned this pull request Mar 11, 2023

feat(otel/analytics): Add Open-Telemetry and Promethus Montioring to Novu #3000

Closed

AliaksandrRyzhou mentioned this pull request Dec 21, 2023

feat(pkg): Add Open-Telemetry and Prometheus Montioring to Novu #5014

Merged
