Add OpenTelemetry support for Koa template #706
Conversation
template/koa-rest-api/src/tracing.ts
Outdated
return sdk.start();
};

const log = (level: string, msg: string, extra = {}) => {
I'm hesitant to bring in the standard logger as it might bring in more dependencies before the internals are instrumented. This lib is really meant to be self-contained.
Thoughts?
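A dependency-free alternative could look something like the following — a minimal sketch, not code from this PR; `formatLine` is a hypothetical helper name:

```typescript
// Minimal sketch of a self-contained logger for the tracing bootstrap,
// avoiding @seek/logger so no extra dependencies load before they are
// instrumented. `formatLine` is a hypothetical helper name.
const formatLine = (
  level: string,
  msg: string,
  extra: Record<string, unknown> = {},
): string =>
  JSON.stringify({ level, msg, timestamp: new Date().toISOString(), ...extra });

const log = (level: string, msg: string, extra: Record<string, unknown> = {}): void => {
  // write straight to stdout; no transports, no child loggers
  process.stdout.write(`${formatLine(level, msg, extra)}\n`);
};

log('info', 'OpenTelemetry tracing initialised');
```

Keeping it to `JSON.stringify` plus `process.stdout.write` means the bootstrap file pulls in nothing beyond Node built-ins.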
@@ -1,5 +1,3 @@
import './register';
It's loaded in app.ts
I think this is here to make it obvious (to humans but also to e.g. linters) that the import has side effects and must go first. It's easy to shoot yourself in the foot by importing something else before ./app, which is part of the reason why I'd like to get rid of this hook soon™️.
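The ordering hazard can be simulated without the real modules — a self-contained sketch where `register` and `loadApp` stand in for the template's `./register` and `./app`, neither of which is real template code:

```typescript
// Self-contained simulation of side-effect import ordering. `register`
// stands in for './register' (which patches module loading) and `loadApp`
// for './app' (which captures behaviour at load time). Both are made up
// for illustration.
const patchedModules: string[] = [];

const register = (): void => {
  // side effect: in the real code this hooks require() for instrumentation
  patchedModules.push('http');
};

const loadApp = (): string[] =>
  // the app sees only whatever was patched *before* it was loaded
  [...patchedModules];

register(); // './register' must run first...
const seenByApp = loadApp(); // ...or the app loads uninstrumented modules
```

If the two calls were swapped, `seenByApp` would be empty — which is exactly the foot-gun of importing anything ahead of the side-effect import.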
"@seek/logger": "^5.0.1",
"skuba-dive": "^1.2.0",
"aws-sdk": "^2.1039.0",
question: do you need to declare this as a dep?
I think it's up for discussion. It's needed for aws-sdk instrumentation, which I guess is highly desirable for anyone at SEEK. Perhaps some don't use aws-sdk, but I bet it's the minority.
I think when I flipped through the GitHub page it said you could use aws-sdk V3 as well?
Actually, otel won't work with aws-sdk < v3. At least it didn't in early testing. This was because in v2 of the AWS SDK, request signing happened before otel header propagation, which meant that all instrumented requests to AWS services via the AWS SDK would fail on account of invalid signatures. This wasn't an issue in v3.
@bernos re aws-sdk v2 - I can confirm instrumentation works for it with current otel libraries.
Would it be better to just leave it out and let consumers byo sdk?
It depends on your definition of "better", as always :). Do we want to bring in aws-sdk instrumentation for consumers out of the box?
Looking at this trace, I see clear advantages in knowing what the HTTP POST was actually about, which aws-sdk instrumentation makes very clear; with only the minimal built-in http instrumentation it'll be just another opaque HTTP request.
The problem is that https://github.com/open-telemetry/opentelemetry-js-contrib/tree/main/plugins/node/opentelemetry-instrumentation-aws-sdk needs aws-sdk (or rather its types).
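To make the "opaque HTTP request" point concrete, here's an illustrative comparison of the span data each layer would yield. These are plain mock objects rather than otel API calls; the attribute keys follow otel semantic conventions, but the values are invented for the example:

```typescript
// Mock spans illustrating why aws-sdk instrumentation is more informative
// than bare http instrumentation. Plain objects, not otel API calls;
// attribute keys follow otel semantic conventions.
type MockSpan = { name: string; attributes: Record<string, string> };

// with only http instrumentation: just another opaque POST
const httpOnlySpan: MockSpan = {
  name: 'HTTP POST',
  attributes: {
    'http.method': 'POST',
    'http.url': 'https://dynamodb.ap-southeast-2.amazonaws.com/',
  },
};

// with aws-sdk instrumentation: the service and operation are named
const awsSdkSpan: MockSpan = {
  name: 'DynamoDB.GetItem',
  attributes: {
    'rpc.system': 'aws-api',
    'rpc.service': 'DynamoDB',
    'rpc.method': 'GetItem',
  },
};
```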
Has this been running in production yet? Any early performance insights versus no instrumentation or Datadog APM?
Otel has been in production for about ~20 services today; however, only some have code instrumentation enabled. Most of those 20 services have the flag set in gantry, which enables tracing in auth-sidecar only. I can see only 4 services that have code instrumentation enabled, across a couple of teams. There are some heavily loaded systems that are instrumented though, like the API router. Also, it comes down to what we want to instrument, e.g. both
Re DD APM - I think it'd be a similar performance hit as with automatic instrumentation, as all outgoing requests (and other I/O commands) are wrapped.
Stats for one of the services: (red line is when it went live)
The number of tasks didn't go up, so the CPU average is a somewhat reliable representation of the possible performance hit. I'll try to get similar insights for another service that's hit a bit harder.
Okay, there are two discrete parts - app code instrumentation (controlled via
However, when enabled in gantry AND the app, there's about a 2-3ms latency increase. I'm inclined to believe it's mainly attributable to sidecar instrumentation.
Is that expected @bernos? I don't recall that showing up on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talking with @runk offline, there's potentially a large performance regression for some workloads (33%-50%). I'm happy to include this as default-off to allow teams to opt in to tracing if they find it useful. This is similar to what we do with Datadog APM.
Changed to be opt-in. I think as an alternative (a middle ground?) we can have tracing on in the auth sidecar by default, but keep app instrumentation off by default. This way, the kind of performance hit you experienced with DD APM becomes an unlikely scenario.
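An opt-in like this could gate on an environment variable, roughly as below — a sketch only; `OPENTELEMETRY_ENABLED` is a hypothetical flag name, not necessarily what the template uses:

```typescript
// Sketch of an opt-in gate for tracing. OPENTELEMETRY_ENABLED is a
// hypothetical flag name; the template's real mechanism may differ.
const isTracingEnabled = (
  env: Record<string, string | undefined> = process.env,
): boolean => env.OPENTELEMETRY_ENABLED === 'true';

// in the bootstrap: only start the SDK when the consumer has opted in
if (isTracingEnabled()) {
  // sdk.start();
}
```

Defaulting to off means an unset or misspelled flag yields no instrumentation, which matches the opt-in intent.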
The graph shows the service in three modes:
While the latency increase is single-digit milliseconds, it could mean a ~30% increase for a fast-responding app. If the app is doing a lot of network calls and all of them are instrumented (e.g. Experience APIs/GraphQL servers), a more serious performance hit from app instrumentation should be expected. I don't see the time spent on sidecar instrumentation differing depending on the nature of the app itself.
template/koa-rest-api/src/tracing.ts
Outdated
import { B3InjectEncoding, B3Propagator } from '@opentelemetry/propagator-b3';
import { NodeSDK } from '@opentelemetry/sdk-node';

const main = async () => {
Does async code loaded via --require behave any differently to async code called from the entry point? i.e. is there a race here between instrumentation being initialised and the first requests being handled by the service?
Good point. I think it's definitely a possibility that promises aren't fulfilled before the app loads. Today there's no way to pause for async execution in a bootstrap file supplied via --require x.js.
It really depends on what happens inside the sdk.start fn: https://www.npmjs.com/package/@opentelemetry/sdk-node. Looking at the code, it only really returns a pending promise when we want auto resource detection, which is off: https://github.com/seek-oss/skuba/pull/706/files#diff-4836bf38ef9493029c23deb46fdf06733bbc8b8cec520b9f538ea1a68cebf8eaR33
I'm going to remove async from the main fn to keep it simple. However, that only hides the possible problem of a pending promise from sdk.start().
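The race can be sketched without otel: Node never awaits a promise created in a `--require` hook, so a rejection has to be handled explicitly. Here `startTracing` is a hypothetical stand-in for `sdk.start()`:

```typescript
// Simulates the --require race: the bootstrap cannot await an async
// initialiser, so the app may start before tracing is ready.
// `startTracing` is a hypothetical stand-in for sdk.start().
const startTracing = (): Promise<string> => Promise.resolve('started');

let tracingStatus = 'pending';

// fire-and-forget, as --require effectively forces; handle rejection
// explicitly so failures aren't silently swallowed
startTracing()
  .then((s) => {
    tracingStatus = s;
  })
  .catch((err) => {
    console.error('tracing failed to start', err);
  });

// synchronously, before the microtask queue drains, the promise has not
// settled: the service's first requests could run in this window
const statusAtBootstrap = tracingStatus; // still 'pending' here
```

Promise callbacks always run as microtasks, so even an already-resolved `startTracing()` leaves `statusAtBootstrap` at `'pending'` — which is exactly the window the question is about.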
This is great, happy to roll it out, but please don't default any open telemetry options to
@samchungy @etaoins do you want to check it one more time before I merge? The notable change is the addition of process exit handling: https://github.com/seek-oss/skuba/pull/706/files#diff-4836bf38ef9493029c23deb46fdf06733bbc8b8cec520b9f538ea1a68cebf8eaR46
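The exit-handling shape is roughly the following — a sketch with a hypothetical `makeExitHandler`; the intent (flushing pending spans via an async shutdown before the process dies) is from the PR, the helper names are not:

```typescript
// Sketch of process exit handling for the tracing SDK. `makeExitHandler`
// is a made-up helper; it flushes pending spans via an async shutdown()
// before exit and never crashes on a flush failure.
type Shutdown = () => Promise<void>;

const makeExitHandler =
  (shutdown: Shutdown) => async (): Promise<number> => {
    try {
      await shutdown(); // flush exporters / pending spans
      return 0;
    } catch (err) {
      console.error('failed to flush tracing on exit', err);
      return 1;
    }
  };

// wiring (commented out to keep the sketch side-effect free):
// process.once('SIGTERM', () =>
//   makeExitHandler(() => sdk.shutdown())().then((code) => process.exit(code)),
// );
```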
@72636c your review probably holds the most weight here 😆
"@opentelemetry/exporter-collector-grpc": "^0.25.0",
"@opentelemetry/instrumentation-aws-sdk": "^0.1.0",
"@opentelemetry/instrumentation-http": "^0.26.0",
"@opentelemetry/sdk-node": "^0.26.0",
Do we need to be concerned about these v0s? Does OpenTelemetry promise some degree of stability?
Co-authored-by: Ryan Ling <ryan@outlook.com.au>
import { B3InjectEncoding, B3Propagator } from '@opentelemetry/propagator-b3';
import { NodeSDK } from '@opentelemetry/sdk-node';

const app = 'opentelemetry';
For posterity, I was wondering where this app log field convention was from, and it looks like our sidecar follows suit.
Old mate ca-go-library 🙈
seek-auth-sidecar was originally ca-auth-sidecar and used the default logger from ca-go-library. The newer name convention comes from Pino/Bunyan.
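The convention itself is just a base field merged into every log line, e.g. — an illustrative helper, not the sidecar's or @seek/logger's actual code:

```typescript
// Illustration of the `app` base-field convention: every log line carries
// an `app` identifier. `withApp` is a made-up helper for this sketch.
const withApp =
  (app: string) =>
  (fields: Record<string, unknown>): Record<string, unknown> => ({
    app,
    ...fields,
  });

const tracingLogFields = withApp('opentelemetry');
```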
This brings https://opentelemetry.io/ support, which can be enabled via gantry config (it deploys another sidecar that collects the data). According to @bernos it's here to stay and should no longer be considered an experimental feature, though it has yet to be widely announced.
All traces can be viewed in the "Technology Platforms" DD org.
This PR is a first step - one template only - the koa app.