-
Notifications
You must be signed in to change notification settings - Fork 969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(cli): Add OpenTelemetry #8653
Conversation
I'm not in love with this code. Seems difficult to ensure key consisitency but I'm working around the existing code and yargs.
Switched to use a custom exporter which writes spans to a file. Then have a background job that fires when OTel shuts down to read those saved spans and send them to the collector. This means we have one background job that runs to both compute the telemetry resources and send the data to the collector.
Move some files around and rename some. Adds in the missing telemetry fields to the resource. Rewrites the spans to disk with the added resource information for verbose visibility.
We will likely have to iterate on how the output looks.
I wouldn't have written this myself but I seen it used in the code so I may as well keep using it.
Stopped because of some many edge or special cases where the CLI immediately exists with non-zero exit code.
@jtoar and I had a discussion around the best approach to getting this in. This will require changes to nearly all command handlers to remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @Josh-Walker-GM, left a few comments, mostly questions. I'll try to do another review today, just submitting what I have, and I know this is one of many PRs, but one question that didn't belong to any particular file: how are you testing this? Does the existing CI check for telemetry apply to these changes?
@@ -23,6 +23,15 @@ export const handler = async ({ | |||
prisma = true, | |||
prerender, | |||
}) => { | |||
recordTelemetryAttributes({ | |||
command: 'build', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note to others, @Josh-Walker-GM and I went back and forth about importing this from the corresponding file (in this case, build.js
). I decided against it cause the last thing I want is a circular import. The tradeoff is overhead for us—if we ever rename a command, we have to change it here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the fact that I also just hard coded the values for commands nested in directories like setup auth dbAuth
then it did just feel a little silly to import for the simpler cases.
} | ||
|
||
// Legacy telemetry | ||
errorTelemetry(process.argv, error.message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just a note for others, this will be removed eventually, but till we've completely made the switch we'll keep it here.
The existing CI check simply executes a command and listens for a http packet to be received by the mock telemetry endpoint. This works for both the existing telemetry and OTel since we both end up exporting the data to a backend url. I haven't performed any packet content analysis like "did we get something from I don't have any sort of tests that call each command and expect a known telemetry output. That sort of testing strategy simply doesn't exist for the CLI right now. Would be a cool thing to build out but that would be it's own project to organist the infrastructure around that. |
The helper ensure's we maintain the spawn options we need to support different platforms. It also ensures the output is written to a log file within the '.redwood' folder. Reworks the new telemetry background process to use this helper. Introduces some relatively verbose output to the send process now that we do not need to be quiet.
Instrumenting the CLI with OpenTelemetry.
Changes
This PR introduces the following changes to the CLI:
The
cli-helpers
package now includes two telemetry related functions;recordTelemetryAttributes
andrecordTelemetryError
. These functions are helpful in avoiding errors such as spans being undefined when telemetry is disabled.The CLI index has been slightly altered to support; starting up, shutting down and recording telemetry as well as handling errors thrown by the command handler functions.
A current example of the output shown when an error is thrown by the handler functions is:
Note: I forced this command above to throw
The additional error message text links to our forums and github. It also provides a generated uuid unique to that command invocation. This error reference code may be useful for maintainers to identify any telemetry associated with an error a user is facing.
The telemetry workflow is also slightly different than has existed before and can be summarised as:
This design was chosen in an effort to ensure the CLI process is minimally impacted by the telemetry. There should be no noticeable impact of telemetry to the user's experience of the CLI - i.e. not pauses at startup or shutdown.
These telemetry files exist within
.redwood/telemetry
and are json files with a name corresponding toDate.now()
when the CLI starts. They contain the spans gathered during the command serialised to a file. These files are deleted once the spans have been sent unlessREDWOOD_VERBOSE_TELEMETRY
environment variable is set. In that case we maintain the last 8 telemetry files so the user has visibility into what has been gathered and sent.Remaining
process.exit()
directly. This will prevent telemetry as we need to ensure telemetry is shutdown properly for data to be recorded and sent.Ping: @thedavidprice