chore(cli): Add OpenTelemetry #8653

Merged
merged 35 commits into from
Jun 21, 2023

Conversation

Josh-Walker-GM
Collaborator

@Josh-Walker-GM Josh-Walker-GM commented Jun 17, 2023

Instrumenting the CLI with OpenTelemetry.

Changes
This PR introduces the following changes to the CLI:

The cli-helpers package now includes two telemetry-related functions: recordTelemetryAttributes and recordTelemetryError. These helpers avoid errors such as the span being undefined when telemetry is disabled.
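
A minimal sketch of how such guarded helpers might look. It is not the code from this PR: the `activeSpan` variable, `setActiveSpan` and `createFakeSpan` are hypothetical stand-ins so the example runs without any OTel packages, while `setAttribute`, `recordException` and `setStatus` mirror the method names on the OTel JS `Span` interface.

```javascript
// Hypothetical stand-in for looking up the active OTel span. It stays
// undefined when telemetry is disabled, which is the case the helpers guard.
let activeSpan

function setActiveSpan(span) {
  activeSpan = span
}

// Record key/value attributes on the active span, if there is one.
function recordTelemetryAttributes(attributes) {
  if (activeSpan === undefined) {
    return // telemetry disabled: nothing to record, and nothing to throw
  }
  for (const [key, value] of Object.entries(attributes)) {
    activeSpan.setAttribute(key, value)
  }
}

// Record an error on the active span, if there is one.
function recordTelemetryError(error) {
  if (activeSpan === undefined) {
    return
  }
  activeSpan.recordException(error)
  // 2 is the value of SpanStatusCode.ERROR in @opentelemetry/api
  activeSpan.setStatus({ code: 2, message: error.message })
}

// A tiny fake span (assumption, for illustration only).
function createFakeSpan() {
  return {
    attributes: {},
    exceptions: [],
    status: undefined,
    setAttribute(key, value) {
      this.attributes[key] = value
    },
    recordException(error) {
      this.exceptions.push(error)
    },
    setStatus(status) {
      this.status = status
    },
  }
}
```

With no span set, both helpers return silently, so command handlers can call them unconditionally.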

The CLI index has been slightly altered to support starting up, shutting down and recording telemetry, as well as handling errors thrown by the command handler functions.

A current example of the output shown when an error is thrown by the handler functions is:
[image: screenshot of the CLI error output]
Note: I forced this command above to throw

The additional error message text links to our forums and GitHub. It also provides a generated UUID unique to that command invocation. This error reference code may help maintainers identify any telemetry associated with an error a user is facing.
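
As a rough illustration of the error reference, the sketch below generates a UUID with Node's built-in `crypto.randomUUID()` and appends it to some help text. The `formatErrorHelp` function, the message wording and the placeholder links are assumptions for this example, not the text the CLI prints.

```javascript
const { randomUUID } = require('node:crypto')

// Build the extra text shown under an error. The wording and the links are
// illustrative; only the per-invocation UUID idea comes from the PR.
function formatErrorHelp(errorReference, links) {
  return [
    'Need help?',
    ...links.map((link) => ` - ${link}`),
    `Error reference code: ${errorReference}`,
  ].join('\n')
}

// One reference per CLI invocation, so the value printed to the user can be
// matched against the telemetry recorded for the same run (assumed usage).
const errorReference = randomUUID()
console.log(formatErrorHelp(errorReference, ['<forum link>', '<GitHub link>']))
```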

The telemetry workflow also differs slightly from what existed before and can be summarised as:

  1. Telemetry is started, unless disabled. This simply performs some OTel-related setup.
  2. A span is started within which the yargs parse is called.
  3. Telemetry is shut down. This ensures all span data has been written to the telemetry file. It also launches the background send process.
  4. The background process calculates environment details such as project complexity and then reads the unsent spans from the telemetry files, sending them to the Redwood telemetry endpoint for storage.
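
Steps 1 to 3 above can be sketched roughly as follows. Every dependency (`startTelemetry`, `shutdownTelemetry`, `withSpan`, `parseArgs`) is a hypothetical function injected by the caller, so the example runs without OTel or yargs installed. The span name `'cli'` and the choice to skip the span entirely when telemetry is disabled are also assumptions.

```javascript
// Sketch of the CLI entry-point flow. Step 4 (the background send process)
// runs in a separate process and is not modelled here.
async function runCli({
  telemetryEnabled,
  startTelemetry,
  shutdownTelemetry,
  withSpan,
  parseArgs,
}) {
  if (telemetryEnabled) {
    await startTelemetry() // 1. OTel-related setup
  }
  try {
    if (telemetryEnabled) {
      // 2. The argument parse, and so the command handler, runs inside a span.
      await withSpan('cli', parseArgs)
    } else {
      await parseArgs()
    }
  } finally {
    if (telemetryEnabled) {
      // 3. Flush span data to the telemetry file and launch the sender.
      await shutdownTelemetry()
    }
  }
}
```

Putting the shutdown in `finally` means it still runs when the parse throws, which is one way to make sure span data is written before the process ends.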

This design was chosen to ensure the CLI process is minimally impacted by telemetry. Telemetry should have no noticeable impact on the user's experience of the CLI, i.e. no pauses at startup or shutdown.

These telemetry files exist within .redwood/telemetry and are JSON files with a name corresponding to Date.now() when the CLI starts. They contain the spans gathered during the command, serialised to a file. These files are deleted once the spans have been sent unless the REDWOOD_VERBOSE_TELEMETRY environment variable is set. In that case we maintain the last 8 telemetry files so the user has visibility into what has been gathered and sent.
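
The retention rule could be expressed as a small pure function like the one below. This is a sketch under assumptions: `telemetryFilesToDelete` is a hypothetical name, the `<timestamp>.json` file name pattern is inferred from the description above, and every listed file is assumed to have already been sent.

```javascript
// Given telemetry file names such as "1687000000000.json", return the names
// that should be deleted after sending.
function telemetryFilesToDelete(fileNames, { verbose = false, keep = 8 } = {}) {
  const sorted = fileNames
    .filter((name) => /^\d+\.json$/.test(name)) // ignore anything unexpected
    .sort((a, b) => parseInt(a, 10) - parseInt(b, 10)) // oldest first

  if (!verbose) {
    return sorted // default: every sent file is removed
  }
  // Verbose: keep only the newest `keep` files for the user to inspect.
  return sorted.slice(0, Math.max(0, sorted.length - keep))
}
```

A caller might pass `verbose: Boolean(process.env.REDWOOD_VERBOSE_TELEMETRY)`. Treating any non-empty value as "set" is an assumption here.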

Remaining

  1. (pushed) We must continue to remove and rework the command handlers which do the following:
    • Try/Catch at the top level of the handler - we now have no issue with throwing out of the handler functions.
    • Call process.exit() directly. This prevents telemetry from being recorded and sent, because telemetry must be shut down properly first.
  2. Remove the existing (non-OTel) telemetry from the CLI. We can do this after we are comfortable with the OTel results.
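
To illustrate the first item, here is a hedged before/after sketch of a handler. The names `handler`, `runHandler`, `doWork`, `onError` and `shutdownTelemetry` are hypothetical and only show the shape of the change: the handler throws, and a single caller deals with the error, shuts telemetry down and decides the exit code.

```javascript
// Before (problematic): the handler swallows the error and exits, so the
// entry point never gets the chance to shut telemetry down.
//
//   try { await doWork() } catch (e) { console.error(e.message); process.exit(1) }

// After: the handler simply lets errors propagate.
async function handler({ doWork }) {
  await doWork()
}

// The caller is the one place that reports the error and picks the exit code.
async function runHandler(handlerFn, args, { onError, shutdownTelemetry }) {
  let exitCode = 0
  try {
    await handlerFn(args)
  } catch (error) {
    onError(error) // e.g. print the message and record it on the span
    exitCode = 1
  }
  await shutdownTelemetry() // always reached, unlike after process.exit()
  return exitCode
}
```

The entry point could then assign the returned value to `process.exitCode` rather than calling `process.exit()`, letting Node finish pending work before exiting.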

Ping: @thedavidprice

I'm not in love with this code. Seems difficult to ensure key consistency but I'm working around the existing code and yargs.
Switched to use a custom exporter which writes spans to a file. Then have a background job that fires when OTel shuts down to read those saved spans and send them to the collector.
This means we have one background job that runs to both compute the telemetry resources and send the data to the collector.
Move some files around and rename some. Adds in the missing telemetry fields to the resource. Rewrites the spans to disk with the added resource information for verbose visibility.
We will likely have to iterate on how the output looks.
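
The file-writing exporter mentioned in the commit notes above might look something like this sketch. It is not the exporter from this PR. The `export(spans, resultCallback)` and `shutdown()` methods follow the shape of the OTel JS `SpanExporter` interface, but the class name `FileSpanExporter`, the file layout and the use of plain objects as spans are assumptions.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Sketch of a span exporter that writes to disk instead of the network.
class FileSpanExporter {
  constructor(filePath) {
    this.filePath = filePath
    this.spans = []
  }

  // Called by the span processor with a batch of finished spans.
  export(spans, resultCallback) {
    try {
      // Real OTel spans would need mapping to plain, serialisable objects
      // first; this sketch assumes they already are.
      this.spans.push(...spans)
      fs.mkdirSync(path.dirname(this.filePath), { recursive: true })
      fs.writeFileSync(this.filePath, JSON.stringify(this.spans, null, 2))
      resultCallback({ code: 0 }) // 0 is ExportResultCode.SUCCESS
    } catch (error) {
      resultCallback({ code: 1, error }) // 1 is ExportResultCode.FAILED
    }
  }

  // Nothing to flush: every export call rewrites the whole file.
  shutdown() {
    return Promise.resolve()
  }
}
```

Writing synchronously keeps the sketch simple and means the file is complete as soon as `export` returns. A background job can then read the file and send its contents to the collector.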
@Josh-Walker-GM Josh-Walker-GM added the release:chore This PR is a chore (means nothing for users) label Jun 17, 2023
@Josh-Walker-GM Josh-Walker-GM self-assigned this Jun 17, 2023
@Josh-Walker-GM Josh-Walker-GM added the fixture-ok Override the test project fixture check label Jun 20, 2023
@Josh-Walker-GM
Collaborator Author

@jtoar and I had a discussion around the best approach to getting this in. This will require changes to nearly all command handlers to remove process.exit calls and to remove unwanted try/catches. I think we agreed we'd get this PR in since it should focus on the core OTel related functionality and then follow up with PRs to address the various command handlers.

@Josh-Walker-GM Josh-Walker-GM marked this pull request as ready for review June 20, 2023 13:18
Contributor

@jtoar jtoar left a comment


Great work @Josh-Walker-GM, left a few comments, mostly questions. I'll try to do another review today, just submitting what I have, and I know this is one of many PRs, but one question that didn't belong to any particular file: how are you testing this? Does the existing CI check for telemetry apply to these changes?

__fixtures__/test-project/web/vite.config.ts (outdated, resolved)
packages/cli/package.json (outdated, resolved)
@@ -23,6 +23,15 @@ export const handler = async ({
prisma = true,
prerender,
}) => {
recordTelemetryAttributes({
command: 'build',
Contributor


Just a note to others, @Josh-Walker-GM and I went back and forth about importing this from the corresponding file (in this case, build.js). I decided against it cause the last thing I want is a circular import. The tradeoff is overhead for us—if we ever rename a command, we have to change it here too.

Collaborator Author


And given that I had already hard-coded the values for commands nested in directories, like setup auth dbAuth, it felt a little silly to import them for the simpler cases.

packages/cli/src/index.js (outdated, resolved)
packages/cli/src/index.js (outdated, resolved)
packages/cli/src/index.js (outdated, resolved)
}

// Legacy telemetry
errorTelemetry(process.argv, error.message)
Contributor


Mostly just a note for others, this will be removed eventually, but till we've completely made the switch we'll keep it here.

packages/cli/src/telemetry/exporter.js (resolved)
packages/cli/src/telemetry/index.js (outdated, resolved)
packages/cli/src/telemetry/send.js (outdated, resolved)
@Josh-Walker-GM
Collaborator Author

how are you testing this? Does the existing CI check for telemetry apply to these changes?

The existing CI check simply executes a command and listens for an HTTP packet to be received by the mock telemetry endpoint. This works for both the existing telemetry and OTel since both end up exporting the data to a backend URL. I haven't performed any packet content analysis like "did we get something from @redwoodjs/cli" because it feels like if everything worked to the point it was exported then we'd be good. Happy to revise the testing and be more rigorous if that is felt to be needed.

I don't have any sort of tests that call each command and expect a known telemetry output. That sort of testing strategy simply doesn't exist for the CLI right now. Would be a cool thing to build out but that would be its own project to organise the infrastructure around that.

The helper ensures we maintain the spawn options we need to support different platforms. It also ensures the output is written to a log file within the '.redwood' folder.

Reworks the new telemetry background process to use this helper. Introduces some relatively verbose output to the send process now that we do not need to be quiet.
@jtoar jtoar merged commit a87e8a4 into main Jun 21, 2023
11 checks passed
@jtoar jtoar deleted the jgmw-cli/initial-otel branch June 21, 2023 21:21
@redwoodjs-bot redwoodjs-bot bot added this to the next-release milestone Jun 21, 2023
@jtoar jtoar modified the milestones: next-release, v6.0.0 Jun 22, 2023
Labels
fixture-ok Override the test project fixture check release:chore This PR is a chore (means nothing for users)

2 participants