chore(cli): Add OpenTelemetry #8653

Josh-Walker-GM · 2023-06-17T13:34:38Z

Instrumenting the CLI with OpenTelemetry.

Changes
This PR introduces the following changes to the CLI:

The cli-helpers package now includes two telemetry-related functions: recordTelemetryAttributes and recordTelemetryError. These functions help avoid errors such as spans being undefined when telemetry is disabled.
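To illustrate why helpers like these avoid undefined-span errors, here is a minimal sketch. It is not the code from this PR: `getActiveSpan`, `setActiveSpanForDemo` and `createFakeSpan` are hypothetical stand-ins, and the fake span only mimics the `setAttribute`, `recordException` and `setStatus` methods of an OpenTelemetry span.

```javascript
// Hypothetical stand-in for the active span; undefined when telemetry is disabled.
let activeSpan

function setActiveSpanForDemo(span) {
  activeSpan = span
}

function getActiveSpan() {
  return activeSpan
}

// Record each attribute on the active span, or do nothing if there is no span.
function recordTelemetryAttributes(attributes) {
  const span = getActiveSpan()
  if (span === undefined) {
    return
  }
  for (const [key, value] of Object.entries(attributes)) {
    span.setAttribute(key, value)
  }
}

// Record an error on the active span, or do nothing if there is no span.
function recordTelemetryError(error) {
  const span = getActiveSpan()
  if (span === undefined) {
    return
  }
  span.recordException(error)
  // Code 2 is assumed to mean "error", as in OpenTelemetry's SpanStatusCode.ERROR.
  span.setStatus({ code: 2, message: error.message })
}

// A fake span that stores what it is given, used only for this demonstration.
function createFakeSpan() {
  return {
    attributes: {},
    exceptions: [],
    status: undefined,
    setAttribute(key, value) {
      this.attributes[key] = value
    },
    recordException(error) {
      this.exceptions.push(error)
    },
    setStatus(status) {
      this.status = status
    },
  }
}
```

With this shape, a command handler can call either helper unconditionally and never needs to check whether telemetry is enabled.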

The CLI index has been altered slightly to support starting up, shutting down, and recording telemetry, as well as handling errors thrown by the command handler functions.

A current example of the output shown when an error is thrown by the handler functions is:

Note: I forced the command above to throw.

The additional error message text links to our forums and GitHub. It also provides a generated UUID unique to that command invocation. This error reference code may help maintainers identify any telemetry associated with an error a user is facing.
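A minimal sketch of how such an error reference could be produced, assuming Node's built-in `crypto.randomUUID()`. The function name `describeHandlerError`, the message wording and the bracketed link placeholders are illustrative, not the actual output of this PR.

```javascript
const { randomUUID } = require('node:crypto')

// Turn a handler error into the text shown to the user plus a reference code.
// The wording and the bracketed link placeholders are illustrative only.
function describeHandlerError(error, errorReference = randomUUID()) {
  const output = [
    error.message,
    '',
    'Need help?',
    ' - Forums: [forum link]',
    ' - GitHub: [issues link]',
    '',
    `Error reference code: ${errorReference}`,
  ].join('\n')
  return { errorReference, output }
}

// Example: a forced failure, printed the way a top-level catch might print it.
const forced = describeHandlerError(new Error('Forced failure'))
console.error(forced.output)
```

Recording the same reference code as a span attribute is what would let a maintainer match a user's report to its telemetry.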

The telemetry workflow also differs slightly from what existed before and can be summarised as:

1. Telemetry is started, unless disabled. This simply performs some OTel-related setup.
2. A span is started, within which the yargs parse is called.
3. Telemetry is shut down. This ensures all span data has been written to the telemetry file. It also launches the background send process.
4. The background process calculates environment details such as project complexity, then reads the unsent spans from the telemetry files and sends them to the Redwood telemetry endpoint for storage.
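The ordering above can be sketched as follows. This is only an outline under stated assumptions: every dependency (`startTelemetry`, `startSpan`, `parse`, `shutdownTelemetry`, `spawnSender`) is a hypothetical injected function, not a real CLI or OpenTelemetry call.

```javascript
// Run one CLI invocation in the described order. Every dependency is injected,
// so the names below are illustrative rather than real CLI functions.
async function runCli({
  telemetryEnabled,
  startTelemetry,
  startSpan,
  parse,
  shutdownTelemetry,
  spawnSender,
}) {
  if (telemetryEnabled) {
    startTelemetry() // OTel-related setup
  }
  const span = telemetryEnabled ? startSpan('cli') : undefined
  try {
    await parse() // the yargs parse runs inside the span
  } finally {
    span?.end()
    if (telemetryEnabled) {
      await shutdownTelemetry() // make sure span data is written to the file
      spawnSender() // hand sending over to a background process
    }
  }
}
```

Because sending happens in a separate process, the foreground command only pays for local setup and a file write.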

This design was chosen so that the CLI process is minimally impacted by the telemetry. Telemetry should have no noticeable impact on the user's experience of the CLI, i.e. no pauses at startup or shutdown.

These telemetry files live in .redwood/telemetry. They are JSON files named after the value of Date.now() when the CLI starts, and they contain the serialised spans gathered during the command. The files are deleted once the spans have been sent, unless the REDWOOD_VERBOSE_TELEMETRY environment variable is set. In that case we keep the last 8 telemetry files so the user can see what has been gathered and sent.
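A minimal sketch of that retention rule, assuming files are named `<Date.now()>.json` and that every listed file has already been sent. The function name `selectTelemetryFilesToDelete` and its options are hypothetical.

```javascript
// Decide which telemetry files to delete after sending.
// Assumes names of the form `<Date.now()>.json`; anything else is ignored.
function selectTelemetryFilesToDelete(fileNames, { verbose, keep = 8 }) {
  const telemetryFiles = fileNames
    .filter((name) => /^\d+\.json$/.test(name))
    .sort((a, b) => parseInt(a, 10) - parseInt(b, 10)) // oldest first

  if (!verbose) {
    // Default behaviour: every sent file is removed.
    return telemetryFiles
  }
  // Verbose behaviour: keep the newest `keep` files, delete the rest.
  return telemetryFiles.slice(0, Math.max(0, telemetryFiles.length - keep))
}
```

Sorting numerically rather than lexically keeps the ordering correct even if the timestamps ever differ in length.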

Remaining

(pushed) We must continue to remove and rework the command handlers which do the following:
- Use try/catch at the top level of the handler. Throwing out of the handler functions is no longer a problem.
- Call process.exit() directly. This prevents telemetry from being recorded and sent, because telemetry must be shut down properly first.
Remove the existing (non-OTel) telemetry from the CLI. We can do this after we are comfortable with the OTel results.
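As a rough sketch of the handler style this rework is aiming for, the example below throws instead of exiting and lets a top-level wrapper decide the exit code after telemetry has shut down. `CommandError`, `exampleHandler` and `runHandler` are hypothetical names used only for illustration.

```javascript
// Hypothetical error type that carries the exit code the command wants to end with.
class CommandError extends Error {
  constructor(message, exitCode = 1) {
    super(message)
    this.exitCode = exitCode
  }
}

// A handler that throws instead of calling process.exit() itself.
async function exampleHandler({ shouldFail }) {
  if (shouldFail) {
    throw new CommandError('Something went wrong', 2)
  }
}

// Top-level wrapper: telemetry shutdown always runs, then the exit code is reported.
async function runHandler(handler, args, shutdownTelemetry) {
  let exitCode = 0
  try {
    await handler(args)
  } catch (error) {
    // Use the error's exit code if it has one, otherwise fall back to 1.
    exitCode = error.exitCode ?? 1
  } finally {
    await shutdownTelemetry()
  }
  return exitCode
}
```

A caller could then assign the result to `process.exitCode` rather than calling `process.exit()`, which lets Node exit on its own once pending work has finished.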

Ping: @thedavidprice

Switched to a custom exporter that writes spans to a file. A background job then fires when OTel shuts down, reads those saved spans, and sends them to the collector. This means a single background job both computes the telemetry resources and sends the data to the collector.
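A minimal sketch of a file-writing exporter, assuming the usual exporter shape of `export(spans, resultCallback)` plus `shutdown()`. It deliberately imports nothing from the OpenTelemetry SDK, and the class name `FileSpanExporter` and the fields it serialises are assumptions rather than the code in this PR.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Exporter-shaped class that appends spans to a JSON file on disk.
class FileSpanExporter {
  constructor(filePath) {
    this.filePath = filePath
    fs.mkdirSync(path.dirname(filePath), { recursive: true })
    fs.writeFileSync(filePath, '[]')
  }

  export(spans, resultCallback) {
    const existing = JSON.parse(fs.readFileSync(this.filePath, 'utf8'))
    // Keep only plain, serialisable fields; real spans carry more than this.
    const serialised = spans.map((span) => ({
      name: span.name,
      attributes: span.attributes,
    }))
    fs.writeFileSync(
      this.filePath,
      JSON.stringify([...existing, ...serialised], null, 2)
    )
    // Code 0 is assumed to signal success, as in OpenTelemetry's ExportResultCode.SUCCESS.
    resultCallback({ code: 0 })
  }

  shutdown() {
    // Writes are synchronous, so there is nothing left to flush.
    return Promise.resolve()
  }
}
```

Writing synchronously keeps the sketch simple; the background job can later read the same file and forward its contents.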

Josh-Walker-GM · 2023-06-20T13:18:44Z

@jtoar and I had a discussion around the best approach to getting this in. This will require changes to nearly all command handlers to remove process.exit calls and to remove unwanted try/catches. I think we agreed we'd get this PR in since it should focus on the core OTel related functionality and then follow up with PRs to address the various command handlers.

jtoar

Great work @Josh-Walker-GM. I left a few comments, mostly questions. I'll try to do another review today; I'm just submitting what I have for now. I know this is one of many PRs, but I have one question that didn't belong to any particular file: how are you testing this? Does the existing CI check for telemetry apply to these changes?

jtoar · 2023-06-20T21:33:41Z

packages/cli/src/commands/buildHandler.js

@@ -23,6 +23,15 @@ export const handler = async ({
  prisma = true,
  prerender,
 }) => {
+  recordTelemetryAttributes({
+    command: 'build',


Just a note to others: @Josh-Walker-GM and I went back and forth about importing this from the corresponding file (in this case, build.js). I decided against it because the last thing I want is a circular import. The tradeoff is overhead for us: if we ever rename a command, we have to change it here too.

And given that I had also just hard-coded the values for commands nested in directories, like setup auth dbAuth, it felt a little silly to import them for the simpler cases.

jtoar · 2023-06-20T21:43:23Z

packages/cli/src/index.js

+      }
+
+      // Legacy telemetry
+      errorTelemetry(process.argv, error.message)


Mostly just a note for others, this will be removed eventually, but till we've completely made the switch we'll keep it here.

Josh-Walker-GM · 2023-06-20T22:04:42Z

> how are you testing this? Does the existing CI check for telemetry apply to these changes?

The existing CI check simply executes a command and listens for an HTTP packet to be received by the mock telemetry endpoint. This works for both the existing telemetry and OTel, since both end up exporting the data to a backend URL. I haven't performed any packet content analysis like "did we get something from @redwoodjs/cli", because it feels like if everything worked to the point that it was exported then we'd be good. Happy to revise the testing and be more rigorous if that's felt to be needed.

I don't have any tests that call each command and expect a known telemetry output. That sort of testing strategy simply doesn't exist for the CLI right now. It would be a cool thing to build out, but it would be its own project to organise the infrastructure around that.

The helper ensures we maintain the spawn options we need to support different platforms. It also ensures the output is written to a log file within the '.redwood' folder. Reworks the new telemetry background process to use this helper. Introduces some relatively verbose output to the send process now that we do not need to be quiet.
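A minimal sketch of what such a helper might look like, using Node's `child_process.spawn`. The function names, the `logs` subfolder and the exact option set are assumptions for illustration; the source only states that output goes to a log file within the '.redwood' folder.

```javascript
const { spawn } = require('node:child_process')
const fs = require('node:fs')
const path = require('node:path')

// Build spawn options for a background job whose output goes to a log file.
function buildBackgroundSpawnOptions(logFileDescriptor) {
  return {
    detached: true, // let the child outlive the CLI process
    windowsHide: true, // avoid opening a console window on Windows
    stdio: ['ignore', logFileDescriptor, logFileDescriptor],
  }
}

// Spawn a Node script in the background, logging to <baseDir>/.redwood/logs/<name>.log.
// The `logs` subfolder is an assumption made for this sketch.
function spawnBackgroundProcess(name, scriptPath, args, baseDir) {
  const logDir = path.join(baseDir, '.redwood', 'logs')
  fs.mkdirSync(logDir, { recursive: true })
  const logFile = fs.openSync(path.join(logDir, `${name}.log`), 'w')

  const child = spawn(
    process.execPath,
    [scriptPath, ...args],
    buildBackgroundSpawnOptions(logFile)
  )
  // The child holds its own copy of the descriptor, so the parent can close it.
  fs.closeSync(logFile)
  // Do not keep the parent's event loop alive waiting for the child.
  child.unref()
  return child
}
```

Keeping the options in one place means every background job gets the same cross-platform behaviour.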

Josh-Walker-GM added 16 commits June 13, 2023 00:31

Initial OTel kernel for CLI

4f7622e

Add helper functions for telemetry

2f9b6a6

Add telemetry for info command

7600f88

Capture and record top level errors

3f0daf3

WIP compute of telemetry details in the background

f9553bb

Merge remote-tracking branch 'origin/main' into jgmw-cli/initial-otel

da8f441

wip changes to the background compute

c12f3a8

Instrument CLI commands

76af8aa

I'm not in love with this code. It seems difficult to ensure key consistency, but I'm working around the existing code and yargs.

Re-enable existing telemetry and remove error messaging

007d69a

Revert change to locking.js

15d6211

Refactor the new telemetry setup

7fd1f18

Move some files around and rename some. Adds in the missing telemetry fields to the resource. Rewrites the spans to disk with the added resource information for verbose visibility.

Remove stray console logs and remove events from spans.

69bf7bf

Produce top level error output and record an error ref. code

dcccd81

We will likely have to iterate on how the output looks.

fix experiments and webBundler resource values

ee48285

Merge remote-tracking branch 'origin/main' into jgmw-cli/initial-otel

edb85ad

Josh-Walker-GM added the release:chore label (description: "This PR is a chore (means nothing for users)") Jun 17, 2023

Josh-Walker-GM self-assigned this Jun 17, 2023

Josh-Walker-GM added 8 commits June 17, 2023 22:37

Tidy telemetry/index.js

723798c

Record top level errors with legacy telemetry also

7ff5308

Use error exit code if it exists

f8656ad

I wouldn't have written this myself, but I've seen it used in the code so I may as well keep using it.

Started to remove the handler level try/catch.

9fde37f

Stopped because of so many edge or special cases where the CLI immediately exits with a non-zero exit code.

Remove unused imports

dd352f3

Add back try/catch for the lint command

4fcc9d6

Update test-fixture

d9dc218

Revert and just hardcode the command name

b3644c3

Josh-Walker-GM added the fixture-ok label (description: "Override the test project fixture check") Jun 20, 2023

Josh-Walker-GM requested a review from jtoar June 20, 2023 13:18

Josh-Walker-GM marked this pull request as ready for review June 20, 2023 13:18

jtoar reviewed Jun 20, 2023

packages/cli-helpers/src/telemetry/index.ts

jtoar reviewed Jun 20, 2023

packages/cli/src/index.js

jtoar reviewed Jun 20, 2023

packages/cli/src/telemetry/resource.js

Josh-Walker-GM added 8 commits June 21, 2023 13:31

Revert change to vite.config.ts

a0cc975

Add missing otel api dep

7360b23

Make use of fs-extra json funcs

d731827

Replace require with import

80ac2d5

Simplify the exitCode setting

5a3bee7

Fix telemetry yargs option

4323131

lint

9ce4a13

jtoar approved these changes Jun 21, 2023

jtoar added 3 commits June 21, 2023 13:26

refactor telemetry enabling

0055abb

Merge branch 'main' into jgmw-cli/initial-otel

4f5c48d

Merge branch 'main' into jgmw-cli/initial-otel

8fcb44e

jtoar merged commit a87e8a4 into main Jun 21, 2023
11 checks passed

jtoar deleted the jgmw-cli/initial-otel branch June 21, 2023 21:21

redwoodjs-bot bot added this to the next-release milestone Jun 21, 2023

jtoar modified the milestones: next-release, v6.0.0 Jun 22, 2023
