Skip to content
This repository was archived by the owner on Jan 12, 2026. It is now read-only.

Agent CLI should be less-brittle when starting#207

Merged
acunniffe merged 7 commits intoopticdev:developfrom
acunniffe:qa-fixes
Jun 3, 2020
Merged

Agent CLI should be less-brittle when starting#207
acunniffe merged 7 commits intoopticdev:developfrom
acunniffe:qa-fixes

Conversation

@acunniffe
Copy link
Copy Markdown
Member

I made some changes to the Agent CLI to make it less brittle in production.

The run command would crash on any invalid input to --config. Invalid tokens, strings, json of a different shape, etc. I wrapped that destabilization in a try/catch to we can branch the code.

Branch 1 is followed when your start config and monitoring config is valid and starts the agent as usual

Branch 2 is followed when the start config is valid and the monitoring config is not. It starts the API and proxy as usual and swaps out the agent persistence manager for the DoNothingCapture -- just a proxy with no recording, batching, etc :)

Optic CI API having downtime, or some accidental clearing of the config environment variable should never affect your ability to deploy your code to prod.

  • I also renamed --masquerade to --listen to match other proxies/http CLIs. Also, the context around the word masquerade is not welcome cognitive load for security teams.

//@ts-ignore
import jwtDecode from 'jwt-decode';
//@ts-ignore
import niceTry from 'nice-try';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 swallowing errors, pretty much never the thing we want to do. I totally agree that the UX should be nice and that Optic service being down shouldn't stop services. Swallowing errors, however, is the total enemy to visibility and complete ignores that thrown errors could have put the process in an undefined state.

I don't see it used anywhere, but it shouldn't even be imported. We have to do the work of actually hardening this, not just hide the issues!

@JaapRood
Copy link
Copy Markdown
Contributor

JaapRood commented Jun 2, 2020

Optic CI API having downtime, or some accidental clearing of the config environment variable should never affect your ability to deploy your code to prod.

Totally agree here. Expected errors in operating of a distributed system should never bring the process down. Network partitions, downtime of other systems, all of that is expected.

However, misconfiguration should always be as loud as possible and fail a deployment. There's nothing as frustration when running multiple instances on servers, with log outputs all collated, to have no idea why things aren't working, only to find out there is 1 tiny log line amongst millions indicating a failed parse somewhere, causing an entire branch to go ignored.

When users choose to deploy Optic in a place, that's a direct intent, so if they have misconfigured things, it's our jobs to let them know.

@acunniffe
Copy link
Copy Markdown
Member Author

acunniffe commented Jun 2, 2020

Good points @JaapRood -- I'll find a better balance between preventing Optic from being the thing that takes down your production infra and providing better visibility.

Pending :)

@JaapRood
Copy link
Copy Markdown
Contributor

JaapRood commented Jun 2, 2020

Past-Jaap who spent hours sifting through aggregated logs looking for similar issues thanks you!

Copy link
Copy Markdown
Contributor

@JaapRood JaapRood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@acunniffe acunniffe merged commit a4224f8 into opticdev:develop Jun 3, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants