Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(framework): several error handling improvements #5001

Merged
merged 18 commits into from Sep 6, 2023
Merged

Conversation

stefreak
Copy link
Member

@stefreak stefreak commented Aug 30, 2023

What this PR does / why we need it:

  • Made sure that InternalError is only used for errors that "should not happen" and
    also to wrap unhandled errors that originate from libraries or the NodeJS standard library.

  • Stop masking leaf errors (leaf means basically the root cause).
    One of the most occuring errors on Amplitude is "graph", but that's a
    wrapping error. This will give us better insights in what error types
    actually causing command failures.

  • Fixed a bug where garden would print endless amounts of yaml with the --output option in case of simple errors. You can reproduce the bug by running garden deploy -o yaml in the Garden core repository root directory. The bug has been fixed by filtering error details early to avoid issues later.

  • Prevent infinite recursion by making sure not to call sanitizeValue from toSanitizedValue and toJSON

  • Removed dead code in the GraphNodeError.

  • Consistently handle unexpected errors and format them equally on all
    levels.

  • Format InternalErrors as crashes with an explanation
    and a call to action to report the bug on all levels

  • Remove the GardenError interface in favor of just a GardenError abstract base class. This makes GardenError recognition less error-prone because there is only one way left of recognizing a GardenError (instanceof)

  • Treat uncaught errors on the CLI level (cli.ts) as crashes, even if
    they are GardenError instances other than InternalError, because the GardenCli class should not
    throw but return an exit code.

  • Make sure that in case of crashes we print all relevant information to the terminal. Yes, that makes the log output less concise, but it increases the probability that users report the errors in a way where we can understand what happened and fix the bug.

  • Add a new CRASH issue template, and add a link to reporting crashes on GitHub in the error message.

Crashes that happen in an action handler look like this:
Screenshot 2023-08-30 at 23 43 38

Before it looked like this:
Screenshot 2023-08-31 at 01 04 55

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

core/src/actions/base.ts Outdated Show resolved Hide resolved
- Made sure that InternalError is only used for errors that "should not happen" and
  also to wrap unhandled errors that originate from libraries or the NodeJS standard library.

- Stop masking leaf errors (leaf means basically the root cause).
  One of the most occuring errors on Amplitude is "graph", but that's a
  wrapping error. This will give us better insights in what error types
  actually causing command failures.

- Consistently handle unexpected errors and format them equally on all
  levels.

- Format InternalErrors as crashes with an explanation
  and a call to action to report the bug on all levels

- Treat uncaught errors on the CLI level (cli.ts) as crashes, even if
  they are GardenError instances other than InternalError, because the GardenCli class should not
  throw but return an exit code.
core/src/graph/results.ts Outdated Show resolved Hide resolved
@@ -51,7 +51,8 @@ export function parseLogLevel(level: string): LogLevel {
lvl = LogLevel[level]
}
if (!getNumericLogLevels().includes(lvl)) {
throw new InternalError({
// This should be validated on a different level
throw new ParameterError({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question is if this is actually an InternalError, because validation should happen somewhere else.

But because validation does not happen for root-level options, I changed this to ParameterError so the error message does not confuse users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise garden --log-level doesnotexist will be rendered as a crash.

core/src/cli/cli.ts Outdated Show resolved Hide resolved
@stefreak stefreak force-pushed the error-handling branch 2 times, most recently from 0d8036f to e1f53a9 Compare August 31, 2023 12:01
Previously we had an exported interface called GardenError together with
an exported abstract base class called GardenBaseError.

We had a function called isGardenError that checked for interface
conformance, and used it in some parts of the code base, and then we
also had parts of the code base that checked for `instanceof
GardenBaseError`.

This commit eliminates the GardenError interface to eliminate a class of
errors, where we fail to recognize the GardenError and treat it as a
crash.
Bugfix: Before this commit when running `garden deploy -o json`
in the Garden root project, the json was several thousands lines long

It was a little bit tricky to solve the error, because I ran into infinite recursion
issues between `toSanitizedValue` and  `sanitizeValue`

Now the -o json output looks good to me.

This commit also removes dead code in the GraphNodeError.
core/src/exceptions.ts Outdated Show resolved Hide resolved
Usually spawned child processes are expected to fail for various
reasons.
Wrap them in ChildProcessError

Also improved the error message and error type for Kubernetes apply
through kubectl.
@stefreak stefreak marked this pull request as ready for review September 1, 2023 15:14
@stefreak stefreak requested a review from mkhq September 1, 2023 15:14
@edvald
Copy link
Collaborator

edvald commented Sep 1, 2023

This is great stuff. Haven't yet done a full review, but really appreciate this effort @stefreak and I like the overall approach ✨

core/src/exceptions.ts Outdated Show resolved Hide resolved
abstract type: string
public override message: string
public detail?: D
public detail: D
Copy link
Member Author

@stefreak stefreak Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deceptive: the value is being sanitized in the constructor so the types are only correct if they are base types and non-circular. Maybe we can craft a TypeScript type that allows only base types without circular references?

On the other hand, the rabbit hole was already quite deep and maybe we can get away with a comment here.

Copy link
Member Author

@stefreak stefreak Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TimBeyer and me tried to solve this problem properly, but fell into a deep rabbit hole. Our TLDR conclusion: The detail object should be eliminated. But this is out of scope for this PR. Let's merge this for now.

output,
},
})
if (err instanceof PodRunnerError) {
Copy link
Member Author

@stefreak stefreak Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw if it's not a PodRunnerError?

@stefreak stefreak added this pull request to the merge queue Sep 6, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 6, 2023
@stefreak stefreak added this pull request to the merge queue Sep 6, 2023
Merged via the queue into main with commit 1bfe7b5 Sep 6, 2023
43 checks passed
@stefreak stefreak deleted the error-handling branch September 6, 2023 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants