Failure resolver improvements #14148

Merged
merged 5 commits into from
Mar 1, 2020

@caithagoras caithagoras commented Feb 24, 2020

Most of this PR is refactoring, with a few functional changes:

  • Each failure resolver can be enabled and disabled individually.
  • Previously, verification was skipped when a checksum query failed with COMPILER_ERROR. It is now marked as FAILED_RESOLVED.
  • Only a control checksum query failure with COMPILER_ERROR is now resolved; test and determinism analysis checksum query failures are not.
  • Resolve message updated.
== RELEASE NOTES ==

Verifier Changes
* Add support to disable individual failure resolvers (:pr:`14148`).
* Add support to auto-resolve control checksum query failures with ``COMPILER_ERROR``,
  instead of skipping the verification.
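
The checksum-related behavior in the description can be sketched roughly as follows. This is a minimal illustration, not the verifier's actual code: the class, the enums, and the `statusFor` method are hypothetical names, and the `FAILED` status for unresolved failures is an assumption. Only `COMPILER_ERROR` and `FAILED_RESOLVED` come from the description itself.

```java
public class ChecksumResolutionSketch {
    // Hypothetical stage names for the three kinds of checksum query mentioned in the PR.
    enum QueryStage { CONTROL_CHECKSUM, TEST_CHECKSUM, DETERMINISM_ANALYSIS_CHECKSUM }

    // FAILED_RESOLVED is named in the PR description; FAILED is an assumed
    // placeholder for "not resolved".
    enum Status { FAILED, FAILED_RESOLVED }

    // Only a control checksum query that fails with COMPILER_ERROR is resolved.
    static Status statusFor(QueryStage stage, String errorCode) {
        boolean resolvable = stage == QueryStage.CONTROL_CHECKSUM
                && "COMPILER_ERROR".equals(errorCode);
        return resolvable ? Status.FAILED_RESOLVED : Status.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(statusFor(QueryStage.CONTROL_CHECKSUM, "COMPILER_ERROR"));
        System.out.println(statusFor(QueryStage.TEST_CHECKSUM, "COMPILER_ERROR"));
    }
}
```

In this sketch a test or determinism analysis checksum failure is left unresolved even when the error code matches, which mirrors the third bullet of the description.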

@caithagoras caithagoras changed the title Auto resolve checksum query failure due to array comparison Resolve checksum query compiler error instead of skipping it Feb 24, 2020
@caithagoras caithagoras changed the title Resolve checksum query compiler error instead of skipping it Failure resolver improvements Feb 24, 2020

@mbasmanova mbasmanova left a comment

@caithagoras This is quite a useful addition. Would you update the commit message to add an explanation of how to enable individual resolvers?

Typos in the commit message:

- Sepaerate specfic resolver config into separate class.

->

- Separate specific resolver config into separate class.

@mbasmanova mbasmanova left a comment

@caithagoras

Distinguish between control, test, and determinism analysis checksum

This is great. Thanks!

@mbasmanova mbasmanova left a comment

@caithagoras

Resolve checksum query compiler error instead of skipping it

This is very useful. Thanks. I feel that the verifier is evolving into a pretty sophisticated tool and would therefore benefit from documentation that calls out all the different features. There was an effort to build such documentation at some point. How would you feel about resurrecting that and pushing it forward?

@mbasmanova

@caithagoras Overall looks great. This is a nice set of useful functionalities. I only have some minor structural comments. I'm also thinking that release notes need to call out all the improvements being made and include links or inline instructions on how to disable individual resolvers.

caithagoras commented Feb 27, 2020

@mbasmanova Documentation in a separate PR: #14153

@caithagoras

@mbasmanova Comments addressed. Updated commit messages and release notes.

Cluster connection failure and Presto query failure are two different
types of QueryException. Separate them into 2 sub-classes of
QueryException for better encapsulation.

Remove unused method in AbstractVerification

@caithagoras caithagoras reopened this Feb 28, 2020

This abstract implementation of FailureResolver made inconsistent
assumptions. It supports specifying the expected QueryStage in order
for resolution to happen, but it exposes an abstract method,
resolveTestQueryFailure, which assumes that the test query has failed.

Instead, we should be able to resolve checksum query failure as well.

Also, update resolve message.

For each failure resolver, a configuration property is available to
enable and disable the resolver, in the format of
<name>.failure-resolver.enable, e.g.
too-many-open-partitions.failure-resolver.enable.

The failure resolvers can still be disabled altogether by
failure-resolver.enabled.
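
One way to read the interaction between the global switch and the per-resolver switches is sketched below. This is an illustration under stated assumptions, not the verifier's implementation: the class and the `isResolverEnabled` helper are hypothetical, the property names are copied verbatim from the commit message above, and defaulting both switches to `true` is an assumption.

```java
import java.util.Properties;

public class ResolverToggleSketch {
    // Hypothetical helper: a resolver runs only if the global switch and
    // its own switch are both on.
    static boolean isResolverEnabled(Properties config, String resolverName) {
        // Global switch, as named in the commit message. Default "true" is an assumption.
        boolean allEnabled = Boolean.parseBoolean(
                config.getProperty("failure-resolver.enabled", "true"));
        // Per-resolver switch in the <name>.failure-resolver.enable format.
        // Default "true" is an assumption.
        boolean thisEnabled = Boolean.parseBoolean(
                config.getProperty(resolverName + ".failure-resolver.enable", "true"));
        return allEnabled && thisEnabled;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("too-many-open-partitions.failure-resolver.enable", "false");
        // The explicitly disabled resolver is off; an unconfigured one stays on.
        System.out.println(isResolverEnabled(config, "too-many-open-partitions")); // false
        System.out.println(isResolverEnabled(config, "example-resolver")); // true
    }
}
```

Setting `failure-resolver.enabled` to `false` in this sketch turns every resolver off regardless of its individual switch, matching the "disabled altogether" wording above.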

Also,
- Do not require factory class for simple FailureResolver.
- Simplify FailureResolverFactoryContext.
- Separate specific resolver config into separate class.

Also, only resolve for control checksum failure.
@caithagoras caithagoras merged commit 66e4101 into prestodb:master Mar 1, 2020
@caithagoras caithagoras deleted the r2 branch March 1, 2020 04:26
@caithagoras caithagoras mentioned this pull request Mar 5, 2020
6 tasks
4 participants