Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: exit on error with some exception #66

Merged
merged 14 commits into from
Dec 13, 2022
Merged

Conversation

ratik
Copy link
Contributor

@ratik ratik commented Nov 25, 2022

As discussed we don't need to resubmit msg as described in the task instead we exit on every error but ones defined in exception.
So default behavior: it will exit on all errors but contract errors

// check error with regexp
if !r.ignoreErrorsRegexp.MatchString(err.Error()) {
r.logger.Error("failed to submit tx proof", zap.Error(err))
os.Exit(3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only thing is bothering me

  1. We ignore all our graceful shutdown
  2. I reuse ProcessAndsubmit https://github.com/neutron-org/neutron-query-relayer/blob/feat/resubmit-failed-txs/internal/webserver/webserver.go#L87 which in case of network/gas error just stops the daemon with os.Exit(3)

I would consider to pass a critical error up on the stack

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely so, we shouldn't just os.Exit() here

@zavgorodnii
Copy link
Collaborator

this PR also required a comprehensive description in the docs (with motivation, examples & suggestions)

@ratik
Copy link
Contributor Author

ratik commented Nov 30, 2022

this PR also required a comprehensive description in the docs (with motivation, examples & suggestions)

neutron-org/neutron-docs#36
@oopcode @swelf19

zavgorodnii
zavgorodnii previously approved these changes Nov 30, 2022
@@ -80,6 +80,7 @@ func (r *Relayer) Run(
if err != nil {
r.logger.Error("could not process message", zap.Uint64("query_id", query.Id), zap.Error(err))
neutronmetrics.AddFailedRequest(string(query.QueryType), time.Since(start).Seconds())
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we stop the app in case of any error in the processMessageKV and processMessageTX, regardless is the error happends during tx submition(we want to stop), or during some KV query on the remote chain rpc(we dont want to stop). We need to detect the error is critical and stop only this case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in addition, this will stop the app if any error in the whole processMessageTX has happened (e.g. if buildTxQuery failed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sotnikov-s ok, thanks. added this error into ignore pattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swelf19 fixed

internal/txprocessor/txprocessor.go Outdated Show resolved Hide resolved
internal/txprocessor/txprocessor.go Outdated Show resolved Hide resolved
@@ -80,6 +80,7 @@ func (r *Relayer) Run(
if err != nil {
r.logger.Error("could not process message", zap.Uint64("query_id", query.Id), zap.Error(err))
neutronmetrics.AddFailedRequest(string(query.QueryType), time.Since(start).Seconds())
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in addition, this will stop the app if any error in the whole processMessageTX has happened (e.g. if buildTxQuery failed)

internal/txprocessor/txprocessor.go Outdated Show resolved Hide resolved
if err != nil {
return fmt.Errorf("failed to store tx: %w", err)
// check error with regexp
if !r.ignoreErrorsRegexp.MatchString(err.Error()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO TxProcessor should have nothing to do with this new piece of logic. It's Relayer who decides to ignore err (just log it) or to crash. TxProcessor logic should remain intact in this scope.

Instead, enhance Relayer.processMessageTX and Relayer.processMessageKV methods the way they return a custom error, say, SubmissionError, which contains error severity level:

  • critical for errors that came from TxProcessor.ProcessAndSubmit and didn't match regexp;
  • info for all other errors.
    Then, the Relayer switches on errors severity (here) and decides whether to stop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes and they are look like what you described except info thing. Could you please check and let me know if you are happy with these ones?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is better, but still critical error handling logic is spread between two entities: relayer and txprocessor. I think it would be better to move all critical-noncritical error handling logic to the relayer

internal/config/config.go Outdated Show resolved Hide resolved
.env.example Outdated
@@ -31,6 +31,7 @@ RELAYER_QUERIES_TASK_QUEUE_CAPACITY=10000
RELAYER_CHECK_SUBMITTED_TX_STATUS_DELAY=10s
RELAYER_INITIAL_TX_SEARCH_OFFSET=0
RELAYER_PROMETHEUS_PORT=9999
RELAYER_IGNORE_ERRORS_REGEX=(execute wasm contract failed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't forget to update readme and documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, as soon as we merge, will update docs

internal/txprocessor/txprocessor.go Outdated Show resolved Hide resolved
if err != nil {
return fmt.Errorf("failed to store tx: %w", err)
// check error with regexp
if !r.ignoreErrorsRegexp.MatchString(err.Error()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is better, but still critical error handling logic is spread between two entities: relayer and txprocessor. I think it would be better to move all critical-noncritical error handling logic to the relayer

@ratik
Copy link
Contributor Author

ratik commented Dec 6, 2022

this is better, but still critical error handling logic is spread between two entities: relayer and txprocessor. I think it would be better to move all critical-noncritical error handling logic to the relayer

@sotnikov-s can't get how can you manage flow in txProcessor and depend on configuration when you move logic to the relayer

@sotnikov-s
Copy link
Contributor

this is better, but still critical error handling logic is spread between two entities: relayer and txprocessor. I think it would be better to move all critical-noncritical error handling logic to the relayer

@sotnikov-s can't get how can you manage flow in txProcessor and depend on configuration when you move logic to the relayer

okay, I agree that it's not good. also: I wouldn't like the critical error to be confounded with an ordinary error, e.g. I don't like the strings comparison as the check for this case. I decided to try to write it myself instead of describing my thoughts here. please take a look: #67

@ratik ratik requested a review from sotnikov-s December 9, 2022 10:06
Copy link
Contributor

@sotnikov-s sotnikov-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested locally: regex and critical error handling work as expected

@zavgorodnii zavgorodnii merged commit 1cc1354 into main Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants