-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: exit on error with some exception #66
Conversation
internal/txprocessor/txprocessor.go
Outdated
// check error with regexp | ||
if !r.ignoreErrorsRegexp.MatchString(err.Error()) { | ||
r.logger.Error("failed to submit tx proof", zap.Error(err)) | ||
os.Exit(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only thing is bothering me
- We ignore all our graceful shutdown
- I reuse ProcessAndsubmit https://github.com/neutron-org/neutron-query-relayer/blob/feat/resubmit-failed-txs/internal/webserver/webserver.go#L87 which in case of network/gas error just stops the daemon with os.Exit(3)
I would consider to pass a critical error up on the stack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absolutely so, we shouldn't just os.Exit() here
this PR also required a comprehensive description in the docs (with motivation, examples & suggestions) |
neutron-org/neutron-docs#36 |
internal/relay/relayer.go
Outdated
@@ -80,6 +80,7 @@ func (r *Relayer) Run( | |||
if err != nil { | |||
r.logger.Error("could not process message", zap.Uint64("query_id", query.Id), zap.Error(err)) | |||
neutronmetrics.AddFailedRequest(string(query.QueryType), time.Since(start).Seconds()) | |||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we stop the app in case of any error in the processMessageKV
and processMessageTX
, regardless is the error happends during tx submition(we want to stop), or during some KV query on the remote chain rpc(we dont want to stop). We need to detect the error is critical and stop only this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in addition, this will stop the app if any error in the whole processMessageTX
has happened (e.g. if buildTxQuery
failed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sotnikov-s ok, thanks. added this error into ignore pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swelf19 fixed
internal/relay/relayer.go
Outdated
@@ -80,6 +80,7 @@ func (r *Relayer) Run( | |||
if err != nil { | |||
r.logger.Error("could not process message", zap.Uint64("query_id", query.Id), zap.Error(err)) | |||
neutronmetrics.AddFailedRequest(string(query.QueryType), time.Since(start).Seconds()) | |||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in addition, this will stop the app if any error in the whole processMessageTX
has happened (e.g. if buildTxQuery
failed)
if err != nil { | ||
return fmt.Errorf("failed to store tx: %w", err) | ||
// check error with regexp | ||
if !r.ignoreErrorsRegexp.MatchString(err.Error()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO TxProcessor
should have nothing to do with this new piece of logic. It's Relayer
who decides to ignore err (just log it) or to crash. TxProcessor
logic should remain intact in this scope.
Instead, enhance Relayer.processMessageTX
and Relayer.processMessageKV
methods the way they return a custom error, say, SubmissionError
, which contains error severity level:
critical
for errors that came fromTxProcessor.ProcessAndSubmit
and didn't match regexp;info
for all other errors.
Then, theRelayer
switches on errors severity (here) and decides whether to stop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some changes and they are look like what you described except info
thing. Could you please check and let me know if you are happy with these ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is better, but still critical error handling logic is spread between two entities: relayer and txprocessor. I think it would be better to move all critical-noncritical error handling logic to the relayer
.env.example
Outdated
@@ -31,6 +31,7 @@ RELAYER_QUERIES_TASK_QUEUE_CAPACITY=10000 | |||
RELAYER_CHECK_SUBMITTED_TX_STATUS_DELAY=10s | |||
RELAYER_INITIAL_TX_SEARCH_OFFSET=0 | |||
RELAYER_PROMETHEUS_PORT=9999 | |||
RELAYER_IGNORE_ERRORS_REGEX=(execute wasm contract failed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't forget to update readme and documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, as soon as we merge, will update docs
if err != nil { | ||
return fmt.Errorf("failed to store tx: %w", err) | ||
// check error with regexp | ||
if !r.ignoreErrorsRegexp.MatchString(err.Error()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is better, but still critical error handling logic is spread between two entities: relayer and txprocessor. I think it would be better to move all critical-noncritical error handling logic to the relayer
@sotnikov-s can't get how can you manage flow in txProcessor and depend on configuration when you move logic to the relayer |
okay, I agree that it's not good. also: I wouldn't like the critical error to be confounded with an ordinary error, e.g. I don't like the strings comparison as the check for this case. I decided to try to write it myself instead of describing my thoughts here. please take a look: #67 |
fix: tiny fixes for the crit errors implementation
[NTRN-194] feat: add critical tx submit error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested locally: regex and critical error handling work as expected
As discussed we don't need to resubmit msg as described in the task instead we exit on every error but ones defined in exception.
So default behavior: it will exit on all errors but contract errors