Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce an error level (1.1.2 branch) #147

Open
wants to merge 1 commit into
base: release-1.1.2
Choose a base branch
from

Conversation

cbeck88
Copy link
Contributor

@cbeck88 cbeck88 commented Jan 4, 2022

Sometimes, we lose connection to the sql DB, and the report server write operation fails.

This should not be logged as an error because it's not actionable and errors get converted to sentry alerts.

Err(err) => {
// Failing to publish a report is not fatal, because we attempt to publish
Err(Error::PublishReport) => {
// This is already logged adequately within

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat unrelated to this PR, but in regards to the logging inside publish_report. I see a comment noting that if it "doesn't resolve itself" then we need to take action. Could you provide context for how I might identify that it is not going to resolve itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a good question, and maybe we should change the code more aggressively.

So the idea here was that we try to publish, but we don't block if we can't. Because it's not critical that we republish the report with a higher expiry number, as long as one of our publications succeeds before the key actually expires.

There are other things we could do here:

  1. Use retries, and log an error but only if e.g. 3 retries fail
  2. Keep track of how often the report publication has failed, and start blocking and looping infinitely if it fails too many times in a row.

I wanted to avoid making significant changes in a release branch, so I just wanted to adjust the log level in this commit.

Could you provide context for how I might identify that it is not going to resolve itself?

If this warning persists every time we attempt to publish, then eventually the key will expire and users can't publish.
If we manage to process a block without emitting this warning, then the system has recovered.

I don't think random connection issues are going to cause this to happen -- if we randomly lose connection to postgres and this publication action fails, it has to be resolved when we try to write block data to postgres, because that is going to block infinitely on being able to write the data, and we won't hit this action again until that action succeeds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can consider whether to make this do something else in master branch (1.2.0) I think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants