oasis-node crashes, if metrics in push mode and PGW crashes #2936

Closed
matevz opened this issue May 22, 2020 · 2 comments · Fixed by #2941
Labels
c:bug Category: bug c:instrumentation Category: metrics and tracing

Comments

matevz commented May 22, 2020

Currently, if oasis-node runs in metrics push mode and the Prometheus Pushgateway (PGW) crashes unexpectedly, oasis-node crashes as well.

Improve oasis-node Prometheus pusher code as follows:

  1. If PGW stops responding while oasis-node is running, try to reconnect.
  2. If PGW is not reachable when oasis-node starts in push mode, keep trying to reconnect instead of failing to start.

kostko commented May 22, 2020

How does it crash? Are there any stack traces?

Yawning commented May 26, 2020

> If PGW is not responding, try to reconnect.

Make the upstream code not brain-damaged then.

https://pkg.go.dev/github.com/prometheus/client_golang/prometheus/push?tab=doc#Pusher.Push

More specifically, our code assumes that each call to Pusher.Push can succeed or fail independently, which is incorrect. Once a call to Push fails, it will continue to fail, returning the same error.
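
If the behaviour is as described, one possible workaround is to stop reusing a pusher across attempts and build a fresh one each time. The sketch below only simulates that situation: `metricsPusher`, `stickyPusher`, and `pushFresh` are hypothetical names, and `stickyPusher` is a stand-in that remembers its first error, not the real `push.Pusher`. In real code the factory would presumably wrap `push.New` with the gateway URL and job name.

```go
package main

import (
	"errors"
	"fmt"
)

// metricsPusher is a hypothetical interface covering the single method
// used here; the Pusher type in client_golang has a Push() error method
// of the same shape.
type metricsPusher interface {
	Push() error
}

// stickyPusher simulates the behaviour described above: once a push has
// failed, every later call on the same instance returns the same error.
type stickyPusher struct {
	gatewayUp *bool
	err       error
}

func (p *stickyPusher) Push() error {
	if p.err != nil {
		return p.err
	}
	if !*p.gatewayUp {
		p.err = errors.New("gateway unreachable")
	}
	return p.err
}

// pushFresh builds a new pusher for every attempt, so an error remembered
// by an earlier instance cannot carry over into this one.
func pushFresh(newPusher func() metricsPusher) error {
	return newPusher().Push()
}

func main() {
	up := false
	factory := func() metricsPusher { return &stickyPusher{gatewayUp: &up} }

	reused := factory()
	fmt.Println("reused, gateway down:", reused.Push())
	fmt.Println("fresh, gateway down:", pushFresh(factory))

	up = true // the gateway comes back
	fmt.Println("reused, gateway up:", reused.Push())
	fmt.Println("fresh, gateway up:", pushFresh(factory))
}
```

Rebuilding the pusher per attempt costs an allocation and re-registration of collectors on every push, which is probably acceptable at typical push intervals but worth measuring.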
