Collector recovery from component panic #7366

Closed
djaglowski opened this issue Mar 14, 2023 · 2 comments
Labels
area:service question Further information is requested

Comments

@djaglowski
Member

Is your feature request related to a problem? Please describe.
There have been a number of issues in the past where the collector crashes due to an unexpected panic in a component. A couple examples:

How can the collector be more resilient to panics that originate within components?

  • Are there any panics that we can fully recover from?
  • Assuming there are at least some panics that are properly fatal, is there a strategy that we should use to crash more elegantly?

Describe the solution you'd like
Recovery from a panic is only possible within the goroutine where the panic originated. Therefore, I think we can look at two separate cases:

  1. Component Start and Shutdown funcs are called synchronously within the service package. Potentially we could recover in a useful way here.
  2. Components often start their own goroutines. If there is a recovery behavior we would like to encourage, I think the best we can do is to export a recovery function from a common package and encourage each new goroutine to defer a call to this function before doing anything else. This might be overkill in some cases, but perhaps there is a balance, e.g. only use this func in long-running or complex goroutines.
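
To make the two cases concrete, here is a rough sketch rather than a proposal for the actual API. Every name in it (`component`, `safeCall`, `RecoverAndReport`, `panicky`) is hypothetical and made up for illustration; the `component` interface is only a stand-in for the collector's real one. It assumes we are happy to turn a recovered panic into an ordinary error and let the caller decide what to do with it.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// component is a hypothetical stand-in for a collector component.
type component interface {
	Start(ctx context.Context) error
	Shutdown(ctx context.Context) error
}

// safeCall covers case 1: fn runs synchronously on the caller's goroutine,
// so a deferred recover here can turn a panic into a returned error.
func safeCall(name string, fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%s panicked: %v", name, r)
		}
	}()
	return fn()
}

// RecoverAndReport covers case 2. It is a hypothetical exported helper that a
// component would defer at the top of each goroutine it starts. It has to be
// deferred directly (defer RecoverAndReport(...)), because recover only stops
// a panic when called directly by the deferred function.
func RecoverAndReport(report func(error)) {
	if r := recover(); r != nil {
		report(fmt.Errorf("goroutine panicked: %v", r))
	}
}

// panicky is a toy component whose Start always panics.
type panicky struct{}

func (panicky) Start(context.Context) error    { panic("boom in Start") }
func (panicky) Shutdown(context.Context) error { return nil }

func main() {
	ctx := context.Background()
	var c component = panicky{}

	// Case 1: the panic in Start surfaces as an error instead of a crash.
	if err := safeCall("Start", func() error { return c.Start(ctx) }); err != nil {
		fmt.Println("start failed:", err)
	}

	// Case 2: a goroutine owned by the component recovers for itself.
	errs := make(chan error, 1)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer RecoverAndReport(func(err error) { errs <- err })
		var m map[string]int
		m["x"] = 1 // writing to a nil map panics
	}()
	wg.Wait()
	fmt.Println("reported:", <-errs)
}
```

Note that neither helper answers whether the component is still in a usable state after the panic; they only keep the process alive long enough to report the failure or shut down cleanly.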

Describe alternatives you've considered
open-telemetry/opentelemetry-collector-contrib#16598 is a possible strategy for crashing more elegantly.

@djaglowski added the question and area:service labels Mar 14, 2023
@mx-psi
Member

mx-psi commented Mar 15, 2023

Relates to #3955 (or duplicates it?)

@djaglowski
Member Author

Thanks for pointing out the duplicate @mx-psi. I'll close this one and update the tracking issue.
