Is your feature request related to a problem? Please describe.
There have been a number of issues in the past where the collector crashes due to an unexpected panic in a component. A couple examples:
How can the collector be more resilient to panics that originate within components?
Are there any panics that we can fully recover from?
Assuming there are at least some panics that are properly fatal, is there a strategy that we should use to crash more elegantly?
Describe the solution you'd like
Recovery from a panic is only possible within the goroutine where the panic originated. Therefore, I think we can look at two separate cases:
1. Component `Start` and `Shutdown` funcs are called synchronously within the `service` package. Potentially we could recover in a useful way here.
2. Components often start their own goroutines. If there is a recovery behavior we would like to encourage, I think the best we can do is to export a recovery function from a common package and encourage each new goroutine to call this function immediately, before doing anything else. This might be overkill in some cases, but perhaps there is a balance, e.g. call this func only for long-running or complex goroutines.
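To make the two cases concrete, here is a rough sketch of what each could look like. The names `safeCall` and `HandlePanic` are hypothetical and do not exist in the collector today; this only illustrates the mechanics of `recover`, not a proposed API.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"runtime/debug"
	"sync"
)

// safeCall is a hypothetical helper for case 1. It runs fn synchronously and
// converts a panic into an error, so a caller such as the service package
// could report a failed Start or Shutdown instead of crashing.
func safeCall(name string, fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic in %s: %v\n%s", name, r, debug.Stack())
		}
	}()
	return fn()
}

// HandlePanic is a hypothetical exported recovery function for case 2.
// It only works when deferred directly (defer HandlePanic(...)), because
// recover has no effect unless it is called by the deferred function itself.
func HandlePanic(onPanic func(recovered any)) {
	if r := recover(); r != nil {
		onPanic(r)
	}
}

func main() {
	// Case 1: a synchronous call that panics is turned into an error.
	err := safeCall("example.Start", func() error {
		var m map[string]int
		m["x"] = 1 // writing to a nil map panics
		return nil
	})
	log.Printf("start failed: %v", err)

	// Case 2: a component-owned goroutine defers the shared handler
	// before doing anything else.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer HandlePanic(func(r any) {
			log.Printf("goroutine panicked: %v", r)
		})
		panic(errors.New("boom"))
	}()
	wg.Wait()
}
```

Note that recovering only stops the process from exiting. It says nothing about whether the component is still in a usable state afterwards, which is the open question above about which panics are actually recoverable.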
Describe alternatives you've considered
open-telemetry/opentelemetry-collector-contrib#16598 is a possible strategy for crashing more elegantly.