
Implement graceful shutdown with pipeline draining #483

Closed
tigrannajaryan opened this issue Jan 3, 2020 · 11 comments
Labels: area:pipeline, area:processor, help wanted (Good issue for contributors to OpenTelemetry Service to pick up)

Comments

@tigrannajaryan
Member

Currently, when shutting down, any data that is still in the pipeline is lost. We need to drain the pipeline during shutdown: first shut down receivers, then processors (and implement draining in processors that store data), then exporters.

@joe-elliott
Contributor

I may be misunderstanding the need, but except for the processors this appears to be done:

func (app *Application) shutdownPipelines() {
	// Shutdown order is the reverse of building: first receivers, then flushing pipelines
	// giving senders a chance to send all their data. This may take time, the allowed
	// time should be part of configuration.
	app.logger.Info("Stopping receivers...")
	app.builtReceivers.StopAll()
	// TODO: shutdown processors by calling Shutdown() for each processor in the
	// order they are arranged in the pipeline.
	app.logger.Info("Shutting down exporters...")
	app.exporters.ShutdownAll()
}

@tigrannajaryan
Member Author

except for the processors

Yes, this is the missing part. Partial processor support is coming in #488.

We still need to implement draining in processors (particularly in the batch and queue processors).
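
Not the collector's actual batch processor, but a minimal sketch of the flush-on-Shutdown idea, assuming a hypothetical batcher type that buffers items and hands them to a downstream export function:

package main

import (
	"context"
	"fmt"
	"sync"
)

// batcher is a hypothetical stand-in for a batching processor: it buffers
// items and hands them to a downstream export function.
type batcher struct {
	mu     sync.Mutex
	buf    []string
	export func(items []string)
}

func (b *batcher) ConsumeItem(item string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.buf = append(b.buf, item)
}

// Shutdown flushes whatever is still buffered before returning, so data
// accepted before shutdown is not silently dropped.
func (b *batcher) Shutdown(ctx context.Context) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.buf) > 0 {
		b.export(b.buf)
		b.buf = nil
	}
	return nil
}

func main() {
	b := &batcher{export: func(items []string) { fmt.Println("exported", len(items), "items") }}
	b.ConsumeItem("span-1")
	b.ConsumeItem("span-2")
	_ = b.Shutdown(context.Background()) // flushes the two buffered items
}

The key point is that Shutdown blocks until the buffered data has been handed off downstream, rather than returning while data is still held.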

@tigrannajaryan added the "help wanted" label on May 13, 2020
@flands added this to "To do" in Collector on May 22, 2020
@tigrannajaryan removed this from "To do" in Collector on May 29, 2020
@flands added this to the GA 1.0 milestone on Jun 18, 2020
@jan25

jan25 commented Jun 28, 2020

Hi! Can I work on this?

After a quick look at queued_processor, here is a rough idea for a fix: on shutdown, keep draining until the queue is empty or a deadline of ~5s is hit, whichever happens first. We could also add stats on how many spans were flushed and dropped during the shutdown process (a rough sketch follows this comment). WDYT?

I see there is a Stop() in queued_processor; any hints on where that is used?

A similar idea can be applied to batch_processor too.
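
A minimal sketch of that drain-until-empty-or-deadline idea, assuming a hypothetical channel-backed queue and send function rather than the collector's actual queued_processor internals:

package main

import (
	"fmt"
	"time"
)

// drainQueue keeps sending queued items until the queue is empty (closed) or
// the deadline expires, whichever happens first, and reports how many items
// were flushed versus dropped.
func drainQueue(queue chan string, send func(string) error, deadline time.Duration) (flushed, dropped int) {
	timer := time.NewTimer(deadline)
	defer timer.Stop()
	for {
		select {
		case item, ok := <-queue:
			if !ok {
				return // queue closed and fully drained
			}
			if err := send(item); err != nil {
				dropped++
			} else {
				flushed++
			}
		case <-timer.C:
			// Deadline hit: whatever is still queued counts as dropped.
			dropped += len(queue)
			return
		}
	}
}

func main() {
	q := make(chan string, 10)
	for i := 0; i < 3; i++ {
		q <- fmt.Sprintf("span-%d", i)
	}
	close(q)
	flushed, dropped := drainQueue(q, func(string) error { return nil }, 5*time.Second)
	fmt.Printf("flushed=%d dropped=%d\n", flushed, dropped)
}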

@tigrannajaryan
Member Author

@jan25 do you still want to work on this?

We will need to drain the pipeline by first stopping receivers, then stopping processors in the right order, then stopping exporters. When stopping, we will need to wait for up to a certain maximum time, as you say. It needs to be done for all components, not just certain processors.
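
A sketch of that ordering under a single maximum wait, using hypothetical component slices with a Shutdown(ctx) method rather than the collector's real builder types:

package main

import (
	"context"
	"log"
	"time"
)

// component is a hypothetical stand-in for anything in the pipeline that can
// be shut down: receivers, processors, exporters.
type component interface {
	Shutdown(ctx context.Context) error
}

// shutdownPipelines stops stages in the reverse of build order: receivers
// first (so no new data comes in), then processors (flushing what they hold),
// then exporters, all bounded by one overall max-wait deadline.
func shutdownPipelines(receivers, processors, exporters []component, maxWait time.Duration) {
	ctx, cancel := context.WithTimeout(context.Background(), maxWait)
	defer cancel()

	for _, stage := range [][]component{receivers, processors, exporters} {
		for _, c := range stage {
			if err := c.Shutdown(ctx); err != nil {
				log.Printf("shutdown error: %v", err)
			}
		}
	}
}

type noop struct{ name string }

func (n noop) Shutdown(ctx context.Context) error {
	log.Printf("stopped %s", n.name)
	return nil
}

func main() {
	shutdownPipelines(
		[]component{noop{"receiver"}},
		[]component{noop{"batch processor"}},
		[]component{noop{"exporter"}},
		30*time.Second,
	)
}

Passing a single deadline-bearing context through every Shutdown call keeps the total drain time bounded regardless of how many components the pipeline has.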

@jan25

jan25 commented Oct 3, 2020

@tigrannajaryan Sure, I'll start the work and open a PR soon

@tigrannajaryan
Member Author

@jan25 I will remove this from the 1.0 milestone, since it is not a must-have, but feel free to work on it. We will accept it when it is ready.

@kamalmarhubi

@tigrannajaryan Is there a suggested workaround to avoid dropping messages during collector upgrades?

@tigrannajaryan
Member Author

There is no known workaround.

@kamalmarhubi

@tigrannajaryan in that case, I feel like this should be added back to the GA milestone? Dropping things on the floor during upgrades is not great...

@tigrannajaryan
Member Author

@kamalmarhubi Help is wanted on this one. If there are contributors who can submit a PR before GA, we will be happy to review it.

@bogdandrutu
Member

I think this is done now, correct?
