Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add methods to Flush and Close a tracer #45

Merged
merged 6 commits into from
Mar 26, 2020

Conversation

rnburn
Copy link
Contributor

@rnburn rnburn commented Mar 4, 2020

No description provided.

@reyang
Copy link
Member

reyang commented Mar 4, 2020

Logically this change makes sense.
I have the following stepping back questions:

  1. In the specification, it seems flush will be treated as a SpanProcessor feature rather than Tracer feature. Add flush interface to span processor opentelemetry-specification#370
  2. The name FlushWithMicroseconds sounds weird to me. How about Flush(timeout_in_microseconds = INFINITE)?
  3. The return value concerns me. There could be multiple scenarios:
  • Flush succeeded, yeah!
  • Flush timed out, we don't know if the other side received the data or not.
  • Flush failed, and we know what has happened (e.g. the disk is full, so we return immediately for a file exporter; DNS resolution failure while we try to export to an HTTP endpoint).
  • Flush failed, and we don't know what has happened (e.g. connection reset, the server side might or might not receive the data).
  • Flush partially succeeded (e.g. some data in the buffer is too old and doesn't make sense to get exported - e.g. device scenario where a device wakes up after sleeping for months; we receive HTTP Partial Success from the ingestion service, that only part of the data has been accepted).

@rnburn
Copy link
Contributor Author

rnburn commented Mar 4, 2020

  1. In the specification, it seems flush will be treated as a SpanProcessor feature rather than Tracer feature. Add flush interface to span processor opentelemetry-specification#370

Tracer can forward to any attached SpanProcessors.

  1. The name FlushWithMicroseconds sounds weird to me. How about Flush(timeout_in_microseconds = INFINITE)?

There are non-virtual Flush and Close functions. You call them with the C++11 chrono types (e.g. tracer->Flush(std::chrono::milliseconds{5}).)

FlushWithMicroseconds is to have a virtual function with a stable ABI -- I wouldn't expect most users to invoke it directly. (But I'm open to tweaking the naming).

  1. The return value concerns me.

I'll remove the return value.

@reyang
Copy link
Member

reyang commented Mar 5, 2020

  1. In the specification, it seems flush will be treated as a SpanProcessor feature rather than Tracer feature. open-telemetry/opentelemetry-specification#370

Tracer can forward to any attached SpanProcessors.

Do we want to get Tracer.Flush into the OpenTelemetry specification? In general what's our position on exposing additional APIs that are not spec'ed out?

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks good to me.
I've put some stepping back questions for discussion.

@pyohannes
Copy link
Contributor

Do we want to get Tracer.Flush into the OpenTelemetry specification? In general what's our position on exposing additional APIs that are not spec'ed out?

I have concerns about having those methods methods (particularly the Close method) on the API level. Those shouldn't block the merge, but I nevertheless want to express them.

As far as I understand, the API layer is strictly targeting instrumenters, while the SDK layer is targeting application operators. I think instrumenters aren't supposed to have the ability to close a tracer. If there's a Close method on the API level, an application operator using OT has to be aware that any (evil) OT-enabled library he's using might possible close the Tracer he has initialized.

I guess that's the reason why those methods aren't in the spec (and why they most likely will not make it into the spec).

@rnburn
Copy link
Contributor Author

rnburn commented Mar 6, 2020

Applications need the ability to exit quickly.

We can't have a tracer implementation hang up an application while it blocks on a socket for an indefinite amount of time to send buffered spans to a collection point.

At the same time, some applications may want to make an effort to send buffered spans before they exit. The amount of time an application would be willing to wait will, of course, vary on the use case.

That's why something like Close is necessary. If you don't call Close, then an app should be able to exit immediately without waiting on a tracer. Otherwise, it can Close with a timeout to allow the tracer to cleanly stop.

@pyohannes
Copy link
Contributor

Applications need the ability to exit quickly.

I 100% agree.

Thinking a bit more about this, I wonder it TracerProvider wouldn't be better candidate to have a Close method? If I understand correctly, a TracerProvider can return different Tracers (hence the library and version arguments to the Get method). To exit the application, one would need to close all tracers that are floating around.

@reyang
Copy link
Member

reyang commented Mar 10, 2020

FYI - there was a discussion in the OpenTelemetry specification around flush API design.
open-telemetry/opentelemetry-specification#351

@rnburn
Copy link
Contributor Author

rnburn commented Mar 10, 2020

@reyang - sorry I missed the meeting. I added open-telemetry/opentelemetry-specification#351 (comment) to describe the Flush use-case.

@rnburn rnburn merged commit f5bd54c into open-telemetry:master Mar 26, 2020
GerHobbelt pushed a commit to GerHobbelt/opentelemetry-cpp that referenced this pull request Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants