Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Update untraced to suppress logging "Calling finish on an ended Span" warnings #1620

Merged

Conversation

yoheyk
Copy link
Contributor

@yoheyk yoheyk commented Mar 21, 2024

I believe the root cause is the same with this issue

Background

We are observing Calling set_attribute on an ended Span and Calling finish on an ended Span warnings when OpenTelemetry::Common::Utilities.untraced is used, after upgrading opentelemetry-sdk to 1.2.1.

You can reproduce it by adding the following code to a Rails controller:

def index
  OpenTelemetry::Common::Utilities.untraced do
    OpenTelemetry::Instrumentation::Rack::Instrumentation.instance.tracer.in_span('test span') do
      Faraday.get('http://httpbingo.org')
    end
  end
  
  render ...
end

This PR seems to be the cause:

Changes summary

Moved the untraced? check from Trace::Tracer#start_span to Trace::TracerProvider#internal_start_span

@fbogsany
Copy link
Contributor

fbogsany commented Apr 9, 2024

Can you please add a test that demonstrates the change in behaviour? I'm unsure why this change would fix the warning in the example you provided - in both cases, a non-recording API Span would be returned from start_span and .finish would be called on it. That alone wouldn't trigger the warning, since the API implementation of finish is a no-op.

@yoheyk yoheyk force-pushed the fix-untraced-calling-on-an-ended-span branch from bf5b7b2 to 108bde4 Compare April 9, 2024 20:03
@yoheyk
Copy link
Contributor Author

yoheyk commented Apr 9, 2024

Added tests in 108bde4

without the change, you are going to see these failures:

 1) Failure:
OpenTelemetry::SDK::Trace::Tracer::#in_span#test_0002_yields a no-op span within an untraced block [/Users/ykitamura/workspace/opentelemetry-ruby/sdk/test/opentelemetry/sdk/trace/tracer_test.rb:339]:
Expected #<OpenTelemetry::SDK::Trace::Span:0x000000011f619750 @context=#<OpenTelemetry::Trace::SpanContext:0x000000011ee782f8 @trace_id="\x13\xAB\b\xE3\xE4\xB4\x7F\xB3Y\xB4\xDB7\xCD\x11\x81\x96", @span_id="\x13\xD3\xC4=\xF9J\xE2\x92", @trace_flags=#<OpenTelemetry::Trace::TraceFlags:0x000000011e9d1fd8 @flags=1>, @tracestate=#<OpenTelemetry::Trace::Tracestate:0x000000011e87e050 @hash={}>, @remote=false>, @mutex=#<Thread::Mutex:0x000000011f4df8f8>, @name="root", @kind=:internal, @parent_span_id="\x00\x00\x00\x00\x00\x00\x00\x00", @span_limits=#<OpenTelemetry::SDK::Trace::SpanLimits:0x000000011e858b20 @attribute_count_limit=128, @attribute_length_limit=nil, @event_count_limit=128, @link_count_limit=128, @event_attribute_count_limit=128, @event_attribute_length_limit=nil, @link_attribute_count_limit=128>, @span_processors=[], @resource=#<OpenTelemetry::SDK::Resources::Resource:0x000000011f4dfad8 @attributes={}>, @instrumentation_scope=#<struct OpenTelemetry::SDK::InstrumentationScope name="component-tracer", version="1.0.0">, @ended=false, @status=#<OpenTelemetry::Trace::Status:0x000000011e87d4c0 @code=1, @description="">, @total_recorded_events=0, @total_recorded_links=0, @total_recorded_attributes=0, @attributes={}, @events=nil, @links=nil, @monotonic_start_timestamp=586032859903000, @realtime_start_timestamp=1712692764608202000, @start_timestamp=1712692764608202000, @end_timestamp=nil> to be an instance of OpenTelemetry::Trace::Span, not OpenTelemetry::SDK::Trace::Span.

 2) Failure:
OpenTelemetry::SDK::Trace::Tracer::#in_span#test_0003_does not log "Calling finish on an ended Span" warnings [/Users/ykitamura/workspace/opentelemetry-ruby/sdk/test/opentelemetry/sdk/trace/tracer_test.rb:353]:
Expected "W, [2024-04-09T15:59:24.608525 #24314]  WARN -- : Calling finish on an ended Span.\n" to not include "Calling finish on an ended Span".

@@ -329,5 +329,28 @@
_(span.status.code).must_equal(OpenTelemetry::Trace::Status::ERROR)
_(span.status.description).must_equal('Unhandled exception of type: RuntimeError')
end

it 'yields a no-op span within an untraced block' do
tracer_provider.sampler = Samplers::ALWAYS_ON
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want untraced to bypass ALWAYS_ON?

Is they what other Language SDKs do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, this line is not needed for this PR, it seems I copy-pasted too much. I'm going to remove it.

Do we really want untraced to bypass ALWAYS_ON?

I think this is a separate topic and unrelated to this PR...

@fbogsany
Copy link
Contributor

Added tests in 108bde4

Thank you! This is clearer to me now. The existing code in main calls super early in the SDK Tracer, and the API Tracer returns the current Span from the parent Context (aka the parent span) if it is valid. The issue here is that this span is valid, and is a SDK Span, whereas we expect an API Span (aka a non-recording span). Since it is valid, the in_span method calls span&.finish, which calls the SDK Span#finish method, but the outer block is also going to finish the same span, and that call produces the error because the span is finished twice.

So in this case, we want to return a distinct non-recording Span instance.

@@ -138,7 +138,7 @@ def internal_start_span(name, kind, attributes, links, start_timestamp, parent_c
trace_id ||= @id_generator.generate_trace_id
result = @sampler.should_sample?(trace_id: trace_id, parent_context: parent_context, links: links, name: name, kind: kind, attributes: attributes)
span_id = @id_generator.generate_span_id
if result.recording? && !@stopped
if !OpenTelemetry::Common::Utilities.untraced?(parent_context) && result.recording? && !@stopped
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should return a non-recording span without calling @sampler.should_sample if untraced?(parent_context).

Rationale:

  1. We should not create a non-recording span using the tracestate from a sampler whose result we're overriding.
  2. We should re-use the span ID from the parent span (if there is one) to avoid breaking the trace (in case we propagate the trace context with an outbound request and the server makes it own sampling decision).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this? 5f9b4a2

Comment on lines +138 to +142

if OpenTelemetry::Common::Utilities.untraced?(parent_context)
span_id = parent_span_id || @id_generator.generate_span_id
return OpenTelemetry::Trace.non_recording_span(OpenTelemetry::Trace::SpanContext.new(trace_id: trace_id, span_id: span_id))
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me.

@fbogsany fbogsany merged commit a1f1629 into open-telemetry:main Apr 23, 2024
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants