feat: OpenTelemetry trace/spanID integration for Python handlers #889

Merged
gkevinzheng merged 17 commits into main from otel-span-support-python
May 22, 2024

@gkevinzheng gkevinzheng commented Apr 30, 2024

Changes made:

  • Added opentelemetry-api as a dependency for the library.
  • Renamed the trace/HTTP gathering function from get_request_data to get_request_and_trace_data.
  • get_request_and_trace_data extracts and returns OpenTelemetry span context information, if a valid span exists.
  • Added unit tests for get_request_and_trace_data, as well as for both CloudLoggingHandler and StructuredLogHandler.
  • Added a system test for OpenTelemetry integration using the SDK.
  • Added opentelemetry-sdk as a system test external dependency.
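
For readers following along, here is a minimal sketch of the span-context extraction described in the list above. It is not the code merged in this PR: the function name retrieve_otel_span_ids is hypothetical, and the import is done lazily only so that the sketch also runs where opentelemetry-api is not installed. It relies on the public opentelemetry.trace calls get_current_span, format_trace_id, and format_span_id.

```python
def retrieve_otel_span_ids():
    """Return (trace_id, span_id, trace_sampled) for the active OTel span.

    Returns (None, None, False) when OpenTelemetry is not installed or when
    no valid span is active. Hypothetical helper, not the library's code.
    """
    try:
        from opentelemetry import trace
    except ImportError:
        return None, None, False

    span_context = trace.get_current_span().get_span_context()
    if not span_context.is_valid:
        # With no active span the API returns an invalid placeholder context.
        return None, None, False

    trace_id = trace.format_trace_id(span_context.trace_id)  # 32 hex characters
    span_id = trace.format_span_id(span_context.span_id)  # 16 hex characters
    return trace_id, span_id, span_context.trace_flags.sampled
```

Checking is_valid before formatting keeps the all-zero placeholder IDs out of log entries when no span is active.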

@gkevinzheng gkevinzheng requested review from a team as code owners April 30, 2024 19:57
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Apr 30, 2024
@product-auto-label product-auto-label bot added the api: logging Issues related to the googleapis/python-logging API. label Apr 30, 2024
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Apr 30, 2024
@gkevinzheng gkevinzheng marked this pull request as draft April 30, 2024 20:47
@gkevinzheng gkevinzheng marked this pull request as ready for review May 1, 2024 20:04
cloud_logger.warning(LOG_MESSAGE)

entries = _list_entries(logger)
self.assertEqual(len(entries), 1)
Contributor

Do we need to consider instrumentation source entry here? http://go/cdpe-ops-logentry-changes-source

Contributor Author

Whether or not the instrumentation source entry gets added depends on whether the global variable google.cloud.logging_v2._instrumentation_emitted is true (see _instrumentation_emitted = False). When I was running the system tests locally, I found that if I ran the entire test suite this case would pass, but if I ran just this test case the instrumentation source entry would be there and the test would fail.
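
The order dependence described above can be reproduced with a small stand-in. This is only an illustration of a process-wide "emit once" flag; the names state, instrumentation_emitted, and emit are hypothetical and do not mirror the library's internals.

```python
import types

# Hypothetical stand-in for a module that holds a process-wide flag.
state = types.SimpleNamespace(instrumentation_emitted=False)


def emit(message, sink):
    """Append message to sink, preceded once per process by a diagnostic entry."""
    if not state.instrumentation_emitted:
        sink.append("<instrumentation source entry>")
        state.instrumentation_emitted = True
    sink.append(message)


# First logger to run in the process: the flag is still False, two entries.
first = []
emit("hello", first)

# Any later logger in the same process: only the message itself.
second = []
emit("hello", second)
```

A test that asserts an exact entry count would therefore need the flag pinned to a known value in its setup, or it will pass or fail depending on what ran before it.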

) = get_request_data()

# otel_trace_id existing means the other return values are non-null
if otel_trace_id:
Contributor

If there is no HTTP request data, do we reuse the http_request from the last request for OTel?

Contributor Author

If there is no HTTP request data, I have it set to a null value.
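
To make the answer concrete, here is a rough sketch of how the two sources could be combined so that nothing is carried over between calls. The function name combine_request_and_trace and its tuple-shaped arguments are assumptions for illustration, not the PR's actual code.

```python
def combine_request_and_trace(framework_data, otel_data):
    """Merge framework-derived data with OTel-derived trace fields.

    framework_data: (http_request, trace_id, span_id, trace_sampled)
    otel_data: (trace_id, span_id, trace_sampled)
    Hypothetical helper; the result is built only from this call's inputs.
    """
    http_request, trace_id, span_id, trace_sampled = framework_data
    otel_trace_id, otel_span_id, otel_sampled = otel_data
    if otel_trace_id:
        # OTel wins for the trace fields. http_request is whatever the web
        # framework reported for this call, which may be None.
        return http_request, otel_trace_id, otel_span_id, otel_sampled
    return http_request, trace_id, span_id, trace_sampled
```

Because no state is cached, an OTel span outside of any web request yields http_request=None rather than a stale value.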

Contributor

@aabmass aabmass left a comment

Didn't look too closely at the tests, but LGTM from an OTel perspective.

setup.py Outdated
@@ -44,6 +44,7 @@
"google-cloud-audit-log >= 0.1.0, < 1.0.0dev",
"google-cloud-core >= 2.0.0, <3.0.0dev",
"grpc-google-iam-v1 >=0.12.4, <1.0.0dev",
"opentelemetry-api >= 1.22.0",
Contributor

I would consider using an older version here. The context APIs you are using were, I believe, available even in the first 1.x release.

def get_request_and_trace_data():
    """Helper to get http_request and trace data from supported web
    frameworks (currently supported: Flask and Django), as well as OpenTelemetry. Attempts
    to parse trace/spanID from OpenTelemetry first, before going to Traceparent then XCTC.
Contributor

nit: this is a bit misleading

Suggested change
- to parse trace/spanID from OpenTelemetry first, before going to Traceparent then XCTC.
+ to retrieve trace/spanID from OpenTelemetry first, before going to Traceparent then XCTC.
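
The precedence named in the docstring (OpenTelemetry, then traceparent, then X-Cloud-Trace-Context) can be sketched as follows. This is a simplified illustration with hypothetical function names. The header parsers are deliberately strict and are not the library's implementations; in particular, the X-Cloud-Trace-Context span ID is returned exactly as it appears in the header.

```python
import re

# W3C traceparent: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")
# X-Cloud-Trace-Context: TRACE_ID/SPAN_ID;o=OPTIONS (span and options optional).
_XCTC = re.compile(r"^(\w+)(?:/(\d+))?(?:;o=(\d))?$")


def parse_traceparent(header):
    match = _TRACEPARENT.match(header.strip().lower()) if header else None
    if not match:
        return None, None, False
    _version, trace_id, span_id, flags = match.groups()
    # Bit 0 of the flags byte is the "sampled" flag.
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def parse_xctc(header):
    match = _XCTC.match(header.strip()) if header else None
    if not match:
        return None, None, False
    trace_id, span_id, option = match.groups()
    return trace_id, span_id, option == "1"


def pick_trace(otel_ids, headers):
    """Apply the precedence: OTel, then traceparent, then X-Cloud-Trace-Context.

    otel_ids is a (trace_id, span_id, sampled) tuple; headers is a dict with
    lowercase header names. Both shapes are assumptions for this sketch.
    """
    if otel_ids[0]:
        return otel_ids
    parsers = (("traceparent", parse_traceparent), ("x-cloud-trace-context", parse_xctc))
    for name, parser in parsers:
        result = parser(headers.get(name))
        if result[0]:
            return result
    return None, None, False
```

Note that the OTel branch only reads an already-parsed span context, which is why "retrieve" describes it better than "parse".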

@@ -191,9 +193,31 @@ def _parse_xcloud_trace(header):
return trace_id, span_id, trace_sampled


def _parse_current_open_telemetry_span():
Contributor

nit: I wouldn't use "parse" here

Comment on lines 686 to 687
processor = BatchSpanProcessor(ConsoleSpanExporter())
provider.add_span_processor(processor)
Contributor

You can omit these lines if you don't actually want any console output

processor = BatchSpanProcessor(ConsoleSpanExporter())
provider.add_span_processor(processor)

tracer = trace.get_tracer("test_system", tracer_provider=provider)
Contributor

nit: this is a bit simpler

Suggested change
- tracer = trace.get_tracer("test_system", tracer_provider=provider)
+ tracer = provider.get_tracer("test_system")

Comment on lines 48 to 49
with mock.patch("opentelemetry.trace.get_current_span", return_value=span) as m:
    yield m
Contributor

You don't actually need to do any mocking; you can just set the span in the real context implementation. See this example: https://opentelemetry.io/docs/languages/python/cookbook/#manually-setting-span-context:~:text=%23%20Or%20you%20can,(token)

Suggested change
- with mock.patch("opentelemetry.trace.get_current_span", return_value=span) as m:
-     yield m
+ ctx = trace.set_span_in_context(span)
+ token = context.attach(ctx)
+ try:
+     yield
+ finally:
+     context.detach(token)

I am guessing that's why you are using import opentelemetry.trace instead of from opentelemetry import trace
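
A self-contained version of that suggestion might look like the sketch below. The helper name active_span is hypothetical. It assumes the opentelemetry-api names SpanContext, TraceFlags, NonRecordingSpan, set_span_in_context, context.attach, and context.detach, and it imports them inside the function so that defining the helper does not itself import OpenTelemetry.

```python
import contextlib


@contextlib.contextmanager
def active_span(trace_id, span_id, sampled=True):
    """Make a non-recording span current for the duration of the block.

    trace_id and span_id are integers. Hypothetical test helper based on
    the review suggestion; no mocking of get_current_span is involved.
    """
    from opentelemetry import context, trace

    flag = trace.TraceFlags.SAMPLED if sampled else trace.TraceFlags.DEFAULT
    span_context = trace.SpanContext(
        trace_id=trace_id,
        span_id=span_id,
        is_remote=False,
        trace_flags=trace.TraceFlags(flag),
    )
    ctx = trace.set_span_in_context(trace.NonRecordingSpan(span_context))
    token = context.attach(ctx)
    try:
        yield span_context
    finally:
        # Always restore the previous context, even if the test body raises.
        context.detach(token)
```

Inside a `with active_span(...)` block, trace.get_current_span() returns the wrapped span through the real context machinery, so the code under test runs unmodified.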

@@ -211,3 +235,37 @@ def get_request_data():
return http_request, trace_id, span_id, trace_sampled

return None, None, None, False


def get_request_and_trace_data():
Contributor

Does this need to be a new function? The docstring of get_request_data says "Helper to get http_request and trace data from supported web frameworks". It seems to me like this logic should just be merged into the existing one.

Contributor Author

I was concerned that changing the function name could potentially be a breaking change. It should be OK to change the function name because it's in a private module, right?

Contributor

Yes, the module is private, so it should be safe

Does the function need to be renamed though? Why not add new functionality to the existing function?

@@ -662,6 +665,38 @@ def test_log_root_handler(self):
self.assertEqual(len(entries), 1)
self.assertEqual(entries[0].payload, expected_payload)

def test_log_handler_otel_integration(self):
Contributor

Is it possible to also add a system test without the otel sdk imported?

I'm not exactly sure what that should look like, but I want to make sure we have coverage of the situation where otel isn't used at all

Contributor Author

Is it possible to create another file for the newly added system tests, so that the existing test cases aren't importing otel?

Contributor

Yeah, you can add an extra file. Or just import within the test functions. Or use a test class. Whatever seems cleanest

Does Otel behave any differently based on having the module installed? Or is import state the important part?

Contributor Author

I actually managed to resolve this by creating a decorator that deletes the otel SDK imports after the test case gets run. I don't think it's perfect but it feels good enough.
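
For context, a decorator along those lines could be sketched as below. This is a guess at the general approach, not the decorator added in the PR; the name drop_modules_after and its prefix argument are hypothetical.

```python
import functools
import sys


def drop_modules_after(prefix):
    """Decorator factory: after the test runs, forget modules it imported.

    Only modules named prefix or prefix.* that were not already loaded
    before the test started are removed from sys.modules.
    """

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            before = set(sys.modules)
            try:
                return test_func(*args, **kwargs)
            finally:
                for name in set(sys.modules) - before:
                    if name == prefix or name.startswith(prefix + "."):
                        del sys.modules[name]

        return wrapper

    return decorator
```

A test would be wrapped as `@drop_modules_after("opentelemetry.sdk")`. The cleanup is best effort: objects that already hold a reference to a removed module keep it alive, and any global state the module set while imported is not undone.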

Contributor

Ok great! Do you have any test cases that test _retrieve_current_open_telemetry_span without otel included?

Contributor

@daniel-sanche daniel-sanche left a comment

Consider moving the logic in get_request_and_trace_data into get_request_data. I don't think a new function is required.

Other than that, LGTM

@gkevinzheng gkevinzheng added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 21, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 21, 2024
@gkevinzheng gkevinzheng enabled auto-merge (squash) May 22, 2024 15:27
@gkevinzheng gkevinzheng merged commit 78168a3 into main May 22, 2024
17 checks passed
@gkevinzheng gkevinzheng deleted the otel-span-support-python branch May 22, 2024 15:58