Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add urllib3 instrumentation #299

Merged
merged 10 commits into from
Mar 31, 2021

Conversation

mariojonke
Copy link
Contributor

@mariojonke mariojonke commented Jan 27, 2021

Description

Adds instrumentation support for the urllib3 library.

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Additional unit tests were added.

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

* use HTTPConnectionPool.urlopen as instrumentation point as it
  allows to instrument requests made with urllib3's PoolManager,
  HTTPConnectionPool and HTTPSConnectionPool
* use common suppression context variable 'suppress-httpinstrumentation'
  to avoid double tracing with requests instrumentation
* add missing install command for urllib instrumentation when linting
  with tox
@mariojonke mariojonke requested a review from a team as a code owner January 27, 2021 06:59
@mariojonke mariojonke requested review from codeboten and hectorhdzg and removed request for a team January 27, 2021 06:59
Base automatically changed from master to main January 29, 2021 19:28
@pboers1988
Copy link

I can confirm this works. Would be awesome if it could be added to the next release.


Usage
-----
.. code-block:: python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we are trying to take the SDK implementations out of the Usage strings like here and only show the instrumentation. For the below code, consider adding an examples folder and putting it there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mariojonke Updates on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the boilerplate TracerProvider setup with the last commit. So it is just the plain usage of the instrumentation now.

@@ -56,7 +56,7 @@

# A key to a context variable to avoid creating duplicate spans when instrumenting
# both, Session.request and Session.send, since Session.request calls into Session.send
_SUPPRESS_REQUESTS_INSTRUMENTATION_KEY = "suppress_requests_instrumentation"
_SUPPRESS_HTTP_INSTRUMENTATION_KEY = "suppress_http_instrumentation"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious as to why we are merging these into one context variable now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requests uses urllib3 internally. If the instrumentations for both libraries are installed two spans equal spans would be created. So the common suppression key is to avoid this double tracing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applies to urllib as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urllib is actually separate but all three libraries internally use the http.client package. So in case there is a http.client instrumentation in the future i thought it would make sense to have a common suppression key for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean if today, someone does not want to track requests but wants to track urllib, they wouldn't be able to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the supression variables were only meant to avoid double tracing and not for users to supress spans for one specific library. Like for the requests library where a call to Session.request would create another span because it calls into Session.send which is also instrumented.

To generally avoid tracing for example the requests library one would just not install the instrumentation or simply call RequestsInsrumentor().uninstrument(). I doubt anyone would use the Context for that since the supression variable is then required to be in every context instance to work reliably.

To supress spans for a single call/request using the Context probably makes sense but for the requests, urllib3 example (not using urllib here since it does not share a code path with requests) it wouldn't make much of a difference since in the end both libraries are producing the same kind of span (with even the same attributes).

If we really want to support this i guess we would need 3 supression variables: one to just supress the library (e.g. requests), one to supress the traced aspect (e.g. http) and one to supress tracing alltogether. One could then just set the supression variables for the libraries they do not want to track.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the current changes since the supression variables were meant to only be used internally.

@@ -154,7 +154,7 @@ def _instrumented_open_call(
inject(type(headers).__setitem__, headers)

token = context.attach(
context.set_value(_SUPPRESS_URLLIB_INSTRUMENTATION_KEY, True)
context.set_value(_SUPPRESS_HTTP_INSTRUMENTATION_KEY, True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean if urllib instrumentation is used then http.client instrumentation (if present) will be suppressed? I don't think any instrumentation should dictate if another instrumentation should be silenced or not. What if a user actually wants both? May be there are issues with http.client or with urllib and users would want to detect exactly which or may be http.client has adds lower level information like DNS, connection, SSL-handshake, etc related info. Should urllib really automatically suppress it?

I think it should be left to the user whether to any specific instrumentation should be enabled or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.., i guess it depends on what one actually expects to see, the span produced by the instrumentation or the outgoing HTTP call.
The span produced by an instrumentations like e.g. for HTTP clients is usually the last in process span before the trace is continued in some other process (on probably some other host). Hence, the inject call to propagate the context to the next process. Without supressing the lower level HTTP instrumentations, each would separatly inject their span into the HTTP headers. From a functionality point of view this is probably not a problem since traceparent and tracecontext are going to ge overwritten (as far is a saw) but will introduce additional performance overhead. Not sure if a hierarchy of subsequent spans of kind CLIENT would cause problems though.

Looking at it from a pure span point of view it raises the question i guess why some instrumentations suppress library internal calls in the first place, e.g. the call from Session.request to Session.send in the requests library (and i think urllib instrumentor is doing something similar). The produced spans are more or less the same but would still provide additional info through timing and span hierarchy. For a user of the library this is probably of less to no interest but i guess the same can be said about spans produced by the libraries which are internally used by the instrumented library. E.g. requests and urllib3 produce the same span with just the InstrumentationInfo being different.

If a user is only interested in the span for the outgoing HTTP (without the spans produced by the internal libraries) i think it is not always possible to achieve this by deciding if an instrumentation is enabled or not. E.g. in an application which uses a mix of requests and urllib3 for communication. With an adapter requests can be configured to use a protocol other than HTTP for certain sessions. If one now wants to track the HTTP calls made with requests, urllib3 plus the calls with requests using other protocol, HTTP calls made with requests would always generate two outgoing HTTP spans (requests + urllib3) without the possibility to deactivate this behavior.

Imo from a users point of view the outgoing HTTP call might of more interest than the internal workings of the repquests, urllib, ... libraries but i can change the PR to not suppress other instrumentations if you want .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal library spans are a completely difference use case. If they are not silenced then every span exported/generated will generate more spans which will in-turn generate more spans. So even generating a single span will start a loop of spans being generated. So we have to silence the internal library spans.

However, I'm convinced by the argument that not silencing will result in a parent > child pair of spans that will both be of kind CLIENT. I still don't think the solution should be for one instrumentation to silence other instrumentations but I don't see a simpler solution to solve this without adding some advanced features.

So it's fine if urllib3 or requests silences http instrumentation in this PR. I'll create an issue to discuss this further and possibly improve it in future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK we can probably solve in a simple enough way.

What if urllib3 and requests never injected tracing context headers and always set span kind to INTERNAL and then added http instrumentation as a dependency.

So whenever either urllib3 or requests is installed, http.client instrumentation is installed as well. If requests instrumentation is smart enough to never set kind to CLIENT and never inject tracing context when HTTP transport is used, this should cleanly solve the entire problem.

In this case, we'd replace the current suppression calls with a comment that says if we ever build a http.client instrumentation, we should update urllib3 and requests to not generate client spans.

Only issue I see with this is that these libs will hard depend on our http.client instrumentation so it'll be a bit weird (install other package, set env var to disable our http.client instrument) for users to swap out the http.client instrumentation with their own but I think this would be a very rare edge-case with a very viable solution.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue to discuss this further: #369

I'm OK with this PR to suppress http.client instrumentation in the meanwhile.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owais
You have a changes requested for this PR.

@mariojonke mariojonke requested a review from owais March 26, 2021 07:53
@mariojonke
Copy link
Contributor Author

I'll resolve the conflicts after open-telemetry/opentelemetry-python#1690 is merged as otherwise there is the potential of breaking the build due to the instrumentation still using the old API.

@mariojonke
Copy link
Contributor Author

Resolved conflicts and updated dependencies to release 1.0.0

@lzchen lzchen merged commit 1c5f0b5 into open-telemetry:main Mar 31, 2021
@mariojonke mariojonke deleted the urllib3-instrumentation branch April 13, 2021 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants