Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress botocore downstream instrumentation like urllib3 #563

Merged

Conversation

NathanielRN
Copy link
Contributor

@NathanielRN NathanielRN commented Jul 2, 2021

Description

I noticed that if I instrument both urllib3 and botocore that I will get 2 traces (e.g. S3 and s3.amazon.com) when I should only get 1.

Before this fix:

Screen Shot 2021-07-02 at 2 40 58 PM

After this fix:

Screen Shot 2021-07-02 at 2 40 34 PM

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

The first way I thought to write a test for this would be to create an instance of the urllib3 Instrumentor and prove that only 1 trace gets created, but since this is a common pattern among all the instrumentation libraries I didn't add it.

Let me know if this would be useful though!

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
    - [] Unit tests have been added
    - [ ] Documentation has been updated

@NathanielRN NathanielRN requested a review from a team as a code owner July 2, 2021 21:54
@NathanielRN NathanielRN requested review from codeboten and srikanthccv and removed request for a team July 2, 2021 21:54
@lzchen
Copy link
Contributor

lzchen commented Jul 9, 2021

Does botocore and urllib3 share the same code path when executing a client call?

@NathanielRN
Copy link
Contributor Author

@lzchen Yes as far as I am aware!

A quick dive into the botocore package helped me find that botocore > httpsession.py has the following code:

class URLLib3Session(object):
...
def send(self, request):
        try:
            ...
            # conn is a `urllib3.connectionpool.HTTPConnectionPool` derived class type
            urllib_response = conn.urlopen(
                method=request.method,
                url=request_target,
                body=request.body,
                headers=request.headers,
                retries=Retry(False),
                assert_same_host=False,
                preload_content=False,
                decode_content=False,
                chunked=self._chunked(request.headers),
            )

And in our instrumentation we do the following for the opentelemtry-instrumentation-urllib package:

wrapt.wrap_function_wrapper(
urllib3.connectionpool.HTTPConnectionPool,
"urlopen",
instrumented_urlopen,
)

So the trace is recording twice.

@lzchen
Copy link
Contributor

lzchen commented Jul 9, 2021

I believe requests also has this problem. We reserved this key word for http instrumentation suppression. Maybe we should use this instead, and also do something similar.

@NathanielRN
Copy link
Contributor Author

@lzchen That's a fair ask, I see that the urllib3 instrumentation will check both keys:

def _is_instrumentation_suppressed() -> bool:
return bool(
context.get_value(_SUPPRESS_INSTRUMENTATION_KEY)
or context.get_value(_SUPPRESS_HTTP_INSTRUMENTATION_KEY)
)

Further botocore only requires jmespath, urllib3, python-dateutil so the only one we will instrument is probably the HTTP library. Will make the change!

@lzchen
Copy link
Contributor

lzchen commented Jul 13, 2021

Not a blocker for this pr, but we should probably move the context variable to a common location like here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants