Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/urllib instrumentation #222

Merged

Conversation

moaddib666
Copy link
Contributor

@moaddib666 moaddib666 commented Nov 27, 2020

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • tox -e test-instrumentation-urllib

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@moaddib666 moaddib666 requested a review from a team as a code owner November 27, 2020 08:00
@moaddib666 moaddib666 requested review from toumorokoshi and hectorhdzg and removed request for a team November 27, 2020 08:00
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Nov 27, 2020

CLA Signed

The committers are authorized under a signed CLA.

@moaddib666 moaddib666 force-pushed the feature/urllib-instrumentation branch 10 times, most recently from 2cbd7ba to 6f20f48 Compare November 27, 2020 20:20
Copy link
Member

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good! A couple questions around coding decisions, but nothing that would change functionality.

opener_open = OpenerDirector.open

@functools.wraps(opener_open)
def instrumented_request(opener, fullurl, data=None, timeout=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might make sense to allow *args / *kwargs here, since new methods may be added that need to be passed through to open.

Does that make sense here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, as AFAIK open does not support specific arguments open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT). I'd preferred left it as is. We can always add *args, **kwargs by request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with that approach is that you're fixing an issue for people once it breaks them. I'm a bit wary of inconveniencing users knowingly.

That all said, this will probably only change over major version of Python, so the likelyhood is lower.

if not span_name or not isinstance(span_name, str):
span_name = get_default_span_name(method)

recorder = URLLibInstrumentor().metric_recorder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels a bit weird, to assume that a class has the appropriate field instantiated without actually being a method in that class.

Any reason not to move this method into URLLibInstrumentor, or pass a reference to the metric_recorder object that would be used the by the instrumented method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry by I didn't catch the idea, metric_recorder is initialized inside main class in _instrument method. Or you are talking about a specific method like
get_metric_recorder(self): ?

Copy link
Member

@toumorokoshi toumorokoshi Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referring to the fact that you're instantiating a ne wURLLibInstrumentor class, just to retrieve metric_recorder. This requires a reader to know:

  1. URLLibInstrumentor is a singleton, which requires reading more code about instrumentors
  2. understanding that this is safe because _instrument will only be called by URLLibInstrumentor, which may not be clear to someone who modifies this code later.

It would be much clearer if you put this code in as a method into URLLibInstrumentor because:

  1. you wouldn't have to instantiate a new URLLibInstrumentor, you could just reference it via self._metric_recorder
  2. It's much clearer to a reader that this method is intended to only be used with URLLibInstrumentor, because it's in the class.

span_name = ""
if name_callback is not None:
span_name = name_callback()
if not span_name or not isinstance(span_name, str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this ok as a falsey check? I suppose an empty string span name being an actual desired span name is unlikely, but it might be better to just clarify the expected type here.


token = context.attach(
context.set_value(
_SUPPRESS_URLLIB_INSTRUMENTATION_KEY, True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify why this is needed? It seems to me that, as this is the request being sent, it would be impossible really to have subsequent logic also fire another span, or do something that would create such spans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, but here we only wrap this request or I've missing something ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we do this for requests because some of the request calls have the same code path (but we instrument all of them) so we add this context variable to avoid duplicates. I'm not sure if urllib has the same issue though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a situation where this was needed in the botocore instrumentation to avoid infinite recursion. You can use botocore to send traces to our X-Ray backend using botocore.client('x-ray').put_trace_segments(). We had created our own exporter for the traces to not use the OTel Collector. So we would have:

  1. Send trace using our exporter and botocore.put_trace_segments()
  2. BotocoreInstrumentor intercepts call to botocore and generates a new trace A.
  3. Send trace A using our exporter and botocore.put_trace_segments() to X-Ray.
  4. BotocoreInstrumentor intercepts call to botocore and generates a new trace B.
  5. Send trace B using our exporter and botocore.put_trace_segments() to X-Ray.

etc.

I imagine if someone exports using urllib they would encounter a similar problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. But isn't the solution in that scenario to have the exporter and instrumentation honor the "suppress_instrumentation" value?

@NathanielRN in the scenario you stated, the put_trace_segements in 1 should not have emitted a trace, avoiding this whole loop. If the BotocoreInstrumentor checks for the "suppress_instrumentation" context value, it should have emitted nothing.

@moaddib666
Copy link
Contributor Author

Could we merge this code to the upstream without changes as no major issues were found.
As for now we are in the process of integration of our services with OTEL, and we must use dirty hacks to install instrumentation to our projects like:
-e git+https://github.com/moaddib666/opentelemetry-python-contrib.git@6f20f4845f9b82ade987bddc7db0df018d45f88f#egg=opentelemetry_instrumentation_urllib&subdirectory=instrumentation/opentelemetry-instrumentation-urllib ? If, so I will refactor discussed moments and add new PR during this week.

@lzchen
Copy link
Contributor

lzchen commented Dec 1, 2020

@moaddib666
I would prefer to have @toumorokoshi 's comments addressed in this PR before merging anything.

@moaddib666
Copy link
Contributor Author

@moaddib666
I would prefer to have @toumorokoshi 's comments addressed in this PR before merging anything.

Thank you, sure, no rush as we already have workaround to deploy this instrumentation.

Copy link
Member

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have a couple changes I'd personally like to see (the URLLibInstrumentor() call in _instrumentor really bugs me), but not strong enough to request changes.

Thanks!


token = context.attach(
context.set_value(
_SUPPRESS_URLLIB_INSTRUMENTATION_KEY, True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. But isn't the solution in that scenario to have the exporter and instrumentation honor the "suppress_instrumentation" value?

@NathanielRN in the scenario you stated, the put_trace_segements in 1 should not have emitted a trace, avoiding this whole loop. If the BotocoreInstrumentor checks for the "suppress_instrumentation" context value, it should have emitted nothing.

@lzchen
Copy link
Contributor

lzchen commented Dec 8, 2020

@moaddib666
open-telemetry/opentelemetry-python#1438 changed metrics to instruments as part of the property in metrics. Can you rebase?

@moaddib666
Copy link
Contributor Author

@moaddib666
open-telemetry/opentelemetry-python#1438 changed metrics to instruments as part of the property in metrics. Can you rebase?

Sure, sorry for delay, I have limited internet connection right now, will update the PR early next week.

moaddib666 and others added 9 commits December 14, 2020 20:38
Co-authored-by: Leighton Chen <lechen@microsoft.com>
…elemetry/instrumentation/urllib/__init__.py

Co-authored-by: Leighton Chen <lechen@microsoft.com>
Co-authored-by: Leighton Chen <lechen@microsoft.com>
…elemetry/instrumentation/urllib/__init__.py

Co-authored-by: Leighton Chen <lechen@microsoft.com>
…elemetry/instrumentation/urllib/version.py

Co-authored-by: Leighton Chen <lechen@microsoft.com>
Co-authored-by: Leighton Chen <lechen@microsoft.com>
…elemetry/instrumentation/urllib/__init__.py

Co-authored-by: Leighton Chen <lechen@microsoft.com>
@moaddib666
Copy link
Contributor Author

Hi @lzchen, repo has been updated all tests were successfully passed.

@lzchen
Copy link
Contributor

lzchen commented Dec 14, 2020

@moaddib666
Hey looks good.

Just one more thing, we've started using a consolidated top-level CHANGELOG so can you add your entry to that instead of creating a new one?

@lzchen lzchen closed this Dec 15, 2020
@lzchen lzchen reopened this Dec 15, 2020
@lzchen lzchen merged commit 187987d into open-telemetry:master Dec 15, 2020
@moaddib666
Copy link
Contributor Author

@lzchen I just have email that PR was merged, thanks for your help !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants