Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use OTel for performance instrumentation (PoC) #2272

Merged
merged 35 commits into from Aug 28, 2023

Conversation

sentrivana
Copy link
Contributor

@sentrivana sentrivana commented Jul 26, 2023

This is a PR for powering our performance monitoring with OTel instead of our custom Sentry instrumentation. Note that this is a proof of concept which we might end up utilizing or not -- depending on how successful this attempt is at addressing the various issues we've identified with regards to our compatibility with OTel.

See the associated issue for more details on the motivation behind this.

On OTel's opentelemetry-instrument

As the goal was to make this work automatically without requiring the user to set anything up, the autoinstrumentation builds on what the official opentelemetry-instrument tool does, but without having to actually use it to run a program (opentelemetry-instrument python app.py).

What opentelemetry-instrument does to make autoinstrumentation work is that it uses the built-in site mechanism to run the instrumentation code before anything else is run. In simplified terms, site is a Python module loaded at interpreter start that looks for a file called sitecustomize.py in specific locations and if it finds it, it loads and runs it before any user code is run. sitecustomize.py is usually meant to be put in site-packages. opentelemetry-instrument works around this by first doing some path manipulation before execl-ing to force site to look at its sitecustomize.py, which does the autoinstrumentation. Being able to ensure that the autoinstrumentation runs before any user code is handy since by the time actual user code is executed, everything necessary has been monkeypatched in the background.

We can't use the sitecustomize.py hack unless we also want to execl the user program, which seems dangerous. This means we need to require folks to sentry_sdk.init() before importing any class that's supposed to get instrumented. This PR tries to work around that by looking at sys.modules after the autoinstrumentation has finished patching, searching for unpatched classes, and patching them.

Differences to opentelemetry-instrument

We're doing the same things as OTel does for autoinstrumenting, with some differences.

The autoinstrumentation itself is taken from here. We do the same except calling load_configurators(), which as far as I can tell does this in the background. OTel describes configurators as "Configurators are used to configure SDKs (i.e. TracerProvider, MeterProvider, Processors...) to reduce the amount of manual configuration required." The autoinstrumentation seems to work fine without this, and we do some minimal configuration ourselves.

We also don't do any PYTHONPATH manipulations since these seem to only have to do with making the sitecustomize trick describe above work and are then reverted right away.

Closes #2251

@sentrivana sentrivana force-pushed the ivana/performance-powered-by-otel branch from ce9dbf4 to 4c6e310 Compare August 4, 2023 10:50
@sentrivana sentrivana marked this pull request as ready for review August 7, 2023 10:57
inheriting from it). In those cases it's still necessary to sentry_sdk.init()
before importing anything that's supposed to be instrumented.
"""
for module_name, module in sys.modules.copy().items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the outermost loop here, why do we need to go through sys.modules and then original_classes in each of them? Can you explain with a simple flask example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So imagine a user has something like this:

# app.py

from flask import Flask
import sentry_sdk

sentry_sdk.init(...)  # potel enabled
app = Flask()

The OTel autoinstrumentation runs during sentry_sdk.init(), patching the Flask class with its own _InstrumentedFlask, but since the user imported Flask before, in app.py it is still the old unpatched Flask, and so is app.

The loop goes through all modules loaded so far via sys.modules and does the following for each of the original classes:

  • Tries reimporting them to see if what we import is instrumented or not. This is used to check if the autoinstrumentation was actually successful in the first place, because if not, we don't want to patch any leftover unpatched classes. (This check needs to be moved out of this loop since this is something that needs to only be checked once for a class, I will do that.)
  • After having verified that the patching was successful, it then goes through the vars of each sys.module and checks whether the original type is in scope, and if so, replaces it with the patched type.

So in the above example, it would find the app.py module in sys.modules, it would check it for occurrences of the original Flask type, and replace them with _InstrumentedFlask.

sentry_sdk/integrations/opentelemetry/integration.py Outdated Show resolved Hide resolved
@sl0thentr0py
Copy link
Member

sl0thentr0py commented Aug 17, 2023

ok just thinking out loud because this way we'll need to maintain the dict INSTRUMENTED_CLASSES and also it's kinda hard to reason about what patching is happening when.

The BaseInstrumentor class in otel (that all their instrumentors derive from) has the following:

  • instrument
  • uninstrument
  • _is_instrumented_by_opentelemetry attribute

can we somehow make the logic a bit cleaner (and also possibly more general) by inspecting / triggering those methods ?

@sentrivana
Copy link
Contributor Author

ok just thinking out loud because this way we'll need to maintain the dict INSTRUMENTED_CLASSES and also it's kinda hard to reason about what patching is happening when.

The BaseInstrumentor class in otel (that all their instrumentors derive from) has the following:

* `instrument`

* `uninstrument`

* `_is_instrumented_by_opentelemetry` attribute

can we somehow make the logic a bit cleaner (and also possibly more general) by inspecting / triggering those methods ?

This is a good point, I tried using those directly in the beginning but then dropped that approach in favor of this more automagic approach. This was at a time when I wasn't doing any post-patching, but with that now added on top, there are essentially two ways to patch the same thing which is not great.

So my plan now:

  • see if I can make the whole thing (auto patching + the post patching for things imported earlier) use the methods/attrs directly and make it more consistent
  • if not, get rid of the post-patching entirely (meaning people will have to init sentry_sdk before importing anything that should be instrumented), but still see if we can do the initial patching by using the methods/attrs directly

No need to reimport the _InstrumentedClass again to replace
the old class. We already have a ref to the fully set up
_InstrumentedClass from before.
@sentrivana sentrivana enabled auto-merge (squash) August 28, 2023 09:05
@sentrivana sentrivana merged commit 3d2517d into master Aug 28, 2023
245 of 246 checks passed
@sentrivana sentrivana deleted the ivana/performance-powered-by-otel branch August 28, 2023 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[python] Performance powered by OTel
2 participants