
flask.copy_current_request_context+gevent+FlaskInstrumentor cause improper span lifecycles #1551

Closed
matthewgrossman opened this issue Dec 27, 2022 · 1 comment · Fixed by #1654
Labels
bug Something isn't working

Comments

@matthewgrossman
Contributor

High-level problem

To create a new request-aware greenlet within a Flask request, we copy the existing flask-req-ctx and push a new req-ctx onto the req-ctx-stack. When this greenlet+req-ctx ends, it tries to call the patched _teardown_request within flask-instrumentor, which throws errors because the greenlet doesn't "own" the span and therefore shouldn't be allowed to end it.
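
For context, here's a minimal sketch of the pattern in question (assuming gevent plus flask.copy_current_request_context; this paraphrases the repro's app.py rather than copying it):

import gevent
from flask import Flask, copy_current_request_context, request

app = Flask(__name__)

@app.route("/test")
def test():
    @copy_current_request_context
    def background_work():
        # Runs in a new greenlet with a copy of the current request context;
        # when that copied context is popped, Flask runs do_teardown_request,
        # which in turn calls the instrumentor's patched _teardown_request.
        return request.headers.get("x-test-header")

    value = gevent.spawn(background_work).get()
    return f"header was {value}"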

Describe your environment

I created a test repo with all of the source code for reproduction: https://github.com/matthewgrossman/otel-flask-repro. The important file is the app.py.

From trying a few different versions, it doesn't seem that the pinned deps matter too much, but here's a requirements.txt on macOS 13.0.1:

gevent==22.10.2
flask==1.1.4
opentelemetry-instrumentation-flask
opentelemetry-sdk
opentelemetry-api

If you'd prefer, I could turn this env into a Docker image to make reproing easier for others. The key deps are flask, gevent, and opentelemetry-instrumentation-flask, and I've noticed this problem in every variation of those deps that I've tried.

Steps to reproduce

$ python3.8 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python -m flask run -p 8080

in other shell:

curl localhost:8080/test -H 'x-test-header: val'

In the running shell with the flask service, you should see an error like:

(venv)  ~/src/otel-flask-repro  » python -m flask run -p 8080
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
Failed to detach context
Traceback (most recent call last):
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/__init__.py", line 157, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/contextvars_context.py", line 50, in detach
    self._current_context.reset(token)  # type: ignore
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x10f888040> at 0x10f88f240> was created in a different Context
Failed to detach context
Traceback (most recent call last):
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/__init__.py", line 157, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/contextvars_context.py", line 50, in detach
    self._current_context.reset(token)  # type: ignore
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x10f888040> at 0x10f88f180> was created in a different Context
127.0.0.1 - - [15/Dec/2022 11:17:54] "GET /test HTTP/1.1" 200 -

What is the expected behavior?

I wouldn't expect to see this error message.

What is the actual behavior?

This error message is caused by the greenlet's flask-req-ctx trying to call _teardown_request for the overall parent flask-req-ctx. This causes issues with contextvars and span ownership.

When a new request comes into the Flask application, FlaskInstrumentor sets the current span+ctxmanager on the flask.request.environ, seen here: flask-instrumentor

When we create a new greenlet, we copy in the current contextvars context and create a new flask-req-context based on the existing one. When this greenlet ends, the teardown gets called and attempts to call activation.__exit__. This causes the error seen in the above reproduction steps.

Just a few lines above, there already is a guard clause meant to stop this exact problem: https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-flask/src/opentelemetry/instrumentation/flask/__init__.py#L434-L435.

However, this only ends up protecting us in situations where the ending req-ctx has a fresh flask.request.environ, like with a test_request_context (noted in the comments of the guard clause). When creating greenlets based on the existing Flask request context, the guard clause doesn't detect that this ending req-ctx should also return early and not attempt to end the span.
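
Roughly paraphrased (simplified, not the exact library code), the patched teardown and its guard look something like this:

import flask
# _ENVIRON_ACTIVATION_KEY is the private module-level constant in
# opentelemetry.instrumentation.flask referenced above.

def _teardown_request(exc):
    activation = flask.request.environ.get(_ENVIRON_ACTIVATION_KEY)
    if not activation:
        # Guard clause: this request never started a span (e.g. a fresh
        # environ from test_request_context), so there is nothing to end.
        return
    # A copied req-ctx still sees the parent's environ, so it falls through
    # to here and tries to end (and detach) a span it doesn't own.
    if exc is None:
        activation.__exit__(None, None, None)
    else:
        activation.__exit__(type(exc), exc, getattr(exc, "__traceback__", None))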

Additional context

In the reproduction source code, I provided a workaround. When you create a new greenlet based on the current request context, first pop off the _ENVIRON_ACTIVATION_KEY to ensure the code goes through the guard clause (workaround source code). Commenting out the existing wrap_fn_in_req_context__broken and uncommenting wrap_fn_in_req_context__workaround removes the error.
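
The shape of the workaround is roughly the following (an illustrative sketch, not the repro code verbatim; the helper name matches the repro, but the body here only demonstrates the idea of running the function in a request context whose environ lacks the activation key):

from flask import current_app, request
# Private constant referenced above; defined inside the instrumentation package.
from opentelemetry.instrumentation.flask import _ENVIRON_ACTIVATION_KEY

def wrap_fn_in_req_context__workaround(fn):
    # Copy the environ *without* the instrumentor's activation, so the
    # greenlet's teardown hits the guard clause and returns early instead of
    # ending a span owned by the parent request context.
    environ = {k: v for k, v in request.environ.items()
               if k != _ENVIRON_ACTIVATION_KEY}
    app = current_app._get_current_object()

    def wrapped(*args, **kwargs):
        with app.request_context(environ):
            return fn(*args, **kwargs)

    return wrapped

# usage: gevent.spawn(wrap_fn_in_req_context__workaround(background_work))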

However, I'm hoping there's a way to fix this so my code doesn't need to pop a specific key from an environ before spawning greenlets. A couple of options:

  • Somehow indicate in the environ which flask-req-ctx-id ended up creating the span. When calling _teardown_request, return early if the current flask-req-ctx doesn't match the original req-ctx that created the span in the first place (see the sketch after this list).
  • ?? open to suggestions!
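
To make the first option concrete, a very rough sketch (the owner key and function names are hypothetical, and it leans on the Flask 1.x _request_ctx_stack):

import flask
from opentelemetry.instrumentation.flask import _ENVIRON_ACTIVATION_KEY  # private constant

_ENVIRON_OWNER_KEY = "opentelemetry-flask.owning_req_ctx"  # hypothetical key

def _start_span_for_request(environ, activation):
    # Hypothetical: alongside the activation, remember which request context
    # "owns" the span.
    environ[_ENVIRON_ACTIVATION_KEY] = activation
    environ[_ENVIRON_OWNER_KEY] = id(flask._request_ctx_stack.top)

def _teardown_request(exc):
    environ = flask.request.environ
    activation = environ.get(_ENVIRON_ACTIVATION_KEY)
    if not activation:
        return
    if environ.get(_ENVIRON_OWNER_KEY) != id(flask._request_ctx_stack.top):
        # A copied request context (e.g. from a greenlet) is being torn down,
        # not the one that created the span: leave the span alone.
        return
    activation.__exit__(None, None, None)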
@rafa-munoz
Contributor

rafa-munoz commented Jun 13, 2023

Even though I'm on version 0.38b0, I see a similar error:

RuntimeError: Working outside of request context.

This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.

  File "flask/ctx.py", line 399, in pop
    self.app.do_teardown_request(exc)
  File "flask/app.py", line 2373, in do_teardown_request
    for name in chain(request.blueprints, (None,)):
  File "werkzeug/local.py", line 311, in __get__
    obj = instance._get_current_object()
  File "werkzeug/local.py", line 508, in _get_current_object
    raise RuntimeError(unbound_message) from None
LookupError: <ContextVar name='flask.request_ctx' at 0x7fd8bf272d90>
  File "/opt/project-venv/lib/python3.11/site-packages/opentelemetry/trace/__init__.py", line 573, in use_span
    yield span
  File "/opt/project-venv/lib/python3.11/site-packages/opentelemetry/instrumentation/wsgi/__init__.py", line 553, in __call__
    iterable = self.wsgi(environ, start_response)
  File "my_lib_instrumentation/services/http/wsgi.py", line 92, in __call__
    res = self.app(environ, start_instrumented_response)
  File "flask/app.py", line 2544, in wsgi_app
    ctx.pop(error)
  File "flask/ctx.py", line 405, in pop
    ctx = _cv_request.get()
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x7fd8b062da30> at 0x7fd86f332f80> was created in a different Context
  File "/opt/project-venv/lib/python3.11/site-packages/opentelemetry/context/__init__.py", line 163, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/opt/project-venv/lib/python3.11/site-packages/opentelemetry/context/contextvars_context.py", line 50, in detach
    self._current_context.reset(token)  # type: ignore

It happens during the teardown of the request, and it crashes the request with a 500 status code. At the very least it would be nicer to log the error instead of crashing, since instrumentation shouldn't prevent the end user from getting their response. This is not consistent; most of the requests return 200.

Like the original poster, I am using gevent, and opentelemetry-instrumentation-wsgi as well, but I am not using opentelemetry-instrumentation-flask.
