
flask.copy_current_request_context+gevent+FlaskInstrumentor cause improper span lifecycles #1551

Closed
matthewgrossman opened this issue Dec 27, 2022 · 1 comment · Fixed by #1654
Labels
bug Something isn't working

Comments

@matthewgrossman
Contributor

High-level problem

To create a new request-aware greenlet within a Flask request, we copy the existing flask-req-ctx and push a new req-ctx onto the req-ctx-stack. When this greenlet+req-ctx ends, it tries to call the patched _teardown_request within flask-instrumentor, which throws errors because the greenlet doesn't "own" the span and therefore shouldn't be allowed to end it.
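
For context, here's a minimal sketch of the pattern in question (assuming gevent plus flask.copy_current_request_context; this paraphrases the repro's app.py rather than copying it):

import gevent
from flask import Flask, copy_current_request_context, request

app = Flask(__name__)

@app.route("/test")
def test():
    @copy_current_request_context
    def background_work():
        # Runs in a new greenlet with a copy of the current request context;
        # when that copied context is popped, Flask runs do_teardown_request,
        # which in turn calls the instrumentor's patched _teardown_request.
        return request.headers.get("x-test-header")

    value = gevent.spawn(background_work).get()
    return f"header was {value}"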

Describe your environment

I created a test repo with all of the source code for reproduction: https://github.com/matthewgrossman/otel-flask-repro. The important file is the app.py.

From trying a few different versions, it doesn't seem that the pinned deps matter too much, but here's a requirements.txt on macOS 13.0.1:

gevent==22.10.2
flask==1.1.4
opentelemetry-instrumentation-flask
opentelemetry-sdk
opentelemetry-api

If you'd prefer, I could turn this env into a Docker image to make reproing easier for others. The key deps are flask, gevent, and opentelemetry-instrumentation-flask, and I've noticed this problem in every variation of those deps that I've tried.

Steps to reproduce

$ python3.8 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python -m flask run -p 8080

in other shell:

curl localhost:8080/test -H 'x-test-header: val'

In the running shell with the flask service, you should see an error like:

(venv)  ~/src/otel-flask-repro  » python -m flask run -p 8080
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
Failed to detach context
Traceback (most recent call last):
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/__init__.py", line 157, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/contextvars_context.py", line 50, in detach
    self._current_context.reset(token)  # type: ignore
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x10f888040> at 0x10f88f240> was created in a different Context
Failed to detach context
Traceback (most recent call last):
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/__init__.py", line 157, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/Users/mgrossman/src/otel-flask-repro/venv/lib/python3.8/site-packages/opentelemetry/context/contextvars_context.py", line 50, in detach
    self._current_context.reset(token)  # type: ignore
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x10f888040> at 0x10f88f180> was created in a different Context
127.0.0.1 - - [15/Dec/2022 11:17:54] "GET /test HTTP/1.1" 200 -

What is the expected behavior?

I wouldn't expect to see this error message.

What is the actual behavior?

This error message is caused by the greenlet's flask-req-ctx trying to call _teardown_request for the overall parent flask-req-ctx. This causes issues with contextvars and span ownership.

When a new request comes into the Flask application, FlaskInstrumentor sets the current span+ctxmanager on the flask.request.environ, seen here: flask-instrumentor

When we create a new greenlet, we copy in the current contextvars context and create a new flask-req-context based on the existing one. When this greenlet ends, the teardown gets called and attempts to call activation.__exit__. This causes the error seen in the above reproduction steps.

Just a few lines above, there already is a guard clause meant to stop this exact problem: https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation/opentelemetry-instrumentation-flask/src/opentelemetry/instrumentation/flask/__init__.py#L434-L435.

However, this only ends up protecting us in situations where the ending req-ctx has a fresh flask.request.environ, like with a test_request_context (noted in the comments of the guard clause). When creating greenlets based on the existing Flask request context, the guard clause doesn't detect that this ending req-ctx should also return early and not attempt to end the span.
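
Roughly paraphrased (simplified, not the exact library code), the patched teardown and its guard look something like this:

import flask
# _ENVIRON_ACTIVATION_KEY is the private module-level constant in
# opentelemetry.instrumentation.flask referenced above.

def _teardown_request(exc):
    activation = flask.request.environ.get(_ENVIRON_ACTIVATION_KEY)
    if not activation:
        # Guard clause: this request never started a span (e.g. a fresh
        # environ from test_request_context), so there is nothing to end.
        return
    # A copied req-ctx still sees the parent's environ, so it falls through
    # to here and tries to end (and detach) a span it doesn't own.
    if exc is None:
        activation.__exit__(None, None, None)
    else:
        activation.__exit__(type(exc), exc, getattr(exc, "__traceback__", None))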

Additional context

In the reproduction source code, I provided a workaround. When you create a new greenlet based on the current request context, first pop off the _ENVIRON_ACTIVATION_KEY to ensure the code goes through the guard clause (workaround source code). Commenting out the existing wrap_fn_in_req_context__broken and uncommenting wrap_fn_in_req_context__workaround removes the error.
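
The shape of the workaround is roughly the following (an illustrative sketch, not the repro code verbatim; the helper name matches the repro, but the body here only demonstrates the idea of running the function in a request context whose environ lacks the activation key):

from flask import current_app, request
# Private constant referenced above; defined inside the instrumentation package.
from opentelemetry.instrumentation.flask import _ENVIRON_ACTIVATION_KEY

def wrap_fn_in_req_context__workaround(fn):
    # Copy the environ *without* the instrumentor's activation, so the
    # greenlet's teardown hits the guard clause and returns early instead of
    # ending a span owned by the parent request context.
    environ = {k: v for k, v in request.environ.items()
               if k != _ENVIRON_ACTIVATION_KEY}
    app = current_app._get_current_object()

    def wrapped(*args, **kwargs):
        with app.request_context(environ):
            return fn(*args, **kwargs)

    return wrapped

# usage: gevent.spawn(wrap_fn_in_req_context__workaround(background_work))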

However, I'm hoping there's a way to fix this so my code doesn't need to pop a specific key from an environ before spawning greenlets. A couple of options:

  • Somehow indicate in the environ which flask-req-ctx-id ended up creating the span. When calling _teardown_request, return early if the current flask-req-ctx doesn't match the original req-ctx that created the span in the first place (see the sketch after this list).
  • ?? open to suggestions!
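
To make the first option concrete, a very rough sketch (the owner key and function names are hypothetical, and it leans on the Flask 1.x _request_ctx_stack):

import flask
from opentelemetry.instrumentation.flask import _ENVIRON_ACTIVATION_KEY  # private constant

_ENVIRON_OWNER_KEY = "opentelemetry-flask.owning_req_ctx"  # hypothetical key

def _start_span_for_request(environ, activation):
    # Hypothetical: alongside the activation, remember which request context
    # "owns" the span.
    environ[_ENVIRON_ACTIVATION_KEY] = activation
    environ[_ENVIRON_OWNER_KEY] = id(flask._request_ctx_stack.top)

def _teardown_request(exc):
    environ = flask.request.environ
    activation = environ.get(_ENVIRON_ACTIVATION_KEY)
    if not activation:
        return
    if environ.get(_ENVIRON_OWNER_KEY) != id(flask._request_ctx_stack.top):
        # A copied request context (e.g. from a greenlet) is being torn down,
        # not the one that created the span: leave the span alone.
        return
    activation.__exit__(None, None, None)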
@rafa-munoz
Contributor

rafa-munoz commented Jun 13, 2023

Even though I'm on version 0.38b0, I see a similar error:

RuntimeError: Working outside of request context.

This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.

  File "flask/ctx.py", line 399, in pop
    self.app.do_teardown_request(exc)
  File "flask/app.py", line 2373, in do_teardown_request
    for name in chain(request.blueprints, (None,)):
  File "werkzeug/local.py", line 311, in __get__
    obj = instance._get_current_object()
  File "werkzeug/local.py", line 508, in _get_current_object
    raise RuntimeError(unbound_message) from None
LookupError: <ContextVar name='flask.request_ctx' at 0x7fd8bf272d90>
  File "/opt/project-venv/lib/python3.11/site-packages/opentelemetry/trace/__init__.py", line 573, in use_span
    yield span
  File "/opt/project-venv/lib/python3.11/site-packages/opentelemetry/instrumentation/wsgi/__init__.py", line 553, in __call__
    iterable = self.wsgi(environ, start_response)
  File "my_lib_instrumentation/services/http/wsgi.py", line 92, in __call__
    res = self.app(environ, start_instrumented_response)
  File "flask/app.py", line 2544, in wsgi_app
    ctx.pop(error)
  File "flask/ctx.py", line 405, in pop
    ctx = _cv_request.get()
ValueError: <Token var=<ContextVar name='current_context' default={} at 0x7fd8b062da30> at 0x7fd86f332f80> was created in a different Context
  File "/opt/project-venv/lib/python3.11/site-packages/opentelemetry/context/__init__.py", line 163, in detach
    _RUNTIME_CONTEXT.detach(token)  # type: ignore
  File "/opt/project-venv/lib/python3.11/site-packages/opentelemetry/context/contextvars_context.py", line 50, in detach
    self._current_context.reset(token)  # type: ignore

It happens during the teardown of the request, and it crashes the request with a 500 status code. At the very least it would be nicer to log the error instead of crashing, since instrumentation shouldn't prevent the end user from getting their response. This is not consistent; most of the requests return 200.

Like the original poster, I am using gevent, and opentelemetry-instrumentation-wsgi as well, but I am not using opentelemetry-instrumentation-flask.
