Python "getting started" demo does not produce output as shown on instructions #3610

Closed
ekquasar opened this issue Nov 27, 2023 · 21 comments · Fixed by #3655
Labels: bug, sig:python

Comments

@ekquasar (Contributor) commented Nov 27, 2023

What went wrong?

In the Python OTel getting started docs, under the run-the-instrumented-app section, there is example output the user should expect from the application. Reproduced from the docs:

{
    "name": "/rolldice",
    "context": {
        "trace_id": "0xdb1fc322141e64eb84f5bd8a8b1c6d1f",
        "span_id": "0x5c2b0f851030d17d",
        "trace_state": "[]"
    },
    "kind": "SpanKind.SERVER",
    "parent_id": null,
    "start_time": "2023-10-10T08:14:32.630332Z",
    "end_time": "2023-10-10T08:14:32.631523Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "http.method": "GET",
        "http.server_name": "127.0.0.1",
        "http.scheme": "http",
        "net.host.port": 8080,
        "http.target": "/rolldice?rolls=12",
        "net.peer.ip": "127.0.0.1",
        "http.user_agent": "curl/8.1.2",
        "net.peer.port": 58419,
        "http.flavor": "1.1",
        "http.route": "/rolldice",
        "http.status_code": 200,
...

Notice the richness of the information: HTTP method, port, peer IP, route, status code, and so on. However, when running the demo as-is, none of this information appears in the server logs. I've attached the full logs to this ticket (rolldice.log below) and reproduced some of the output here for ease of viewing:

{
    "body": "\u001b[31m\u001b[1mWARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.\u001b[0m\n * Running on http://127.0.0.1:8080",
    "severity_number": "<SeverityNumber.INFO: 9>",
    "severity_text": "INFO",
    "attributes": {
        "otelSpanID": "0",
        "otelTraceID": "0",
        "otelTraceSampled": false,
        "otelServiceName": "dice-server"
    },
    "dropped_attributes": 0,
    "timestamp": "2023-11-27T22:09:01.452634Z",
    "trace_id": "0x00000000000000000000000000000000",
    "span_id": "0x0000000000000000",
    "trace_flags": 0,
    "resource": "BoundedAttributes({'telemetry.sdk.language': 'python', 'telemetry.sdk.name': 'opentelemetry', 'telemetry.sdk.version': '1.21.0', 'service.name': 'dice-server', 'telemetry.auto.version': '0.42b0'}, maxlen=None)"
}
{
    "body": "\u001b[33mPress CTRL+C to quit\u001b[0m",
    "severity_number": "<SeverityNumber.INFO: 9>",
    "severity_text": "INFO",
    "attributes": {
        "otelSpanID": "0",
        "otelTraceID": "0",
        "otelTraceSampled": false,
        "otelServiceName": "dice-server"
    },
    "dropped_attributes": 0,
    "timestamp": "2023-11-27T22:09:01.453917Z",
    "trace_id": "0x00000000000000000000000000000000",
    "span_id": "0x0000000000000000",
    "trace_flags": 0,
    "resource": "BoundedAttributes({'telemetry.sdk.language': 'python', 'telemetry.sdk.name': 'opentelemetry', 'telemetry.sdk.version': '1.21.0', 'service.name': 'dice-server', 'telemetry.auto.version': '0.42b0'}, maxlen=None)"
}
{
    "body": "me is rolling the dice: 2",
    "severity_number": "<SeverityNumber.WARN: 13>",
    "severity_text": "WARNING",
    "attributes": {
        "otelSpanID": "0",
        "otelTraceID": "0",
        "otelTraceSampled": false,
        "otelServiceName": "dice-server"
    },
    "dropped_attributes": 0,
    "timestamp": "2023-11-27T22:09:22.982898Z",
    "trace_id": "0x00000000000000000000000000000000",
    "span_id": "0x0000000000000000",
    "trace_flags": 0,
    "resource": "BoundedAttributes({'telemetry.sdk.language': 'python', 'telemetry.sdk.name': 'opentelemetry', 'telemetry.sdk.version': '1.21.0', 'service.name': 'dice-server', 'telemetry.auto.version': '0.42b0'}, maxlen=None)"
}

Why care?

Notice that the output contains none of the rich attributes (HTTP method, etc.) shown in the docs. Were that the only issue, maybe that would be okay, but also notice that all the trace and span IDs are zeroed out. That seems like a bigger problem.

Moreover, none of the histogram information e.g.

                    "bucket_counts": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    "count": 1,
                    "explicit_bounds": [
                      0, 5, 10, 25, 50, 75, 100, 250, 500, 1000
                    ],
                    "max": 1,
                    "min": 1,
                    "start_time_unix_nano": 1666077040063027610,
                    "sum": 1,
                    "time_unix_nano": 1666077098181107419

as shown in the docs was produced after running the demo, curling the endpoint upwards of 30 times, and waiting upwards of 60 seconds. Run cat rolldice.log | grep bucket to confirm.

Attachment: rolldice.log

@ekquasar added the bug label on Nov 27, 2023
@cartermp (Contributor):
This sounds like auto-instrumentation didn't execute. Could this be related to your Python version?

@ekquasar (Contributor, Author) commented Nov 28, 2023

I suppose it could be related, but the docs say Python 3.6 and higher (https://opentelemetry.io/docs/instrumentation/python/#version-support), and I'm using Python 3.11.6. It would be great to get an informative error in such a case; I'm happy to make that PR if the issue truly turns out to be an unsupported Python version.

@cartermp (Contributor):
Yeah, I'd be shocked if 3.11 didn't work, since it's not exactly hot off the presses. It's not always the case that failed instrumentation or initialization fails loudly, since doing so could be a bad time in production scenarios (imagine the Python instrumentation agent failing to attach to a live app and crashing it, bringing down the app).

@open-telemetry/opentelemetry-python-contrib-approvers any ideas why autoinstrumented spans aren't getting created?

@ekquasar (Contributor, Author):
Here's what Datadog does, for example: their auto-instrumentation provides discoverable warning messages.

[Screenshot: Datadog auto-instrumentation warning messages]

@srikanthccv (Member):
A recent change to the Flask instrumentation is the reason: open-telemetry/opentelemetry-python-contrib#1980. Here are the issue (open-telemetry/opentelemetry-python-contrib#2058) and the PR (open-telemetry/opentelemetry-python-contrib#2013) that add Flask 3.x support, which also fixes this issue.

Auto-instrumentation also emits informative messages (https://github.com/open-telemetry/opentelemetry-python-contrib/blob/4bf3577fb76480dacca12eecd8ed93cb9fccc274/opentelemetry-instrumentation/src/opentelemetry/instrumentation/auto_instrumentation/_load.py#L79-L84) when it fails to install or instrument a framework. In this case, it was a dependency conflict: DependencyConflict: requested: "werkzeug<3.0.0" but found: "werkzeug 3.0.1". These debug logs don't show up until open-telemetry/opentelemetry-python#1059 is supported. Perhaps the getting-started docs should also pin opentelemetry-distro to avoid surprises (and update it occasionally), or there could be docs tests that assert the output of the instrumentation.
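
For anyone who wants to reproduce that check by hand, here is an illustrative snippet (not from the docs or the instrumentation code) that performs the same kind of version comparison, assuming the packaging library is installed:

from importlib.metadata import version

from packaging.specifiers import SpecifierSet

# The "werkzeug<3.0.0" pin is taken from the conflict message above.
requested = SpecifierSet("<3.0.0")
installed = version("werkzeug")  # e.g. "3.0.1" in the failing setup

if installed not in requested:
    # Mirrors the DependencyConflict message auto-instrumentation logs at debug level.
    print(f'DependencyConflict: requested: "werkzeug<3.0.0" but found: "werkzeug {installed}"')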

@ekquasar (Contributor, Author):
Automatic tests for the code in the docs would be premium.

@svrnm (Member) commented Dec 1, 2023

Automatic tests for the code in the docs would be premium.

I agree. If you'd like to help with that, let us know. I have some requirements in my head, but I haven't gotten around to writing up an issue for them yet.

@ekquasar (Contributor, Author) commented Dec 1, 2023

Excellent. Is this a good place to get stakeholder feedback on those requirements, or does this community use another forum for that?

@svrnm (Member) commented Dec 1, 2023

Excellent. Is this a good place to get stakeholder feedback on those requirements, or does this community use another forum for that?

GitHub issues are the right place for it, but we also have bi-weekly SIG meetings, and you can find folks in #otel-comms on CNCF Slack; see https://github.com/open-telemetry/community for more details. If you are willing to invest time and resources to tackle that problem, I recommend raising an issue and collecting some feedback first, because some people will have thought about this problem already and can provide valuable input.

@ekquasar (Contributor, Author) commented Dec 1, 2023

Unrelated to the above discussion of CI: I reported this issue in detail in the opentelemetry-python-contrib repo: open-telemetry/opentelemetry-python-contrib#2079

Re: the above discussion of CI: ACK, I'll go to Slack and start collecting stakeholder input.

UPDATE: see https://cloud-native.slack.com/archives/CJFCJHG4Q/p1701449543685509 for ongoing discussion.

@svrnm (Member) commented Dec 1, 2023

UPDATE: see cloud-native.slack.com/archives/CJFCJHG4Q/p1701449543685509 for ongoing discussion.

Please create a dedicated issue first.

@cartermp (Contributor) commented Dec 1, 2023

Hrm. At best, the action we'd take would be to temporarily roll back everything Python-instrumentation-related until the issues @srikanthccv mentioned are fixed and rolled out.

I've worked at a place where we had live-running code samples and it was an absolute pain to maintain long-term. We ended up rolling back to just pasting code samples inline since the vast majority of the time they just work.

@ekquasar (Contributor, Author) commented Dec 1, 2023

The issue is that, for Python at least, I haven't yet found a single piece of the documentation that works as advertised. Literally not one.

There may be some creative middle ground, regardless. For example, the example code samples linked from the docs could have CI enforced on them, so we're guaranteed at least one working example; see the sketch after the next paragraph.

Or something like this: put a disclaimer on the docs that they are untested and not expected to work as-is, then point to the working examples.
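
To make the CI idea concrete, here is a rough sketch of what such a check could look like, assuming the docs' app.py is in the working directory (the command line mirrors the getting-started instructions; the timing, port, and assertions are illustrative):

import os
import re
import subprocess
import time
import urllib.request

def test_getting_started_emits_nonzero_trace_ids():
    # Start the docs' Flask app under auto-instrumentation, exporting spans to the console.
    env = dict(os.environ, FLASK_APP="app")  # assumes the docs' app.py
    server = subprocess.Popen(
        ["opentelemetry-instrument", "--traces_exporter", "console",
         "flask", "run", "-p", "8080"],
        env=env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    try:
        time.sleep(3)  # crude wait for the dev server to come up
        urllib.request.urlopen("http://localhost:8080/rolldice").read()
        time.sleep(2)  # give the console exporter a moment to flush
    finally:
        server.terminate()
        output, _ = server.communicate(timeout=10)
    # Fail if no spans were exported, or if every trace ID is the zeroed-out placeholder.
    trace_ids = re.findall(r'"trace_id": "0x([0-9a-f]{32})"', output)
    assert trace_ids, "no spans were exported"
    assert any(t != "0" * 32 for t in trace_ids), "only zeroed-out trace IDs found"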

UPDATE: I found a working example: the manually instrumented server at https://opentelemetry.io/docs/instrumentation/python/automatic/example/#manually-instrumented-server appears to work.
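
For reference, the core of that manually instrumented server looks roughly like this (condensed from the linked docs page, so details may differ; the route and port here match the getting-started demo rather than the example page):

from random import randint

from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer provider that prints finished spans to the console.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer(__name__)

app = Flask(__name__)

@app.route("/rolldice")
def roll_dice():
    # The span is created explicitly, so nothing depends on auto-instrumentation.
    with tracer.start_as_current_span("roll"):
        return str(randint(1, 6))

if __name__ == "__main__":
    app.run(port=8080)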

@cartermp (Contributor) commented Dec 1, 2023

Yeah, I think the lack of open-telemetry/opentelemetry-python-contrib#2013 is why you're seeing what you're seeing. This is an extremely impactful bug that hits people everywhere, since "auto-instrument your Flask app" is one of the most basic things to do when starting with OTel + Python.

So I don't want to say no to the idea of some more automation to at least catch when a basic scenario like this just completely doesn't work. But at the same time, I think it's a reasonable assumption for anyone to have that a scenario like this should always work, and that anyone who documents how to do it shouldn't need to have a last-known-good-version in their back pocket to fall back on each time (nor should we necessarily have to pepper banners on the docs saying "doesn't quite work yet, sorry!").

I don't have a great alternative idea, though. It's just that maintenance along these lines will be fairly expensive.

@svrnm (Member) commented Dec 6, 2023

Let's discuss the CI stuff on the dedicated issue for this and get back to the original issue.

@srikanthccv:

Perhaps the getting-started docs should also pin opentelemetry-distro to avoid surprises (and update it occasionally), or there could be docs tests that assert the output of the instrumentation.

Agreed; we already do the same for Java and other languages, including a workflow to auto-update versions. Let me look into that.

@svrnm (Member) commented Dec 6, 2023

Werkzeug==2.3.7 does the trick. cc @luke6Lh43

Let's fix that and also work on pinning the versions of the other packages.
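
For example, a pinned set along these lines (versions taken from this thread and the logs above; a snapshot for illustration, not a recommendation):

# Illustrative pinned requirements for the getting-started demo
flask<3                        # Flask 3.x support is still pending in the instrumentation
werkzeug==2.3.7                # avoids the "werkzeug<3.0.0" dependency conflict
opentelemetry-distro==0.42b0   # the version visible in the resource attributes above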

@cwegener commented Dec 6, 2023

From this discussion, I don't understand which of the changes in Werkzeug 3.0.0 causes the issue. @svrnm, are you able to clarify what is causing it?

@svrnm (Member) commented Dec 6, 2023

See this previous comment by @srikanthccv: #3610 (comment), which eventually leads to this PR pinning werkzeug < 3 for the instrumentation: open-telemetry/opentelemetry-python-contrib#1980

@cwegener commented Dec 6, 2023

I think I understand now.

Flask 3.0 is not supported (yet), pending open-telemetry/opentelemetry-python-contrib#1975

When the SDK was updated to restrict the supported Flask version to <3.0 (since 3.0 support is still in development), the docs were not updated to reflect the change? Am I getting this right?

So, this is where the whole "automated Documentation testing" discussion comes in, right?

@svrnm (Member) commented Dec 6, 2023

No, the docs were updated with the <3 restriction for Flask (see #3359), but at that point in time Werkzeug was still on 2.x; that changed afterwards (https://pypi.org/project/Werkzeug/#history) and led to a new issue.

@cwegener commented Dec 7, 2023

and led to a new issue.

I think that was my question 😂 ... what is the issue? What is causing it?
