Python "getting started" demo does not produce output as shown on instructions #3610

Closed
ekquasar opened this issue Nov 27, 2023 · 21 comments · Fixed by #3655
Labels: bug, sig:python

Comments

@ekquasar (Contributor) commented Nov 27, 2023

What went wrong?

In the Python OTel getting started docs, under the run-the-instrumented-app section, there is example output the user should expect from the application. Reproduced from the docs:

{
    "name": "/rolldice",
    "context": {
        "trace_id": "0xdb1fc322141e64eb84f5bd8a8b1c6d1f",
        "span_id": "0x5c2b0f851030d17d",
        "trace_state": "[]"
    },
    "kind": "SpanKind.SERVER",
    "parent_id": null,
    "start_time": "2023-10-10T08:14:32.630332Z",
    "end_time": "2023-10-10T08:14:32.631523Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "http.method": "GET",
        "http.server_name": "127.0.0.1",
        "http.scheme": "http",
        "net.host.port": 8080,
        "http.target": "/rolldice?rolls=12",
        "net.peer.ip": "127.0.0.1",
        "http.user_agent": "curl/8.1.2",
        "net.peer.port": 58419,
        "http.flavor": "1.1",
        "http.route": "/rolldice",
        "http.status_code": 200,
...

Notice the richness of the information: HTTP method, port, peer IP, route, status code, and so on. However, when running the demo as-is, none of this information appears in the server logs. I've attached the full logs to this ticket (rolldice.log below) and reproduced some of the output here for ease of viewing:

{
    "body": "\u001b[31m\u001b[1mWARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.\u001b[0m\n * Running on http://127.0.0.1:8080",
    "severity_number": "<SeverityNumber.INFO: 9>",
    "severity_text": "INFO",
    "attributes": {
        "otelSpanID": "0",
        "otelTraceID": "0",
        "otelTraceSampled": false,
        "otelServiceName": "dice-server"
    },
    "dropped_attributes": 0,
    "timestamp": "2023-11-27T22:09:01.452634Z",
    "trace_id": "0x00000000000000000000000000000000",
    "span_id": "0x0000000000000000",
    "trace_flags": 0,
    "resource": "BoundedAttributes({'telemetry.sdk.language': 'python', 'telemetry.sdk.name': 'opentelemetry', 'telemetry.sdk.version': '1.21.0', 'service.name': 'dice-server', 'telemetry.auto.version': '0.42b0'}, maxlen=None)"
}
{
    "body": "\u001b[33mPress CTRL+C to quit\u001b[0m",
    "severity_number": "<SeverityNumber.INFO: 9>",
    "severity_text": "INFO",
    "attributes": {
        "otelSpanID": "0",
        "otelTraceID": "0",
        "otelTraceSampled": false,
        "otelServiceName": "dice-server"
    },
    "dropped_attributes": 0,
    "timestamp": "2023-11-27T22:09:01.453917Z",
    "trace_id": "0x00000000000000000000000000000000",
    "span_id": "0x0000000000000000",
    "trace_flags": 0,
    "resource": "BoundedAttributes({'telemetry.sdk.language': 'python', 'telemetry.sdk.name': 'opentelemetry', 'telemetry.sdk.version': '1.21.0', 'service.name': 'dice-server', 'telemetry.auto.version': '0.42b0'}, maxlen=None)"
}
{
    "body": "me is rolling the dice: 2",
    "severity_number": "<SeverityNumber.WARN: 13>",
    "severity_text": "WARNING",
    "attributes": {
        "otelSpanID": "0",
        "otelTraceID": "0",
        "otelTraceSampled": false,
        "otelServiceName": "dice-server"
    },
    "dropped_attributes": 0,
    "timestamp": "2023-11-27T22:09:22.982898Z",
    "trace_id": "0x00000000000000000000000000000000",
    "span_id": "0x0000000000000000",
    "trace_flags": 0,
    "resource": "BoundedAttributes({'telemetry.sdk.language': 'python', 'telemetry.sdk.name': 'opentelemetry', 'telemetry.sdk.version': '1.21.0', 'service.name': 'dice-server', 'telemetry.auto.version': '0.42b0'}, maxlen=None)"
}

Why care?

Notice that the output contains none of the rich attributes (HTTP method, etc.) shown in the docs. Were that the only issue, maybe that would be okay, but also notice that all the trace and span IDs are zeroed out. That seems like a bigger problem.

Moreover, none of the histogram information e.g.

                    "bucket_counts": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                    "count": 1,
                    "explicit_bounds": [
                      0, 5, 10, 25, 50, 75, 100, 250, 500, 1000
                    ],
                    "max": 1,
                    "min": 1,
                    "start_time_unix_nano": 1666077040063027610,
                    "sum": 1,
                    "time_unix_nano": 1666077098181107419

as shown in the docs was produced after running the demo, curling the endpoint upwards of 30 times, and waiting upwards of 60 seconds. Run cat rolldice.log | grep bucket to confirm.

Attachment: rolldice.log

@ekquasar added the bug label on Nov 27, 2023
@cartermp (Contributor):
This sounds like auto-instrumentation didn't execute. Could this be related to your Python version?

@ekquasar (Contributor, Author) commented Nov 28, 2023

I suppose it could be related, but the docs say Python 3.6 and higher (https://opentelemetry.io/docs/instrumentation/python/#version-support), and I'm using Python 3.11.6. It would be great to get an informative error in such a case; I'm happy to make that PR if the issue truly turns out to be an unsupported Python version.

@cartermp (Contributor):
Yeah, I'd be shocked if 3.11 didn't work, since it's not exactly hot off the presses. It's not always the case that failed instrumentation or initialization fails loudly, since doing so could be a bad time in production scenarios (imagine the Python instrumentation agent failing to attach to a live app and crashing it, bringing down the app).

@open-telemetry/opentelemetry-python-contrib-approvers any ideas why autoinstrumented spans aren't getting created?

@ekquasar (Contributor, Author):
Here's what Datadog does, for example: their auto-instrumentation provides discoverable warning messages.

[Screenshot: Datadog auto-instrumentation warning messages]

@srikanthccv (Member):
A recent change to the Flask instrumentation is the reason: open-telemetry/opentelemetry-python-contrib#1980. Here are the issue (open-telemetry/opentelemetry-python-contrib#2058) and the PR (open-telemetry/opentelemetry-python-contrib#2013) that add Flask 3.x support, which also fixes this issue.

Auto-instrumentation also emits informative messages (https://github.com/open-telemetry/opentelemetry-python-contrib/blob/4bf3577fb76480dacca12eecd8ed93cb9fccc274/opentelemetry-instrumentation/src/opentelemetry/instrumentation/auto_instrumentation/_load.py#L79-L84) when it fails to install or instrument a framework. In this case, it was a dependency conflict: DependencyConflict: requested: "werkzeug<3.0.0" but found: "werkzeug 3.0.1". These debug logs don't show up until open-telemetry/opentelemetry-python#1059 is supported. Perhaps the getting-started docs should also pin opentelemetry-distro to avoid surprises (and update it occasionally), or there could be docs tests that assert the output of the instrumentation.
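
For anyone who wants to reproduce that check by hand, here is an illustrative snippet (not from the docs or the instrumentation code) that performs the same kind of version comparison, assuming the packaging library is installed:

from importlib.metadata import version

from packaging.specifiers import SpecifierSet

# The "werkzeug<3.0.0" pin is taken from the conflict message above.
requested = SpecifierSet("<3.0.0")
installed = version("werkzeug")  # e.g. "3.0.1" in the failing setup

if installed not in requested:
    # Mirrors the DependencyConflict message auto-instrumentation logs at debug level.
    print(f'DependencyConflict: requested: "werkzeug<3.0.0" but found: "werkzeug {installed}"')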

@ekquasar (Contributor, Author):
Automatic tests for the code in the docs would be premium.

@svrnm (Member) commented Dec 1, 2023

Automatic tests for the code in the docs would be premium.

I agree. If you'd like to help with that, let us know. I have some requirements in my head, but I haven't gotten around to writing up an issue for them yet.

@ekquasar (Contributor, Author) commented Dec 1, 2023

Excellent. Is this a good place to get stakeholder feedback on those requirements, or does this community use another forum for that?

@svrnm (Member) commented Dec 1, 2023

Excellent. Is this a good place to get stakeholder feedback on those requirements, or does this community use another forum for that?

GitHub issues are the right place for it, but we also have bi-weekly SIG meetings, and you can find folks in #otel-comms on CNCF Slack; see https://github.com/open-telemetry/community for more details. If you are willing to invest time and resources to tackle that problem, I recommend raising an issue and collecting some feedback first, because some people will have thought about this problem already and can provide valuable input.

@ekquasar (Contributor, Author) commented Dec 1, 2023

Unrelated to the above discussion of CI: I reported this issue in detail in the opentelemetry-python-contrib repo: open-telemetry/opentelemetry-python-contrib#2079

Re: the above discussion of CI: ACK, I'll go to Slack and start collecting stakeholder input.

UPDATE: see https://cloud-native.slack.com/archives/CJFCJHG4Q/p1701449543685509 for ongoing discussion.

@svrnm (Member) commented Dec 1, 2023

UPDATE: see cloud-native.slack.com/archives/CJFCJHG4Q/p1701449543685509 for ongoing discussion.

Please create a dedicated issue first.

@cartermp (Contributor) commented Dec 1, 2023

Hrm. At best, the action we'd take would be to temporarily roll back everything Python-instrumentation-related until the issues @srikanthccv mentioned are fixed and rolled out.

I've worked at a place where we had live-running code samples and it was an absolute pain to maintain long-term. We ended up rolling back to just pasting code samples inline since the vast majority of the time they just work.

@ekquasar (Contributor, Author) commented Dec 1, 2023

The issue is that, for Python at least, I haven't yet found a single piece of the documentation that works as advertised. Literally not one.

There may be some creative middle ground, regardless. For example, the example code samples linked from the docs could have CI enforced on them, so we're guaranteed at least one working example; see the sketch after the next paragraph.

Or something like this: put a disclaimer on the docs that they are untested and not expected to work as-is, then point to the working examples.
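
To make the CI idea concrete, here is a rough sketch of what such a check could look like, assuming the docs' app.py is in the working directory (the command line mirrors the getting-started instructions; the timing, port, and assertions are illustrative):

import os
import re
import subprocess
import time
import urllib.request

def test_getting_started_emits_nonzero_trace_ids():
    # Start the docs' Flask app under auto-instrumentation, exporting spans to the console.
    env = dict(os.environ, FLASK_APP="app")  # assumes the docs' app.py
    server = subprocess.Popen(
        ["opentelemetry-instrument", "--traces_exporter", "console",
         "flask", "run", "-p", "8080"],
        env=env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    try:
        time.sleep(3)  # crude wait for the dev server to come up
        urllib.request.urlopen("http://localhost:8080/rolldice").read()
        time.sleep(2)  # give the console exporter a moment to flush
    finally:
        server.terminate()
        output, _ = server.communicate(timeout=10)
    # Fail if no spans were exported, or if every trace ID is the zeroed-out placeholder.
    trace_ids = re.findall(r'"trace_id": "0x([0-9a-f]{32})"', output)
    assert trace_ids, "no spans were exported"
    assert any(t != "0" * 32 for t in trace_ids), "only zeroed-out trace IDs found"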

UPDATE: I found a working example: the manually instrumented server at https://opentelemetry.io/docs/instrumentation/python/automatic/example/#manually-instrumented-server appears to work.
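
For reference, the core of that manually instrumented server looks roughly like this (condensed from the linked docs page, so details may differ; the route and port here match the getting-started demo rather than the example page):

from random import randint

from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer provider that prints finished spans to the console.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer(__name__)

app = Flask(__name__)

@app.route("/rolldice")
def roll_dice():
    # The span is created explicitly, so nothing depends on auto-instrumentation.
    with tracer.start_as_current_span("roll"):
        return str(randint(1, 6))

if __name__ == "__main__":
    app.run(port=8080)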

@cartermp (Contributor) commented Dec 1, 2023

Yeah, I think the lack of open-telemetry/opentelemetry-python-contrib#2013 is why you're seeing what you're seeing. This is an extremely impactful bug that hits people everywhere, since "auto-instrument your Flask app" is one of the most basic things to do when starting with OTel + Python.

So I don't want to say no to the idea of some more automation to at least catch when a basic scenario like this just completely doesn't work. But at the same time, I think it's a reasonable assumption for anyone to have that a scenario like this should always work, and that anyone who documents how to do it shouldn't need to have a last-known-good-version in their back pocket to fall back on each time (nor should we necessarily have to pepper banners on the docs saying "doesn't quite work yet, sorry!").

I don't have a great alternative idea, though. It's just that maintenance along these lines will be fairly expensive.

@svrnm (Member) commented Dec 6, 2023

Let's discuss the CI stuff on the dedicated issue for this and get back to the original issue.

@srikanthccv:

Perhaps the getting-started docs should also pin opentelemetry-distro to avoid surprises (and update it occasionally), or there could be docs tests that assert the output of the instrumentation.

Agreed; we already do the same for Java and other languages, including a workflow to auto-update versions. Let me look into that.

@svrnm (Member) commented Dec 6, 2023

Werkzeug==2.3.7 does the trick. cc @luke6Lh43

Let's fix that and also work on pinning the versions of the other packages.
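
For example, a pinned set along these lines (versions taken from this thread and the logs above; a snapshot for illustration, not a recommendation):

# Illustrative pinned requirements for the getting-started demo
flask<3                        # Flask 3.x support is still pending in the instrumentation
werkzeug==2.3.7                # avoids the "werkzeug<3.0.0" dependency conflict
opentelemetry-distro==0.42b0   # the version visible in the resource attributes above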

@cwegener commented Dec 6, 2023

From this discussion, I don't understand which of the changes in Werkzeug 3.0.0 causes the issue. @svrnm, are you able to clarify what is causing it?

@svrnm (Member) commented Dec 6, 2023

See this previous comment by @srikanthccv: #3610 (comment), which eventually leads to this PR pinning werkzeug < 3 for the instrumentation: open-telemetry/opentelemetry-python-contrib#1980

@cwegener commented Dec 6, 2023

I think I understand now.

Flask 3.0 is not supported (yet), pending open-telemetry/opentelemetry-python-contrib#1975

When the SDK was updated to restrict the supported Flask version to <3.0 (since 3.0 support is still in development), the docs were not updated to reflect the change? Am I getting this right?

So, this is where the whole "automated Documentation testing" discussion comes in, right?

@svrnm (Member) commented Dec 6, 2023

No, the docs were updated with the <3 restriction for Flask (see #3359), but at that point in time Werkzeug was still on 2.x; that changed afterwards (https://pypi.org/project/Werkzeug/#history) and led to a new issue.

@cwegener commented Dec 7, 2023

and led to a new issue.

I think that was my question 😂 ... what is the issue? What is causing it?
