-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(docs): Update opentelemetry example to show access logs #33925
base: main
Are you sure you want to change the base?
Conversation
Hi @mmanciop, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already. |
f36493c
to
cbf5b79
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for working on this @mmanciop
can i suggest removing all changes other than the access log additions
as said below, if we make the access log changes then this should be reflected in the docs for the sandbox, and tested
@@ -63,9 +63,11 @@ services: | |||
healthcheck: | |||
test: ["CMD-SHELL", "curl -sf http://localhost:13133 || exit 1"] | |||
interval: 1s | |||
timeout: 120s | |||
timeout: 10s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure why this change is here or whether its desirable - i think it would make it retry every 1/12s
that said - 120 retries is probably too high - not sure where these values came from originally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The collector comes up in a second or two tops. 120s is IMO just cause for FUD for the end user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool - probably in that case tho - my reading of docs (maybe incorrectly) is that we would want also to change the retries
values also to prevent it spinning out with healthchecks - happy to be wrong about that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience, the Otel collector comes up or not. I don't remember I saw a collector crashing for issues that were not related with configuration being invalid. And those won't be helped with retries. I could happily see myself drop the retry entirely, although the original reason might have been to bring another collector up when the current goes OOM; whether that precaution in an example warrants the humongous delay on crash loop when the user edits the configs to be invalid... I am unconvinced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - nm - i misread - interval
is set
in which case i think retries
is most likely not required at all, but im possibly missing something there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interval
The health check will first run interval seconds after the container is started, and then again interval seconds after each previous check completes.
Timeout
If a single run of the check takes longer than timeout seconds then the check is considered to have failed.
Retries
It takes retries consecutive failures of the health check for the container to be considered unhealthy.
From StackOverflow and this cheatsheet. I feel retries are useless. The collector will virtually always answer if it's alive (unless under the most massive GC I ever saw it doing), and if it's not alive, it's 99% a config issue, which are not helped by retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks - i didnt read docs properly obviously - not sure in that case but what we want to avoid at all costs is making the sandbox flakey in ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel getting rid of retries in this case should make anything flakier tbh. If anything, it'd fail faster.
@@ -25,6 +25,16 @@ static_resources: | |||
cluster_name: opentelemetry_collector | |||
timeout: 0.250s | |||
service_name: envoy-1 | |||
access_log: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think these access_log
changes are the real change in the PR
im not totally clear what the purpose is but if we are going to change these then the docs and verify.sh
script should reflect that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think these access_log changes are the real change in the PR
Pretty much, but also updating the collector to a more recent version and adjusting the configs is worthy in its own right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless i misunderstand tho - the version isnt updated - just made explicit
happy to carry some cleanups, but i think the change is probably great and requires focus
to land, we really need to update the docs to explain, and relatedly to test what the docs say to prevent bitrot etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, the version did not actually change.
Before updating the docs, I would love for the correlation to... well actually work (see #30268). My idea to open the PR this early was to give @ashishb-solo a quick way to test the changes, and to motivate why the PR to actually fix the correlation was needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, thanks for context
is it the same issue as #33906 ?
i reopened the one you posted but i have a feeling that they are now duping
with these sandboxes they are documentation first, so to justify adding a load of configuration it needs to be documented and tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI #33909 solves the problem pretty nicely, see #33909 (comment)
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
8e723eb
to
03fc3a4
Compare
/docs |
Docs for this Pull Request will be rendered here: https://storage.googleapis.com/envoy-pr/33925/docs/index.html The docs are (re-)rendered each time the CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the file that needs to be edited to update docs https://github.com/envoyproxy/envoy/blob/main/docs/root/start/sandboxes/opentelemetry.rst
i think lets insert a new step 4 (maybe more) that demonstrates what the config is doing
and this is the rendered docs for this sandbox at current commit https://storage.googleapis.com/envoy-pr/a712ed7/docs/start/sandboxes/opentelemetry.html
once we update we can look at the https://github.com/envoyproxy/envoy/blob/main/examples/opentelemetry/verify.sh which is intended to ~reflect the steps in the docs
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
abc07d6
to
8abd1e3
Compare
Done in 8abd1e3 |
Preparing the docs I noticed that maybe a little more setup is needed to add attributes. It will take me a little to hammer out :-) |
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
9c1863d
to
64a7b50
Compare
/docs |
Docs for this Pull Request will be rendered here: https://storage.googleapis.com/envoy-pr/33925/docs/index.html The docs are (re-)rendered each time the CI |
@mmanciop will review more thoroughly shortly - but copying out ci suggestion to get past (one of) ci fails Please fix your editor to ensure:
- no trailing whitespace
- no preceding mixed tabs/spaces
- all files end with a newline |
051d660
to
2344c8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for iterating @mmanciop - adding/updating these sandboxes generally takes a few
main feedback is to use the existing yaml files from the examples/opentelemtry
dir as any user checking out repo and following along will have them directly
these get added automagically to the rst build dir (by bazel) so can be referenced with _include/openetelemetry/*yaml
existing sandbox docs mostly dont have captions but elsewhere we use this for downloading the file - in that case i think its "name" should be examples/opentelemetry/*.yaml
- ie repo-relative
Co-authored-by: phlax <phlax@users.noreply.github.com> Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
… for docs Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mmanciop quick docs review
step 5 comes after the see also
section which should be last - i would move step 5 -> step 4 and increment current step 5
wrt step 6 - its misplaced here - this should be in the config reference section - but when i checked that it seems that this page is currently missing or ~empty - this is the only config reference i could see https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/stat_sinks/open_telemetry_stat_sink.html
Moved see_also to the end of the page in 93676b8, and deleted the configuration section that is misplaced. |
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
b7f93cf
to
93676b8
Compare
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really close, thanks @mmanciop
for the verify.sh
script i think we just want to add another section reflecting the addition to the docs - the tests can be pretty simple - just grep in the logs for the highlighted lines - something like
"${DOCKER_COMPOSE[@]}" logs opentelemetry | grep "Trace ID"
Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
@mmanciop could you update the verify.sh script and fix formatting so we can see the verify ci running for ref you can run/test locally (assumes local user has access to Docker socket) $ cd examples/opentelemetry
$ ./verify.sh |
@phlax that's what I am working on, but it won't be pretty. The collector debug output cannot currently be formatted to JSON and that's gonna hurt. |
in that case just use grep i reckon - that is ~equivalent to the highlighted lines in console output docs i think |
for a stricter test you could fish out the ids and check they match - that would be pretty cool - but i reckon optional |
…ccess logs match those of spans Signed-off-by: Michele Mancioppi <michele.mancioppi@dash0.com>
Multiline |
/wait for #33909 |
Commit Message: Update the OpenTelemetry demo to show access log correlation and a more recent collector
Additional Description: This PR updates the OpenTelemetry demo to show how to send access logs to an OpenTelemetry collector, and how to correlate those logs with tracing data (pending #30268). Also, use a more recent build of the OpenTelemetry collector.
Risk Level: Low
Testing: Manual
Docs Changes: None
Release Notes: The OpenTelemetry demo has been updated to show access log correlation with tracing data and to use a more recent build of the OpenTelemetry collector
Platform Specific Features: N/A
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]