Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add RESPONSE_CODE_DETAILS and CONNECTION_TERMINATION_DETAILS to access log #27903

Merged
merged 2 commits into from
Nov 25, 2020

Conversation

yangminzhu
Copy link
Contributor

@yangminzhu yangminzhu commented Oct 13, 2020

These log operators should be very useful for troubleshooting and debugging in many cases.

Example Logs (RESPONSE_CODE_DETAILS): (via_upstream , filter_chain_not_found, http1.codec_error, rbac_access_denied_matched_policy[none])

[2020-10-29T03:22:16.126Z] "GET /headers HTTP/1.1" 200 - via_upstream - "-" 0 547 5 4 "-" "curl/7.73.0-DEV" "24e58e43-e576-9048-829b-02e3f3da16f0" "httpbin:8000" "127.0.0.1:80" inbound|8000|| 127.0.0.1:51680 172.17.0.7:80 172.17.0.6:59022 outbound_.8000_._.httpbin.foo.svc.cluster.local default
[2020-10-29T03:23:33.415Z] "- - -" 0 NR filter_chain_not_found - "-" 0 0 0 - "-" "-" "-" "-" "-" - - 172.17.0.7:80 172.17.0.6:59800 - -
[2020-10-29T03:24:13.520Z] "- - HTTP/1.1" 400 DPE http1.codec_error - "-" 0 11 0 - "-" "-" "-" "-" "-" - - 172.17.0.7:80 172.17.0.6:60214 - -
[2020-10-29T03:25:20.162Z] "GET /headers HTTP/1.1" 403 - rbac_access_denied_matched_policy[none] - "-" 0 19 0 - "-" "curl/7.73.0-DEV" "07541777-6ee2-908f-a78b-867fe257ac80" "httpbin:8000" "-" - - 172.17.0.7:80 172.17.0.6:60898 outbound_.8000_._.httpbin.foo.svc.cluster.local -
[2020-10-29T03:26:11.561Z] "GET /headers HTTP/1.1" 403 - rbac_access_denied_matched_policy[ns[foo]-policy[httpbin]-rule[0]] - "-" 0 19 0 - "-" "curl/7.73.0-DEV" "090a1897-d696-99b1-90f0-2673113ba258" "httpbin:8000" "-" - - 172.17.0.7:80 172.17.0.6:33204 outbound_.8000_._.httpbin.foo.svc.cluster.local -

Example Logs (CONNECTION_TERMINATION_DETAILS): (rbac_access_denied_matched_policy[ns[foo]-policy[httpbin]-rule[0]])

[2020-10-29T03:28:26.690Z] "- - -" 0 - - rbac_access_denied_matched_policy[ns[foo]-policy[httpbin]-rule[0]] "-" 761 701 5 - "-" "-" "-" "-" "127.0.0.1:80" inbound|8000|| 127.0.0.1:55464 172.17.0.7:80 172.17.0.6:34574 outbound_.8000_._.httpbin.foo.svc.cluster.local -

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[ ] Networking
[ ] Performance and Scalability
[X] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Pull Request Attributes

Please check any characteristics that apply to this pull request.

[X] Does not have any changes that may affect Istio users.

@yangminzhu yangminzhu added the do-not-merge/hold Block automatic merging of a PR. label Oct 13, 2020
@yangminzhu yangminzhu requested review from a team as code owners October 13, 2020 00:45
@istio-policy-bot istio-policy-bot added the release-notes-none Indicates a PR that does not require release notes. label Oct 13, 2020
@google-cla google-cla bot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Oct 13, 2020
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 13, 2020
@howardjohn
Copy link
Member

would be great to see an example of what this looks like

@@ -41,6 +42,14 @@ const (
"\"%REQ(USER-AGENT)%\" \"%REQ(X-REQUEST-ID)%\" \"%REQ(:AUTHORITY)%\" \"%UPSTREAM_HOST%\" " +
"%UPSTREAM_CLUSTER% %UPSTREAM_LOCAL_ADDRESS% %DOWNSTREAM_LOCAL_ADDRESS% " +
"%DOWNSTREAM_REMOTE_ADDRESS% %REQUESTED_SERVER_NAME% %ROUTE_NAME%\n"
// EnvoyTextLogFormatIstio18 format for envoy text based access logs for Istio 1.8 onwards
EnvoyTextLogFormatIstio18 = "[%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this break people parsing output, or are they expected to be using json format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's possible but I don't think we treat the log format as a stable API otherwise it seems we can never really update it? Also I think the user is probably using their customized access log format if they're going to parse it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with you, just wanted to put out the comment in case someone else disagrees. I have used https://github.com/nitishm/engarde which does this but that doesn't mean its a valid use case.

@yangminzhu
Copy link
Contributor Author

would be great to see an example of what this looks like

sure, I'm waiting for my other istio/proxy PR (istio/proxy#3045) merged and then update Envoy in istio to test this PR and include the output.

@howardjohn
Copy link
Member

Sounds good. This sounds super useful

@yangminzhu yangminzhu removed the do-not-merge/hold Block automatic merging of a PR. label Oct 27, 2020
@hzxuzhonghu
Copy link
Member

really helpful

@yangminzhu
Copy link
Contributor Author

/test integ-pilot-k8s-tests_istio

@howardjohn
Copy link
Member

Confused by your original example: HTTP/1.1 400 http1.codec_error DPE. Shouldn't it be HTTP/1.1 400 DPE http1.codec_error based on %RESPONSE_CODE% %RESPONSE_FLAGS% %RESPONSE_CODE_DETAILS%? was that for a different version of the PR?

@howardjohn
Copy link
Member

Clearly useful, will approve if @istio/wg-policies-and-telemetry-maintainers approve

@douglas-reid
Copy link
Contributor

to be clear, this is changing the default Istio format for access logs?

@mandarjog @nrjpoddar what do we want to do for defaulting in a Telemetry API world to match this? This is currently NOT shared across "providers".

@nrjpoddar
Copy link
Member

nrjpoddar commented Oct 28, 2020

I do think this is an API and it will break users who have tooling around log parsing. I would suggest that we create a global option to conditionally disable it in 1.9 with upgrade/release notes documentation around the impact of this change.

@yangminzhu
Copy link
Contributor Author

yangminzhu commented Oct 29, 2020

I do think this is an API and it will break users who have tooling around log parsing. I would suggest that we create a global option to conditionally disable it in 1.9 with upgrade/release notes documentation around the impact of this change.

@nrjpoddar The user should explicitly set the log format in accessLogFormat if they intend to process it, relying on the default log format seems too fragile for real use cases. And it is a bit strange to create a global option to control the default logging format when it is already configurable by accessLogFormat.

@yangminzhu
Copy link
Contributor Author

yangminzhu commented Oct 29, 2020

Confused by your original example: HTTP/1.1 400 http1.codec_error DPE. Shouldn't it be HTTP/1.1 400 DPE http1.codec_error based on %RESPONSE_CODE% %RESPONSE_FLAGS% %RESPONSE_CODE_DETAILS%? was that for a different version of the PR?

@howardjohn Oh, sorry about that, it's from my test cluster that is using out-dated code. Updated the description to include the latest log based on this PR.

@nrjpoddar
Copy link
Member

nrjpoddar commented Oct 30, 2020

I do think this is an API and it will break users who have tooling around log parsing. I would suggest that we create a global option to conditionally disable it in 1.9 with upgrade/release notes documentation around the impact of this change.

@nrjpoddar The user should explicitly set the log format in accessLogFormat if they intend to process it, relying on the default log format seems too fragile for real use cases. And it is a bit strange to create a global option to control the default logging format when it is already configurable by accessLogFormat.

I see, if that’s the case why do we need to change defaults? Shouldn’t we document how to update the default format and add these additional fields. Changing the default might be warranted if you can share issues filed by users where this info would have helped in debugging. Other way to think about this is we can document steps for logging these fields, add a warning about making them default next release but keep the default as is for a release and measure if we can get some feedback.

@yangminzhu
Copy link
Contributor Author

yangminzhu commented Nov 19, 2020

I do think this is an API and it will break users who have tooling around log parsing. I would suggest that we create a global option to conditionally disable it in 1.9 with upgrade/release notes documentation around the impact of this change.

@nrjpoddar The user should explicitly set the log format in accessLogFormat if they intend to process it, relying on the default log format seems too fragile for real use cases. And it is a bit strange to create a global option to control the default logging format when it is already configurable by accessLogFormat.

I see, if that’s the case why do we need to change defaults? Shouldn’t we document how to update the default format and add these additional fields. Changing the default might be warranted if you can share issues filed by users where this info would have helped in debugging. Other way to think about this is we can document steps for logging these fields, add a warning about making them default next release but keep the default as is for a release and measure if we can get some feedback.

RESPONSE_CODE_DETAILS and CONNECTION_TERMINATION_DETAILS are generally quite useful in troubleshooting why a request is denied, Envoy and many filters will set the two fields to give extra information of the deny, for example, RBAC filter could set it to include the name of the policy that caused the deny, the HTTP connection manager could set it to include the reason the stream is reset and by who (and there are more reasons can be set there), etc.

The most relevant documentation I can find on istio.io is this one: https://istio.io/latest/docs/tasks/observability/logs/access-log/#enable-envoy-s-access-logging, it tells how to enable the access log but doesn't mention the default value of accessLogFormat.

The Mesh config doc (https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#MeshConfig) mentions the following:

accessLogFormat | string | Format for the proxy access log Empty value results in proxy’s default access log format

It says proxy's default access log format but doesn't document the log format at all.

And I believe we had changed the log format a few times before, the most recent I can find is this PR #16237 that adds the ROUTE_NAME log operator to the default log format.

I can abandon this PR and just update the task to mention the two log operator but I'm afraid that would be much less useful and cannot be used if not all proxy version is up-to-date because the new log operator is only available in newer Envoy, and the accessLogFormat will send the unsupported log operator to all proxies causing issues to load the config.

@howardjohn
Copy link
Member

I am in favor of enabling this by default for the reasons @yangminzhu mentioned.

Users who are broken by this, are "wrong". We cannot make the entire project freeze just because we may break someones workflow. If you want to parse access logs, you should use JSON and/or define a stable format.

This is no different then us changing an istiod log format.

@howardjohn howardjohn added the do-not-merge/hold Block automatic merging of a PR. label Nov 19, 2020
Copy link
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with a hold since there may not be consensus

@nrjpoddar
Copy link
Member

nrjpoddar commented Nov 19, 2020

Ok, I see your points and I agree with the benefits. Here are my asks:

  1. Release notes "none" is not correct for this change. I would suggest adding a release and upgrade note to let users know about this change in 1.9.
  2. Can you augment this https://istio.io/latest/docs/tasks/observability/logs/access-log/#enable-envoy-s-access-logging document to add a section about changing logging format and what the default format is?
  3. Add tests for "default" access logs ensuring we can catch changes in future.

@yangminzhu what do you think? We can do some of this in follow-on PRs if we agree on principle.

@douglas-reid
Copy link
Contributor

  1. Add tests for "default" access logs ensuring we can catch changes in future.

This will be especially important as we transition to a fuller Telemetry API. We want to ensure that transition is seamless, and having tests around default behavior will help in that regard.

@yangminzhu
Copy link
Contributor Author

Ok, I see your points and I agree with the benefits. Here are my asks:

  1. Release notes "none" is not correct for this change. I would suggest adding a release and upgrade note to let users know about this change in 1.9.

Sure, will add one.

  1. Can you augment this https://istio.io/latest/docs/tasks/observability/logs/access-log/#enable-envoy-s-access-logging document to add a section about changing logging format and what the default format is?

will do.

  1. Add tests for "default" access logs ensuring we can catch changes in future.

I can add some unit tests, does that sound good to you?

@yangminzhu what do you think? We can do some of this in follow-on PRs if we agree on principle.

I think I can just update the PR to include the release note and unit test, will need a separate PR to update the istio.io documentation. Thanks!

@nrjpoddar
Copy link
Member

@yangminzhu I'm not the right person to ask unit vs integration tests for this as I'm not familiar with the code. If @douglas-reid or @mandarjog are happy with your test I'm good too, so check with them.

@yangminzhu
Copy link
Contributor Author

@nrjpoddar @douglas-reid @mandarjog added release note and unit tests, could you take a second look, thanks.

@yangminzhu
Copy link
Contributor Author

/test unit-tests_istio

Copy link
Member

@nrjpoddar nrjpoddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for accommodating our suggestions.

Copy link
Contributor

@mandarjog mandarjog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, thanks for pushing this through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. release-notes-none Indicates a PR that does not require release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants