Monitoring and logging HTTP 500 errors & HTTP 200s for Vault? #13072

vallamost · 2021-11-05T22:30:18Z

Is your feature request related to a problem? Please describe.

Vault does not appear to record the HTTP status code it returns to a client in its response log entries, based on the documentation available at the time of writing: https://support.hashicorp.com/hc/en-us/articles/360000995548-Audit-and-Operational-Log-Details

I am trying to diagnose and find the root cause of HTTP 500 errors that our clients receive when interacting with our Vault service's API. At this time it is impossible to know how many HTTP 500 errors our Vault service is returning. If I could set up a logging filter to see when and where HTTP 500s are thrown, and get their request IDs to trace each request through the stack, that would be super valuable.

Describe the solution you'd like

I'd like to see Vault add a new attribute in the JSON log output for the http_response_code sent in a response to a client calling Vault's REST API.

Example log entry

{
...
  "type": "response",
...
  "http_response_code": 500,
  "error": "Server was too busy to handle the request, please try again later",
...
}
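
A minimal sketch of the kind of log filter this attribute would enable, assuming the log is written as one JSON object per line. The `http_response_code` field is the attribute proposed above and is not something Vault emits today; the nested `request.id` layout, the function name, and the sample lines are likewise assumptions for illustration only.

```python
import json
from typing import Iterable, List, Tuple


def find_server_errors(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (request_id, status, error) for response entries with a 5xx code."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict) or entry.get("type") != "response":
            continue
        code = entry.get("http_response_code")  # hypothetical, proposed field
        if isinstance(code, int) and 500 <= code <= 599:
            request_id = entry.get("request", {}).get("id", "")  # assumed layout
            hits.append((request_id, code, entry.get("error", "")))
    return hits


# Made-up sample lines, not real Vault output.
sample = [
    '{"type": "response", "http_response_code": 200, "request": {"id": "req-1"}}',
    '{"type": "response", "http_response_code": 500, "request": {"id": "req-2"}, '
    '"error": "Server was too busy to handle the request, please try again later"}',
    "not json",
]
for request_id, code, error in find_server_errors(sample):
    print(request_id, code, error)
```

With the request IDs in hand, the matching request entries could then be looked up in the same log to see what was being called.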

Describe alternatives you've considered

Log all error responses and categorize them by HTTP status?

Explain any additional use-cases

Debugging HTTP errors, monitoring for HTTP errors, looking at success rates and availability.


ncabatoff · 2021-11-08T12:56:50Z

Hi @vallamost,

Audit logs on a busy Vault are often too big to scan in real time for monitoring purposes. While I'm not opposed to adding some details like this to the audit log, I would prefer to prioritize adding metrics that provide this information: something like request_errors with a code label to break it down by response code.
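
To make the metric idea concrete, here is a rough sketch of a counter broken down by a `code` label. It is not Vault's telemetry code (Vault is written in Go); the `LabeledCounter` class, the `record_response` helper, and the Prometheus-like text rendering are all assumptions made for illustration, and `request_errors` is only the name floated in this comment.

```python
from collections import Counter
from typing import Dict


class LabeledCounter:
    """A counter broken down by one label, similar to a labeled metric."""

    def __init__(self, name: str, label: str) -> None:
        self.name = name
        self.label = label
        self._counts: Counter = Counter()

    def inc(self, label_value: str) -> None:
        self._counts[label_value] += 1

    def snapshot(self) -> Dict[str, int]:
        return dict(self._counts)

    def render(self) -> str:
        """Render in a Prometheus-like text form, one line per label value."""
        return "\n".join(
            f'{self.name}{{{self.label}="{value}"}} {count}'
            for value, count in sorted(self._counts.items())
        )


# Hypothetical metric name taken from the suggestion above.
request_errors = LabeledCounter("request_errors", "code")


def record_response(status: int) -> None:
    # Only error responses (4xx and 5xx) are counted in this sketch.
    if status >= 400:
        request_errors.inc(str(status))


for status in (200, 500, 404, 500, 503):
    record_response(status)
print(request_errors.render())
```

Dropping the `status >= 400` check would turn the same structure into a count of every response code, successes included.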

vallamost · 2021-11-08T19:38:45Z

These HTTP codes don't have to be in the audit logs... If there's a better logging location for them, that would be preferred. Wouldn't a proposed request_errors metric only show HTTP error codes? I'd be willing to bet many Vault admins, myself included, would rather have the full list of status codes and their counts over time: 200s, 300s, 400s, etc. Every application with web server logs that I have worked with has included the HTTP response codes. I don't see why Vault should be different in that regard :)

If status codes are only available as a metric and are excluded from logs, then advanced debugging is harder than it should be. If an admin saw a large uptick in 400 status codes, I'm sure they would want to filter and search their logs for 400 status codes to find the culprit. A metric alone would tell you there's an issue, but without Vault web server logs that include HTTP status codes you would be stuck going through all of your client logs, if you even have those or have access to them... that's no bueno.

ncabatoff · 2021-11-10T19:52:26Z

> Wouldn't a proposed request_errors metric only show HTTP error codes?

What other error codes were you thinking of?

The rest of your comments make me think that we have different visions of what the audit log is for and how it's meant to be used. Some differences between the vault audit log and an http server's request log:

- Vault will fail a request if it can't write an audit log for it
- Unauthenticated requests aren't written to the audit log
- Audit logs include much of the contents of the request and response bodies (obfuscated)

There's some overlap between the use cases for audit logging and request logging, but they're not the same thing.

In the course of this discussion I realized that Vault should have an (opt-in) request logging feature like you're envisioning, whereby some key fields like (code, path, method) are logged to the regular server log. It's something I've wanted in the past but always assumed we had a reason for not doing... turns out I was wrong! We're already working on a related feature so we'll throw that in there.
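
For readers wondering what opt-in request logging of (code, path, method) could look like, here is a small sketch. It is purely illustrative and says nothing about how Vault implements the feature: the `with_request_logging` wrapper, the `enabled` flag, the logger name, the handler signature, and the demo paths are all assumptions.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("request_log")  # hypothetical logger name

Handler = Callable[[str, str], Tuple[int, str]]  # (method, path) -> (code, body)


def with_request_logging(handler: Handler, enabled: bool = False) -> Handler:
    """Wrap a handler so each response logs code, path, and method when enabled."""
    if not enabled:
        return handler  # opt-in: nothing is wrapped or logged by default

    def wrapped(method: str, path: str) -> Tuple[int, str]:
        try:
            code, body = handler(method, path)
        except Exception:
            # An unhandled failure is reported to the client as a 500.
            logger.exception("request failed code=500 path=%s method=%s", path, method)
            return 500, "internal error"
        logger.info("request completed code=%d path=%s method=%s", code, path, method)
        return code, body

    return wrapped


def demo_handler(method: str, path: str) -> Tuple[int, str]:
    # Stand-in for real request routing; the paths here are made up.
    if path == "/v1/example":
        return 200, "ok"
    return 404, "not found"


logging.basicConfig(level=logging.INFO)
logged = with_request_logging(demo_handler, enabled=True)
logged("GET", "/v1/example")
logged("GET", "/v1/missing")
```

Because the log line carries only the code, path, and method, it stays far smaller than an audit entry and needs none of the request or response body.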

ncabatoff · 2022-01-14T15:29:28Z

The related feature in question will be in 1.10, changelog entry:

Report in-flight requests: Adding a trace capability to show in-flight requests, and a new gauge metric to show the total number of in-flight requests [GH-13024]

Docs for the logging part of it:
https://github.com/hashicorp/vault/pull/13024/files#diff-d584f191e3c65d4ea5ac58a9d7f22d4f0d3c0d25d351c2a997bd054dbdf222ef
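
As a rough illustration of what an in-flight request gauge measures, the sketch below increments a value when a request starts and decrements it when the request finishes. This is not the code from GH-13024; the `InFlightGauge` class and the `in_flight` name are hypothetical.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class InFlightGauge:
    """Tracks how many requests are currently being processed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    @contextmanager
    def track(self) -> Iterator[None]:
        with self._lock:
            self._value += 1
        try:
            yield
        finally:
            # Decrement even if the request handler raises.
            with self._lock:
                self._value -= 1

    def value(self) -> int:
        with self._lock:
            return self._value


in_flight = InFlightGauge()  # hypothetical name, not Vault's metric key

with in_flight.track():
    with in_flight.track():
        print("during:", in_flight.value())  # during: 2
print("after:", in_flight.value())  # after: 0
```

Unlike a counter, the gauge goes back down, so a value that stays high points at requests that are stuck or slow rather than at overall traffic.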

hghaf099 · 2022-02-08T01:17:30Z

Based on the comments, it seems that the requested capability in Vault has already been addressed. I am going to close this ticket for now. Please reopen this issue or open a new one for further discussion.

ncabatoff added core/metric enhancement labels Nov 8, 2021

hghaf099 closed this as completed Feb 8, 2022
