Monitoring and logging HTTP 500 errors & HTTP 200s for Vault? #13072

Closed
vallamost opened this issue Nov 5, 2021 · 5 comments

Comments

@vallamost

vallamost commented Nov 5, 2021

Is your feature request related to a problem? Please describe.

At the time of writing, Vault does not appear to include the HTTP status code returned to the client in its response log entries, as documented here: https://support.hashicorp.com/hc/en-us/articles/360000995548-Audit-and-Operational-Log-Details

I am trying to diagnose and find the root cause of HTTP 500 errors that our clients receive when calling our Vault service's API. At present there is no way to know how many HTTP 500 errors our Vault service returns. If I could set up a logging filter to see when and where HTTP 500s occur, along with their request IDs so I can trace each request through the stack, that would be super valuable.

Describe the solution you'd like

I'd like to see Vault add a new attribute in the JSON log output for the http_response_code sent in a response to a client calling Vault's REST API.

Example log entry

{
...
  "type": "response",
...
  "http_response_code": 500,
  "error": "Server was too busy to handle the request, please try again later",
...
}
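
If such a field existed, a filter along these lines could pull out server errors together with their request IDs. This is only a sketch: `http_response_code` is the attribute proposed above and is not something Vault emits at the time of writing, and the `request.id` nesting is an assumption about the entry layout.

```python
import json
from typing import Iterable, Iterator


def server_errors(lines: Iterable[str]) -> Iterator[dict]:
    """Yield response entries whose (hypothetical) http_response_code is 5xx."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(entry, dict) or entry.get("type") != "response":
            continue
        code = entry.get("http_response_code")  # proposed field, hypothetical
        if isinstance(code, int) and 500 <= code <= 599:
            request = entry.get("request")  # assumed location of the request ID
            yield {
                "code": code,
                "request_id": request.get("id") if isinstance(request, dict) else None,
                "error": entry.get("error"),
            }
```

Feeding it an open log file, for example `server_errors(open(path))`, would stream matches without loading the whole log into memory.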

Describe alternatives you've considered

Log all error responses and categorize them to some type of HTTP status?

Explain any additional use-cases

Debugging HTTP errors, monitoring for HTTP errors, looking at success rates and availability.


@ncabatoff
Collaborator

Hi @vallamost,

On a busy Vault, audit logs are often too large to scan in real time for monitoring purposes. While I'm not opposed to adding details like this to the audit log, I would prefer to prioritize adding metrics that provide this information: something like request_errors with a code label to break it down by response code.
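
For illustration, a counter with a code label might behave like the toy below. The class and its rendered text format are hypothetical and do not reflect an existing Vault metric or telemetry API; `request_errors` is just the name floated above.

```python
from collections import Counter
from typing import List


class LabeledCounter:
    """Toy counter keyed by a single 'code' label (hypothetical, not Vault's API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._counts: Counter = Counter()

    def inc(self, code: int) -> None:
        self._counts[code] += 1

    def value(self, code: int) -> int:
        return self._counts[code]

    def render(self) -> List[str]:
        # One line per label value, sorted by code for stable output.
        return [f'{self.name}{{code="{c}"}} {n}' for c, n in sorted(self._counts.items())]


request_errors = LabeledCounter("request_errors")
for status in (500, 503, 200, 500, 404):
    if status >= 400:  # count error responses only
        request_errors.inc(status)
```

Dropping the `status >= 400` check would turn the same structure into a count of every response code, not just errors.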

@vallamost
Author

vallamost commented Nov 8, 2021

These HTTP codes don't have to be in the audit logs... If there's a better logging location for them, that would be preferred. Wouldn't a proposed request_errors metric only show HTTP error codes? I'd be willing to bet many Vault admins, myself included, would rather have the full list of status codes and their counts over time: 200s, 300s, 400s, etc. Every application with web server logs that I have worked with has included HTTP response codes. I don't see why Vault should be different in that regard :)

If status codes are only available as a metric and are excluded from logs, then advanced debugging is harder than it should be. If an admin saw a large uptick in 400 status codes, I'm sure they would want to filter and search their logs for 400 status codes to find the culprit. A metric alone tells you there's an issue, but without Vault web server logs that include HTTP status codes you would be stuck going through all of your client logs, if you even have those or have access to them... that's no bueno.
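
As a rough sketch of that workflow, assuming request log entries were available as dictionaries with `code`, `path`, and `client` keys (all three names are hypothetical), one could first count by status class and then drill into the 4xx entries:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_by_class(entries: Iterable[dict]) -> Counter:
    """Count responses per status class, e.g. '2xx' or '4xx'."""
    return Counter(f"{e['code'] // 100}xx" for e in entries)


def top_offenders(entries: Iterable[dict], status_class: int,
                  n: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n most frequent (path, client) pairs within one status class."""
    pairs = Counter(
        (e["path"], e["client"])
        for e in entries
        if e["code"] // 100 == status_class
    )
    return pairs.most_common(n)


# Made-up sample entries, for illustration only.
sample = [
    {"code": 200, "path": "secret/data/app", "client": "10.0.0.4"},
    {"code": 403, "path": "secret/data/app", "client": "10.0.0.5"},
    {"code": 403, "path": "secret/data/app", "client": "10.0.0.5"},
    {"code": 404, "path": "secret/data/old", "client": "10.0.0.6"},
]
```

The metric answers "how many", while the per-entry records answer "who and where", which is the gap described above.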

@ncabatoff
Collaborator

> Wouldn't a proposed request_errors metric only show HTTP error codes?

What other error codes were you thinking of?

The rest of your comments make me think that we have different visions of what the audit log is for and how it's meant to be used. Some differences between the Vault audit log and an HTTP server's request log:

  • Vault will fail a request if it can't write an audit log entry for it
  • unauthenticated requests aren't written to the audit log
  • audit logs include much of the contents of the request and response bodies (obfuscated)

There's some overlap between the use cases for audit logging and request logging, but they're not the same thing.
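
The first difference in the list above can be pictured as a fail-closed audit write next to a best-effort request log write. This is a simplified sketch, not Vault's code: the `handle` function, both writer callbacks, and the 500 returned on audit failure are assumptions made for illustration.

```python
from typing import Callable


def handle(request: dict,
           write_audit: Callable[[dict, dict], None],
           write_request_log: Callable[[dict, dict], None]) -> dict:
    """Serve a request with a fail-closed audit log and a best-effort request log."""
    response = {"code": 200, "body": "ok"}  # stand-in for real request handling
    try:
        write_audit(request, response)
    except OSError:
        # Audit write failed: refuse the request rather than serve it unaudited.
        # The 500 used here is an assumption made for this sketch.
        response = {"code": 500, "body": "audit log unavailable"}
    try:
        write_request_log(request, response)
    except OSError:
        pass  # request logging is best effort and never fails the request
    return response
```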

In the course of this discussion I realized that Vault should have an (opt-in) request logging feature like you're envisioning, whereby some key fields like (code, path, method) are logged to the regular server log. It's something I've wanted in the past but always assumed we had a reason for not doing... turns out I was wrong! We're already working on a related feature so we'll throw that in there.
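
To make the idea concrete, here is a minimal sketch of opt-in request logging written as generic WSGI middleware. It is not how Vault implements or configures the feature; the `request_logging` function, the `enabled` flag, and the logger name are all hypothetical.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("request_log")


def request_logging(app: Callable, enabled: bool = False) -> Callable:
    """Wrap a WSGI app; when enabled, log (code, path, method) per request."""
    if not enabled:
        return app  # opt-in: leave the app untouched unless asked

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def logging_start_response(status: str, headers: list, exc_info=None):
            code = status.split(" ", 1)[0]  # "500 Internal Server Error" -> "500"
            logger.info(
                "code=%s path=%s method=%s",
                code,
                environ.get("PATH_INFO", ""),
                environ.get("REQUEST_METHOD", ""),
            )
            return start_response(status, headers, exc_info)

        return app(environ, logging_start_response)

    return wrapped
```

Logging only these three fields keeps each line small and free of request or response bodies, which is part of what separates a request log from the audit log.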

@ncabatoff
Collaborator

The related feature in question will be in 1.10, changelog entry:

Report in-flight requests: Adding a trace capability to show in-flight requests, and a new gauge metric to show the total number of in-flight requests [GH-13024]

Docs for the logging part of it:
https://github.com/hashicorp/vault/pull/13024/files#diff-d584f191e3c65d4ea5ac58a9d7f22d4f0d3c0d25d351c2a997bd054dbdf222ef
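
Conceptually, a gauge of in-flight requests goes up when a request starts and down when it finishes. The sketch below shows only that general idea; the `InFlightGauge` class is hypothetical and is not taken from the Vault change referenced above.

```python
import threading
from contextlib import contextmanager


class InFlightGauge:
    """Thread-safe count of requests currently being handled (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    @contextmanager
    def track(self):
        with self._lock:
            self._value += 1
        try:
            yield
        finally:
            # Decrement even if the handler raised, so the gauge cannot drift upward.
            with self._lock:
                self._value -= 1

    def value(self) -> int:
        with self._lock:
            return self._value
```

A handler would wrap its work in `with gauge.track():` and a metrics endpoint would read `gauge.value()`.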

@hghaf099
Contributor

hghaf099 commented Feb 8, 2022

Based on the comments, it seems that the requested capability in Vault has already been addressed. I am going to close this ticket for now. Please reopen this issue or open a new one for further discussion.

@hghaf099 hghaf099 closed this as completed Feb 8, 2022