Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streaming: set default query timeout #10707

Merged
merged 3 commits into from
Jul 28, 2021

Conversation

dnephin
Copy link
Contributor

@dnephin dnephin commented Jul 27, 2021

This PR fixes two problems:

  1. the HTTP API would log cancelled requests as an error
  2. the health endpoint, with the streaming backend, did not set the default timeout that was used with the blocking query backend

Now that we have at least one endpoint that uses context for cancellation we can
encounter this scenario where the returned error is a context.Cancelled or
context.DeadlineExceeded.

If the request.Context().Err() is not nil, then we know the request itself was cancelled, so
we can log a different message at Info level, instad of the error.
@dnephin dnephin added the theme/streaming Related to Streaming connections between server and client label Jul 27, 2021
@dnephin dnephin requested a review from a team July 27, 2021 21:56
@github-actions github-actions bot added the theme/config Relating to Consul Agent configuration, including reloading label Jul 27, 2021
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging July 27, 2021 21:58 Inactive
@vercel vercel bot temporarily deployed to Preview – consul July 27, 2021 21:58 Inactive
agent/config/runtime.go Show resolved Hide resolved

// ApplyDefaultQueryOptions returns a function which will set default values on
// the options based on the configuration.
func ApplyDefaultQueryOptions(config *RuntimeConfig) func(options *structs.QueryOptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to keep this a pointer? Otherwise we may need a nil-check on config

Copy link
Contributor Author

@dnephin dnephin Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! I often try not to use pointers and instead use values when possible. I think it would be fine to use a value here, but I opted for a pointer because RuntimeConfig is a very large struct, and generally (because it is so large) we have been passing it around by pointer. I think we expect to only call this method on startup, so using a non-pointer might also be ok.

Regarding the nil-check, if this value were to come from a user then I think that would be very important. But the user can't directly set RuntimeConfig. It's expected the agent startup will always create one of these, and there should be no way that it could be nil.

So the only way for RuntimeConfig to be nil would be a "programming error". A panic is an appropriate way of handling programming errors because we expect our test suite to catch them. If we were to check for the nil, we'd either have to change the signature and return an error from this function, or we'd create a custom panic with more detail.

I suspect a detailed panic wouldn't really help here, since a nil pointer panic should hopefully be easy to debug. Changing the function signature to return an error would imply a runtime error, but there is no possible runtime error here (only programming errors). I generally try to avoid adding error returns unless there is a possible runtime error because it can force all the callers to also handle the error (which can be a distraction if the only cause is a possible programming error).

I think passing a non-pointer value could in some cases be worse. If we were to pass a zero value for RuntimeConfig it would force MaxQueryTime to be 0 always. Hopefully that kind of error would be noticed by tests, but a panic should be a lot more obvious.

In short, I think that's an excellent call out (checking for nil), but in this case I'm tempted to say the current implementation is reasonable given that we always expect a non-nil RuntimeConfig.

That said, it would be good to document these assumptions in the function docstring, so I will add a line about "RuntimeConfig must not be nil".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed answer! I definitely got more than I asked for :D
Your point about programming error vs runtime is really interesting - I've never thought of it that way and have seen my fair share of "this change touches a billion files because something down the stack now returns an error"

Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel. I think this looks like a good pattern for applying those defaults.

I had one comment inline which I think should probably be changed to avoid creating a new confusing log line if ever we have context timeout errors which are not related to clients at all. What do you think?

"from", req.RemoteAddr,
"error", err,
)
if req.Context().Err() != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also catch context.DeadlineExceeded errors for which I don't think this message is appropriate. Should we limit it to only context.Canceled?

Copy link
Member

@banks banks Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, actually not sure about this. I wasn't thinking about the fact that this is the req.Context directly rather than a test against any err that might be returned by the handler.

In that case I don't know of any way the request context would error with DeadlineExceeded so this is probably equivalent. I'm thinking this would potentially occur if there were some middle ware in the server that implemented a server-side timeout in which case it's not really the client cancelling the request.

But that doesn't exist so this is probably moot. Will leave it up to you if you thing the extra specificity is worthwhile.

Copy link
Contributor Author

@dnephin dnephin Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the context hierarchy, I think the only way for req.Context().Err() to be not nil is if either:

  • the request itself was cancelled by the user making the request
  • if http.Server.BaseContext is set, and something cancels that context

I don't think we use BaseContext yet, but even if we did I think this message would be appropriate. Generally the only reason for the BaseContext to be cancelled would be if the server was shutting down (because once cancelled every request would fail immediately), in which case logging the request as cancelled would also be accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it more, I guess because BaseContext is a function, that could act as the middleware, and it could set a deadline, so I think you're right it is possible.

I guess in most cases the err here would include the context about it being a timeout or a cancel, so maybe this log line will be ok in that case as well? I think it's a bit hard to say until we start using BaseContext.

Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised I was not reading it right. This LGTM, see if you feel like the possible tweak to make this potentially more future proof but not a blocker.

The blocking query backend sets the default value on the server side.
The streaming backend does not using blocking queries, so we must set the timeout on
the client.
@dnephin dnephin force-pushed the dnephin/streaming-setup-default-timeout branch from 16b49ca to 5edee9b Compare July 28, 2021 21:50
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging July 28, 2021 21:50 Inactive
@vercel vercel bot temporarily deployed to Preview – consul July 28, 2021 21:50 Inactive
@dnephin dnephin merged commit d2b58cd into main Jul 28, 2021
@dnephin dnephin deleted the dnephin/streaming-setup-default-timeout branch July 28, 2021 22:29
@hc-github-team-consul-core
Copy link
Collaborator

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/418180.

@hc-github-team-consul-core
Copy link
Collaborator

🍒✅ Cherry pick of commit d2b58cd onto release/1.10.x succeeded!

hc-github-team-consul-core pushed a commit that referenced this pull request Jul 28, 2021
…ult-timeout

streaming: set default query timeout
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/config Relating to Consul Agent configuration, including reloading theme/streaming Related to Streaming connections between server and client
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants