Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Streaming] Results from nodes with streaming set header X-Consul-Knownleader: false #9776

Open
pierresouchay opened this issue Feb 17, 2021 · 6 comments
Labels
theme/streaming Related to Streaming connections between server and client type/enhancement Proposed improvement or new feature

Comments

@pierresouchay
Copy link
Contributor

When Streaming is running, the header X-Consul-Knownleader: false is set to false.

Some SDKs such as Orbitz, use this field to throw an error, thus, it fails to retrieve data.

This header should be set to true with streaming activated

pierresouchay added a commit to pierresouchay/consul that referenced this issue Feb 17, 2021
This is not a real fix but will avoid breaking clients relying on this header

Fix hashicorp#9776
pierresouchay added a commit to criteo-forks/consul that referenced this issue Feb 17, 2021
This is not a real fix but will avoid breaking clients relying on this header

Fix hashicorp#9776
@pierresouchay
Copy link
Contributor Author

Cc @rickfast, @heruv1m, @yfouquet any opinion on this (and the linked PR and discussion)?

@jsosulska jsosulska added theme/streaming Related to Streaming connections between server and client type/enhancement Proposed improvement or new feature labels Feb 24, 2021
@chuckyz
Copy link

chuckyz commented Feb 26, 2021

@pierresouchay I encountered a situation in a very small production DC where X-Consul-Knownleader: false was appended only on a specific X-Consul-Index which ended up breaking all communication from a client perspective in one case (e.g.: curl http://127.0.0.1/v1/health/service/foo?index=12345 returned a 500 but curl http://127.0.0.1/v1/health/service/foo?index=12346 did not). I've only ever seen this happen with grpc streaming. Could this possibly be related?

I saw your issue and was curious if anyone else has seen the above behavior.

@pierresouchay
Copy link
Contributor Author

@chuckyz Unlikely, in such case, the Consul agent does not return HTTP 500, it does return HTTP 200, but with such header wrongly set (which is interpreted by Orbitz as no-result). So, if you are sure it was an HTTP 500, it was not the same thing.

You could have however hit another but fixed (in our case) by #9555 when a service is not changing too much (I mean less than every 10min), which is likely the case if you are using a vanilla Consul without any patch. See https://medium.com/criteo-engineering/consul-streaming-whats-behind-it-6f44f77a5175

criteoconsul pushed a commit to criteo-forks/consul that referenced this issue Apr 22, 2021
This is not a real fix but will avoid breaking clients relying on this header

Fix hashicorp#9776
@dnephin
Copy link
Contributor

dnephin commented May 5, 2021

Cross posting from #9778,

The Consistency Modes docs for this header say:

can be used by clients to gauge the staleness of a result and take appropriate action

A false value for this header does not mean "there is no leader", it simply means that the leader status was not verified. There are plenty of places where we return KnownLeader: false simply because we didn't set the field, and when the response is returned from the cache we may set it to true, even though the leader status was not verified as part of that request.

It does not seem appropriate to error when this header is false.

@pierresouchay
Copy link
Contributor Author

Hello @dnephin ,

I highly disagree with your analysis of the doc and about who is wrong/right. From an implementor POV, this field means : the data you see is not correct, hence the choice of Orbitz to say: this data cannot be trusted. So it seems reasonable to make the choice of failing.

There are plenty of places where we return KnownLeader: false simply because we didn't set the field

That's not my experience : it happens only IIRC where there are some calls to Consul Servers, even so, having this field set, from an implementor POV means don't trust this data. I think implementors of SDKs cannot be blamed for that, because it sounds reasonable. This field not being set in Consul internals is not an valid excuse IMHO. Either set it to true, either discard it. But having it false in API responses is a breaking change to me, blowing away existing semantics.

It simply means that streaming breaks the API (and it did for us at large scale when I was working at Criteo) in some kind of subtle ways, and this is really dangerous.

Cc @yfouquet

@dnephin
Copy link
Contributor

dnephin commented May 5, 2021

I am not sure how it implies the data is incorrect. At most it seems like it could imply the data is maybe stale. It can't even say definitely if it is stale or not, because if the data hasn't changed on the leader it is not stale.

I can definitely say that it is misleading in many cases. If any response is returned from the cache this value could be true, but the value in the cache could be up to 3 days old, so a leader could have been lost any time during those 3 days and it will still say KnownLeader: true anyway.

I agree with you that subtle changes like this to the API are a problem. A safer approach would be to completely remove the header from the API entirely, since it can't be used safely. That would require a change to any SDK anyway, so probably ignoring the header for now is a good approach until it can be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/streaming Related to Streaming connections between server and client type/enhancement Proposed improvement or new feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants