Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Prepared Queries to return Critical Nodes #4983

Open
jjathman opened this issue Nov 21, 2018 · 10 comments
Open

Allow Prepared Queries to return Critical Nodes #4983

jjathman opened this issue Nov 21, 2018 · 10 comments
Labels
theme/api Relating to the HTTP API interface theme/prepared-query Anything related to prepared queries type/enhancement Proposed improvement or new feature

Comments

@jjathman
Copy link

Feature Description

Allow for a new parameter of prepared query which will allow the query to return failing nodes.

Use Case(s)

We are looking at changing our usage of Consul to move from the /health/service endpoint to using prepared queries. Occasionally, we would like to use a service in a critical status as the final fallback if there are no available services in a passing or warning state. Right now the service endpoint will return those nodes, we would like to be able to replicate that behavior with a prepared query.

Ideally, the failing nodes would only be returned as the last ones.

@banks banks added the type/enhancement Proposed improvement or new feature label Nov 26, 2018
@banks
Copy link
Member

banks commented Nov 26, 2018

Hey @jjathman, thanks for the details.

Can you describe whether you would consume this via DNS or HTTP API (or both). If we add it we'd need to make it work for both which leads to fun dealing with DNS response truncation correctly when you have "order" to take into account. For example, we con't control which "order" DNS clients choose to consume results in. SRV records can include a weight so we can set that lower than other nodes but just ensuring they are last in the response doesn't actually guarantee that clients are less likely to use them in DNS A responses in general.

@jjathman
Copy link
Author

Hi @banks,

We would use this via the HTTP API only. Would it be possible to make this an HTTP only option? I can appreciate the difficulties in having this option via DNS. If no one has asked for the feature with DNS maybe it isn't necessary?

We have internally debated using services in a Error status as a final option, but it seems like it is the best thing to try if there are no other options available.

@banks
Copy link
Member

banks commented Nov 26, 2018

@jjathman if it's only for working around unexpected incidents then there is already a way to do something similar with prepared queries, with manual intervention.

We added field to prepared queries for ignoring certain checks . For example some people have found that if they have some bug or incident where one of their service health checks is returning an error, they can just update the prepared query to ignore that check temporarily so instances are still served until the check is fixed.

Would that be sufficient for your case? It's different than needing all services via prepared query for some other reason like trying to sync the whole set with a loadbalancer that does it's own health probes granted, but interested to know if you have specific needs that fall outside of that?

@banks
Copy link
Member

banks commented Nov 26, 2018

Your request is actually more-or-less the same thing as IgnoreCheckIDs: ["*"] (if that worked). We may want to make it more explicit like IncludeCritical: true though given that we have to add something to make it work.

I guess the real question is: do you really want everything all the time, or do you only want critical if there are no (or fewer than some threshold) healthy instances?

I could imagine for example adding a "min_healthy_threshold" which you might set to say 40% and only if there are fewer than 40% f the registered instances in a healthy state would it start returning critical. This is similar to how other load balances implement safety valves on circuit breakers.

@jjathman
Copy link
Author

Regarding your first question, yes I think your idea of using the IgnoreCheckIDs would work for us. I saw that field but it didn't occur to me to use that for this purpose. Thank you for the tip.

Trying to make this work I expected making a prepared query like this would do what I expected, but it doesn't seem to be returning services in a critical status.

{
  "Name": "",
  "Template": {
    "Type": "name_prefix_match"
  },
  "Service": {
    "Service": "${name.full}",
    "IgnoreCheckIDs": ["service:${name.full}"]
  }
}

I thought this would ignore the health check of the application. I must be doing something wrong.

For your second question, in this specific use case I think we would only want/use critical services if they were the only option. However, I think there is a more general use case where I would want to see everything and deal with the results at the client level. When I originally started looking at prepared queries I assumed it would be possible to retrieve any service instances that I can see via the /health endpoint, but maybe that wasn't the intention of using a prepared query.

I would be in favor of your IncludeCritical: true idea. That seems the most similar to the OnlyPassing option that currently exists.

@jjathman
Copy link
Author

Nevermind my question about why it wasn't working for me, we are on too old of a version for the ignore checks. I'll upgrade us now and test it again.

@banks banks removed the type/enhancement Proposed improvement or new feature label Nov 22, 2019
@stale
Copy link

stale bot commented Jan 21, 2020

Hey there,
We wanted to check in on this request since it has been inactive for at least 60 days.
If you think this is still an important issue in the latest version of Consul
or its documentation please reply with a comment here which will cause it to stay open for investigation.
If there is still no activity on this issue for 30 more days, we will go ahead and close it.

Feel free to check out the community forum as well!
Thank you!

@stale stale bot added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Jan 21, 2020
@clkamp
Copy link

clkamp commented Jul 9, 2020

This feature would be extremely nice to have, because vault creates checks with unique names on each node, so IgnoreCheckIDs does not work for me, because it only matches exactly

@ghost ghost removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Jul 9, 2020
@cbroglie
Copy link
Contributor

I could imagine for example adding a "min_healthy_threshold" which you might set to say 40% and only if there are fewer than 40% f the registered instances in a healthy state would it start returning critical. This is similar to how other load balances implement safety valves on circuit breakers.

This is exactly what I'm looking for, essentially an automated way to set IgnoreCheckIDs: ["*"] (again, pretending that works) in case of widespread failure.

@jsosulska jsosulska added theme/api Relating to the HTTP API interface type/enhancement Proposed improvement or new feature labels Sep 17, 2020
@jkirschner-hashicorp jkirschner-hashicorp added the theme/prepared-query Anything related to prepared queries label Jul 19, 2021
@urusha
Copy link

urusha commented Nov 1, 2022

It seems that IgnoreCheckIDs: ["*"] currently doesn't work, and we have no variable to dynamically populate checkids so there is no way to return all services including critical in DNS.
Actually I'd like to suggest another parameter for Service: States (list with default value: ["normal","warning"] which is the same as OnlyPassing: false). Available list values are normal, warning, critical. This will allow to create prepared queries like this:

{
    "Name": "crit-",
    "Template": {
        "Type": "name_prefix_match",
        "Regexp": "^crit-(.+)$"
    },
    "Service": {
        "Service": "${match(1)}",
        "States": ["critical"]
    }
}

To match critical services only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/api Relating to the HTTP API interface theme/prepared-query Anything related to prepared queries type/enhancement Proposed improvement or new feature
Projects
None yet
Development

No branches or pull requests

7 participants