Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More understandable timeouts in checks when value exceeds interval, Fixes #5834 #5835

Closed

Conversation

pierresouchay
Copy link
Contributor

When Timeouts were greater or equal than interval for checks,

Consul used to set timeout to 10s. Now, use the interval as upper boundary in that case.

Also outputs a warning in logs in that case.

Thus:

Checks: [
 {
   "http": "http://localhost:80/healthz",
   "interval": "30s",
   "timeout": "30s"
  }
]

Will now use a timeout of 30s and not 10s as it used to be the case

@pierresouchay
Copy link
Contributor Author

Will fix #5834

…ixes hashicorp#5834

When Timeouts were greater or equal than interval for checks,
Consul used to set timeout to 10s. Now, use the interval as upper
boundary in that case.

Also outputs a warning in logs in that case
Added unit tests for both TCP and HTTP checks
Copy link
Member

@hanshasselberg hanshasselberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work!
There is one case in which your computation returns a value that I did not expect:

timeout = 12s, interval = 0: the default interval is set and the timeout gets set to 10s. I think it should still be 12s because the user explicitly configured it like that.

This playground has a naive implementation of what I mean: https://play.golang.org/p/uTIQBvnghWg.

Since the comments are saying that the timeout must not be greater than the interval, I was under the impression that the checks were run at this particular interval. But this is not the case, the interval is used to determine the waiting time until the next check is performed:

case <-next:
c.check()
next = time.After(c.Interval)

Which means that even when the timeout is bigger than the interval, there will never be the situation were unfinished checks are piling up. Which was my assumption of why we had this code in the first place. Maybe it was that way in the past.

Looking forward to hear what you think.

@pierresouchay
Copy link
Contributor Author

@i0rek Thank you for watching this PR.

To be frank, I don't know what is the best. I was not sure whether having timeout bigger than interval could cause issues, I thought that compared to existing implementation, it is the less disruptive.

The interval = 0 seems impossible for checks, since it would be set to checks.MinInterval which means 1s , so the code is still valid.

I can change it if you want, but compared to the existing implementation, I think it is a reasonable choice.

Maybe we can document more that having timeouts greater than interval will cause timeouts to be used?

@pierresouchay
Copy link
Contributor Author

@i0rek what do you think about my last comment #5835 (comment) ?

Copy link
Member

@hanshasselberg hanshasselberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking at this I realised again that the checks cannot overlap because the interval starts after the check returns. Which makes me think that we should just use whatever timeout was provided by the user or use the default one.

In case we are doing it like that, it would have to go into 1.6, because it is a breaking change.

What do you think about that?

@pierresouchay
Copy link
Contributor Author

@i0rek It basically means that for now: duration between 2 checks is between 1 and 2*interval (timeout which is bounded by interval + interval)

So basically, this PR changes nothing to the existing behavior except for people using very long timeout (bigger than interval) that used to be time outed after 10s and will be at interval now...

Sound quite OK to me and easier to understand for people without the knowledge about Consul internals without completely changing behavior in an insane way.

Might probably deserves a note un documentation anyway

@freddygv
Copy link
Contributor

freddygv commented Jun 27, 2019

@pierresouchay If the configured timeout is 12s and the interval is 10s why not set the timeout at 12s? This bounds the duration between checks at timeout + interval.

We are leaning toward changing the implementation so that timeouts are configurable without being capped. (While likely maintaining the Timeout > 0 constraint, to avoid not having a timeout in place)

It's worth considering that the check registration docs don't mention the timeouts being restricted:
https://www.consul.io/api/agent/check.html#timeout

@s-mang
Copy link
Contributor

s-mang commented Jul 17, 2019

Closing this in favor of #6094 which will go in with the 1.6 release.

@s-mang s-mang closed this Jul 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants