Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LastModifyTime to catalog entities #4744

Closed
wants to merge 1 commit into from

Conversation

ShimmerGlass
Copy link
Contributor

Implementation for #3973

This PR is based on @pierresouchay's #4720 as they touch the same code (#4720 fixes cases where ModifyIndex is updated when it shouldn't be, and LastModifyTime is tied to ModifyIndex)

This change adds a new field LastModifyTime to NodeService, ServiceNode, Node, and HealthCheck structs. This field is initialized to the current time upon registration on the server side, and is then kept up to date when a mofification is made (aka when !old.IsSame(new)).

Copy link
Contributor

@pierresouchay pierresouchay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite nice, however, I'll also add a lastStateChange specifically for heathcheck that would be modified Only when state do change, ie: none to passing, passing to critical or warning... The idea is to be able to build progressive load balancing based on this value, for instance for increasing up weight smoothly.

With LastModified, changing notes or output on a healthcheck would change the LastModified, but we would like to have clear semantics here

@ShimmerGlass
Copy link
Contributor Author

@pierresouchay Done :)

Copy link
Contributor

@pierresouchay pierresouchay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aestek looks good to me. Are we sure the clock is UTC and monotonic as well?

@ShimmerGlass
Copy link
Contributor Author

@pierresouchay
This will not be using the monotonic clock when serialized to json since this is an absolute time. However, it will be using the monotonic clock for operations like time.Since(check.LastModifyTime) in the same process.
This currently using the server's timezone, which is included in the time.Time struct, and in the JSON representation. Do you think it should be forced to UTC ?

@pierresouchay
Copy link
Contributor

@Aestek absolutely, all should be UTC only and monotonic if possible, so increasing index should increase (or equal) the timestamp. Second granularity should be enough, but OK for Ms

@ShimmerGlass
Copy link
Contributor Author

ShimmerGlass commented Oct 5, 2018

@pierresouchay
Assuming the following implementation :

  1. an entity is created, its LastModifyTime is set to current wall clock
  2. update: entity.LastModifyTime += monotonic_clock - entity.LastModifyTime.monotonic

There's a few things to note :

  • Reported times can be in the future compared to wall clock, so for the clients time.Since(myCheck.LastModifyTime) can be negative (or higher that it should, but that's less likely to cause problems).
  • Monotonic clock is lost on master change. This means that LastModifyTime can be decreased when it is next updated by the new master, or if we do not allow that, stay the same until the wall clock has caught up with LastModifyTime even if a change has occurred.

@uepoch
Copy link
Contributor

uepoch commented Oct 5, 2018

The second point issue could actually be worse if you had one server going far in the future because of a (really) bogus situation, actually preventing all remaining "healthy" servers even to update the entries for undefined time
Could be good to add a reset mechanism if the drift is too high

@pierresouchay
Copy link
Contributor

Let's get the monotonic change of clock for a given server leader. Of course addressing clock drift between servers is out of scope, but time adjustments is not (hence the monotonic and UTC)

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I like the idea of keeping mtimes for entries. However, before giving this a more thorough review there is one big problem I see.

In order to keep the state consistent across all servers the last modification time needs to be set BEFORE the modification gets raftApply'ed in the agent/consul/catalog_endpoint.go code. Otherwise each server is going to have a slightly different lastModifyTime (hopefully not more than a few millisecond difference but still a difference).

So you should timestamp it within the RPC code. This also means that the fsm and state store do not need any timestamping capabilities so much of the modifications made to that code wont be needed.

@ShimmerGlass
Copy link
Contributor Author

@mkeeler Indeed, I totally agree with you.

Modifying the RPC code for this seems like too much change for this feature.

Proposal :

The most important thing for us in these changes is the LastStatusModifyTime on the checks. Lets drop the fields on the other structs and set the LastStatusModifyTime on the agent side when it sees a Status change. It would then be synced to the servers, assuring that they all have the exact same information.
What do think ?

@ShimmerGlass ShimmerGlass force-pushed the last-modify-time branch 2 times, most recently from 30b75a7 to d68d003 Compare October 9, 2018 15:16
@ShimmerGlass
Copy link
Contributor Author

@mkeeler I've updated the PR based on my last message.

@ShimmerGlass
Copy link
Contributor Author

@pierresouchay I've also changed to use UTC + monotonic increase. One thing to note is that if a check is serialized are loaded back in (from an agent restart for example) the monotonic clock will be lost and the implementation will fallback to wall clock.

@mkeeler
Copy link
Member

mkeeler commented Oct 9, 2018

@Aestek If I understand your proposal you are saying that the LastStatusModifyTime would be set on the agent and then the AE sync would push that up to the servers.

I will have to think on that a little more and get back with you. This should work for agent managed services but external services managed via the API would have to find another way to coordinate on the mtime.

@mkeeler mkeeler added the thinking More time is needed to research by the Consul Contributors label Oct 9, 2018
@ShimmerGlass
Copy link
Contributor Author

@mkeeler Yep, exactly.

It seems to me that we can treat API managed just like agent managed checks: as soon a check status is updated, LastStatusModifyTime would be updated in the same fashion. Do you propose that a new new field should be added to /agent/check/update/:check_id to manually specify the LastStatusModifyTime ?

This field is updated when the check status changes and allows to adjust
behaviour based on how much time has passed since the check is "passing"
for example.
@hanshasselberg
Copy link
Member

@Aestek are you still interested in adding this? I went through all the previous conversation and I like the latest version of your PR, because it has a small scope and fixes your issue.

I was wondering if it is necessary to provide an integration for services that are managed externally (eg by consul-esm). It seems to me that would be something we could do later if there is demand for it.

/cc @eikenb

@hanshasselberg hanshasselberg self-assigned this Dec 19, 2019
@hanshasselberg hanshasselberg added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Dec 19, 2019
@ShimmerGlass
Copy link
Contributor Author

@i0rek yes, we are still interested in this. as you pointed out external services will need to be managed manually but at least "standard" services will have this information.

@stale stale bot removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Dec 20, 2019
@hanshasselberg hanshasselberg added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Jan 6, 2020
@hanshasselberg hanshasselberg removed the thinking More time is needed to research by the Consul Contributors label Jan 13, 2020
@stale stale bot removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Jan 13, 2020
@hanshasselberg hanshasselberg added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Jan 20, 2020
@hanshasselberg
Copy link
Member

@ShimmerGlass do you have time to rebase your PR?

@ghost ghost removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Feb 18, 2020
@hanshasselberg hanshasselberg added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Feb 18, 2020
@ghost ghost removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Feb 18, 2020
@hanshasselberg hanshasselberg added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Feb 19, 2020
@jsosulska
Copy link
Contributor

Hello all - This has been in limbo for a while now. I am going to close due to inactivity and the code base progressing from here.

@ShimmerGlass , or whomever would like to continue this - feel free to rebase, open a new PR, and we will review it.

@jsosulska jsosulska closed this Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting-reply Waiting on response from Original Poster or another individual in the thread
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants