Always replicate TTLs as absolute timestamps in milliseconds #8474
Thanks!
@redis/core-team Ok, probably time for a wider review. Nan outlined the two major additions in the top comment. The first one should be mostly agreed upon. The second item is just a mechanism to test that the absolute timestamp is happening correctly; the current implementation minimizes the amount of code.
i'm sorry, i didn't follow the discussion in this PR and issue (wasn't aware of them).
@oranagra, I was under the impression this would not be in 6.2 given that we said only bug fixes for the last RC. You can take as much time as you need to think it through :)
Update LGTM, still waiting on the major change decision.
@oranagra I just read through the other related issues on this topic, #8433 and #5171. I am not entirely convinced by the original argument Salvatore was making, that the TTL should be relative to the moment the key was written onto the redis server, which on a replica is subject to the replication delay. From the point of view of the Redis user, I am writing a key with a TTL onto the master, and the expectation is that the key should expire after that time duration, counted from this moment on. So I also lean towards having the replica attempt to expire the item at the same point in time as the master, regardless of the replication lag between master and replica.
I just looked at this PR from @ny0312 and I think the implementation mostly looks good. I raised a few minor comments for him namely around introducing the new
Sorry for the delay, also sorry to be the last to join the party. TL;DR i'm voting in favor of this PR (didn't read the code yet). The main reason is that we're trying to fight two contradicting issues here:
Item 2 can also be said to be affected by trivial network latency, but that's not in the same ballpark as the other two replication delay issues.
On top of that there's the concern of code simplicity and consistency, and it would certainly be better to replicate the same to AOF and replicas, and have the same type of (absolute) value in full-sync and replication command stream.
I don't agree with some of the arguments that were posted, but the bottom line is that these two concerns contradict each other, and one of these concerns can be eliminated while the other can't (at least not easily).
Now that we seem to agree about this change we need to decide when it's safe to merge. if we conclude that we don't wanna release this change in 6.2.x, it would be better to leave it out of unstable for a while, so that unstable doesn't diverge too far from 6.2, making it harder to cherry pick bug fixes.
Footnotes: I think what matters here is actually the wall-clock of the client machine. Requesting a key to expire in 3 minutes means 3 minutes since the command was sent, or since the time it was processed by the master (the client machine's wall-clock at that time). And as Salvatore mentioned, if the replica clock is 2 minutes ahead of master's clock, and the client is using the replica to accelerate reads, from the client's perspective, the key will appear to expire on the replica after one minute (and remain in the master for another 2). If we really wanted to solve it anyway, what we can do with quite a lot of extra complexity is:
this way the replica knows how to respect the client's intent (imagine it knows the client machine's wall-clock at the time the command was processed by the master). I don't think we wanna go this way...
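To make the clock-skew footnote above concrete, here is a minimal sketch of the arithmetic. The helper name `apparent_lifetime_s` is hypothetical, and the model assumes the master stamps an absolute deadline with its own clock and every node compares that deadline against its local clock.

```python
def apparent_lifetime_s(ttl_s: int, node_clock_ahead_s: int) -> int:
    """Seconds of real time until a node with a skewed clock sees the key as expired.

    Assumption: the master (skew 0) computes the absolute deadline at real
    time 0, and each node checks that deadline against its own local clock.
    """
    deadline = ttl_s  # absolute deadline on the master's clock
    # A node whose clock runs node_clock_ahead_s ahead reaches the deadline
    # that many seconds early (but never before "now").
    return max(0, deadline - node_clock_ahead_s)


print(apparent_lifetime_s(180, 0))    # master: 180
print(apparent_lifetime_s(180, 120))  # replica 2 minutes ahead: 60
```

With a 3 minute TTL and a replica clock 2 minutes ahead, the key appears to live for one minute on the replica and three on the master, which matches the example in the comment.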
@oranagra Appreciate the thoughtful comments. I agree. Two things:
(1) Stop replicas from expiring (hiding) keys independently based on local clock. This is basically what Madelyn was referring to in #8433 with
With this scheme, TTL will essentially become a logical-time-based concept - A TTL key's life span is measured in terms of replication offsets. Its starting offset is when a master first received the TTL. Its end offset is when a master expires it. The key lives for the same span of offsets on every node, regardless of how much wall time it takes each node to replicate/traverse through that span.
(2) Another option is like what you suggested - replicate master's local time. E.g. master would periodically send its current local time to its replicas via replication stream. Upon receiving these timestamps, replicas would reset their local clock accordingly. This way, as long as time's "velocity" is roughly the same between master and replicas, TTL keys should live roughly the same lifetime on them.
Either way, even though it is related, I think it's better if we tackle this issue separately and leave it out of scope for this particular PR. I would love to hear what you think.
Nice. I will create a doc PR this week and reference the link here. Thanks again for the reviews.
FYI: valgrind and our freebsd CI is so slow that more than 10 seconds passed from the time the command was sent to the time it was executed: #9010
@madolson @oranagra Follow-up PR for API documentation on Please review. Thanks.
i see the valgrind run still fails despite my fix (to change from 10s to 100s).
@ny0312 maybe you can please look into that?
@oranagra I am taking a look on this
…onds (redis#8474)
Until now, on replica full-sync we transferred absolute time for TTL. However, when a command arrived (EXPIRE or EXPIREAT), we propagated it as is to replicas (possibly with relative time), but always translated it to EXPIREAT (absolute time) for the AOF. This commit changes that and always uses absolute time for propagation. See the discussion in redis#8433.
Furthermore, we introduce new commands, `EXPIRETIME/PEXPIRETIME`, that allow extracting the absolute TTL time from a key.
The Redis 7.0 EXPIRE command has overhead that is not present on 6.2.7. On the unstable profile, around 7% of CPU cycles are spent in rewriteClientCommandVector, which are not present on 6.2.7. This was introduced in #8474. This PR reduces the overhead by using two rewriteClientCommandArgument calls instead of rewriteClientCommandVector. In this scenario rewriteClientCommandVector creates 4 arguments; using rewriteClientCommandArgument as described above cuts the overhead in half. This PR should also improve PEXPIREAT performance by avoiding rewriteClientCommandArgument usage altogether. Co-authored-by: Oran Agra <oran@redislabs.com>
Related to #8433
Part I
This commit always replicates time-to-live (TTL) values as absolute UNIX timestamps in milliseconds. With this commit, all mutations in Redis are propagated the same way in both the AOF and the replication stream. There is no more special command rewrite/translation for AOF only.
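As a rough illustration of what "replicate as absolute timestamps in milliseconds" means, the following is a minimal sketch and not the actual Redis implementation. The function name `to_absolute_expire` is hypothetical, and rewriting every variant to `PEXPIREAT` is an assumption made for the example.

```python
def to_absolute_expire(argv: list[str], now_ms: int) -> list[str]:
    """Rewrite an expire command into an absolute-millisecond form.

    Hypothetical helper: assumes every variant is propagated as
    PEXPIREAT <key> <absolute-unix-time-ms>.
    """
    cmd, key, value = argv[0].upper(), argv[1], int(argv[2])
    if cmd == "EXPIRE":        # relative seconds
        when_ms = now_ms + value * 1000
    elif cmd == "PEXPIRE":     # relative milliseconds
        when_ms = now_ms + value
    elif cmd == "EXPIREAT":    # absolute seconds
        when_ms = value * 1000
    elif cmd == "PEXPIREAT":   # already absolute milliseconds
        when_ms = value
    else:
        raise ValueError(f"not an expire command: {cmd}")
    return ["PEXPIREAT", key, str(when_ms)]


# The primary resolves the relative TTL once, against its own clock.
print(to_absolute_expire(["EXPIRE", "K", "10"], now_ms=1_000_000))
```

Because the deadline is resolved once on the primary, the AOF and every replica receive the same value no matter when they apply the command.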
This commit aims to mitigate two issues, as discussed in #8433:
`SET K V EX 10` on the primary, setting the TTL of key K to 10 seconds. In aggregate, the key lived for T2+10s-T1 in wall time, where T2-T1 is the replication lag between the old primary and the new primary. The larger the replication lag, the longer the key outlives its intended lifetime.
`SET A a EX 3600` and `SET B b EX 3600`
This is a counter-intuitive experience for clients. The client set two keys to expire after 1 hour at roughly the same wall time, but one outlived the other by 1 hour.
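The T2+10s-T1 figure from the first issue can be checked with a short sketch. The function names are hypothetical, and the model assumes both nodes have synchronized clocks so that only replication lag matters.

```python
def lifetime_with_relative_ttl(ttl_s: float, t1: float, t2: float) -> float:
    """Key is set on the primary at t1; the replica applies the *relative*
    TTL at t2 and later becomes primary, so the key expires at t2 + ttl_s."""
    return (t2 + ttl_s) - t1


def lifetime_with_absolute_ttl(ttl_s: float, t1: float, t2: float) -> float:
    """The primary sends the absolute deadline t1 + ttl_s, so the time at
    which the replica applies the command (t2) no longer matters."""
    return (t1 + ttl_s) - t1


# 10 second TTL, 3 seconds of replication lag (T1=100, T2=103).
print(lifetime_with_relative_ttl(10, 100, 103))  # 13: outlives the TTL by the lag
print(lifetime_with_absolute_ttl(10, 100, 103))  # 10: intended lifetime
```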
Part II
Introduced a new command `EXPIRETIME` that returns the absolute Unix timestamp of an expire:
Returns the absolute Unix timestamp (since January 1, 1970) at which the given key will expire.
Options
The `EXPIRETIME` command supports a set of options that modify its behavior:
Return value
Integer reply: TTL in milliseconds, or a negative value in order to signal an error (see the description above).