Event history is not granular enough #925
We're running into an issue with watches that looks like this:

1. A client watches: `watch /key1 waitIndex=123`.
2. Meanwhile, a busy service writes to `/key2` often enough to push index 123 out of the global event history.
3. The watch fails with `401 The event in requested index is outdated and cleared`.

Now the process that cares about the `/key1` tree has to manually do a `GET` and diff the result against a local cache (updated after each watch event). This diff/sync logic must be implemented by every service that uses watch for synchronization. Is there already a solution to this that I'm missing?

My first instinct is that instead of a global event history it should be per-key, so that this scenario is impossible as long as clients keep up with updates.
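A minimal sketch of the failing request, assuming etcd's v2 HTTP API on the default client port 4001 (key name and waitIndex as in the example above):

```go
// Long-poll for the next change to /key1 after index 123. If enough
// writes to other keys have happened since then, etcd answers with
// errorCode 401 instead of an event.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	resp, err := http.Get("http://127.0.0.1:4001/v2/keys/key1?wait=true&recursive=true&waitIndex=123")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var body struct {
		ErrorCode int    `json:"errorCode"`
		Message   string `json:"message"`
		Action    string `json:"action"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		panic(err)
	}
	if body.ErrorCode == 401 {
		// Index 123 has been garbage-collected from the event history.
		fmt.Println("history gone:", body.Message)
		return
	}
	fmt.Println("event:", body.Action)
}
```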
@titanous The original idea of the watch history is to cover the processing/latency gap between two consecutive watch requests. The assumption is that this gap should not be more than a few seconds, so a global event history pool should be fine. What you have suggested is a multi-version store.
Even if we provide this feature, we still need to do garbage collection at some point. However, we can probably never guarantee that all watchers see the full history, no matter what happens. That means that if you really care about the consistency/correctness of your system, you need a recovery mechanism based on the current state of etcd rather than on its whole event history.
@xiangli-cmu We're running into this problem because we have one service using etcd that does a lot of writes, and another that does low-frequency writes but also uses watch, so the gap is not the problem. This is the first time I've looked at this part of the code; I had assumed that the persistence mechanism was like the Redis AOF and that the log was a core part of the system. It's obvious now that watch can't be relied upon to deliver a consistent set of events, so I will implement synchronization. The documentation doesn't mention this major caveat, so I assume others would also expect watch to deliver a consistent state.
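A sketch of that synchronization fallback, assuming the v2 HTTP API on port 4001 (`fetchTree`, `diff`, and the package name are hypothetical, not etcd API): re-fetch the whole tree with a recursive GET and diff it against the local cache that watch events normally keep fresh.

```go
package watchsync

import (
	"encoding/json"
	"net/http"
)

// node mirrors the fields of a v2 API response node used here.
type node struct {
	Key   string `json:"key"`
	Value string `json:"value"`
	Dir   bool   `json:"dir"`
	Nodes []node `json:"nodes"`
}

// fetchTree flattens the current contents of the tree under key.
func fetchTree(key string) (map[string]string, error) {
	resp, err := http.Get("http://127.0.0.1:4001/v2/keys" + key + "?recursive=true")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var body struct{ Node node }
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	flat := map[string]string{}
	var walk func(n node)
	walk = func(n node) {
		if !n.Dir {
			flat[n.Key] = n.Value
		}
		for _, c := range n.Nodes {
			walk(c)
		}
	}
	walk(body.Node)
	return flat, nil
}

// diff reports keys that were added or changed, and keys that vanished,
// relative to the local cache; the caller then repairs the cache.
func diff(cache, fresh map[string]string) (changed, removed []string) {
	for k, v := range fresh {
		if old, ok := cache[k]; !ok || old != v {
			changed = append(changed, k)
		}
	}
	for k := range cache {
		if _, ok := fresh[k]; !ok {
			removed = append(removed, k)
		}
	}
	return
}
```

On a 401, a service would call `fetchTree`, apply the `diff` result to its cache, and then resume watching (a loop doing exactly that is sketched further down).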
@titanous Watch delivers a consistent stream of events from the store, and the store can be point-in-time recovered from the log. We could easily grow the event history to the size of the log file, but we still need to compact and snapshot the log, so watchers can still lose old events. Recording all events forever means letting the log grow without bound. What I am saying is that we can try to improve the watch behavior, but we still cannot give you a strong guarantee.
@xiangli-cmu Is it possible to check the current index of the watched key, so I can tell whether anything actually changed?
@titanous Yes, and that could filter out the case where no modification happened. You may also implement it by sending a follow-up GET request. If you want to keep all history in memory, you could set the event history size to a large enough value. We are discussing adding an argument to set the length of the watch history. If that happens, and your system can assume that the watch starts within some bounded time after the write finishes, you will never receive this error. However, we recommend that the system be able to handle this error anyway: if a network partition happens and one node is cut off from the majority for a long time, all watches on that node may fail with this error when the node reconnects to the cluster, and clients need a way to handle that.
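Putting the two suggestions together, here is a sketch of a watch loop that treats the 401 as a resync signal, assuming the v2 HTTP API on port 4001; `watchLoop` and `resync` are hypothetical names, not etcd API. It resumes from the `X-Etcd-Index` of the recovery GET, which is the store index observed by that read.

```go
package watchsync

import (
	"encoding/json"
	"fmt"
	"net/http"
)

const base = "http://127.0.0.1:4001/v2/keys"

func watchLoop(key string, index uint64, resync func(*http.Response) error) error {
	for {
		url := fmt.Sprintf("%s%s?wait=true&recursive=true&waitIndex=%d", base, key, index)
		resp, err := http.Get(url)
		if err != nil {
			return err
		}
		var body struct {
			ErrorCode int `json:"errorCode"`
			Node      struct {
				ModifiedIndex uint64 `json:"modifiedIndex"`
			} `json:"node"`
		}
		err = json.NewDecoder(resp.Body).Decode(&body)
		resp.Body.Close()
		if err != nil {
			return err
		}

		if body.ErrorCode == 401 {
			// Index fell out of the event history: re-read current state.
			cur, err := http.Get(base + key + "?recursive=true")
			if err != nil {
				return err
			}
			rerr := resync(cur) // rebuild the local cache from cur's body
			// X-Etcd-Index is the store index at the time of the GET;
			// resuming just past it avoids missing or replaying events.
			fmt.Sscan(cur.Header.Get("X-Etcd-Index"), &index)
			cur.Body.Close()
			if rerr != nil {
				return rerr
			}
			index++
			continue
		}

		// Normal event: a real implementation would apply it to the
		// local cache here, then wait for the next index.
		index = body.Node.ModifiedIndex + 1
	}
}
```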
@titanous It looks like our last response answers your question; if not, please feel free to reopen this issue. Thanks.