
Event history is not granular enough #925

Closed
titanous opened this issue Aug 10, 2014 · 7 comments

@titanous

We're running into an issue with watches that looks like this:

  • watch /key1 waitIndex=123
  • Many writes to /key2
  • First watch times out and tries to reconnect, then gets a 401 The event in requested index is outdated and cleared

Now the process that cares about the /key1 tree has to manually do a GET and do a diff of a local cache (updated after each watch event). This diff/sync logic must be implemented for every service that uses watch for synchronization.
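The diff/sync fallback described here can be sketched as follows (a minimal illustration with a hypothetical `resync_events` helper, not part of any etcd client):

```python
def resync_events(cache, fresh):
    """Diff a local cache of key -> value against a fresh GET of the same
    tree and produce the synthetic events the watcher missed."""
    events = []
    for key, value in fresh.items():
        if key not in cache:
            events.append(("create", key, value))
        elif cache[key] != value:
            events.append(("set", key, value))
    for key in cache:
        if key not in fresh:
            events.append(("delete", key, None))
    return events

# A watcher on /key1 that lost its place re-fetches the tree and diffs:
cache = {"/key1/a": "1", "/key1/b": "2"}
fresh = {"/key1/a": "1", "/key1/b": "3", "/key1/c": "4"}
print(sorted(resync_events(cache, fresh)))
# -> [('create', '/key1/c', '4'), ('set', '/key1/b', '3')]
```

After applying the synthetic events, the watcher can re-watch from the index returned by the GET.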

Is there already a solution to this that I'm missing?

My first instinct is that instead of a global event history it should be per-key so that this scenario is impossible if clients are keeping up with updates.


xiang90 commented Aug 10, 2014

@titanous The original idea of the watch history is to fill the processing/latency gap between two consecutive watch requests. The assumption is that the gap should not be more than a few seconds, so a global event history pool might be just fine.
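A toy model shows why one global pool fails the scenario in this issue even when the /key1 watcher keeps up (capacity and indexes are made-up numbers, and the buffer is a simplification of etcd's actual history):

```python
from collections import deque

class EventHistory:
    """Global fixed-size event history: writes to *all* keys share one
    buffer, so heavy traffic on /key2 evicts events a /key1 watcher needs."""
    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)  # oldest entries fall off
        self.index = 0                        # global, monotonically increasing

    def record(self, key, value):
        self.index += 1
        self.events.append((self.index, key, value))

    def scan(self, key, wait_index):
        oldest = self.events[0][0] if self.events else self.index + 1
        if wait_index < oldest:
            # etcd reports this as: 401 The event in requested index is
            # outdated and cleared
            raise LookupError("event index outdated and cleared")
        return [e for e in self.events if e[0] >= wait_index and e[1] == key]

h = EventHistory(capacity=3)
h.record("/key1", "a")      # index 1
for v in "bcd":
    h.record("/key2", v)    # indexes 2-4 push index 1 out of the buffer
# h.scan("/key1", 1) now raises, even though /key1 itself never changed again
```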

What you have suggested is a multi-version store.
We are still evaluating this idea, since:

  1. more and more people want to keep previous versions of key/value pairs.
  2. it could help with non-blocking snapshotting.
    ...

Even if we provide this feature, we still need to do garbage collection at some point:

  1. based on time
  2. based on the progress of the slowest watcher
    ...
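Combining the two policies above might look roughly like this (an illustrative sketch, not etcd code):

```python
import time

def compaction_floor(watcher_indexes, event_times, now, max_age):
    """Pick the highest event index we may garbage-collect up to:
    everything older than max_age seconds (policy 1), but never past the
    slowest watcher's progress (policy 2)."""
    slowest = min(watcher_indexes)
    expired = [i for i, t in event_times.items() if now - t > max_age]
    time_floor = max(expired) if expired else 0
    return min(slowest, time_floor)

now = time.time()
# event index -> time it was recorded
event_times = {1: now - 100, 2: now - 50, 3: now - 5}
print(compaction_floor([2, 7], event_times, now, max_age=30))  # -> 2
```

The slowest-watcher bound keeps any connected watcher from losing events, at the cost of letting one stalled watcher pin the whole history.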

However, we probably can never guarantee that all watchers see all of the history, no matter what happens. That means if you really care about the consistency/correctness of your system, you have to have a recovery mechanism based on the current state of etcd rather than the whole history of etcd.

@titanous

@xiangli-cmu We're running into this problem because one service using etcd does a lot of writes, while another does low-frequency writes but also uses watch, so the gap is not the problem.

This is the first time I've looked at this part of the code; I had assumed that the persistence mechanism was like the Redis AOF and that the log was a core part of the system.

It's obvious now that watch can't be relied upon to deliver a consistent set of events, so I will implement synchronization. The documentation doesn't mention this major caveat, so I assume others would also expect watch to deliver a consistent state.


xiang90 commented Aug 10, 2014

@titanous Watch delivers a consistent stream of events from the store, and the store can be point-in-time recovered from the log. We can easily make the event history as large as the log file.

But we still need to compact and snapshot the log, so watchers can still lose earlier events. Recording all events would mean letting the log grow forever. What I am saying is that we are trying to improve the watch behavior, but we still cannot give you a strong guarantee.
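The trade-off is easy to put numbers on: the reconnect gap a watcher can tolerate is the history length divided by the cluster-wide write rate (the figures below are assumptions for illustration only):

```python
def max_watcher_lag(history_len, writes_per_sec):
    """A watcher that reconnects within history_len / writes_per_sec
    seconds of its last seen index can always resume without a
    cleared-index error; any slower and the history may have rotated."""
    return history_len / writes_per_sec

# Assumed figures: a 1000-event history under 500 cluster-wide writes/s
# tolerates only a 2-second reconnect gap.
print(max_watcher_lag(1000, 500))  # -> 2.0
```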


xiang90 commented Aug 10, 2014

@titanous #926

@titanous

@xiangli-cmu Is it possible to check the current modifiedIndex of the watch target before returning the 401 error?

@yichengq

Yes, and that would filter out the case where no modification happened. You can also implement it yourself by sending a follow-up GET request.
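On the client side, that GET-based filter could look like this (a hypothetical helper; `modifiedIndex` and the current etcd index would come from the follow-up GET response and its X-Etcd-Index header):

```python
def handle_cleared_index(wait_index, modified_index, current_etcd_index):
    """On a cleared-index error, use the key's modifiedIndex from a
    follow-up GET to decide whether any event was actually missed."""
    if modified_index < wait_index:
        # The key was not touched while we were away: safe to re-watch.
        return ("rewatch", current_etcd_index + 1)
    # The key changed while the history was compacted: full resync needed.
    return ("resync", current_etcd_index + 1)

print(handle_cleared_index(wait_index=123, modified_index=100,
                           current_etcd_index=900))
# -> ('rewatch', 901)
```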

If you want to keep all history in memory, you can set snapshot=false. But we strongly recommend against it: it bloats memory, makes garbage collection slow, and can eventually stop the cluster from making progress.

We are discussing adding an argument to set the length of the watch history. If that happens, and your system can assume that watches reconnect within a bounded time after a write finishes, you will never receive this error.

However, we recommend that the system be able to handle this error. If a network partition happens and one node is separated from the majority for a long time, all watches on that node may fail with this error when the node rejoins the cluster, and clients need a way to handle that.

@kelseyhightower

@titanous Looks like the last response answers your question; if not, please feel free to reopen this issue.

Thanks.


4 participants