QGET with less file write & syncs #5193

Closed
sangmank opened this issue Apr 26, 2016 · 12 comments

@sangmank

sangmank commented Apr 26, 2016

A QGET operation in etcd incurs a small write & fdatasync(), and it can wear out an SSD rather quickly (within a couple of years) if QGET operations are issued frequently. We want to eliminate or reduce the frequency of the write & fdatasync() that happen with QGET operations.

One question is, how does the current QGET make sure that history never goes back once a QGET is committed? As discussed in Diego Ongaro's thesis, section 3.6.1, a log entry may get reverted in the future even if the majority appends the entry to their logs and the current leader commits it. Is it possible for the QGET entry to be reverted in such a fashion? If not, I wonder how etcd ensures the QGET entry is committed persistently even in the face of leader failure & re-election.

@xiang90
Contributor

xiang90 commented Apr 26, 2016

The current QGET is applied only after it gets committed. Committed entries will never be reverted.

We probably want to do what is described in section 6.3 to bypass the disk I/O path.

@sangmank
Author

(Sorry for spamming your inbox.) I could not locate a definition of "commit" in any document, so I wonder what the definition of commit is here. I guess a log entry is committed if the commit index variables of the majority are updated, even if the leader fails to get enough responses. Is my interpretation reasonable?

And what does 'apply' mean here? Does it mean returning to the client with the queried value at the time of the QGET entry?

@xiang90
Contributor

xiang90 commented Apr 26, 2016

@sangmank

From the Raft paper:

commit index
index of highest log entry known to be committed 
(initialized to 0, increases monotonically)

applied index
index of highest log entry applied to state machine 
(initialized to 0, increases monotonically)

For etcd, the commit index is the commit index of raft. The applied index is the index of the last log entry that the kv layer has applied.
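The relationship between the two indexes can be sketched in a few lines of Go. This is only an illustration under simplifying assumptions: the `node` type and `applyCommitted` function are hypothetical names made up for this sketch and are not part of etcd's raft package.

```go
package main

import "fmt"

// node is a hypothetical, simplified replica; it is NOT etcd's raft.Node.
type node struct {
	log          []string // log[i] holds the entry at index i+1
	commitIndex  uint64   // index of the highest entry known to be committed
	appliedIndex uint64   // index of the highest entry applied to the state machine
	state        []string // toy state machine: the applied entries, in order
}

// applyCommitted applies every committed-but-unapplied entry in log order
// and returns how many entries were applied. appliedIndex never passes
// commitIndex, and neither index ever decreases.
func (n *node) applyCommitted() int {
	applied := 0
	for n.appliedIndex < n.commitIndex {
		n.appliedIndex++
		n.state = append(n.state, n.log[n.appliedIndex-1])
		applied++
	}
	return applied
}

func main() {
	n := &node{log: []string{"put a=1", "put b=2", "put c=3"}}
	n.commitIndex = 2 // entries 1 and 2 are committed, entry 3 is only appended
	count := n.applyCommitted()
	fmt.Println(count, n.appliedIndex, n.commitIndex)
}
```

In this sketch the third entry stays unapplied until the commit index advances past it, which is the gap between "appended" and "committed" discussed above.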

@xiang90
Contributor

xiang90 commented Apr 26, 2016

Diego Ongaro's thesis 3.6.1, a log entry may get reverted in the future even if the majority appends the entry to their logs and the current leader commits it

I believe this is not accurate. An entry can be reverted even after the current leader appends it to the WAL, but not after it gets committed.

@sangmank
Author

sangmank commented Apr 26, 2016

@xiang90 It seems I interpreted the section inaccurately. Thank you for correcting my understanding.

The following is my current understanding of QGET and the commit index: (Please correct me.)

The commit index of a leader actually doesn't seem to mean that the entry is 'committed' across the majority. It only means that the majority has stored the entry in their logs, until the next set of RPCs has been answered.

Unlike quorum=false, QGET should respond only after the next set of log-append RPCs has been answered, that is, after the commit index on the majority has been updated & acknowledged to the leader.

In the current etcd, QGET is implemented with an additional QGET entry in the log, and once the QGET entry gets committed by the majority of nodes, etcd returns the requested value at the point the QGET gets applied.

With a (potential) new implementation, if the etcd server gets a QGET request, etcd passes the sequence number of the last request to raft as a parameter. The raft leader waits until all requests at or before that sequence number are committed by the majority, and only then does raft return the state after the log entries up to the requested sequence number are applied.
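A rough Go sketch of the last step of that idea, serving the read only once the state machine has caught up to the requested sequence number, might look like the following. The `store` type, `readAt`, `apply`, and `errNotReady` are hypothetical names for illustration and do not correspond to etcd's actual API. The point of the sketch is that the read itself appends nothing to the log, so it needs no write or fdatasync().

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical in-memory key-value state machine.
type store struct {
	appliedIndex uint64 // index of the last log entry applied
	data         map[string]string
}

// errNotReady signals that the caller must retry after more entries are applied.
var errNotReady = errors.New("read index not yet applied")

// apply applies the write at the given log index to the state machine.
func (s *store) apply(index uint64, key, value string) {
	s.data[key] = value
	s.appliedIndex = index
}

// readAt serves the read only if every entry up to readIndex has been
// applied. It never appends to the log, so it causes no disk write.
func (s *store) readAt(readIndex uint64, key string) (string, error) {
	if s.appliedIndex < readIndex {
		return "", errNotReady
	}
	return s.data[key], nil
}

func main() {
	s := &store{data: map[string]string{}}
	s.apply(1, "foo", "old")

	// The read was issued when the commit index was 2, so it must wait.
	if _, err := s.readAt(2, "foo"); err != nil {
		fmt.Println("read at index 2:", err)
	}

	s.apply(2, "foo", "new")
	v, _ := s.readAt(2, "foo")
	fmt.Println("read at index 2:", v)
}
```

A real implementation would block or register a callback instead of returning an error, but the ordering guarantee is the same.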

@sangmank
Author

sangmank commented Apr 26, 2016

@mattstrathman pointed out that chapter 6.4 of the thesis (read-only requests) contains the sequence for our implementation, and I think step 1 is something I missed: there needs to be at least one log entry from the current leader's term.
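As a rough illustration of the leader-side checks in that sequence, here is a minimal Go sketch. It assumes the leader knows the term of each log entry and the number of followers that acknowledged the latest heartbeat round; the `leader` type, the `readIndex` method, and the error names are hypothetical and are not taken from etcd.

```go
package main

import (
	"errors"
	"fmt"
)

// leader is a hypothetical, simplified view of a raft leader's state.
type leader struct {
	term        uint64
	commitIndex uint64
	entryTerms  []uint64 // entryTerms[i] is the term of the entry at index i+1
	clusterSize int
}

var (
	errNoCommitInTerm = errors.New("no entry committed in the current term yet")
	errNoQuorum       = errors.New("heartbeat quorum not reached")
)

// readIndex returns the log index a read-only request must wait for.
// acks is the number of followers that acknowledged the heartbeat round
// sent after the request arrived.
func (l *leader) readIndex(acks int) (uint64, error) {
	// Step 1: the leader must have committed an entry from its own term,
	// otherwise its commit index may lag behind the true one.
	if l.commitIndex == 0 || l.entryTerms[l.commitIndex-1] != l.term {
		return 0, errNoCommitInTerm
	}
	// Step 2: remember the current commit index as the read index.
	idx := l.commitIndex
	// Step 3: confirm leadership with a majority (the leader counts itself).
	if acks+1 <= l.clusterSize/2 {
		return 0, errNoQuorum
	}
	// The caller then waits until the applied index reaches idx and
	// serves the read from the state machine.
	return idx, nil
}

func main() {
	l := &leader{term: 2, commitIndex: 2, entryTerms: []uint64{1, 1, 2}, clusterSize: 3}
	_, err := l.readIndex(1)
	fmt.Println(err) // entry 2 is from term 1, so step 1 fails

	l.commitIndex = 3 // the term-2 entry is now committed
	idx, err := l.readIndex(1)
	fmt.Println(idx, err)
}
```

No log entry is written for the read in this sketch; the only network cost is the heartbeat round used to confirm leadership.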

We are sort of wondering why QGET is not the default behavior. Maybe this deserves another issue.

@xiang90
Contributor

xiang90 commented Apr 26, 2016

@sangmank For v3, QGET is the default. We do not want to change this for v2.

@sangmank
Author

@xiang90 I see. Thank you for the comment.

xiang90 added this to the unplanned milestone on May 10, 2016
@xiang90
Contributor

xiang90 commented May 11, 2016

/cc @swingbach

@xiang90
Contributor

xiang90 commented May 27, 2016

@sangmank The effort is happening at #5468. We are getting close to getting this done.

@sangmank
Author

@xiang90 Sounds great. We got swamped by our other projects. I will keep an eye on issue #5468.

xiang90 self-assigned this on Jun 27, 2016
xiang90 modified the milestones: v3.1.0, unplanned on Jun 27, 2016
@xiang90
Contributor

xiang90 commented Sep 27, 2016

#6212 fixes this. QGET in v3 does not write to disk anymore.

xiang90 closed this as completed on Sep 27, 2016