Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behaviour on setting TrailingLogs to 0 #86

Open
fahadullah opened this issue Feb 13, 2016 · 3 comments
Open

Behaviour on setting TrailingLogs to 0 #86

fahadullah opened this issue Feb 13, 2016 · 3 comments
Labels
Milestone

Comments

@fahadullah
Copy link

https://gist.github.com/fahadullah/e49a4578094f4efd7746

If TrailingLogs is set to 0 and raft node is restarted right after taking a snapshot, it tries to look up the last index but all entries were already compacted. TrailingLogs being 0 should be a valid config though a little inefficient under certain cases.

@slackpad slackpad added the bug label Mar 29, 2016
@armon armon mentioned this issue Mar 29, 2016
14 tasks
@slackpad slackpad added this to the v2 milestone Oct 7, 2016
@Lz-Gustavo
Copy link
Contributor

Is this still an observed issue on the current version?

Running the same test routine pointed by @fahadullah with some small modifications to adequate to the new API, I've reached the same "log not found" state. At first, I thought there could be a bug on InmemStore DeleteRange(), which is the implementation of LogStore interface used on unit tests. After some time, I've came back to the test definition and I think it's equivocated.

Please correct me if I'm wrong, but the expected log index after a successfully snapshot installation, and entire log truncation, shouldn't be zero? Not 102 as was currently defined.

@lalalalatt
Copy link

lalalalatt commented Aug 23, 2024

I encountered same issue on my own implementation.
In the current hashicorp/raft implementation, "log not found" may occur under unreliable network.
So I make some modifications in the appendEntries RPC handler.
Original:

raft/raft.go

Lines 1476 to 1502 in 185ae2e

if a.PrevLogEntry > 0 {
lastIdx, lastTerm := r.getLastEntry()
var prevLogTerm uint64
if a.PrevLogEntry == lastIdx {
prevLogTerm = lastTerm
} else {
var prevLog Log
if err := r.logs.GetLog(a.PrevLogEntry, &prevLog); err != nil {
r.logger.Warn("failed to get previous log",
"previous-index", a.PrevLogEntry,
"last-index", lastIdx,
"error", err)
resp.NoRetryBackoff = true
return
}
prevLogTerm = prevLog.Term
}
if a.PrevLogTerm != prevLogTerm {
r.logger.Warn("previous log term mis-match",
"ours", prevLogTerm,
"remote", a.PrevLogTerm)
resp.NoRetryBackoff = true
return
}
}

After:

if args.PrevLogIndex > 0 {
    lastIdx, lastTerm := rf.getLastEntry()
    lastSnapshotIdx, lastSnapshotTerm := rf.getSnapshot()
    
    var prevLogTerm uint64
    if args.PrevLogIndex == lastIdx {
	    prevLogTerm = lastTerm
    
	    if args.PrevLogTerm != prevLogTerm && args.PrevLogIndex == lastSnapshotIdx && args.PrevLogTerm == lastSnapshotTerm {
		    // We only compact the log range from `firstLogIndex` to `Min(lastLogIndex, lastSnapshotIndex - trailingLogs)`.
		    // If `trailLogs` is not 0, the log entry on `lastSnapshotIndex` is not included in the log range to be compacted.
		    // And sometimes the log entry on `lastSnapshotIndex` is the last log entry, but with a different term.
		    // So we do this additional check
		    prevLogTerm = lastSnapshotTerm
	    }
    
    } else if args.PrevLogIndex == lastSnapshotIdx && args.PrevLogTerm == lastSnapshotTerm {
	    // If there are remaining logs after the snapshot index, the `lastIdx` will be the last log index.
	    // Without this additional check, even if the log entry on lastSnapshotIdx == args.PrevLogIndex exists semantically;
	    // However, the log entry on index `lastSnapshotIdx` has been compacted, so we need to use the snapshot term as the term of compacted log on index `lastSnapshotIdx`.
	    prevLogTerm = lastSnapshotTerm
    } else {
	    var prevLog LogEntry
	    if err := rf.logs.GetLog(args.PrevLogIndex, &prevLog); err != nil {
		    rf.logger.Debug("failed to get previous log",
			    zap.Int("receiver-id", rf.me),
			    zap.Uint64("previous-index", args.PrevLogIndex),
			    zap.Uint64("last-index", lastIdx),
			    zap.Error(err))
		    reply.NoRetryBackoff = true
		    return
	    }
	    prevLogTerm = prevLog.Term
    }
    
    if args.PrevLogTerm != prevLogTerm {
	    rf.logger.Debug("previous log term mis-match",
		    zap.Int("sender-id", int(args.LeaderId)),
		    zap.Int("receiver-id", rf.me),
		    zap.Uint64("ours", prevLogTerm),
		    zap.Uint64("remote", args.PrevLogTerm))
	    reply.NoRetryBackoff = true
    
	    // Find the last log that matches the term
	    commitIdx := rf.getCommitIndex()
	    conflictTerm := prevLogTerm
	    var idx uint64
	    for idx = args.PrevLogIndex - 1; idx >= commitIdx; idx-- {
		    var log LogEntry
		    if err := rf.logs.GetLog(idx, &log); err != nil {
			    rf.logger.Debug("failed to get log",
				    zap.Int("receiver-id", rf.me),
				    zap.Uint64("index", idx),
				    zap.Error(err),
			    )
			    return
		    }
		    if log.Term != conflictTerm {
			    break
		    }
	    }
    
	    reply.LastLog = idx
    
	    return
    }
}

@lalalalatt
Copy link

lalalalatt commented Aug 23, 2024

@slackpad Sorry for disturbing you, I guess you are the maintainer of this repo.
Would like to hear your thoughts on this issue, Thx!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants