Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update reconfiguration logic in TLA+ specification #5173

Merged
merged 31 commits into from
Apr 20, 2023

Conversation

heidihoward
Copy link
Member

@heidihoward heidihoward commented Apr 13, 2023

Following up on #5172 and #5168, this PR attempts to clarify the purpose of configurations and ensure that configurations is correctly updated each time that log or commitIndex is updated

@heidihoward heidihoward added the tla TLA+ specifications label Apr 13, 2023
@heidihoward heidihoward changed the title Tla conf consistency Update LogConfigurationConsistentInv to check active configuration is consistent with log Apr 13, 2023
@heidihoward heidihoward changed the title Update LogConfigurationConsistentInv to check active configuration is consistent with log Update reconfiguration logic in TLA+ specification Apr 17, 2023
@heidihoward heidihoward marked this pull request as ready for review April 17, 2023 15:13
@heidihoward heidihoward requested a review from a team as a code owner April 17, 2023 15:13
@heidihoward heidihoward requested a review from lemmy April 17, 2023 15:13
tla/ccfraft.tla Show resolved Hide resolved
@lemmy
Copy link
Contributor

lemmy commented Apr 17, 2023

The reconfiguration-related raft_scenarios such as tests/raft_scenarios/election_while_reconfiguration initialize configurations[n] for some n \in Servers to equal {}. If this is just an artifact of the raft_scenarios, we can work around it in thee trace validation spec. If not, I suggest correcting InitReconfigurationVars:

CCF/tla/ccfraft.tla

Lines 361 to 367 in 64c2df3

\* Note that CCF has a bootstrapping procedure to start a new network and to join new nodes to the network (see
\* https://microsoft.github.io/CCF/main/operations/start_network.html). In both cases, a node has the current (see
\* https://microsoft.github.io/CCF/main/operations/ledger_snapshot.html#join-or-recover-from-snapshot) or some stale configuration
\* such as the initial configuration. A node's configuration is *never* "empty", i.e., the equivalent of configuration[node] = {} here.
\* For simplicity, the set of servers/nodes all have the same initial configuration at startup.
/\ \E c \in SUBSET Servers \ {{}}:
configurations = [i \in Servers |-> [ j \in {0} |-> c ] ]

@heidihoward
Copy link
Member Author

The reconfiguration-related raft_scenarios such as tests/raft_scenarios/election_while_reconfiguration initialize configurations[n] for some n \in Servers to equal {}. If this is just an artifact of the raft_scenarios, we can work around it in thee trace validation spec. If not, I suggest correcting InitReconfigurationVars:

@achamayou @eddyashton will know better than me but I think the first tx in the log is always a reconfigurations tx to the initial configuration. A node has an empty configuration iff the log is empty

@heidihoward
Copy link
Member Author

heidihoward commented Apr 19, 2023

So, this is the current initial state

InitReconfigurationVars ==
    /\ reconfigurationCount = 0
    /\ removedFromConfiguration = {}
    /\ \E c \in SUBSET Servers \ {{}}:
        configurations = [i \in Servers |-> [ j \in {0} |-> c ] ]

InitMessagesVars ==
    /\ messages = {}
    /\ messagesSent = [i \in Servers |-> [j \in Servers |-> << >>] ]
    /\ commitsNotified = [i \in Servers |-> <<0,0>>] \* i.e., <<index, times of notification>>

InitServerVars ==
    /\ currentTerm = [i \in Servers |-> 1]
    /\ state       = [i \in Servers |-> IF i \in configurations[i][0] THEN Follower ELSE Pending]
    /\ votedFor    = [i \in Servers |-> Nil]

InitCandidateVars ==
    /\ votesGranted   = [i \in Servers |-> {}]
    /\ votesRequested = [i \in Servers |-> [j \in Servers |-> 0]]

InitLeaderVars ==
    /\ nextIndex  = [i \in Servers |-> [j \in Servers |-> 1]]
    /\ matchIndex = [i \in Servers |-> [j \in Servers |-> 0]]

InitLogVars ==
    /\ log          = [i \in Servers |-> << >>]
    /\ commitIndex  = [i \in Servers |-> 0]
    /\ clientRequests = 1
    /\ committedLog = [ node |-> NodeOne, index |-> 0]

Init ==
    /\ InitReconfigurationVars
    /\ InitMessagesVars
    /\ InitServerVars
    /\ InitCandidateVars
    /\ InitLeaderVars
    /\ InitLogVars

How about something like this instead? Note that the logs on followers start with a committed reconfiguration tx

InitReconfigurationVars ==
    /\ reconfigurationCount = 0
    /\ removedFromConfiguration = {}
    /\ \E c \in SUBSET Servers \ {{}}:
        configurations = [i \in Servers |-> IF i \in c THEN [ j \in {1} |-> c ] ELSE <<>> ]

InitMessagesVars ==
    /\ messages = {}
    /\ messagesSent = [i \in Servers |-> [j \in Servers |-> << >>] ]
    /\ commitsNotified = [i \in Servers |-> <<0,0>>] \* i.e., <<index, times of notification>>

InitServerVars ==
    /\ currentTerm = [i \in Servers |-> 1]
    /\ state       = [i \in Servers |-> IF i \in configurations[i][1] THEN Follower ELSE Pending]
    /\ votedFor    = [i \in Servers |-> Nil]

InitCandidateVars ==
    /\ votesGranted   = [i \in Servers |-> {}]
    /\ votesRequested = [i \in Servers |-> [j \in Servers |-> 0]]

InitLeaderVars ==
    /\ nextIndex  = [i \in Servers |-> [j \in Servers |-> 1]]
    /\ matchIndex = [i \in Servers |-> [j \in Servers |-> 0]]

InitLogVars ==
    /\ log = [i \in Servers |-> IF i \in configurations[i][1] 
       THEN << [ contentType |-> ReconfigurationType, term |-> 0, value |-> configurations[i][1] ] >>
       ELSE << >>
    /\ commitIndex  = [i \in Servers |-> IF i \in configurations[i][1] THEN 1 ELSE 0]
    /\ clientRequests = 1
    /\ committedLog = [ node |-> NodeOne, index |-> 0]

Init ==
    /\ InitReconfigurationVars
    /\ InitMessagesVars
    /\ InitServerVars
    /\ InitCandidateVars
    /\ InitLeaderVars
    /\ InitLogVars

@lemmy
Copy link
Contributor

lemmy commented Apr 19, 2023

@heidihoward Are you aware that a06c8d2 (part of #5187) defines currentTerm to equal zero in the initial states?

@heidihoward
Copy link
Member Author

@heidihoward Are you aware that a06c8d2 (part of #5187) defines currentTerm to equal zero in the initial states?

Yes, I'm happy with that change. I think these two changes are (somewhat) orthogonal

@lemmy
Copy link
Contributor

lemmy commented Apr 19, 2023

The raft_scenarios confirms that InitReconfigurationVars needs to be modified to model a node having no configuration initially; observe the "configurations":[]for a follower node on line 9 line below. A minor point, are you planning to modify configurations to be a function of sequences rather than a function of functions?

{"h_ts":"2","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"494","msg":{"configurations":[{"idx":0,"nodes":{"0":{"address":":"}},"rid":0}],"event":{"component":"raft","function":"add_configuration"},"leadership":"none","membership":"Active","node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":0,"last_idx":0,"my_node_id":"0"}}}
{"h_ts":"5","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1809","msg":{"configurations":[{"idx":0,"nodes":{"0":{"address":":"}},"rid":0}],"event":{"component":"raft","function":"become_candidate"},"leadership":"Candidate","membership":"Active","node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"0"}}}
{"h_ts":"9","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1877","msg":{"configurations":[{"idx":0,"nodes":{"0":{"address":":"}},"rid":0}],"event":{"component":"raft","function":"become_leader"},"leadership":"Leader","membership":"Active","node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"0"}}}
{"h_ts":"14","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"494","msg":{"configurations":[{"idx":0,"nodes":{"0":{"address":":"}},"rid":0},{"idx":1,"nodes":{"0":{"address":":"},"1":{"address":":"}},"rid":1}],"event":{"component":"raft","function":"add_configuration"},"leadership":"Leader","membership":"Active","node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"0"}}}
{"h_ts":"16","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1034","msg":{"event":{"component":"raft","function":"send_authenticated"},"leadership":"Leader","membership":"Active","node":"0","paket":{"contains_new_view":false,"idx":0,"leader_commit_idx":0,"msg":0,"prev_idx":0,"prev_term":0,"term":1,"term_of_idx":0},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"0"},"to":"1","type":1}}
{"h_ts":"18","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/test/driver.h","number":"62","msg":{"data":"eyJkYXRhIjoiZXlJd0lqcDdJbUZrWkhKbGMzTWlPaUk2SW4wc0lqRWlPbnNpWVdSa2NtVnpjeUk2SWpvaWZYMD0iLCJ0eXBlIjoicmVjb25maWd1cmF0aW9uIn0=","event":{"component":"ledger","function":"append"},"index":1,"node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":0,"last_idx":0,"my_node_id":"0"},"term":1}}
{"h_ts":"20","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1034","msg":{"event":{"component":"raft","function":"send_authenticated"},"leadership":"Leader","membership":"Active","node":"0","paket":{"contains_new_view":false,"idx":1,"leader_commit_idx":0,"msg":0,"prev_idx":0,"prev_term":0,"term":1,"term_of_idx":1},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":1,"my_node_id":"0"},"to":"1","type":1}}
{"h_ts":"21","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"703","msg":{"event":{"component":"raft","function":"recv_append_entries"},"from":"0","leadership":"none","membership":"Active","node":"1","paket":{"contains_new_view":false,"idx":0,"leader_commit_idx":0,"msg":0,"prev_idx":0,"prev_term":0,"term":1,"term_of_idx":0},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":0,"last_idx":0,"my_node_id":"1"},"type":0}}
{"h_ts":"27","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1943","msg":{"configurations":[],"event":{"component":"raft","function":"become_follower"},"leadership":"Follower","membership":"Active","node":"1","oldleadership":"none","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"1"}}}
{"h_ts":"33","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1426","msg":{"event":{"component":"raft","function":"send_append_entries_response"},"leadership":"Follower","membership":"Active","node":"1","paket":{"last_log_idx":0,"msg":1,"success":0,"term":1},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"1"},"to":"0","type":1}}
{"h_ts":"34","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"703","msg":{"event":{"component":"raft","function":"recv_append_entries"},"from":"0","leadership":"Follower","membership":"Active","node":"1","paket":{"contains_new_view":false,"idx":1,"leader_commit_idx":0,"msg":0,"prev_idx":0,"prev_term":0,"term":1,"term_of_idx":1},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"1"},"type":0}}
{"h_ts":"39","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"494","msg":{"configurations":[{"idx":1,"nodes":{"0":{"address":":"},"1":{"address":":"}},"rid":1}],"event":{"component":"raft","function":"add_configuration"},"leadership":"Follower","membership":"Active","node":"1","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":1,"my_node_id":"1"}}}
{"h_ts":"41","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/test/driver.h","number":"62","msg":{"data":"eyJkYXRhIjoiZXlJd0lqcDdJbUZrWkhKbGMzTWlPaUk2SW4wc0lqRWlPbnNpWVdSa2NtVnpjeUk2SWpvaWZYMD0iLCJ0eXBlIjoicmVjb25maWd1cmF0aW9uIn0=","event":{"component":"ledger","function":"append"},"index":1,"node":"1","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":0,"last_idx":0,"my_node_id":"1"},"term":1}}
...

@lemmy
Copy link
Contributor

lemmy commented Apr 19, 2023

How about something like this instead? Note that the logs on followers start with a committed reconfiguration tx

InitReconfigurationVars ==
    /\ reconfigurationCount = 0
    /\ removedFromConfiguration = {}
    /\ \E c \in SUBSET Servers \ {{}}:
        configurations = [i \in Servers |-> IF i \in c THEN [ j \in {1} |-> c ] ELSE <<>> ]

InitLogVars ==
    /\ log = [i \in Servers |-> IF i \in configurations[i][1] 
       THEN << [ contentType |-> ReconfigurationType, term |-> 0, value |-> configurations[i][1] ] >>
       ELSE << >>
    /\ commitIndex  = [i \in Servers |-> IF i \in configurations[i][1] THEN 1 ELSE 0]
    /\ clientRequests = 1
    /\ committedLog = [ node |-> NodeOne, index |-> 0]
  • The invariant LogInv will have to change because it would be violated => Initialize committedLog appropriately
  • The invariant LogTypeOK will have to change because it doesn't allow a log entry's term to be 0
  • The BecomeLeader sub-action will truncate the leader's log because MaxCommitableIndex is 0 => Also append a TypeSignature to the log?
InitReconfigurationVars ==
    /\ reconfigurationCount = 0
    /\ removedFromConfiguration = {}
    /\ \E c \in SUBSET Servers \ {{}}:
        configurations = [i \in Servers |-> [ j \in {0} |-> IF i \in c THEN c ELSE {} ] ]

InitLogVars ==
    /\ log = [i \in Servers |-> IF i \in configurations[i][0] 
       THEN << [ contentType |-> TypeReconfiguration, term |-> 0, value |-> configurations[i][0] ], 
               [ contentType |-> TypeSignature, term |-> 0, value |-> Nil ] >>
       ELSE << >>]
    /\ commitIndex  = [i \in Servers |-> Len(log[i])]
    /\ clientRequests = 1
    /\ committedLog = LET someNode == CHOOSE i \in Servers: i \in configurations[i][0] \* We could define  committedLog  non-deterministically for each  i \in Servers:  where the predicate is true, but there is no point.
                      IN [ node |-> someNode, index |-> Len(log[someNode]) ]

@heidihoward
Copy link
Member Author

Sorry my TLA+ was quite sloppy (pseudo-tla+ if you will). I am trying to see if this is right approach to permitting empty configurations before making the full set of changes to the spec.

A minor point, are you planning to modify configurations to be a function of sequences rather than a function of functions?

Nope, sloppiness on my part

  • The invariant LogInv will have to change because it would be violated => Initialize committedLog appropriately

I think committedLog can be removed. I believe its function was to store the max commit index but now we have a monotonic commit index so this shouldn't be needed.

  • The invariant LogTypeOK will have to change because it doesn't allow a log entry's term to be 0

Yes, there's probably some other invariants that will be to changed too.

  • The BecomeLeader sub-action will truncate the leader's log because MaxCommitableIndex is 0 =>

Yes, will need fixing

Also append a TypeSignature to the log?

So my understanding is that in the implementation the first log entry is magically committable without a signature (@achamayou can probably confirm). The two options here are to also append a signature as it keeps the invariants in the spec simple or to relax some the invariants to allow the first entry to committed without a signature.

Aside, @lemmy is it ok to merge this PR, leaving the configuration initialization unchanged and start a new PR to handle this or does this PR as is interfere with the trace validation?

@lemmy
Copy link
Contributor

lemmy commented Apr 20, 2023

Aside, @lemmy is it ok to merge this PR, leaving the configuration initialization unchanged and start a new PR to handle this or does this PR as is interfere with the trace validation?

LGTM

@heidihoward heidihoward merged commit 2e7c823 into microsoft:main Apr 20, 2023
@heidihoward heidihoward deleted the tla-conf-consistency branch April 20, 2023 15:03
@lemmy
Copy link
Contributor

lemmy commented Apr 20, 2023

Leaving this here until the new issue gets created...

The existing trace validation fails if we initialize the log with the first config (or first config + sig), because the first AppendEntriesResponse will no longer match the real messages' prev_idx. In other words, the implementation's first AppendEntriesRequest has prev_idx = 0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tla TLA+ specifications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants