Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MonotonicMatchIndexProp to TLA+ spec. #5324

Merged
merged 9 commits into from Jun 16, 2023

Conversation

lemmy
Copy link
Collaborator

@lemmy lemmy commented Jun 2, 2023

As @jumaffre pointed out, figure 2 on page 4 of the raft paper provides the following property:

Volatile state on leaders, reinitialized after election. For each server, index of the highest log entry known to be replicated on server. Initialized to 0, increases monotonically.

@lemmy lemmy added the tla TLA+ specifications label Jun 2, 2023
@lemmy lemmy added this to the TLA+ Trace Validation milestone Jun 2, 2023
@lemmy lemmy requested a review from heidihoward June 2, 2023 00:24
@lemmy
Copy link
Collaborator Author

lemmy commented Jun 2, 2023

@heidihoward The current spec allows matchIndex to go backwards (aside from when a node becomes leader) upon the receipt of a ACK AE response, which violates the new property.

(Spec versions as of at least bd99109 also violate the prop)

/cc @jumaffre @eddyashton

@ccf-bot
Copy link
Collaborator

ccf-bot commented Jun 2, 2023

mku-gh5057-pr5325@72004 aka 20230616.78 vs main ewma over 20 builds from 71540 to 71993

Click to see table

main

build_id build_number Commit latency factor ls_sgx_cft^ ls_sgx_cft_mem pi_ls_sgx_cft^ pi_ls_sgx_cft_mem pi_basic_sgx_cft^ pi_basic_sgx_cft_mem ls_jwt_sgx_cft^ ls_jwt_sgx_cft_mem pi_ls_jwt_sgx_cft^ pi_ls_jwt_sgx_cft_mem ls_js_sgx_cft^ ls_js_sgx_cft_mem ls_full_js_sgx_cft^ ls_full_js_sgx_cft_mem ls_js_jwt_sgx_cft^ ls_js_jwt_sgx_cft_mem hist_sgx_cft^ RB put (/s)^ CHAMP put (/s)^ RB get (/s)^ CHAMP get (/s)^ ls_virtual_cft^ pi_ls_virtual_cft^ pi_basic_virtual_cft^ ls_jwt_virtual_cft^ pi_ls_jwt_virtual_cft^ ls_js_virtual_cft^ ls_full_js_virtual_cft^ ls_js_jwt_virtual_cft^
71540 20230612.2 0.800894 20166.5 1.88908e+07 20478.2 1.25993e+07 23007.6 1.25993e+07 6715.98 1.67936e+07 6559.2 6.30784e+06 1718.8 1.05021e+07 1420.17 1.05021e+07 1405.36 1.05021e+07 48729.1 837423 1.17789e+06 8.15303e+06 3.07235e+07 45867.2 47831.4 57943.1 12546.1 12854.1 4457.17 3561.99 3258.32
71552 20230612.7 0.756965 19858.7 1.88908e+07 20231.9 1.25993e+07 22498.7 1.25993e+07 6748.57 1.67936e+07 6512.3 6.30784e+06 1718.19 1.05021e+07 1419 1.05021e+07 1403 1.05021e+07 44126.9 842869 1.179e+06 8.15063e+06 3.0855e+07 43795.5 46769.6 56069.8 12411.3 12814.9 4241.81 3502.42 3249.53
71569 20230612.11 0.789328 19942.2 1.88908e+07 20227.7 1.25993e+07 22593.8 1.25993e+07 6687.12 1.67936e+07 6565.7 6.30784e+06 1731.89 1.05021e+07 1424.73 1.05021e+07 1408.53 1.05021e+07 48738.9 827063 1.17665e+06 8.13564e+06 3.03053e+07 45863.4 46756.9 55851 12433.8 13126.3 4434.08 3564.39 3236.64
71587 20230613.1 0.766729 20004.4 1.88908e+07 20247.4 1.25993e+07 22498.9 1.25993e+07 6717.34 1.88908e+07 6514.5 6.30784e+06 1715.41 1.05021e+07 1420.39 1.05021e+07 1417.54 1.05021e+07 46359.1 827840 1.17878e+06 8.15634e+06 3.06238e+07 44183.1 46703.9 55522.8 12449.9 13013.4 4424.25 3476.38 3278.32
71619 20230614.1 0.748378 19904.3 1.88908e+07 20165.8 1.25993e+07 22580.7 1.25993e+07 6697.99 1.67936e+07 6512.2 6.30784e+06 1717.25 1.05021e+07 1418.88 1.05021e+07 1410.43 1.05021e+07 48519.8 838973 1.1814e+06 8.15559e+06 3.08234e+07 45659 47020.6 55046.3 12460.4 12817.4 4447.15 3575.93 3257.94
71651 20230614.13 0.808563 19892.4 1.88908e+07 20065.6 1.25993e+07 22529.9 1.25993e+07 6698.44 1.67936e+07 6515.3 6.30784e+06 1714.7 1.05021e+07 1422.13 1.05021e+07 1404.9 1.05021e+07 46597.8 818535 1.18199e+06 8.15147e+06 3.08239e+07 45621.6 46770.5 56116.1 12457.5 12815.3 4393.8 3445.99 3245.97
71667 20230615.4 0.748075 19971 1.88908e+07 20210.5 1.25993e+07 22540.4 1.25993e+07 6363.6 1.67936e+07 6518.1 6.30784e+06 1732.1 1.05021e+07 1414.93 1.05021e+07 1419.47 1.05021e+07 45692.3 835109 1.17787e+06 8.16932e+06 3.07e+07 43531.3 47012.6 55902.4 12633.4 11554.9 4447.22 3541.59 3376.47
71715 20230615.20 0.76549 19824 1.88908e+07 20191 1.25993e+07 22230.8 1.25993e+07 6366.36 1.67936e+07 6472.9 6.30784e+06 1705.47 1.05021e+07 1413.14 1.05021e+07 1394.17 1.05021e+07 39665.2 833667 1.18114e+06 8.15076e+06 3.2477e+07 43588.5 46052 56185.4 12509.4 12980.3 4432.27 3485.33 3291.33
71737 20230615.27 0.783566 19950.4 1.88908e+07 20126.3 1.25993e+07 22322.2 1.25993e+07 6348.9 1.67936e+07 6420.4 6.30784e+06 1707.38 1.05021e+07 1412.41 1.05021e+07 1392.56 1.05021e+07 45285 826623 1.1781e+06 8.15436e+06 3.08114e+07 45762.4 47097.9 56406.7 12463.1 12837.3 4445.37 3568.76 3254.11
71774 20230615.40 0.781603 19887.5 1.88908e+07 20097.3 1.25993e+07 22381.8 1.25993e+07 6312.55 1.67936e+07 6428.1 6.30784e+06 1705.84 1.05021e+07 1410.02 1.05021e+07 1396.23 1.05021e+07 45975.2 822905 1.17449e+06 8.17516e+06 3.07614e+07 43554.9 46699.9 54912.5 12540.9 13236.8 4414.27 3549.33 3365.45
71801 20230616.7 0.778253 19976.9 1.88908e+07 20164.2 1.25993e+07 22474.5 1.25993e+07 6711.65 1.67936e+07 6512.5 6.30784e+06 1702.98 1.05021e+07 1415.9 1.05021e+07 1396.3 1.05021e+07 44057.8 830579 1.18547e+06 8.15228e+06 3.06734e+07 45911.5 47440.8 56339.9 12452.3 12764.6 4472.44 3487.67 3264.79
71824 20230616.16 0.79349 19940.5 1.88908e+07 20063.3 1.25993e+07 22467.9 1.25993e+07 6416.86 1.67936e+07 6474.5 6.30784e+06 1706.67 1.05021e+07 1413.44 1.05021e+07 1395.45 1.05021e+07 50862.1 836163 1.18062e+06 8.15043e+06 3.08508e+07 45720.6 47664.2 56440.7 12519.1 13015.6 4417.03 3553.73 3347.62
71854 20230616.26 0.788362 19782.2 1.88908e+07 19965.5 1.25993e+07 22318.2 1.25993e+07 6314.7 1.67936e+07 6418.4 6.30784e+06 1705.85 1.05021e+07 1397.09 1.05021e+07 1376.86 1.05021e+07 49624.2 835974 1.18021e+06 8.17307e+06 3.0655e+07 45473.5 47304.2 54897.8 12423.4 13149.2 4446.25 3489.6 3278.87
71866 20230616.32 0.797912 19957.5 1.88908e+07 20280 1.25993e+07 22496.4 1.25993e+07 6362.61 1.67936e+07 6518.2 6.30784e+06 1705.28 1.05021e+07 1418.4 1.05021e+07 1396.82 1.05021e+07 51060.5 837923 1.18206e+06 8.15407e+06 3.08336e+07 45753.4 47880.1 56536.5 12480.4 13156.9 4435.35 3471.63 3230.82
71882 20230616.37 0.804001 19901.7 1.88908e+07 20221.6 1.25993e+07 22495.6 1.25993e+07 6339.77 1.67936e+07 6502.7 6.30784e+06 1714.8 1.05021e+07 1420.25 1.05021e+07 1404.56 1.05021e+07 46240.3 823708 1.1823e+06 8.02203e+06 2.9707e+07 45896.5 47260.1 55101 12483.5 13045.6 4444.29 3509.7 3272.1
71894 20230616.40 0.775622 19996.4 1.88908e+07 20153.5 1.25993e+07 22472.9 1.25993e+07 6692.61 1.67936e+07 6520.5 6.30784e+06 1711.05 1.05021e+07 1416.66 1.05021e+07 1419.5 1.05021e+07 44319.1 835470 1.18092e+06 8.15446e+06 3.11834e+07 45448.4 47538.9 55289.3 12520.9 12869 4449.46 3483.83 3289.5
71911 20230616.47 0.796597 20078.8 1.88908e+07 20152.5 1.25993e+07 22547.1 1.25993e+07 6716.43 1.67936e+07 6514.4 6.30784e+06 1714.82 1.05021e+07 1421.71 1.05021e+07 1407.41 1.05021e+07 48096.5 826761 1.17907e+06 8.15403e+06 3.08369e+07 45880.4 47624 55435.1 12455.2 12659.2 4444.28 3481.3 3268.98
71937 20230616.57 0.802376 20057.2 1.88908e+07 20321.5 1.05021e+07 22667 1.25993e+07 6723.7 1.67936e+07 6563.9 6.30784e+06 1731.14 1.05021e+07 1429.32 1.05021e+07 1404.23 1.05021e+07 48023.7 835349 1.18026e+06 8.15605e+06 3.07739e+07 45644.9 47685.6 56032.8 12504.7 12879 4315.79 3487.07 3264.58
71974 20230616.70 0.78228 19962.9 1.88908e+07 20105.2 1.25993e+07 22623.1 1.25993e+07 6673.93 1.67936e+07 6505.6 6.30784e+06 1709.24 1.05021e+07 1421.44 1.05021e+07 1399.66 1.05021e+07 45235 838973 1.1805e+06 8.17366e+06 3.08039e+07 45676.6 46989.2 54852 12436.9 12864.7 4443.9 3570.29 3259.11
71993 20230616.76 0.819908 19957.1 1.88908e+07 20151.1 1.25993e+07 22492.2 1.25993e+07 6314.79 1.88908e+07 6476.2 6.30784e+06 1710.94 1.05021e+07 1413.71 1.05021e+07 1399.15 1.05021e+07 47537.1 826899 1.17712e+06 8.15787e+06 3.08806e+07 45499.6 34847.2 54574.7 12371.2 13103.1 4462.95 3513.5 3277.12

mku-gh5057-pr5325

build_id build_number Commit latency factor ls_virtual_cft^ pi_ls_virtual_cft^ pi_basic_virtual_cft^ ls_jwt_virtual_cft^ pi_ls_jwt_virtual_cft^ ls_js_virtual_cft^ ls_full_js_virtual_cft^ ls_js_jwt_virtual_cft^ hist_sgx_cft^ ls_sgx_cft^ ls_sgx_cft_mem pi_ls_sgx_cft^ pi_ls_sgx_cft_mem pi_basic_sgx_cft^ pi_basic_sgx_cft_mem ls_jwt_sgx_cft^ ls_jwt_sgx_cft_mem pi_ls_jwt_sgx_cft^ pi_ls_jwt_sgx_cft_mem ls_js_sgx_cft^ ls_js_sgx_cft_mem ls_full_js_sgx_cft^ ls_full_js_sgx_cft_mem ls_js_jwt_sgx_cft^ ls_js_jwt_sgx_cft_mem RB put (/s)^ CHAMP put (/s)^ RB get (/s)^ CHAMP get (/s)^
71823 20230616.15 0.812572 45562.3 47812.1 55005.8 12435.6 13058.1 4443.16 3494.72 3275.49 50283.2 19989 1.88908e+07 20096.9 1.25993e+07 22471.9 1.25993e+07 6355.7 1.67936e+07 6475.5 6.30784e+06 1707.68 1.05021e+07 1415.5 1.05021e+07 1400.5 1.05021e+07 832805 1.18443e+06 8.15491e+06 3.08299e+07
71932 20230616.54 0.817343 45621.8 46919.9 54662.2 12485.4 12569.8 4457.06 3501.13 3268.64 43630.5 19977.6 1.88908e+07 20184.8 1.25993e+07 22628.9 1.25993e+07 6731.32 1.67936e+07 6519.1 6.30784e+06 1731.08 1.05021e+07 1420.06 1.05021e+07 1399.94 1.05021e+07 832331 1.17642e+06 8.02369e+06 3.13543e+07
71984 20230616.73 0.808725 44055 47636.3 55024.2 12381.3 13027.8 4469.39 3495.87 3296.99 46212.9 19841.4 1.88908e+07 20224.9 1.25993e+07 22528.5 1.25993e+07 6738.61 1.67936e+07 6508.1 6.30784e+06 1715.35 1.05021e+07 1413.92 1.05021e+07 1395.66 1.05021e+07 832405 1.17705e+06 8.17457e+06 3.08401e+07
72004 20230616.78 0.769691 45626.5 47428.9 54664.5 12386.8 13057.9 4446.17 3575.04 3255.02 50164.7 19834.6 1.88908e+07 20101 1.25993e+07 22346.1 1.25993e+07 6342.13 1.67936e+07 6479.7 6.30784e+06 1705.39 1.05021e+07 1412.4 1.05021e+07 1394.17 1.05021e+07 831566 1.17844e+06 8.15205e+06 3.08666e+07

images

@eddyashton
Copy link
Member

eddyashton commented Jun 2, 2023

I've taken a look at the implementation's use of match_idx, and can confirm that it goes backwards in both the raft_test unit test, and the suffix_collision scenarios. However I think this is just due to variable reuse, rather than a violation of Raft's concept of a matchIndex. All the instances I've found are when processing an AppendEntriesResponse with a false body (a NACK), rather than AppendEntriesResponse containing true (an ACK).

The place where we're decrementing match_idx, while processing a NACK, is implementing this optimisation described in the Raft paper (end of page 7):

If desired, the protocol can be optimized to reduce the
number of rejected AppendEntries RPCs. For example,
when rejecting an AppendEntries request, the follower
can include the term of the conflicting entry and the first
index it stores for that term. With this information, the
leader can decrement nextIndex to bypass all of the con-
flicting entries in that term; one AppendEntries RPC will
be required for each term with conflicting entries, rather
than one RPC per entry. In practice, we doubt this opti-
mization is necessary, since failures happen infrequently
and it is unlikely that there will be many inconsistent en-
tries.

There's a more detailed description of this optimisation in a comment in the etcd Raft implementation.

We currently use the same match_idx variable for this probing process (with a best-guess value that may decrement), and for normal operation (where it matches the matchIndex field described in the paper). I think we should be able to separate those variables in the implementation, introducing a new variable during the probing process, and only moving that to match_idx once its confirmed to be safe (computed from an ACK, rather than a NACK). That should then produce a validatable trace that never decrements match_idx.

Edited to add: I'm guessing the reason this then produces a QuorumLogInv is if the spec allows commit to be advanced based on the value of this probe index. In the implementation, afaict, while this value is non-authoritative the primary will never use it (incorrectly and unsafely!) to advance commit_idx. It certainly doesn't inline with receiving the NACK, but potentially it would when we receive an unrelated message later? I'll see if I can construct a scenario to confirm the behaviour there, and try to separate the variables.

@eddyashton
Copy link
Member

Have a scenario demonstrating that reuse of this match_idx variable during NACKs can cause commit to be unsafely advanced, which I think is equivalent to the QuorumLogInv violations. See #5325.

@lemmy
Copy link
Collaborator Author

lemmy commented Jun 6, 2023

Property MonotonicMatchIndexProp suggests the following additional invariant:

NextIdxMatchIdxInv ==
   \* Why would a leader ever send entries in an AppendEntriesRequests that the leader assume the follower to already have?!
    \A i,j \in Servers :
        nextIndex[i][j] >= matchIndex[i][j]

However, the invariant NextIdxMatchIdxInv only holds if we take the max and not the min when the leader handles a NACK:

diff --git a/tla/ccfraft.tla b/tla/ccfraft.tla
index 987c64ea1..a0c5ea7b4 100644
--- a/tla/ccfraft.tla
+++ b/tla/ccfraft.tla
@@ -1003,7 +1003,7 @@ HandleAppendEntriesResponse(i, j, m) ==
        \/ /\ \lnot m.success \* not successful
           /\ LET tm == FindHighestPossibleMatch(log[i], m.lastLogIndex, m.term)
              IN nextIndex' = [nextIndex EXCEPT ![i][j] =
-                               (IF matchIndex[i][j] = 0 THEN tm ELSE min(tm, matchIndex[i][j])) + 1 ]
+                               (IF matchIndex[i][j] = 0 THEN tm ELSE max(tm, matchIndex[i][j])) + 1 ]
           \* UNCHANGED matchIndex is implied by the following statement in figure 2, page 4 in the raft paper:
            \* "If AppendEntries fails because of log inconsistency: decrement nextIndex and retry"
           /\ UNCHANGED matchIndex

This change has the potential of introducing a liveness bug, but we do not check liveness as of now.

tla/ccfraft.tla Outdated Show resolved Hide resolved
@achamayou achamayou enabled auto-merge (rebase) June 16, 2023 18:20
@achamayou achamayou merged commit 7605eaa into microsoft:main Jun 16, 2023
19 checks passed
@lemmy lemmy deleted the mku-gh5057-pr5325 branch June 16, 2023 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tla TLA+ specifications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants