Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate etcd linearizability #14398

Merged
merged 1 commit into from
Oct 23, 2022
Merged

Conversation

serathius
Copy link
Member

@serathius serathius commented Aug 29, 2022

First draft of linearizability tests that are able to reproduce #14370 within 20 seconds with 80% accuracy. Part of #14045

This approach uses a generic way to of verifing linearizability. In this proof of concept to reproduce #14370, however for full solution scenarios should be generated randomly based on preexisting fail points.

@serathius
Copy link
Member Author

cc @ptabor @ahrtr @spzala

@serathius serathius force-pushed the linearizability branch 5 times, most recently from c5a30cb to 7615605 Compare August 30, 2022 07:53
@serathius serathius force-pushed the linearizability branch 3 times, most recently from 16d36e0 to 320ef73 Compare September 22, 2022 13:03
@serathius serathius added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 22, 2022
@codecov-commenter
Copy link

codecov-commenter commented Sep 26, 2022

Codecov Report

Merging #14398 (069e26e) into main (e24402d) will decrease coverage by 0.26%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main   #14398      +/-   ##
==========================================
- Coverage   75.70%   75.44%   -0.27%     
==========================================
  Files         457      457              
  Lines       37300    37269      -31     
==========================================
- Hits        28239    28116     -123     
- Misses       7309     7380      +71     
- Partials     1752     1773      +21     
Flag Coverage Δ
all 75.44% <0.00%> (-0.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/expect/expect.go 72.64% <0.00%> (-1.92%) ⬇️
client/v3/namespace/watch.go 87.87% <0.00%> (-6.07%) ⬇️
raft/rafttest/node.go 95.00% <0.00%> (-5.00%) ⬇️
client/v3/concurrency/session.go 88.63% <0.00%> (-4.55%) ⬇️
client/pkg/v3/testutil/leak.go 62.83% <0.00%> (-4.43%) ⬇️
server/proxy/grpcproxy/watch.go 92.48% <0.00%> (-4.05%) ⬇️
server/storage/mvcc/watchable_store.go 85.14% <0.00%> (-3.99%) ⬇️
server/etcdserver/api/v3rpc/member.go 93.54% <0.00%> (-3.23%) ⬇️
server/etcdserver/cluster_util.go 70.35% <0.00%> (-3.17%) ⬇️
server/etcdserver/api/v3rpc/interceptor.go 74.47% <0.00%> (-3.13%) ⬇️
... and 20 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@serathius serathius force-pushed the linearizability branch 4 times, most recently from ac43a74 to 697a691 Compare September 30, 2022 19:21
@serathius serathius force-pushed the linearizability branch 9 times, most recently from 732a65c to 1894704 Compare October 13, 2022 15:21
@serathius serathius changed the title [Draft] tests: Validate etcd linearizability Validate etcd linearizability Oct 14, 2022
@serathius
Copy link
Member Author

@ahrtr @spzala I think this is ready to review. This PR provides first scenario to new framework of linerazibility test. Using this approach I was able to reproduce both data durability and data inconsistency issues. Further work should add more diverse scenarios to cover broader types of failures and allow dynamic scenarios based on random set of actions.

Generating random scenarios should allow us to cover much broader space and should be our main goal. Current approach based on prepacked scenarios is limited to testing historical or simple scenarios. By testing a generic system property like linearizability we are no longer limited by simple got == expect validation. We need to start proactively finding issues instead of just responding to them post.

Next steps:

@serathius serathius force-pushed the linearizability branch 11 times, most recently from cdc5957 to 09909f5 Compare October 20, 2022 21:47
go.sum Outdated Show resolved Hide resolved
tests/linearizability/linearizability_test.go Outdated Show resolved Hide resolved
tests/framework/e2e/etcd_process.go Outdated Show resolved Hide resolved
tests/framework/e2e/etcd_process.go Show resolved Hide resolved
tests/linearizability/linearizability_test.go Show resolved Hide resolved
tests/linearizability/linearizability_test.go Outdated Show resolved Hide resolved
@serathius serathius force-pushed the linearizability branch 5 times, most recently from 0ce584a to b26705d Compare October 21, 2022 10:39
Copy link
Member

@spzala spzala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serathius great work! My only concern is that thegithub.com/anishathalye/porcupine seems developed/maintained by an individual, and there is not much to find out on dev history/PRs. But at the same time, we don't have other options and also that we are using it for test. So giving porcupine a try sounds okay. (Forking and maintaining our own copy may be too much of work specially when we don't have enough contributors so I guess we don't have that option.) I don't have any other comment besides I noticed a question from @ahrtr Thanks!

tests/linearizability/client.go Outdated Show resolved Hide resolved
tests/linearizability/client.go Outdated Show resolved Hide resolved
tests/linearizability/client.go Outdated Show resolved Hide resolved
)

var (
PutGetTraffic Traffic = putGetTraffic{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense to define a global variable. We should either define a function something like NewTraffic or users use putGetTraffic{} directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to maintain a single clean place to that lists are available traffic and failpoints. I find it more clear than using structs and asking user to scan whole file where they are scattered.

tests/linearizability/linearizability_test.go Show resolved Hide resolved
tests/linearizability/failpoints.go Show resolved Hide resolved
tests/linearizability/linearizability_test.go Show resolved Hide resolved
},
{
name: "KillClusterOfSize3",
failpoint: KillFailpoint,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
failpoint: KillFailpoint,
failpoint: killFailpoint{},

traffic := trafficConfig{
minimalQPS: minimalQPS,
maximalQPS: maximalQPS,
clientCount: 8,
Copy link
Member

@ahrtr ahrtr Oct 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment: please consider to make this configurable, such as adding a field clientCount into the struct tcs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struct tcs is not set in stone, we can always add it when we decide to parameterize tests based on it.

@ahrtr
Copy link
Member

ahrtr commented Oct 22, 2022

Overall looks good to me with some minor comments. Great work!

My only concern is that thegithub.com/anishathalye/porcupine seems developed/maintained by an individual, and there is not much to find out on dev history/PRs

Yes, I have the same concern, especially if we want to spend more resource/time on the linearizability test. We can discuss this separately.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me.

@serathius serathius merged commit e5790d2 into etcd-io:main Oct 23, 2022
@serathius serathius deleted the linearizability branch June 15, 2023 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. stage/merge-when-tests-green
Development

Successfully merging this pull request may close these issues.

None yet

4 participants