
Separate persisted responses without knowing their revision to prevent duplicating state during linearization #18214

Merged
merged 1 commit into etcd-io:main from robustness-separate-persisted on Jun 25, 2024

Conversation

@serathius (Member)

/cc @siyuanfoundation @MadhavJivrajani @fuweid @fykaa @henrybear327

@k8s-ci-robot

@serathius: GitHub didn't allow me to request PR reviews from the following users: fykaa.

Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @siyuanfoundation @MadhavJivrajani @fuweid @fykaa @henrybear327

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@serathius (Member Author)

I have an idea for a big rewrite of history patching, but it would require more tests. Since this PR would also benefit from them, I'll wait until we have those tests.

// MaybeEtcdResponse extends EtcdResponse to include partial information about responses to a request.
// Possible response state information:
// * Normal response. Client observed response. Only EtcdResponse is set.
// * Persisted. Client didn't observe response, but we know it was persisted by etcd. Only Persisted is set.
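
For context, a minimal sketch of the struct this comment describes; the fields below are inferred from how the apply switch later in this thread uses them, so the actual definition in the robustness tests may differ:

// Sketch only: fields inferred from their usage in this PR, not copied
// from the actual robustness-test model.
type MaybeEtcdResponse struct {
	EtcdResponse             // set for a normal, client-observed response
	Error             string // set when the client observed an error
	Persisted         bool   // request is known to be persisted by etcd (e.g. found in the WAL)
	PersistedRevision int64  // non-zero when the revision is also known (e.g. from a watch event)
}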
Contributor

Q: So, meaning the PUT was finalized in consensus + storage, but the response wasn't received from the client's point of view? Seems confusing, given that the function above, describeEtcdResponse, describes this as "unknown". Should we make the naming more consistent?

@serathius (Member Author)

A great question; I have been thinking about changing the naming here.

So, meaning the PUT was finalized in consensus + storage, but the response wasn't received from the client's point of view?

Yes, from the client's perspective it got an error, so we don't know the response, but we know that the request was persisted, thanks to looking it up in the WAL.

Seems confusing, given that the function above, describeEtcdResponse, describes this as "unknown". Should we make the naming more consistent?

The naming in the robustness tests is constantly evolving based on feedback. The name "unknown" was just meant to mean that we don't know the exact response, but we knew its revision due to a matching watch response. The name "persisted" was introduced into the code only recently, with the PR that added reading the WAL.

I think it would make sense to rename "unknown" to "persisted" in describeEtcdResponse. Does that sound good to everyone? @MadhavJivrajani @siyuanfoundation @fuweid

Contributor

Thanks for the explanation. "persisted" makes sense to me. Defer to other contributors :)

Contributor

"persisted" makes sense to me

Member

+1 to "persisted"

@fykaa (Contributor) commented Jun 25, 2024

Renaming to 'persisted' makes sense in the context of the WAL.

I am trying to understand: could there be any edge case where 'persisted' might not fully capture the state, or scenarios where additional qualifiers might be necessary?

@serathius (Member Author)

I am trying to understand: could there be any edge case where 'persisted' might not fully capture the state, or scenarios where additional qualifiers might be necessary?

Good question. I think the second case would be when a request generates an event and we observe it on a watch; this is where "PersistedRevision" comes from. So the name "persisted" might not capture all cases. Maybe "observed" would capture the whole meaning: observed not by the client, but through other side effects, like the WAL or a watch. I don't know whether "observed" is better than "persisted", though.

Let's not block this PR on picking the name. If someone is interested, feel free to file a separate PR to propose and discuss a new name.

@siyuanfoundation (Contributor)

I am confused by the PR. Can you give some context on why the code before this change would result in duplicated states?

@serathius (Member Author) commented Jun 24, 2024

I am confused by the PR. Can you give some context on why the code before this change would result in duplicated states?

This code is meant to remove the duplicated states caused by the apply function. When linearizing the operation history, apply is one of the functions used to decide: is this state transition possible (the bool), and what is the new state (the nonDeterministicState)? Based on the current state and the provided request, it tries to answer whether the provided response is possible. Depending on how much we know about the response, we might be able to detect early that the linearization branch we are exploring is not possible, and that is exactly what we want: to reject those branches as early as possible.

This is coded in the following function; the switch cases go from the most lenient to the strictest option for validating the response:

// apply reports whether the response is possible from the current set of
// states, and returns the new set of states after applying the request.
func (states nonDeterministicState) apply(request EtcdRequest, response MaybeEtcdResponse) (bool, nonDeterministicState) {
	var newStates nonDeterministicState
	switch {
	case response.Error != "":
		newStates = states.applyFailedRequest(request)
	case response.Persisted && response.PersistedRevision == 0:
		newStates = states.applyPersistedRequest(request)
	case response.Persisted && response.PersistedRevision != 0:
		newStates = states.applyPersistedRequestWithRevision(request, response.PersistedRevision)
	default:
		newStates = states.applyRequestWithResponse(request, response.EtcdResponse)
	}
	return len(newStates) > 0, newStates
}

If you check the applyFailedRequest code, it duplicates the states because it doesn't know whether etcd handled the request or not:

// applyFailedRequest returns both the original states and the states with the
// request applied, covering both cases: the request was persisted, and the
// request was lost.
func (states nonDeterministicState) applyFailedRequest(request EtcdRequest) nonDeterministicState {
	newStates := make(nonDeterministicState, 0, len(states)*2)
	for _, s := range states {
		newStates = append(newStates, s)
		newState, _ := s.Step(request)
		if !reflect.DeepEqual(newState, s) {
			newStates = append(newStates, newState)
		}
	}
	return newStates
}

If we know that the request was persisted in the etcd WAL, then we can just assume that it executed, and use different apply code:

// applyPersistedRequest applies request to all possible states.
func (states nonDeterministicState) applyPersistedRequest(request EtcdRequest) nonDeterministicState {
	newStates := make(nonDeterministicState, 0, len(states))
	for _, s := range states {
		newState, _ := s.Step(request)
		newStates = append(newStates, newState)
	}
	return newStates
}
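
To illustrate why this matters, here is a hypothetical toy model (not code from this PR): under applyFailedRequest every failed request can double the number of candidate states, while applyPersistedRequest keeps the count flat.

package main

import "fmt"

// Toy model for illustration only: a "state" is just an int counter and
// every request increments it. applyFailed keeps both the old and the new
// state, because the request may or may not have been applied;
// applyPersisted keeps only the new state.
func applyFailed(states []int) []int {
	out := make([]int, 0, len(states)*2)
	for _, s := range states {
		out = append(out, s, s+1) // request lost, request applied
	}
	return out
}

func applyPersisted(states []int) []int {
	out := make([]int, 0, len(states))
	for _, s := range states {
		out = append(out, s+1) // request definitely applied
	}
	return out
}

func main() {
	failed, persisted := []int{0}, []int{0}
	for i := 0; i < 3; i++ {
		failed = applyFailed(failed)
		persisted = applyPersisted(persisted)
	}
	fmt.Println(len(failed), len(persisted)) // prints: 8 1
}

Three failed requests already leave 8 candidate states to explore during linearization; knowing from the WAL that they were persisted collapses that to 1.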

@serathius force-pushed the robustness-separate-persisted branch 2 times, most recently from 44921d3 to 972d2be on June 24, 2024 at 19:37
Separate persisted responses without knowing their revision to prevent duplicating state during linearization

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
@serathius (Member Author)

/retest

@serathius (Member Author) commented Jun 25, 2024

Any reviewers willing to review?

Just making sure we merge this soon, as the linearization timeouts have become frequent enough that they happen on presubmit: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/18227/pull-etcd-robustness-amd64/1805320012408295424

@siyuanfoundation (Contributor)

Thanks for explaining the details! LGTM

@serathius merged commit 917ded9 into etcd-io:main on Jun 25, 2024
49 checks passed