OCPBUGS-85258: 5.0 rebase 3.6.11 by dusk125 · Pull Request #375 · openshift/etcd

dusk125 · 2026-05-07T15:26:01Z

Rebase handled by Claude

❯ ./bin/etcd --version
etcd Version: 3.6.11
Git SHA: 821d95e
Go Version: go1.25.9
Go OS/Arch: darwin/arm64

Summary by CodeRabbit

New Features
- Stronger transaction authorization checks for operations using previous values and leases.
- Added end-to-end and integration tests validating member-add and auth/transaction behaviors.
Bug Fixes
- Quorum connectivity check when adding members now requires connection to a majority of peers.
Chores
- Bumped project version to 3.6.11, Go toolchain to 1.25.9, updated build base images and several indirect Go dependencies.

In CI, TestGateway and TestMixVersionsSnapshotByAddingMember are flaky because TestMixVersionsSnapshotByAddingMember sometimes fails to close its second etcd process. This happens when the second process has not yet been healthy long enough to pass the check in EtcdServer.mayRemoveMember. Fix this by retrying member removal for up to twice etcdserver.HealthInterval in EtcdProcessCluster.CloseProc.

Signed-off-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>

…ck-of-#20840-upstream-release-3.6

Automated cherry pick of etcd-io#20840

Signed-off-by: Wei Fu <fuweid89@gmail.com>

[release-3.6] *: bump go to 1.25.9

…r is down

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

…g member is down

Assume the new member is unavailable and check whether quorum is still preserved.

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

[release-3.6] Bump golang.org/x/image to v0.39.0 to resolve GO-2026-4962

[release-3.6] Fix the issue that cannot add a new member when one member is down, even if quorum is still satisfied

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

[release-3.6] Refactor auth check for Put requests in TXN

…rbac check issue

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

…XN bypass RBAC check

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

…ck issue

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

[release-3.6] Fix read access via PrevKv or lease attachment in a Put request in etcd transactions bypass RBAC authorization checks

Signed-off-by: Ivan Valdes <iv@a.ki>

coderabbitai · 2026-05-07T15:26:19Z

Walkthrough

Upgrades Go toolchain/dependencies and OpenShift base images; refactors txn authorization by moving auth checks into the apply layer (adding exported CheckTxnAuth); and relaxes member-add quorum gating from “connected to all peers” to “connected to a majority” with related test and helper changes.

Changes

Toolchain & Build Infrastructure

Layer / File(s)	Summary
Build config `.ci-operator.yaml`	`build_root_image.tag` updated from `...-openshift-4.23` to `...-openshift-5.0`.
Docker images / Multi-stage builds `Dockerfile*` (e.g. `Dockerfile.art-cachi2`, `Dockerfile.installer`, `Dockerfile.installer.art-cachi2`, `Dockerfile.rhel`)	All builder/runtime base image tags bumped from OCP `4.23` variants to `5.0` variants.
Go toolchain pins `.go-version`, `/go.mod`, `tools//go.mod`, `tests/go.mod`	`toolchain`/`.go-version` bumped from `go1.25.8` → `go1.25.9` across modules and tools.
Module dependency bumps `go.mod`, `api/go.mod`, `client/*/go.mod`, `etcdctl/go.mod`, `etcdutl/go.mod`, `pkg/go.mod`, `server/go.mod`, `tests/go.mod`, ...	Internal module versions advanced from `v3.6.10` → `v3.6.11` and multiple `golang.org/x/*` indirect deps bumped (net, sys, text, crypto, etc.).
Version constant `api/version/version.go`	Exported `Version` constant bumped `3.6.10` → `3.6.11`.

Authorization & Transaction Handling (apply-layer takeover)

Layer / File(s)	Summary
Core apply auth implementation `server/etcdserver/apply/apply_auth.go`	Moved txn authorization into the apply layer: added exported `CheckTxnAuth(as auth.AuthStore, ai *auth.AuthInfo, lessor lease.Lessor, rt *pb.TxnRequest) error`, plus helpers `checkPutAuth`, `checkTxnPermission`, `checkTxnReqsPermission`, `checkLeasePuts`, `checkLeasePutsKeys`; refactored `Put`, `Txn`, and `LeaseRevoke` to call the helpers.
Tests for apply auth `server/etcdserver/apply/apply_auth_test.go`	Refactored tests to call new helper signatures; added `TestCheckTxnAuth` table-driven cases and `setupAuth` helper; updated `TestCheckLeasePutsKeys`.
Removed auth from txn layer `server/etcdserver/txn/txn.go`	Removed auth import and deleted previous `CheckTxnAuth` + related helpers from txn package (auth checks relocated).
Txn tests adjusted `server/etcdserver/txn/txn_test.go`	Removed auth-focused tests and imports that exercised txn-layer auth checks; retained non-auth txn tests.
Call site updated `server/etcdserver/v3_server.go`	Read-only txn auth check switched from `txn.CheckTxnAuth` to `apply2.CheckTxnAuth`.
Integration tests `tests/integration/v3_auth_test.go`	Added `perm` field to test users and two tests: `TestReadWithPrevKvInTXN` and `TestPutWithLeaseInTXN` verifying permission-denied behavior for PrevKv and lease-attached puts.

Member Addition & Quorum Logic

Layer / File(s)	Summary
Quorum helpers `server/etcdserver/util.go`	Added `isConnectedToQuorumAfterAddingNewMemberSince` and `quorum(num int)`; removed `isConnectedFullySince`; retained `isConnectedToQuorumSince`.
Member add gating `server/etcdserver/server.go`	`mayAddMember` now uses quorum-majority check (`isConnectedToQuorumAfterAddingNewMemberSince`) instead of requiring full connectivity; log/error message updated accordingly.
E2E test `tests/e2e/ctl_v3_member_test.go`	Added `TestCtlV3MemberAddAsLearnerWithOneMemberDown` covering member-add scenarios with members down across cluster sizes.
Test framework retry logic `tests/framework/e2e/cluster.go`	`CloseProc` member-removal retry loop now uses time-derived retry count based on `etcdserver.HealthInterval` and reports tries in error message.
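To make the majority rule concrete, here is a minimal Go sketch of the post-add quorum arithmetic. It assumes the check counts the local member plus every voting peer active within the health window, and compares that count against the quorum of a cluster one member larger (the new member is treated as unavailable). The helpers `quorum` and `canAddVoter` are illustrative only and are not the actual bodies of `quorum(num int)` or `isConnectedToQuorumAfterAddingNewMemberSince`.

```go
package main

import "fmt"

// quorum returns the majority size for a cluster with n voting members.
// Illustrative sketch; not copied from server/etcdserver/util.go.
func quorum(n int) int {
	return n/2 + 1
}

// canAddVoter reports whether adding one voting member keeps a majority,
// treating the new member as unavailable. votingMembers is the current
// voting-member count; connected counts the local member plus peers that
// have been active for the whole health window.
func canAddVoter(votingMembers, connected int) bool {
	return connected >= quorum(votingMembers+1)
}

func main() {
	cases := []struct{ members, down int }{{3, 0}, {3, 1}, {5, 1}, {5, 2}}
	for _, c := range cases {
		connected := c.members - c.down
		fmt.Printf("members=%d down=%d need=%d allowed=%t\n",
			c.members, c.down, quorum(c.members+1), canAddVoter(c.members, connected))
	}
}
```

Under this model a five-member cluster with one member down can still add a member (4 connected, quorum of 6 is 4), which the old all-peers check rejected, while a three-member cluster with one member down cannot (2 connected, quorum of 4 is 3).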

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server as EtcdServer
    participant Apply as apply.CheckTxnAuth
    participant AuthStore
    participant Lessor

    Client->>Server: Txn(request)
    Server->>Apply: CheckTxnAuth(authStore, authInfo, lessor, txnReq)
    Apply->>AuthStore: Validate compare key permissions
    AuthStore-->>Apply: ok / denied
    alt compares authorized
        Apply->>Apply: checkTxnReqsPermission(successOps)
        Apply->>AuthStore: IsRangePermitted / IsPutPermitted per op
        AuthStore-->>Apply: ok / denied
    end
    alt lease-attached puts present
        Apply->>Lessor: Lookup(leaseID)
        Lessor-->>Apply: lease keys
        Apply->>AuthStore: IsPutPermitted for lease keys
        AuthStore-->>Apply: ok / denied
    end
    alt permission denied
        Apply-->>Server: ErrPermissionDenied
        Server-->>Client: PERMISSION_DENIED
    else all checks pass
        Apply-->>Server: nil
        Server->>Server: Execute Txn
        Server-->>Client: Txn result
    end
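The authorization flow above can be illustrated with a toy Go model of the rules the PR describes: a Put needs write permission on its key, `PrevKv` additionally needs read permission because the response returns the old value, and attaching a lease needs write permission on every key already attached to that lease. Compares are treated as reads. This sketch checks both the success and failure branches up front, on the assumption that the compare outcome is not known at authorization time; range and delete ops are omitted for brevity. All names here (`perms`, `put`, `txn`, `checkPut`, `checkTxn`) are hypothetical stand-ins, not etcd's `auth.AuthStore` or `pb.TxnRequest` API.

```go
package main

import (
	"errors"
	"fmt"
)

// errPermissionDenied stands in for etcd's permission-denied error.
var errPermissionDenied = errors.New("permission denied")

// perms is a toy RBAC table: which keys the user may read and write.
type perms struct {
	read, write map[string]bool
}

// put keeps only the Put fields that matter for authorization.
type put struct {
	key    string
	prevKv bool  // response would include the previous value
	lease  int64 // 0 means no lease attached
}

// txn keeps only the compare keys and the two branches of puts.
type txn struct {
	compares         []string
	success, failure []put
}

func checkPut(p perms, leaseKeys map[int64][]string, op put) error {
	// Writing the key always needs write permission.
	if !p.write[op.key] {
		return errPermissionDenied
	}
	// PrevKv exposes the old value, so it also needs read permission.
	if op.prevKv && !p.read[op.key] {
		return errPermissionDenied
	}
	// Attaching a lease needs write permission on every key already on it.
	for _, k := range leaseKeys[op.lease] {
		if !p.write[k] {
			return errPermissionDenied
		}
	}
	return nil
}

func checkTxn(p perms, leaseKeys map[int64][]string, t txn) error {
	// Compares read the current value, so they need read permission.
	for _, k := range t.compares {
		if !p.read[k] {
			return errPermissionDenied
		}
	}
	// Either branch may run, so both are checked before execution.
	for _, branch := range [][]put{t.success, t.failure} {
		for _, op := range branch {
			if err := checkPut(p, leaseKeys, op); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	p := perms{
		read:  map[string]bool{"a": true},
		write: map[string]bool{"a": true, "b": true},
	}
	leaseKeys := map[int64][]string{1: {"a"}, 2: {"secret"}}
	fmt.Println(checkTxn(p, leaseKeys, txn{compares: []string{"a"}, success: []put{{key: "b", lease: 1}}}))
	fmt.Println(checkTxn(p, leaseKeys, txn{success: []put{{key: "b", prevKv: true}}}))
	fmt.Println(checkTxn(p, leaseKeys, txn{failure: []put{{key: "b", lease: 2}}}))
}
```

The second and third calls model the two bypasses the integration tests `TestReadWithPrevKvInTXN` and `TestPutWithLeaseInTXN` target: a user with write-only access reading a value through `PrevKv`, and a user attaching a lease that already holds a key they cannot write.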

sequenceDiagram
    participant Admin
    participant Server as EtcdServer
    participant MemberAdd as mayAddMember
    participant Util as util.isConnectedToQuorumAfterAddingNewMemberSince
    participant Transporter

    Admin->>Server: AddMember(newMember)
    Server->>MemberAdd: mayAddMember(newMember)
    MemberAdd->>Util: isConnectedToQuorumAfterAddingNewMemberSince(transport, since, self, members)
    Util->>Util: compute quorum(currentMembers + 1)
    Util->>Transporter: check connectivity to peers
    Transporter-->>Util: active peers list
    alt connected to majority after add
        Util-->>MemberAdd: true
        MemberAdd-->>Server: proceed
        Server-->>Admin: Member added
    else not connected to majority
        Util-->>MemberAdd: false
        MemberAdd-->>Server: error - would break active quorum
        Server-->>Admin: FAILED - not connected to majority
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (11 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title references OCPBUGS-85258 and mentions '5.0 rebase 3.6.11', which directly corresponds to the PR's primary changes: updating to etcd v3.6.11 with OpenShift 5.0 base images and Go 1.25.9.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All test names are stable and deterministic. New tests use static strings with no dynamic information like timestamps, UUIDs, or generated identifiers.
Test Structure And Quality	✅ Passed	Custom check is not applicable. PR contains only standard Go testing framework tests (func Test*), not Ginkgo tests. Check requires assessment of Ginkgo DSL test code.
Microshift Test Compatibility	✅ Passed	The PR adds new tests but none use Ginkgo patterns (It, Describe, Context, When). The check applies only to Ginkgo e2e tests, so it is not applicable here.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	Custom check not applicable. PR adds etcd native tests using standard Go testing.T, not OpenShift Ginkgo e2e tests. The check specifically targets new Ginkgo tests.
Topology-Aware Scheduling Compatibility	✅ Passed	This PR updates etcd to v3.6.11 with source code refactoring, build configuration updates, and dependency versions. No Kubernetes manifests, operator code, or scheduling constraints were introduced.
Ote Binary Stdout Contract	✅ Passed	No stdout writes in process-level code. No fmt.Print, log.Print, println, or BeforeSuite functions found. Changes are config updates and standard tests only.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Custom check is for Ginkgo e2e tests only. New tests use standard Go testing.T, not Ginkgo. Repository has no Ginkgo framework usage.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


openshift-ci-robot · 2026-05-07T15:28:36Z

@dusk125: This pull request references Jira Issue OCPBUGS-85258, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sandeepknd

The bug has been updated to refer to the pull request using the external bug tracker.


In response to this:

❯ ./bin/etcd --version
etcd Version: 3.6.11
Git SHA: 821d95e
Go Version: go1.25.9
Go OS/Arch: darwin/arm64

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2026-05-07T15:32:35Z

@dusk125: This pull request references Jira Issue OCPBUGS-85258, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sandeepknd


In response to this:

❯ ./bin/etcd --version
etcd Version: 3.6.11
Git SHA: 821d95e
Go Version: go1.25.9
Go OS/Arch: darwin/arm64

Summary by CodeRabbit

Release Notes

New Features

Enhanced transaction authorization checks for operations with previous values and leases.

Added end-to-end test coverage for member addition resilience.

Bug Fixes

Improved quorum connectivity validation when adding new members; now requires majority connection instead of full connectivity.

Chores

Updated Go toolchain to 1.25.9 and updated build base images.

Bumped indirect Go dependencies (golang.org/x/crypto, golang.org/x/net, golang.org/x/sys, golang.org/x/text).

Version bumped to 3.6.11.


coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

etcdutl/go.mod (1)
73-81: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Upgrade go.opentelemetry.io/otel to v1.41.0 to address HIGH severity DoS (GHSA-mh2q-q3fh-2475 / CVE-2026-29181).

go.opentelemetry.io/otel versions v1.36.0–v1.40.0 are affected; the fix is v1.41.0. The vulnerability allows attackers to amplify CPU and allocations by sending many baggage: header lines, even when each individual value is within the per-value parse limit. CVSS score is 7.5 HIGH (AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H).

This dependency appears at v1.40.0 in etcdutl/go.mod, go.mod, server/go.mod, and tests/go.mod. All affected go.mod files should be updated together to v1.41.0 or above.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@etcdutl/go.mod` around lines 73 - 81, The import entry for
go.opentelemetry.io/otel is pinned to v1.40.0 (e.g., the line
"go.opentelemetry.io/otel v1.40.0 // indirect") and must be upgraded to v1.41.0
to fix CVE-2026-29181; update that module line in all affected go.mod files
(etcdutl/go.mod, root go.mod, server/go.mod, tests/go.mod) to v1.41.0 (or
later), run "go get go.opentelemetry.io/otel@v1.41.0" and "go mod tidy" in each
module to refresh transitive deps, then run the project's tests/build to ensure
nothing breaks.

🧹 Nitpick comments (2)

tests/e2e/ctl_v3_member_test.go (2)
660-660: ⚡ Quick win

time.Sleep after Kill() does not wait for a new leader to be elected.

If the killed member(s) include the current leader, the remaining cluster must hold a new election before MemberAddAsLearner can succeed. etcdserver.HealthInterval + 2*time.Second (≈ 7 s, since HealthInterval is 5 s) is usually enough, but the wait is unconditional and timing-sensitive. A WaitLeader call on the surviving members before proceeding would make the test deterministic without making it slower on average.
🔧 Suggested fix
 			time.Sleep(etcdserver.HealthInterval + 2*time.Second)
+			epc.WaitLeader(t)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/ctl_v3_member_test.go` at line 660, Replace the unconditional
time.Sleep after Kill() with a deterministic wait for a new leader by invoking
WaitLeader on the surviving member(s) before calling MemberAddAsLearner;
specifically, remove the time.Sleep(etcdserver.HealthInterval + 2*time.Second)
and call the cluster/peer helper like survivingMember.WaitLeader(ctx, timeout)
(or the existing test helper used elsewhere) so the test only proceeds once a
leader is elected and MemberAddAsLearner is invoked reliably.
649-651: 💤 Low value

Consider logging or propagating epc.Close() errors.

Silently discarding the close error with _ = epc.Close() can hide resource-cleanup failures between sub-test iterations. Other defers in this file use require.NoError(t, epc.Close()). At minimum, a t.Logf on error would help diagnose flaky teardowns.
🔧 Suggested fix
-			defer func() {
-				_ = epc.Close()
-			}()
+			defer func() {
+				require.NoError(t, epc.Close())
+			}()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/ctl_v3_member_test.go` around lines 649 - 651, The defer is
silently discarding epc.Close() errors; replace the anonymous defer using `_ =
epc.Close()` with an explicit error check so teardown failures are
surfaced—e.g., in the defer func() call epc.Close(), capture its error and call
require.NoError(t, err) (or if outside a subtest context use t.Logf("epc.Close
error: %v", err) then fail as appropriate). Update the defer that wraps
epc.Close() to use that explicit check instead of silencing the return value.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/etcdserver/apply/apply_auth_test.go`:
- Around line 1049-1052: The test hardcodes auth.AuthInfo{Username: "foo",
Revision: 8} which can mismatch the authorizer's current revision and produce
ErrAuthOldRevision; update the table test to use the authorizer's current
revision by replacing the literal revision with as.Revision() when calling
CheckTxnAuth (i.e., pass &auth.AuthInfo{Username: "foo", Revision:
as.Revision()}), so CheckTxnAuth and the permission logic are validated against
the current auth revision.

In `@server/etcdserver/server.go`:
- Around line 1431-1432: The quorum-safety check currently always treats the new
member as a voting member by calling
isConnectedToQuorumAfterAddingNewMemberSince(s.r.transport, ..., s.MemberID(),
s.cluster.VotingMembers()); change this so the "post-add quorum bump" is only
applied when the member being added is a voting member — for learner additions
do NOT call the "AfterAddingNewMember" variant and instead check quorum against
the current voting set (i.e., use the non-post-add quorum check such as
isConnectedToQuorumSince or the equivalent check with s.cluster.VotingMembers()
that does not include the new member). Ensure you detect learner vs voter using
the member type information available in the add path and switch between
isConnectedToQuorumAfterAddingNewMemberSince(...) for voters and the non-bumped
quorum check for learners.

In `@tests/framework/e2e/cluster.go`:
- Around line 849-863: The loop that calls memberCtl.MemberRemove is treating
only the "member not found" error as success and ignores the successful case
(err == nil); update the loop in the function containing memberCtl.MemberRemove
so that if MemberRemove returns nil you set memberRemoved = true and break
immediately, otherwise keep the existing check for strings.Contains(err.Error(),
"member not found") to mark success; this change ensures both a successful
removal (nil error) and an already-removed state mark memberRemoved and stop
retrying.

In `@tests/integration/v3_auth_test.go`:
- Around line 384-390: The test currently silently falls back to READWRITE when
user.perm is non-empty but not a valid key in authpb.Permission_Type_value;
change the logic in the test setup so that when len(user.perm) > 0 and the
lookup into authpb.Permission_Type_value yields ok == false, the test fails
immediately (e.g., call t.Fatalf or require.FailNow) with a clear message
referencing the invalid user.perm, instead of assigning permType :=
authpb.READWRITE; otherwise keep the default behavior when user.perm is empty.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 577368ab-5095-4462-b0cf-12e5afe14f82

📥 Commits

Reviewing files that changed from the base of the PR and between 7fff975 and 821d95e.

⛔ Files ignored due to path filters (12)

api/go.sum is excluded by !**/*.sum
client/pkg/go.sum is excluded by !**/*.sum
client/v3/go.sum is excluded by !**/*.sum
etcdctl/go.sum is excluded by !**/*.sum
etcdutl/go.sum is excluded by !**/*.sum
go.sum is excluded by !**/*.sum
pkg/go.sum is excluded by !**/*.sum
server/go.sum is excluded by !**/*.sum
tests/go.sum is excluded by !**/*.sum
tools/mod/go.sum is excluded by !**/*.sum
tools/rw-heatmaps/go.sum is excluded by !**/*.sum
tools/testgrid-analysis/go.sum is excluded by !**/*.sum

📒 Files selected for processing (29)

.ci-operator.yaml
.go-version
Dockerfile.art-cachi2
Dockerfile.installer
Dockerfile.installer.art-cachi2
Dockerfile.rhel
api/go.mod
api/version/version.go
client/pkg/go.mod
client/v3/go.mod
etcdctl/go.mod
etcdutl/go.mod
go.mod
pkg/go.mod
server/etcdserver/apply/apply_auth.go
server/etcdserver/apply/apply_auth_test.go
server/etcdserver/server.go
server/etcdserver/txn/txn.go
server/etcdserver/txn/txn_test.go
server/etcdserver/util.go
server/etcdserver/v3_server.go
server/go.mod
tests/e2e/ctl_v3_member_test.go
tests/framework/e2e/cluster.go
tests/go.mod
tests/integration/v3_auth_test.go
tools/mod/go.mod
tools/rw-heatmaps/go.mod
tools/testgrid-analysis/go.mod

💤 Files with no reviewable changes (2)

server/etcdserver/txn/txn.go
server/etcdserver/txn/txn_test.go

coderabbitai · 2026-05-07T15:37:31Z

+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			err := CheckTxnAuth(as, &auth.AuthInfo{Username: "foo", Revision: 8}, &lease.FakeLessor{}, tt.txnRequest)
+			assert.Equal(t, tt.err, err)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the current auth revision in the table test.

Line 1051 hardcodes Revision: 8, so any extra setup write will flip these cases to ErrAuthOldRevision and stop validating the permission logic you actually care about. Use as.Revision() here instead.

Suggested fix

-			err := CheckTxnAuth(as, &auth.AuthInfo{Username: "foo", Revision: 8}, &lease.FakeLessor{}, tt.txnRequest)
+			err := CheckTxnAuth(as, &auth.AuthInfo{Username: "foo", Revision: as.Revision()}, &lease.FakeLessor{}, tt.txnRequest)


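Why the hardcoded revision is fragile can be shown with a toy auth store that tracks only its revision counter. As the review describes, an `AuthInfo` whose revision is older than the store's current revision is rejected with `ErrAuthOldRevision` before any permission logic runs, so one extra auth write during setup turns every case into the same error. `toyAuthStore` and `errAuthOldRevision` are hypothetical stand-ins, not etcd's `auth` package.

```go
package main

import (
	"errors"
	"fmt"
)

// errAuthOldRevision stands in for etcd's auth.ErrAuthOldRevision.
var errAuthOldRevision = errors.New("stale auth revision")

// toyAuthStore models only the auth revision counter.
type toyAuthStore struct{ rev uint64 }

func (s *toyAuthStore) Revision() uint64 { return s.rev }

// bump models any auth-mutating write (adding a user, granting a role, ...).
func (s *toyAuthStore) bump() { s.rev++ }

// check rejects auth info issued before the latest auth change.
func (s *toyAuthStore) check(infoRev uint64) error {
	if infoRev < s.rev {
		return errAuthOldRevision
	}
	return nil
}

func main() {
	s := &toyAuthStore{rev: 8}
	fmt.Println(s.check(8))            // <nil>
	s.bump()                           // one extra setup write
	fmt.Println(s.check(8))            // stale auth revision
	fmt.Println(s.check(s.Revision())) // <nil>
}
```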

coderabbitai · 2026-05-07T15:37:31Z

+	// Treat the new member as unavailable when checking quorum safety.
+	if !isConnectedToQuorumAfterAddingNewMemberSince(s.r.transport, time.Now().Add(-HealthInterval), s.MemberID(), s.cluster.VotingMembers()) {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't apply the post-add quorum bump to learner additions.

Line 1432 uses the “after adding a new member” quorum calculation for every add. That is correct for voting members, but not for learners: adding a learner does not increase the voting quorum. In a 3-voter cluster with one voter down, this now rejects an add-as-learner even though the current voting quorum is still intact.

Suggested fix

-	// Treat the new member as unavailable when checking quorum safety.
-	if !isConnectedToQuorumAfterAddingNewMemberSince(s.r.transport, time.Now().Add(-HealthInterval), s.MemberID(), s.cluster.VotingMembers()) {
+	// Treat a new voting member as unavailable when checking quorum safety.
+	since := time.Now().Add(-HealthInterval)
+	connectedToRequiredQuorum := isConnectedToQuorumSince(s.r.transport, since, s.MemberID(), s.cluster.VotingMembers())
+	if !memb.IsLearner {
+		connectedToRequiredQuorum = isConnectedToQuorumAfterAddingNewMemberSince(s.r.transport, since, s.MemberID(), s.cluster.VotingMembers())
+	}
+	if !connectedToRequiredQuorum {

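The reviewer's proposed rule can be sketched as a small Go function: a new voter raises the required majority to the quorum of the enlarged cluster, while a learner, which does not vote, leaves it at the current quorum. This models the review's suggestion only; it is not confirmed upstream behavior, and `requiredConnected` is a hypothetical name.

```go
package main

import "fmt"

// quorum returns the majority size for n voting members.
func quorum(n int) int { return n/2 + 1 }

// requiredConnected returns how many voting members (including the local one)
// must be active before an add is allowed under the review's suggestion:
// only a new voting member raises the quorum.
func requiredConnected(votingMembers int, isLearner bool) int {
	if isLearner {
		return quorum(votingMembers)
	}
	return quorum(votingMembers + 1)
}

func main() {
	voters, connected := 3, 2 // three voters, one of them down
	for _, learner := range []bool{true, false} {
		need := requiredConnected(voters, learner)
		fmt.Printf("learner=%t need=%d allowed=%t\n", learner, need, connected >= need)
	}
}
```

For the three-voter, one-down case the review describes, this rule admits the learner (2 connected, 2 needed) and still rejects a new voter (3 needed).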

coderabbitai · 2026-05-07T15:37:31Z

+	sleepDuration := 500 * time.Millisecond
+	maxRetries := int((2 * etcdserver.HealthInterval) / sleepDuration)
 	memberRemoved := false
-	for i := 0; i < 10; i++ {
+	for i := 0; i < maxRetries; i++ {
 		_, err := memberCtl.MemberRemove(ctx, memberID)
 		if err != nil && strings.Contains(err.Error(), "member not found") {
 			memberRemoved = true
 			break
 		}

-		time.Sleep(500 * time.Millisecond)
+		time.Sleep(sleepDuration)
 	}

 	if !memberRemoved {
-		return errors.New("failed to remove member after 10 tries")
+		return fmt.Errorf("failed to remove member after %d tries", maxRetries)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mark successful member removal as success immediately.

At Line 853, a successful MemberRemove (err == nil) does not set memberRemoved, so the code retries unnecessarily and can fail later despite already removing the member.

💡 Proposed fix

 	sleepDuration := 500 * time.Millisecond
 	maxRetries := int((2 * etcdserver.HealthInterval) / sleepDuration)
 	memberRemoved := false
 	for i := 0; i < maxRetries; i++ {
 		_, err := memberCtl.MemberRemove(ctx, memberID)
-		if err != nil && strings.Contains(err.Error(), "member not found") {
+		if err == nil || (err != nil && strings.Contains(err.Error(), "member not found")) {
 			memberRemoved = true
 			break
 		}
 		time.Sleep(sleepDuration)
 	}



coderabbitai · 2026-05-07T15:37:31Z

+		permType := authpb.READWRITE
+		if len(user.perm) > 0 {
+			val, ok := authpb.Permission_Type_value[strings.ToUpper(user.perm)]
+			if ok {
+				permType = authpb.Permission_Type(val)
+			}
+		}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail fast on invalid permission strings in test setup.

Line 386 currently falls back to READWRITE when user.perm is not a valid permission name. That can silently over-grant permissions and hide mistakes in auth tests; make invalid values fail the test instead.

Proposed fix

```diff
 permType := authpb.READWRITE
 if len(user.perm) > 0 {
 	val, ok := authpb.Permission_Type_value[strings.ToUpper(user.perm)]
-	if ok {
-		permType = authpb.Permission_Type(val)
-	}
+	require.Truef(t, ok, "invalid permission type %q for user %s", user.perm, user.name)
+	permType = authpb.Permission_Type(val)
 }
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 permType := authpb.READWRITE
 if len(user.perm) > 0 {
 	val, ok := authpb.Permission_Type_value[strings.ToUpper(user.perm)]
-	if ok {
-		permType = authpb.Permission_Type(val)
-	}
+	require.Truef(t, ok, "invalid permission type %q for user %s", user.perm, user.name)
+	permType = authpb.Permission_Type(val)
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/v3_auth_test.go` around lines 384 - 390, The test currently silently falls back to READWRITE when user.perm is non-empty but not a valid key in authpb.Permission_Type_value; change the logic in the test setup so that when len(user.perm) > 0 and the lookup into authpb.Permission_Type_value yields ok == false, the test fails immediately (e.g., call t.Fatalf or require.FailNow) with a clear message referencing the invalid user.perm, instead of assigning permType := authpb.READWRITE; otherwise keep the default behavior when user.perm is empty.
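The fail-fast parsing can be sketched outside the test harness. This is a minimal sketch, assuming a local `permissionTypeValue` map that mirrors the shape of `authpb.Permission_Type_value` (READ=0, WRITE=1, READWRITE=2) and a hypothetical `parsePermType` helper that returns an error where the test would call `require.Truef`; an empty string keeps the READWRITE default.

```go
package main

import (
	"fmt"
	"strings"
)

// PermissionType is a local stand-in for authpb.Permission_Type (assumption).
type PermissionType int32

const (
	READ      PermissionType = 0
	WRITE     PermissionType = 1
	READWRITE PermissionType = 2
)

// permissionTypeValue mirrors the shape of authpb.Permission_Type_value.
var permissionTypeValue = map[string]int32{
	"READ":      0,
	"WRITE":     1,
	"READWRITE": 2,
}

// parsePermType keeps READWRITE as the default for an empty string, but
// rejects unknown names instead of silently falling back to READWRITE.
func parsePermType(perm string) (PermissionType, error) {
	if perm == "" {
		return READWRITE, nil
	}
	val, ok := permissionTypeValue[strings.ToUpper(perm)]
	if !ok {
		return 0, fmt.Errorf("invalid permission type %q", perm)
	}
	return PermissionType(val), nil
}

func main() {
	for _, p := range []string{"", "read", "Write", "readonly"} {
		pt, err := parsePermType(p)
		fmt.Printf("%q -> %d, err=%v\n", p, pt, err)
	}
}
```

A typo such as "readonly" now surfaces as an error instead of quietly granting READWRITE, which is the over-grant the review is guarding against.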

tjungblu · 2026-05-08T09:28:39Z

/lgtm

openshift-ci · 2026-05-08T09:29:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dusk125, tjungblu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dusk125,tjungblu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tjungblu · 2026-05-08T09:29:06Z

/retest

tjungblu · 2026-05-08T09:29:11Z

/verified by @tjungblu

openshift-ci-robot · 2026-05-08T09:29:23Z

@tjungblu: This PR has been marked as verified by @tjungblu.

In response to this:

/verified by @tjungblu

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-merge-bot · 2026-05-08T13:01:05Z

/retest-required

Remaining retests: 0 against base HEAD 7fff975 and 2 for PR HEAD 67297a5 in total

dusk125 · 2026-05-08T16:39:13Z

/retest-required

openshift-ci · 2026-05-08T19:37:09Z

@dusk125: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/upstream-e2e | `67297a5` | link | false | `/test upstream-e2e` |
| ci/prow/upstream-integration | `67297a5` | link | false | `/test upstream-integration` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jonathan-albrecht-ibm and others added 18 commits April 5, 2026 20:24

Merge pull request etcd-io#21572 from pjsharath28/automated-cherry-pi…

f31179a

…ck-of-#20840-upstream-release-3.6 Automated cherry pick of etcd-io#20840

*: bump go to 1.25.9

e25d480

Signed-off-by: Wei Fu <fuweid89@gmail.com>

Merge pull request etcd-io#21586 from fuweid/bump-go-to-1.25.9-36

f239789

[release-3.6] *: bump go to 1.25.9

Add an e2e test to reproduce the adding member failure when one membe…

9daef7f

…r is down Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

Fix the issue of not being able to adding new member when one existin…

e989219

…g member is down Assume the new member is unavailable and check whether quorum is still preserved. Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

Bump golang.org/x/image to v0.39.0 to resolve GO-2026-4962

3c521c3

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

Merge pull request etcd-io#21668 from ahrtr/20260426_dep_3.6

bc2482b

[release-3.6] Bump golang.org/x/image to v0.39.0 to resolve GO-2026-4962

Merge pull request etcd-io#21667 from ahrtr/20260426_add_member

7d4b175

[release-3.6] Fix the issue that cannot add a new member when one member is down, even if quorum is still satisfied

move function CheckTxnAuth from package txn to apply

20e6f23

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

Get all Put related auth check into a separate function 'checkPutAuth'

c387fa5

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

Merge pull request etcd-io#21681 from ahrtr/20260428_auth_refactor

16a8a36

[release-3.6] Refactor auth check for Put requests in TXN

Add an integration test case to reproduce the read via PrevKv bypass …

3fe5746

…rbac check issue Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

Add an integration test to reproduce the issue of PutWithLease in a T…

fbbd0a1

…XN bypass RBAC check Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

Fix the 'read via PrevKv' and 'Put with lease' in TXN bypass rbac che…

633de82

…ck issue Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>

Merge pull request etcd-io#21685 from ahrtr/20260429_auth_3.6

d671fd0

[release-3.6] Fix read access via PrevKv or lease attachment in a Put request in etcd transactions bypass RBAC authorization checks

version: bump up to 3.6.11

ec166e2

Signed-off-by: Ivan Valdes <iv@a.ki>

Merge remote-tracking branch 'openshift/main' into 5.0-rebase-3.6.11

fbb3ece

openshift-ci-robot added the jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) and jira/valid-bug (indicates that a referenced Jira bug is valid for the branch this PR is targeting) labels May 7, 2026

openshift-ci Bot requested review from deads2k, sandeepknd and tjungblu May 7, 2026 15:28

openshift-ci Bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) May 7, 2026

coderabbitai Bot reviewed May 7, 2026

DOWNSTREAM: <drop>: update images

67297a5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dusk125 force-pushed the 5.0-rebase-3.6.11 branch from 821d95e to 67297a5 May 7, 2026 15:40

openshift-ci Bot assigned tjungblu May 8, 2026

openshift-ci Bot added the lgtm label (indicates that a PR is ready to be merged) May 8, 2026

openshift-ci-robot added the verified label (signifies that the PR passed pre-merge verification criteria) May 8, 2026

openshift-merge-bot Bot merged commit c543fe1 into openshift:main May 8, 2026
9 of 11 checks passed

dusk125 deleted the 5.0-rebase-3.6.11 branch May 8, 2026 19:46

```go
// Treat the new member as unavailable when checking quorum safety.
if !isConnectedToQuorumAfterAddingNewMemberSince(s.r.transport, time.Now().Add(-HealthInterval), s.MemberID(), s.cluster.VotingMembers()) {
```

Conversation

dusk125 commented May 7, 2026 • edited

Summary by CodeRabbit

coderabbitai Bot commented May 7, 2026 • edited

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

openshift-ci-robot commented May 7, 2026

openshift-ci-robot commented May 7, 2026

Summary by CodeRabbit

Release Notes

coderabbitai Bot left a comment

coderabbitai Bot May 7, 2026

coderabbitai Bot May 7, 2026

coderabbitai Bot May 7, 2026

coderabbitai Bot May 7, 2026

tjungblu commented May 8, 2026

openshift-ci Bot commented May 8, 2026

tjungblu commented May 8, 2026

tjungblu commented May 8, 2026

openshift-ci-robot commented May 8, 2026

openshift-merge-bot Bot commented May 8, 2026

dusk125 commented May 8, 2026

openshift-ci Bot commented May 8, 2026

7 participants
