Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver: fix nil pointer panic for readonly txn #14895

Merged
merged 1 commit into from Dec 6, 2022

Conversation

ahrtr
Copy link
Member

@ahrtr ahrtr commented Dec 5, 2022

Fix #14891

Please see #14891 (comment)

Signed-off-by: Benjamin Wang wachao@vmware.com

I need to add unit test to verify it.

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

cc @ptabor @serathius

@ahrtr
Copy link
Member Author

ahrtr commented Dec 5, 2022

The old behavior of etcd was it always panic when the application process fails. But we changed the behavior for readonly txn in #14149. Readonly txn doesn't panic any more, but it causes this issue.

I am thinking probably we should rollback the change in #14149, so as to keep the original behavior. In short, we should intentionally panic/crash etcd if the application fails, no matter it's readonly or not. More simpler, more stable.

WDYT? @ptabor @serathius @spzala

@ahrtr ahrtr force-pushed the fix_readyonly_txn_panic_20221205 branch from 48d6322 to a1110c3 Compare December 5, 2022 12:15
@codecov-commenter
Copy link

Codecov Report

Merging #14895 (a1110c3) into main (5baf837) will increase coverage by 0.07%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main   #14895      +/-   ##
==========================================
+ Coverage   74.52%   74.60%   +0.07%     
==========================================
  Files         415      415              
  Lines       34335    34331       -4     
==========================================
+ Hits        25589    25611      +22     
+ Misses       7104     7078      -26     
  Partials     1642     1642              
Flag Coverage Δ
all 74.60% <0.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
server/etcdserver/txn/util.go 71.42% <0.00%> (-4.05%) ⬇️
server/etcdserver/api/v3rpc/util.go 74.19% <0.00%> (-6.46%) ⬇️
client/v3/concurrency/session.go 88.63% <0.00%> (-4.55%) ⬇️
server/storage/mvcc/kvstore_compaction.go 95.65% <0.00%> (-4.35%) ⬇️
client/v3/leasing/txn.go 88.09% <0.00%> (-3.18%) ⬇️
server/etcdserver/cluster_util.go 70.35% <0.00%> (-3.17%) ⬇️
client/v3/concurrency/election.go 79.68% <0.00%> (-2.35%) ⬇️
server/etcdserver/api/rafthttp/peer.go 85.06% <0.00%> (-1.95%) ⬇️
server/proxy/grpcproxy/watch.go 93.64% <0.00%> (-1.74%) ⬇️
server/storage/mvcc/kvstore.go 89.00% <0.00%> (-1.42%) ⬇️
... and 22 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@lavacat
Copy link

lavacat commented Dec 5, 2022

For ref, panic was originally added here

we should check potential errors before executing the transaction. we cannot abort executed commands.

This comment wasn't fully implemented. All potential errors in txn are from kv layer. For example ErrFutureRev, ErrCompacted. None of backend interface methods return errors.

I think #14149 fixes legit use-case (see PR description). Context checking in range requests was added long after original panic logic.
Potential improvement is to use RangeResult.More from https://github.com/etcd-io/etcd/pull/14810/files#diff-3b5c2c015ac7291d97cfb1daa9254c763300fc1cc0dfe668b3deed0080dffafbR37 instead of error when context is done.

@ahrtr ahrtr force-pushed the fix_readyonly_txn_panic_20221205 branch 2 times, most recently from 6c28b75 to ec3ec0c Compare December 6, 2022 06:51
FYI. etcd-io#14891 (comment)

Signed-off-by: Benjamin Wang <wachao@vmware.com>
@ahrtr ahrtr force-pushed the fix_readyonly_txn_panic_20221205 branch from ec3ec0c to daad3a2 Compare December 6, 2022 06:53
@ahrtr
Copy link
Member Author

ahrtr commented Dec 6, 2022

Potential improvement is to use RangeResult.More from https://github.com/etcd-io/etcd/pull/14810/files#diff-3b5c2c015ac7291d97cfb1daa9254c763300fc1cc0dfe668b3deed0080dffafbR37 instead of error when context is done.

This isn't correct. RangeResult.More means read successfully, but only partial data is returned. In this case, an error will explicitly returned to client in this case.

Added an unit test case. PTAL @ptabor @serathius @spzala

@lavacat
Copy link

lavacat commented Dec 6, 2022

This isn't correct. RangeResult.More means read successfully, but only partial data is returned. In this case, an error will explicitly returned to client in this case.

I didn't explain well. The idea is to return partial result and not fail on cancel/timeout when inside txn. It's not the best solution, because user expects full result. But I think it's better than panic inside txn range that will cause node to stop.

Copy link
Member

@serathius serathius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for fixing the panic. Don't have enough context about reverting the PR.

@ahrtr ahrtr merged commit e6ef3c0 into etcd-io:main Dec 6, 2022
ahrtr added a commit to ahrtr/etcd that referenced this pull request Dec 6, 2022
Backporting etcd-io#14895

Signed-off-by: Benjamin Wang <wachao@vmware.com>
ahrtr added a commit to ahrtr/etcd that referenced this pull request Dec 6, 2022
Backporting etcd-io#14895

Signed-off-by: Benjamin Wang <wachao@vmware.com>
ahrtr added a commit to ahrtr/etcd that referenced this pull request Dec 6, 2022
Backporting etcd-io#14895

Signed-off-by: Benjamin Wang <wachao@vmware.com>
tjungblu pushed a commit to tjungblu/etcd that referenced this pull request Jul 26, 2023
Backporting etcd-io#14895

Signed-off-by: Benjamin Wang <wachao@vmware.com>
tjungblu pushed a commit to tjungblu/etcd that referenced this pull request Jul 26, 2023
Backporting etcd-io#14895

Signed-off-by: Benjamin Wang <wachao@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Segfault in warnOfExpensiveReadOnlyTxnRequest due to process pause
4 participants