Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance check functionality to support checking starting from a pageId #659

Merged
merged 3 commits into from
Feb 4, 2024

Conversation

Elbehery
Copy link
Member

@Elbehery Elbehery commented Jan 1, 2024

This PR add test to changes in #651

cc @ahrtr @fuweid

@Elbehery
Copy link
Member Author

Elbehery commented Jan 1, 2024

@ahrtr so after corrupting a leaf page as discussed on #580 (comment) , running a Tx to execute tc.Check() raises a Panic

this is due to the assertions run at

func (p *Page) FastCheck(id Pgid) {

tx_test.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member

ahrtr commented Jan 7, 2024

We need to verify that checking all other leaf pages should be successful.

For example, a branch page contains 3 items, pointing to three different leaf pages. If we intentionally corrupt one leaf page, then checking that page should fail; but checking the other two pages should be successful.

It's OK to intentionally change one field like this PR does, but suggest to also support randomly corrupting some bytes in a leaf page.

tx_test.go Outdated Show resolved Hide resolved
tx_test.go Outdated Show resolved Hide resolved
tx_test.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member

ahrtr commented Jan 8, 2024

The high level workflow/steps:

  • Create a sample db. Ensure there are at least one branch page, and multiple leaf pages included in the branch page.
  • Find a branch page (search starting from the meta page: root);
    • Corrupt a random page pointed by the branch page. Remember the pageId, and also other page IDs. For example, a branch page contains 3 pages, which are 10, 15, 20. You corrupt page 15, then remember page ID 15, and page IDs (10, 20)
  • Verify: either receiving an error from the channel or panicking means checking failed.
    • Checking page 15 should be failed
    • Checking page 10 and 20 should be all sucessful

A related comment:
The PR iterates all pages starting from page ID4 may not be good. A page being a leaf page doesn't mean that it's being used by the db, because it may be a free page.

@Elbehery Elbehery force-pushed the add_test_check_page branch 2 times, most recently from 7a1bba7 to 51d5cc1 Compare January 14, 2024 18:41
tx_test.go Outdated Show resolved Hide resolved
@Elbehery Elbehery force-pushed the add_test_check_page branch 2 times, most recently from b7cd5e6 to c4f168e Compare January 16, 2024 14:52
@Elbehery
Copy link
Member Author

ptal @fuweid @ahrtr

db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
tx_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
@Elbehery Elbehery force-pushed the add_test_check_page branch 3 times, most recently from 97a2486 to bd415e9 Compare January 18, 2024 14:16
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
@Elbehery Elbehery force-pushed the add_test_check_page branch 2 times, most recently from ea86554 to 5169315 Compare January 19, 2024 18:49
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
@Elbehery
Copy link
Member Author

Elbehery commented Jan 24, 2024

@ahrtr

I suggest to split TestTx_Check_CorruptPage into two separate & simpler test cases instead of using subtest.

fixed 👍🏽

Please try not to use hard coded magic number, e.g. 2200.

fixed 👍🏽

Please resolve all comments which you have already resolved.

fixed 👍🏽

db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
@Elbehery Elbehery force-pushed the add_test_check_page branch 3 times, most recently from c0d651b to 7d7267b Compare January 26, 2024 14:25
@Elbehery
Copy link
Member Author

@ahrtr the failing test is due to encodingToHex

db_whitebox_test.go:233: Check corrupted page.
unexpected fault address 0xffff4cca0000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0xffff4cca0000 pc=0x1bf2a8]

goroutine 19 [running]:
runtime.throw({0x25ffa2?, 0x0?})
	/home/runner/actions-runner/_work/_tool/go/1.21.6/arm64/src/runtime/panic.go:1077 +0x40 fp=0x4000125610 sp=0x40001255e0 pc=0x48ae0
runtime.sigpanic()
	/home/runner/actions-runner/_work/_tool/go/1.21.6/arm64/src/runtime/signal_unix.go:875 +0x22c fp=0x4000125670 sp=0x4000125610 pc=0x6085c
encoding/hex.Encode(...)
	/home/runner/actions-runner/_work/_tool/go/1.21.6/arm64/src/encoding/hex/hex.go:46
encoding/hex.EncodeToString(...)
	/home/runner/actions-runner/_work/_tool/go/1.21.6/arm64/src/encoding/hex/hex.go:108
go.etcd.io/bbolt.hexKvStringer.KeyToString(...)
	/home/runner/actions-runner/_work/bbolt/bbolt/tx_check.go:285
go.etcd.io/bbolt.(*hexKvStringer).KeyToString(0x4000125788?, {0xffff4c7afff9, 0x5d90d0, 0x1d854?})
	<autogenerated>:1 +0x78 fp=0x40001256b0 sp=0x4000125[68](https://github.com/etcd-io/bbolt/actions/runs/7669309227/job/20903141081?pr=659#step:6:69)0 pc=0x1bf2a8
go.etcd.io/bbolt.KVStringer.KeyToString-fm({0xffff4c7afff9?, 0x4000178000?, 0x4000125[76](https://github.com/etcd-io/bbolt/actions/runs/7669309227/job/20903141081?pr=659#step:6:77)8?})
	<autogenerated>:1 +0x44 fp=0x40001256e0 sp=0x40001256b0 pc=0x1bf9d4
go.etcd.io/bbolt.verifyKeyOrder(0x[79](https://github.com/etcd-io/bbolt/actions/runs/7669309227/job/20903141081?pr=659#step:6:80)8f4?, {0x25fc96, 0x4}, 0x1, {0xffff4c7afff9, 0x5d90d0, 0x5d90d0}, {0xffff4[80](https://github.com/etcd-io/bbolt/actions/runs/7669309227/job/20903141081?pr=659#step:6:81)307b0, 0x7, 0x7}, ...)

this is being using within verifyKeyOrder

I dump random bytes to corrupt the page, it is expected to panic,

I am stuck :/

@ahrtr
Copy link
Member

ahrtr commented Jan 26, 2024

@ahrtr the failing test is due to encodingToHex

It seems that we can't randomly corrupt the data. We can only change the key's data to break the btree's invariant property. For example, assuming the key is "0x 12 34 56 78", we intentionally change it to "0x aa bb cc dd".

@Elbehery
Copy link
Member Author

@ahrtr the failing test is due to encodingToHex

It seems that we can't randomly corrupt the data. We can only change the key's data to break the btree's invariant property. For example, assuming the key is "0x 12 34 56 78", we intentionally change it to "0x aa bb cc dd".

but how it worked for all other tests, only this one fails !!!

ok i can change the test to read a key and change it, write it again .. but this way the test will be computationally expensive, which is not the test case should be iiuc

@ahrtr
Copy link
Member

ahrtr commented Jan 26, 2024

Or we can just remove TestTx_Check_CorruptPage_DumpRandomBytes for now

@Elbehery
Copy link
Member Author

done 👍🏽

db_whitebox_test.go Outdated Show resolved Hide resolved
db_whitebox_test.go Outdated Show resolved Hide resolved
@ahrtr
Copy link
Member

ahrtr commented Jan 27, 2024

Previously the reason why we moved the test case into db_whitebox_test.go was that we need to catch the exception (panicking). Now the test case won't be panicking anymore, so we need to call the exported Check instead of the un-exported check, and move the test into tx_check_test.go.

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
@Elbehery
Copy link
Member Author

@ahrtr updated 👍🏽

@ahrtr ahrtr changed the title add test check page Enhance check functionality to support checking starting from a pageId Jan 27, 2024
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
@ahrtr
Copy link
Member

ahrtr commented Feb 3, 2024

Just refactored/simplified the test case in the third commit.

@fuweid @Elbehery @ishan16696

@Elbehery
Copy link
Member Author

Elbehery commented Feb 3, 2024

LGTM

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ahrtr
Copy link
Member

ahrtr commented Feb 4, 2024

Actually the implementation has a bug, let me fix it in a separate PR.

@ahrtr ahrtr merged commit da5975b into etcd-io:main Feb 4, 2024
16 checks passed
@ahrtr
Copy link
Member

ahrtr commented Feb 4, 2024

Linked to #580

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants