[BUG] timestamp or checksum not matched in test_snapshot_hash_detect_corruption test case #6145
Comments
IIRC, we haven't changed the data plane logic, so the snapshots might have been corrupted by some other mechanism or change. BTW, it is good that the checksum logic successfully caught the corrupted snapshots.
Also found the grpc error when
Does this mean the test case failed just before corrupting the snapshot? If so, the test case itself has not actually been exercised yet.
/manager/integration/tests/test_snapshot.py#L328-L339
    # Step 2
    create_snapshots(client, volume, 1536, 3)
    # Step 3
    assert check_snapshot_checksums_and_change_timestamps(volume)  # <---- failed here
    # Step 4
    snapshot_name = get_available_snapshot(volume)
    assert snapshot_name != ""
    assert corrupt_snapshot_on_local_host(volume, snapshot_name)
@derekbit Would it be possible for the checksum and ctime saved in the checksum file to become inconsistent with the snapshot at runtime somehow? I thought the checksum file should only be updated after the snapshot is immutably ready.
We probably need to review the test case, or check whether the checksum file can become inconsistent with the snapshot disk file at runtime.
The test case is okay. I've found the root cause and will update later.
Pre Ready-For-Testing Checklist
No, a snapshot's checksum or ctime should be immutable; if either changes, it indicates a bug in the data engine.
Describe the bug (🐛 if you encounter this issue)
In test case test_snapshot_hash_detect_corruption_in_global_fast_check_mode or test_snapshot_hash_detect_corruption_in_global_enabled_mode, the test checks the checksum value and the ctime of the checksum file in check_snapshot_checksums_and_change_timestamps before corrupting the snapshot. But this check randomly fails. The checksum may not match:
https://ci.longhorn.io/job/public/job/master/job/sles/job/amd64/job/longhorn-tests-sles-amd64/524/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_fast_check_mode/
https://ci.longhorn.io/job/public/job/master/job/rhel/job/amd64/job/longhorn-tests-rhel-amd64/64/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_fast_check_mode/
Or the ctime of the checksum file may not match:
https://ci.longhorn.io/job/public/job/master/job/rhel/job/amd64/job/longhorn-tests-rhel-amd64/59/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_enabled_mode/
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-arm64/15/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_enabled_mode/
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/6/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_enabled_mode/
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/12/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_fast_check_mode/
https://ci.longhorn.io/job/public/job/master/job/rhel/job/amd64/job/longhorn-tests-rhel-amd64/62/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_fast_check_mode/
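To make the two failure modes above concrete, here is a minimal sketch of a "checksum and ctime unchanged" check. It is not the implementation of check_snapshot_checksums_and_change_timestamps. The names FileFingerprint, fingerprint, and describe_mismatch are hypothetical, and SHA-256 is only a stand-in for whatever hash the data engine actually uses.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileFingerprint:
    """Hypothetical record of what a checksum file might store for a snapshot."""
    checksum: str  # hex digest of the file contents (SHA-256 as a stand-in)
    ctime_ns: int  # inode change time in nanoseconds


def fingerprint(path: str, chunk_size: int = 1 << 20) -> FileFingerprint:
    """Hash the file in chunks and read its current change time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return FileFingerprint(digest.hexdigest(), os.stat(path).st_ctime_ns)


def describe_mismatch(recorded: FileFingerprint,
                      current: FileFingerprint) -> List[str]:
    """Return the list of fields that differ; an empty list means consistent."""
    problems = []
    if recorded.checksum != current.checksum:
        problems.append("checksum not matched")
    if recorded.ctime_ns != current.ctime_ns:
        problems.append("ctime not matched")
    return problems
```

Reporting the two fields separately mirrors the CI failures listed above, where either the checksum or the ctime alone can be the mismatching value.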
It could be hard to reproduce manually because the test setup is tedious and time-consuming, and another issue also affects this test case: #6129. So if the test case fails, it could be due to either the issue addressed in this ticket or the issue addressed in #6129.
This issue could have been introduced after v1.5.0-rc2; at least, we didn't observe it in v1.5.0-rc1.
To Reproduce
Run test case test_snapshot_hash_detect_corruption_in_global_fast_check_mode or test_snapshot_hash_detect_corruption_in_global_enabled_mode
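Since the failure is intermittent, a single run may pass. The sketch below reruns one of the test cases until it fails. It assumes pytest is installed, that the working directory contains test_snapshot.py, and that the Longhorn test environment is already configured. build_pytest_command and run_until_failure are hypothetical helper names, not part of the test suite.

```python
import subprocess
import sys
from typing import Callable, List, Optional

TEST_NAMES = [
    "test_snapshot_hash_detect_corruption_in_global_fast_check_mode",
    "test_snapshot_hash_detect_corruption_in_global_enabled_mode",
]


def build_pytest_command(test_name: str,
                         test_file: str = "test_snapshot.py") -> List[str]:
    """Build a pytest command line that selects one test by its node ID."""
    return [sys.executable, "-m", "pytest", "-x", f"{test_file}::{test_name}"]


def run_until_failure(command: List[str],
                      max_runs: int = 10,
                      runner: Optional[Callable[[List[str]], int]] = None
                      ) -> Optional[int]:
    """Run the command up to max_runs times.

    Returns the 1-based index of the first failing run, or None if every
    run exited with code 0. The runner is injectable so the loop can be
    exercised without a real cluster.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd).returncode
    for attempt in range(1, max_runs + 1):
        if runner(command) != 0:
            return attempt
    return None
```

For example, `run_until_failure(build_pytest_command(TEST_NAMES[0]), max_runs=20)` would stop at the first failing run. Note that a failure here could still come from #6129 rather than from this issue.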
Expected behavior
A clear and concise description of what you expected to happen.
Log or Support bundle
If applicable, add the Longhorn managers' log or support bundle when the issue happens.
You can generate a Support Bundle using the link at the footer of the Longhorn UI.
Environment
Additional context
Add any other context about the problem here.