Describe the bug
I ran the ssd_perf/graph_cache_leader workload with 64 navy writer threads and 64 navy reader threads. After 50 minutes, the terminal output was flooded with "Item header checksum mismatch" errors.
Expected behavior
Statistics should be printed once per minute.
Screenshots
E0417 20:53:13.873994 878573 BlockCache.cpp:402] Item header checksum mismatch. Region 5229 is likely corrupted. Expected:1667855729, Actual: 564302642. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:14.049113 878575 BlockCache.cpp:402] Item header checksum mismatch. Region 5225 is likely corrupted. Expected:1634476133, Actual: 1712872625. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:14.211392 878577 BlockCache.cpp:402] Item header checksum mismatch. Region 5228 is likely corrupted. Expected:1868832889, Actual: 3472898925. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:14.413154 878579 BlockCache.cpp:402] Item header checksum mismatch. Region 5221 is likely corrupted. Expected:1948283493, Actual: 367335369. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:14.565924 878581 BlockCache.cpp:402] Item header checksum mismatch. Region 5230 is likely corrupted. Expected:1919033451, Actual: 3110670073. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:14.705444 878583 BlockCache.cpp:402] Item header checksum mismatch. Region 5219 is likely corrupted. Expected:1634476133, Actual: 1712872625. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:14.863216 878585 BlockCache.cpp:402] Item header checksum mismatch. Region 5226 is likely corrupted. Expected:1667855729, Actual: 564302642. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:15.022443 878587 BlockCache.cpp:402] Item header checksum mismatch. Region 5223 is likely corrupted. Expected:539912047, Actual: 1512533508. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:15.154551 878589 BlockCache.cpp:402] Item header checksum mismatch. Region 5245 is likely corrupted. Expected:1864397680, Actual: 29886577. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:15.297171 878591 BlockCache.cpp:402] Item header checksum mismatch. Region 5222 is likely corrupted. Expected:1713401463, Actual: 3011049547. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:15.436860 878593 BlockCache.cpp:402] Item header checksum mismatch. Region 5224 is likely corrupted. Expected:543516756, Actual: 4032079696. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:15.586964 878595 BlockCache.cpp:402] Item header checksum mismatch. Region 5243 is likely corrupted. Expected:544763750, Actual: 614490409. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:15.722491 878597 BlockCache.cpp:402] Item header checksum mismatch. Region 5239 is likely corrupted. Expected:1735353376, Actual: 2068717658. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:15.886340 878599 BlockCache.cpp:402] Item header checksum mismatch. Region 5232 is likely corrupted. Expected:1864397680, Actual: 29886577. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:16.040464 878601 BlockCache.cpp:402] Item header checksum mismatch. Region 5236 is likely corrupted. Expected:2053205024, Actual: 884959818. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:16.173985 878603 BlockCache.cpp:402] Item header checksum mismatch. Region 5231 is likely corrupted. Expected:543516756, Actual: 4032079696. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:16.327759 878605 BlockCache.cpp:402] Item header checksum mismatch. Region 5227 is likely corrupted. Expected:1814062440, Actual: 1958191057. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
E0417 20:53:16.474437 878607 BlockCache.cpp:402] Item header checksum mismatch. Region 5234 is likely corrupted. Expected:544763750, Actual: 614490409. Aborting reclaim. Remaining items in the region will not be cleaned up (destructor won't be invoked).
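A quick way to see whether the same corrupted bytes show up in more than one region is to group the log lines by their Expected/Actual pair. The sketch below is only an illustration: the regular expression is written against the message format shown in the log above, and `summarize_mismatches` is a hypothetical helper, not part of CacheLib or cachebench.

```python
import re
from collections import defaultdict

# Matches the BlockCache message format shown in the log excerpt above.
MISMATCH_RE = re.compile(
    r"Item header checksum mismatch\. Region (\d+) is likely corrupted\. "
    r"Expected:\s*(\d+), Actual:\s*(\d+)"
)


def summarize_mismatches(lines):
    """Group region ids by the (expected, actual) checksum pair they reported."""
    regions_by_pair = defaultdict(list)
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            region, expected, actual = (int(g) for g in match.groups())
            regions_by_pair[(expected, actual)].append(region)
    return dict(regions_by_pair)


# Three lines taken from the log above (trailing text shortened).
sample = [
    "E0417 20:53:13.873994 878573 BlockCache.cpp:402] Item header checksum mismatch. "
    "Region 5229 is likely corrupted. Expected:1667855729, Actual: 564302642. Aborting reclaim.",
    "E0417 20:53:14.863216 878585 BlockCache.cpp:402] Item header checksum mismatch. "
    "Region 5226 is likely corrupted. Expected:1667855729, Actual: 564302642. Aborting reclaim.",
    "E0417 20:53:15.022443 878587 BlockCache.cpp:402] Item header checksum mismatch. "
    "Region 5223 is likely corrupted. Expected:539912047, Actual: 1512533508. Aborting reclaim.",
]

for (expected, actual), regions in summarize_mismatches(sample).items():
    print(f"expected={expected} actual={actual} regions={regions}")
```

In the log above, several regions report identical pairs (for example 5229 and 5226), which may indicate that the same foreign data was written over multiple regions.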
Server:
Hardware: Intel Ice Lake, 72 core, 256GB RAM, 1.6TB NVMe
Question:
Can anyone help me analyze the reason for the crash? Is it caused by the large number of navy threads in use, or possibly by a CacheLib bug?
The checksum error comes from data read back from the SSD, and it is most likely caused by a device error. Can you retry on another SSD, or run with DRAM only for debugging?
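To make the failure mode concrete, here is a minimal sketch of how a header checksum catches bytes that changed between write and read. The header layout (key size, value size, CRC32) and the helper names are assumptions made for illustration only; this is not Navy's actual on-device format.

```python
import struct
import zlib

# Hypothetical layout, for illustration only: key size, value size, CRC32 of both.
SIZES_FMT = "<II"
HEADER_FMT = "<III"


def write_item(key: bytes, value: bytes) -> bytes:
    """Serialize an item with a checksummed header in front of the payload."""
    sizes = struct.pack(SIZES_FMT, len(key), len(value))
    return sizes + struct.pack("<I", zlib.crc32(sizes)) + key + value


def header_ok(blob: bytes) -> bool:
    """Recompute the CRC32 over the size fields and compare with the stored one."""
    key_size, value_size, stored = struct.unpack_from(HEADER_FMT, blob)
    return zlib.crc32(struct.pack(SIZES_FMT, key_size, value_size)) == stored


item = write_item(b"k1", b"payload")

# Simulate something else overwriting the first bytes of the header on the device.
clobbered = bytearray(item)
clobbered[0:4] = b"junk"

print(header_ok(item))              # header is intact
print(header_ok(bytes(clobbered)))  # header no longer matches its checksum
```

The check only says that the bytes read differ from the bytes written. It cannot tell a failing device apart from another writer touching the same device or file.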
Hi, I found that the error may have been caused by a previous cachebench process that had not been fully cleaned up. We can close the issue now.
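For anyone hitting the same symptom, a small check for leftover processes before starting a new run might look like the sketch below. It assumes a Linux host with procfs and that the binary's command name is `cachebench`; `find_processes` is a hypothetical helper, not a CacheLib tool.

```python
import os


def find_processes(name: str, proc_root: str = "/proc") -> list[int]:
    """Return PIDs whose command name equals `name` (Linux procfs only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return sorted(pids)


if os.path.isdir("/proc"):
    stale = find_processes("cachebench")
    print("running cachebench PIDs:", stale or "none")
```

If the list is not empty, an earlier run may still be writing to the same device, so it is worth stopping it before rerunning the workload.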
Thanks.