You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
This is Somnath and I am a Ceph developer working on integrating rocksdb as a metadata store for Ceph's upcoming new backend. In the process we found this bug within rocksdb. If I enable log recycling with the rocksdb (recycle_log_file_num=16) after ~5-6 hours of run (1M seq but can happen to any workload I guess)) I am getting an error during db->WriteBatch(). Digging down further I got the following anomaly in the rocksdb log.
It seems the bug is around this portion.
2016-08-25 00:44:03.348710 7f7c117ff700 4 rocksdb: adding log 254 to recycle list
2016-08-25 00:44:03.348722 7f7c117ff700 4 rocksdb: adding log 256 to recycle list
bluefs part of the log is within ceph.
So, log 254 is added to the recycle list and at the same time it is added for deletion. The following log entry is in response to rocksdb delete call.
2016-08-25 00:44:03.348794 7f7c117ff700 10 bluefs unlink db.wal/000254.log
It seems there is a race condition in this portion (?).
I was going through the rocksdb code and I found the following.
DBImpl::FindObsoleteFiles is the one that is responsible for populating log_recycle_files and log_delete_files. It is also deleting entries from alive_log_files_. But, this is always under mutex_ lock.
Log is deleted from DBImpl::DeleteObsoleteFileImpl which is not under lock , but iterating over log_delete_files. This is fishy but it shouldn't be the reason for same log number end up in two list.
I saw all the places but the following place alive_log_files_ (within DBImpl::WriteImpl) is accessed without lock.
If log recycling is enabled with the rocksdb (recycle_log_file_num=16)
db->Writebatch is erroring out with keynotfound after ~5-6 hours of run
(1M seq but can happen to any workload I guess).See my detailed bug
report here (facebook#1303).
This commit is the fix for this, a check is been added not to delete
the log file if it is already there in the recycle list.
Test Plan:
Unit tested it and ran the similar profile. Not reproducing anymore.
If log recycling is enabled with the rocksdb (recycle_log_file_num=16)
db->Writebatch is erroring out with keynotfound after ~5-6 hours of run
(1M seq but can happen to any workload I guess).See my detailed bug
report here (#1303).
This commit is the fix for this, a check is been added not to delete
the log file if it is already there in the recycle list.
Test Plan:
Unit tested it and ran the similar profile. Not reproducing anymore.
Hi,
This is Somnath and I am a Ceph developer working on integrating rocksdb as a metadata store for Ceph's upcoming new backend. In the process we found this bug within rocksdb. If I enable log recycling with the rocksdb (recycle_log_file_num=16) after ~5-6 hours of run (1M seq but can happen to any workload I guess)) I am getting an error during db->WriteBatch(). Digging down further I got the following anomaly in the rocksdb log.
It seems the bug is around this portion.
2016-08-25 00:44:03.348710 7f7c117ff700 4 rocksdb: adding log 254 to recycle list
2016-08-25 00:44:03.348722 7f7c117ff700 4 rocksdb: adding log 256 to recycle list
2016-08-25 00:44:03.348725 7f7c117ff700 4 rocksdb: (Original Log Time 2016/08/25-00:44:03.347467) [default] Level-0 commit table #258 started
2016-08-25 00:44:03.348727 7f7c117ff700 4 rocksdb: (Original Log Time 2016/08/25-00:44:03.348225) [default] Level-0 commit table #258: memtable #1 done
2016-08-25 00:44:03.348729 7f7c117ff700 4 rocksdb: (Original Log Time 2016/08/25-00:44:03.348227) [default] Level-0 commit table #258: memtable #2 done
2016-08-25 00:44:03.348730 7f7c117ff700 4 rocksdb: (Original Log Time 2016/08/25-00:44:03.348239) EVENT_LOG_v1 {"time_micros": 1472111043348233, "job": 88, "event": "flush_finished", "lsm_state": [3, 4, 0, 0, 0, 0, 0], "immutable_memtables": 0}
2016-08-25 00:44:03.348735 7f7c117ff700 4 rocksdb: (Original Log Time 2016/08/25-00:44:03.348297) [default] Level summary: base level 1 max bytes base 5368709120 files[3 4 0 0 0 0 0] max score 0.75
2016-08-25 00:44:03.348751 7f7c117ff700 4 rocksdb: [JOB 88] Try to delete WAL files size 131512372, prev total WAL file size 131834601, number of live WAL files 3.
2016-08-25 00:44:03.348761 7f7c117ff700 10 bluefs unlink db.wal/000256.log
2016-08-25 00:44:03.348766 7f7c117ff700 20 bluefs _drop_link had refs 1 on file(ino 19 size 0x3f3ddd9 mtime 2016-08-25 00:41:26.298423 bdev 0 extents [0:0xc500000+200000,0:0xcb00000+800000,0:0xd700000+700000,0:0xe200000+800000,0:0xee00000+800000,0:0xfa00000+800000,0:0x10600000+700000,0:0x11100000+700000,0:0x11c00000+800000,0:0x12800000+100000])
2016-08-25 00:44:03.348775 7f7c117ff700 20 bluefs _drop_link destroying file(ino 19 size 0x3f3ddd9 mtime 2016-08-25 00:41:26.298423 bdev 0 extents [0:0xc500000+200000,0:0xcb00000+800000,0:0xd700000+700000,0:0xe200000+800000,0:0xee00000+800000,0:0xfa00000+800000,0:0x10600000+700000,0:0x11100000+700000,0:0x11c00000+800000,0:0x12800000+100000])
2016-08-25 00:44:03.348794 7f7c117ff700 10 bluefs unlink db.wal/000254.log
2016-08-25 00:44:03.348796 7f7c117ff700 20 bluefs _drop_link had refs 1 on file(ino 18 size 0x3f3d402 mtime 2016-08-25 00:41:26.299110 bdev 0 extents [0:0x6500000+400000,0:0x6d00000+700000,0:0x7800000+800000,0:0x8400000+800000,0:0x9000000+800000,0:0x9c00000+700000,0:0xa700000+900000,0:0xb400000+800000,0:0xc000000+500000])
2016-08-25 00:44:03.348803 7f7c117ff700 20 bluefs _drop_link destroying file(ino 18 size 0x3f3d402 mtime 2016-08-25 00:41:26.299110 bdev 0 extents [0:0x6500000+400000,0:0x6d00000+700000,0:0x7800000+800000,0:0x8400000+800000,0:0x9000000+800000,0:0x9c00000+700000,0:0xa700000+900000,0:0xb400000+800000,0:0xc000000+500000])
bluefs part of the log is within ceph.
So, log 254 is added to the recycle list and at the same time it is added for deletion. The following log entry is in response to rocksdb delete call.
2016-08-25 00:44:03.348794 7f7c117ff700 10 bluefs unlink db.wal/000254.log
It seems there is a race condition in this portion (?).
I was going through the rocksdb code and I found the following.
4625 alive_log_files_.back().AddSize(log_entry.size());
Can it be reintroducing the same log number (254) , I am not sure.
If I run disabling rocksdb log recyling it is going through fine for me, no db error.
Disabling log recycling is probably not an option for us as I am seeing it is introducing spikes and the output is not very stable.
Appreciate any help resolving this issue.
Thanks & Regards
Somnath
The text was updated successfully, but these errors were encountered: