Skip Wal Recovery on SecondaryDB Open if for Remote Compaction #14462
jaykorean wants to merge 9 commits into
✅ clang-tidy: No findings on changed lines. Completed in 243.2s.
@jaykorean has imported this pull request. If you are a Meta employee, you can view this in D96788211.
// Track whether FindAndRecoverLogFiles is called during compaction
std::atomic_bool wal_recovery_called{false};
SyncPoint::GetInstance()->SetCallBack(
    "DBImplSecondary::FindAndRecoverLogFiles:Begin",
A better test would avoid adding a sync point when we can exercise real DB behavior relatively easily. Even arbitrarily corrupting the WAL and checking that the remote compaction stays intact, compared before and after this PR, seems better than adding a sync point. Alternatively, a statistic on WAL read IO would be ideal, though I am not sure one exists. Maybe that's a gap. An FS wrapper that counts reads could be used for testing here too.
Whenever we can use and observe the actual DB behavior, we should avoid adding sync points, to make future refactoring easier.
Looks like there's no other good option. Claude is suggesting that we create a new FileSystemWrapper implementation just for counting WAL reads (WalReadCountingFS 😆), which I feel is overkill.
Maybe the stress test is good enough.
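For reference, the read-counting wrapper idea discussed above could look roughly like the sketch below. This is a toy model only: `ToyFileSystem`, `ToyBaseFS`, and `WalReadCountingToyFS` are hypothetical names invented for illustration and are not RocksDB's `FileSystem` or `FileSystemWrapper` API. It also assumes WAL files can be recognized by a `.log` suffix.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal file-system interface (NOT RocksDB's FileSystem).
class ToyFileSystem {
 public:
  virtual ~ToyFileSystem() = default;
  // Pretends to read a file and returns the number of bytes read.
  virtual std::size_t ReadFile(const std::string& path) = 0;
};

// Trivial base implementation: "reads" as many bytes as the path is long.
class ToyBaseFS : public ToyFileSystem {
 public:
  std::size_t ReadFile(const std::string& path) override {
    return path.size();
  }
};

// Wrapper that forwards every read to the wrapped file system and counts
// the reads that target WAL files (assumed here to end in ".log").
class WalReadCountingToyFS : public ToyFileSystem {
 public:
  explicit WalReadCountingToyFS(std::shared_ptr<ToyFileSystem> target)
      : target_(std::move(target)) {
    assert(target_ != nullptr);
  }

  std::size_t ReadFile(const std::string& path) override {
    if (IsWalFile(path)) {
      wal_reads_.fetch_add(1, std::memory_order_relaxed);
    }
    return target_->ReadFile(path);
  }

  int wal_reads() const { return wal_reads_.load(std::memory_order_relaxed); }

 private:
  static bool IsWalFile(const std::string& path) {
    const std::string suffix = ".log";
    return path.size() >= suffix.size() &&
           path.compare(path.size() - suffix.size(), suffix.size(),
                        suffix) == 0;
  }

  std::shared_ptr<ToyFileSystem> target_;
  std::atomic<int> wal_reads_{0};
};
```

A test built on this idea would open the DB through the wrapper, run the compaction, and assert that the WAL read count stayed at zero, without relying on a sync point.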
  // When true, WAL replay is skipped during Recover(). Used internally by
  // OpenAndCompact() which only needs LSM state from MANIFEST.
  bool skip_wal_recovery_ = false;

  // Internal helper for opening a secondary instance, with an option to skip
  // WAL recovery. Used by DB::OpenAsSecondary() and DB::OpenAndCompact().
  static Status OpenAsSecondaryImpl(
      const DBOptions& db_options, const std::string& dbname,
      const std::string& secondary_path,
      const std::vector<ColumnFamilyDescriptor>& column_families,
      std::vector<ColumnFamilyHandle*>* handles, std::unique_ptr<DB>* dbptr,
      bool skip_wal_recovery);
I wonder if the friend class and member variable can be avoided by carving WAL recovery out of the secondary DB's Recover() function into its own step, since WAL recovery is optional for a secondary DB.
DB::OpenAsSecondary()                       [public, unchanged signature]
  ├── DB::OpenAsSecondaryCore()             [private static on DB]
  │     ├── creates DBImplSecondary
  │     ├── calls Recover()                 ← MANIFEST only (WAL removed)
  │     └── sets up handles, superversions
  └── impl->FindAndRecoverLogFiles()        ← added after Core returns
DB::OpenAndCompact()
  └── DB::OpenAsSecondaryCore()             ← no WAL recovery, done
Definitely doable, and cleaner too, I think. It just requires a few more changes. Let me put up a change for this. It won't require bool skip_wal_recovery_ at all.
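To make the proposed split concrete, here is a toy sketch of the call structure from the diagram above. It is not the actual RocksDB code: `ToySecondary`, `RecoverManifest()`, and `OpenForCompaction()` are hypothetical stand-ins, and the real functions take options, paths, and column family arguments that are omitted here. The only point is that the shared core does MANIFEST-only recovery and the public secondary open adds WAL recovery on top, so no boolean flag is needed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for DBImplSecondary; it only records which steps ran.
class ToySecondary {
 public:
  // Stands in for a Recover() that reads LSM state from MANIFEST only.
  void RecoverManifest() { steps_.push_back("manifest"); }
  // Stands in for WAL replay, now a separate optional step.
  void FindAndRecoverLogFiles() { steps_.push_back("wal"); }
  const std::vector<std::string>& steps() const { return steps_; }

 private:
  std::vector<std::string> steps_;
};

// Shared core: creates the instance and recovers from MANIFEST only.
std::unique_ptr<ToySecondary> OpenAsSecondaryCore() {
  auto impl = std::make_unique<ToySecondary>();
  impl->RecoverManifest();
  return impl;
}

// Regular secondary open: core plus WAL recovery afterwards.
std::unique_ptr<ToySecondary> OpenAsSecondary() {
  auto impl = OpenAsSecondaryCore();
  impl->FindAndRecoverLogFiles();
  return impl;
}

// Remote compaction path (hypothetical name): core only, no WAL recovery.
std::unique_ptr<ToySecondary> OpenForCompaction() {
  return OpenAsSecondaryCore();
}
```

With this shape, each caller decides whether WAL replay happens simply by calling or not calling the extra step, which is what removes the need for a member flag and a friend declaration.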
@jaykorean merged this pull request in 89322fd.
…ook#14462)
Summary: Skip WAL recovery when opening a secondary DB instance in OpenAndCompact() for remote compaction. WAL replay is unnecessary in this flow since only LSM state from MANIFEST is needed.
Pull Request resolved: facebook#14462
Test Plan:
- make -j db_secondary_test && ./db_secondary_test: 35/35 passed
- make -j compaction_service_test && ./compaction_service_test: 43/43 passed (includes new SkipWALRecoveryInOpenAndCompact test)
- make -j options_settable_test && ./options_settable_test --gtest_filter="*DBOptionsAllFieldsSettable*": 1/1 passed
- Removed temporary hack in stress test that disables WAL
Reviewed By: hx235
Differential Revision: D96788211
Pulled By: jaykorean
fbshipit-source-id: f91a2f861f2450ebc83423ed4c6f5b70da7d9e8b
Summary:
Skip WAL recovery when opening a secondary DB instance in OpenAndCompact() for remote compaction. WAL replay is unnecessary in this flow since only LSM state from MANIFEST is needed.
Test Plan: