io: add fault injection helpers to persistence layer #16215

dotnwat · 2024-01-22T01:06:43Z

io: add fault injection helpers to persistence layer

Backports Required

Release Notes

none

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>

rockwotj

LGTM, with a suggestion to make the tests a little more readable.

rockwotj · 2024-01-22T02:45:44Z

src/v/io/tests/persistence_test.cc

+class fault_injection : std::exception {};
+
+TYPED_TEST(PersistenceTest, ThrowCreate) {
+    this->fs.fail_next_create(std::make_exception_ptr(fault_injection()));


nit: there is a fair amount of noise from the repeated std::make_exception_ptr(fault_injection()). Can we either:

Consolidate these into a helper function, or create a helper method that calls make_exception_ptr, like seastar does in a couple of places? Ex: https://github.com/scylladb/seastar/blob/4b01caf666187ebae8344b8a0b6b73974b0876d9/include/seastar/core/abort_source.hh#L172-L175
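
A minimal sketch of the second option, assuming a synchronous stand-in for the object under test. The names fake_fs, create(), and demo_create_fails_once() are hypothetical and only illustrate the call-site shape; the real persistence layer is asynchronous. The extra overload accepts a plain exception object and wraps it itself, roughly in the spirit of the later commit "io: accept exception or exception_ptr fault injection", whose actual implementation is not shown in this thread.

```cpp
#include <cassert>
#include <exception>
#include <utility>

// Marker exception for injected faults. Public inheritance is a choice of
// this sketch so the type can also be caught as std::exception.
class fault_injection : public std::exception {};

// Hypothetical, synchronous stand-in for the object under test.
class fake_fs {
public:
    // Arm a fault from an existing exception_ptr.
    void fail_next_create(std::exception_ptr eptr) {
        next_create_ = std::move(eptr);
    }

    // Arm a fault from a plain exception object; the wrapping into an
    // exception_ptr happens here instead of at every call site.
    template<typename E>
    void fail_next_create(E e) {
        fail_next_create(std::make_exception_ptr(std::move(e)));
    }

    // Throws the armed fault once, then behaves normally again.
    void create() {
        if (auto eptr = std::exchange(next_create_, nullptr)) {
            std::rethrow_exception(eptr);
        }
    }

private:
    std::exception_ptr next_create_;
};

// Usage: the test body no longer spells out std::make_exception_ptr.
inline void demo_create_fails_once() {
    fake_fs fs;
    fs.fail_next_create(fault_injection());
    bool threw = false;
    try {
        fs.create();
    } catch (const fault_injection&) {
        threw = true;
    }
    assert(threw);
    fs.create(); // nothing is armed any more, so this must not throw
}
```

Both spellings stay available, so existing call sites that already hold an exception_ptr would not need to change.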

rockwotj · 2024-01-22T02:47:27Z

src/v/io/persistence.cc

+    return maybe_fail_open().then([path = std::move(path)] {
    const auto flags = seastar::open_flags::rw;
-    auto file = co_await seastar::open_file_dma(path.string(), flags);
-    co_return seastar::make_shared<disk_file>(std::move(file));
+    return seastar::open_file_dma(path.string(), flags).then([](auto file) {
+    return seastar::make_ready_future<seastar::shared_ptr<persistence::file>>(seastar::make_shared<disk_file>(std::move(file)));
+    });
+    });


What happened to coro4lyfe?

Haha. Given this is the bottom of the stack, every I/O would incur the performance overhead of a coroutine (see Travis' talk from a couple of months ago).

nvartolomei · 2024-01-22T12:35:53Z

Quickly skimmed the code and wondering if you have considered adding a wrapper over persistence which does the failure injecting instead of modifying the actual persistence implementations? Something like

class fij_persistence : public persistence {
public:
  fij_persistence(persistence p);
};

auto p = fij_persistence{disk_persistence{}};

Then you wouldn't have to duplicate the failure injection logic in every single persistence implementation (though we don't expect many).
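
The wrapper idea can be fleshed out a little further. The following is only a sketch under simplifying assumptions: the interface is reduced to a single synchronous open() that returns a string, memory_persistence is a hypothetical backend, and the wrapper owns the wrapped object through a std::unique_ptr because an abstract base cannot be passed by value.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-in for the persistence interface (assumption: the real
// interface is asynchronous and has more operations).
class persistence {
public:
    virtual ~persistence() = default;
    virtual std::string open(const std::string& path) = 0;
};

// Hypothetical backend with no fault injection logic of its own.
class memory_persistence final : public persistence {
public:
    std::string open(const std::string& path) override {
        return "opened:" + path;
    }
};

// Decorator: injects failures, then forwards to the wrapped backend.
class fij_persistence final : public persistence {
public:
    explicit fij_persistence(std::unique_ptr<persistence> inner)
      : inner_(std::move(inner)) {}

    // Arm a one-shot failure for the next open() call.
    void fail_next_open(std::exception_ptr eptr) {
        next_open_ = std::move(eptr);
    }

    std::string open(const std::string& path) override {
        if (auto eptr = std::exchange(next_open_, nullptr)) {
            std::rethrow_exception(eptr);
        }
        return inner_->open(path);
    }

private:
    std::unique_ptr<persistence> inner_;
    std::exception_ptr next_open_;
};

// Usage: the backend is unchanged; only the wrapper knows about faults.
inline void demo_wrapper() {
    fij_persistence p(std::make_unique<memory_persistence>());
    p.fail_next_open(std::make_exception_ptr(std::runtime_error("injected")));
    bool threw = false;
    try {
        p.open("f");
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```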

dotnwat · 2024-01-22T15:43:56Z

@nvartolomei

Quickly skimmed the code and wondering if you have considered adding a wrapper over persistence which does the failure injecting instead of modifying the actual persistence implementations? Something like

Great idea. I did this originally and it worked OK, but I am still on the fence about whether one is better than the other. The downside is that, since the infrastructure I'm building up still operates on the superclass persistence*, I had this annoying need to dynamic_cast into the fault-injection wrapper type so I could inject the failures.

Then I thought, well, is this just for testing? If so, then that's fine. But ultimately it seems like it might be beneficial to simply have fault injection be built-in so it's easy to use rather than needing to swap out implementations etc...

Any thoughts on how to do that better?
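
The dynamic_cast friction described above can be sketched as follows. This is an illustration under assumptions, not code from the PR: the interface is reduced to a synchronous open(), and try_inject_open_failure() is a hypothetical helper standing in for any component that only holds a persistence*.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified synchronous stand-in for the persistence interface.
class persistence {
public:
    virtual ~persistence() = default;
    virtual std::string open(const std::string& path) = 0;
};

// Hypothetical backend without fault injection.
class memory_persistence final : public persistence {
public:
    std::string open(const std::string& path) override {
        return "opened:" + path;
    }
};

// Wrapper that adds one-shot open() failures.
class fij_persistence final : public persistence {
public:
    explicit fij_persistence(std::unique_ptr<persistence> inner)
      : inner_(std::move(inner)) {}

    void fail_next_open(std::exception_ptr eptr) {
        next_open_ = std::move(eptr);
    }

    std::string open(const std::string& path) override {
        if (auto eptr = std::exchange(next_open_, nullptr)) {
            std::rethrow_exception(eptr);
        }
        return inner_->open(path);
    }

private:
    std::unique_ptr<persistence> inner_;
    std::exception_ptr next_open_;
};

// Code that only sees persistence* must recover the wrapper type at
// runtime before it can arm a fault. Returns false if p is not wrapped.
inline bool try_inject_open_failure(persistence* p, std::exception_ptr eptr) {
    auto* fij = dynamic_cast<fij_persistence*>(p);
    if (fij == nullptr) {
        return false;
    }
    fij->fail_next_open(std::move(eptr));
    return true;
}

// Usage: injection silently does nothing for an unwrapped backend.
inline void demo_dynamic_cast() {
    memory_persistence plain;
    auto err = std::make_exception_ptr(std::runtime_error("injected"));
    assert(!try_inject_open_failure(&plain, err));
}
```

The cast succeeds only when the caller happens to hold the wrapper, which is the coupling the comment above calls annoying.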

rockwotj · 2024-01-22T15:52:41Z

More complex, but you can always decouple injecting from the persistence wrapper. Example usage:

auto [failable_persistence, failure_injector] =
        wrap_in_failure_injector(std::move(persistence));
failure_injector.fail_next_open();
EXPECT_THROWS(co_await failable_persistence.open());
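
One possible shape for wrap_in_failure_injector, as a sketch only. The names wrap_in_failure_injector, failure_injector, and fail_next_open come from the usage above; everything else (fault_state, failable_persistence, memory_persistence, the synchronous open() signature, and the shared_ptr-based state sharing) is an assumption made for illustration.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified synchronous stand-in for the persistence interface.
class persistence {
public:
    virtual ~persistence() = default;
    virtual std::string open(const std::string& path) = 0;
};

// Hypothetical backend without fault injection.
class memory_persistence final : public persistence {
public:
    std::string open(const std::string& path) override {
        return "opened:" + path;
    }
};

// State shared between the wrapper and the injector handle.
struct fault_state {
    std::exception_ptr next_open;
};

// Handle that can arm faults but cannot perform I/O.
class failure_injector {
public:
    explicit failure_injector(std::shared_ptr<fault_state> state)
      : state_(std::move(state)) {}

    void fail_next_open(std::exception_ptr eptr) {
        state_->next_open = std::move(eptr);
    }

private:
    std::shared_ptr<fault_state> state_;
};

// Wrapper that can perform I/O but exposes no injection methods.
class failable_persistence final : public persistence {
public:
    failable_persistence(
      std::unique_ptr<persistence> inner, std::shared_ptr<fault_state> state)
      : inner_(std::move(inner))
      , state_(std::move(state)) {}

    std::string open(const std::string& path) override {
        if (auto eptr = std::exchange(state_->next_open, nullptr)) {
            std::rethrow_exception(eptr);
        }
        return inner_->open(path);
    }

private:
    std::unique_ptr<persistence> inner_;
    std::shared_ptr<fault_state> state_;
};

// Returns the wrapped persistence and the separate injector handle.
inline std::pair<std::unique_ptr<persistence>, failure_injector>
wrap_in_failure_injector(std::unique_ptr<persistence> inner) {
    auto state = std::make_shared<fault_state>();
    auto wrapped = std::make_unique<failable_persistence>(
      std::move(inner), state);
    return {std::move(wrapped), failure_injector(std::move(state))};
}

// Usage: only the holder of the injector can make open() fail.
inline void demo_decoupled() {
    auto [p, injector] = wrap_in_failure_injector(
      std::make_unique<memory_persistence>());
    injector.fail_next_open(
      std::make_exception_ptr(std::runtime_error("injected")));
    bool threw = false;
    try {
        p->open("f");
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```

Because the two halves are separate objects, code that receives only the persistence pointer has no way to arm a fault, and no dynamic_cast is needed to reach the injection interface.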

rockwotj

LGTM, happy to review a switch to decoupling failure injecting from persistence itself.

dotnwat · 2024-01-22T16:13:05Z

LGTM, happy to review a switch to decoupling failure injecting from persistence itself.

Thanks. I'm definitely on the same page with you and Nicolae that there are a few ways to approach fault injection. There is certainly an aspect of fault injection that concerns the interfaces for injection and wiring things up at a higher level (a few cases exist in the tree). However, at the lowest level I'm not sure there is a benefit to a wrapper/inheritance approach if we have the goal of being able to inject faults into a release build and outside of a unit test context; swapping out an implementation at that lowest level seems problematic?

rockwotj · 2024-01-22T17:08:30Z

swapping out an implementation at that lowest level seems problematic?

I don't think so if the interface is well defined? What problems do you foresee?

benefit to a wrapper/inheritance approach

I am ambivalent about the decorator pattern here, especially because we're only repeating the error injection twice (my DRY alarm is set to 3). The bigger benefit in my mind is decoupling fault injection: it becomes impossible at the type-system level for us to need to pass the persistence layer to a component that doesn't need it (e.g. an admin API to inject failures or something), and lower-level bits in the io subsystem can't (structurally) get access to injecting failures unless the injector object is explicitly passed in.

Anyways, I don't feel strongly about this (I did approve after all), but I kind of like the decoupling for the reasons I mention.

dotnwat added 3 commits January 21, 2024 17:05

io: remove unnecessary namespace prefix

49f1f76

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>

io: add persistence fault injection

0e1497b

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>

chore: apply clang format

a7d8012

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>

github-actions bot added the area/redpanda label Jan 22, 2024

dotnwat requested review from rockwotj, Lazin, nvartolomei and andrwng January 22, 2024 01:07

rockwotj previously approved these changes Jan 22, 2024

io: accept exception or exception_ptr fault injection

853f709

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>

dotnwat dismissed rockwotj’s stale review via 853f709 January 22, 2024 16:00

rockwotj approved these changes Jan 22, 2024

andrwng approved these changes Jan 22, 2024

dotnwat merged commit e261aa8 into redpanda-data:dev Jan 22, 2024
17 checks passed
