Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io: add fault injection helpers to persistence layer #16215

Merged
merged 4 commits into from
Jan 22, 2024

Conversation

dotnwat
Copy link
Member

@dotnwat dotnwat commented Jan 22, 2024

io: add fault injection helpers to persistence layer

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
rockwotj
rockwotj previously approved these changes Jan 22, 2024
Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a suggestions to make tests a little more readable.

class fault_injection : std::exception {};

TYPED_TEST(PersistenceTest, ThrowCreate) {
this->fs.fail_next_create(std::make_exception_ptr(fault_injection()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: there is a fair amount of noise with the std::make_exception_ptr(fault_injection()). Can we either:

Consolidate these into a helper function or create a helper method to call make_exeception_ptr like seastar does in a couple of places? Ex: https://github.com/scylladb/seastar/blob/4b01caf666187ebae8344b8a0b6b73974b0876d9/include/seastar/core/abort_source.hh#L172-L175

Comment on lines 179 to 184
return maybe_fail_open().then([path = std::move(path)] {
const auto flags = seastar::open_flags::rw;
auto file = co_await seastar::open_file_dma(path.string(), flags);
co_return seastar::make_shared<disk_file>(std::move(file));
return seastar::open_file_dma(path.string(), flags).then([](auto file) {
return seastar::make_ready_future<seastar::shared_ptr<persistence::file>>(seastar::make_shared<disk_file>(std::move(file)));
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to coro4lyfe?

Copy link
Member Author

@dotnwat dotnwat Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha. given this is the bottom of the stack, every I/O will incur the performance overhead of coroutine (see Travis' talk a couple months ago).

@nvartolomei
Copy link
Contributor

Quickly skimmed the code and wondering if you have considered adding a wrapper over persistence which does the failure injecting instead of modifying the actual persistence implementations? Something like

class fij_persistence : persistence {
  fij_persistence(persistence p);
}

auto p = fij_persistence{disk_persistence{}};

You wouldn't have to duplicate the failure injection logic then in every single persistence implementation (we don't expect many though)

@dotnwat
Copy link
Member Author

dotnwat commented Jan 22, 2024

@nvartolomei

Quickly skimmed the code and wondering if you have considered adding a wrapper over persistence which does the failure injecting instead of modifying the actual persistence implementations? Something like

Great idea. I did this originally and it worked ok but I am still on the fence if one is better than the other. The downside of this is that since the infrastructure I'm building up still operated on the super class persistence* i had this annoying need to dynamic_cast into the fault-injection wrapper type so I could inject the failures.

Then I thought, well, is this just for testing? If so, then that's fine. But ultimately it seems like it might be beneficial to simply have fault injection be built-in so it's easy to use rather than needing to swap out implementations etc...

Any thoughts on how to do that better?

@rockwotj
Copy link
Contributor

More complex, but you can always decouple injecting from the persistence wrapper. Example usage:

auto [failable_persistence, failure_injector] =
        wrap_in_failure_injector(std::move(persistence));
failure_injector.fail_next_open();
EXPECT_THROWS(co_await failable_persistence.open());

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, happy to review a switch to decoupling failure injecting from persistence itself.

@dotnwat
Copy link
Member Author

dotnwat commented Jan 22, 2024

LGTM, happy to review a switch to decoupling failure injecting from persistence itself.

Thanks. I'm definitely on the same page with your and Nicoale that there are a few ways to approach fault injection. There is totally some aspect of fault injection related to the interfaces for injection and wiring things up into a higher level (a few cases exist in the tree). However at the lowest level I'm not sure there is a benefit to a wrappers/inheritance approach if we have the goal of being able to inject faults into a release build and injecting faults outside of a unit test context--swapping out an implementation at that lowest level seems problematic?

@rockwotj
Copy link
Contributor

swapping out an implementation at that lowest level seems problematic?

I don't think so if the interface is well defined? What problems do you foresee?

benefit to a wrappers/inheritance approach

I am ambivalent about the decorator pattern here especially because we're only repeating the error injecting twice (my DRY alarm is set to 3), more of a benefit in my mind is decoupling fault injection, because now it's impossible at a typesystem level for us to need to pass the persistence layer to a component that doesn't need it (ie. admin api to inject failures or something) and now lower-level bits in the io subsystem can't have access to injecting failures (structurally) without explicitly passing in the injector object.

Anyways, I don't feel strongly about this (I did approve after all), but I kind of like the decoupling for the reasons I mention.

@dotnwat dotnwat merged commit e261aa8 into redpanda-data:dev Jan 22, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants