ssx/work_queue: support recursive tasks #15939

Merged
rockwotj merged 1 commit into redpanda-data:dev from the work-queue-letrec branch
Jan 4, 2024

Conversation

@rockwotj rockwotj commented Jan 3, 2024

There are situations where we need to support recursive tasks for the
transform subsystem, and in order to do that we can't keep a single
variable of the tail, as it's possible for recursive tasks to try to
tail off a single future. Fix this by tracking the pending work in an
explicit data structure.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

Bug Fixes

  • Fixes a crash if a WebAssembly function is deployed that immediately crashes.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj rockwotj self-assigned this Jan 3, 2024
@rockwotj rockwotj added this to the v23.3.x-next milestone Jan 3, 2024

@oleiman oleiman left a comment

looks good, though I'm not sure I fully understand what the faulty behavior is.

> it's possible for recursive tasks to try to tail off a single future

like multiple .then calls on the same tail future somehow? I can't picture it 😕

@@ -146,4 +146,21 @@ TEST(WorkQueue, CanShutdownWithDelayedTasks) {
EXPECT_FALSE(a2);
}

TEST(WorkQueue, RecursiveSubmit) {
Member

question: what would the result of this test have been prior to this change?

Contributor Author

It segfaulted in release mode only

@piyushredpanda piyushredpanda modified the milestones: v23.3.x-next, v23.3.2 Jan 4, 2024

rockwotj commented Jan 4, 2024

It's a little tricky, I can try and explain.

In release mode, seastar can run the functions provided to .then before .then returns under the condition there is time budget left in the current task.

So when we call .then we can run a function that ends up calling submit on the queue. However, because .then has not yet returned up the call stack, we end up calling .then on the same future twice (future reuse) before _tail has been reassigned.
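
The re-entrancy described above can be modeled without Seastar. The following is only a toy sketch: `toy_future`, `tail_chained_queue`, and `count_future_reuse` are hypothetical names, not the actual `ssx::work_queue` or Seastar types, and the toy `then` always runs its continuation inline, whereas the comment above says Seastar does so only in release mode when time budget remains. Instead of crashing, the sketch counts how often `submit` chains off a tail that has already been consumed.

```cpp
#include <functional>
#include <utility>

// Toy stand-in for a future that may only be chained once.
struct toy_future {
    bool consumed = false;

    // Models the inline fast path: the continuation runs before then() returns.
    toy_future then(const std::function<void()>& fn) {
        consumed = true;
        fn();
        return toy_future{};
    }
};

// Hypothetical queue that chains every task off a single tail variable.
struct tail_chained_queue {
    toy_future tail;
    int reuse_count = 0;

    void submit(std::function<void()> fn) {
        if (tail.consumed) {
            // In real code this is the bug: chaining off a consumed future.
            ++reuse_count;
        }
        // tail is reassigned only after then() returns, so a nested submit()
        // made from inside fn still sees the old, already-consumed tail.
        tail = tail.then(std::move(fn));
    }
};

// Submits one task that recursively submits another; returns the reuse count.
int count_future_reuse() {
    tail_chained_queue q;
    q.submit([&q] { q.submit([] {}); });
    return q.reuse_count;
}
```

Two back-to-back, non-recursive calls to `submit` leave `reuse_count` at zero in this model; only the nested call observes the consumed tail.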

It's certainly very subtle. If that doesn't make sense, let me explain on a huddle or something in a few minutes.

The fix is to explicitly track futures in a structure instead of doing this chaining in submit, and manage the queue's fiber more explicitly in the loop.
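
A rough illustration of that idea, assuming a synchronous, single-threaded stand-in rather than the real Seastar fiber: `toy_work_queue`, `drain`, `_pending`, `_draining`, and `recursive_submit_order` are hypothetical names chosen for this sketch and are not taken from the patch. Pending tasks live in an explicit container, and a single loop owns their execution, so a recursive `submit` only appends to the container.

```cpp
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical synchronous stand-in; not the ssx::work_queue API.
class toy_work_queue {
public:
    using task = std::function<void()>;

    // Safe to call from inside a running task: it only appends to the
    // pending list, and the active drain loop picks the task up later.
    void submit(task t) {
        _pending.push_back(std::move(t));
        drain();
    }

private:
    void drain() {
        if (_draining) {
            return; // A loop further up the call stack is already running.
        }
        _draining = true;
        while (!_pending.empty()) {
            // Move the task out first so that a recursive submit() can
            // modify the deque while the task is still executing.
            task t = std::move(_pending.front());
            _pending.pop_front();
            t();
        }
        _draining = false; // Exceptions are not handled in this sketch.
    }

    std::deque<task> _pending;
    bool _draining = false;
};

// Records the execution order of a task that submits another task.
std::vector<int> recursive_submit_order() {
    toy_work_queue q;
    std::vector<int> order;
    q.submit([&] {
        order.push_back(1);
        q.submit([&] { order.push_back(3); });
        order.push_back(2);
    });
    return order;
}
```

In this model the nested task runs only after the outer task finishes, so there is no shared tail that two callers could both try to chain from.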

rockwotj commented Jan 4, 2024

In the interest of fixes, I'm going to merge. But let's still sync up; it's very subtle and I doubt my explanation above is enough.

@rockwotj rockwotj merged commit a870f12 into redpanda-data:dev Jan 4, 2024
20 checks passed
@vbotbuildovich

/backport v23.3.x

oleiman commented Jan 4, 2024

> In release mode, seastar can run the functions provided to .then before .then returns under the condition there is time budget left in the current task.

That's plenty; I didn't know that. Pretty wild...

Thanks!

@rockwotj rockwotj deleted the work-queue-letrec branch January 4, 2024 15:07

@dotnwat dotnwat left a comment

lgtm
