Reject PP/SR HTTP requests if cross shard semaphore has been exhausted #15977

Merged (8 commits) on Jan 16, 2024

Conversation

michael-redpanda
Contributor

Fixes: #14116

Cross-shard calls in SR/PP use an SMP resource group to limit the number of cross-shard calls that
can be made. However, if the semaphore has been fully consumed, the call simply hangs. Customers have
experienced issues where, under heavy load, SR appears to hang and not respond. Now, once the
semaphore has been completely exhausted, new HTTP requests that attempt cross-shard calls will be
rejected with a 500 error.
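
For illustration, a minimal fail-fast sketch (hypothetical names, not the actual PR code), assuming a Seastar-style semaphore guards the cross-shard work:

#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/semaphore.hh>

seastar::semaphore cross_shard_sem{10}; // illustrative capacity

seastar::future<int> handle_request() {
    if (cross_shard_sem.available_units() <= 0) {
        // Previously the request would sit in the wait list and appear to hang;
        // failing fast makes the overload visible to the client.
        co_return 500; // HTTP 500 per the description above (later changed to 429)
    }
    auto units = co_await seastar::get_units(cross_shard_sem, 1);
    // ... perform the cross-shard call while holding a unit ...
    co_return 200;
}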

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

Improvements

  • SR/PP will now reply with a 500 error if an internal service semaphore has been completely exhausted

Comment on lines 2816 to 2839
, pp_sr_smp_max_non_local_requests(
*this,
"pp_sr_smp_max_non_local_requests",
"Maximum number of x-core requests pending in Panda Proxy and Schema "
"Registry seastar::smp group. (for more details look at "
"`seastar::smp_service_group` documentation)",
{.needs_restart = needs_restart::yes, .visibility = visibility::user},
std::nullopt)
Contributor Author

I don't believe we backport things if there's a new cluster config (right?). If we do want this backported, I can drop the commits that add this and submit the change in a separate PR.

Signed-off-by: Michael Boquard <michael@redpanda.com>
Signed-off-by: Michael Boquard <michael@redpanda.com>
@graphcareful
Contributor

Do we need the SMP resource group anymore?

rockwotj previously approved these changes Jan 10, 2024
src/v/pandaproxy/schema_registry/util.h (outdated; resolved)
@michael-redpanda
Contributor Author

Do we need the SMP resource group anymore?

I'd like @BenPope 's take on this question, I think it's a valid one.

@BenPope
Member

BenPope commented Jan 11, 2024

Do we need the SMP resource group anymore?

I'd like @BenPope 's take on this question, I think it's a valid one.

I'm not sure what the suggested alternative is?

src/v/pandaproxy/server.h (outdated; resolved)
Member

@BenPope BenPope left a comment

This doesn't seem to limit the number of in-flight requests; so does it fix #14116?

I.e., would it be better to limit the number of in-flight requests by applying backpressure rather than rejecting valid requests?

@michael-redpanda
Contributor Author

michael-redpanda commented Jan 11, 2024

This doesn't seem to limit the number of in-flight requests; so does it fix #14116?

The issue that we had in relation to the incident in #14116 was that new requests were just sitting in the semaphore's wait list, but we had no idea that this was happening. By checking whether the semaphore has been exhausted before initiating the cross-shard call, we can reject the new request.

I.e., would it be better to limit the number of in-flight requests by applying backpressure rather than rejecting valid requests?

I'm not sure how the behavior to the client would be any different. How would you limit the number of in-flight requests without rejecting new ones?

@BenPope
Member

BenPope commented Jan 11, 2024

This doesn't seem to limit the number of in-flight requests; so does it fix #14116?

The issue that we had in relation to the incident in #14116 was that new requests were just sitting in the semaphore's wait list, but we had no idea that this was happening. By checking whether the semaphore has been exhausted before initiating the cross-shard call, we can reject the new request.

It's not just new requests, though, it's requests that may be half-processed.

I.e., would it be better to limit the number of in-flight requests by applying backpressure rather than rejecting valid requests?

I'm not sure how the behavior to the client would be any different. How would you limit the number of in-flight requests without rejecting new ones?

By not processing them until the semaphore can be obtained (or the wait times out). Unfortunately, we don't yet stream the body, so it's not ideal for something like a produce right now.
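
For contrast, a sketch of the backpressure approach being described (an assumption, not code from this PR): wait for a unit with a deadline so the request only fails if the system stays saturated:

#include <chrono>
#include <seastar/core/coroutine.hh>
#include <seastar/core/semaphore.hh>

seastar::future<int> handle_with_backpressure(seastar::semaphore& sem) {
    try {
        auto units = co_await seastar::get_units(
          sem, 1, seastar::semaphore::clock::now() + std::chrono::seconds(5)); // illustrative deadline
        // ... process the request while holding a unit ...
        co_return 200;
    } catch (const seastar::semaphore_timed_out&) {
        co_return 503; // give up only after the wait has timed out
    }
}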

Signed-off-by: Michael Boquard <michael@redpanda.com>
Signed-off-by: Michael Boquard <michael@redpanda.com>
Signed-off-by: Michael Boquard <michael@redpanda.com>
Signed-off-by: Michael Boquard <michael@redpanda.com>
Signed-off-by: Michael Boquard <michael@redpanda.com>
If inflight semaphore is exhausted, then return a 429 error.

Signed-off-by: Michael Boquard <michael@redpanda.com>
@michael-redpanda
Contributor Author

michael-redpanda commented Jan 12, 2024

Force push 5a4d0ed:

  • Refactored the solution to add a new adjustable semaphore that prevents too many inflight requests per shard (see the sketch below)
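
A rough sketch of what such an adjustable semaphore could look like (an assumption for illustration, not the actual Redpanda implementation): a wrapper around seastar::semaphore whose capacity can be raised or lowered at runtime, e.g. when the cluster config changes:

#include <cstdint>
#include <seastar/core/semaphore.hh>

class adjustable_semaphore_sketch {
public:
    explicit adjustable_semaphore_sketch(int64_t capacity)
      : _capacity(capacity)
      , _sem(capacity) {}

    // Grow or shrink the number of available units to match the new capacity.
    void set_capacity(int64_t new_capacity) {
        auto diff = new_capacity - _capacity;
        if (diff > 0) {
            _sem.signal(diff);
        } else if (diff < 0) {
            // consume() takes units even if the count goes negative, so
            // in-flight requests are unaffected and new ones must wait.
            _sem.consume(-diff);
        }
        _capacity = new_capacity;
    }

    seastar::future<seastar::semaphore_units<>> get_units(int64_t units) {
        return seastar::get_units(_sem, units);
    }

private:
    int64_t _capacity;
    seastar::semaphore _sem;
};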

@graphcareful
Contributor

graphcareful commented Jan 16, 2024

Do we need the SMP resource group anymore?

I'd like @BenPope 's take on this question, I think it's a valid one.

I'm not sure what the suggested alternative is?

Within the smp resource config there is an option:

max_nonlocal_requests: The maximum number of non-local requests that execute on a shard concurrently

Correct me if I'm wrong, but doesn't this configurable option solve the same issue that Mike is attempting to solve with the adjustable semaphore?

EDIT: I guess I'll try to answer my own question: maybe the logical difference is that the smp group enforces a maximum number of futures, whereas this PR is attempting to enforce a maximum number of "requests".

@BenPope
Member

BenPope commented Jan 16, 2024

Within the smp resource config there is an option:

max_nonlocal_requests: The maximum number of non-local requests that execute on a shard concurrently

Correct me if I'm wrong, but doesn't this configurable option solve the same issue that Mike is attempting to solve with the adjustable semaphore?

EDIT: I guess I'll try to answer my own question: maybe the logical difference is that the smp group enforces a maximum number of futures, whereas this PR is attempting to enforce a maximum number of "requests".

I suspect that the problem is that cross-core requests are being made several times without "unwinding" first; a request can get stuck half-way waiting on the semaphore. Timing it out isn't ideal as only some of the work may have been applied.

Limiting requests much lower than the nonlocal limit should resolve this problem.
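
For reference, a sketch of how such a group is configured in Seastar (illustrative values, not Redpanda's actual settings); the idea above is that the per-shard inflight-request limit would be set well below this bound:

#include <seastar/core/smp.hh>

seastar::future<seastar::smp_service_group> make_pp_sr_group() {
    seastar::smp_service_group_config cfg;
    // Upper bound on cross-core requests executing concurrently on a shard.
    cfg.max_nonlocal_requests = 5000; // illustrative value
    return seastar::create_smp_service_group(cfg);
}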

Member

@BenPope BenPope left a comment

Looks good.

Comment on lines +132 to +133
_inflight_config_binding.watch(
[this]() { _inflight_sem.set_capacity(_inflight_config_binding()); });
Member

_inflight_sem could be destructed at this point (I don't see another mechanism that unsubscribes the watch, or otherwise sequences the destruction order).

Comment on lines +530 to +531
_inflight_config_binding.watch(
[this]() { _inflight_sem.set_capacity(_inflight_config_binding()); });
Member

_inflight_sem could be destructed at this point (I don't see another mechanism that unsubscribes the watch, or otherwise sequences the destruction order).

Contributor Author

Isn't destruction sequence order dictated by the class? _inflight_config_binding is constructed after _inflight_sem and therefore should be destructed before the semaphore?

Contributor

BTW, I find relying on destructor order incredibly sneaky; it usually deserves a comment somewhere.

Member

Isn't destruction sequence order dictated by the class? _inflight_config_binding is constructed after _inflight_sem and therefore should be destructed before the semaphore?

You're correct.
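
A small, self-contained illustration of that argument (hypothetical types, not the PR code): non-static data members are destroyed in reverse declaration order, so the binding, declared after the semaphore, is torn down first and its watch can no longer fire against a destroyed semaphore:

#include <iostream>

struct semaphore_like {
    ~semaphore_like() { std::cout << "semaphore destroyed\n"; }
};

struct binding_like {
    ~binding_like() { std::cout << "binding destroyed (watch gone)\n"; }
};

struct handler {
    semaphore_like _inflight_sem;          // declared first, destroyed last
    binding_like _inflight_config_binding; // declared second, destroyed first
};

int main() {
    handler h{};
    // On scope exit, "binding destroyed (watch gone)" prints before
    // "semaphore destroyed".
    return 0;
}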

@BenPope BenPope self-requested a review January 16, 2024 19:32
@michael-redpanda michael-redpanda merged commit a99419e into redpanda-data:dev Jan 16, 2024
19 checks passed
Member

@dotnwat dotnwat left a comment

lgtm

@BenPope
Member

BenPope commented Jan 17, 2024

Why is this considered "not a bug fix" and not backported?

Development

Successfully merging this pull request may close these issues:

Limit number of inflight HTTP requests