Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ACK deadline set for received messages can be too low #416

Merged
merged 6 commits into from May 26, 2021

Conversation

@plamut
Copy link
Contributor

@plamut plamut commented May 21, 2021

Fixes #413.

This PR makes sure that the ACK deadline set for the received messages is always consistent with what the leaser uses internally when extending the ACK deadlines for the leased messages.

See the issue description and a comment explaining a possible sequence of events that lead to a bug.

PR checklist

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
When a message is received, the initial MODACK value used should be
the same value currently used by leaser. This makes sure the leaser
will wake up and extend the ACK deadline before the initial one
expires.
@plamut plamut requested a review from as a code owner May 21, 2021
@google-cla google-cla bot added the cla: yes label May 21, 2021
If flow_control.max_duration_per_lease_extension is set to too low a
value, it is adjusted to the minimum ACK deadline.
@plamut
Copy link
Contributor Author

@plamut plamut commented May 21, 2021

Hmm... this PR fixes the issue caused by directly using the p99 value that can be different from manager.ack_deadline used by the leaser, but even if using manager.ack_deadline, a message can still expire.

Consider - manager.ack_deadline is currently 600, because processing on of the messages took 1000 = minutes. The leaser goes to sleep for 0.9 * 600 = 540 seconds. While the leaser is asleep, more messages are received and processed quickly, dropping the histogram's p99 value to 10 seconds, consequently adjusting the value of manager._ack_deadline, too.

When yet more messages arrive, the shortened ACK deadline will be used for them while the leaser is still asleep (due to the previous "long" deadline). The leaser will not wake in time to extend the deadlines of these new messages.

Bottom line - the ACK deadline should probably not change while the leaser is asleep so that _on_response() does not use a deadline that could be shorter than the current sleep period of the leaser.

Loading

The ACK deadline is dynamically being adjusted based on ACK data,
but an updated deadline value must not be used for the new messages
that arrive while the leaser thread is still sleeping.

Doing so could result in the messages' ACK deadlines expiring before
the leaser wakes up and has a chance of extending the deadlines for
these messages. An updated ACK deadline value for new messages is thus
only used when the leaser starts using it, too.
@plamut
Copy link
Contributor Author

@plamut plamut commented May 21, 2021

Fixed, only the leaser now updates the manager.ack_deadline value. This means that the leased messages will never use an ACK deadline that is shorter than the duration of the leaser's current sleep period.

Loading

@gcf-merge-on-green gcf-merge-on-green bot merged commit e907f6e into googleapis:master May 26, 2021
8 checks passed
Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants