Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(pubsub): set default stream ACK deadline to subscriptions' #9268

Merged
merged 2 commits into from
Sep 27, 2019

Conversation

plamut
Copy link
Contributor

@plamut plamut commented Sep 23, 2019

Closes #9252.

This PR mitigates the reported issue by using the subscription's max ACK deadline when establishing a streaming pull, instead of the previous optimistic ACK deadline of 10 seconds.

(when creating a streaming pull, the ACK timing histogram has no data yet, thus its 99 % value defaults to 10 seconds... also see the commit message for some extra info)

Additionally, it adds a debug log message with the ACK deadline used for the streaming pull, as that important piece of information was missing previously.

How to test

Run the example from the issue description, and verify that the backend does not send any duplicate messages, and the subscription's ACK deadline is respected (300 s in that particular case).

Details

The cause of the bug was that the client overrode the max ACK deadline on the subscription with the default minimum deadline of 10 seconds.

This became problematic for the received messages that exceeded the FlowControl.max_messages setting, and were thus put on hold in a buffer. Since these messages were not lease-managed, and their ACK deadlines were not being extended, the overridden ACK deadline of 10 seconds was easily hit, resulting in re-delivery.

There are several options on how to deal with these on hold messages:

  • Discard them immediately, and set their ACK deadlines to 0.
    This would cause the backend to re-send them immediately, which would work well with multiple subscribers. However, if using a single subscriber, low max_messages setting, and processing each message takes a lot of time, large published message batches would cause a lot of unnecessary traffic.

    The excessive messages in a batch (those above max_messages) would be discarded and re-delivered over and over again, because each time the client would dispatch only a small portion of the batch to user's callbacks.

  • Add all received messages to lease management, including those put on hold.
    This would avoid ACK timeouts and message re-delivery, but could shoot the leaser's load well over 1.0. When user code would process and acknowledge some of the messages, the client would remove them from the lease management, but the load could still stay above 1.0

    Since the leaser would still be overloaded, no additional messages would be dispatched to the callbacks, and the stream would not be resumed --> deadlock (there would be nothing to trigger resuming the streaming pull).

    While there could be a way around that, it would likely require rewriting significant chunks of the streaming pull logic, risking new bugs.

  • Ignoring the messages on hold until the load allows processing them.
    This is the path chosen. Only the messages that are not put on hold are added to the lease management, while the rest sit in the buffer, waiting to be released when the load allows for it. If they are released soon enough, the auto-lease management steps in.

    While these messages could still remain in the buffer for too long, missing the ACK deadline, those messages would be re-delivered to other subscribers, as the overloaded subscriber's stream would be paused in the meantime (the subscriber does not resume the stream, if it has any messages on hold).

    In the case of a single subscriber client, the messages would indeed be re-delivered to it, but on the other hand the ACK histogram times would pick that up and gradually extend the ACK deadlines that the auto-leaser sets for the messages.

@plamut plamut added the api: pubsub Issues related to the Pub/Sub API. label Sep 23, 2019
@plamut plamut requested a review from pradn September 23, 2019 16:03
@plamut
Copy link
Contributor Author

plamut commented Sep 23, 2019

@pradn Assigned to you, but feel free to delegate to anybody else from the team that will take over the PubSub lib. Also, feel free to ping me for any information about the inner workings of the streaming pull manager.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Sep 23, 2019
@plamut plamut changed the title PubSub: Set default stream ACK deadline to subscriptions' fix(pubsub): set default stream ACK deadline to subscriptions' Sep 24, 2019
@plamut plamut added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 24, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 24, 2019
@plamut plamut force-pushed the iss-9252 branch 2 times, most recently from 86c50cd to dfc21f1 Compare September 24, 2019 07:20
Copy link
Contributor

@pradn pradn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

nit: spelling in commit message ("streamimng")

pubsub/tests/system.py Outdated Show resolved Hide resolved
When subscribing, it makes sense to use the configured subscription's
maximum ACK deadline for the streaming pull, instead of an optimistic
minimum of 10 seconds.

Using an optimistic deadline affects messages that are put on hold and
are not lease managed, because by the time the client dispatches them
to the user's callback, the optimistic ACK deadline could have already
been missed, resulting in the backend unnecessary re-sending those
messages, even if the subscription's ACK deadline has not been hit yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the Pub/Sub API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pub/Sub: buffered messages delivered multiple times ignoring acknowledgment deadline
5 participants