pubsub: dynamically choose the number of messages for ReceiveBatch #1200

Merged: 6 commits, Jan 31, 2019

Commits on Jan 26, 2019

  1. pubsub: dynamically choose the number of messages for ReceiveBatch

    To decide how many messages to pull at a time, we aim for the
    in-memory queue of messages to be a certain size. That gives us a
    buffer of messages to draw from, ensuring high throughput, without
    pulling so many messages that the unconsumed ones languish.
    
    We measure the size by time instead of message count. Time is more
    relevant, because ack deadlines are expressed in time, and it's easier
    to think about lost work (in the event of a crash) in terms of time
    lost rather than messages lost.
    
    We keep track of the average time it takes to process a message. Then
    we can convert a queue size in time to a number of messages.
    
    We compute processing time by measuring the time between when Receive
    returns and when it is next called. Although this is incorrect in the
    short term, because multiple goroutines may call Receive at the same time,
    in the long run it is accurate enough.
    
    We rejected the obvious alternative, measuring time from Receive to
    Ack, because not every message will be acked. It is perfectly
    reasonable for a subscriber to nack (or fail to ack) a significant
    fraction of the messages it receives, but processing time for those
    unacked messages should still be included in the calculation of how
    many messages to pull.
    
    This change significantly improves the Receive benchmark -- messages
    per second is more than quadrupled. But there is more work to do. We
    should pre-emptively pull messages when the queue size gets low, and
    we should issue multiple ReceiveBatch calls concurrently.
    
    Besides performance, this change also improves behavior over current
    master at very low processing rates. Currently we pull a constant 10
    messages per ReceiveBatch. If it takes a long time to process one
    message, then the other 9 will sit in RAM and may expire. With this
    change, we will pull just one message at a time if need be.
    
    Addresses google#691.
    jba committed Jan 26, 2019
    c1cb764

Commits on Jan 28, 2019

  1. reviewer comments

    jba committed Jan 28, 2019
    d0e8dc0

Commits on Jan 29, 2019

  1. switched to simple moving average

    jba committed Jan 29, 2019
    97a23fa

Commits on Jan 30, 2019

  1. 60b6aef
  2. measure process time by ack

    jba committed Jan 30, 2019
    c839c14
  3. init avg time to first point

    jba committed Jan 30, 2019
    891db08