
StreamingPull: new subscribers get zero messages - existing subscribers hoard entire backlog with default flow control #2715

@jeyapaul1990

Description


Summary

When using StreamingPull with the default flow control settings (1000 outstanding messages), existing subscribers hoard messages in their client buffers and gRPC stream buffers. When new subscribers connect (e.g., via Kubernetes HPA autoscaling), they receive zero messages because the entire backlog is held in the existing subscribers' memory.

Scaling to zero and then scaling back up fixes the problem immediately: all subscribers receive messages evenly. This confirms that the issue lies in how messages are distributed between existing and new subscribers.

Environment

  • Client library: google-cloud-pubsub (Java)
  • Subscription type: StreamingPull
  • Infrastructure: GKE (Google Kubernetes Engine) with HPA (Horizontal Pod Autoscaler)
  • Processing: Single-threaded message processing (1 executor thread), each message triggers a large file download (~5 MB), so processing is slow (~30 seconds per message)
  • Configuration: parallelPullCount = 2, executorProviderThreadCount = 1, no explicit setFlowControlSettings()
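For reference, a minimal sketch of that configuration (the project, subscription, and `downloadLargeFile` helper are placeholders, not part of the original report):

```java
import com.google.api.gax.core.InstantiatingExecutorProvider;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;

// Placeholder names; substitute your own project and subscription.
ProjectSubscriptionName subscriptionName =
    ProjectSubscriptionName.of("my-project", "my-subscription");

MessageReceiver receiver = (message, consumer) -> {
  downloadLargeFile(message); // hypothetical ~5 MB download, ~30 s per message
  consumer.ack();
};

Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
    // Two StreamingPull streams per pod.
    .setParallelPullCount(2)
    // One executor thread => one message processed at a time.
    .setExecutorProvider(
        InstantiatingExecutorProvider.newBuilder()
            .setExecutorThreadCount(1)
            .build())
    // No setFlowControlSettings() call, so the default of 1000
    // outstanding messages applies.
    .build();
subscriber.startAsync().awaitRunning();
```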

Problem Description

Setup

  • 5 subscriber pods running continuously (HPA min replicas = 5, max = 50)
  • Each pod uses StreamingPull with default flow control (1000 messages outstanding)
  • Each pod processes 1 message at a time (single threaded)

What happens

  1. A burst of messages arrives (e.g., 75,000 messages)
  2. The 5 existing pods receive all messages into their buffers
  3. HPA detects high lag and scales to 50 pods
  4. The new 45 pods connect but receive ZERO messages
  5. Only the original 5 pods process messages
  6. Lag keeps increasing despite 50 pods running

What fixes it

  • Scale to 0 (all pods die, all messages released)
  • Scale back to 50
  • All 50 pods now receive messages evenly
  • Lag drops to 0 within 15-30 minutes

Root Cause Analysis

1. Two buffers inside each subscriber pod

Based on the explanation by a Google engineer in googleapis/python-pubsub#237, there are two separate buffers inside each subscriber:

| Buffer | Location | Size | Ack deadline extended? |
|---|---|---|---|
| Client buffer | Inside the application (RAM) | Controlled by maxOutstandingElementCount (default: 1000) | Yes, the client library extends it automatically |
| gRPC stream buffer | Between the gRPC layer and the application (RAM) | Server sends messages in packets of up to 10 MB | No, messages can expire here |

The Google engineer stated:

"Unless the flow control is being applied at the gRPC level, messages will be buffered in the gRPC stream, where more than the specified number of messages can be present due to the Pub/Sub server sending out messages in packets ≤ 10 MiB in size. We currently don't support server-side flow control."

2. Message hoarding calculation

With a message size of ~1 KB:

| What | Calculation | Result |
|---|---|---|
| Messages in one 10 MB gRPC packet | 10,000 KB / 1 KB | ~10,000 messages |
| gRPC buffer per pod (2 streams) | 2 × 10,000 | ~20,000 messages |
| Client buffer per pod (default) | 1,000 | 1,000 messages |
| Total per pod | 20,000 + 1,000 | ~21,000 messages |
| Total for 5 pods | 5 × 21,000 | ~105,000 messages |

With a lag of 75,000 messages, all messages fit inside the 5 existing pods' buffers. Zero messages are available for new pods.
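The arithmetic above can be double-checked in a few lines (a back-of-the-envelope sketch; the ~1 KB message size and ~10 MB packet size are the assumptions stated above):

```java
public class BufferMath {
    public static void main(String[] args) {
        long packetBytes = 10_000_000L; // assumed ~10 MB gRPC packet
        long messageBytes = 1_000L;     // assumed ~1 KB per message
        int streamsPerPod = 2;          // parallelPullCount = 2
        long clientBuffer = 1_000L;     // default maxOutstandingElementCount
        int pods = 5;

        long perPacket = packetBytes / messageBytes; // messages per packet
        long grpcPerPod = streamsPerPod * perPacket; // gRPC buffer per pod
        long perPod = grpcPerPod + clientBuffer;     // total buffered per pod
        long total = pods * perPod;                  // total across all pods

        System.out.println("Per packet: " + perPacket);  // 10000
        System.out.println("gRPC/pod:   " + grpcPerPod); // 20000
        System.out.println("Total/pod:  " + perPod);     // 21000
        System.out.println("All 5 pods: " + total);      // 105000
    }
}
```

A 75,000-message backlog therefore fits entirely within the 5 existing pods' ~105,000-message capacity.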

3. Server routing favors existing subscribers

From Google's troubleshooting documentation:

"The system dynamically directs more messages towards consumers that demonstrate higher capacity — those that acknowledge messages quickly and are not constrained by their flow control settings."

"Messages could be outstanding to clients already, and a backlog of unacknowledged messages doesn't necessarily mean you'll receive those messages on your next pull request."

Existing pods have ack history and appear to have high capacity (1000 outstanding messages, not constrained by flow control). New pods have zero history. The server keeps routing to existing pods.

4. Why the client buffer is the critical problem

  • Client buffer (1000 messages): The ack deadline is extended automatically by the client library, so these messages stay locked to the pod until they are processed. With 1 thread and ~30 seconds per message, draining 1000 messages takes ~8 hours.

  • gRPC buffer (~20,000 messages): The ack deadline is NOT extended, so these messages expire and return to the server. However, the server often re-delivers them to the same existing pods, because those pods appear to have higher capacity.

The combination of 1000 long-held messages per pod and the server's routing preference leaves new subscribers starved.
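The "~8 hours" figure follows directly from those numbers (single thread, ~30 seconds per message):

```java
public class DrainTime {
    public static void main(String[] args) {
        long bufferedMessages = 1_000L; // default maxOutstandingElementCount
        long secondsPerMessage = 30L;   // slow processing: one ~5 MB download each

        long totalSeconds = bufferedMessages * secondsPerMessage; // 30000 s
        double hours = totalSeconds / 3600.0;                     // ~8.3 h
        System.out.printf("Draining %d buffered messages takes ~%.1f hours%n",
                bufferedMessages, hours);
    }
}
```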

Workaround / Fix

Setting maxOutstandingElementCount to a low value (e.g., 10) significantly improves message distribution:

import com.google.api.gax.batching.FlowControlSettings;
import com.google.api.gax.batching.FlowController;
import com.google.cloud.pubsub.v1.Subscriber;

Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
    .setFlowControlSettings(
        FlowControlSettings.newBuilder()
            // Hold at most 10 messages in the client buffer per subscriber.
            .setMaxOutstandingElementCount(10L)
            // Block the stream instead of dropping messages when the limit is hit.
            .setLimitExceededBehavior(FlowController.LimitExceededBehavior.Block)
            .build()
    )
    .build();

With maxOutstandingElementCount = 10:

  • Each pod locks only 10 messages (instead of 1000) with extended ack deadlines
  • Pods become "constrained by flow control" almost instantly
  • Server routing shifts to new pods that are "not constrained"
  • New subscribers receive messages

However, the gRPC stream buffer still receives ~10,000 messages in the initial 10 MB packet (since there is no server-side flow control). These messages expire and return to the server, but this creates unnecessary churn.

Feature Request

Server-side flow control for StreamingPull

The Google engineer in #237 stated: "We currently don't support server-side flow control."

If the server respected the client's maxOutstandingElementCount when sending messages (i.e., not sending more than the client's limit), it would:

  1. Prevent gRPC buffer flooding (no unnecessary 10 MB packets)
  2. Reduce message expiry and re-delivery churn
  3. Improve message distribution to new subscribers immediately
  4. Make Pub/Sub work better with Kubernetes HPA autoscaling

Documentation improvement

The current documentation does not clearly explain:

  1. The two-buffer model (client buffer + gRPC stream buffer)
  2. How default flow control (1000 messages) causes message hoarding with slow processors
  3. The impact on Kubernetes HPA autoscaling (new pods getting zero messages)
  4. The recommended flow control settings for slow-processing subscribers

Adding a section to the StreamingPull documentation or troubleshooting guide about this scenario would help many users running Pub/Sub on Kubernetes with HPA.

References

  1. googleapis/python-pubsub#237 - Google engineer confirms two-buffer behavior
  2. Troubleshooting pull subscriptions - Server routing based on flow control constraints
  3. Testing Cloud Pub/Sub clients - Default flow control is 1000 messages
