Wait to send more pushes until receive has ACKed
Fixes: istio#25685

Istio suffers from a problem at large scale (800+ services created
sequentially) with significant churn: Envoy becomes overloaded and
produces a zigzag pattern in ACK results. This change slows Istio down
by using a semaphore to signal when a receive has occurred, and by
waiting on that semaphore before sending new pushes.

Co-Authored-By: John Howard <howardjohn@google.com>
Steven Dake and howardjohn committed Oct 22, 2020
1 parent ddf8cc4 commit c28f58c
Showing 2 changed files with 10 additions and 1 deletion.
9 changes: 9 additions & 0 deletions pilot/pkg/features/pilot.go
@@ -301,6 +301,15 @@ var (
"If enabled, pilot will set the incremental flag of the options in the mcp controller "+
"to true, and then galley may push data incrementally, it depends on whether the "+
"resource supports incremental. By default, this is false.").Get()

EnableFlowControl = env.RegisterBoolVar(
"PILOT_ENABLE_FLOW_CONTROL",
false,
"If enabled, pilot will wait for the completion of a receive operation before "+
"executing a push operation. This is a form of flow control and is useful in "+
"environments with high rates of push requests to each gateway. By default, "+
"this is false.").Get()

// CentralIstioD will be Deprecated: TODO remove in 1.9 in favor of `ExternalIstioD`
CentralIstioD = env.RegisterBoolVar("CENTRAL_ISTIOD", false,
"If this is set to true, one Istiod will control remote clusters including CA.").Get()
2 changes: 1 addition & 1 deletion pilot/pkg/xds/ads.go
@@ -223,6 +223,7 @@ func (s *DiscoveryServer) StreamAggregatedResources(stream discovery.AggregatedD
return err
}
con := newConnection(peerAddr, stream)

con.Identities = ids

// Do not call: defer close(con.pushChannel). The push channel will be garbage collected
@@ -766,7 +767,6 @@ func (conn *Connection) send(res *discovery.DiscoveryResponse) error {

select {
case <-t.C:
// TODO: wait for ACK
adsLog.Infof("Timeout writing %s", conn.ConID)
xdsResponseWriteTimeouts.Increment()
return status.Errorf(codes.DeadlineExceeded, "timeout sending")