TestReachability flake; cluster warming #39720
Investigated this one a bit. warm: 2022-06-30T17:32:43Z/226. Pushes look the same for the good and bad runs. The bad run disconnects at 17:32:55.308978Z and reconnects 2s later; the reconnect has an EDS request first ????. Pushed resources seem the same on reconnect; all the ready ones are vm, fake stateful, headless. VM is EDS but empty? So a few conclusions
https://prow.istio.io/view/gs/istio-prow/logs/integ-security-multicluster_istio_postsubmit/1542825359978795008 is another example with an EDS request first.
Can reproduce with the following diff:
diff --git a/pilot/pkg/networking/core/v1alpha3/cluster.go b/pilot/pkg/networking/core/v1alpha3/cluster.go
index a1c4dd9f9b..8e1a639587 100644
--- a/pilot/pkg/networking/core/v1alpha3/cluster.go
+++ b/pilot/pkg/networking/core/v1alpha3/cluster.go
@@ -135,52 +135,60 @@ func isClusterForServiceRemoved(cluster string, hostName string, svc *model.Serv
// buildClusters builds clusters for the proxy with the services passed.
func (configgen *ConfigGeneratorImpl) buildClusters(proxy *model.Proxy, req *model.PushRequest,
- services []*model.Service) ([]*discovery.Resource, model.XdsLogDetails) {
+ services []*model.Service,
+) ([]*discovery.Resource, model.XdsLogDetails) {
clusters := make([]*cluster.Cluster, 0)
- resources := model.Resources{}
- envoyFilterPatches := req.Push.EnvoyFilters(proxy)
- cb := NewClusterBuilder(proxy, req, configgen.Cache)
- instances := proxy.ServiceInstances
- cacheStats := cacheStats{}
- switch proxy.Type {
- case model.SidecarProxy:
- // Setup outbound clusters
- outboundPatcher := clusterPatcher{efw: envoyFilterPatches, pctx: networking.EnvoyFilter_SIDECAR_OUTBOUND}
- ob, cs := configgen.buildOutboundClusters(cb, proxy, outboundPatcher, services)
- cacheStats = cacheStats.merge(cs)
- resources = append(resources, ob...)
- // Add a blackhole and passthrough cluster for catching traffic to unresolved routes
- clusters = outboundPatcher.conditionallyAppend(clusters, nil, cb.buildBlackHoleCluster(), cb.buildDefaultPassthroughCluster())
- clusters = append(clusters, outboundPatcher.insertedClusters()...)
-
- // Setup inbound clusters
- inboundPatcher := clusterPatcher{efw: envoyFilterPatches, pctx: networking.EnvoyFilter_SIDECAR_INBOUND}
- clusters = append(clusters, configgen.buildInboundClusters(cb, proxy, instances, inboundPatcher)...)
- // Pass through clusters for inbound traffic. These cluster bind loopback-ish src address to access node local service.
- clusters = inboundPatcher.conditionallyAppend(clusters, nil, cb.buildInboundPassthroughClusters()...)
- clusters = append(clusters, inboundPatcher.insertedClusters()...)
- default: // Gateways
- patcher := clusterPatcher{efw: envoyFilterPatches, pctx: networking.EnvoyFilter_GATEWAY}
- ob, cs := configgen.buildOutboundClusters(cb, proxy, patcher, services)
- cacheStats = cacheStats.merge(cs)
- resources = append(resources, ob...)
- // Gateways do not require the default passthrough cluster as they do not have original dst listeners.
- clusters = patcher.conditionallyAppend(clusters, nil, cb.buildBlackHoleCluster())
- if proxy.Type == model.Router && proxy.MergedGateway != nil && proxy.MergedGateway.ContainsAutoPassthroughGateways {
- clusters = append(clusters, configgen.buildOutboundSniDnatClusters(proxy, req, patcher)...)
+
+ for i := 0; i < 100000; i++ {
+ c := &cluster.Cluster{
+ Name: fmt.Sprint(i),
+ AltStatName: req.Push.PushVersion,
+ ClusterDiscoveryType: &cluster.Cluster_Type{Type: cluster.Cluster_EDS},
}
- clusters = append(clusters, patcher.insertedClusters()...)
+ maybeApplyEdsConfig(c)
+ clusters = append(clusters, c)
}
+ resources := model.Resources{}
+ //envoyFilterPatches := req.Push.EnvoyFilters(proxy)
+ //cb := NewClusterBuilder(proxy, req, configgen.Cache)
+ //instances := proxy.ServiceInstances
+ //cacheStats := cacheStats{}
+ //switch proxy.Type {
+ //case model.SidecarProxy:
+ // // Setup outbound clusters
+ // outboundPatcher := clusterPatcher{efw: envoyFilterPatches, pctx: networking.EnvoyFilter_SIDECAR_OUTBOUND}
+ // ob, cs := configgen.buildOutboundClusters(cb, proxy, outboundPatcher, services)
+ // cacheStats = cacheStats.merge(cs)
+ // resources = append(resources, ob...)
+ // // Add a blackhole and passthrough cluster for catching traffic to unresolved routes
+ // clusters = outboundPatcher.conditionallyAppend(clusters, nil, cb.buildBlackHoleCluster(), cb.buildDefaultPassthroughCluster())
+ // clusters = append(clusters, outboundPatcher.insertedClusters()...)
+ //
+ // // Setup inbound clusters
+ // inboundPatcher := clusterPatcher{efw: envoyFilterPatches, pctx: networking.EnvoyFilter_SIDECAR_INBOUND}
+ // clusters = append(clusters, configgen.buildInboundClusters(cb, proxy, instances, inboundPatcher)...)
+ // // Pass through clusters for inbound traffic. These cluster bind loopback-ish src address to access node local service.
+ // clusters = inboundPatcher.conditionallyAppend(clusters, nil, cb.buildInboundPassthroughClusters()...)
+ // clusters = append(clusters, inboundPatcher.insertedClusters()...)
+ //default: // Gateways
+ // patcher := clusterPatcher{efw: envoyFilterPatches, pctx: networking.EnvoyFilter_GATEWAY}
+ // ob, cs := configgen.buildOutboundClusters(cb, proxy, patcher, services)
+ // cacheStats = cacheStats.merge(cs)
+ // resources = append(resources, ob...)
+ // // Gateways do not require the default passthrough cluster as they do not have original dst listeners.
+ // clusters = patcher.conditionallyAppend(clusters, nil, cb.buildBlackHoleCluster())
+ // if proxy.Type == model.Router && proxy.MergedGateway != nil && proxy.MergedGateway.ContainsAutoPassthroughGateways {
+ // clusters = append(clusters, configgen.buildOutboundSniDnatClusters(proxy, req, patcher)...)
+ // }
+ // clusters = append(clusters, patcher.insertedClusters()...)
+ //}
for _, c := range clusters {
resources = append(resources, &discovery.Resource{Name: c.Name, Resource: util.MessageToAny(c)})
}
- resources = cb.normalizeClusters(resources)
+ //resources = cb.normalizeClusters(resources)
- if cacheStats.empty() {
- return resources, model.DefaultXdsLogDetails
- }
- return resources, model.XdsLogDetails{AdditionalInfo: fmt.Sprintf("cached:%v/%v", cacheStats.hits, cacheStats.hits+cacheStats.miss)}
+ return resources, model.XdsLogDetails{}
}
func shouldUseDelta(updates *model.PushRequest) bool {
diff --git a/pilot/pkg/xds/ads.go b/pilot/pkg/xds/ads.go
index e0cfa8ccff..a66e7df456 100644
--- a/pilot/pkg/xds/ads.go
+++ b/pilot/pkg/xds/ads.go
@@ -162,6 +162,9 @@ func (s *DiscoveryServer) receive(con *Connection, identities []string) {
}
// This should be only set for the first request. The node id may not be set - for example malicious clients.
if firstRequest {
+ if req.TypeUrl != v3.ClusterType && req.TypeUrl != v3.SecretType {
+ log.Fatalf("unexpected type %v", req.TypeUrl)
+ }
// probe happens before envoy sends first xDS request
if req.TypeUrl == v3.HealthInfoType {
log.Warnf("ADS: %q %s send health check probe before normal xDS request", con.PeerAddr, con.ConID)
Limit Envoy CPU; send lots of pushes.
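For context on why that reproduction works: Envoy keeps an EDS-type cluster in the warming state until it receives a ClusterLoadAssignment for it, so flooding CDS with 100000 EDS clusters (while starving Envoy of CPU) keeps the proxy warming long enough to hit the disconnect/reconnect window. The sketch below shows roughly what each cluster from that loop (plus maybeApplyEdsConfig) amounts to, using go-control-plane types; it is an illustration only, not the Istio builder code.

```go
package main

import (
	"fmt"

	clusterv3 "github.com/envoyproxy/go-control-plane/envoy/config/cluster/v3"
	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
)

// edsCluster builds a cluster that Envoy will keep warming until an EDS
// response with endpoints for it arrives over ADS.
func edsCluster(name string) *clusterv3.Cluster {
	return &clusterv3.Cluster{
		Name:                 name,
		ClusterDiscoveryType: &clusterv3.Cluster_Type{Type: clusterv3.Cluster_EDS},
		EdsClusterConfig: &clusterv3.Cluster_EdsClusterConfig{
			// Fetch endpoints over the same aggregated (ADS) stream.
			EdsConfig: &corev3.ConfigSource{
				ConfigSourceSpecifier: &corev3.ConfigSource_Ads{Ads: &corev3.AggregatedConfigSource{}},
			},
		},
	}
}

func main() {
	fmt.Println(edsCluster("0").GetName())
}
```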
xds: respond to requests previously miscategorized as ACKs

See envoyproxy/envoy#13009 for details. Fixes #38709 (previously 'fixed', but really the fix was a workaround). Fixes #39720.

* fix tests
* Make integ tests more aggressive
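The gist of that fix, as described earlier in the thread: when the proxy reconnects mid-warming, its first request on the new stream can be an EDS request echoing a nonce from the old connection, and the server was treating it as an ACK and never responding, so the clusters stayed warming. The following is a rough sketch of the ACK-versus-request decision with hypothetical types and field names, not Istio's actual code.

```go
package main

import "fmt"

// Hypothetical request/state types for illustration only.
type discoveryRequest struct {
	TypeURL       string
	VersionInfo   string
	ResponseNonce string
}

type watchedResource struct {
	// NonceSent is the nonce of the last response sent on *this* connection.
	NonceSent string
}

// shouldRespond sketches the decision the fix is about: a request whose nonce
// was not issued on the current connection (e.g. the first EDS request after a
// reconnect, echoing state from the old connection) is a new request and must
// be answered; only a request echoing the nonce we just sent is an ACK/NACK.
func shouldRespond(w *watchedResource, req *discoveryRequest) bool {
	if w == nil || w.NonceSent == "" {
		// Nothing sent for this type on this connection yet: always respond.
		return true
	}
	if req.ResponseNonce != w.NonceSent {
		// Stale nonce from a previous connection: not an ACK, respond.
		return true
	}
	// Nonce matches the last response on this connection: treat as ACK/NACK.
	return false
}

func main() {
	// First EDS request on a fresh connection, echoing an old nonce.
	req := &discoveryRequest{
		TypeURL:       "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
		ResponseNonce: "nonce-from-previous-connection",
	}
	fmt.Println(shouldRespond(&watchedResource{}, req)) // true: must push endpoints
}
```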
Currently, we do not log incremental pushes at Info level. The intent behind this is to avoid spam when we have large endpoint churn. However, because we also do incremental pushes for Full pushes now, we are also hiding these logs. These logs are both critical to debugging (things like #39720, etc) and not spammy -- while the `Full=false` pushes may add thousands of messages, this change only adds at most 1 log per push/proxy. For these types of pushes I don't see a benefit to excluding only EDS. Additionally, fix SDS to correctly assert it is incremental (when it is).
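A minimal sketch of the logging policy that change describes, using hypothetical names rather than Istio's actual logging code: endpoint-churn pushes (Full=false) stay at debug, while anything belonging to a Full push is logged at info, even when the response itself is sent incrementally, which adds at most one log per push/proxy.

```go
package main

import "log"

// logPush sketches the policy described above (hypothetical signature).
// Full=false pushes can number in the thousands during endpoint churn, so
// they stay at debug; pushes that are part of a Full push are logged at info.
func logPush(full bool, typeURL, proxyID, detail string) {
	if !full {
		log.Printf("debug: pushing %s to %s %s", typeURL, proxyID, detail)
		return
	}
	log.Printf("info: pushing %s to %s %s", typeURL, proxyID, detail)
}

func main() {
	logPush(true, "CDS", "sidecar~10.0.0.1", "incremental")
	logPush(false, "EDS", "sidecar~10.0.0.1", "endpoint churn")
}
```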
Might still be broken after #39937: https://prow.istio.io/view/gs/istio-prow/pr-logs/pull/istio_istio/39945/integ-security-multicluster_istio/1547894269065302016. Hard to tell without debug logs; need to trigger it in my PR with debug enabled.
Same here; to find out what is warming, I searched the proxy config logs, but could not find one.
The test framework dumps at multiple points. For example, if we have suite […]. In this case, the warming is for the top level […].
Confusing log if it is a temp output.
Yes, I'm hoping to resolve that issue.
🤔 ❄️ Hey, there's been no update for this test flake for 3 days. Courtesy of your friendly test flake nag.
AFAIK this issue is fixed now... closing for now. LMK if we see anything else |
https://prow.istio.io/view/gs/istio-prow/logs/integ-security-multicluster_istio_postsubmit/1542555639413215232
Still debugging, but looks like all clusters are warming