Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test: TimerAndWatchStateOnErrorCallback #7321

Open
easwars opened this issue Jun 11, 2024 · 4 comments
Open

Flaky test: TimerAndWatchStateOnErrorCallback #7321

easwars opened this issue Jun 11, 2024 · 4 comments

Comments

@easwars
Copy link
Contributor

easwars commented Jun 11, 2024

--- FAIL: Test (1.56s)
    --- FAIL: Test/TimerAndWatchStateOnErrorCallback (0.03s)
        tlogger.go:116: INFO server.go:132 [xds-e2e] Created new snapshot cache...  (t=+252.631µs)
        tlogger.go:116: INFO server.go:685 [core] [Server #44]Server created  (t=+1.247297ms)
        tlogger.go:116: INFO server.go:158 [xds-e2e] Registered Aggregated Discovery Service (ADS)...  (t=+3.857605ms)
        tlogger.go:116: INFO server.go:177 [xds-e2e] xDS management server serving at: 127.0.0.1:4[233](https://github.com/grpc/grpc-go/actions/runs/9473148966/job/26100074724?pr=7320#step:8:234)7...  (t=+4.007795ms)
        tlogger.go:116: INFO server.go:881 [core] [Server #44 ListenSocket #45]ListenSocket created  (t=+6.05881ms)
        tlogger.go:116: INFO bootstrap.go:569 [xds] [xds-bootstrap] Bootstrap config for creating xds-client: &{xDSServers:[0x40007a07e0] cpcs:map[client-side-certificate-provider-instance:{PluginName:file_watcher Config:[123 10 34 99 101 114 116 105 102 105 99 97 116 101 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 67 108 105 101 110 116 83 105 100 101 88 68 83 51 54 49 52 48 57 55 57 49 49 47 99 101 114 116 46 112 101 109 34 44 10 34 112 114 105 118 97 116 101 95 107 101 121 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 67 108 105 101 110 116 83 105 100 101 88 68 83 51 54 49 52 48 57 55 57 49 49 47 107 101 121 46 112 101 109 34 44 10 34 99 97 95 99 101 114 116 105 102 105 99 97 116 101 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 67 108 105 101 110 116 83 105 100 101 88 68 83 51 54 49 52 48 57 55 57 49 49 47 99 97 46 112 101 109 34 44 10 34 114 101 102 114 101 115 104 95 105 110 116 101 114 118 97 108 34 58 32 34 54 48 48 115 34 10 125]} server-side-certificate-provider-instance:{PluginName:file_watcher Config:[123 10 34 99 101 114 116 105 102 105 99 97 116 101 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 83 101 114 118 101 114 83 105 100 101 88 68 83 50 57 48 50 54 50 52 50 54 50 47 99 101 114 116 46 112 101 109 34 44 10 34 112 114 105 118 97 116 101 95 107 101 121 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 83 101 114 118 101 114 83 105 100 101 88 68 83 50 57 48 50 54 50 52 50 54 50 47 107 101 121 46 112 101 109 34 44 10 34 99 97 95 99 101 114 116 105 102 105 99 97 116 101 95 102 105 108 101 34 58 32 34 47 116 109 112 47 116 101 115 116 83 101 114 118 101 114 83 105 100 101 88 68 83 50 57 48 50 54 50 52 50 54 50 47 99 97 46 112 101 109 34 44 10 34 114 101 102 114 101 115 104 95 105 110 116 101 114 118 97 108 34 58 32 34 54 48 48 115 34 10 125]}] serverListenerResourceNameTemplate:grpc/server?xds.resource.listening_address=%s clientDefaultListenerResourceNameTemplate:%s authorities:map[:0x4000389020] node:{ID:e4644a06-8290-4722-a2ac-94ac68668693 Cluster: Locality:{Region: Zone: SubZone:} Metadata:<nil> userAgentName:gRPC Go userAgentVersionType:{UserAgentVersion:1.66.0-dev} clientFeatures:[envoy.lb.does_not_support_overprovisioning xds.config.resource-in-sotw]} certProviderConfigs:map[client-side-certificate-provider-instance:0x400078ee40 server-side-certificate-provider-instance:0x400078ef00]}  (t=+7.663132ms)
        tlogger.go:116: INFO clientconn.go:1684 [core] original dial target is: "127.0.0.1:42337"  (t=+8.068748ms)
        tlogger.go:116: INFO clientconn.go:306 [core] [Channel #46]Channel created  (t=+8.287637ms)
        tlogger.go:116: INFO clientconn.go:188 [core] [Channel #46]parsed dial target is: resolver.Target{URL:url.URL{Scheme:"dns", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/127.0.0.1:42337", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}  (t=+8.489964ms)
        tlogger.go:116: INFO clientconn.go:189 [core] [Channel #46]Channel authority set to "127.0.0.1:42337"  (t=+8.653599ms)
        tlogger.go:116: INFO resolver_wrapper.go:197 [core] [Channel #46]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "127.0.0.1:42337",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "127.0.0.1:42337",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+9.289575ms)
        tlogger.go:116: INFO balancer_wrapper.go:103 [core] [Channel #46]Channel switches to new LB policy "pick_first"  (t=+9.597169ms)
        tlogger.go:116: INFO clientconn.go:849 [core] [Channel #46 SubChannel #47]Subchannel created  (t=+9.872452ms)
        tlogger.go:116: INFO clientconn.go:536 [core] [Channel #46]Channel Connectivity change to CONNECTING  (t=+10.032371ms)
        tlogger.go:116: INFO clientconn.go:1210 [core] [Channel #46 SubChannel #47]Subchannel Connectivity change to CONNECTING  (t=+10.442045ms)
        tlogger.go:116: INFO clientconn.go:1326 [core] [Channel #46 SubChannel #47]Subchannel picks a new address "127.0.0.1:42337" to connect  (t=+10.736173ms)
        tlogger.go:116: INFO clientconn.go:1210 [core] [Channel #46 SubChannel #47]Subchannel Connectivity change to READY  (t=+13.151778ms)
        tlogger.go:116: INFO clientconn.go:536 [core] [Channel #46]Channel Connectivity change to READY  (t=+13.39[236](https://github.com/grpc/grpc-go/actions/runs/9473148966/job/26100074724?pr=7320#step:8:237)7ms)
        tlogger.go:116: INFO clientconn.go:306 [core] [Channel #46]Channel exiting idle mode  (t=+13.568766ms)
        tlogger.go:116: INFO transport.go:238 Created transport to server "127.0.0.1:42337"  (t=+14.91149ms)
        tlogger.go:116: INFO transport.go:337 ADS stream created  (t=+15.146929ms)
        tlogger.go:116: INFO simple.go:441 [xds-e2e] open watch 1 for type.googleapis.com/envoy.config.listener.v3.Listener[xdsclient-test-lds-resource] from nodeID "e4644a06-8290-4722-a2ac-94ac68668693", version ""  (t=+16.170659ms)
        tlogger.go:116: WARNING transport.go:479 ADS stream closed: EOF  (t=+18.954451ms)
        tlogger.go:116: INFO server.go:817 [core] [Server #44 ListenSocket #45]ListenSocket deleted  (t=+19.525345ms)
        tlogger.go:116: INFO clientconn.go:1210 [core] [Channel #46 SubChannel #47]Subchannel Connectivity change to IDLE  (t=+20.375501ms)
        tlogger.go:116: INFO clientconn.go:536 [core] [Channel #46]Channel Connectivity change to IDLE  (t=+20.759577ms)
        authority_test.go:170: Got 1. Want: 0
        tlogger.go:116: INFO clientconn.go:536 [core] [Channel #46]Channel Connectivity change to SHUTDOWN  (t=+22.0211ms)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #46]Closing the name resolver  (t=+22.220993ms)
        tlogger.go:116: INFO balancer_wrapper.go:135 [core] [Channel #46]ccBalancerWrapper: closing  (t=+22.357758ms)
        tlogger.go:116: WARNING transport.go:412 Sending ADS request for resources: [], url: "type.googleapis.com/envoy.config.listener.v3.Listener", version: "", nonce: "" failed: EOF  (t=+22.578209ms)
        tlogger.go:116: INFO clientconn.go:1210 [core] [Channel #46 SubChannel #47]Subchannel Connectivity change to SHUTDOWN  (t=+22.757052ms)
        tlogger.go:116: INFO clientconn.go:1557 [core] [Channel #46 SubChannel #47]Subchannel deleted  (t=+22.910058ms)
        tlogger.go:116: INFO clientconn.go:306 [core] [Channel #46]Channel deleted  (t=+23.020063ms)
FAIL
FAIL	google.golang.org/grpc/xds/internal/xdsclient	5.375s

https://github.com/grpc/grpc-go/actions/runs/9473148966/job/26100074724?pr=7320

@SpiffyEight77
Copy link

SpiffyEight77 commented Jun 29, 2024

Hey @easwars, I'd like to work on this issue. Could you please assign it to me?

@SpiffyEight77
Copy link

@easwars cc @aranjans
Sorry to bother you at this moment.

This is my first time addressing this type of issue, and it seems difficult to reproduce locally. I've run the following commands; am I missing anything?

Details

$ go version
go version go1.22.4 darwin/arm64


$ pwd
~/Developer/SpiffyEight77/grpc-go/xds/internal/xdsclient

$ go test -race -count=100000 -failfast -run ^Test/TimerAndWatchStateOnErrorCallback$
PASS
ok      google.golang.org/grpc/xds/internal/xdsclient   429.233s

@purnesh42H
Copy link
Contributor

@SpiffyEight77 thanks for doing go test -race. Yeah flaky tests are hard to repro. We will add more instances to this issue if encounter any in future. Meanwhile if you want you can try to trace the code to figure the root cause and fix if any

@SpiffyEight77
Copy link

@SpiffyEight77 thanks for doing go test -race. Yeah flaky tests are hard to repro. We will add more instances to this issue if encounter any in future. Meanwhile if you want you can try to trace the code to figure the root cause and fix if any

@purnesh42H Thank you so much for your reply, I'll take time to read the code and figure the root cause and fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants