Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rate: Wait(n=1) would exceed context deadline #362

Closed
LionelJouin opened this issue Jan 16, 2023 · 6 comments
Closed

rate: Wait(n=1) would exceed context deadline #362

LionelJouin opened this issue Jan 16, 2023 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@LionelJouin
Copy link
Member

LionelJouin commented Jan 16, 2023

Expected Behavior

The k8s registry should work as the memory registry, it should be able to handle many nodes, and many NSEs/NSCs.

Current Behavior

I am not sure yet if it is due to the amount of NSEs and NSCs, or if it the number of Kubernetes worker, or if it is the combination of multiple factors, but using the K8S-registry (no problem with the memory-registry), some of my NSEs are failing to register / some of my NSCs are failing the request and I believe it is due to the registry since I can see logs in nsmgr and forwarder referring to the registry.

Failure Information (for bugs)

The issue is not happening all the time, sometimes I managed to deploy with no problem.

Here is some logs coming from the registry:

2023-01-16T10:14:18.929232608Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 1) ⎆ sdk/pkg/registry/common/begin/beginNSEServer.Find()
2023-01-16T10:14:18.929254518Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 1.1)   find={"network_service_endpoint":{"name":"stateless-lb-frontend-attractor-a-747f4f9cdd-kfcs4"}}
2023-01-16T10:14:18.929263095Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 2)  ⎆ sdk/pkg/registry/common/authorize/authorizeNSEServer.Find()
2023-01-16T10:14:18.929266149Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 2.1)    find={"network_service_endpoint":{"name":"stateless-lb-frontend-attractor-a-747f4f9cdd-kfcs4"}}
2023-01-16T10:14:18.929268945Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 3)   ⎆ sdk/pkg/registry/switchcase/switchCaseNSEServer.Find()
2023-01-16T10:14:18.929271752Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 3.1)     find={"network_service_endpoint":{"name":"stateless-lb-frontend-attractor-a-747f4f9cdd-kfcs4"}}
2023-01-16T10:14:18.929274992Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 4)    ⎆ sdk/pkg/registry/common/setregistrationtime/setregtimeNSEServer.Find()
2023-01-16T10:14:18.92927794Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 4.1)      find={"network_service_endpoint":{"name":"stateless-lb-frontend-attractor-a-747f4f9cdd-kfcs4"}}
2023-01-16T10:14:18.929283228Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 5)     ⎆ sdk/pkg/registry/common/expire/expireNSEServer.Find()
2023-01-16T10:14:18.929286322Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 5.1)       find={"network_service_endpoint":{"name":"stateless-lb-frontend-attractor-a-747f4f9cdd-kfcs4"}}
2023-01-16T10:14:18.9292956Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 6)      ⎆ sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Find()
2023-01-16T10:14:18.929302163Z stderr F Jan 16 10:14:18.929 [TRAC] [type:registry] 6.1)        find={"network_service_endpoint":{"name":"stateless-lb-frontend-attractor-a-747f4f9cdd-kfcs4"}}
2023-01-16T10:14:18.929412731Z stderr F Jan 16 10:14:18.929 [ERRO] [type:registry] 6.2)        rate: Wait(n=1) would exceed context deadline;	Error returned from sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Find;	github.com/networkservicemesh/sdk/pkg/registry/core/trace.logError;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/trace/common.go:38;	github.com/networkservicemesh/sdk/pkg/registry/core/trace.(*traceNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/trace/nse_registry.go:148;	github.com/networkservicemesh/sdk/pkg/registry/core/next.(*nextNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/next/nse_registry_server.go:64;	github.com/networkservicemesh/sdk/pkg/registry/common/expire.(*expireNSEServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/common/expire/nse_server.go:96;	github.com/networkservicemesh/sdk/pkg/registry/core/trace.(*traceNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/trace/nse_registry.go:146;	github.com/networkservicemesh/sdk/pkg/registry/core/next.(*nextNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/next/nse_registry_server.go:64;	github.com/networkservicemesh/sdk/pkg/registry/common/setregistrationtime.(*setregtimeNSEServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/common/setregistrationtime/nse_server.go:47;	github.com/networkservicemesh/sdk/pkg/registry/core/trace.(*traceNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/trace/nse_registry.go:146;	github.com/networkservicemesh/sdk/pkg/registry/core/next.(*nextNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/next/nse_registry_server.go:64;	github.com/networkservicemesh/sdk/pkg/registry/switchcase.(*switchCaseNSEServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/switchcase/nse_server.go:51;	github.com/networkservicemesh/sdk/pkg/registry/core/trace.(*traceNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/trace/nse_registry.go:146;	github.com/networkservicemesh/sdk/pkg/registry/core/next.(*nextNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/next/nse_registry_server.go:64;	github.com/networkservicemesh/sdk/pkg/registry/common/authorize.(*authorizeNSEServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/common/authorize/nse_server.go:83;	github.com/networkservicemesh/sdk/pkg/registry/core/trace.(*traceNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/trace/nse_registry.go:146;	github.com/networkservicemesh/sdk/pkg/registry/core/next.(*nextNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/next/nse_registry_server.go:64;	github.com/networkservicemesh/sdk/pkg/registry/common/begin.(*beginNSEServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/common/begin/nse_server.go:82;	github.com/networkservicemesh/sdk/pkg/registry/core/trace.(*traceNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/trace/nse_registry.go:146;	github.com/networkservicemesh/sdk/pkg/registry/core/next.(*nextNetworkServiceEndpointRegistryServer).Find;		/go/pkg/mod/github.com/networkservicemesh/sdk@v1.6.1/pkg/registry/core/next/nse_registry_server.go:64;	github.com/networkservicemesh/api/pkg/api/registry._NetworkServiceEndpointRegistry_Find_Handler;		/go/pkg/mod/github.com/networkservicemesh/api@v1.6.1/pkg/api/registry/registry.pb.go:1122;	google.golang.org/grpc.(*Server).processStreamingRPC;		/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1564;	google.golang.org/grpc.(*Server).handleStream;		/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:1646;	google.golang.org/grpc.(*Server).serveStreams.func1.2;		/go/pkg/mod/google.golang.org/grpc@v1.49.0/server.go:938;	runtime.goexit;		/usr/local/go/src/runtime/asm_amd64.s:1571;	
2023-01-16T10:14:18.929420887Z stderr F Jan 16 10:14:18.929 [ERRO] [type:registry] 5.2)       Error returned from sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Find: rate: Wait(n=1) would exceed context deadline
2023-01-16T10:14:18.92943615Z stderr F Jan 16 10:14:18.929 [ERRO] [type:registry] 4.2)      Error returned from sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Find: rate: Wait(n=1) would exceed context deadline
2023-01-16T10:14:18.929452239Z stderr F Jan 16 10:14:18.929 [ERRO] [type:registry] 3.2)     Error returned from sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Find: rate: Wait(n=1) would exceed context deadline
2023-01-16T10:14:18.92945736Z stderr F Jan 16 10:14:18.929 [ERRO] [type:registry] 2.2)    Error returned from sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Find: rate: Wait(n=1) would exceed context deadline
2023-01-16T10:14:18.929463615Z stderr F Jan 16 10:14:18.929 [ERRO] [type:registry] 1.2)   Error returned from sdk-k8s/pkg/registry/etcd/etcdNSERegistryServer.Find: rate: Wait(n=1) would exceed context deadline
2023-01-16T10:14:19.118074074Z stderr F W0116 10:14:19.118014       1 warnings.go:70] unknown field "status"
2023-01-16T10:14:19.21606373Z stderr F Jan 16 10:14:19.215 [WARN] [cmd:/bin/cmd-registry-k8s] [etcdNSERegistryServer:handleWatcher] watcher is closed, retrying

Steps to Reproduce

No instructions yet, I had this problem using my project. I will try to check if I can reproduce using the example provided on the deployment-k8s.

Context

Number of NS: 3
Number of NSC: 26
Number of NSE: 13
Number of Kubernetes worker: 10
Number of registry: 1 (I haven't noticed any problem with more replicas)

NSM: v1.7.1 (Tried with v1.6.1 and v1.5.0 and the issue was existing)
Kubernetes: v1.25 (Kind) / v1.24 (Real cluster)

Failure Logs

registry.log

@denis-tingaikin denis-tingaikin added the bug Something isn't working label Jan 17, 2023
@denis-tingaikin
Copy link
Member

denis-tingaikin commented Jan 17, 2023

@LionelJouin Thanks for the great report. We'll take care of it in v1.8.0.

Also, we found something similar during scale testing networkservicemesh/deployments-k8s#5494 (comment)

Just for now, I recommend increasing CPU limits for registry-k8s.

@bharath-avesha
Copy link

Hi @denis-tingaikin, i tried applying the workaround of increasing the cpu and memory limits on registry-k8s but it did not work. And also, the networkserviceendpoint objects seem to be in a flux, a number of them get deleted and recreated in a loop. Is the root cause known for this issue?

@denis-tingaikin
Copy link
Member

denis-tingaikin commented Feb 6, 2023

@bharath-avesha We've found a root cause and the problem seems fixed. For now, the latest main version of deployments-k8s repository should be fine. We also will have pinged you when v1.8.0-rc.1 will be available 😉

@denis-tingaikin
Copy link
Member

@LionelJouin Could you please test ghcr.io/networkservicemesh/cmd-registry-k8s:v1.8.0-rc.1?

@LionelJouin
Copy link
Member Author

Sure I will

@LionelJouin
Copy link
Member Author

I just tried with 76 NSCs and 26 NSEs, the problem seems fixed with v1.8.0-rc.1, thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Done
Development

No branches or pull requests

4 participants