[consul Registry]: When grpc enters idle mode, “last connection error” occurs on the client and cannot be recovered #3185

Closed
duc-cnzj opened this issue Feb 1, 2024 · 1 comment · Fixed by #3162
Labels
bug Something isn't working

Comments

@duc-cnzj

duc-cnzj commented Feb 1, 2024

What happened:

When gRPC enters idle mode, a “last connection error” occurs on the client, and the client cannot recover from it.

What you expected to happen:

The client should keep watching for the latest instances of the service.

How to reproduce it (as minimally and precisely as possible):

Add the WithIdleTimeout option to the BFF gRPC client.
[screenshot]

After 5 seconds, restart the server's gRPC service so that its pod IP changes. The client no longer receives the updated address.

[screenshot]
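gRPC-Go documents the following for grpc.WithIdleTimeout. If a channel has no ongoing or new RPCs for the configured timeout, it enters idle mode and shuts down its name resolver and load balancer. It exits idle mode when Connect() is called or a new RPC starts.

The stdlib-only Go model below is a rough sketch of that lifecycle, not gRPC code. toyChannel and toyResolver are hypothetical types. Each idle cycle closes the old resolver (which, per this report, stops the registry watcher) and later builds a fresh one. Any watch therefore has to be re-established on every Build.

```go
package main

import "fmt"

// toyResolver is a hypothetical stand-in for a gRPC name resolver whose
// Close method would stop the registry watcher.
type toyResolver struct {
	id     int
	closed bool
}

func (r *toyResolver) Close() { r.closed = true }

// toyChannel models only the idleness lifecycle of a gRPC channel.
type toyChannel struct {
	builds   int
	resolver *toyResolver // nil while the channel is idle
}

// build stands in for the resolver builder's Build method.
func (c *toyChannel) build() {
	c.builds++
	c.resolver = &toyResolver{id: c.builds}
}

// EnterIdle models the idle timeout expiring with no RPCs in flight:
// the resolver is shut down.
func (c *toyChannel) EnterIdle() {
	if c.resolver != nil {
		c.resolver.Close()
		c.resolver = nil
	}
}

// StartRPC models a new RPC (or Connect): an idle channel exits idle
// mode by building a brand-new resolver.
func (c *toyChannel) StartRPC() {
	if c.resolver == nil {
		c.build()
	}
}

func main() {
	c := &toyChannel{}
	c.StartRPC() // first RPC: Build #1
	first := c.resolver
	c.EnterIdle() // idle timeout expires: resolver #1 is closed
	c.StartRPC()  // next RPC: Build #2 creates a fresh resolver
	fmt.Println(first.closed, c.builds, c.resolver.id) // true 2 2
}
```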

Anything else we need to know?:

Before entering idle: [screenshot]

After entering idle: [screenshot]

When gRPC enters idle mode, watcher.Stop() is called and the <-ctx.Done() branch is hit.

[screenshot]
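The linked fix (#3162) asks that the client keep fetching the latest service instances once gRPC exits idle mode. Below is a hypothetical, stdlib-only model of a registry that shares one polling loop per service among its watchers. It is not the actual Consul registry code or the change made in #3162.

It illustrates the property the fix is after. When the last watcher is stopped (as happens here when gRPC enters idle mode), a later Watch must restart the loop rather than reuse the stopped one. Otherwise the new watcher never sees the pod's new IP.

```go
package main

import (
	"fmt"
	"sync"
)

// serviceSet is a hypothetical per-service cache shared by all watchers.
type serviceSet struct {
	running  bool // whether the (simulated) polling loop is alive
	watchers map[*watcher]struct{}
}

// watcher receives address updates for one service.
type watcher struct {
	reg     *toyRegistry
	set     *serviceSet
	updates chan []string
}

// toyRegistry is a hypothetical registry that caches one serviceSet per name.
type toyRegistry struct {
	mu   sync.Mutex
	sets map[string]*serviceSet
}

func newToyRegistry() *toyRegistry {
	return &toyRegistry{sets: make(map[string]*serviceSet)}
}

// Watch returns a new watcher, reusing the cached set for the service.
func (r *toyRegistry) Watch(name string) *watcher {
	r.mu.Lock()
	defer r.mu.Unlock()
	set, ok := r.sets[name]
	if !ok {
		set = &serviceSet{watchers: make(map[*watcher]struct{})}
		r.sets[name] = set
	}
	// Key step: (re)start the loop even if the set is cached. A real
	// registry would (re)start its polling goroutine here; skipping this
	// for a stopped set leaves new watchers without updates.
	set.running = true
	w := &watcher{reg: r, set: set, updates: make(chan []string, 1)}
	set.watchers[w] = struct{}{}
	return w
}

// Stop detaches the watcher; when the last one leaves, the loop stops.
func (w *watcher) Stop() {
	w.reg.mu.Lock()
	defer w.reg.mu.Unlock()
	delete(w.set.watchers, w)
	if len(w.set.watchers) == 0 {
		w.set.running = false
	}
}

// poll simulates one result from the backend's polling loop.
func (r *toyRegistry) poll(name string, addrs []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	set, ok := r.sets[name]
	if !ok || !set.running {
		return // a stopped loop delivers nothing
	}
	for w := range set.watchers {
		select {
		case w.updates <- addrs:
		default: // watcher has not consumed the previous update; drop this one
		}
	}
}

func main() {
	r := newToyRegistry()
	w1 := r.Watch("helloworld")
	r.poll("helloworld", []string{"10.0.0.1:9000"})
	fmt.Println(<-w1.updates)

	w1.Stop()                                       // channel enters idle: watcher stopped
	w2 := r.Watch("helloworld")                     // channel exits idle: new watcher
	r.poll("helloworld", []string{"10.0.0.2:9000"}) // server pod restarted with a new IP
	fmt.Println(<-w2.updates)
}
```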

Environment:

  • k8s
  • kratos: v2.7.2
  • consul: v1.10.3
  • Kratos version (use kratos -v):
  • go version go1.21.1 darwin/arm64

PR

fix: when gRPC exits idle mode, continue fetching the latest service instances #3162

duc-cnzj added the bug label (Something isn't working) on Feb 1, 2024
duc-cnzj changed the title from "[consul Registry] When grpc enters idle mode, “last connection error” occurs on the client and cannot be recovered" to "[consul Registry]: When grpc enters idle mode, “last connection error” occurs on the client and cannot be recovered" on Feb 1, 2024

dosubot bot commented Feb 1, 2024

🤖

The issue you're experiencing might be caused by the gRPC connection entering the IDLE state and failing to recover after the service instances change. For example, the resolver may stop pushing updated addresses to the connection.

To resolve this issue, you can try to add a custom gRPC resolver that can handle the service instance changes more effectively. Here's an example of how you can do this:

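package main

import (
	"log"

	"github.com/go-kratos/kratos/v2/registry"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/resolver"
)

// customBuilder builds resolvers backed by a Kratos registry.Discovery.
type customBuilder struct {
	discovery registry.Discovery
}

// Build is called by gRPC to create a resolver. It is called again each time
// the channel exits idle mode, so all watch state must be created here.
func (b *customBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
	r := &customResolver{
		target: target,
		cc:     cc,
		d:      b.discovery,
	}
	r.start()
	return r, nil
}

// Scheme is the target scheme handled by this builder ("custom:///...").
func (b *customBuilder) Scheme() string {
	return "custom"
}

type customResolver struct {
	target resolver.Target
	cc     resolver.ClientConn
	d      registry.Discovery
}

func (r *customResolver) start() {
	// Implement your logic to watch the service instance changes and update the gRPC connection accordingly.
}

func (r *customResolver) ResolveNow(resolver.ResolveNowOptions) {}

// Close is called when the channel shuts down or enters idle mode;
// stop any watcher started in start() here.
func (r *customResolver) Close() {}

// yourDiscovery is a placeholder: replace it with your actual discovery instance.
var yourDiscovery registry.Discovery

func main() {
	// Register your custom resolver builder before dialing.
	resolver.Register(&customBuilder{discovery: yourDiscovery})

	// Use your custom resolver when dialing.
	conn, err := grpc.Dial("custom:///your-service-name", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}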

In the customResolver's start method, implement the logic that watches registry.Discovery for service instance changes and calls resolver.ClientConn.UpdateState to update the gRPC connection accordingly.

Please replace yourDiscovery and your-service-name with your actual discovery instance and service name.

This way, when the service instance changes, your custom resolver can update the gRPC connection to point to the latest service instance, which should resolve the issue you're experiencing.

You can refer to the transport/grpc/client.go for more details on how gRPC connections are handled in Kratos.

As for how Kratos handles gRPC connections entering idle mode, I wasn't able to find specific information within the repository. It's possible that this behavior is managed by the underlying gRPC library rather than Kratos itself. If you have more specific questions or need further clarification, please let me know!



This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

