refresh-period not triggering any periodic discovery #373

Closed
weand opened this issue Sep 30, 2022 · 11 comments

weand commented Sep 30, 2022

Hi,

we use Stork on Quarkus with Kubernetes Service Discovery.

It turns out that Stork does not honor the refresh-period properly when using a configuration like:

quarkus.grpc.clients.fooService.host=foo
quarkus.grpc.clients.fooService.port=9000
quarkus.grpc.clients.fooService.name-resolver=stork


quarkus.stork.foo.service-discovery.type=kubernetes
quarkus.stork.foo.service-discovery.k8s-namespace=${jkube.namespace}
quarkus.stork.foo.service-discovery.application=foo-service-grpc
quarkus.stork.foo.service-discovery.refresh-period=5s

We debugged the application remotely in k8s and saw that
KubernetesServiceDiscovery.fetchNewServiceInstances(List previousInstances) is not invoked regularly, not even when there is traffic on the application.

We expected that new service instances get discovered automatically (after the refresh-period), e.g. when scaling up or restarting a pod. But that's obviously not the case.
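
For reference, this is the behavior we expected refresh-period to drive: the cached list of instances is re-fetched once the configured period has elapsed. A minimal, illustrative sketch of that pattern in plain Java (the class and names below are ours for illustration only, not Stork's actual implementation):

import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.function.Supplier;

// Illustrative only: the kind of time-based cache we expected
// quarkus.stork.foo.service-discovery.refresh-period=5s to drive.
class RefreshingCache<T> {
    private final Duration refreshPeriod;       // e.g. Duration.ofSeconds(5)
    private final Supplier<List<T>> fetcher;     // e.g. queries the k8s endpoints
    private volatile List<T> cached = List.of();
    private volatile Instant lastFetch = Instant.MIN;

    RefreshingCache(Duration refreshPeriod, Supplier<List<T>> fetcher) {
        this.refreshPeriod = refreshPeriod;
        this.fetcher = fetcher;
    }

    List<T> get() {
        // Once the refresh period has elapsed, re-fetch; otherwise serve the cached list.
        if (Instant.now().isAfter(lastFetch.plus(refreshPeriod))) {
            cached = fetcher.get();   // analogous to fetchNewServiceInstances(previousInstances)
            lastFetch = Instant.now();
        }
        return cached;
    }
}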

The latest Quarkus release uses Stork 1.1.2, but there doesn't seem to be any change to CachingServiceDiscovery in Stork 1.2.0 either.

Any advice?

Best regards
Andreas

@aureamunoz (Collaborator)

Hi Andreas,
Thanks for reaching out. It should refresh the cache after the refresh period. Do you have a reproducer?

I don't see anything wrong in the config.


weand commented Sep 30, 2022

I've got nothing ready for GitHub.

But we basically follow this approach:

Inject a channel object as described here

@GrpcClient("fooService")
Channel channel;

Then on application start we create one async stub on the channel:

FooServiceStub fooService = FooServiceGrpc.newStub(channel);

The boundary of the application contains a JAX-RS service. On every REST call, we have some bidirectional communication on the stub:

fooService.someMethod(responseStreamObserver)
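
Putting those pieces together, the client wiring looks roughly like this (a condensed sketch, folded into one resource class for brevity; FooRequest, FooReply and someMethod are placeholders for our actual generated gRPC types; jakarta.* imports as in Quarkus 3, Quarkus 2 uses javax.* instead):

import io.grpc.Channel;
import io.grpc.stub.StreamObserver;
import io.quarkus.grpc.GrpcClient;
import jakarta.annotation.PostConstruct;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

@Path("/foo")
public class FooResource {

    @GrpcClient("fooService")          // resolved through Stork with the config above
    Channel channel;

    private FooServiceGrpc.FooServiceStub fooService;

    @PostConstruct
    void init() {
        // One async stub, created once on the injected channel.
        fooService = FooServiceGrpc.newStub(channel);
    }

    @GET
    public void call() {
        // Bidirectional streaming call issued on every REST request.
        StreamObserver<FooRequest> requests = fooService.someMethod(new StreamObserver<FooReply>() {
            @Override public void onNext(FooReply reply) { /* handle response */ }
            @Override public void onError(Throwable t) { /* handle error */ }
            @Override public void onCompleted() { /* stream finished */ }
        });
        requests.onNext(FooRequest.newBuilder().build());
        requests.onCompleted();
    }
}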

@aureamunoz (Collaborator)

OK, can you confirm that the problem is about the refreshing of the cache? I mean: are the service instances collected the first time and the cache populated with them, or are the instances never fetched from the cluster at all?

I will try to set up an equivalent scenario to see if there is a problem, but we have not detected anything recently. Can you please confirm the Quarkus and Stork versions used?

I'll be off at an event for 2 days next week, so you can expect some delay in my investigation.


aureamunoz commented Sep 30, 2022

If you have access to the cluster, could you please verify that the following command returns the instances that you expect Stork to discover?
kubectl get endpoints foo-service-grpc -n ${jkube.namespace}


weand commented Oct 4, 2022

OK, can you confirm that the problem is about the refreshing of the cache? I mean: are the service instances collected the first time and the cache populated with them, or are the instances never fetched from the cluster at all?

Yes, it's about refreshing the cache's view, e.g. after scaling up pods. So the basic reproducer is:

  • gRPC service started with 2 pods initially
  • start the application with the gRPC client: it load-balances properly (round robin) between both service instances
  • scale the gRPC service up to 3 pods total: the client won't ever call the new pod.
# service with 2 pods started
$ oc get endpoints foo-service-grpc
NAME                               ENDPOINTS                             AGE
foo-service-grpc   10.128.4.110:9000,10.130.4.248:9000   6m6s

# scale up to 3 pods
$ oc get endpoints foo-service-grpc
NAME                               ENDPOINTS                                               AGE
foo-service-grpc   10.128.4.110:9000,10.130.4.248:9000,10.131.2.171:9000   8m46s
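
For completeness (not part of the original reproducer): the endpoint change can also be watched live while scaling up, e.g.

$ oc get endpoints foo-service-grpc --watch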

cescoffier added the bug label Nov 2, 2022

weand commented Nov 27, 2022

After the recent PR #374 we hoped for some improvement, but the issue remains. Tested with the latest Quarkus, 2.14.2.

Any workaround idea for how to get service instances discovered properly after scaling up?

Thanks in advance.


jdussouillez commented Jun 20, 2023

Similar issue with Quarkus 3.0.4.

When I deploy a new version, new pods are created and the old ones are destroyed. The client application detects the new pods but then uses the old ones again.

  • Start the client; it connects to pods 2 and 3.
  • Deploy the new version of the gRPC service: pods 4 and 5 are created, pods 2 and 3 are destroyed.

Here are the logs on the client side (the logs are old, from Quarkus v2, but the issue is the same with Quarkus v3):

[screenshot: client-side Stork logs]

@aureamunoz (Collaborator)

Hi, sorry for the delay. I haven't been able to take a look at this because of traveling; I will do so later this week and try to provide a response.

@aureamunoz (Collaborator)

Hi! I was finally able to reproduce this issue. I will work on a fix at the beginning of next week.

aureamunoz added commits to aureamunoz/smallrye-stork that referenced this issue Jul 11, 2023
cescoffier added this to the 2.3.1 milestone Jul 12, 2023
aureamunoz added a commit to aureamunoz/smallrye-stork that referenced this issue Jul 12, 2023
@aureamunoz (Collaborator)

Fixed by #611


jdussouillez commented Jul 26, 2023

Just to let you know that it also fixes the similar bug I had with Quarkus v3.2.1.

Thanks a lot for the fix!
