Liqo Gateway restarting in remote cluster #1988

Open
Sharathmk99 opened this issue Sep 7, 2023 · 1 comment
Labels
kind/bug Something isn't working

Comments

@Sharathmk99
Contributor

What happened:

In our production environment, the Liqo Gateway pod in the remote cluster restarts very frequently (roughly every 2 hours) with the error below.
We suspect these restarts are causing the network errors we see for offloaded services. We have not yet confirmed that the network errors coincide with the restarts and will try to verify this.

I0907 20:52:57.045174       1 driver.go:282] <local cluster> -> starting conncheck sender
I0907 20:52:57.045225       1 tunnel-operator.go:316] prod-mgmt-wrn -> vpn connection correctly established
I0907 20:52:57.045612       1 conncheck.go:84] conncheck sender 516d15ef-b74e-48b8-8180-ffb0dd9bc7d0 starting
I0907 20:52:57.045839       1 tunnel-operator.go:200] <local cluster> -> route for destination {10.244.0.0/16} correctly configured
I0907 20:52:57.085435       1 iptables.go:697] Inserting rule '-d 10.243.0.1 -j DNAT --to-destination 169.254.0.1' in chain LIQO-PRRT-MAP-CLS-516d15ef (table nat)
I0907 20:53:07.858808       1 tunnel-operator.go:355] <local cluster> -> changing status to Error "No network connectivity towards remote cluster"
I0907 20:53:11.074021       1 tunnel-operator.go:355] <local cluster> -> changing status to Connected "VPN connection established"
E0907 22:38:17.450836       1 leaderelection.go:364] Failed to update lock: Put "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/liqo/leases/1d5hml1.gateway.net.liqo.io": context deadline exceeded
I0907 22:38:17.450889       1 leaderelection.go:280] failed to renew lease liqo/1d5hml1.gateway.net.liqo.io: timed out waiting for the condition
E0907 22:38:20.066954       1 gateway-operator.go:188] unable to start tunnel controller: leader election lost

What you expected to happen:

The Liqo Gateway should not restart, so that the network connection between the local and remote clusters stays stable. Also, is there any way to set up High Availability for the Liqo Gateway? If the gateway goes down, the entire network between the local and remote clusters is broken.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Liqo version: v0.9.1
  • Liqoctl version: v0.9.1
  • Kubernetes version (use kubectl version): 1.21.x
  • Cloud provider or hardware configuration: Private AKS
  • Node image:
  • Network plugin and version:
  • Install tools:
  • Others:
Sharathmk99 added the kind/bug label Sep 7, 2023
@Sharathmk99
Contributor Author

I also see 1 restart of the Liqo Gateway in the local cluster, with the error below:

E0906 02:53:16.049752       1 leaderelection.go:327] error retrieving resource lock liqo/1d5hml1.gateway.net.liqo.io: Get "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/liqo/leases/1d5hml1.gateway.net.liqo.io": context deadline exceeded
I0906 02:53:16.049792       1 leaderelection.go:280] failed to renew lease liqo/1d5hml1.gateway.net.liqo.io: timed out waiting for the condition
E0906 02:53:16.060612       1 gateway-operator.go:188] unable to start tunnel controller: leader election lost
I0906 02:53:16.060635       1 internal.go:581] "Stopping and waiting for non leader election runnables"
I0906 02:53:16.060655       1 internal.go:585] "Stopping and waiting for leader election runnables"
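
To check whether the network errors for offloaded services line up with the gateway restarts, the lease-loss timestamps can be pulled out of the gateway logs and compared against the application-side errors. Below is a minimal sketch of that, not part of Liqo itself. It assumes the standard klog header layout seen in the logs above (severity letter, MMDD, time with microseconds, PID, `file:line]`). The function names `parse_klog_line` and `lease_loss_times` are hypothetical, and the `year` parameter is an assumption because klog lines do not carry a year.

```python
import re
from datetime import datetime

# klog header: severity letter, MMDD, HH:MM:SS.uuuuuu, PID, "file:line]", message.
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<src>[\w.\-]+:\d+)\] (?P<msg>.*)$"
)


def parse_klog_line(line, year=2023):
    """Return (timestamp, severity, source, message), or None if the line is not klog."""
    m = KLOG_RE.match(line.strip())
    if m is None:
        return None
    # klog omits the year, so it has to be supplied by the caller (assumption).
    ts = datetime.strptime(
        f"{year}{m['month']}{m['day']} {m['time']}", "%Y%m%d %H:%M:%S.%f"
    )
    return ts, m["sev"], m["src"], m["msg"]


def lease_loss_times(lines, year=2023):
    """Collect the timestamps of every 'leader election lost' message."""
    times = []
    for line in lines:
        parsed = parse_klog_line(line, year)
        if parsed is not None and "leader election lost" in parsed[3]:
            times.append(parsed[0])
    return times


# Two lines copied from the remote-cluster log in this issue.
sample = [
    "I0907 20:52:57.045225       1 tunnel-operator.go:316] prod-mgmt-wrn -> vpn connection correctly established",
    "E0907 22:38:20.066954       1 gateway-operator.go:188] unable to start tunnel controller: leader election lost",
]

for ts in lease_loss_times(sample):
    print("gateway lost leader election at", ts.isoformat())
```

In practice the input would be the full gateway pod log, including the log of the previous container instance, saved to a file and passed in line by line. The resulting timestamps can then be compared with the times at which the offloaded services reported network errors.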
