Liqo Gateway restarting in remote cluster #1988

Sharathmk99 · 2023-09-07T23:10:50Z

What happened:

In our production environment, the Liqo Gateway pod is restarting very frequently (roughly every 2 hours) with the error below.
We suspect this is causing the network errors we see for offloaded services. We are not sure whether those network errors occurred during the restarts; we will try to confirm.

I0907 20:52:57.045174       1 driver.go:282] <local cluster> -> starting conncheck sender
I0907 20:52:57.045225       1 tunnel-operator.go:316] prod-mgmt-wrn -> vpn connection correctly established
I0907 20:52:57.045612       1 conncheck.go:84] conncheck sender 516d15ef-b74e-48b8-8180-ffb0dd9bc7d0 starting
I0907 20:52:57.045839       1 tunnel-operator.go:200] <local cluster> -> route for destination {10.244.0.0/16} correctly configured
I0907 20:52:57.085435       1 iptables.go:697] Inserting rule '-d 10.243.0.1 -j DNAT --to-destination 169.254.0.1' in chain LIQO-PRRT-MAP-CLS-516d15ef (table nat)
I0907 20:53:07.858808       1 tunnel-operator.go:355] <local cluster> -> changing status to Error "No network connectivity towards remote cluster"
I0907 20:53:11.074021       1 tunnel-operator.go:355] <local cluster> -> changing status to Connected "VPN connection established"
E0907 22:38:17.450836       1 leaderelection.go:364] Failed to update lock: Put "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/liqo/leases/1d5hml1.gateway.net.liqo.io": context deadline exceeded
I0907 22:38:17.450889       1 leaderelection.go:280] failed to renew lease liqo/1d5hml1.gateway.net.liqo.io: timed out waiting for the condition
E0907 22:38:20.066954       1 gateway-operator.go:188] unable to start tunnel controller: leader election lost
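
To put a number on the restart interval, the klog timestamps above can be compared directly. The following is only a minimal sketch: `parse_klog_line` is a hypothetical helper written for this note (it is not part of Liqo or klog), and because klog headers carry no year, it assumes 2023 from the date of this report. It measures the time between the "vpn connection correctly established" line and the "leader election lost" line.

```python
import re
from datetime import datetime

# klog header layout: severity letter, MMDD, HH:MM:SS.micro, thread id, file:line], message
KLOG_HEADER = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ ([^\]]+)\] (.*)$"
)


def parse_klog_line(line, year=2023):
    """Return (timestamp, severity, source, message), or None if the line is not klog output.

    The year is an assumption: klog does not record it.
    """
    m = KLOG_HEADER.match(line)
    if m is None:
        return None
    severity, mmdd, clock, source, message = m.groups()
    ts = datetime.strptime(f"{year}{mmdd} {clock}", "%Y%m%d %H:%M:%S.%f")
    return ts, severity, source, message


# Two lines copied from the log above.
established = parse_klog_line(
    "I0907 20:52:57.045225       1 tunnel-operator.go:316] "
    "prod-mgmt-wrn -> vpn connection correctly established"
)
lost = parse_klog_line(
    "E0907 22:38:20.066954       1 gateway-operator.go:188] "
    "unable to start tunnel controller: leader election lost"
)

# Time the gateway stayed up between establishing the VPN and losing the lease.
uptime = lost[0] - established[0]
print(uptime)  # 1:45:23.021729
```

In this log excerpt the gateway ran for about 1h45m before the lease renewal timed out, which is consistent with the roughly 2-hour restart interval described above.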

What you expected to happen:

The Liqo Gateway should not restart, so that the network connection between the local and remote clusters stays stable. Also, is there any way to set up High Availability for the Liqo Gateway? If the gateway goes down, the entire network between the local and remote clusters is broken.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

Liqo version: v0.9.1
Liqoctl version: v0.9.1
Kubernetes version (use kubectl version): 1.21.x
Cloud provider or hardware configuration: Private AKS
Node image:
Network plugin and version:
Install tools:
Others:

Sharathmk99 · 2023-09-07T23:13:17Z

I also see 1 restart of the Liqo Gateway in the local cluster, with the error below:

E0906 02:53:16.049752       1 leaderelection.go:327] error retrieving resource lock liqo/1d5hml1.gateway.net.liqo.io: Get "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/liqo/leases/1d5hml1.gateway.net.liqo.io": context deadline exceeded
I0906 02:53:16.049792       1 leaderelection.go:280] failed to renew lease liqo/1d5hml1.gateway.net.liqo.io: timed out waiting for the condition
E0906 02:53:16.060612       1 gateway-operator.go:188] unable to start tunnel controller: leader election lost
I0906 02:53:16.060635       1 internal.go:581] "Stopping and waiting for non leader election runnables"
I0906 02:53:16.060655       1 internal.go:585] "Stopping and waiting for leader election runnables"

Sharathmk99 added the kind/bug Something isn't working label Sep 7, 2023
