v0.16.0 - Websocket error #135

Closed
dbason opened this issue Mar 25, 2024 · 3 comments

Comments

@dbason

dbason commented Mar 25, 2024

We are seeing the following errors looping in the logs:

19:46:39.420701 client.go:359: config-poller INFO: connection sucessfully opened to config discovery server at "ws://10.155.11.217:13478/api/v1/config/watch?id=test%2Fstunner-udp-gateway"
19:46:39.421414 reconcile.go:113: stunner INFO: setting loglevel to "all:INFO"
19:46:39.421605 reconcile.go:177: stunner INFO: reconciliation ready: new objects: 0, changed objects: 1, deleted objects: 0, started objects: 0, restarted objects: 0
19:46:39.421653 reconcile.go:181: stunner INFO: status: READY, realm: stunner.l7mp.io, authentication: longterm, listeners: test/stunner-udp-gateway/udp-listener: [turn-udp://10.x.x.x:3478<32768:65535>], active allocations: 0
19:46:40.422774 client.go:334: config-poller ERROR: config file discovery service: websocket: close 1006 (abnormal closure): unexpected EOF
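Two details in the log above can be decoded mechanically. The `id` query parameter of the watch URL looks like the percent-encoded `namespace/name` of the Gateway (here `test/stunner-udp-gateway`). Close code 1006 is reserved by RFC 6455: an endpoint never sends it in a Close frame, and a client reports it locally when the connection drops without a closing handshake. So the error only tells us that the config discovery server side went away, not why. The Go sketch is an illustration under those assumptions: `gatewayFromWatchURL` and `closeCodeName` are hypothetical helpers, not STUNner code, and the `namespace/name` format of `id` is inferred from this log line rather than from the STUNner docs.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// closeCodeName maps a few RFC 6455 (section 7.4.1) close codes to short names.
// 1006 is reserved: it never appears on the wire; a client library reports it
// locally when the underlying connection closes without a Close frame.
func closeCodeName(code int) string {
	switch code {
	case 1000:
		return "normal closure"
	case 1001:
		return "going away"
	case 1006:
		return "abnormal closure (no Close frame received)"
	case 1011:
		return "internal error"
	default:
		return "other"
	}
}

// gatewayFromWatchURL (hypothetical helper) extracts the Gateway namespace and
// name from a config watch URL. ASSUMPTION: the "id" parameter is
// "namespace/name", as suggested by the log line, not by any documented API.
func gatewayFromWatchURL(raw string) (namespace, name string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	id := u.Query().Get("id") // Query() percent-decodes %2F back to "/"
	ns, n, ok := strings.Cut(id, "/")
	if !ok {
		return "", "", fmt.Errorf("id %q is not of the form namespace/name", id)
	}
	return ns, n, nil
}

func main() {
	raw := "ws://10.155.11.217:13478/api/v1/config/watch?id=test%2Fstunner-udp-gateway"
	ns, name, err := gatewayFromWatchURL(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("namespace=%s gateway=%s\n", ns, name)
	fmt.Printf("close 1006: %s\n", closeCodeName(1006))
}
```

Running it prints `namespace=test gateway=stunner-udp-gateway` followed by `close 1006: abnormal closure (no Close frame received)`, i.e. the dataplane pod for `test/stunner-udp-gateway` keeps losing its watch connection without a clean close from the server.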

We are running v0.16.0 because our current GKE cluster doesn't support the v1 Gateway API. It looks like the CDS process has been rewritten in later versions, but we're wondering whether there are any workarounds in the meantime.

@rg0now rg0now added the type: bug Something isn't working label Mar 25, 2024
@rg0now
Member

rg0now commented Mar 25, 2024

Not that I know of. Managed mode was fairly new in v0.16, and over the last two releases we did an almost complete rewrite precisely to eliminate the instability you are experiencing.
I see two alternatives for now:

  • Revert to the unmanaged (legacy) dataplane mode: this was the default in v0.16 anyway, and it was rock solid at that point. This will most probably require a full reinstall, though: https://docs.l7mp.io/en/v0.16.0/INSTALL/#basic-installation. Compared to the now-default managed mode it's a massive step back, but at least it's super-reliable: I know of a lot of users who still run v0.16 precisely for this reason.
  • Move from GKE Autopilot to GKE Standard mode clusters and upgrade to STUNner v0.18. Standard-mode clusters do not auto-enable Google's own version of the Gateway API, so you can safely install v0.18 there (make sure to untick the Gateway API checkbox when provisioning the cluster). Depending on how much you rely on the Autopilot pricing model and how many services you already run in your cluster, this may be the better option for now to get the latest STUNner.

We're terribly sorry about this situation; we understand how unpleasant this state of affairs is for our users. We are at Google's mercy at this point: let's hope they upgrade to v1 quickly. The good news is that we're not alone.

@dbason
Author

dbason commented Mar 25, 2024

Thanks for the advice. I'm not sure whether #136 is related to this as well. I will move to standalone mode and confirm whether that resolves both issues.

@dbason
Author

dbason commented Apr 1, 2024

This appears to have resolved the issue, so I'm going to close it.

@dbason dbason closed this as completed Apr 1, 2024

2 participants