Fabio routing to wrong back end #421

Closed
craigday opened this issue Jan 18, 2018 · 12 comments

@craigday
Contributor

craigday commented Jan 18, 2018

We are experiencing multiple instances of Fabio routing requests to the wrong backend service. Once it starts happening it persists, typically until some change is made to the routing table by a service restart or similar. It's quite catastrophic because once it starts happening it's as if a whole bunch of URL/endpoints just disappear and start returning 404 not found because the requests are landing on a backend that doesn't serve them.

We have traced using tcpdump the requests coming in and out of Fabio and have proved beyond doubt that it is making the wrong routing decisions.

The attached dump shows a request coming into fabio for d2mx-prod-admin.dionglobal.com and then leaving fabio destined for a service on port 30398 but the routing table/consul indicates that the service on 30398 is analyser.sequoiadirect.com.au.

image002

image003

tcpro-tcpro-tcpro passing 172.25.135.14 30398 [urlprefix-analyser.sequoiadirect.com.au:9999/, urlprefix-analyser.boursedata.com.au:9999/, urlprefix-analyser.d2mx.com.au:9999/

We were initially running on fabio-1.5.0-go1.8.3-linux_amd64 but moved to fabio-1.5.4-go1.9.2-linux_amd64 to see if that fixed the problem, but it hasn't.

@magiconair
Contributor

magiconair commented Jan 18, 2018 via email

@craigday
Contributor Author

craigday commented Feb 2, 2018

Thanks Frank. What's the best way or format to capture the routing table? The table doesn't actually change very often, and AFAIK, the table as it is now will be the same as when the failure starts to occur.

@magiconair
Contributor

@craigday try curl localhost:9998/api/routes?raw or set log.routes.format = all

@magiconair
Contributor

@craigday Is this still an issue?

@craigday
Contributor Author

craigday commented Feb 8, 2018

Yep. Have sent you the routing table just now.

@craigday
Contributor Author

This hit us again this morning. Is there any further info we can provide? Can you enumerate any possible theories or code paths that might be suspect, so we can help with the analysis?

@magiconair
Contributor

Looking.

@magiconair
Contributor

@craigday I'm awake now and DM'ing you on Twitter.

@atillamas

atillamas commented Mar 19, 2018

Got hit by this as well running fabio 1.5.7

Requests to all our services started to get routed intermittently to our hashi-ui service, causing lots of weirdness. Stopping/purging the hashi-ui service and starting it again made the problem disappear for now.

@craigday
Contributor Author

@atillamas FYI we believe we know what is causing this issue. Websocket requests that fail to upgrade are left open and connected to their original back end. Nginx out front is pooling these connections and sending requests straight through to this backend, completely bypassing the Fabio routing. Our workaround, for now, was to isolate the websocket traffic onto a separate fabio cluster.

I believe Fabio should be detecting these failed upgrades and closing the connections.

magiconair added a commit that referenced this issue Mar 22, 2018
The websocket proxy is implemented as a raw tcp proxy
which relies on the client and server to close the
connection. When a websocket upgrade fails the upstream
server may keep the connection open.

If a proxy like nginx is used in front of fabio
it will keep its connection to fabio open effectively
establishing a direct channel between nginx and the
upstream server which will be used for any request
forwarded by nginx to fabio.

Adding a 'Connection: close' header to the upstream
request should indicate to the server to close the
connection. If that works then we can keep the raw
tcp proxy for websockets. Otherwise, fabio needs
to handle the handshake and close the connection
itself.

Fixes #421
@magiconair
Contributor

The way fabio currently handles websockets (via a raw tcp connection) makes detection a bit difficult. We could go back to a protocol proxy which relays WS messages instead. However, a first attempt is to inject a Connection: close header into the upgrade request, since with HTTP/1.1 all connections are persistent unless declared otherwise.

@magiconair
Contributor

@craigday Can you try that patch and see if it works? I'll add an integration test to simulate that behavior later.

magiconair added a commit that referenced this issue May 16, 2018
magiconair added this to the 1.5.9 milestone May 16, 2018