Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behavior when configured with listeners that fail to bind. #3395

Open
PiotrSikora opened this issue May 16, 2018 · 10 comments
Open
Labels

Comments

@PiotrSikora
Copy link
Contributor

Description:

Envoy successfully starts when configured with listeners on ports that it fails to bind to, which results in unexpected behavior (see below).

Repro steps:

Port :10001 fail - both connections are matched to the wildcard listener 0.0.0.0:10001, even though more specific listener 127.0.0.1:10001 exists (or at least, was configured):

$ curl 192.168.9.1:10001
This server is listening on *:10001
$ curl 127.0.0.1:10001
This server is listening on *:10001

Port :10002 fail - all connections to the wildcard listener 0.0.0.0:10002 are refused, because listener 127.0.0.1:10002 was configured first:

$ curl 192.168.9.1:10002
curl: (7) Failed to connect to 192.168.9.1 port 10002: Connection refused
$ curl 127.0.0.1:10002
This server is listening on 127.0.0.1:10002

Admin and Stats Output:

$ curl -s localhost:10000/listeners
["0.0.0.0:10001","127.0.0.1:10001","127.0.0.1:10002","0.0.0.0:10002"]
$ curl -s localhost:10000/stats | grep ^listener
listener.0.0.0.0_10001.downstream_cx_active: 0
listener.0.0.0.0_10001.downstream_cx_destroy: 2
listener.0.0.0.0_10001.downstream_cx_total: 2
listener.0.0.0.0_10001.http.whatever.downstream_rq_1xx: 0
listener.0.0.0.0_10001.http.whatever.downstream_rq_2xx: 2
listener.0.0.0.0_10001.http.whatever.downstream_rq_3xx: 0
listener.0.0.0.0_10001.http.whatever.downstream_rq_4xx: 0
listener.0.0.0.0_10001.http.whatever.downstream_rq_5xx: 0
listener.0.0.0.0_10002.http.whatever.downstream_rq_1xx: 0
listener.0.0.0.0_10002.http.whatever.downstream_rq_2xx: 0
listener.0.0.0.0_10002.http.whatever.downstream_rq_3xx: 0
listener.0.0.0.0_10002.http.whatever.downstream_rq_4xx: 0
listener.0.0.0.0_10002.http.whatever.downstream_rq_5xx: 0
listener.127.0.0.1_10001.http.whatever.downstream_rq_1xx: 0
listener.127.0.0.1_10001.http.whatever.downstream_rq_2xx: 0
listener.127.0.0.1_10001.http.whatever.downstream_rq_3xx: 0
listener.127.0.0.1_10001.http.whatever.downstream_rq_4xx: 0
listener.127.0.0.1_10001.http.whatever.downstream_rq_5xx: 0
listener.127.0.0.1_10002.downstream_cx_active: 0
listener.127.0.0.1_10002.downstream_cx_destroy: 1
listener.127.0.0.1_10002.downstream_cx_total: 1
listener.127.0.0.1_10002.http.whatever.downstream_rq_1xx: 0
listener.127.0.0.1_10002.http.whatever.downstream_rq_2xx: 1
listener.127.0.0.1_10002.http.whatever.downstream_rq_3xx: 0
listener.127.0.0.1_10002.http.whatever.downstream_rq_4xx: 0
listener.127.0.0.1_10002.http.whatever.downstream_rq_5xx: 0
listener.admin.downstream_cx_active: 1
listener.admin.downstream_cx_destroy: 1
listener.admin.downstream_cx_total: 2
listener.admin.http.admin.downstream_rq_1xx: 0
listener.admin.http.admin.downstream_rq_2xx: 1
listener.admin.http.admin.downstream_rq_3xx: 0
listener.admin.http.admin.downstream_rq_4xx: 0
listener.admin.http.admin.downstream_rq_5xx: 0
listener_manager.listener_added: 4
listener_manager.listener_create_failure: 2
listener_manager.listener_create_success: 16
listener_manager.listener_modified: 0
listener_manager.listener_removed: 0
listener_manager.total_listeners_active: 4
listener_manager.total_listeners_draining: 0
listener_manager.total_listeners_warming: 0
listener.0.0.0.0_10001.downstream_cx_length_ms: P0(nan,1) P25(nan,1.05) P50(nan,1.1) P75(nan,2.05) P90(nan,2.08) P95(nan,2.09) P99(nan,2.098) P99.9(nan,2.0998) P100(nan,2.1)
listener.127.0.0.1_10002.downstream_cx_length_ms: P0(nan,1) P25(nan,1.025) P50(nan,1.05) P75(nan,1.075) P90(nan,1.09) P95(nan,1.095) P99(nan,1.099) P99.9(nan,1.0999) P100(nan,1.1)
listener.admin.downstream_cx_length_ms: P0(nan,nan) P25(nan,nan) P50(nan,nan) P75(nan,nan) P90(nan,nan) P95(nan,nan) P99(nan,nan) P99.9(nan,nan) P100(nan,nan)

Config:

admin:
  access_log_path: "/dev/null"
  address:
    socket_address: { address: 127.0.0.1, port_value: 10000 }

static_resources:
  listeners:
  # :10001 - wildcard, localhost
  - name: wildcard_10001
    address:
      socket_address: { address: 0.0.0.0, port_value: 10001 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: whatever
          codec_type: AUTO
          route_config:
            virtual_hosts:
            - name: whatever
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                direct_response:
                  status: 200
                  body: { inline_string: "This server is listening on *:10001\n" }
          http_filters:
          - name: envoy.router
  - name: localhost_10001
    address:
      socket_address: { address: 127.0.0.1, port_value: 10001 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: whatever
          codec_type: AUTO
          route_config:
            virtual_hosts:
            - name: whatever
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                direct_response:
                  status: 200
                  body: { inline_string: "This server is listening on 127.0.0.1:10001\n" }
          http_filters:
          - name: envoy.router
  # :10002 - localhost, wildcard
  - name: localhost_10002
    address:
      socket_address: { address: 127.0.0.1, port_value: 10002 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: whatever
          codec_type: AUTO
          route_config:
            virtual_hosts:
            - name: whatever
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                direct_response:
                  status: 200
                  body: { inline_string: "This server is listening on 127.0.0.1:10002\n" }
          http_filters:
          - name: envoy.router
  - name: wildcard_10002
    address:
      socket_address: { address: 0.0.0.0, port_value: 10002 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: whatever
          codec_type: AUTO
          route_config:
            virtual_hosts:
            - name: whatever
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                direct_response:
                  status: 200
                  body: { inline_string: "This server is listening on *:10002\n" }
          http_filters:
          - name: envoy.router

Logs:

[info][main] source/server/server.cc:402] starting main dispatch loop
[debug][main] source/server/worker_impl.cc:91] worker entering dispatch loop
[critical][config] source/server/listener_manager_impl.cc:613] listener 'localhost_10001' failed to listen on address '127.0.0.1:10001' on worker
[debug][config] source/server/listener_manager_impl.cc:655] begin remove listener: name=localhost_10001
[debug][config] source/server/listener_manager_impl.cc:663] unknown/locked listener 'localhost_10001'. no remove
[critical][config] source/server/listener_manager_impl.cc:613] listener 'wildcard_10002' failed to listen on address '0.0.0.0:10002' on worker
[debug][config] source/server/listener_manager_impl.cc:655] begin remove listener: name=wildcard_10002
[debug][config] source/server/listener_manager_impl.cc:663] unknown/locked listener 'wildcard_10002'. no remove
@mattklein123
Copy link
Member

@PiotrSikora are you planning on working on this?

@PiotrSikora
Copy link
Contributor Author

Not anytime soon.

@mattklein123 mattklein123 added the help wanted Needs help! label May 18, 2018
@zyfjeff
Copy link
Member

zyfjeff commented Nov 11, 2018

@PiotrSikora Do you have plans to fix this? If not, I can handle this.

@PiotrSikora
Copy link
Contributor Author

@zyfjeff please do, thanks!

@abaptiste
Copy link
Contributor

@PiotrSikora, @mattklein123 If this is not being worked on, I can pick it up. I am able to reproduce this issue:

[2019-11-21 16:19:48.758][169][critical][config] [source/server/listener_manager_impl.cc:463] listener 'localhost_10001' failed to listen on address '127.0.0.1:10001' on worker
[2019-11-21 16:19:48.758][169][debug][config] [source/server/listener_manager_impl.cc:505] begin remove listener: name=localhost_10001
[2019-11-21 16:19:48.759][169][debug][config] [source/server/listener_manager_impl.cc:513] unknown/locked listener 'localhost_10001'. no remove
[2019-11-21 16:19:48.760][169][critical][config] [source/server/listener_manager_impl.cc:463] listener 'wildcard_10002' failed to listen on address '0.0.0.0:10002' on worker
[2019-11-21 16:19:48.760][169][debug][config] [source/server/listener_manager_impl.cc:505] begin remove listener: name=wildcard_10002
[2019-11-21 16:19:48.760][169][debug][config] [source/server/listener_manager_impl.cc:513] unknown/locked listener 'wildcard_10002'. no remove

I haven't yet looked closely at why 10001 works and 10002 does not. Right now I'm trying to establish what is the expected behavior.

@mattklein123
Copy link
Member

Right now I'm trying to establish what is the expected behavior.

I would have to look at this in a bunch of detail to figure out what is going on and what the expected behavior would be. Can you potentially analyze and propose something and then we can discuss?

@abaptiste
Copy link
Contributor

Will do. I'll update this issue once I have more substantial information.

@abaptiste
Copy link
Contributor

Tested this with two Envoy branches (1.12.2 and master), and found that Envoy appears to be working as expected. That said, there is a behavior difference when testing with containers.

When Envoy initializes listeners, the first listener to be created 'wins', so using the provided config:

  - name: wildcard_10001
    address:
      socket_address: { address: 0.0.0.0, port_value: 10001 }
  - name: localhost_10001
    address:
      socket_address: { address: 127.0.0.1, port_value: 10001 }

  - name: localhost_10002
    address:
      socket_address: { address: 127.0.0.1, port_value: 10002 }
  - name: wildcard_10002
    address:
      socket_address: { address: 0.0.0.0, port_value: 10002 }    

we end up with:

-bash-4.2# lsof -nP -i tcp:10001
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
envoy-sta  52 root   18u  IPv4  32231      0t0  TCP *:10001 (LISTEN)
-bash-4.2# lsof -nP -i tcp:10002
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
envoy-sta  52 root   20u  IPv4  29312      0t0  TCP 127.0.0.1:10002 (LISTEN)

Testing responses from the same host that is running Envoy, I see that:

  • curl to http://<any host address>:10001 returns a static response as expected.
  • curl to http://127.0.0.1:10002 returns a static response as expected
  • curl to http://<any global address>:10002 gets a connection refused message

If I run envoy in a container and expose both ports 10001 and 10002, from the host machine, only port 10001 works in this configuration since the container interface can respond on port 10001.

Since 127.0.0.1 is specific, the container interface will not respond on port 10002. Capturing inside the container shows the SYN/RST being received/sent on 172.17.0.2.

listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
23:05:42.536007 IP 172.17.0.1.43588 > 172.17.0.2.10002: Flags [S], seq 1864299755, win 29200, options [mss 1460,sackOK,TS val 6315184 ecr 0,nop,wscale 7], length 0
23:05:42.536038 IP 172.17.0.2.10002 > 172.17.0.1.43588: Flags [R.], seq 0, ack 1864299756, win 0, length 0

This makes sense since there is no socket bound on 172.17.0.2:10002.

Changing the order of the last pair of listeners 'works' since the first bound socket is the wildcard, which can work through the container interface.

I wrote a single threaded binary that mimics the same bind order with the same port and its behavior was identical to what I see with Envoy.

If we desire to have this configuration work where we can bind to 0.0.0.0:10002 and 127.0.0.1:10002, we will have to use SO_REUSEPORT when creating the server sockets, though I feel this will create more issues than it solves.

@mattklein123
Copy link
Member

Thanks for the detailed write-up @abaptiste, very nice! From my read of your write-up as well as the original issue, I think @PiotrSikora is probably asking for the config to fail to load in this case in a more obvious way? That would prevent any surprising behavior from ordering issues? @PiotrSikora WDYT?

@abaptiste
Copy link
Contributor

abaptiste commented Nov 26, 2019

I'm working on adding a command line option currently named --server-exit-on-bind-failure that will print a message identifying the address and port that we failed to bind to. The message will also indicate that Envoy is exiting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants