Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Publishing via POST sometimes timeouts when using multiple worker processes #509

Open
pbirkants opened this issue Jan 23, 2019 · 6 comments

Comments

@pbirkants
Copy link

When publishing via POST requests, it seems that some channel IDs work fine, but others do not, if the following conditions are met:

  • worker_processes 2; (or more)
  • Publisher uses both nchan_authorize_request and nchan_publisher_upstream_request

When publishing to the broken channels, subscribers receive the message immediately, the but the publisher's connection hangs until a timeout occurs, and when that happens, subscribers receive the same message again with a new ID. I've attached below a debug log, subscriber websocket output and a curl request log.

If I change the config to worker_processes 1; or remove either of the upstream requests, the issue disappears.

The ID values themselves don't seem to matter, I originally stumbled upon this when trying to use "public" together with numeric IDs, and some numbers would consistently work, while others consistently fail.

  • nginx version: nginx/1.14.0 (Ubuntu)
  • nchan version: 1.2.3
  • nginx.conf:
daemon off;
user www-data;
worker_processes 2;
pid /run/nginx.pid;

include /etc/nginx/modules-enabled/50-mod-nchan.conf;

events {
  worker_connections 768;
}

http {
  default_type application/octet-stream;

  access_log /dev/stdout;
  error_log /dev/stderr debug;


  server {
    listen 80 default_server;
    listen [::]:80 default_server;

    location = /listen {
      nchan_subscriber;

      nchan_channel_id public test foo bar;
    }

    location ~ /pub/(\w+)$ {
      nchan_publisher;

      nchan_channel_id $1;

      nchan_authorize_request /proxy/200;
      nchan_publisher_upstream_request /proxy/304;
    }

    location ~ /proxy/(\w+)$ {
        proxy_pass http://127.0.0.1:80/return/$1;
    }

    location = /return/403 {
        return 403;
    }

    location = /return/304 {
        return 304;
    }

    location = /return/200 {
        return 200 "foo";
    }
  }
}
$ for i in foo bar public test; do
  curl -v -d "$i@`date +%s`" -m5 http://bbs:8081/pub/$i
  echo; echo; echo; sleep 2
done
*   Trying 192.168.88.101...
* TCP_NODELAY set
* Connected to bbs (192.168.88.101) port 8081 (#0)
> POST /pub/foo HTTP/1.1
> Host: bbs:8081
> User-Agent: curl/7.63.0
> Accept: */*
> Content-Length: 14
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 14 out of 14 bytes
* Operation timed out after 5001 milliseconds with 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 5001 milliseconds with 0 bytes received



*   Trying 192.168.88.101...
* TCP_NODELAY set
* Connected to bbs (192.168.88.101) port 8081 (#0)
> POST /pub/bar HTTP/1.1
> Host: bbs:8081
> User-Agent: curl/7.63.0
> Accept: */*
> Content-Length: 14
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 14 out of 14 bytes
< HTTP/1.1 201 Created
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 23 Jan 2019 01:52:30 GMT
< Content-Type: text/plain
< Content-Length: 101
< Connection: keep-alive
< 
queued messages: 1
last requested: -1 sec. ago
active subscribers: 1
* Connection #0 to host bbs left intact
last message id: 1548208350:0


*   Trying 192.168.88.101...
* TCP_NODELAY set
* Connected to bbs (192.168.88.101) port 8081 (#0)
> POST /pub/public HTTP/1.1
> Host: bbs:8081
> User-Agent: curl/7.63.0
> Accept: */*
> Content-Length: 17
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 17 out of 17 bytes
* Operation timed out after 5000 milliseconds with 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 5000 milliseconds with 0 bytes received



*   Trying 192.168.88.101...
* TCP_NODELAY set
* Connected to bbs (192.168.88.101) port 8081 (#0)
> POST /pub/test HTTP/1.1
> Host: bbs:8081
> User-Agent: curl/7.63.0
> Accept: */*
> Content-Length: 15
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 15 out of 15 bytes
< HTTP/1.1 201 Created
< Server: nginx/1.14.0 (Ubuntu)
< Date: Wed, 23 Jan 2019 01:52:39 GMT
< Content-Type: text/plain
< Content-Length: 101
< Connection: keep-alive
< 
queued messages: 1
last requested: -1 sec. ago
active subscribers: 1
* Connection #0 to host bbs left intact
last message id: 1548208359:0
  • Subscriber:
$ wsta -I -e -H "Sec-WebSocket-Protocol: ws+meta.nchan" ws://bbs:8081/listen
WebSocket upgrade request
---
Host: bbs:8081
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: UuuelvTkLApQvwcOo5lS4Q==
Origin: http://bbs
Sec-WebSocket-Protocol:  ws+meta.nchan


WebSocket upgrade response
---
101 Switching Protocols
Server: nginx/1.14.0 (Ubuntu)
Date: Wed, 23 Jan 2019 01:52:01 GMT
Content-Length: 0
Connection: upgrade
Sec-WebSocket-Protocol: ws+meta.nchan
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: http://bbs
Sec-WebSocket-Accept: 0cqDCx103LXKzS8gIRV7gLYSdbQ=
Upgrade: websocket


Connected to ws://bbs:8081/listen
id: 1548208343:-,-,[0],-
content-type: application/x-www-form-urlencoded

foo@1548208343
id: 1548208348:-,-,[0],-
content-type: application/x-www-form-urlencoded

foo@1548208343
id: 1548208350:-,-,-,[0]
content-type: application/x-www-form-urlencoded

bar@1548208350
id: 1548208352:[0],-,-,-
content-type: application/x-www-form-urlencoded

public@1548208352
id: 1548208357:[0],-,-,-
content-type: application/x-www-form-urlencoded

public@1548208352
id: 1548208359:-,[0],-,-
content-type: application/x-www-form-urlencoded

test@1548208359
@slact
Copy link
Owner

slact commented Jan 23, 2019

Were there any worker crashes? I see nothing suspicious in the error log.

@pbirkants
Copy link
Author

Nope, no crashes.

I also tried the same scenario with the static nginx 1.10 + nchan 1.2.3 build available from the website, and got slightly different outcome - in this case the channel ID didn't matter, the failures are intermittent even when publishing to the same channel multiple times, whereas before it would either always succeed or always fail.

Please note that for the initial report I used a 5 second timeout for the curl requests. If I leave that out, the request stalls until send_timeout kicks in (60 seconds), curl outputs "Empty reply from server" and subscribers get the same message twice more 60 seconds after initial message.
I get this both with nginx 1.14 and with the static build.

Here are some more logs from static build + no timeout on client side:

Connected to ws://172.17.0.2:80/listen
id: 1548369334:-,[0],-,-
content-type: application/x-www-form-urlencoded

test @ 1548369334
id: 1548369394:-,[0],-,-
content-type: application/x-www-form-urlencoded

test @ 1548369334
id: 1548369394:-,[1],-,-
content-type: application/x-www-form-urlencoded

test @ 1548369334
id: 1548369396:-,[0],-,-
content-type: application/x-www-form-urlencoded

test @ 1548369396
id: 1548369398:-,[0],-,-
content-type: application/x-www-form-urlencoded

test @ 1548369398
id: 1548369458:-,[0],-,-
content-type: application/x-www-form-urlencoded

test @ 1548369398
id: 1548369458:-,[1],-,-
content-type: application/x-www-form-urlencoded

test @ 1548369398
$ for i in test test test; do curl -v -d "$i @ `date +%s`" http://172.17.0.2/pub/$i; sleep 2; done
*   Trying 172.17.0.2...
* TCP_NODELAY set
* Connected to 172.17.0.2 (172.17.0.2) port 80 (#0)
> POST /pub/test HTTP/1.1
> Host: 172.17.0.2
> User-Agent: curl/7.63.0
> Accept: */*
> Content-Length: 17
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 17 out of 17 bytes
* Empty reply from server
* Connection #0 to host 172.17.0.2 left intact
curl: (52) Empty reply from server
*   Trying 172.17.0.2...
* TCP_NODELAY set
* Connected to 172.17.0.2 (172.17.0.2) port 80 (#0)
> POST /pub/test HTTP/1.1
> Host: 172.17.0.2
> User-Agent: curl/7.63.0
> Accept: */*
> Content-Length: 17
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 17 out of 17 bytes
< HTTP/1.1 201 Created
< Server: nginx/1.10.1
< Date: Thu, 24 Jan 2019 22:36:36 GMT
< Content-Type: text/plain
< Content-Length: 101
< Connection: keep-alive
< 
queued messages: 4
last requested: -1 sec. ago
active subscribers: 1
* Connection #0 to host 172.17.0.2 left intact
last message id: 1548369396:0*   Trying 172.17.0.2...
* TCP_NODELAY set
* Connected to 172.17.0.2 (172.17.0.2) port 80 (#0)
> POST /pub/test HTTP/1.1
> Host: 172.17.0.2
> User-Agent: curl/7.63.0
> Accept: */*
> Content-Length: 17
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 17 out of 17 bytes
* Empty reply from server
* Connection #0 to host 172.17.0.2 left intact
curl: (52) Empty reply from server

@slact
Copy link
Owner

slact commented Apr 19, 2019

quick update: i'm on this, i can reproduce it, but the issue is going to take some deep digging inside Nginx guts. Will update when I have it figured out.

@mehdone
Copy link

mehdone commented Jun 23, 2019

I'm having the same issue using nginx 1.16 and nchan 1.2.5

Any updates?

@slact
Copy link
Owner

slact commented Jun 27, 2019

Please let me know if this is still happening in version 1.2.6

@edwh
Copy link

edwh commented Nov 14, 2019

I'm seeing this on nginx 1.17.5 and nchan 1.2.6, though I don't believe I'm using nchan_authorize_request and nchan_publisher_upstream_request. They're not in my nchan conf, anyway.

Changing the worker processes to 1 seems to help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants