HTTP Server Stops Responding to Any/All Requests #40724

DavidRusso · 2021-11-04T15:29:16Z

Version

14.18.1

Platform

IBM i 7.4

Subsystem

http

What steps will reproduce the bug?

The HTTP server can suddenly stop responding to any and all requests. When the server gets into this state it will remain listening and accepting client connections but will stop running the requestListener callback, so clients will suddenly stop getting responses to any request. When the server gets into this state, there is no output at stdout/stderr and the behavior persists for the duration of the process.

The problem is intermittent and it's not clear what causes it, but it seems to be triggered by certain network activity. The only way I can reproduce it on demand is by running a network vulnerability scan against the server using Nessus Essentials, and having the server running in multiple processes using cluster.

To be clear, I have seen the problem occur many times during normal use of the HTTP server without any network scans taking place. I have also seen it happen without cluster in play. This is just the only way I have found to reliably reproduce it.

To reproduce, run this simple server on IBM i:

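const cluster = require("cluster");
const http = require("http");

// Port is an assumption; 8080 matches the NETSTAT query later in this report.
const PORT = 8080;

if (cluster.isMaster) {
  // Primary process: fork a single worker that runs the HTTP server.
  cluster.fork();
}
else {
  // Worker process: answer every request with a plain-text "OK".
  const server = http.createServer((req, res) => {
    // This log line stops appearing once the server enters the bad state.
    console.log(new Date(), "request received");
    res.writeHead(
      200,
      {
        "Content-Type": "text/plain",
        "Cache-Control": "no-store"
      }
    );
    res.end("OK");
  });
  server.on("listening", () => {
    console.log("Listening on port", PORT);
  });
  server.listen(PORT);
}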

Then run a Basic Network Scan using Nessus Essentials:

https://www.tenable.com/downloads/nessus

I've been using Nessus 8.15.2 and running it on Windows 10. To set up the scan:

1. Click "New Scan".
2. Choose the Basic Network Scan.
3. Enter your Node HTTP server IP under Targets.
4. Click the Discovery link on the left, and set the Scan Type to Custom.
5. Click on the Port Scanning link and set the Port Scan Range to the Node HTTP server port.
6. Click on the Host Discovery link and set Ping the Remote Host to OFF.
7. Leave all other settings at default.
8. Launch the scan and wait for it to complete.

Once the scan is complete, the server will stop responding to any requests. The requestListener callback won't run, as shown by lack of console output, and the client will never get a response. This behavior will persist for the duration of the server process.
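To check for the bad state without watching the console, a small client-side probe can help. The following is a minimal sketch, not part of the original report: the `probe` helper and its parameters are hypothetical names, and it assumes that "no response headers within the timeout" is a good enough signal that the server is hung.

```javascript
const http = require("http");

// Hypothetical helper: resolves to true if the server sends response headers
// within timeoutMs, and to false on timeout or on any connection error.
function probe(host, port, timeoutMs) {
  return new Promise((resolve) => {
    const req = http.get(
      // agent: false avoids reusing a pooled keep-alive socket between probes.
      { host, port, path: "/", agent: false, timeout: timeoutMs },
      (res) => {
        res.resume(); // discard the body; only the arrival of a response matters
        resolve(true);
      }
    );
    // The socket went idle for timeoutMs: abort, which surfaces as an "error" event.
    req.on("timeout", () => req.destroy(new Error("probe timed out")));
    req.on("error", () => resolve(false));
  });
}
```

For example, `probe("127.0.0.1", 8080, 5000).then((ok) => console.log(ok ? "responding" : "no response"))` could be run before and after the scan. In the hung state described above, the connection is accepted but the probe should resolve to `false` once the timeout elapses.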

How often does it reproduce? Is there a required condition?

The steps above will reproduce the problem every time.

What is the expected behavior?

The server should continue responding to requests.

What do you see instead?

The server inexplicably stops responding to requests for the duration of the process.

Additional information

I have been able to reproduce this problem reliably on IBM i 7.4 and 7.2. I haven't yet tried on 7.3, but I have had users of my package report what I think is the same problem on IBM i 7.3. I have never seen this happen with Node running on other platforms.

IBM i NETSTAT reports everything normal while the server is in the bad state. For example, if I run this query while trying some requests:

select                       
  tcp_state, count(tcp_state)
from                         
  qsys2.netstat_info         
where                        
  local_port = 8080          
group by                     
  tcp_state                  
order by                     
  tcp_state

I get this output:

TCP           COUNT ( TCP_STATE )
State                            
ESTABLISHED                  2   
LISTEN                       2

Running a Wireshark trace on the client side while using Chrome to make a request also looks normal. The TCP 3-way connection handshake completes normally, and the server also ACKs the HTTP request frame. Then Chrome just waits and waits for the response that never comes. Meanwhile it sends TCP keep-alive probes to the server, and the server ACKs them as expected.

When the server process is in this state, the IBM i active job status and callstack look normal. The active job status is SELW (select wait) and the callstack shows the process is waiting on I/O via a poll() call. This state is identical to when the server is responding to requests normally. Also, there is nothing in the IBM i job log.

This is a serious stability issue for Node.js on IBM i. As I mentioned above, I have seen this happen without any network scans going on, and without cluster in play. Use of cluster seems to exacerbate the problem.

richardlau · 2021-11-04T15:44:51Z

cc @nodejs/platform-ibmi

ThePrez · 2021-11-09T19:50:02Z

We have been able to recreate on an IBM i 7.4 system. Will look into.

ThePrez · 2022-02-16T16:07:10Z

@DavidRusso , as you can see via the PR to libuv, we think we have this one figured out. We'll be building patches for the IBM repository shortly. I'll post here when they're published, and I would appreciate it if you could do some verification.
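The libuv PR linked from this issue is titled "Aix, ibmi: Handle server hang when remote sends TCP RST", which suggests the trigger is a peer aborting a connection with a reset rather than a normal close. As a rough sketch only, a client could produce that kind of abort as shown below. The `connectAndReset` helper is a hypothetical name, it assumes the client runs on a Node.js release that provides `socket.resetAndDestroy()` (newer than the 14.18.1 server in this report), and whether a single reset like this is enough to trigger the hang on IBM i is untested.

```javascript
const net = require("net");

// Hypothetical helper: connect, then abort the connection with a TCP RST
// instead of the usual FIN-based close. Resolves once the socket is closed.
function connectAndReset(host, port) {
  return new Promise((resolve, reject) => {
    const socket = net.connect({ host, port }, () => {
      // Sends an RST to the peer and destroys the local socket.
      socket.resetAndDestroy();
    });
    socket.on("error", reject); // e.g. connection refused
    socket.on("close", () => resolve());
  });
}
```

Calling `connectAndReset(host, port)` against the test server and then sending an ordinary request would show whether the server still responds afterwards.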

DavidRusso · 2022-02-16T16:48:33Z

@ThePrez , that's great news, thanks! Yes, I'll be glad to test/verify.

DavidRusso · 2022-09-16T14:22:59Z

Hi. Are the patches released yet in the IBM i repos? If so, what version(s) include the fix?

V-for-Vasili · 2022-09-28T15:13:25Z

Hello @DavidRusso - apologies for the delay.

There is a thread and a PR with ongoing discussion about bumping the libuv version in Node.js to 1.44.2 (which is the earliest libuv version that includes the fix for this issue). Once that lands, Node 18 should have the fix, but I am not sure whether the libuv version change will end up backported into earlier Node versions.
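Since the fix is tied to a libuv version, a running Node.js binary can report which libuv it bundles via `process.versions.uv`. The following is a minimal sketch of that check: the helper names are hypothetical, and it assumes a plain dotted version string. It cannot detect a fix that a distributor has patched into an older libuv, so a "predates the fix" result is only a hint.

```javascript
// Earliest libuv version said in this thread to include the fix.
const FIXED_LIBUV = "1.44.2";

// Compare dotted numeric versions; returns a negative, zero, or positive number.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    // Missing or non-numeric components are treated as 0.
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// True if the given libuv version (default: the running one) is at least 1.44.2.
function hasLibuvFix(uvVersion = process.versions.uv) {
  return compareVersions(uvVersion, FIXED_LIBUV) >= 0;
}

console.log(
  `libuv ${process.versions.uv}:`,
  hasLibuvFix() ? "includes the fix" : "predates the fix"
);
```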

richardlau added the "ibm i" label (Issues and PRs related to the IBM i platform) Nov 4, 2021

Mesteery added the "http" label (Issues or PRs related to the http subsystem) Nov 4, 2021

V-for-Vasili mentioned this issue Feb 15, 2022: Aix, ibmi: Handle server hang when remote sends TCP RST libuv/libuv#3481 (Closed)

ThePrez mentioned this issue Feb 15, 2022: Aix, ibmi: Handle server hang when remote sends TCP RST libuv/libuv#3482 (Merged)

V-for-Vasili mentioned this issue Sep 28, 2022: deps: upgrade to libuv 1.44.2 #42340 (Closed)
