HTTP Server Stops Responding to Any/All Requests #40724

Open
DavidRusso opened this issue Nov 4, 2021 · 6 comments
Labels
http: Issues or PRs related to the http subsystem.
ibm i: Issues and PRs related to the IBM i platform.

Comments

DavidRusso commented Nov 4, 2021

Version

14.18.1

Platform

IBM i 7.4

Subsystem

http

What steps will reproduce the bug?

The HTTP server can suddenly stop responding to any and all requests. When the server gets into this state it will remain listening and accepting client connections but will stop running the requestListener callback, so clients will suddenly stop getting responses to any request. When the server gets into this state, there is no output at stdout/stderr and the behavior persists for the duration of the process.

The problem is intermittent and it's not clear what causes it, but it seems to be triggered by certain network activity. The only way I can reproduce it on demand is by running a network vulnerability scan against the server using Nessus Essentials, and having the server running in multiple processes using cluster.

To be clear, I have seen the problem occur many times during normal use of the HTTP server without any network scans taking place. I have also seen it happen without cluster in play. This is just the only way I have found to reliably reproduce it.

To reproduce, run this simple server on IBM i:

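const cluster = require("cluster");
const http = require("http");

// The port is an assumption; 8080 matches the NETSTAT query later in this report.
const PORT = 8080;

if (cluster.isMaster) {
  // Primary process: fork a single worker that runs the HTTP server.
  cluster.fork();
}
else {
  // Worker process: log each request and answer with a plain-text "OK".
  const server = http.createServer((req, res) => {
    console.log(new Date(), "request received");
    res.writeHead(
      200,
      {
        "Content-Type": "text/plain",
        "Cache-Control": "no-store"
      }
    );
    res.end("OK");
  });
  server.on("listening", () => {
    console.log("Listening on port", PORT);
  });
  server.listen(PORT);
}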

Then run a Basic Network Scan using Nessus Essentials:

https://www.tenable.com/downloads/nessus

I've been using Nessus 8.15.2, running on Windows 10. To set up the scan:

  1. Click "New Scan"

  2. Choose the Basic Network Scan

  3. Enter your Node HTTP server IP under Targets

  4. Click the Discovery link on the left, and set the Scan Type to Custom

  5. Click on the Port Scanning link and set the Port Scan Range to the Node HTTP server port

  6. Click on the Host Discovery link and set Ping the Remote Host to OFF

  7. Leave all other settings at default

  8. Launch scan and wait for it to complete

Once the scan is complete, the server will stop responding to any requests. The requestListener callback won't run, as shown by lack of console output, and the client will never get a response. This behavior will persist for the duration of the server process.
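
To check for the hung state without watching a browser spin, a small probe with a timeout can help. This is only a sketch: the `probe` helper, its return strings, and the 5-second timeout in the usage note are illustrative assumptions and not part of the original reproduction. It relies only on the standard `http.get`, `req.setTimeout`, and `req.destroy` calls.

```javascript
const http = require("http");

// Hypothetical helper (not part of the original report).
// Resolves with "responsive" if a full response arrives,
// "hung" if the socket stays idle for timeoutMs after the request is sent,
// or "error: <code>" if the connection itself fails.
function probe(host, port, timeoutMs) {
  return new Promise((resolve) => {
    // agent: false uses a one-off connection, so each probe opens a new socket.
    const req = http.get({ host, port, path: "/", agent: false }, (res) => {
      res.resume(); // discard the body; only completion matters here
      res.on("end", () => resolve("responsive"));
    });

    // Fires when the socket has been idle for timeoutMs.
    req.setTimeout(timeoutMs, () => {
      resolve("hung");
      req.destroy(); // the resulting 'error' event is ignored (already resolved)
    });

    req.on("error", (err) => resolve("error: " + err.code));
  });
}
```

For example, `probe("127.0.0.1", 8080, 5000).then(console.log)` run after the scan should print "hung" if the server is in the bad state described above: the TCP connection is accepted, but the requestListener never runs.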

How often does it reproduce? Is there a required condition?

The steps above will reproduce the problem every time.

What is the expected behavior?

The server should continue responding to requests.

What do you see instead?

The server inexplicably stops responding to requests for the duration of the process.

Additional information

I have been able to reproduce this problem reliably on IBM i 7.4 and 7.2. I haven't yet tried on 7.3, but I have had users of my package report what I think is the same problem on IBM i 7.3. I have never seen this happen with Node running on other platforms.

IBM i NETSTAT reports everything normal while the server is in the bad state. For example, if I run this query while trying some requests:

select                       
  tcp_state, count(tcp_state)
from                         
  qsys2.netstat_info         
where                        
  local_port = 8080          
group by                     
  tcp_state                  
order by                     
  tcp_state        

I get this output:

TCP           COUNT ( TCP_STATE )
State                            
ESTABLISHED                  2   
LISTEN                       2   

Running a Wireshark trace on the client side while using Chrome to make a request also looks normal. The TCP 3-way handshake completes normally, and the server also ACKs the HTTP request frame. Then Chrome just waits and waits for the response that never comes. Meanwhile it sends TCP keep-alive probes to the server, and the server ACKs them as expected.

When the server process is in this state, the IBM i active job status and call stack look normal. The active job status is SELW (select wait) and the call stack shows the process waiting on I/O in a poll() call. This state is identical to when the server is responding to requests normally. Also, there is nothing in the IBM i job log.

This is a serious stability issue for Node.js on IBM i. As I mentioned above, I have seen this happen without any network scans going on, and without cluster in play. Use of cluster seems to exacerbate the problem.

@richardlau
Member

cc @nodejs/platform-ibmi

richardlau added the "ibm i" label Nov 4, 2021
Mesteery added the "http" label Nov 4, 2021
@ThePrez
Contributor

ThePrez commented Nov 9, 2021

We have been able to recreate on an IBM i 7.4 system. Will look into.

@ThePrez
Contributor

ThePrez commented Feb 16, 2022

@DavidRusso, as you can see via the PR to libuv, we think we have this one figured out. We'll be building patches for the IBM repository shortly. I'll post here when they're published, and I would appreciate it if you could do some verification.

@DavidRusso
Author

@ThePrez, that's great news, thanks! Yes, I'll be glad to test/verify.

@DavidRusso
Author

Hi. Are the patches released yet in the IBM i repos? If so, what version(s) include the fix?

@V-for-Vasili
Contributor

Hello @DavidRusso - apologies for the delay.

There is a thread and a PR with ongoing discussion about bumping the libuv version in Node.js to 1.44.2 (the earliest libuv version that includes the fix for this issue). Once that lands, Node 18 should have the fix, but I am not sure whether the libuv version change will be backported to earlier Node versions.
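
For anyone wanting to check a given Node.js build, here is a minimal sketch that compares the bundled libuv version (exposed as `process.versions.uv`) against 1.44.2. The `versionAtLeast` helper is hypothetical and written for illustration. The check looks only at the reported version number, so it cannot tell whether a distribution has applied the fix as a patch on top of an older libuv.

```javascript
// Hypothetical helper: compares dotted numeric versions.
// Returns true when `actual` is greater than or equal to `minimum`.
// Missing or non-numeric components are treated as 0.
function versionAtLeast(actual, minimum) {
  const a = actual.split(".").map(Number);
  const m = minimum.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, m.length); i++) {
    const x = a[i] || 0;
    const y = m[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // all components equal
}

// 1.44.2 is the libuv version named in the comment above.
const FIXED_LIBUV = "1.44.2";
const bundled = process.versions.uv;

console.log(
  "bundled libuv " + bundled + ":",
  versionAtLeast(bundled, FIXED_LIBUV)
    ? "at or above " + FIXED_LIBUV
    : "older than " + FIXED_LIBUV
);
```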

5 participants