TCP_KEEPIDLE value set by socket.setKeepAlive() is ignored on first keepalive probe #38445

Open
DavidRusso opened this issue Apr 27, 2021 · 2 comments
Labels
ibm i Issues and PRs related to the IBM i platform. net Issues and PRs related to the net subsystem.

DavidRusso commented Apr 27, 2021

  • Version: v14.16.0
  • Platform: IBM i 7.2
  • Subsystem: net

What steps will reproduce the bug?

I have only observed this problem on IBM i 7.2. I'm not sure if it happens on other platforms -- I haven't tried it.

Create a simple server program like this:

const http = require("http");

const server = http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("OK");
});
server.on("connection", socket => {
  socket.setKeepAlive(true, 60000);
});
server.listen(process.env["PORT"] || 8080);

Then use a simple client program like this to connect to it while running a packet capture, so that you can see when TCP keepalive probes are sent:

const net = require("net");

const socket = net.createConnection({
  host: "your_IBMi",
  port: process.env["PORT"] || 8080
});

The first keepalive probe should be sent once the connection has been idle for the time passed to socket.setKeepAlive(). Instead, the first probe is sent after the system-wide default idle time, which is the TCPKEEPALV value set with IBM i's CHGTCPA command. Subsequent keepalive probes are then sent at the correct interval.

For example, if you set the system-wide default to 5 minutes using this command:

CHGTCPA TCPKEEPALV(5)

And then run the test above, the first keepalive probe will be sent after 5 minutes, and subsequent probes will be sent at 1-minute intervals.
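
To make the units in this example explicit, here is a minimal sketch of the timeline described above. The function names are hypothetical. It assumes the millisecond value passed to socket.setKeepAlive() is converted to whole seconds by truncating division, and that the peer answers every probe so the idle timer restarts each time.

```c
#include <assert.h>
#include <stddef.h>

/* socket.setKeepAlive() takes milliseconds; TCP_KEEPIDLE takes seconds.
   Truncating division is an assumption about how the value is converted. */
unsigned int node_delay_ms_to_s(unsigned int delay_ms) {
  return delay_ms / 1000u;
}

/* CHGTCPA TCPKEEPALV is given in minutes, as in the example above. */
unsigned int tcpkeepalv_min_to_s(unsigned int minutes) {
  return minutes * 60u;
}

/* Fill out[0..count-1] with the elapsed seconds at which each probe is seen,
   given the delay before the first probe and the delay before each later one.
   Assumes every probe is answered, so the idle timer restarts after each. */
void probe_times_s(unsigned int first_s, unsigned int later_s,
                   unsigned int *out, size_t count) {
  assert(out != NULL || count == 0);
  unsigned int elapsed = 0;
  for (size_t i = 0; i < count; i++) {
    elapsed += (i == 0) ? first_s : later_s;
    out[i] = elapsed;
  }
}
```

With TCPKEEPALV(5) and socket.setKeepAlive(true, 60000), calling probe_times_s(tcpkeepalv_min_to_s(5), node_delay_ms_to_s(60000), out, 3) gives probes at 300, 360 and 420 seconds, where 60, 120 and 180 seconds would be expected.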

Additional information

The problem relates to the order of the setsockopt() calls used to enable TCP keepalive and to set the idle time. I've found that, on IBM i, coding like this produces the problem I'm describing here:

int enable = 1;
unsigned int tcpKeepIdle = 60;
setsockopt(client_fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable));
setsockopt(client_fd, IPPROTO_TCP, TCP_KEEPIDLE, &tcpKeepIdle, sizeof(tcpKeepIdle));

However, if TCP_KEEPIDLE is set first, then it works as expected:

setsockopt(client_fd, IPPROTO_TCP, TCP_KEEPIDLE, &tcpKeepIdle, sizeof(tcpKeepIdle));
setsockopt(client_fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable));
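
For anyone who wants to switch between the two orderings while testing, the snippets above can be wrapped in one helper. This is only a sketch: enable_keepalive() and the keepalive_order enum are hypothetical names, and the helper uses an int for the idle time rather than unsigned int.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Which option to set first. Hypothetical names, for this sketch only. */
enum keepalive_order { IDLE_FIRST, KEEPALIVE_FIRST };

/* Enable TCP keepalive on fd with the given idle time in seconds.
   Returns 0 on success, or -1 if a setsockopt() call fails (errno is set). */
int enable_keepalive(int fd, int idle_s, enum keepalive_order order) {
  int enable = 1;
  assert(idle_s > 0);

  /* Ordering reported to show the problem on IBM i. */
  if (order == KEEPALIVE_FIRST &&
      setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) != 0)
    return -1;

  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0)
    return -1;

  /* Ordering reported to work as expected on IBM i. */
  if (order == IDLE_FIRST &&
      setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) != 0)
    return -1;

  return 0;
}
```

Calling enable_keepalive(client_fd, 60, IDLE_FIRST) corresponds to the working order, and KEEPALIVE_FIRST corresponds to the order that shows the problem.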

Reversing the setsockopt() calls in this code would fix the problem:

@richardlau (Member)

cc @nodejs/platform-ibmi

Ayase-252 added the ibm i and net labels Apr 28, 2021

@DavidRusso (Author)

I tested this on Windows 10/x64 and Ubuntu/IBM POWER and things work correctly on those platforms. So, the problem does seem to be specific to IBM i.

I also tested on IBM i 7.4 and found that the problem happens the same way there as on 7.2.

Another interesting thing that I found is that the initialDelay value passed to socket.setKeepAlive() is only ignored for the first keepalive probe when it is less than the system-wide default...

For example, if the IBM i system default TCP keepalive time is set to 5 minutes like this:

CHGTCPA TCPKEEPALV(5)

Then values like this are not ignored on the first keepalive probe:

socket.setKeepAlive(true, 360000);
// Or...
socket.setKeepAlive(true, 420000);

But values like this are ignored on the first probe:

socket.setKeepAlive(true, 240000);
// Or...
socket.setKeepAlive(true, 60000);

That is really strange, but in either case, setting TCP_KEEPIDLE before SO_KEEPALIVE seems to make things work correctly.
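
The observations above can be summarized as a small model. This is a sketch of what was seen in the packet captures, not of how IBM i implements it: first_probe_delay_s() is a hypothetical name, and the rule only covers the cases tested here.

```c
#include <assert.h>
#include <stdbool.h>

/* Delay in seconds before the first keepalive probe, as observed on IBM i.
   requested_s:      idle time set with TCP_KEEPIDLE
   system_default_s: system-wide default (TCPKEEPALV, converted to seconds)
   keepalive_first:  true if SO_KEEPALIVE was set before TCP_KEEPIDLE */
unsigned int first_probe_delay_s(unsigned int requested_s,
                                 unsigned int system_default_s,
                                 bool keepalive_first) {
  assert(requested_s > 0 && system_default_s > 0);

  /* Only a requested value below the system default is ignored, and only
     when SO_KEEPALIVE is enabled before TCP_KEEPIDLE is set. */
  if (keepalive_first && requested_s < system_default_s)
    return system_default_s;

  return requested_s;
}
```

With a 300 second system default and SO_KEEPALIVE set first, requested values of 60 and 240 seconds both give 300, while 360 and 420 are returned unchanged. With TCP_KEEPIDLE set first, the requested value is always returned.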

If it helps, here is a simple socket server program that can be used to test the behavior. This code can be compiled with GCC in PASE for IBM i:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char * argv[]) {

  int server_fd, client_fd;
  struct sockaddr_in address;
  socklen_t addressLen = sizeof(address);
  uint16_t port = 8080;
  int enable = 1;
  unsigned int tcpKeepIdle = 60;  /* seconds */
  
  server_fd = socket(AF_INET, SOCK_STREAM, 0);
  if (server_fd == -1) {
    perror("socket()");
    exit(EXIT_FAILURE);
  }
  
  if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable)) != 0) {
    perror("setsockopt()");
    exit(EXIT_FAILURE);
  }
  
  if (getenv("PORT") != NULL)
    port = atoi(getenv("PORT"));
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = INADDR_ANY;
  address.sin_port = htons(port);
  if (bind(server_fd, (struct sockaddr *) &address, sizeof(address)) != 0) {
    perror("bind()");
    exit(EXIT_FAILURE);
  }
  
  if (listen(server_fd, 5) != 0) {
    perror("listen()");
    exit(EXIT_FAILURE);
  }
  
  printf("Server listening on *:%u\n", port);
  
  client_fd = accept(server_fd, (struct sockaddr *) &address, (socklen_t *) &addressLen);
  if (client_fd == -1) {
    perror("accept()");
    exit(EXIT_FAILURE);
  }

  printf("Client connected.\n");
  
  if (setsockopt(client_fd, IPPROTO_TCP, TCP_KEEPIDLE, &tcpKeepIdle, sizeof(tcpKeepIdle)) != 0) {
    perror("setsockopt()");
    exit(EXIT_FAILURE);
  }
  if (setsockopt(client_fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) != 0) {
    perror("setsockopt()");
    exit(EXIT_FAILURE);
  }
  
  printf("SO_KEEPALIVE enabled, TCP_KEEPIDLE set to %u\n", tcpKeepIdle);
  printf("Waiting 20 minutes\n");
  
  sleep(1200);

  close(client_fd);
  close(server_fd);

  return 0;
  
}

This code is in the working state, with TCP_KEEPIDLE set before SO_KEEPALIVE. If you change it so that SO_KEEPALIVE is set first (as in uv__tcp_keepalive()), then the problem occurs.

FWIW, I also tested with a program like the one above, but running in the native IBM ILE environment, and found that the problem is the same. It seems that IBM i is just a bit quirky here...
