Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New failure on IBMI - test.sequential/test-tls-psk-client #42152

Closed
mhdawson opened this issue Feb 28, 2022 · 16 comments
Closed

New failure on IBMI - test.sequential/test-tls-psk-client #42152

mhdawson opened this issue Feb 28, 2022 · 16 comments
Labels
ibm i Issues and PRs related to the IBM i platform.

Comments

@mhdawson
Copy link
Member

Version

master

Platform

IBMi

Subsystem

crypto

What steps will reproduce the bug?

Run the test as part of the test suite - https://ci.nodejs.org/job/node-test-commit-ibmi/nodes=ibmi73-ppc64/680/testReport/(root)/test/sequential_test_tls_psk_client_/

How often does it reproduce? Is there a required condition?

100% in nightly CI run.

What is the expected behavior?

Test passes

What do you see instead?

Error Message
fail (-15)
Stacktrace
timeout
Failed: Timed out

Additional information

No response

@mhdawson
Copy link
Member Author

mhdawson commented Feb 28, 2022

May be related to other processes running on the machine:

-bash: top: command not found
-bash-5.1$ ps -ef |grep node
iojs 25693163 25693162 0 Jan 21 - 25:30 /QOpenSys/pkgs/lib/jvm/openjdk-11/bin/java -Xmx128m -Dos400.stdio.convert=N -Djava.net.preferIPv4Stack=true -jar /home/iojs/slave.jar -secret ... -jnlpUrl https://ci.nodejs.org/computer/test-iinthecloud-ibmi73-ppc64_be-1/slave-agent.jnlp
iojs 29897356 1 0 Feb 24 - 0:00 /home/IOJS/build/workspace/libuv-test-commit-ibmi-cmake/BUILDTYPE/Release/nodes/ibmi73-ppc64/out/cmake/uv_run_tests spawn_helper4
iojs 29897414 1 0 Feb 24 - 0:00 /home/IOJS/build/workspace/libuv-test-commit-ibmi-cmake/BUILDTYPE/Release/nodes/ibmi73-ppc64/out/cmake/uv_run_tests spawn_helper4
iojs 29897456 1 0 Feb 24 - 0:00 /home/IOJS/build/workspace/libuv-test-commit-ibmi-cmake/BUILDTYPE/Release/nodes/ibmi73-ppc64/out/cmake/uv_run_tests spawn_helper4
iojs 30278207 1 0 Feb 27 - 0:00 /home/IOJS/build/workspace/libuv-test-commit-ibmi-cmake/BUILDTYPE/Release/nodes/ibmi73-ppc64/out/cmake/uv_run_tests spawn_helper4
nodejs 30538888 30538887 0 13:33:44 pts/0 0:00 -bash
nodejs 30538941 30538888 0 13:37:51 pts/0 0:00 /QOpenSys/usr/bin/ssh -q -t -x IOJS@127.0.0.1 cd /home/IOJS/build/workspace/node-test-commit-ibmi/nodes/ibmi73-ppc64 && exec $SHELL
iojs 30538951 30538944 0 13:38:20 pts/1 0:00 out/Release/node /home/IOJS/build/workspace/node-test-commit-ibmi/nodes/ibmi73-ppc64/test/sequential/test-t

Those libuv ones seem to be from the same day that the test started failing:

@mhdawson
Copy link
Member Author

Still timed out after killing those other processes:

bash-5.1$ tools/test.py sequential/test-tls-psk-client
=== release test-tls-psk-client ===                   
Path: sequential/test-tls-psk-client
Failed: Timed out
Command: out/Release/node /home/IOJS/build/workspace/node-test-commit-ibmi/nodes/ibmi73-ppc64/test/sequential/test-tls-psk-client.js
--- TIMEOUT ---
[02:00|% 100|+   0|-   1]: Done             

@mhdawson
Copy link
Member Author

Times out fairly quickly when run manually (5 sec is set in test itself):

ash-5.1$ ./node --version
v18.0.0-pre
bash-5.1$ ./node test/sequential/test-tls-psk-client.js 
Failed: Timed out

@mhdawson
Copy link
Member Author

Error seems to failure to connect

Error: connect ECONNREFUSED 127.0.0.1:12346
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1187:16) {
  errno: -79,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 12346
}
^C
bash-5.1$ !vi
vi test/sequential/test-tls-psk-client.js
bash-5.1$ ./node test/sequential/test-tls-psk-client.js 
Error: connect ECONNREFUSED 127.0.0.1:12346
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1187:16) {
  errno: -79,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 12346
}
Failed: Timed out
Error: connect ECONNREFUSED 127.0.0.1:12346
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1187:16) {
  errno: -79,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 12346
}

@mhdawson
Copy link
Member Author

Added some instrumentation to print stdout and stderr from the

const server = spawn(common.opensslCli, [
  's_server',
  '-accept', common.PORT,
  '-cipher', CIPHERS,
  '-psk', KEY,
  '-psk_hint', IDENTITY,
  '-nocert',
  '-rev',
]);
./node test/sequential/test-tls-psk-client.js 
Using default temp DH parameters

   0 items in the session cache
   0 client connects (SSL_connect())
   0 client renegotiates (SSL_connect())
   0 client connects that finished
   0 server accepts (SSL_accept())
   0 server renegotiates (SSL_accept())
   0 server accepts that finished
   0 session cache hits
   0 session cache misses
   0 session cache timeouts
   0 callback cache hits
   0 cache full overflows (128 allowed)

00000001:error:8000002A:system library:BIO_listen:Protocol driver not attached:../deps/openssl/openssl/crypto/bio/bio_sock2.c:277:calling setsockopt()
00000001:error:10000088:BIO routines:BIO_listen:listen v6 only:../deps/openssl/openssl/crypto/bio/bio_sock2.c:279:

@richardlau richardlau added the ibm i Issues and PRs related to the IBM i platform. label Feb 28, 2022
@mhdawson
Copy link
Member Author

Looks like the test tries to use the OpenSSL client but that the client is not being able to connect?

@ThePrez, @V-for-Vasili any ideas on what might be up on the machine?

@richardlau
Copy link
Member

May be related to other processes running on the machine:

Ah that may have been why the IBM i CI has been failing.
https://ci.nodejs.org/job/node-test-commit-ibmi/nodes=ibmi73-ppc64/679/ failed the same set of tests as https://ci.nodejs.org/job/node-test-commit-ibmi/nodes=ibmi73-ppc64/680/ and subsequent builds but was marked overall flaky -- the difference between that and the ones that failed was the leftover libuv processes.

@mhdawson
Copy link
Member Author

@richardlau thanks, I guess we'll see tonight if that clean has resolved the CI issue even though I still see it failing locally (ie it is marked as flaky and the stray processes made it fail in a way that caused the CI to fail anyway).

@richardlau
Copy link
Member

FWIW this bit of test-ci fails if leftover processes are detected

node/Makefile

Lines 532 to 537 in b3723fa

$(info Clean up any leftover processes, error if found.)
ps awwx | grep Release/node | grep -v grep | cat
@PS_OUT=`ps awwx | grep Release/node | grep -v grep | awk '{print $$1}'`; \
if [ "$${PS_OUT}" ]; then \
echo $${PS_OUT} | xargs kill -9; exit 1; \
fi

Unfortunately the paths of the leftover libuv tests, /home/IOJS/build/workspace/libuv-test-commit-ibmi-cmake/BUILDTYPE/Release/nodes/ibmi73-ppc64/out/cmake/uv_run_tests, contains the Release/node substring.

@V-for-Vasili
Copy link
Contributor

Just seen it pass on my machine; I remember seeing ECONNREFUSED from it before. The difference might be using openssl from yum vs. deps/openssl. Just a guess - need to take a better look

@ThePrez
Copy link
Contributor

ThePrez commented Mar 1, 2022

A couple notes about "protocol driver not attached":

  1. I think restarting TCP without IPv6 support will circumvent this issue
  2. In nginx, I needed to patch it so that nginx handles an EUNATCH from the call to accept(). I suspect libuv needs the same kind of change (probably src/unix/stream.c)

The EUNATCH is actually working as designed. On IBM i, there is the ability to have a protocol activated (in this case IPv6) with no active, associated device. This is actually the reason we start the slave jar with -Djava.net.preferIPv4Stack=true.

@ThePrez
Copy link
Contributor

ThePrez commented Mar 1, 2022

I think I have that backwards. On IBM i, we can have an active interface with no active protocol behind it. I've stopped the IPv6 loopback and I think this will temporarily circumvent...

@ThePrez
Copy link
Contributor

ThePrez commented Mar 1, 2022

Needed to restart TCP, which I have done. I also restarted the Jenkins slave. Of note, though, "protocol driver not attached" may not be the root cause of this issue.

@mhdawson
Copy link
Member Author

mhdawson commented Mar 1, 2022

Looks like it was orange today - https://ci.nodejs.org/job/node-test-commit-ibmi/686/., Will check again tomorrow before closing.

@mhdawson
Copy link
Member Author

mhdawson commented Mar 7, 2022

Ok, looks like runs are back to Orange consistently. Going to close this as there is another issue to fix the excluded tests.

@abmusse
Copy link
Contributor

abmusse commented Mar 8, 2023

Looks like we want IPv6 enabled as the openssl code attempts to call setsockopt with IPv6 socket.

This returns EUNATCH as IPv6 is disabled on the Node.js CI. I tested running this test on a dev system that has IPv6 enabled and the test passed!

For reference looks like the snippet of code based on the error message is

// from deps/openssl/openssl/crypto/bio/bio_sock2.c
  /* On OpenBSD it is always ipv6 only with ipv6 sockets thus read-only */
# if defined(IPV6_V6ONLY) && !defined(__OpenBSD__)
    if (BIO_ADDR_family(addr) == AF_INET6) {
        /*  
         * Note: Windows default of IPV6_V6ONLY is ON, and Linux is OFF.
         * Therefore we always have to use setsockopt here.
         */
        on = options & BIO_SOCK_V6_ONLY ? 1 : 0;
        if (setsockopt(sock, IPPROTO_IPV6, IPV6_V6ONLY,
                       (const void *)&on, sizeof(on)) != 0) {
            ERR_raise_data(ERR_LIB_SYS, get_last_socket_error(),                          
                           "calling setsockopt()");
            ERR_raise(ERR_LIB_BIO, BIO_R_LISTEN_V6_ONLY);
            return 0;
        }   
    }
# endif

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ibm i Issues and PRs related to the IBM i platform.
Projects
None yet
Development

No branches or pull requests

5 participants