Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8267353: java/net/SctpSanity.java fails due to Protocol not supported #4199

Closed
wants to merge 1 commit into from

Conversation

DamonFool
Copy link
Member

@DamonFool DamonFool commented May 26, 2021

Hi all,

java/net/SctpSanity.java fails on some of our test machines due to Protocol not supported.
The reason is that the test fails to detect all the cases when a machine doesn't support SCTP.

The fix just follows what are done in [1][2][3] to detect unsupported paltforms at the beginning.

Thanks.
Best regards,
Jie

[1] https://github.com/openjdk/jdk/blob/master/test/jdk/com/sun/nio/sctp/SctpMultiChannel/Util.java#L59
[2] https://github.com/openjdk/jdk/blob/master/test/jdk/com/sun/nio/sctp/SctpServerChannel/Util.java#L59
[3] https://github.com/openjdk/jdk/blob/master/test/jdk/com/sun/nio/sctp/SctpChannel/Util.java#L59


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8267353: java/net/SctpSanity.java fails due to Protocol not supported

Contributors

  • Junji Wang <junjiwang@tencent.com>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4199/head:pull/4199
$ git checkout pull/4199

Update a local copy of the PR:
$ git checkout pull/4199
$ git pull https://git.openjdk.java.net/jdk pull/4199/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4199

View PR using the GUI difftool:
$ git pr show -t 4199

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4199.diff

@bridgekeeper
Copy link

bridgekeeper bot commented May 26, 2021

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@DamonFool
Copy link
Member Author

DamonFool commented May 26, 2021

/contributor add Junji Wang junjiwang@tencent.com

@openjdk openjdk bot added the rfr Pull request is ready for review label May 26, 2021
@openjdk
Copy link

openjdk bot commented May 26, 2021

@DamonFool
Contributor Junji Wang <junjiwang@tencent.com> successfully added.

@openjdk
Copy link

openjdk bot commented May 26, 2021

@DamonFool The following label will be automatically applied to this pull request:

  • net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the net net-dev@openjdk.org label May 26, 2021
@mlbridge
Copy link

mlbridge bot commented May 26, 2021

Webrevs

@AlanBateman
Copy link
Contributor

AlanBateman commented May 26, 2021

I think this looks okay and aligns with Util.isSCTPSupported in test/jdk/com/sun/nio/sctp. One thing that I'm not 100% sure about is why we have a sanity test in test/jdk/java/net, @ChrisHegarty may remember.

@ChrisHegarty
Copy link
Member

ChrisHegarty commented May 26, 2021

This is probably ok, but a little surprising. Can you say a more about the machine setup?

@dfuch
Copy link
Member

dfuch commented May 26, 2021

I am intrigued - which exception do you get on the machine where SCTP is not supported?
SctpChannel.open() is supposed to throw UnsupportedOperationException - If the SCTP protocol is not supported, and the test catches UnsupportedOperationException.

Note: I don't have objection to the proposed fix.

@DamonFool
Copy link
Member Author

DamonFool commented May 26, 2021

I am intrigued - which exception do you get on the machine where SCTP is not supported?

Hi @dfuch , we got this exception, which was copied from the JBS.

java.net.SocketException: Protocol not supported
        at jdk.sctp/sun.nio.ch.sctp.SctpNet.socket0(Native Method)
        at jdk.sctp/sun.nio.ch.sctp.SctpNet.socket(SctpNet.java:81)
        at jdk.sctp/sun.nio.ch.sctp.SctpChannelImpl.<init>(SctpChannelImpl.java:138)
        at jdk.sctp/com.sun.nio.sctp.SctpChannel.open(SctpChannel.java:170)
        at SctpSanity.testSctpChannel(SctpSanity.java:58)
        at SctpSanity.main(SctpSanity.java:50)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
        at java.base/java.lang.Thread.run(Thread.java:831)

@DamonFool
Copy link
Member Author

DamonFool commented May 26, 2021

This is probably ok, but a little surprising. Can you say a more about the machine setup?

What do you mean by setup?
And how can I get these setups?
Thanks.

@AlanBateman
Copy link
Contributor

AlanBateman commented May 26, 2021

java.net.SocketException: Protocol not supported
at jdk.sctp/sun.nio.ch.sctp.SctpNet.socket0(Native Method)

Do we know if the socket(IPPROTO_SCTP) or setsockopt(IPPROTO_SCTP/SCTP_EVENTS) is failing? I would have expected the former to check for the not supported case (and the latter to close the fd, but that is a separate issue).

@ChrisHegarty
Copy link
Member

ChrisHegarty commented May 26, 2021

This is probably ok, but a little surprising. Can you say a more about the machine setup?

What do you mean by setup?
And how can I get these setups?
Thanks.

Let me rephrase.

You said: "The reason is that the test fails to detect all the cases when a machine doesn't support SCTP."

My question is: Please explain the case (or set of cases) where your "machine doesn't support SCTP" ?

@DamonFool
Copy link
Member Author

DamonFool commented May 26, 2021

Do we know if the socket(IPPROTO_SCTP) or setsockopt(IPPROTO_SCTP/SCTP_EVENTS) is failing?

This is probably ok, but a little surprising. Can you say a more about the machine setup?

What do you mean by setup?
And how can I get these setups?
Thanks.

Let me rephrase.

You said: "The reason is that the test fails to detect all the cases when a machine doesn't support SCTP."

My question is: Please explain the case (or set of cases) where your "machine doesn't support SCTP" ?

I'm not an expert in this area.
I don't know how to figure out why my machine don't support SCTP.
Any suggestions?
Thanks.

@DamonFool
Copy link
Member Author

DamonFool commented May 26, 2021

Do we know if the socket(IPPROTO_SCTP) or setsockopt(IPPROTO_SCTP/SCTP_EVENTS) is failing?

How can I get to know if the socket(IPPROTO_SCTP) or setsockopt(IPPROTO_SCTP/SCTP_EVENTS) is failing?
Any suggestions?
Thanks.

@DamonFool
Copy link
Member Author

DamonFool commented May 26, 2021

java/net/SctpSanity.java is the only one, which failed on the machine.
Other sctp related tests like jdk/com/sun/nio/sctp/SctpChannel/Connect.java are all fine.

@AlanBateman
Copy link
Contributor

AlanBateman commented May 26, 2021

How can I get to know if the socket(IPPROTO_SCTP) or setsockopt(IPPROTO_SCTP/SCTP_EVENTS) is failing?

strace -f should help, or just add trace message. If socket(IPPROTO_SCTP) is failing then it suggests to me that we are missing the check for ENOPROTO.

@DamonFool
Copy link
Member Author

DamonFool commented May 26, 2021

strace -f should help, or just add trace message.

Thanks for your help @AlanBateman .
I got the following msg (Connection timed out).

[pid 93582] socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP <unfinished ...>
[pid 93557] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid 93557] futex(0x2b05c03cff78, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 93557] getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
[pid 93557] getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
[pid 93557] getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
[pid 93557] getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
[pid 93557] futex(0x2b05c03cffa4, FUTEX_WAIT_BITSET_PRIVATE, 5, {10828639, 700862386}, ffffffff <unfinished ...>
[pid 93582] <... socket resumed> )      = -1 EPROTONOSUPPORT (Protocol not supported)

Thanks.

@AlanBateman
Copy link
Contributor

AlanBateman commented May 26, 2021

Okay, so I think the issue is that EPROTONOSUPPORT is not handled as I would have expected that to cause UOE to be thrown. I suspect the isSCTPSupported check in Util.java (used by the tests in jdk/test/com/sun/nio/sctp) are masking this. My memory is that there were several relilability issues with SCTP in the early days and it might be that this is why they skip when any error is returned creating the SCTP socket.

@ChrisHegarty
Copy link
Member

ChrisHegarty commented May 26, 2021

There is no check for ENOPROTO in the JDK SCTP code. There is an implicit assumption that SCTP is "supported" when the native libsctp.so.1 library successfully loads and can be introspected. Maybe there should be an explicit ENOPROTO in Java_sun_nio_ch_sctp_SctpNet_socket0.

Maybe the machine in question has the libsctp.so.1 library, but the kernel module is not loaded or something?

$ which checksctp
/usr/bin/checksctp

$ /usr/bin/checksctp
SCTP supported

$ lsmod | grep sctp
sctp                  299008  2 
libcrc32c              16384  4 nf_conntrack,nf_nat,xfs,sctp

$ modinfo sctp
filename:       /lib/modules/4.14.35-1902.0.18.el7uek.x86_64/kernel/net/sctp/sctp.ko.xz
license:        GPL
description:    Support for the SCTP protocol (RFC2960)
author:         Linux Kernel SCTP developers <linux-sctp@vger.kernel.org>
alias:          net-pf-10-proto-132
alias:          net-pf-2-proto-132
srcversion:     CA88E7C1DCACA58B08A8E35
depends:        libcrc32c
retpoline:      Y
intree:         Y
name:           sctp
vermagic:       4.14.35-1902.0.18.el7uek.x86_64 SMP mod_unload modversions 
sig_id:         ...

$ /sbin/modprobe -n -v sctp

@msheppar
Copy link

msheppar commented May 26, 2021

if sctp is configured on a linux system would there exist a /proc/net/sctp and have an entry in /proc/net/protocols ?

@DamonFool
Copy link
Member Author

DamonFool commented May 27, 2021

if sctp is configured on a linux system would there exist a /proc/net/sctp and have an entry in /proc/net/protocols ?

No /proc/net/sctp and no entry in /proc/net/protocols.
But I don't think it's the reason since it passed on another machine without /proc/net/sctp and the entry too.
Thanks.

@DamonFool
Copy link
Member Author

DamonFool commented May 27, 2021

Maybe the machine in question has the libsctp.so.1 library, but the kernel module is not loaded or something?

Yes, you are right.
We do have libsctp.so.1 and no kernel module.

So what's your suggestion next?
Thanks.

@DamonFool
Copy link
Member Author

DamonFool commented May 27, 2021

if sctp is configured on a linux system would there exist a /proc/net/sctp and have an entry in /proc/net/protocols ?

No /proc/net/sctp and no entry in /proc/net/protocols.
But I don't think it's the reason since it passed on another machine without /proc/net/sctp and the entry too.
Thanks.

After sudo yum install lksctp-tools, which would install libsctp.so.1, both the machines fail now.
So @ChrisHegarty is right: the machine in question has the libsctp.so.1 library, but the kernel module is not loaded.
Thanks.

@ChrisHegarty
Copy link
Member

ChrisHegarty commented May 27, 2021

@DamonFool Thanks for checking this, and helping track down the root cause of the issue.

The machine configuration is a little unusual (since it has the user-level libraries installed, while the kernel module is not loaded). Clearly, to workaround the issue, the "sctp" kernel module could be loaded, but that will just serve to mask/hide the underlying problem.

The bug is really in the Java_sun_nio_ch_sctp_SctpNet_socket0 native method ( SctpNet.c ) - there should be an explicit check for ENOPROTO after calling socket(2), throwing UnsupportedOperationException if that is the case.

@DamonFool I would much prefer to keep the test as it is and fix the underlying bug, but this may be more than you want to do? If so, I'll get this done myself.

@DamonFool
Copy link
Member Author

DamonFool commented May 27, 2021

@DamonFool I would much prefer to keep the test as it is and fix the underlying bug, but this may be more than you want to do? If so, I'll get this done myself.

Please feel free to do so.
I'll close this pr soon.
Thanks.

@DamonFool DamonFool closed this May 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net net-dev@openjdk.org rfr Pull request is ready for review
5 participants