-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8316649: JMX connection timeout cannot be changed and uses the default of 0 (infinite) #16771
8316649: JMX connection timeout cannot be changed and uses the default of 0 (infinite) #16771
Conversation
/csr |
👋 Welcome back kevinw! A progress list of the required criteria for merging this PR into |
@kevinjwalls has indicated that a compatibility and specification (CSR) request is needed for this pull request. @kevinjwalls please create a CSR request for issue JDK-8316649 with the correct fix version. This pull request cannot be integrated until the CSR request is approved. |
@kevinjwalls The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
/label serviceability |
@kevinjwalls |
The introduction of the system property is reasonable. I have questions over the veracity of the context described in the bug. I think it's a firewall configuration issue. If there is not listening end point for a TCP connect request, this will, result in a ConnectException being thrown in the RMI client. As such, for no response to happen, It means that a TCP connection has been established with the firewall (at the TCP level), from the RMI (JMX) client and that connection has not been accepted or passed through by the firewall application. Thus, it would appear that the shutdown of the target server has not been registered in the firewall. In any case the change looks fine. |
@SuppressWarnings("removal") | ||
private static final int connectTimeout = // default 1 minute | ||
AccessController.doPrivileged((PrivilegedAction<Integer>) () -> | ||
Integer.getInteger("sun.rmi.transport.tcp.initialConnectTimeout", 60 * 1000).intValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An exception here will prevent the class from being initialized, any subsequent attempts to use the class will then produce hard to diagnose error. If the class is not loaded by the main thread, such exception/error could be swallowed unless an uncaught exception handler was registered. I would suggest implementing the logic in a static block, catch any exception and revert to default behaviour (i.e. 0) if the value supplied for the property is not an integer, or not a valid integer, etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, the default agent uses / may use its own socket factories - are we still coming here even with those factories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An exception here will prevent the class from being initialized...
Maybe we can break it, but this new property is following the same pattern I think as the neighbouring file src/java.rmi/share/classes/sun/rmi/transport/tcp/TCPChannel.java when it reads the system props. I see Integer.getInteger handles the obvious Exceptions, so specifying a strange value does not immediately break us.
If there's more protection needed then we have various other places to apply it that might need separate follow-up if you think there's a case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, the default agent uses / may use its own socket factories - are we still coming here even with those factories?
We do get here, yes (not saying you can't customise your way out of getting here). This is a stack from a test I was experimenting with, when it did see the timeout:
---------System.out:(4/132)----------
JMX URL = service:jmx:rmi://x
expectTimeout = true
sun.rmi.transport.tcp.initialConnectTimeout = 1
Test passed
----------System.err:(24/1791)----------
java.rmi.ConnectIOException: Exception creating connection to: x.x.x.x; nested exception is:
java.net.SocketTimeoutException
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:637)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:217)
at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:204)
at java.rmi/sun.rmi.server.UnicastRef.invoke(UnicastRef.java:134)
at java.management.rmi/javax.management.remote.rmi.RMIServerImpl_Stub.newClient(RMIServerImpl_Stub.java:85)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2106)
at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:321)
at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:229)
at RMIConnectionTimeoutTest.main(RMIConnectionTimeoutTest.java:65)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.net.SocketTimeoutException
at java.base/java.net.SocksSocketImpl.remainingMillis(SocksSocketImpl.java:112)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
at java.base/java.net.Socket.connect(Socket.java:752)
at java.rmi/sun.rmi.transport.tcp.TCPDirectSocketFactory.createSocket(TCPDirectSocketFactory.java:57)
at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
... 13 more
STATUS:Passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see Integer.getInteger handles the obvious Exceptions, so specifying a strange value does not immediately break us.
Oh - right. It's Integer.getInteger
, not Integer.parseInt
Ok, then - I withdraw my first comment :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrt to the property name initialConnectTimeout might infer that it is an initial value which can be subsequentually modified, but that is not possible. As such, would sun.rmi.transport.tcp.connectTimeout be more appropriate -- dropping the initial ?
If the connectTimeout initialization code was placed in a static method, it could be directly unit testable :-) (if such a test was desirable for coverage completeness)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Property name: there is an existing System Property "sun.rmi.transport.connectionTimeout" which is a 15-second idle timeout. Having a "connectionTimeout" and this new one as "connectTimeout" could be confusing, even in different but very similar package names, hence naming this "initialConnectTimeout".
(So "initial" to signal that it's the initial connect of a socket, different to the existing "connectionTimeout".)
It does not seem like the kind of system property a user would expect to be read again on every usage, I think, let me know if you see that as a problem.
I am hoping that as we already have various properties fetched in the same manner, we don't see the need to pursue a test that the value is fetched and passed on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered naming it "socketConnectTimeout" instead? Or possibly "tcpConnectTimeout"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you named it "tcpConnectTimeout" then you could name the (future) property for [JDK-8320703](https://bugs.openjdk.org/browse/JDK-8320703: JMX SSL RMI Socket connection timeout cannot be changed) "tlsConnectTimeout"...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that could work 8-)
Thanks @msheppar yes I agree with the comments. For testing I got as far as a test that often cannot see the timeout as we are too quick for a 1ms timeout. I was thinking that as the change is basic enough, getting a property and passing on a value, the usable test seems rather elaborate... We do have test/jdk/java/rmi/transport/handshakeTimeout/HandshakeTimeout.java which tests -Dsun.rmi.transport.tcp.handshakeTimeout=1 but that must reliably fail to handshake in 1ms. The basic connect is much faster. 8-) |
It may be that changing the base RMI implementation isn't desirable. So a new JMX-specific property, not a general RMI property, and we limit the impact of the change, so it won't affect any other RMI apps. I'm putting this in #16856 it's in draft right now, I'll complete the details and see if it is a better way forward (will close this PR if the new one is better). |
Thanks for the feedback. At the moment the demand for the timeout is not strong, and the behaviour is what it has been for many years, so I am going to close this without progressing it. |
RMI Connections (in general) should use a timeout on the Socket connect call by default.
JMX connections use RMI and some connection failures never terminate. The hang described in 8316649 is hard to reproduce manually: the description says it can be caused by a firewall that never returns a response.
src/java.rmi/share/classes/sun/rmi/transport/tcp/TCPChannel.java
has other timeouts but nothing to control the initial Socket connect.
Defaulting to a 1 minute timeout on connect has no effect on existing tests for RMI and JMX, and should go unnoticed in applications unless there really is a significant connection delay. Specifying zero for the new System Property sun.rmi.transport.tcp.initialConnectTimeout uses the old code path of a connect with no timeout.
I have a test, but it is not realistically usable: although specifying a 1 millisecond timeout will often fail (as expected/desired for the test), it will often pass as the connection happens immediately.
Progress
Issues
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16771/head:pull/16771
$ git checkout pull/16771
Update a local copy of the PR:
$ git checkout pull/16771
$ git pull https://git.openjdk.org/jdk.git pull/16771/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16771
View PR using the GUI difftool:
$ git pr show -t 16771
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16771.diff
Webrev
Link to Webrev Comment