Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8316649: JMX connection timeout cannot be changed and uses the default of 0 (infinite) #16771

Closed

Conversation

kevinjwalls
Copy link
Contributor

@kevinjwalls kevinjwalls commented Nov 21, 2023

RMI Connections (in general) should use a timeout on the Socket connect call by default.

JMX connections use RMI and some connection failures never terminate. The hang described in 8316649 is hard to reproduce manually: the description says it can be caused by a firewall that never returns a response.

src/java.rmi/share/classes/sun/rmi/transport/tcp/TCPChannel.java
has other timeouts but nothing to control the initial Socket connect.

Defaulting to a 1 minute timeout on connect has no effect on existing tests for RMI and JMX, and should go unnoticed in applications unless there really is a significant connection delay. Specifying zero for the new System Property sun.rmi.transport.tcp.initialConnectTimeout uses the old code path of a connect with no timeout.

I have a test, but it is not realistically usable: although specifying a 1 millisecond timeout will often fail (as expected/desired for the test), it will often pass as the connection happens immediately.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Change requires CSR request JDK-8320580 to be approved
  • Commit message must refer to an issue

Issues

  • JDK-8316649: JMX connection timeout cannot be changed and uses the default of 0 (infinite) (Enhancement - P4)
  • JDK-8320580: JMX RMI Connections should have a connect timeout property (CSR)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16771/head:pull/16771
$ git checkout pull/16771

Update a local copy of the PR:
$ git checkout pull/16771
$ git pull https://git.openjdk.org/jdk.git pull/16771/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16771

View PR using the GUI difftool:
$ git pr show -t 16771

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16771.diff

Webrev

Link to Webrev Comment

@kevinjwalls
Copy link
Contributor Author

/csr

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 21, 2023

👋 Welcome back kevinw! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Nov 21, 2023
@openjdk
Copy link

openjdk bot commented Nov 21, 2023

@kevinjwalls has indicated that a compatibility and specification (CSR) request is needed for this pull request.

@kevinjwalls please create a CSR request for issue JDK-8316649 with the correct fix version. This pull request cannot be integrated until the CSR request is approved.

@openjdk
Copy link

openjdk bot commented Nov 21, 2023

@kevinjwalls The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Nov 21, 2023
@kevinjwalls
Copy link
Contributor Author

/label serviceability

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Nov 22, 2023
@openjdk
Copy link

openjdk bot commented Nov 22, 2023

@kevinjwalls
The serviceability label was successfully added.

@kevinjwalls kevinjwalls marked this pull request as ready for review November 23, 2023 09:32
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 23, 2023
@mlbridge
Copy link

mlbridge bot commented Nov 23, 2023

Webrevs

@msheppar
Copy link

The introduction of the system property is reasonable. I have questions over the veracity of the context described in the bug. I think it's a firewall configuration issue. If there is not listening end point for a TCP connect request, this will, result in a ConnectException being thrown in the RMI client. As such, for no response to happen, It means that a TCP connection has been established with the firewall (at the TCP level), from the RMI (JMX) client and that connection has not been accepted or passed through by the firewall application. Thus, it would appear that the shutdown of the target server has not been registered in the firewall.

In any case the change looks fine.
Do you have a test c.f. bug comment

Comment on lines +43 to +46
@SuppressWarnings("removal")
private static final int connectTimeout = // default 1 minute
AccessController.doPrivileged((PrivilegedAction<Integer>) () ->
Integer.getInteger("sun.rmi.transport.tcp.initialConnectTimeout", 60 * 1000).intValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An exception here will prevent the class from being initialized, any subsequent attempts to use the class will then produce hard to diagnose error. If the class is not loaded by the main thread, such exception/error could be swallowed unless an uncaught exception handler was registered. I would suggest implementing the logic in a static block, catch any exception and revert to default behaviour (i.e. 0) if the value supplied for the property is not an integer, or not a valid integer, etc...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, the default agent uses / may use its own socket factories - are we still coming here even with those factories?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An exception here will prevent the class from being initialized...

Maybe we can break it, but this new property is following the same pattern I think as the neighbouring file src/java.rmi/share/classes/sun/rmi/transport/tcp/TCPChannel.java when it reads the system props. I see Integer.getInteger handles the obvious Exceptions, so specifying a strange value does not immediately break us.

If there's more protection needed then we have various other places to apply it that might need separate follow-up if you think there's a case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, the default agent uses / may use its own socket factories - are we still coming here even with those factories?

We do get here, yes (not saying you can't customise your way out of getting here). This is a stack from a test I was experimenting with, when it did see the timeout:

---------System.out:(4/132)----------
JMX URL = service:jmx:rmi://x
expectTimeout = true
sun.rmi.transport.tcp.initialConnectTimeout = 1
Test passed
----------System.err:(24/1791)----------
java.rmi.ConnectIOException: Exception creating connection to: x.x.x.x; nested exception is:
        java.net.SocketTimeoutException
        at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:637)
        at java.rmi/sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:217)
        at java.rmi/sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:204)
        at java.rmi/sun.rmi.server.UnicastRef.invoke(UnicastRef.java:134)
        at java.management.rmi/javax.management.remote.rmi.RMIServerImpl_Stub.newClient(RMIServerImpl_Stub.java:85)
        at java.management.rmi/javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2106)
        at java.management.rmi/javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:321)
        at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
        at java.management/javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:229)
        at RMIConnectionTimeoutTest.main(RMIConnectionTimeoutTest.java:65)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
        at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.net.SocketTimeoutException
        at java.base/java.net.SocksSocketImpl.remainingMillis(SocksSocketImpl.java:112)
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
        at java.base/java.net.Socket.connect(Socket.java:752)
        at java.rmi/sun.rmi.transport.tcp.TCPDirectSocketFactory.createSocket(TCPDirectSocketFactory.java:57)
        at java.rmi/sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
        ... 13 more
STATUS:Passed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see Integer.getInteger handles the obvious Exceptions, so specifying a strange value does not immediately break us.

Oh - right. It's Integer.getInteger, not Integer.parseInt Ok, then - I withdraw my first comment :-)

Copy link

@msheppar msheppar Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrt to the property name initialConnectTimeout might infer that it is an initial value which can be subsequentually modified, but that is not possible. As such, would sun.rmi.transport.tcp.connectTimeout be more appropriate -- dropping the initial ?

If the connectTimeout initialization code was placed in a static method, it could be directly unit testable :-) (if such a test was desirable for coverage completeness)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Property name: there is an existing System Property "sun.rmi.transport.connectionTimeout" which is a 15-second idle timeout. Having a "connectionTimeout" and this new one as "connectTimeout" could be confusing, even in different but very similar package names, hence naming this "initialConnectTimeout".
(So "initial" to signal that it's the initial connect of a socket, different to the existing "connectionTimeout".)
It does not seem like the kind of system property a user would expect to be read again on every usage, I think, let me know if you see that as a problem.

I am hoping that as we already have various properties fetched in the same manner, we don't see the need to pursue a test that the value is fetched and passed on.

Copy link
Member

@dfuch dfuch Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered naming it "socketConnectTimeout" instead? Or possibly "tcpConnectTimeout"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you named it "tcpConnectTimeout" then you could name the (future) property for [JDK-8320703](https://bugs.openjdk.org/browse/JDK-8320703: JMX SSL RMI Socket connection timeout cannot be changed) "tlsConnectTimeout"...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that could work 8-)

@kevinjwalls
Copy link
Contributor Author

In any case the change looks fine. Do you have a test c.f. bug comment

Thanks @msheppar yes I agree with the comments.

For testing I got as far as a test that often cannot see the timeout as we are too quick for a 1ms timeout. I was thinking that as the change is basic enough, getting a property and passing on a value, the usable test seems rather elaborate...

We do have test/jdk/java/rmi/transport/handshakeTimeout/HandshakeTimeout.java which tests -Dsun.rmi.transport.tcp.handshakeTimeout=1 but that must reliably fail to handshake in 1ms. The basic connect is much faster. 8-)

@kevinjwalls
Copy link
Contributor Author

It may be that changing the base RMI implementation isn't desirable.
I am bringing up a different implementation, using a new ClientSocketFactory for JMX which implements the connect timeout.

So a new JMX-specific property, not a general RMI property, and we limit the impact of the change, so it won't affect any other RMI apps.

I'm putting this in #16856 it's in draft right now, I'll complete the details and see if it is a better way forward (will close this PR if the new one is better).

@kevinjwalls
Copy link
Contributor Author

Thanks for the feedback.
Additionally, there can be a concern that affecting all RMI activity with the timeout property could produce unintended side-effects.

At the moment the demand for the timeout is not strong, and the behaviour is what it has been for many years, so I am going to close this without progressing it.

@kevinjwalls kevinjwalls closed this Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org csr Pull request needs approved CSR before integration rfr Pull request is ready for review serviceability serviceability-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

3 participants