Perform SSL handshake after socket timeout and buffer size settings #1584

Merged

Conversation

@michail-nikolaev (Contributor) commented on Oct 18, 2019

Currently, setSoTimeout, setKeepAlive, setReceiveBufferSize and setSendBufferSize are called after enableSSL. As a result, the SSL handshake is bounded only by CONNECT_TIMEOUT and ignores the other socket settings.
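
The ordering problem can be illustrated with a plain JSSE sketch (this is not pgjdbc's actual code; host, port and timeout values are placeholders): if the read timeout and buffer sizes are applied to the socket before startHandshake(), the handshake reads are bounded by SO_TIMEOUT instead of blocking indefinitely.

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class HandshakeOrderSketch {
  public static void main(String[] args) throws Exception {
    Socket raw = new Socket();
    raw.connect(new InetSocketAddress("db.example.com", 5432), 10_000); // connect timeout

    // Socket settings that previously were applied only after enableSSL():
    raw.setSoTimeout(30_000);            // also bounds blocking reads of handshake records
    raw.setKeepAlive(true);
    raw.setReceiveBufferSize(64 * 1024);
    raw.setSendBufferSize(64 * 1024);

    SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
    SSLSocket ssl = (SSLSocket) factory.createSocket(raw, "db.example.com", 5432, true);
    ssl.startHandshake();                // can no longer hang forever on a silent peer
  }
}
```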

After a few network splits we found a lot of connections stuck in the following state for a very long time:

#15680 daemon prio=5 os_prio=0 cpu=47.27ms elapsed=76351.16s tid=0x00007f4abc0029e0 nid=0x4a93 runnable  [0x00007f4b4b7f6000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(java.base@11.0.3-internal/Native Method)
        at java.net.SocketInputStream.socketRead(java.base@11.0.3-internal/SocketInputStream.java:115)
        at java.net.SocketInputStream.read(java.base@11.0.3-internal/SocketInputStream.java:168)
        at java.net.SocketInputStream.read(java.base@11.0.3-internal/SocketInputStream.java:140)
        at sun.security.ssl.SSLSocketInputRecord.read(java.base@11.0.3-internal/SSLSocketInputRecord.java:448)
        at sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(java.base@11.0.3-internal/SSLSocketInputRecord.java:237)
        at sun.security.ssl.SSLSocketInputRecord.decode(java.base@11.0.3-internal/SSLSocketInputRecord.java:190)
        at sun.security.ssl.SSLTransport.decode(java.base@11.0.3-internal/SSLTransport.java:108)
        at sun.security.ssl.SSLSocketImpl.decode(java.base@11.0.3-internal/SSLSocketImpl.java:1152)
        at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(java.base@11.0.3-internal/SSLSocketImpl.java:1063)
        at sun.security.ssl.SSLSocketImpl.startHandshake(java.base@11.0.3-internal/SSLSocketImpl.java:402)
        - locked <0x00000007f901bae0> (a sun.security.ssl.TransportContext)
        at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:62)
        at org.postgresql.core.v3.ConnectionFactoryImpl.enableSSL(ConnectionFactoryImpl.java:389)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:160)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
        at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195)
        at org.postgresql.Driver.makeConnection(Driver.java:452)
        at org.postgresql.Driver.connect(Driver.java:254)
        at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:117)
        at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:123)
        at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:375)
        at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:204)
        at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:445)
        at com.zaxxer.hikari.pool.HikariPool.access$200(HikariPool.java:72)
        at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:632)
        at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:618)
        at java.util.concurrent.FutureTask.run(java.base@11.0.3-internal/FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.3-internal/ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.3-internal/ThreadPoolExecutor.java:628)
        at java.lang.Thread.run(java.base@11.0.3-internal/Thread.java:834)

This happened because socketTimeout had not yet been applied to the connection at the moment of the network partition.

Env:

  • PostgreSQL 10.10 (Ubuntu 10.10-103) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
  • OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.3+7, mixed mode)
  • postgresql-42.2.2.jar

Workaround:

  • Use LOGIN_TIMEOUT.
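
For reference, a minimal sketch of the workaround, assuming a plain DriverManager connection (URL, credentials and timeout values are placeholders): loginTimeout caps the whole connection phase, including the SSL handshake.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class LoginTimeoutWorkaround {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.setProperty("user", "app");
    props.setProperty("password", "secret");
    props.setProperty("loginTimeout", "10");   // seconds; bounds the whole connect phase
    props.setProperty("connectTimeout", "10"); // seconds; bounds the TCP connect only
    try (Connection con = DriverManager.getConnection(
        "jdbc:postgresql://db.example.com:5432/appdb", props)) {
      System.out.println("connected: " + !con.isClosed());
    }
  }
}
```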

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Does mvn checkstyle:check pass ?
  3. Have you added your new test classes to an existing test suite?

Changes to Existing Features:

  • Does this break existing behaviour? If so please explain.
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@michail-nikolaev (Contributor, Author) commented on Oct 18, 2019

Seems like IsValidTest is flapping...

Seems to be related to #1581.

@davecramer merged commit e39a0be into pgjdbc:master on Oct 30, 2019
2 of 3 checks passed
anton0xf pushed a commit to anton0xf/pgjdbc that referenced this issue Mar 6, 2020
This is connected with PR pgjdbc#1584.
The 'socketTimeout' property now acts as a global query timeout, so we can't set it too low.
But during a network split it is possible that creation of a new connection
gets stuck reading a response from one of the candidate hosts (as in the PR above)
for a long time (our big socketTimeout).
The simplest solution is to decrease the socket timeout during the connection phase and then restore it.
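
A rough sketch of that idea (the class and method names are illustrative, not pgjdbc internals): apply a small read timeout while the connection is being established, then restore the configured socketTimeout afterwards.

```java
import java.net.Socket;

final class ConnectPhaseTimeout {
  // Bound the startup/authentication/SSL phase with a small read timeout,
  // then restore the (potentially large) global socketTimeout.
  static void connectWithTemporaryTimeout(Socket socket,
                                          int connectPhaseTimeoutMs,
                                          int socketTimeoutMs) throws Exception {
    socket.setSoTimeout(connectPhaseTimeoutMs);
    try {
      // ... perform SSL negotiation and authentication here ...
    } finally {
      socket.setSoTimeout(socketTimeoutMs); // back to the global query timeout
    }
  }
}
```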