Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection hang during initialization / enableSSL part deux #2683

Closed
jamey-clari opened this issue Nov 29, 2022 · 4 comments
Closed

Connection hang during initialization / enableSSL part deux #2683

jamey-clari opened this issue Nov 29, 2022 · 4 comments

Comments

@jamey-clari
Copy link

jamey-clari commented Nov 29, 2022

Please read https://stackoverflow.com/help/minimal-reproducible-example

Describe the issue
Earlier this year I filed #2472 which was kindly addressed by @davecramer. Unfortunately it appears that there is still a problem in this area of the code as we still see the issue in our application. I believe the timeout that @davecramer added is probably working BUT the way the VisibleBufferedInputStream.readMore method is working when called from ConnectionFactoryImpl.enableSSL line 553 (int beresp = pgSTream.receiveChar()) it is entering a never-ending loop on reading the socket stream. The current fix calls pgStream.getSocket().setSoTimeout(sslTimeout) prior to attempting to read the control character however, unlike calling PgStream.setNetworkTimeout(int), the setTimeoutRequested flag is never set on the VisibleBufferedInputStream. Because of this, when the SocketTimeoutException occurs on the wrapped.read call in VisibileBufferedInputStream.readMore (line 161), no action is taken in the catch block. Both block and timeoutRequested are false so the code continues out of the catch block and returns true after a no-op update to endIndex. This causes it to remain in the while loop in VisibleBufferedInputStream.ensureBytes (line 127) retrying. Such is my theory anyway as the alternative is that calling setSoTimeout on the socket isn't having any effect which would be pretty odd.

Thanks for your consideration!

Driver Version?
42.4.2

Java Version?
openjdk version "1.8.0_312"
OpenJDK Runtime Environment Corretto-8.312.07.1 (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM Corretto-8.312.07.1 (build 25.312-b07, mixed mode)

OS Version?
Ubuntu 20.04.3 LTS

PostgreSQL Version?
AWS Aurora 12.11

To Reproduce
Steps to reproduce the behaviour:
Challenging, as described in #2472 - intermittently occurs when master node of cluster changes and our connection pools reset.

Expected behaviour
A clear and concise description of what you expected to happen.
And what actually happens

We expect the SSL negotiation to timeout after 5 seconds (by default) and the attempt to connect to throw after the timeout is encountered. Currently the attempt to connect blocks indefinitely and does not return.

Logs
If possible PostgreSQL logs surrounding the occurrence of the issue
Additionally logs from the driver can be obtained adding

Using the following template code make sure the bug can be replicated in the driver alone.


@jamey-clari
Copy link
Author

I believe the following would be a sufficient fix for this issue:

diff --git a/pgjdbc/src/main/java/org/postgresql/core/v3/ConnectionFactoryImpl.java b/pgjdbc/src/main/java/org/postgresql/core/v3/ConnectionFactoryImpl.java
index f6d6dbed..f320ba04 100644
--- a/pgjdbc/src/main/java/org/postgresql/core/v3/ConnectionFactoryImpl.java
+++ b/pgjdbc/src/main/java/org/postgresql/core/v3/ConnectionFactoryImpl.java
@@ -533,7 +533,7 @@ public class ConnectionFactoryImpl extends ConnectionFactory {
     LOGGER.log(Level.FINEST, " FE=> SSLRequest");
 
     int sslTimeout = PGProperty.SSL_RESPONSE_TIMEOUT.getInt(info);
-    int currentTimeout = pgStream.getSocket().getSoTimeout();
+    int currentTimeout = pgStream.getNetworkTimeout();
 
     // if the current timeout is less than sslTimeout then
     // use the smaller timeout. We could do something tricky
@@ -542,7 +542,7 @@ public class ConnectionFactoryImpl extends ConnectionFactory {
       sslTimeout = currentTimeout;
     }
 
-    pgStream.getSocket().setSoTimeout(sslTimeout);
+    pgStream.setNetworkTimeout(sslTimeout);
     // Send SSL request packet
     pgStream.sendInteger4(8);
     pgStream.sendInteger2(1234);
@@ -551,7 +551,7 @@ public class ConnectionFactoryImpl extends ConnectionFactory {
 
     // Now get the response from the backend, one of N, E, S.
     int beresp = pgStream.receiveChar();
-    pgStream.getSocket().setSoTimeout(currentTimeout);
+    pgStream.setNetworkTimeout(currentTimeout);
 
     switch (beresp) {
       case 'E':

@jamey-clari
Copy link
Author

I also think that perhaps it would be worth considering applying the same fix from #2472 and here to the ConnectionFactoryImpl.enableGSSEncrypted routine which has a similar pattern of socket read usage.

@davecramer
Copy link
Member

davecramer commented Nov 30, 2022

See #2687 for continuing work

davecramer added a commit to davecramer/pgjdbc that referenced this issue Nov 30, 2022
davecramer added a commit that referenced this issue Nov 30, 2022
* fix: Apply patch from Issue #2683 to fix hanging ssl connections

* add new configuration parameter gssResponseTimeout. This configuration parameter allows a different timeout to be used before requesting a GSS encrypted connection. Some small edits to docs
@davecramer
Copy link
Member

closed with #2687

benkard added a commit to benkard/mulkcms2 that referenced this issue Apr 2, 2023
This MR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [flow-bin](https://github.com/flowtype/flow-bin) ([changelog](https://github.com/facebook/flow/blob/master/Changelog.md)) | devDependencies | minor | [`^0.198.0` -> `^0.199.0`](https://renovatebot.com/diffs/npm/flow-bin/0.198.0/0.199.0) |
| [org.postgresql:postgresql](https://jdbc.postgresql.org) ([source](https://github.com/pgjdbc/pgjdbc)) | build | patch | `42.5.1` -> `42.5.2` |
| [io.quarkus:quarkus-maven-plugin](https://github.com/quarkusio/quarkus) | build | patch | `2.16.0.Final` -> `2.16.1.Final` |
| [io.quarkus:quarkus-universe-bom](https://github.com/quarkusio/quarkus-platform) | import | patch | `2.16.0.Final` -> `2.16.1.Final` |
| [org.apache.maven.plugins:maven-enforcer-plugin](https://maven.apache.org/enforcer/) | build | minor | `3.1.0` -> `3.2.1` |

---

### Release Notes

<details>
<summary>flowtype/flow-bin</summary>

### [`v0.199.0`](flow/flow-bin@0568b6e...05bb4e3)

[Compare Source](flow/flow-bin@0568b6e...05bb4e3)

### [`v0.198.2`](flow/flow-bin@0d01841...0568b6e)

[Compare Source](flow/flow-bin@0d01841...0568b6e)

### [`v0.198.1`](flow/flow-bin@2b180bb...0d01841)

[Compare Source](flow/flow-bin@2b180bb...0d01841)

</details>

<details>
<summary>pgjdbc/pgjdbc</summary>

### [`v42.5.2`](https://github.com/pgjdbc/pgjdbc/blob/HEAD/CHANGELOG.md#&#8203;4252-2023-01-31-143046--0500)

##### Changed

docs: specify that timeouts are in seconds and there is a maximum. Housekeeping on some tests fixes [#Issue 2671](pgjdbc/pgjdbc#2671) [MR #&#8203;2686](pgjdbc/pgjdbc#2686)
docs: clarify binaryTransfer and add it to README [MR# 2698](pgjdbc/pgjdbc#2698)
docs: Document the need to encode reserved characters in the connection URL [MR #&#8203;2700](pgjdbc/pgjdbc#2700)
feat: Define binary transfer for custom types dynamically/automatically fixes [Issue #&#8203;2554](pgjdbc/pgjdbc#2554) [MR #&#8203;2556](pgjdbc/pgjdbc#2556)

##### Added

fix: added gssResponseTimeout as part of [MR #&#8203;2687](pgjdbc/pgjdbc#2687) to make sure we don't wait forever on a GSS RESPONSE

##### Fixed

fix: Ensure case of XML tags in Maven snippet is correct [MR #&#8203;2682](pgjdbc/pgjdbc#2682)
fix: Make sure socket is closed if an exception is thrown in createSocket fixes [Issue #&#8203;2684](pgjdbc/pgjdbc#2684) [MR #&#8203;2685](pgjdbc/pgjdbc#2685)
fix: Apply patch from [Issue #&#8203;2683](pgjdbc/pgjdbc#2683) to fix hanging ssl connections [MR #&#8203;2687](pgjdbc/pgjdbc#2687)
fix - binary conversion of (very) long numeric values (longer than 4 \* 2^15 digits) [MR #&#8203;2697](pgjdbc/pgjdbc#2697) fixes [Issue #&#8203;2695](pgjdbc/pgjdbc#2695)
minor: enhance readability connection of startup params [MR #&#8203;2705](pgjdbc/pgjdbc#2785)

</details>

<details>
<summary>quarkusio/quarkus</summary>

### [`v2.16.1.Final`](https://github.com/quarkusio/quarkus/releases/tag/2.16.1.Final)

[Compare Source](quarkusio/quarkus@2.16.0.Final...2.16.1.Final)

##### Complete changelog

-   [#&#8203;30729](quarkusio/quarkus#30729) - Bump mariadb-java-client from 3.1.1 to 3.1.2
-   [#&#8203;30724](quarkusio/quarkus#30724) - Upgrade to Mutiny 1.9.0
-   [#&#8203;30722](quarkusio/quarkus#30722) - Set SameSite Strict only on OIDC session cookie
-   [#&#8203;30720](quarkusio/quarkus#30720) - Bump picocli.version from 4.7.0 to 4.7.1
-   [#&#8203;30719](quarkusio/quarkus#30719) - Bump jackson-bom from 2.14.1 to 2.14.2
-   [#&#8203;30715](quarkusio/quarkus#30715) - PanacheRepositoryResource should implement ReactiveRestDataResource
-   [#&#8203;30713](quarkusio/quarkus#30713) - Use MapProperty instead of Map
-   [#&#8203;30694](quarkusio/quarkus#30694) - Use newer API for creating tmp files in RESTEasy Reactive
-   [#&#8203;30692](quarkusio/quarkus#30692) - Bump htmlunit version to 2.70.0
-   [#&#8203;30686](quarkusio/quarkus#30686) - Don't fail send when a sse sink has been closed
-   [#&#8203;30681](quarkusio/quarkus#30681) - RESTEasy Reactive: SSE broadcaster fails if a sink has been closed
-   [#&#8203;30680](quarkusio/quarkus#30680) - Mark methods generatred by ASM transformations as synthetic
-   [#&#8203;30659](quarkusio/quarkus#30659) - Drop unused class GradleLogger
-   [#&#8203;30653](quarkusio/quarkus#30653) - Fix opening in IDE when more than IDE is running
-   [#&#8203;30652](quarkusio/quarkus#30652) - Match prometheus export metrics format
-   [#&#8203;30651](quarkusio/quarkus#30651) - ArC - use reflection fallback for PreDestroy callbacks if needed
-   [#&#8203;30649](quarkusio/quarkus#30649) - Document redirect options in RESTEasy Reactive
-   [#&#8203;30644](quarkusio/quarkus#30644) - Adjust source language absent in documentation code blocks
-   [#&#8203;30636](quarkusio/quarkus#30636) - PreDestroy hooks fail depending on method modifiers
-   [#&#8203;30635](quarkusio/quarkus#30635) - Introduce a `minimum-java-version` in the extension descriptor metadata
-   [#&#8203;30625](quarkusio/quarkus#30625) - OIDC authentication loop if Cookie Policy sameSite=strict
-   [#&#8203;30624](quarkusio/quarkus#30624) - Fix NPE obtaining a project map from Maven session
-   [#&#8203;30622](quarkusio/quarkus#30622) - Update invalid package in guide
-   [#&#8203;30612](quarkusio/quarkus#30612) - Fix import file name in redis-reference.adoc
-   [#&#8203;30609](quarkusio/quarkus#30609) - Qute generated resolvers - getters should take precedence over fields
-   [#&#8203;30593](quarkusio/quarkus#30593) - Qute validation - improve hierarchy indexing to fix assignability issues
-   [#&#8203;30591](quarkusio/quarkus#30591) - Resolve correct version when application version is unset
-   [#&#8203;30589](quarkusio/quarkus#30589) - Bump junit-bom from 5.9.1 to 5.9.2
-   [#&#8203;30585](quarkusio/quarkus#30585) - Bump Microsoft SQL Server JDBC driver to 11.2.3
-   [#&#8203;30584](quarkusio/quarkus#30584) - Update MS SQL JDBC driver to version 11.2.3
-   [#&#8203;30576](quarkusio/quarkus#30576) - Use accept header to choose metrics export format
-   [#&#8203;30574](quarkusio/quarkus#30574) - Handle empty source directory for included builds
-   [#&#8203;30569](quarkusio/quarkus#30569) - Add default implementation for REST Data interfaces
-   [#&#8203;30564](quarkusio/quarkus#30564) - Update security-openid-connect-client.adoc
-   [#&#8203;30559](quarkusio/quarkus#30559) - container-image extension running with kubernetes extension
-   [#&#8203;30557](quarkusio/quarkus#30557) - AWT: JniRuntimeAccess: freetypeScaler.c calls sun.font.FontUtilities
-   [#&#8203;30548](quarkusio/quarkus#30548) - Add a blurb about not supporting validation.xml in Quarkus
-   [#&#8203;30526](quarkusio/quarkus#30526) - RESTEasy classic servlets - add RoutingContext to active request context
-   [#&#8203;30515](quarkusio/quarkus#30515) - Native build fails with hibernate-orm-rest-data-panache + elytron-security-properties-file
-   [#&#8203;30513](quarkusio/quarkus#30513) - Limit application.properties lookup to main source set
-   [#&#8203;30510](quarkusio/quarkus#30510) - Simplify logic in create-app.adoc and allow to define stream
-   [#&#8203;30501](quarkusio/quarkus#30501) - Fix HibernateOrmCodestart
-   [#&#8203;30500](quarkusio/quarkus#30500) - Place extension with an unknown category in the uncategorized category
-   [#&#8203;30496](quarkusio/quarkus#30496) - Update documentation
-   [#&#8203;30490](quarkusio/quarkus#30490) - Avoid adding the exception itself as a suppressed exception
-   [#&#8203;30488](quarkusio/quarkus#30488) - Updates to Infinispan 14.0.6.Final
-   [#&#8203;30485](quarkusio/quarkus#30485) - Verify code flow access token first if no UserInfo precondition exists
-   [#&#8203;30479](quarkusio/quarkus#30479) - Define defaultValueDocumentation for builderImage
-   [#&#8203;30474](quarkusio/quarkus#30474) - Docs - default value of `quarkus.native.builder-image` is not shown
-   [#&#8203;30470](quarkusio/quarkus#30470) - Revert --enable-monitoring with no arguments support
-   [#&#8203;30460](quarkusio/quarkus#30460) - Bump kafka3.version from 3.3.1 to 3.3.2
-   [#&#8203;30453](quarkusio/quarkus#30453) - Gradle build failing w/ Quarkus 2.16.0
-   [#&#8203;30430](quarkusio/quarkus#30430) - Bump gizmo from 1.5.0.Final to 1.6.0.Final
-   [#&#8203;30429](quarkusio/quarkus#30429) - Bump Keycloak version to 20.0.3
-   [#&#8203;30426](quarkusio/quarkus#30426) - Fix redundant push when using buildx
-   [#&#8203;30424](quarkusio/quarkus#30424) - Building of container images with buildx causes build failures
-   [#&#8203;30423](quarkusio/quarkus#30423) - 2.15+ - Services dependent on libraries without classes no longer build
-   [#&#8203;30418](quarkusio/quarkus#30418) - Disable -D argument propagation in DevMojo
-   [#&#8203;30415](quarkusio/quarkus#30415) - Arc - Change Types#getTypeClosure so that superclasses and interfaces of producer types no longer throw on finding wildcards
-   [#&#8203;30412](quarkusio/quarkus#30412) - Arc - wildcard detection for producer methods/fields is too aggressive
-   [#&#8203;30410](quarkusio/quarkus#30410) - Introduce support for GraalVM `--enable-monitoring`
-   [#&#8203;30408](quarkusio/quarkus#30408) - Warning: Option 'AllowVMInspection' is deprecated and might be removed from future versions: Please use --enable-monitoring
-   [#&#8203;30405](quarkusio/quarkus#30405) - Quarkus Undertow doesn't work with blocking SecurityIdentityAugmentor
-   [#&#8203;30399](quarkusio/quarkus#30399) - Fix ElasticSearch Dev Services container restart
-   [#&#8203;30384](quarkusio/quarkus#30384) - Elasticsearch Dev Services restarts container on every auto-compile
-   [#&#8203;30368](quarkusio/quarkus#30368) - Allow Environment variables to populate property Maps in build time Config
-   [#&#8203;30354](quarkusio/quarkus#30354) - AWT `io.quarkus.awt.it.ImageGeometryFontsIT` native integration test failing with "GraalVM for Java 20" dev builds
-   [#&#8203;30347](quarkusio/quarkus#30347) - Bump junit-jupiter from 5.9.1 to 5.9.2
-   [#&#8203;30343](quarkusio/quarkus#30343) - Trailing comma is lost from prometheus metrics
-   [#&#8203;30335](quarkusio/quarkus#30335) - Add native compilation section to Hibernate Validator guide
-   [#&#8203;30332](quarkusio/quarkus#30332) - NPE in toString method for Processor Parameters in kafka-streams 3.3.1 version
-   [#&#8203;30275](quarkusio/quarkus#30275) - Inline Log category property doesn't work
-   [#&#8203;30208](quarkusio/quarkus#30208) - OIDC: 401 when access-token needs to be refreshed and user-info-required=true
-   [#&#8203;30179](quarkusio/quarkus#30179) - Add an owasp-check profile
-   [#&#8203;28781](quarkusio/quarkus#28781) - RESTEasy Reactive: document redirects
-   [#&#8203;24027](quarkusio/quarkus#24027) - Hibernate Validator does not use META-INF/validation.xml, it should work or be stated in the documentation.
-   [#&#8203;23002](quarkusio/quarkus#23002) - if more than two running IDE while launching 'x' gives error

</details>

<details>
<summary>quarkusio/quarkus-platform</summary>

### [`v2.16.1.Final`](quarkusio/quarkus-platform@2.16.0.Final...2.16.1.Final)

[Compare Source](quarkusio/quarkus-platform@2.16.0.Final...2.16.1.Final)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR is behind base branch, or you tick the rebase/retry checkbox.

👻 **Immortal**: This MR will be recreated if closed unmerged. Get [config help](https://github.com/renovatebot/renovate/discussions) if that's undesired.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNC4yNC4wIiwidXBkYXRlZEluVmVyIjoiMzQuMjQuMCJ9-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants