Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: fix 1ms per async CopyAPI (regression since 42.2.5) #1314

Merged
merged 1 commit into from Oct 25, 2018

Conversation

@vlsi
Copy link
Member

@vlsi vlsi commented Oct 18, 2018

Let's see what breaks

@vlsi vlsi force-pushed the fix_nonblocking_copy branch 6 times, most recently from 0586a6e to ea0ac40 Oct 19, 2018
@davecramer
Copy link
Member

@davecramer davecramer commented Oct 19, 2018

@vlsi the reason that replication is failing now is that SslMode.of https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/SslMode.java#L63-L71

is returning Prefer and setting up an SSL connection which your network timeout code attempted to fix.

Thoughts on how to fix this ?

@vlsi
Copy link
Member Author

@vlsi vlsi commented Oct 19, 2018

Well, the idea of prefer is to use encrypted connection when possible.

In other words, even if we flip the switch and stick with non-encrypted by default (why would we?), the clients that use SSL would suffer.

I have not reviewed "replication" tests, however the tests might suffer from the similar problem: async read might always immediately return, so replication stream should likely use timed-polling.

@davecramer
Copy link
Member

@davecramer davecramer commented Oct 19, 2018

I'm not suggesting flipping the switch.

But in the test case I did not ask for an ssl replication connection, but received one anyway.
replication protocol uses the copy protocol so yes, it suffers from the same problem. The question is how to solve this without the performance regression in Copy ?

I expect we need a background thread reading ..

@vlsi
Copy link
Member Author

@vlsi vlsi commented Oct 19, 2018

  1. Replication stream sends periodic heartbeats to the server:


    and we might force boolean replyRequired=true when readPending is called.
    That would force server to generate some traffic, and it would eventually push out relevant messages that are sitting in the queues.

  2. We could augment org.postgresql.replication.LogicalReplicationTest#receiveMessageWithoutBlock to prevent "non-blocking" mode. For instance it could do a true .pick() call once per 1 second or something like that. We might want to add that once per 1second (or once per 10sec) check to the PGStream.available.

  3. It might be relevant to the way we configure SSL.

@davecramer
Copy link
Member

@davecramer davecramer commented Oct 19, 2018

regarding 3: no, it's the implementation.. https://bugs.openjdk.java.net/browse/JDK-7134707
One thought it to use your code only if the wrapped connection is an SSLConnection ?

@vlsi vlsi force-pushed the fix_nonblocking_copy branch 2 times, most recently from 810f446 to 2e0db8a Oct 21, 2018
PGStream#hasMessagePending should not perform network requests otherwise it causes per row delay in async copy

fixes pgjdbc#1312
@vlsi vlsi force-pushed the fix_nonblocking_copy branch from 2e0db8a to e6228a5 Oct 24, 2018
@codecov-io
Copy link

@codecov-io codecov-io commented Oct 24, 2018

Codecov Report

Merging #1314 into master will increase coverage by 0.04%.
The diff coverage is 88.88%.

@@             Coverage Diff              @@
##             master    #1314      +/-   ##
============================================
+ Coverage     68.54%   68.58%   +0.04%     
- Complexity     3880     3883       +3     
============================================
  Files           178      178              
  Lines         16199    16205       +6     
  Branches       2645     2646       +1     
============================================
+ Hits          11104    11115      +11     
+ Misses         3863     3859       -4     
+ Partials       1232     1231       -1

@vlsi
Copy link
Member Author

@vlsi vlsi commented Oct 24, 2018

@davecramer , this passes the tests at a cost of:

  1. Once per second it performs blocking 1ms call for regular copy ops
  2. replication "non-blocking" calls always perform "1ms reads" behind the scene

Any thoughts?

@vlsi vlsi changed the title Fix nonblocking copy perf: fix 1ms per async CopyAPI (regression since 42.2.5) Oct 24, 2018
@vlsi vlsi added this to the 42.2.6 milestone Oct 24, 2018
@vlsi vlsi merged commit e2623d6 into pgjdbc:master Oct 25, 2018
2 checks passed
davecramer added a commit to davecramer/pgjdbc that referenced this issue Jul 5, 2021
The problem is SSL sockets use buffers for input, and `stream#available()` might easily return 0 even though the data is available in the downstream socket. The only viable WA is to perform `read` calls, however it might impact performance (e.g. for async copy operations). The resolution is to throttle read calls for copy api, and keep 1ms reads for replication api.

fixes pgjdbc#1312
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants