Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utf 8 encoding optimizations #1444

Merged
merged 15 commits into from Jan 16, 2020
Merged

Utf 8 encoding optimizations #1444

merged 15 commits into from Jan 16, 2020

Conversation

bokken
Copy link
Member

@bokken bokken commented Mar 16, 2019

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Does mvn checkstyle:check pass ?

Changes to Existing Features:

  • Does this break existing behaviour? If so please explain.
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

Addresses #1431 by limiting the size of ‘char[]’ cached for reuse.
Also provides different implementation for java 9+ where ‘String’ implementation changes to be backed by ‘byte[]’.

sehrope and others added 4 commits Mar 13, 2019
…compatibility test

Adds a two parameter constructor to Encoding to allow sub classes to specify whether their
known ASCII compatability so as to skip testing. The only usage of it is the UTF8Encoding
which is changed to use the new constructor.
Also optimize for java 9+ byte[] backed strings
@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Mar 16, 2019

@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Mar 16, 2019

@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Mar 16, 2019

@codecov-io
Copy link

@codecov-io codecov-io commented Mar 16, 2019

Codecov Report

Merging #1444 into master will increase coverage by 0.1%.
The diff coverage is 74.6%.

@@             Coverage Diff             @@
##             master    #1444     +/-   ##
===========================================
+ Coverage     68.88%   68.99%   +0.1%     
- Complexity     3949     4154    +205     
===========================================
  Files           181      186      +5     
  Lines         16485    17122    +637     
  Branches       2679     2810    +131     
===========================================
+ Hits          11356    11813    +457     
- Misses         3885     4027    +142     
- Partials       1244     1282     +38

bokken added 2 commits Mar 18, 2019
use existing JavaVersion enum to pick implementation
add unit test string values
more consistency between byte and char based implementations
fix backwards comparison
add more strings to unit test
@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Mar 18, 2019

@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Mar 18, 2019

# Conflicts:
#	pgjdbc/src/main/java/org/postgresql/core/Encoding.java
#	pgjdbc/src/main/java/org/postgresql/core/OptimizedUTF8Encoder.java
@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Apr 16, 2019

Move to using new String(byte[], int, int, Charset) rather than custom
decoding for jre newer than 1.8.
@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Apr 25, 2019

Back to custom utf-8 decoding for performance gains while validating
@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Apr 25, 2019

@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Apr 26, 2019

@AppVeyorBot
Copy link

@AppVeyorBot AppVeyorBot commented Nov 25, 2019

@abhinav2600
Copy link

@abhinav2600 abhinav2600 commented Jan 16, 2020

Is this the right fix for this issue and is it going to be merged?

@davecramer
Copy link
Member

@davecramer davecramer commented Jan 16, 2020

@bokken can you fix the checkstyle complaint?
Are you still OK with this PR ?

if (size <= thresholdSize) {
decoderArray = new char[size];
Copy link
Member

@davecramer davecramer Jan 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@bokken
Copy link
Member Author

@bokken bokken commented Jan 16, 2020

It appears we have a somewhat flaky test that is failing.

It also appears that our use of maven tool chains means that our tests are not testing what we think they are testing:

$ java -Xmx32m -version
openjdk version "15-ea" 2020-09-15
OpenJDK Runtime Environment (build 15-ea+5-83)
OpenJDK 64-Bit Server VM (build 15-ea+5-83, mixed mode, sharing)
$ javac -J-Xmx32m -version
javac 15-ea
...
[INFO] --- maven-toolchains-plugin:1.1:toolchain (default) @ postgresql ---
[INFO] Required toolchain: jdk [ version='1.8' ]
[INFO] Found matching toolchain for type jdk: JDK[/usr/lib/jvm/java-8-oracle]
...
java.version = 1.8.0_151, JavaVersion.getRuntimeVersion() = v1_8
[testFastCloses] total counts for each sql state: {0=617, 55000=379, 57014=4}

https://travis-ci.org/pgjdbc/pgjdbc/jobs/637954721?utm_medium=notification&utm_source=github_status

@davecramer
Copy link
Member

@davecramer davecramer commented Jan 16, 2020

I knew about the flaky test. Was not aware of the toolchain issue.

@bokken
Copy link
Member Author

@bokken bokken commented Jan 16, 2020

It looks like a lot of the tests are actually executing with jdk 8.
Here is output from the oraclejdk9 test:

Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z)
Maven home: /usr/local/maven-3.5.2
Java version: 1.8.0_151, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-oracle/jre

Strangely, another test, also targeting oraclejdk9, does not have an issue:

Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T03:58:13-04:00)
Maven home: /usr/local/maven-3.5.2
Java version: 9.0.1, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-9-oracle

One thing I observed as being different here is the one which actually uses jdk9 has the build property JDK=9. There other does not.

@davecramer
Copy link
Member

@davecramer davecramer commented Jan 16, 2020

It looks like a lot of the tests are actually executing with jdk 8.
Here is output from the oraclejdk9 test:

Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z)
Maven home: /usr/local/maven-3.5.2
Java version: 1.8.0_151, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-oracle/jre

Strangely, another test, also targeting oraclejdk9, does not have an issue:

Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T03:58:13-04:00)
Maven home: /usr/local/maven-3.5.2
Java version: 9.0.1, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-9-oracle

One thing I observed as being different here is the one which actually uses jdk9 has the build property JDK=9. There other does not.

OK, lets flag this as another issue to fix the tests instead of getting bogged down here.

@bokken
Copy link
Member Author

@bokken bokken commented Jan 16, 2020

Looks like travis was successful. I think this is good to go @davecramer.

@davecramer davecramer merged commit c84e62e into pgjdbc:master Jan 16, 2020
3 checks passed
@fdrozdowski
Copy link

@fdrozdowski fdrozdowski commented Jan 29, 2020

Is there a plan to incorporate this pull request into the next release? I'm assuming the next one would be 42.3.0 (I found the release board here: https://github.com/pgjdbc/pgjdbc/projects/2)?

@davecramer
Copy link
Member

@davecramer davecramer commented Jan 29, 2020

well it's merged so yes, but not sure why it would be 42.3.0. this is not a breaking change

@fdrozdowski
Copy link

@fdrozdowski fdrozdowski commented Jan 30, 2020

@davecramer Sounds good. Is there a tentative timeline for the next patch release?

@davecramer
Copy link
Member

@davecramer davecramer commented Jan 30, 2020

it's currently at the top of my queue

davecramer added a commit to davecramer/pgjdbc that referenced this issue Jul 5, 2021
* fix: Correct typo weather to whether

* misc: Change internal Encoding.testAsciiNumbers(...) to be static

* perf: Enhance Encoding constructor to allow skipping of ASCII number compatibility test

Adds a two parameter constructor to Encoding to allow sub classes to specify whether their
known ASCII compatability so as to skip testing. The only usage of it is the UTF8Encoding
which is changed to use the new constructor.

* fix: limit size of char[] for utf-8 decoding

Also optimize for java 9+ byte[] backed strings

* fix: limit size of char[] for utf-8 decoding

address style issues

* fix: limit size of char[] for utf-8 decoding

address style issues

* fix: limit size of char[] for utf-8 decoding

use existing JavaVersion enum to pick implementation
add unit test string values
more consistency between byte and char based implementations

* fix: limit size of char[] for utf-8 decoding

fix backwards comparison
add more strings to unit test

* fix: limit size of char[] for utf-8 decoding

Move to using new String(byte[], int, int, Charset) rather than custom
decoding for jre newer than 1.8.

* fix: limit size of char[] for utf-8 decoding

Back to custom utf-8 decoding for performance gains while validating

* javadoc

* put test back into test suite

* avoid creating an unnecessary `char[]` when growing cached array

Co-authored-by: Sehrope Sarkuni <sehrope@jackdb.com>
Co-authored-by: Dave Cramer <davecramer@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants