Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup encoding #1441

Merged
merged 3 commits into from Apr 9, 2019

Conversation

Projects
None yet
6 participants
@sehrope
Copy link
Contributor

commented Mar 14, 2019

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Does mvn checkstyle:check pass ?

Changes to Existing Features:

  • Does this break existing behaviour? If so please explain.
  • No it's only internals that are changing.
  • Have you added an explanation of what your changes do and why you'd like us to include them?

  • Have you written new tests for your core changes, as applicable?

  • Have you successfully run tests with your changes locally?


Was taking a look at the internals of the string decoder to see what would be involved in supporting a "Use system decoder if longer than X... (as a possible alternative to #1433) and noticed a couple minor things to clean up.

PR is split into three commits:

  1. Typo correction for s/weather/whether/
  2. Change the internal test function to validate if a given encoding is "ASCII Number Compatible" to be static.
  3. Add a new protected (so internal only) constructor to Encoding that allows callers to specify if they're ASCII compatible. This allows the UTF8Encoding class to skip the ASCII number compatibility test on every construction as we know it'll always pass. I doubt it's a measurable difference but why not right?

That third commit also removes the encoding parameter to UTF8Encoding's constructor as the only valid value is "UTF-8" anyway. The only invocation of the constructor in Encoding is with "UTF-8" as a parameter and anything else wouldn't make any sense.

I've got a separate PR built atop this one that splits out Encoding further to make it an interface rather than a class and moves the majority of it's logic to a new EncodingCore internal class. All of that is in the hopes of simplifying the injection of connection level properties to customize the used encoding. If that idea pans out I'll push a separate PR for it.

@AppVeyorBot

This comment has been minimized.

Copy link

commented Mar 14, 2019

@sehrope

This comment has been minimized.

Copy link
Contributor Author

commented Mar 14, 2019

FYI, this PR does build and test successfully. The only Travis failure is an unrelated issue against PG HEAD and fixed by #1442.

@@ -122,7 +134,7 @@ public boolean hasAsciiNumbers() {
*/
public static Encoding getJVMEncoding(String jvmEncoding) {
if ("UTF-8".equals(jvmEncoding)) {

This comment has been minimized.

Copy link
@bokken

bokken Mar 15, 2019

Member

should this ignore case, or do we know it will be all caps?

This comment has been minimized.

Copy link
@sehrope

sehrope Mar 15, 2019

Author Contributor

The existing code checks for explicit equals so haven't touched any of it in this PR.

I tested that out locally and every combination of upper, lower, or mixed case, with or without a dash worked. Maybe a better approach would be to defer to the JDK to determine if the charset is UTF8 via comparing the canonical name:

if ("UTF-8".equals(Charset.forName(encoding).name())) {
   // We're UTF-8...
}

Could follow up with something for that in a separate PR. I don't think it'd break anything but it is a change as it'd expand the encoding considered UTF-8 (likely for the better).

This comment has been minimized.

Copy link
@bokken

bokken Mar 15, 2019

Member

@sehrope, do you want to coordinate on proposed changes?

This comment has been minimized.

Copy link
@sehrope

sehrope Mar 15, 2019

Author Contributor

Sure but let's keep it separate from this PR as it's a change in behavior. I'll follow up later with a PR for it.

@bokken

bokken approved these changes Mar 15, 2019

sehrope added some commits Mar 13, 2019

perf: Enhance Encoding constructor to allow skipping of ASCII number …
…compatibility test

Adds a two parameter constructor to Encoding to allow sub classes to specify whether their
known ASCII compatability so as to skip testing. The only usage of it is the UTF8Encoding
which is changed to use the new constructor.

@sehrope sehrope force-pushed the sehrope:cleanup-encoding branch from 2ce66ad to 5263bfd Mar 16, 2019

@sehrope

This comment has been minimized.

Copy link
Contributor Author

commented Mar 16, 2019

Rebased ... tests should all work now.

@AppVeyorBot

This comment has been minimized.

Copy link

commented Mar 16, 2019

@codecov-io

This comment has been minimized.

Copy link

commented Mar 16, 2019

Codecov Report

Merging #1441 into master will increase coverage by 0.12%.
The diff coverage is 100%.

@@             Coverage Diff              @@
##             master    #1441      +/-   ##
============================================
+ Coverage     68.63%   68.75%   +0.12%     
- Complexity     3896     3903       +7     
============================================
  Files           179      179              
  Lines         16414    16414              
  Branches       2672     2672              
============================================
+ Hits          11266    11286      +20     
+ Misses         3895     3880      -15     
+ Partials       1253     1248       -5
@adeopujariarkin
Copy link

left a comment

@sehrope
How does this fix the issue of UTF8Encoding.decoderArray growing very large and not shrinking back?

@sehrope

This comment has been minimized.

Copy link
Contributor Author

commented Mar 19, 2019

@adeopujariarkin It does not fix that issue. This is some general cleanup and optimization I put together while looking into options for the large string decoding. I put it up as a separate PR because there was not consensus yet on how to address the large string decoding and the changes in this PR are generic improvements that need not wait for that.

@davecramer davecramer merged commit 73ec817 into pgjdbc:master Apr 9, 2019

3 checks passed

codecov/project 68.75% (+0.12%) compared to 42d6bfa
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.