New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: short circuit Oid.BOOL in getBoolean #745

Merged
merged 1 commit into from Feb 12, 2017

Conversation

Projects
None yet
5 participants
@jorsol
Contributor

jorsol commented Feb 7, 2017

The BE sends "t" or "f" for Oid.BOOL so there is no need to fall in getString which execute again checkResultSet(), decode and trim the returned String.

All of this can be avoided and get a performance benefit.

@davecramer

I'd think putting the col=columnIndex in the if would avoid creating the col object ?

@codecov-io

This comment has been minimized.

codecov-io commented Feb 7, 2017

Codecov Report

Merging #745 into master will decrease coverage by -0.01%.

@@             Coverage Diff              @@
##             master     #745      +/-   ##
============================================
- Coverage     64.42%   64.42%   -0.01%     
- Complexity     3469     3470       +1     
============================================
  Files           164      164              
  Lines         15147    15150       +3     
  Branches       2501     2503       +2     
============================================
+ Hits           9759     9760       +1     
  Misses         4156     4156              
- Partials       1232     1234       +2

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 63ed212...0393ade. Read the comment docs.

@@ -1945,8 +1945,12 @@ public boolean getBoolean(int columnIndex) throws SQLException {
return false; // SQL NULL
}
int col = columnIndex - 1;
if (Oid.BOOL == fields[col].getOID()) {
return 't' == (char) (this_row[col][0] & 0xFF);

This comment has been minimized.

@vlsi

vlsi Feb 8, 2017

Member

Does this buy much?
Can a test be added to cover that?
Is this resilient against ArrayIndexOutOfBoundsException?

This comment has been minimized.

@panchenko

panchenko Feb 8, 2017

Index bounds are checked first in this function.
Should be faster, as it does not allocate new objects.
Just for my understanding: is it necessary to check TEXT_FORMAT/BINARY_FORMAT?

This comment has been minimized.

@vlsi

vlsi Feb 8, 2017

Member

I mean [0] bounds check. I do not see it checked

This comment has been minimized.

@panchenko

panchenko Feb 8, 2017

so, something like:

final byte[] value = this_row[col];
return value.length == 1 && value[0] == (byte) 't';
@jorsol

This comment has been minimized.

Contributor

jorsol commented Feb 8, 2017

Does this buy much?

Preliminary test (added to the PR) with the length check as suggested:
Before:

Benchmark                  (rowsize)  Mode  Cnt        Score       Error  Units
ProcessBoolean.getBoolean          5  avgt   10      690.743 ±    21.417  ns/op
ProcessBoolean.getBoolean         10  avgt   10     1623.945 ±    50.107  ns/op
ProcessBoolean.getBoolean         50  avgt   10     9092.337 ±   149.268  ns/op
ProcessBoolean.getBoolean        100  avgt   10    17716.010 ±   913.568  ns/op
ProcessBoolean.getBoolean      10000  avgt   10  1788860.346 ± 14054.267  ns/op

After:

Benchmark                  (rowsize)  Mode  Cnt      Score     Error  Units
ProcessBoolean.getBoolean          5  avgt   10     12.155 ±   0.041  ns/op
ProcessBoolean.getBoolean         10  avgt   10     20.835 ±   0.096  ns/op
ProcessBoolean.getBoolean         50  avgt   10    103.061 ±   2.483  ns/op
ProcessBoolean.getBoolean        100  avgt   10    194.745 ±   0.969  ns/op
ProcessBoolean.getBoolean      10000  avgt   10  21071.552 ± 170.534  ns/op

That is about ~95% performance increase.

@davecramer

This comment has been minimized.

Member

davecramer commented Feb 8, 2017

@jorsol

This comment has been minimized.

Contributor

jorsol commented Feb 8, 2017

@davecramer it's nanoseconds

int col = columnIndex - 1;
if (Oid.BOOL == fields[col].getOID()) {
final byte[] value = this_row[col];
return (value.length == 1) && ('t' == (char) (value[0] & 0xFF));

This comment has been minimized.

@panchenko

panchenko Feb 9, 2017

is this "byte to char" conversion necessary?
(byte) 't' looks shorter and overall protocol is defined in bytes, not chars.

This comment has been minimized.

@jorsol

jorsol Feb 9, 2017

Contributor

"byte to char", "char to byte"? It's the same.

This comment has been minimized.

@panchenko

panchenko Feb 9, 2017

not quite the same :-) but close:

  • 't' == (char) (value[0] & 0xFF)
  • (byte) 't' == value[0]

This comment has been minimized.

@jorsol

jorsol Feb 9, 2017

Contributor

Ok, let's squeeze one nanosecond more and make it shorter :-)
116 == v[0]

public void getBoolean() throws SQLException {
rs.first();
while (rs.next()) {
rs.getBoolean(1);

This comment has been minimized.

@vlsi

vlsi Feb 9, 2017

Member

Technically speaking this code is prone to dead code elimination benchmark flaw.

Is this benchmark class any different from existing https://github.com/pgjdbc/pgjdbc/blob/master/ubenchmark/src/main/java/org/postgresql/benchmark/statement/ProcessResultSet.java#L49 ?

Could you please add boolean there?

This comment has been minimized.

@jorsol

jorsol Feb 9, 2017

Contributor

Let me check that class, I did not want to mess with ProcessResultSet but if you think that would be a better choice I will try to use it.

This comment has been minimized.

@jorsol

jorsol Feb 9, 2017

Contributor

😕 that test (ProcessResultSet) hammer the db executing the query every single time, it's not a realistic way to measure the java method performance since there is a lot overhead from the db. Anyway, I will add the BOOL to that test and add a "Blackhole" to ProcessBoolean to avoid the dead code elimination.

@vlsi

This comment has been minimized.

Member

vlsi commented Feb 9, 2017

I've tested the PR a bit. I was able to activate logging via -DloggerLevel=DEBUG (loggerFile was inherited).

However I do not like log format much.
Here's what I get (that was from IntelliJ, so /*$mvn.project.property.parsedversion.osg... was not resolved):

фев 10, 2017 12:21:50 AM org.postgresql.Driver connect
FINE: Connecting with URL: jdbc:postgresql://localhost:5432/postgres?ApplicationName=Driver Tests&loggerLevel=DEBUG&loggerFile=target/pgjdbc-tests.log
фев 10, 2017 12:21:50 AM org.postgresql.jdbc.PgConnection <init>
FINE: PostgreSQL JDBC Driver /*$mvn.project.property.parsedversion.osgiversion$*/
фев 10, 2017 12:21:50 AM org.postgresql.jdbc.PgConnection setDefaultFetchSize
FINE:   setDefaultFetchSize = 0
фев 10, 2017 12:21:50 AM org.postgresql.jdbc.PgConnection setPrepareThreshold
FINE:   setPrepareThreshold = 5
фев 10, 2017 12:21:50 AM org.postgresql.core.v3.ConnectionFactoryImpl openConnectionImpl
FINE: Trying to establish a protocol version 3 connection to localhost:5432
фев 10, 2017 12:21:50 AM org.postgresql.core.v3.ConnectionFactoryImpl openConnectionImpl
FINE: Receive Buffer Size is 408 300
  1. two-line log records look odd to me. Baiscally the first line is date+location, and the second line is the message. I would prefer to have single log lines. Any thoughts here?
  2. "automatic" class and method names (see the line with timestamps). Of course they are not free, are they?

I've did a bit of google and a bit of benchmarks. Here's what I've got:

https://bugs.openjdk.java.net/browse/JDK-6511515: poor performance of LogRecord.inferCaller depending on java.lang.Throwable.getStackTraceElement

package org.postgresql.benchmark.connection;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

@Fork(1)
@Measurement(iterations = 5)
@Warmup(iterations = 5)
@State(Scope.Thread)
@Threads(1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class LoggingBenchmark {

  String msg = "test";
  SimpleFormatter formatter = new SimpleFormatter();

  @Benchmark
  public LogRecord baseline() {
    return new LogRecord(Level.FINE, "test");
  }

  @Benchmark
  public LogRecord locationInfo(Blackhole bh) {
    LogRecord log = new LogRecord(Level.FINE, "test");
    bh.consume(log.getSourceClassName());
    bh.consume(log.getSourceMethodName());
    return log;
  }

  @Benchmark
  public String format(Blackhole bh) {
    LogRecord log = new LogRecord(Level.FINE, "test");
    return formatter.format(log);
  }


  public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
        .include(LoggingBenchmark.class.getSimpleName())
        .addProfiler(GCProfiler.class)
        .detectJvmArgs()
        .build();

    new Runner(opt).run();
  }

}

Results:

Benchmark                                    Mode  Cnt      Score      Error   Units
LoggingBenchmark.baseline                    avgt    5      0,084 ±    0,003   us/op
LoggingBenchmark.baseline:·gc.alloc.rate     avgt    5    818,191 ±   30,430  MB/sec
LoggingBenchmark.locationInfo                avgt    5     21,710 ±    3,459   us/op
LoggingBenchmark.locationInfo:·gc.alloc.rate avgt    5     47,141 ±    7,540  MB/sec
LoggingBenchmark.format                      avgt    5     36,515 ±    5,391   us/op
LoggingBenchmark.format:·gc.alloc.rate       avgt    5    338,382 ±   48,791  MB/sec

What it means is it takes 36 microseconds to format a single log line with a simple "test" message.
The main time consumer is indeed location calculation that takes like 22us.

Just in case: simple JDBC query (prepare+execute+fetch) over localhost connection takes comparable time.

The formatting is performed under synchronization (SimpleFormatter#format is synchronized), so I don't even bother to benchmark how it would break in multi-threaded environment.

I'm not saying this logging facility should always be enabled in production at DEBUG/TRACE levels, however this level of overhead might bite us (or pgjdbc users :) ).

Any thoughts if we should cleanup log format before merging the PR in?

@davecramer

This comment has been minimized.

Member

davecramer commented Feb 9, 2017

@jorsol

This comment has been minimized.

Contributor

jorsol commented Feb 10, 2017

More benchmark
Before:

Benchmark                  (rowsize)  (tokens)  Mode  Cnt        Score       Error  Units
ProcessBoolean.getBoolean          5         0  avgt   10      716.504 ±     8.007  ns/op
ProcessBoolean.getBoolean          5        10  avgt   10      668.592 ±     5.034  ns/op
ProcessBoolean.getBoolean          5        50  avgt   10      765.153 ±     8.295  ns/op
ProcessBoolean.getBoolean          5       100  avgt   10      859.289 ±     8.129  ns/op
ProcessBoolean.getBoolean         10         0  avgt   10     1567.071 ±     8.863  ns/op
ProcessBoolean.getBoolean         10        10  avgt   10     1582.278 ±    13.387  ns/op
ProcessBoolean.getBoolean         10        50  avgt   10     1704.858 ±    78.937  ns/op
ProcessBoolean.getBoolean         10       100  avgt   10     1825.133 ±    64.091  ns/op
ProcessBoolean.getBoolean         50         0  avgt   10     8884.121 ±   302.890  ns/op
ProcessBoolean.getBoolean         50        10  avgt   10     8764.162 ±   308.606  ns/op
ProcessBoolean.getBoolean         50        50  avgt   10     8781.140 ±   250.280  ns/op
ProcessBoolean.getBoolean         50       100  avgt   10     9031.006 ±   300.511  ns/op
ProcessBoolean.getBoolean        100         0  avgt   10    18512.088 ±   941.861  ns/op
ProcessBoolean.getBoolean        100        10  avgt   10    17759.530 ±   502.616  ns/op
ProcessBoolean.getBoolean        100        50  avgt   10    17987.266 ±   518.799  ns/op
ProcessBoolean.getBoolean        100       100  avgt   10    18534.797 ±   518.986  ns/op
ProcessBoolean.getBoolean      10000         0  avgt   10  1762470.368 ± 33905.257  ns/op
ProcessBoolean.getBoolean      10000        10  avgt   10  1806601.186 ± 99918.977  ns/op
ProcessBoolean.getBoolean      10000        50  avgt   10  1807746.782 ± 34044.221  ns/op
ProcessBoolean.getBoolean      10000       100  avgt   10  1763167.027 ± 27145.896  ns/op

After:

Benchmark                  (rowsize)  (tokens)  Mode  Cnt      Score      Error  Units
ProcessBoolean.getBoolean          5         0  avgt   10     33.610 ±    0.191  ns/op
ProcessBoolean.getBoolean          5        10  avgt   10     41.735 ±    0.075  ns/op
ProcessBoolean.getBoolean          5        50  avgt   10    116.491 ±    1.377  ns/op
ProcessBoolean.getBoolean          5       100  avgt   10    208.021 ±    1.434  ns/op
ProcessBoolean.getBoolean         10         0  avgt   10     72.764 ±    3.320  ns/op
ProcessBoolean.getBoolean         10        10  avgt   10     79.348 ±    0.609  ns/op
ProcessBoolean.getBoolean         10        50  avgt   10    153.837 ±    0.534  ns/op
ProcessBoolean.getBoolean         10       100  avgt   10    247.109 ±    3.211  ns/op
ProcessBoolean.getBoolean         50         0  avgt   10    360.975 ±    4.420  ns/op
ProcessBoolean.getBoolean         50        10  avgt   10    366.895 ±    2.675  ns/op
ProcessBoolean.getBoolean         50        50  avgt   10    437.836 ±    0.907  ns/op
ProcessBoolean.getBoolean         50       100  avgt   10    529.986 ±    0.840  ns/op
ProcessBoolean.getBoolean        100         0  avgt   10    708.309 ±    2.597  ns/op
ProcessBoolean.getBoolean        100        10  avgt   10    735.806 ±   18.164  ns/op
ProcessBoolean.getBoolean        100        50  avgt   10    797.098 ±   10.979  ns/op
ProcessBoolean.getBoolean        100       100  avgt   10    892.736 ±   18.678  ns/op
ProcessBoolean.getBoolean      10000         0  avgt   10  73050.736 ± 2890.068  ns/op
ProcessBoolean.getBoolean      10000        10  avgt   10  73380.023 ± 2193.849  ns/op
ProcessBoolean.getBoolean      10000        50  avgt   10  74308.542 ± 3546.948  ns/op
ProcessBoolean.getBoolean      10000       100  avgt   10  73160.013 ± 2428.678  ns/op
perf: short circuit Oid.BOOL in getBoolean
The BE sends "t" of "f" for Oid.BOOL so there is no need to fall in getString
which execute again checkResultSet(), decode the byte and trim the returned String.

All of this can be avoided and get a performance benefit.

@jorsol jorsol force-pushed the jorsol:short-circuit-bool branch from c2a7c9c to 0393ade Feb 10, 2017

@vlsi vlsi added this to the 42.0.0 milestone Feb 12, 2017

@vlsi vlsi merged commit e69e4a1 into pgjdbc:master Feb 12, 2017

2 checks passed

codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +10.57% compared to 63ed212
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

davecramer added a commit to davecramer/pgjdbc that referenced this pull request Sep 19, 2017

perf: short circuit Oid.BOOL in getBoolean (pgjdbc#745)
The BE sends "t" of "f" for Oid.BOOL so there is no need to fall in getString
which execute again checkResultSet(), decode the byte and trim the returned String.

All of this can be avoided and get a performance benefit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment