makes escape processing strict according to JDBC spec #657
Conversation
before, escape code parsing was very lenient and invalid escape code formats were accidentally processed.
+1 for the idea |
|
||
@Override | ||
String replacementKeyword() { | ||
return replacementKeyword; |
vlsi
Oct 6, 2016
Member
Can this be factored out to enum constructor
?
That is just add char[] keyword
and String replacement
fields to SqlParseState
instance and avoid duplicating the logic in overridden methods.
Can this be factored out to enum constructor
?
That is just add char[] keyword
and String replacement
fields to SqlParseState
instance and avoid duplicating the logic in overridden methods.
seut
Oct 6, 2016
Author
Contributor
good point, will do
good point, will do
Please correct the benchmark. Current benchmark is likely to measure nothing. |
|
||
@Benchmark | ||
public void escapeFunctionWithDate() throws Exception { | ||
Parser.replaceProcessing("{fn week({d '2005-01-24'})}", true, false); |
vlsi
Oct 6, 2016
•
Member
- Use fields for arguments: http://hg.openjdk.java.net/code-tools/jmh/file/7a25c71b43bf/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_10_ConstantFold.java#l74
- return the value of
Parser.replaceProcessing
: http://hg.openjdk.java.net/code-tools/jmh/file/7a25c71b43bf/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_08_DeadCode.java#l67
- Use fields for arguments: http://hg.openjdk.java.net/code-tools/jmh/file/7a25c71b43bf/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_10_ConstantFold.java#l74
- return the value of
Parser.replaceProcessing
: http://hg.openjdk.java.net/code-tools/jmh/file/7a25c71b43bf/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_08_DeadCode.java#l67
pushed a fixup, @vlsi can you check it pls? (pls let me squash then before merging ;) I've also re-run the benchmarks: MASTERResult "escapeFunctionWithDate": s/strict-escape-processingResult "escapeFunctionWithDate": |
ESC_TIMESTAMP(new char[]{'t','s'}, SINGLE_QUOTE, "TIMESTAMP "), | ||
ESC_FUNCTION(new char[]{'f','n'}, QUOTE_OR_ALPHABETIC_MARKER, null), | ||
ESC_OUTERJOIN(new char[]{'o','j'}, QUOTE_OR_ALPHABETIC_MARKER, null), | ||
ESC_ESCAPECHAR(new char[]{'e','s','c','a','p','e'}, SINGLE_QUOTE, "ESCAPE "); |
vlsi
Oct 6, 2016
Member
Just in case: you can make it more readable if passing "escape" here and using .toCharArray()
in the constructor.
Just in case: you can make it more readable if passing "escape" here and using .toCharArray()
in the constructor.
seut
Oct 6, 2016
Author
Contributor
sure right, it's more readable then, will change
sure right, it's more readable then, will change
seut
Oct 6, 2016
•
Author
Contributor
hm reason why I changed that, was to avoid unnecessary String allocation (even that it's a singleton). but if it's ok for you, i'll change it back to .toCharArray()
hm reason why I changed that, was to avoid unnecessary String allocation (even that it's a singleton). but if it's ok for you, i'll change it back to .toCharArray()
vlsi
Oct 6, 2016
Member
hm reason why I changed that, was to avoid unnecessary String allocation (even that it's a singleton
I see absolutely no reason to keep code obscure. Who cares of singleton initialization?
Just to be clear: I'm picky when it comes to runtime performance (e.g. sql parsing / query execution that happens thousands of times), and I do not care how "initialization" code is written provided it is readable & understandable.
hm reason why I changed that, was to avoid unnecessary String allocation (even that it's a singleton
I see absolutely no reason to keep code obscure. Who cares of singleton initialization?
Just to be clear: I'm picky when it comes to runtime performance (e.g. sql parsing / query execution that happens thousands of times), and I do not care how "initialization" code is written provided it is readable & understandable.
seut
Oct 6, 2016
Author
Contributor
I totally agree, but this is afaik not about performance but memory allocation. anyway its really micro, so I'll change it in favour of readability.
I totally agree, but this is afaik not about performance but memory allocation. anyway its really micro, so I'll change it in favour of readability.
seut
Oct 6, 2016
Author
Contributor
fixup pushed
fixup pushed
// time constant | ||
i++; | ||
newsql.append("TIME "); | ||
SqlParseState[] availableStates = SqlParseState.values(); |
vlsi
Oct 6, 2016
Member
@seut, can you rerun the benchmark with -prof gc
?
I wonder if SqlParseState.values();
is worth caching.
@seut, can you rerun the benchmark with -prof gc
?
I wonder if SqlParseState.values();
is worth caching.
seut
Oct 6, 2016
Author
Contributor
hm yeah was not sure about that too, will re-run
hm yeah was not sure about that too, will re-run
seut
Oct 6, 2016
Author
Contributor
afaik it's worth caching.
cached run:
Result "escapeFunctionWithDate":
665.654 ±(99.9%) 6.960 ns/op [Average]
(min, avg, max) = (625.776, 665.654, 725.398), stdev = 20.521
CI (99.9%): [658.694, 672.614] (assumes normal distribution)
Secondary result "·gc.alloc.rate":
2339.752 ±(99.9%) 23.956 MB/sec [Average]
(min, avg, max) = (2145.052, 2339.752, 2486.556), stdev = 70.636
CI (99.9%): [2315.796, 2363.709] (assumes normal distribution)
Secondary result "·gc.alloc.rate.norm":
1632.001 ±(99.9%) 0.001 B/op [Average]
(min, avg, max) = (1632.000, 1632.001, 1632.008), stdev = 0.002
CI (99.9%): [1632.000, 1632.002] (assumes normal distribution)
Secondary result "·gc.churn.PS_Eden_Space":
2328.778 ±(99.9%) 31.127 MB/sec [Average]
(min, avg, max) = (2081.674, 2328.778, 2558.467), stdev = 91.778
CI (99.9%): [2297.652, 2359.905] (assumes normal distribution)
Secondary result "·gc.churn.PS_Eden_Space.norm":
1624.358 ±(99.9%) 14.103 B/op [Average]
(min, avg, max) = (1541.086, 1624.358, 1727.382), stdev = 41.583
CI (99.9%): [1610.256, 1638.461] (assumes normal distribution)
Secondary result "·gc.churn.PS_Survivor_Space":
0.208 ±(99.9%) 0.025 MB/sec [Average]
(min, avg, max) = (0.093, 0.208, 0.406), stdev = 0.074
CI (99.9%): [0.183, 0.233] (assumes normal distribution)
Secondary result "·gc.churn.PS_Survivor_Space.norm":
0.145 ±(99.9%) 0.017 B/op [Average]
(min, avg, max) = (0.064, 0.145, 0.284), stdev = 0.051
CI (99.9%): [0.127, 0.162] (assumes normal distribution)
Secondary result "·gc.count":
1676.000 ±(99.9%) 0.001 counts [Sum]
(min, avg, max) = (10.000, 16.760, 19.000), stdev = 1.664
CI (99.9%): [1676.000, 1676.000] (assumes normal distribution)
Secondary result "·gc.time":
926.000 ±(99.9%) 0.001 ms [Sum]
(min, avg, max) = (6.000, 9.260, 10.000), stdev = 0.799
CI (99.9%): [926.000, 926.000] (assumes normal distribution)
# Run complete. Total time: 00:03:25
Benchmark Mode Cnt Score Error Units
EscapeProcessing.escapeFunctionWithDate avgt 100 665.654 ± 6.960 ns/op
EscapeProcessing.escapeFunctionWithDate:·gc.alloc.rate avgt 100 2339.752 ± 23.956 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.alloc.rate.norm avgt 100 1632.001 ± 0.001 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Eden_Space avgt 100 2328.778 ± 31.127 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Eden_Space.norm avgt 100 1624.358 ± 14.103 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Survivor_Space avgt 100 0.208 ± 0.025 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Survivor_Space.norm avgt 100 0.145 ± 0.017 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.count avgt 100 1676.000 counts
EscapeProcessing.escapeFunctionWithDate:·gc.time avgt 100 926.000 ms
non-cached, code changed to:
for (int j = 1; j < SqlParseState.values().length; j++) {
SqlParseState availableState = SqlParseState.values()[j];
...
non-cached run:
Result "escapeFunctionWithDate":
662.387 ±(99.9%) 7.569 ns/op [Average]
(min, avg, max) = (608.812, 662.387, 738.422), stdev = 22.318
CI (99.9%): [654.818, 669.956] (assumes normal distribution)
Secondary result "·gc.alloc.rate":
2348.322 ±(99.9%) 26.825 MB/sec [Average]
(min, avg, max) = (2107.321, 2348.322, 2555.984), stdev = 79.093
CI (99.9%): [2321.498, 2375.147] (assumes normal distribution)
Secondary result "·gc.alloc.rate.norm":
1629.601 ±(99.9%) 2.454 B/op [Average]
(min, avg, max) = (1608.000, 1629.601, 1632.008), stdev = 7.236
CI (99.9%): [1627.147, 1632.055] (assumes normal distribution)
Secondary result "·gc.churn.PS_Eden_Space":
2344.777 ±(99.9%) 30.917 MB/sec [Average]
(min, avg, max) = (2093.271, 2344.777, 2585.418), stdev = 91.160
CI (99.9%): [2313.860, 2375.694] (assumes normal distribution)
Secondary result "·gc.churn.PS_Eden_Space.norm":
1627.368 ±(99.9%) 14.359 B/op [Average]
(min, avg, max) = (1498.802, 1627.368, 1726.184), stdev = 42.339
CI (99.9%): [1613.008, 1641.727] (assumes normal distribution)
Secondary result "·gc.churn.PS_Survivor_Space":
0.226 ±(99.9%) 0.022 MB/sec [Average]
(min, avg, max) = (0.062, 0.226, 0.404), stdev = 0.065
CI (99.9%): [0.204, 0.247] (assumes normal distribution)
Secondary result "·gc.churn.PS_Survivor_Space.norm":
0.157 ±(99.9%) 0.015 B/op [Average]
(min, avg, max) = (0.042, 0.157, 0.278), stdev = 0.046
CI (99.9%): [0.141, 0.172] (assumes normal distribution)
Secondary result "·gc.count":
1801.000 ±(99.9%) 0.001 counts [Sum]
(min, avg, max) = (14.000, 18.010, 21.000), stdev = 1.425
CI (99.9%): [1801.000, 1801.000] (assumes normal distribution)
Secondary result "·gc.time":
930.000 ±(99.9%) 0.001 ms [Sum]
(min, avg, max) = (8.000, 9.300, 11.000), stdev = 0.704
CI (99.9%): [930.000, 930.000] (assumes normal distribution)
# Run complete. Total time: 00:03:24
Benchmark Mode Cnt Score Error Units
EscapeProcessing.escapeFunctionWithDate avgt 100 662.387 ± 7.569 ns/op
EscapeProcessing.escapeFunctionWithDate:·gc.alloc.rate avgt 100 2348.322 ± 26.825 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.alloc.rate.norm avgt 100 1629.601 ± 2.454 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Eden_Space avgt 100 2344.777 ± 30.917 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Eden_Space.norm avgt 100 1627.368 ± 14.359 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Survivor_Space avgt 100 0.226 ± 0.022 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Survivor_Space.norm avgt 100 0.157 ± 0.015 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.count avgt 100 1801.000 counts
EscapeProcessing.escapeFunctionWithDate:·gc.time avgt 100 930.000 ms
afaik it's worth caching.
cached run:
Result "escapeFunctionWithDate":
665.654 ±(99.9%) 6.960 ns/op [Average]
(min, avg, max) = (625.776, 665.654, 725.398), stdev = 20.521
CI (99.9%): [658.694, 672.614] (assumes normal distribution)
Secondary result "·gc.alloc.rate":
2339.752 ±(99.9%) 23.956 MB/sec [Average]
(min, avg, max) = (2145.052, 2339.752, 2486.556), stdev = 70.636
CI (99.9%): [2315.796, 2363.709] (assumes normal distribution)
Secondary result "·gc.alloc.rate.norm":
1632.001 ±(99.9%) 0.001 B/op [Average]
(min, avg, max) = (1632.000, 1632.001, 1632.008), stdev = 0.002
CI (99.9%): [1632.000, 1632.002] (assumes normal distribution)
Secondary result "·gc.churn.PS_Eden_Space":
2328.778 ±(99.9%) 31.127 MB/sec [Average]
(min, avg, max) = (2081.674, 2328.778, 2558.467), stdev = 91.778
CI (99.9%): [2297.652, 2359.905] (assumes normal distribution)
Secondary result "·gc.churn.PS_Eden_Space.norm":
1624.358 ±(99.9%) 14.103 B/op [Average]
(min, avg, max) = (1541.086, 1624.358, 1727.382), stdev = 41.583
CI (99.9%): [1610.256, 1638.461] (assumes normal distribution)
Secondary result "·gc.churn.PS_Survivor_Space":
0.208 ±(99.9%) 0.025 MB/sec [Average]
(min, avg, max) = (0.093, 0.208, 0.406), stdev = 0.074
CI (99.9%): [0.183, 0.233] (assumes normal distribution)
Secondary result "·gc.churn.PS_Survivor_Space.norm":
0.145 ±(99.9%) 0.017 B/op [Average]
(min, avg, max) = (0.064, 0.145, 0.284), stdev = 0.051
CI (99.9%): [0.127, 0.162] (assumes normal distribution)
Secondary result "·gc.count":
1676.000 ±(99.9%) 0.001 counts [Sum]
(min, avg, max) = (10.000, 16.760, 19.000), stdev = 1.664
CI (99.9%): [1676.000, 1676.000] (assumes normal distribution)
Secondary result "·gc.time":
926.000 ±(99.9%) 0.001 ms [Sum]
(min, avg, max) = (6.000, 9.260, 10.000), stdev = 0.799
CI (99.9%): [926.000, 926.000] (assumes normal distribution)
# Run complete. Total time: 00:03:25
Benchmark Mode Cnt Score Error Units
EscapeProcessing.escapeFunctionWithDate avgt 100 665.654 ± 6.960 ns/op
EscapeProcessing.escapeFunctionWithDate:·gc.alloc.rate avgt 100 2339.752 ± 23.956 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.alloc.rate.norm avgt 100 1632.001 ± 0.001 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Eden_Space avgt 100 2328.778 ± 31.127 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Eden_Space.norm avgt 100 1624.358 ± 14.103 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Survivor_Space avgt 100 0.208 ± 0.025 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Survivor_Space.norm avgt 100 0.145 ± 0.017 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.count avgt 100 1676.000 counts
EscapeProcessing.escapeFunctionWithDate:·gc.time avgt 100 926.000 ms
non-cached, code changed to:
for (int j = 1; j < SqlParseState.values().length; j++) {
SqlParseState availableState = SqlParseState.values()[j];
...
non-cached run:
Result "escapeFunctionWithDate":
662.387 ±(99.9%) 7.569 ns/op [Average]
(min, avg, max) = (608.812, 662.387, 738.422), stdev = 22.318
CI (99.9%): [654.818, 669.956] (assumes normal distribution)
Secondary result "·gc.alloc.rate":
2348.322 ±(99.9%) 26.825 MB/sec [Average]
(min, avg, max) = (2107.321, 2348.322, 2555.984), stdev = 79.093
CI (99.9%): [2321.498, 2375.147] (assumes normal distribution)
Secondary result "·gc.alloc.rate.norm":
1629.601 ±(99.9%) 2.454 B/op [Average]
(min, avg, max) = (1608.000, 1629.601, 1632.008), stdev = 7.236
CI (99.9%): [1627.147, 1632.055] (assumes normal distribution)
Secondary result "·gc.churn.PS_Eden_Space":
2344.777 ±(99.9%) 30.917 MB/sec [Average]
(min, avg, max) = (2093.271, 2344.777, 2585.418), stdev = 91.160
CI (99.9%): [2313.860, 2375.694] (assumes normal distribution)
Secondary result "·gc.churn.PS_Eden_Space.norm":
1627.368 ±(99.9%) 14.359 B/op [Average]
(min, avg, max) = (1498.802, 1627.368, 1726.184), stdev = 42.339
CI (99.9%): [1613.008, 1641.727] (assumes normal distribution)
Secondary result "·gc.churn.PS_Survivor_Space":
0.226 ±(99.9%) 0.022 MB/sec [Average]
(min, avg, max) = (0.062, 0.226, 0.404), stdev = 0.065
CI (99.9%): [0.204, 0.247] (assumes normal distribution)
Secondary result "·gc.churn.PS_Survivor_Space.norm":
0.157 ±(99.9%) 0.015 B/op [Average]
(min, avg, max) = (0.042, 0.157, 0.278), stdev = 0.046
CI (99.9%): [0.141, 0.172] (assumes normal distribution)
Secondary result "·gc.count":
1801.000 ±(99.9%) 0.001 counts [Sum]
(min, avg, max) = (14.000, 18.010, 21.000), stdev = 1.425
CI (99.9%): [1801.000, 1801.000] (assumes normal distribution)
Secondary result "·gc.time":
930.000 ±(99.9%) 0.001 ms [Sum]
(min, avg, max) = (8.000, 9.300, 11.000), stdev = 0.704
CI (99.9%): [930.000, 930.000] (assumes normal distribution)
# Run complete. Total time: 00:03:24
Benchmark Mode Cnt Score Error Units
EscapeProcessing.escapeFunctionWithDate avgt 100 662.387 ± 7.569 ns/op
EscapeProcessing.escapeFunctionWithDate:·gc.alloc.rate avgt 100 2348.322 ± 26.825 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.alloc.rate.norm avgt 100 1629.601 ± 2.454 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Eden_Space avgt 100 2344.777 ± 30.917 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Eden_Space.norm avgt 100 1627.368 ± 14.359 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Survivor_Space avgt 100 0.226 ± 0.022 MB/sec
EscapeProcessing.escapeFunctionWithDate:·gc.churn.PS_Survivor_Space.norm avgt 100 0.157 ± 0.015 B/op
EscapeProcessing.escapeFunctionWithDate:·gc.count avgt 100 1801.000 counts
EscapeProcessing.escapeFunctionWithDate:·gc.time avgt 100 930.000 ms
vlsi
Oct 6, 2016
Member
afaik it's worth caching.
Well, the difference seems to be just 5 bytes: 1632-1627.
Feel free to cache or non-cache
afaik it's worth caching.
Well, the difference seems to be just 5 bytes: 1632-1627.
Feel free to cache or non-cache
for (int j = 1; j < availableStates.length; j++) { | ||
SqlParseState availableState = availableStates[j]; | ||
int matched = SqlParseState.matchEscapeCode(p_sql, i + 1, | ||
availableState.escapeKeyword, availableState.allowedValues); |
vlsi
Oct 6, 2016
Member
Sorry for one-line comments, but wouldn't that be better to coded as "non-static" enum method?
Something like int matchPoosition = availableState.getMatchPosition(p_sql, i+1)
.
Sorry for one-line comments, but wouldn't that be better to coded as "non-static" enum method?
Something like int matchPoosition = availableState.getMatchPosition(p_sql, i+1)
.
seut
Oct 6, 2016
Author
Contributor
pushed fixup for this
pushed fixup for this
addresses code style
Current coverage is 62.01% (diff: 90.90%)@@ master #657 diff @@
==========================================
Files 150 150
Lines 15164 15178 +14
Methods 0 0
Messages 0 0
Branches 3051 3053 +2
==========================================
+ Hits 9396 9413 +17
- Misses 4516 4520 +4
+ Partials 1252 1245 -7
|
@vlsi afaik I've addressed all comments, can you check pls again? thx |
before, escape code parsing was very lenient and invalid escape code formats were accidentally processed. closes #657
Non-standard `{oj (...)}` is produced by CrystalReports, so enable that deviation from the spec and ignore the parenthesis. Note: this basically reverts "strict" part of #657
Non-standard `{oj (...)}` is produced by CrystalReports, so enable that deviation from the spec and ignore the parenthesis. Note: this basically reverts "strict" part of pgjdbc#657
Non-standard `{oj (...)}` is produced by CrystalReports, so enable that deviation from the spec and ignore the parenthesis. Note: this basically reverts "strict" part of pgjdbc#657
Before, escape code parsing was very lenient and invalid escape code formats were accidentally processed.
I've also added a JMH benchmark, so it seems that my new implementation even runs slightly faster:
MASTER
Result "escapeFunctionWithDate":
677.556 ±(99.9%) 7.699 ns/op [Average](min, avg, max) = (638.185, 677.556, 741.213), stdev = 22.701
CI (99.9%): [669.856, 685.255](assumes normal distribution)
s/strict-escape-processing
Result "escapeFunctionWithDate":
666.126 ±(99.9%) 5.878 ns/op [Average](min, avg, max) = (633.941, 666.126, 704.415), stdev = 17.333
CI (99.9%): [660.247, 672.004](assumes normal distribution)
Reason for coming across this is that at crate (https://github.com/crate/crate), we've experimentally implemented the PostgreSQL wire protocol for client connections. crate supports object literals like e.g.
{ o1 = 'string'}
(http://crate.readthedocs.io/en/latest/sql/data_types.html#object-literals) which were wrongly interpreted as an outerjoin escape code by the existing implementation.