LDEV-4298 QoQ performance enhancements #1887
Merged
This pull should not change the outward behavior of QoQ at all. It refactors all the looping done internally to process a native QoQ to use Java Streams instead. If there are more than 50 records in the query object, a parallel stream is used, which processes your query on more than one thread. This brings a performance boost of over 200%, which I've tested using up to 1 million rows. The more complex the query is (where clause operations, selects, grouping, etc.), the better the performance gains are. Lucee is now also even more performant than Adobe CF's QoQ in nearly all cases.
The threshold of parallelism can be controlled by the following new system property:
lucee.qoq.parallelism
Set the property to the number of records where you want parallel threads to kick in. Setting it to `0` would make all QoQs parallel. Setting it to a large number like `9999999` would effectively make no QoQs parallel. The default is 50, which is where I measured the performance gains outweighing the overhead of threading.

Note: Java Streams will not use multi-threaded sorting (used in
`ORDER BY` processing) until the number of items is above ~8,000. This is a hard-coded limit baked into the JDK.

Java streams use the fork/join pool by default, which has as many threads as you have CPU cores on the machine. Since QoQ is an inherently CPU-bound operation, this pool makes sense to use. This also means that your performance improvements will be directly tied to your number of CPU cores, because more cores means more threads in the pool.
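As a rough illustration of the threshold behavior described above, here is a minimal sketch of switching between a sequential and a parallel stream based on the record count. Only the property name `lucee.qoq.parallelism` and the default of 50 come from this PR; the class, the `rowStream` helper, and the exact comparison used are assumptions for illustration, not the code in this pull.

```java
import java.util.stream.IntStream;

public class ParallelThresholdSketch {

    // Reads the threshold from the system property named in this PR.
    // Reading it via Integer.getInteger is an assumption, not Lucee's actual code.
    static int threshold() {
        return Integer.getInteger("lucee.qoq.parallelism", 50);
    }

    // Hypothetical helper: a stream of 1-based row numbers that becomes
    // parallel once the record count reaches the threshold.
    static IntStream rowStream(int recordCount) {
        IntStream rows = IntStream.rangeClosed(1, recordCount);
        return recordCount >= threshold() ? rows.parallel() : rows;
    }

    public static void main(String[] args) {
        System.out.println("10 rows parallel?   " + rowStream(10).isParallel());
        System.out.println("1000 rows parallel? " + rowStream(1000).isParallel());
    }
}
```

Running this with `-Dlucee.qoq.parallelism=0` would make both streams parallel, and a very large value would keep both sequential.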
The `ORDER BY` logic has also been entirely rewritten. The sort is no longer performed on each column in separate passes, which copied all data in the query for every pass. There is now a new `QueryComparator` class which sorts all columns in the `ORDER BY` in a single pass, short-circuiting where possible: if the first column already determines that a row sorts above or below another, it doesn't even proceed to the next column. Furthermore, the sort register logic has been removed in favor of an `int[]` stream which maps the original row to the new sorted location, without the need for any copies of the data or a separate sort register class to track the mapping. The actual data in the query is only re-sorted in memory once. This gives around a 300% increase in sort performance when several columns are being sorted on. Many new tests have been added to ensure that sorting every possible type of query column behaves the same as in Lucee 5.3.10.

Other changes have been made to reduce bottlenecks when multiple threads are adding rows to a query object. The addRow() method was previously synchronized on both the query and column class, which led to unnecessary blocking since not every addRow() call touches the native array of data. I have removed the heavy-handed synchronization from these methods, opting to ONLY synchronize the portion of the
`growTo()` method IF AND ONLY IF the internal native array needs to grow. In all other cases, there is no locking. Furthermore, I have changed the `recordcount` property of the `QueryImpl` class and the `size` property of the `QueryColumnImpl` class to be an `AtomicInteger` to ensure they don't get corrupted without the locking in a multi-threaded environment. I have also modified `getRow()` to return the row specifically added, not just the last row in the query, as the previous behavior was not thread-safe.

The class that handles compilation of
`LIKE` statements now employs a simple double-checked synchronized block to prevent parallel streams from compiling the same regex twice.

Special treatment has also been added to the
`ColumnExpression` class so it no longer holds cached references to the original query object when it is not safe to do so across multiple threads.

It was necessary to add new methods to the
`QueryImpl` and `QueryColumnImpl` classes for some of this functionality. Since the `Query` and `QueryColumn` interfaces are in the loader, which is not being changed for Lucee 6, I was forced to switch the typing in the `QoQ` class to use the concrete classes. Otherwise, I would have to constantly cast the query objects to `QueryImpl` to use these methods, and I was concerned about the performance (not to mention the code readability). When Lucee updates the loader again, we can add these new methods to the interfaces and go back to using the interface types.

I also noticed that performance of heavy string concatenation was VERY poor under load because Lucee always tried to cast both operands to a number and only fell back to string concatenation when that failed. I remedied that to use
`Decision.isNumber()`, which is much faster at detecting whether we have strings or numbers. This gave a huge performance boost for any expressions concatenating strings in the select, group by, or where.

All previous QoQ tests should be passing. I have added a new suite of tests that hits all affected QoQ functionality and runs against 1 million records to ensure the parallel threads kick in.
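To make the single-pass `ORDER BY` idea described earlier more concrete, here is a minimal sketch of a multi-column comparator that sorts an array of row indexes rather than the data itself. It is not the `QueryComparator` class from this pull: the class name, the column-oriented `Object[][]` layout, and the `sortedRowOrder` helper are all assumptions, and null handling and type coercion are left out.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

public class SinglePassSortSketch {

    // Compares two cells of the same type; nulls and mixed types are not handled here.
    @SuppressWarnings("unchecked")
    static int compareCells(Object x, Object y) {
        return ((Comparable<Object>) x).compareTo(y);
    }

    // columns[c][r] holds the value of ORDER BY column c at row r (0-based).
    // Returns the original row indexes in their sorted order; no data is copied.
    static int[] sortedRowOrder(Object[][] columns, boolean[] ascending) {
        int rowCount = columns.length == 0 ? 0 : columns[0].length;
        Comparator<Integer> byAllColumns = (a, b) -> {
            for (int c = 0; c < columns.length; c++) {
                Object x = columns[c][a];
                Object y = columns[c][b];
                int result = ascending[c] ? compareCells(x, y) : compareCells(y, x);
                if (result != 0) {
                    return result; // short circuit: later columns are never inspected
                }
            }
            return 0;
        };
        return IntStream.range(0, rowCount)
                .boxed()
                .sorted(byAllColumns)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        Object[][] columns = {
            { "TX", "CA", "TX", "CA" }, // state
            { 30, 25, 20, 40 }          // age
        };
        // ORDER BY state ASC, age DESC
        int[] order = sortedRowOrder(columns, new boolean[] { true, false });
        System.out.println(java.util.Arrays.toString(order));
    }
}
```

The resulting `int[]` plays the role of the row mapping: the data only needs to be rearranged once, after the final order is known.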
Note: when you use a QoQ and do NOT use an `ORDER BY` clause, you cannot depend on the final rows being in the same order as the original query, due to the asynchronous nature of threading. If you want a specific order, you'll need to use an `ORDER BY`. The previous QoQ implementation tended to preserve the order of the original query object, but this was never a guarantee.
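The ordering caveat can be reproduced with plain Java streams. The sketch below is a generic illustration, not Lucee code: the class and both helper methods are hypothetical. Rows that pass a filter on a parallel stream are collected in whatever order the worker threads deliver them, and an explicit sort is what restores a dependable order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class RowOrderSketch {

    // Collects even row numbers in arrival order; with a parallel stream
    // that order may differ from run to run.
    static List<Integer> filterUnordered(int rowCount) {
        ConcurrentLinkedQueue<Integer> matches = new ConcurrentLinkedQueue<>();
        IntStream.rangeClosed(1, rowCount)
                .parallel()
                .filter(row -> row % 2 == 0)
                .forEach(matches::add);
        return new ArrayList<>(matches);
    }

    // Same filter, followed by an explicit sort (the analogue of ORDER BY).
    static List<Integer> filterOrdered(int rowCount) {
        return filterUnordered(rowCount).stream()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filterUnordered(20)); // same rows, order not guaranteed
        System.out.println(filterOrdered(20));   // always ascending
    }
}
```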