-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ERXFetchSpecification fetchRange support #213
ERXFetchSpecification fetchRange support #213
Conversation
…t to _MySQL.Plugin.MySQLExpression. ERXFetchSpecificationBatchIterator in its current incarnation works fine for batching where a reference to the batch iterator is maintained. However in direct actions, as used inside current ERXRestFetchSpecification.results(...) method, a new ERXFetchSpecificationBatchIterator is created on each request resulting in the entire population of primary keys being fetched and then the batch of EOEnterpriseObjects for the primary key range is fetched. Having a fetchRange option in ERXFetchSpecification, with support in SQL PlugIns, enables the same range of objects to be returned in one fetch in a stateless request. This is useful for stateless requests that request a range of a huge population of objects. This feature may also not have the ERXFetchSpecificationBatchIterator limitation of not supporting multi-column primary keys. Another advantage is that it has no limit to the total count of the population of objects form which the range if fetched whereas ERXFetchSpecificationBatchIterator does have a practical limit associated with ther total number of primary keys that can or should be fetched. A disadvantage of direct SQL batch fetch is that for some complex qualifiers that return duplicate rows, the DISTINCT property must be set to true to get accurate results. Having DISTINCT and ORDER BY in the same SQL statement apparently causes problems in some database platforms such as Oracle that is taken into account by ERXFetchSpecificationBatchIterator (see ERXSQLHelper.newSQLHelper(entity).shouldPerformDistinctInMemory(pkFetchSpec)) reference in primaryKeys() method of ERXFetchSpecificationBatchIterator.
This seems really useful! |
About Distinct in SQLI needed distinct on a keyPath qualifier that included 2 many-to-many join tables, something like this where I was fetching dataList where keyPath contained a specific user.
In any case, I would not get hung up too much on the distinct aspect since it is only needed on some fetches. Better to leave the developer decide whether to use distinct on the fetch spec or not. So, to answer your question, I would not recommend setting distinct automatically for a few reasons: [1] Depending on the database platform DISTINCT can have the same effect as using GROUP BY and apparently may cause unnecessary overhead if used when it is not actually needed. [2] Also looking at the javadoc again about the oracle issue at ERXSQLHelper.canReliablyPerformDistinctWithSortOrderings() it says:
However the above Oracle problem (or any db platform that has a similar issue) is less likely when using a fetchRange since we are fetching all the keys of the entity unlike @@ERXFetchSpecificationBatchIterator@@ which is only fetching the primaryKeys. Interestingly, PostgreSQL anf FrontBase SQLExpression may have a similar special handling of DISTINCT and an Anjo hack can be seen to address this in both those PlugIns at assembleSelectStatementWithAttributes(...) About Distinct in Memory and Manipulating ResultsIf distinct is done in memory, then the ERXFetchSpecificationBatchIterator approach is more accurate since, in the case of duplicate (or triplicate, etc.) rows the fetchRange would not only has less objects that requested, but also has the wrong objects since the start position is wrong. Again, I think the best approach here is let the developer decide when distinct should be used in the FetchSpec since in many (most?) cases it is not needed |
By the way, implementing support in PostgreSQL PlugIn looks pretty simple, so if we go ahead with this I will add support in PostgreSQL, but will need someone who uses PostgreSQL to road-test it. |
I can test the PostgreSQL support if you want. |
How does this implementation compare to ERXEOControlUtilities.objectsInRange(...)? Those method worked already, at least with PostgreSQL, the last time I used it. |
Just looking at source code, it seems like fetchRange is a lower level implementation that is encapsulated in ERXFetchSpecification whereas ERXEOControlUtilities.objectsInRange(...) appears to be built with the intention of working around the fact that EOFetchSpecification and EOSQLExpression in general did not support fetch range. Following the logic for ERXEOControlUtilities.objectsInRange(...), it does one of two things: [1] If if has custom SQL hints, it conveniently uses ERXSQLHelper.limitExpressionForSQL(...) to modify the hint and stuff it back in there before letting EOF do its normal thing. the fetchRange feature does not consider custom SQL query hint since AFAIK, EOF just takes it as is and executes it. [2] For a common fetchspec, it fetches primaryKeys in the range, turns those into global IDs and fetchjes the objects for the global IDs. |
So fetchRange just "feels" like it belongs encapsulated in ERXFetchSpecification instead of having to use utility methods and working with PKs to get a range of objects. However the original creators of EOF obviously considered this and left it out for good reason. Does anyone see a good reason why we should not implement fetchRange in ERXFetchSpecification given that we already have fetchLimit support in popular PlugIns and given that for common database platforms a fetchRange (offset + limit) is just an extension of the simple LIMIT (limit without specificed offset) clause? |
My only concern is that with ERXFetchSpecificationBatchIterator you have a consistent "snapshot" of object state at a point in time. Objects stay in the batches they fall into. Doing a LIMIT fetch and moving forward and backwards, you will see the effect of additions and deletions. I don't know that I want to argue that either is absolutely correct. I think it is important to document this difference. |
Thanks for the feedback. Yes I agree with you and it makes sense to me that current implementation is better for the stateful case where you indeed have a snapshot. However for the stateless (direct action) case, it takes a snapshot (even of millions of objects) on every request ( as demonstrated by ERXRestFetchSpecification for example ) Yes I think that this is useful in specific cases only. |
…ationRange Adds fetchRange option to ERXFetchSpecification and fetchRange support to _MySQL.Plugin.MySQLExpression. ERXFetchSpecificationBatchIterator in its current incarnation works fine for batching where a reference to the batch iterator is maintained. However in direct actions, as used inside current ERXRestFetchSpecification.results(...) method, a new ERXFetchSpecificationBatchIterator is created on each request resulting in the entire population of primary keys being fetched and then the batch of EOEnterpriseObjects for the primary key range is fetched. Having a fetchRange option in ERXFetchSpecification, with support in SQL PlugIns, enables the same range of objects to be returned in one fetch in a stateless request. This is useful for stateless requests that request a range of a huge population of objects, for example in a REST API. This feature may also not display the ERXFetchSpecificationBatchIterator limitation of not supporting multi-column primary keys. Another advantage is that it has no limit to the total count of the population of objects from which the range is fetched whereas ERXFetchSpecificationBatchIterator does have a practical limit associated with ther total number of primary keys that can or should be fetched. Keep in mind that ERXFetchSpecificationBatchIterator, if suitable, is probably still a better choice for stateful requests. If a SQL query returns duplicate rows then the distinct option may need to be set on the fetchspec to accurately return the expected results. If the fetch is against an abstract entity the results may be unpredictable.
Don't actually pull this into integration. The objective of this pull request is to get feedback on whether this is fundamentally something we should even consider doing in public Wonder. Is it something useful?
Setting an NSRange of { 100, 50 } on an ERXFetchSpecification and fetching will generate SQL something like this in MySQL
SELECT ..... LIMIT 100, 50;
.. that is, return the 50 records from record 101 thru 150 in what would otherwise have been a large resultset.
Implementing this in other PlugIns should be pretty easy for databases that support. The intent is not to replace ERXFetchSpecificationBatchIterator but to have another mechanism available to directly batch in SQL if desired.
Comments, opinions, criticism on its usefulness and whether we should include this in Wonder is welcome :-)
Commit comments:
Adds fetchRange option to ERXFetchSpecification and fetchRange support to _MySQL.Plugin.MySQLExpression.
ERXFetchSpecificationBatchIterator in its current incarnation works fine for batching where a reference to the
batch iterator is maintained. However in direct actions, as used inside current ERXRestFetchSpecification.results(...)
method, a new ERXFetchSpecificationBatchIterator is created on each request resulting in the entire population
of primary keys being fetched and then the batch of EOEnterpriseObjects for the primary key range is fetched.
Having a fetchRange option in ERXFetchSpecification, with support in SQL PlugIns, enables the same range of objects
to be returned in one fetch in a stateless request. This is useful for stateless requests that request a range of
a huge population of objects. This feature may also not have the ERXFetchSpecificationBatchIterator
limitation of not supporting multi-column primary keys. Another advantage is that it has no limit to the total count of
the population of objects form which the range if fetched whereas ERXFetchSpecificationBatchIterator does have a practical
limit associated with ther total number of primary keys that can or should be fetched.
A disadvantage of direct SQL batch fetch is that for some complex qualifiers that return duplicate rows, the DISTINCT property
must be set to true to get accurate results. Having DISTINCT and ORDER BY in the same SQL statement apparently causes
problems in some database platforms such as Oracle that is taken into account by ERXFetchSpecificationBatchIterator
(see ERXSQLHelper.newSQLHelper(entity).shouldPerformDistinctInMemory(pkFetchSpec)) reference in primaryKeys() method
of ERXFetchSpecificationBatchIterator.