ERXFetchSpecification fetchRange support #213

kierankelleher · 2012-06-08T04:13:28Z

Don't actually pull this into integration. The objective of this pull request is to get feedback on whether this is fundamentally something we should even consider doing in public Wonder. Is it something useful?

Setting an NSRange of { 100, 50 } on an ERXFetchSpecification and fetching will generate SQL something like this in MySQL

SELECT ..... LIMIT 100, 50;

.. that is, return the 50 records from record 101 thru 150 in what would otherwise have been a large resultset.

Implementing this in other PlugIns should be pretty easy for databases that support. The intent is not to replace ERXFetchSpecificationBatchIterator but to have another mechanism available to directly batch in SQL if desired.

Comments, opinions, criticism on its usefulness and whether we should include this in Wonder is welcome :-)

Commit comments:

Adds fetchRange option to ERXFetchSpecification and fetchRange support to _MySQL.Plugin.MySQLExpression.

ERXFetchSpecificationBatchIterator in its current incarnation works fine for batching where a reference to the
batch iterator is maintained. However in direct actions, as used inside current ERXRestFetchSpecification.results(...)
method, a new ERXFetchSpecificationBatchIterator is created on each request resulting in the entire population
of primary keys being fetched and then the batch of EOEnterpriseObjects for the primary key range is fetched.

Having a fetchRange option in ERXFetchSpecification, with support in SQL PlugIns, enables the same range of objects
to be returned in one fetch in a stateless request. This is useful for stateless requests that request a range of
a huge population of objects. This feature may also not have the ERXFetchSpecificationBatchIterator
limitation of not supporting multi-column primary keys. Another advantage is that it has no limit to the total count of
the population of objects form which the range if fetched whereas ERXFetchSpecificationBatchIterator does have a practical
limit associated with ther total number of primary keys that can or should be fetched.

A disadvantage of direct SQL batch fetch is that for some complex qualifiers that return duplicate rows, the DISTINCT property
must be set to true to get accurate results. Having DISTINCT and ORDER BY in the same SQL statement apparently causes
problems in some database platforms such as Oracle that is taken into account by ERXFetchSpecificationBatchIterator
(see ERXSQLHelper.newSQLHelper(entity).shouldPerformDistinctInMemory(pkFetchSpec)) reference in primaryKeys() method
of ERXFetchSpecificationBatchIterator.

…t to _MySQL.Plugin.MySQLExpression. ERXFetchSpecificationBatchIterator in its current incarnation works fine for batching where a reference to the batch iterator is maintained. However in direct actions, as used inside current ERXRestFetchSpecification.results(...) method, a new ERXFetchSpecificationBatchIterator is created on each request resulting in the entire population of primary keys being fetched and then the batch of EOEnterpriseObjects for the primary key range is fetched. Having a fetchRange option in ERXFetchSpecification, with support in SQL PlugIns, enables the same range of objects to be returned in one fetch in a stateless request. This is useful for stateless requests that request a range of a huge population of objects. This feature may also not have the ERXFetchSpecificationBatchIterator limitation of not supporting multi-column primary keys. Another advantage is that it has no limit to the total count of the population of objects form which the range if fetched whereas ERXFetchSpecificationBatchIterator does have a practical limit associated with ther total number of primary keys that can or should be fetched. A disadvantage of direct SQL batch fetch is that for some complex qualifiers that return duplicate rows, the DISTINCT property must be set to true to get accurate results. Having DISTINCT and ORDER BY in the same SQL statement apparently causes problems in some database platforms such as Oracle that is taken into account by ERXFetchSpecificationBatchIterator (see ERXSQLHelper.newSQLHelper(entity).shouldPerformDistinctInMemory(pkFetchSpec)) reference in primaryKeys() method of ERXFetchSpecificationBatchIterator.

darkv · 2012-06-08T06:11:34Z

This seems really useful!
For what type of complex qualifiers did you observe wrong results? Should the distinct flag then be automatically set to true when a range is set on the fetch specification? If the distinct is done in memory after the fetch has been done (seems to be the case for Oracle and DB2) perhaps the plugin should automatically fill the result up to match the requested count of objects (either by fetching NSRange.length() + delta and discarding the ones that are surplus or fetching a second time to fill up the result set)?

kierankelleher · 2012-06-08T14:25:24Z

About Distinct in SQL

I needed distinct on a keyPath qualifier that included 2 many-to-many join tables, something like this where I was fetching dataList where keyPath contained a specific user.

dataList <-->> joinDataListCampaign <<--> campaign <<-->customer <-->> joinCustomerUser <<--> user

In any case, I would not get hung up too much on the distinct aspect since it is only needed on some fetches. Better to leave the developer decide whether to use distinct on the fetch spec or not. So, to answer your question, I would not recommend setting distinct automatically for a few reasons:

[1] Depending on the database platform DISTINCT can have the same effect as using GROUP BY and apparently may cause unnecessary overhead if used when it is not actually needed.

[2] Also looking at the javadoc again about the oracle issue at ERXSQLHelper.canReliablyPerformDistinctWithSortOrderings() it says:

Oracle, for instance, will fail if you try to sort on a key that isn't in the list of fetched keys.

However the above Oracle problem (or any db platform that has a similar issue) is less likely when using a fetchRange since we are fetching all the keys of the entity unlike @@ERXFetchSpecificationBatchIterator@@ which is only fetching the primaryKeys.

Interestingly, PostgreSQL anf FrontBase SQLExpression may have a similar special handling of DISTINCT and an Anjo hack can be seen to address this in both those PlugIns at assembleSelectStatementWithAttributes(...)

About Distinct in Memory and Manipulating Results

If distinct is done in memory, then the ERXFetchSpecificationBatchIterator approach is more accurate since, in the case of duplicate (or triplicate, etc.) rows the fetchRange would not only has less objects that requested, but also has the wrong objects since the start position is wrong.

Again, I think the best approach here is let the developer decide when distinct should be used in the FetchSpec since in many (most?) cases it is not needed

kierankelleher · 2012-06-08T14:47:04Z

By the way, implementing support in PostgreSQL PlugIn looks pretty simple, so if we go ahead with this I will add support in PostgreSQL, but will need someone who uses PostgreSQL to road-test it.

hprange · 2012-06-08T15:28:15Z

I can test the PostgreSQL support if you want.

lbane · 2012-06-08T20:02:08Z

How does this implementation compare to ERXEOControlUtilities.objectsInRange(...)? Those method worked already, at least with PostgreSQL, the last time I used it.

kierankelleher · 2012-06-08T20:43:36Z

Just looking at source code, it seems like fetchRange is a lower level implementation that is encapsulated in ERXFetchSpecification whereas ERXEOControlUtilities.objectsInRange(...) appears to be built with the intention of working around the fact that EOFetchSpecification and EOSQLExpression in general did not support fetch range.

Following the logic for ERXEOControlUtilities.objectsInRange(...), it does one of two things:

[1] If if has custom SQL hints, it conveniently uses ERXSQLHelper.limitExpressionForSQL(...) to modify the hint and stuff it back in there before letting EOF do its normal thing. the fetchRange feature does not consider custom SQL query hint since AFAIK, EOF just takes it as is and executes it.

[2] For a common fetchspec, it fetches primaryKeys in the range, turns those into global IDs and fetchjes the objects for the global IDs.

kierankelleher · 2012-06-08T20:56:07Z

So fetchRange just "feels" like it belongs encapsulated in ERXFetchSpecification instead of having to use utility methods and working with PKs to get a range of objects. However the original creators of EOF obviously considered this and left it out for good reason.

Does anyone see a good reason why we should not implement fetchRange in ERXFetchSpecification given that we already have fetchLimit support in popular PlugIns and given that for common database platforms a fetchRange (offset + limit) is just an extension of the simple LIMIT (limit without specificed offset) clause?

ghost · 2012-06-15T03:03:01Z

My only concern is that with ERXFetchSpecificationBatchIterator you have a consistent "snapshot" of object state at a point in time. Objects stay in the batches they fall into. Doing a LIMIT fetch and moving forward and backwards, you will see the effect of additions and deletions. I don't know that I want to argue that either is absolutely correct. I think it is important to document this difference.

kierankelleher · 2012-06-15T04:00:34Z

Thanks for the feedback. Yes I agree with you and it makes sense to me that current implementation is better for the stateful case where you indeed have a snapshot.

However for the stateless (direct action) case, it takes a snapshot (even of millions of objects) on every request ( as demonstrated by ERXRestFetchSpecification for example )

Yes I think that this is useful in specific cases only.

…ationRange Adds fetchRange option to ERXFetchSpecification and fetchRange support to _MySQL.Plugin.MySQLExpression. ERXFetchSpecificationBatchIterator in its current incarnation works fine for batching where a reference to the batch iterator is maintained. However in direct actions, as used inside current ERXRestFetchSpecification.results(...) method, a new ERXFetchSpecificationBatchIterator is created on each request resulting in the entire population of primary keys being fetched and then the batch of EOEnterpriseObjects for the primary key range is fetched. Having a fetchRange option in ERXFetchSpecification, with support in SQL PlugIns, enables the same range of objects to be returned in one fetch in a stateless request. This is useful for stateless requests that request a range of a huge population of objects, for example in a REST API. This feature may also not display the ERXFetchSpecificationBatchIterator limitation of not supporting multi-column primary keys. Another advantage is that it has no limit to the total count of the population of objects from which the range is fetched whereas ERXFetchSpecificationBatchIterator does have a practical limit associated with ther total number of primary keys that can or should be fetched. Keep in mind that ERXFetchSpecificationBatchIterator, if suitable, is probably still a better choice for stateful requests. If a SQL query returns duplicate rows then the distinct option may need to be set on the fetchspec to accurately return the expected results. If the fetch is against an abstract entity the results may be unpredictable.

ghost assigned kierankelleher Jun 8, 2012

kierankelleher merged commit 0dc67f6 into wocommunity:integration Jun 21, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ERXFetchSpecification fetchRange support #213

ERXFetchSpecification fetchRange support #213

kierankelleher commented Jun 8, 2012

darkv commented Jun 8, 2012

kierankelleher commented Jun 8, 2012

kierankelleher commented Jun 8, 2012

hprange commented Jun 8, 2012

lbane commented Jun 8, 2012

kierankelleher commented Jun 8, 2012

kierankelleher commented Jun 8, 2012

ghost commented Jun 15, 2012

kierankelleher commented Jun 15, 2012

ERXFetchSpecification fetchRange support #213

ERXFetchSpecification fetchRange support #213

Conversation

kierankelleher commented Jun 8, 2012

Commit comments:

darkv commented Jun 8, 2012

kierankelleher commented Jun 8, 2012

About Distinct in SQL

About Distinct in Memory and Manipulating Results

kierankelleher commented Jun 8, 2012

hprange commented Jun 8, 2012

lbane commented Jun 8, 2012

kierankelleher commented Jun 8, 2012

kierankelleher commented Jun 8, 2012

ghost commented Jun 15, 2012

kierankelleher commented Jun 15, 2012