Add support for PagingAndSortingRepositories [DATAGEODE-263] #311
Comments
John Blum commented
The Paging design for SDG's Repository infrastructure extension will follow a 2 phase implementation:
Between 1 & 2, the SDG Repository infrastructure extension will calculate which subset of keys matching the "requested"
SDG's Paging implementation could be a single phase approach if Apache Geode provided the notion of a
Also note that SDG's Paging implementation was designed with Query Projections in mind. While Projections are not yet implemented, they are an important consideration in the design of Paging.
John Blum commented
Keep in mind that a Repository query method cannot have both a
If a query result set order is required, which is often the case when paging the results of a query, then the
Of course, users are always free to order the results of the query after the query method returns, in addition to providing a
Users are encouraged to apply the
John Blum commented
One important distinction between the "Derived" queries, "Named" queries and
This is/was an implementation choice made by the SDG framework given the name of the provided query method is "
This choice resulted in a simpler implementation and a "single" query. However, this could prove to be very costly in practice if 1) the number of entries in the Region is large, 2) the size of the objects stored in the Region is large and 3) the results are coming from a partitioned Region (PR). Still, for certain use cases, this implementation could be quite useful and even more efficient than the 2 phase approach.
As such, the SDG framework encourages the use of "Derived", "Named" and
John Blum commented
One optimization that can be made to the 2 phase query approach/implementation for Paging is use of the OQL
If the user/caller has not already specified the query result set
The limit = normalize(pageable.getPageNumber()) * pageable.getPageSize();
Of course, the "page number" would be adjusted (i.e. normalized) given
The query in phase 1 would be approximately...
For example, if the Page Number were 1 (i.e. page 2) and the Page Size were 5 (i.e. 5 elements per page), then the LIMIT would be 10. We only need to bring back the first 10 (ordered) results (if there are even 10 elements in the result set) given we are only interested in page 2.
If the query in the first phase could return hundreds, thousands or even hundreds of thousands of keys, then the LIMIT would cap the number of results returned based on the page requested, saving bandwidth, memory and processing. Generally speaking, though, a properly tuned and qualified query should rarely return hundreds of thousands of results in the first place.
This would be beneficial for the first few pages of the results, since users of paging applications are unlikely to search far beyond the 2nd page of results (or even the 1st page in most cases) anyway. Of course, application developers are encouraged to "prioritize" the query results from most relevant to least relevant, which would further support this optimization. This involves carefully planned query predicate(s).
This optimization has no effect on paging behavior. However, it does impact the framework's ability to determine the total query result set size. If 1000 keys would be returned by the query matching the predicate, then we would know the total number of matching entries is 1000. However, by applying a LIMIT based on the page number and page size requested, we will not know the total query result set size.
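The LIMIT arithmetic in the example above can be sketched as follows. This is only an illustration under stated assumptions: `PageLimit` and `limitFor` are hypothetical names, not part of SDG, and plain `int` arguments stand in for the zero-based page number and the page size.

```java
// Hypothetical helper for illustration only; not part of SDG or Apache Geode.
final class PageLimit {

    private PageLimit() { }

    /**
     * Computes the LIMIT that covers every result up to and including the requested page.
     *
     * @param zeroBasedPageNumber page index starting at 0 (page number 1 means "page 2")
     * @param pageSize number of elements per page
     */
    static int limitFor(int zeroBasedPageNumber, int pageSize) {
        if (zeroBasedPageNumber < 0) {
            throw new IllegalArgumentException("Page number must not be negative");
        }
        if (pageSize < 1) {
            throw new IllegalArgumentException("Page size must be at least 1");
        }
        // "Normalize" the zero-based page number to a one-based page count.
        int oneBasedPageNumber = zeroBasedPageNumber + 1;
        // multiplyExact fails loudly on int overflow instead of silently wrapping.
        return Math.multiplyExact(oneBasedPageNumber, pageSize);
    }

    public static void main(String[] args) {
        System.out.println(limitFor(1, 5)); // prints 10
    }
}
```

With page number 1 and page size 5 this yields 10, matching the example in the comment.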
John Blum commented
Regarding
The most efficient approach to query a partitioned Region (PR) is by way of
The keys provide "routing" to the member node in the cluster hosting data for the targeted key(s) (which could be a secondary unless the
Of course, this requires there to be a OQL query "submitting"
SDG's current querying behavior on PRs hosted by data nodes in the cluster is not implemented this way, and neither will page-based queries be, even though this could be a nice optimization. This might be open for consideration in the future.
John Blum commented
Regarding JOIN OQL queries and Paging... JOIN OQL queries involve 2 or more Regions, for example:
The second collection in the FROM clause does not even need to be a Region. In this case, the "paging" functionality (and 2 phase approach) will be applied to
NOTE: generally speaking, JOINs are not applicable to Repositories. The only way to perform a JOIN is by use of "Named" and
Actually, this might require more careful thought and consideration and, as such, may not be handled in the first iteration.
John Blum commented
I thought of another approach to Paging vs. the 2 phase approach/implementation described in the first comment above. Given the use of the
This has the advantage of running a single query and minimizing error-prone modification(s) of the original query issued by the caller. Modifying the original query requires careful pattern matching, parsing and replacement.
By way of example, given a value query:
The query to retrieve the keys based on the query predicate(s) is roughly equivalent to:
This requires significant, non-trivial modification to the original query, which, while possible, is error-prone and further complicated by "non-Derived" queries (i.e.
Unfortunately, the downside of the
For instance, if the page size were 100 and the user requested page 10, then this would retrieve the first 1000 values (if present), whereas the 2 phase approach would always retrieve values based on the page size, but at the cost of 2 queries.
Of course, again, if the user uses sensible page sizes (e.g. 20) and well-crafted query predicates (targeted results), the effects of the
For the time being, I am considering using the 2 phase approach/implementation for "Derived" queries and the
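To make the cost of the single-query approach concrete, here is a minimal sketch of how a requested page could be cut out of a result list that was capped at (page number + 1) * page size. `LimitedResultPage` and `slice` are hypothetical names, and a plain `java.util.List` stands in for the query results; nothing here is SDG's actual implementation.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of SDG or Apache Geode.
final class LimitedResultPage {

    private LimitedResultPage() { }

    /**
     * Returns the elements of the requested (zero-based) page from results that were
     * already capped at (pageNumber + 1) * pageSize. Everything before the page's
     * offset was transferred only to be discarded here.
     */
    static <T> List<T> slice(List<T> limitedResults, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize < 1) {
            throw new IllegalArgumentException("Invalid page number or page size");
        }
        long offset = (long) pageNumber * pageSize; // long avoids int overflow
        if (offset >= limitedResults.size()) {
            return Collections.emptyList(); // requested page is past the end
        }
        int from = (int) offset;
        int to = (int) Math.min(offset + pageSize, limitedResults.size());
        return limitedResults.subList(from, to);
    }
}
```

For a page size of 100 and the tenth page (zero-based page number 9), the list holds up to 1000 values and only the last 100 are returned to the caller.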
John Blum commented
Another thought regarding
It is not immediately apparent what type (i.e.
For instance, you cannot determine the
Typically, this determination is only achievable through a "custom"
Of course, Gfsh itself provides such behavior, which is even typically implemented with a
John Blum commented
Another thought on Paging and
Actually, the 2 phase approach/implementation could be further optimized now by inspecting the
Certainly, if the user requested the first page of results, then the query for keys does not need to be executed. The framework can simply "limit" the number of results to the first page.
Subsequently, the framework could also forgo the query for keys if 1) the page number is (say) 0-2 (pages 1, 2, or 3), 2) the page size is sensible (e.g. <= 20) and 3) the object size is reasonable. The object size would most likely be reasonable in the context of Projections, which is why Projections are an important consideration in Paging.
Still, an arbitrary page count and page size may be insufficient for some use cases. Unfortunately, users are notorious for storing extremely large objects. This could also be configurable and tunable to a degree.
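The decision described above could be sketched roughly as follows. All names here (`KeyQueryHeuristic`, `canSkipKeyQuery`, the two default constants) are hypothetical, the thresholds are simply the example values from the comment (page numbers 0-2, page size <= 20), and whether an object size is "reasonable" is left to the caller as a boolean because the comment does not define it.

```java
// Hypothetical heuristic for illustration only; not part of SDG or Apache Geode.
final class KeyQueryHeuristic {

    // Example thresholds taken from the comment; assumed to be tunable.
    static final int DEFAULT_MAX_PAGE_NUMBER = 2; // zero-based, i.e. pages 1, 2 or 3
    static final int DEFAULT_MAX_PAGE_SIZE = 20;

    private KeyQueryHeuristic() { }

    static boolean canSkipKeyQuery(int pageNumber, int pageSize, boolean objectSizeReasonable) {
        return canSkipKeyQuery(pageNumber, pageSize, objectSizeReasonable,
            DEFAULT_MAX_PAGE_NUMBER, DEFAULT_MAX_PAGE_SIZE);
    }

    static boolean canSkipKeyQuery(int pageNumber, int pageSize, boolean objectSizeReasonable,
            int maxPageNumber, int maxPageSize) {
        if (pageNumber < 0 || pageSize < 1) {
            throw new IllegalArgumentException("Invalid page number or page size");
        }
        // The first page never needs the key query: limiting the results is enough.
        if (pageNumber == 0) {
            return true;
        }
        // Otherwise all three conditions from the comment must hold.
        return pageNumber <= maxPageNumber
            && pageSize <= maxPageSize
            && objectSizeReasonable;
    }
}
```

Keeping the thresholds as parameters reflects the remark that this could be configurable and tunable to a degree.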
John Blum commented
NOTE: I will be filing a new JIRA ticket to complete the 2-phase paged OQL query implementation and approach to paging.
John Blum opened DATAGEODE-263 and commented
This JIRA will track the (epic) development of Spring Data for Apache Geode's (SDG) support of the Spring Data Commons' PagingAndSortingRepository.
Currently, Apache Geode neither implements nor supports the notion of a database cursor in the querying infrastructure (e.g. Apache Geode's QueryService), which allows for such things as [pre-]fetch size, handling concurrent updates and scroll sensitivity.
However, a good interim solution may be to collect a "list" of keys for the values satisfying the query predicate and lazily fetch the values based on the ordered, paged results.
Apache Geode could handle concurrent updates by the user flipping the copy-on-read switch, but scroll sensitivity would not be handled unless the keys were cached. Still, the "cached" keys would be invalidated the moment the user changed the sort order, since the "list" retains the order the user initially specified and, without the values, there is no way to change the order. Therefore, subsequent queries with the original predicate could result in a significantly different result set, thus changing the number and order of the results.
Reference URL: https://jira.spring.io/browse/SGF-524
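The interim solution described above (collect an ordered "list" of keys, then lazily fetch only the values for the requested page) might look roughly like this. It is a sketch under assumptions: `KeyBasedPager` and `fetchPage` are hypothetical names, the key list is assumed to already be in the requested sort order, and a plain `java.util.Map` stands in for the Region holding the values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for illustration only; not part of SDG or Apache Geode.
final class KeyBasedPager {

    private KeyBasedPager() { }

    /**
     * Phase 2 of a key-based paging scheme: given the ordered keys collected in
     * phase 1, fetch only the values that belong to the requested (zero-based) page.
     */
    static <K, V> List<V> fetchPage(List<K> orderedKeys, Map<K, V> region, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize < 1) {
            throw new IllegalArgumentException("Invalid page number or page size");
        }
        long offset = (long) pageNumber * pageSize; // long avoids int overflow
        if (offset >= orderedKeys.size()) {
            return Collections.emptyList(); // requested page is past the end
        }
        int from = (int) offset;
        int to = (int) Math.min(offset + pageSize, orderedKeys.size());
        List<V> page = new ArrayList<>(to - from);
        for (K key : orderedKeys.subList(from, to)) {
            V value = region.get(key);
            // An entry may have been removed since the keys were collected.
            if (value != null) {
                page.add(value);
            }
        }
        return page;
    }
}
```

Only the keys travel in the first step, so the values fetched per request are bounded by the page size. As noted above, the key list goes stale as soon as the sort order or the underlying data changes.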
1 votes, 2 watchers