Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit with DataList::filterByCallback is inconsistent. #14

Open
GuySartorelli opened this issue Mar 21, 2022 · 6 comments
Open

Limit with DataList::filterByCallback is inconsistent. #14

GuySartorelli opened this issue Mar 21, 2022 · 6 comments

Comments

@GuySartorelli
Copy link
Member

Affected Version: 4.9+

Description

If you call limit on a DataList, and then call filterByCallback, the data query will first be executed, which gets a limited number of results from the database. This limited result set is then filtered using the callback, which may result in even less results being returned.

This was pointed out by @christopherdarling in the below comment, which perhaps explains it a little more clearly:

that's true but this does unearth the issue of the order you call these methods for example;

  1. if you do Page::get()->filterByCallback(fn () => true)->limit(10)->count();, filterByCallback() loops over the entire set of Page rows (possibly 1000's) to do the filter comparison even after the 10th valid item is found which isn't necessary. I think this code change avoids that
  2. if you do Page::get()->limit(10)->filterByCallback(fn ($item) => $item->URLSegment !== 'home')->count(); I get 9 records because we're filtering on the 10 records from the database and 1 fails this filter
  3. if you do Page::get()->reverse()->limit(10)->filterByCallback(fn ($item) => $item->URLSegment !== 'home')->count(); I get 10 records (where the filter doesn't remove any rows)

Originally posted by @christopherdarling in silverstripe/silverstripe-framework#10248 (comment)

@GuySartorelli
Copy link
Member Author

I was thinking of resolving this by removing the limit from a clone of the DataList prior to running the callback filter loop, and then breaking the loop once we have the limit number of items. That will respect the limit better, but it would not be performant as it's still fetching (and presumably instantiating) all records from the database.

I haven't had time to really dive into this so I'm raising it as an issue for now.

@dhensby
Copy link
Contributor

dhensby commented Mar 21, 2022

That will respect the limit better, but it would not be performant as it's still fetching (and presumably instantiating) all records from the database.

Yes, I don't think the solution should be to just discard the limit off the original query... Whilst this shouldn't instantiate all the objects, it could load thousands, hundreds of thousands, etc of DB rows into memory only to return 10.

It may be better to go through the results in chunks of 2 x limit until the limit is hit. What happens when we start introducing offset logic? For that to work properly, we always need to start from the beginning and not rely on any DB level offsets.

Ultimately, the filterByCallback method is always going to be crude and, as datasets get large, be completely unreliable as a method for filtering large sets. I think we have two choices, one of which is just to leave it as it is and have it be a known limitation of the feature, or we improve it but provide lots of guidance/warnings about how it really shouldn't be used for any large sized datasets.

@kinglozzer
Copy link
Member

I think we have two choices, one of which is just to leave it as it is and have it be a known limitation of the feature

This gets my vote. I think time on this would be better spent elsewhere given that filterByCallback() is a bit of an anti-pattern

@christopherdarling
Copy link
Contributor

I'm happy enough with it being left as is also. I wonder if it's just worth documenting somewhere these limitations/gotchas and to consider the order of the use of this due to potential performance impact

@GuySartorelli
Copy link
Member Author

Documenting this as a known limitation sounds like a good idea.

@GuySartorelli GuySartorelli transferred this issue from silverstripe/silverstripe-framework Jul 16, 2022
@GuySartorelli
Copy link
Member Author

Moved to docs repo since the concensus is to update the docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants