Exception when searching large datasets for common query using paginate #807

Closed
razvaniacob opened this issue Feb 19, 2024 · 12 comments · Fixed by #817

@razvaniacob

razvaniacob commented Feb 19, 2024

Scout Version

10.8

Scout Driver

Typesense

Laravel Version

10.44.0

PHP Version

8.2.13

Description

When searching for a string that is relatively common throughout a collection of indexed data, the results cannot be fetched or displayed, because the search through the Scout driver throws the following exception:

Typesense\Exceptions\ObjectUnprocessable(code: 0): Only upto 250 hits can be fetched per page

Steps to reproduce

  1. Start a fresh Laravel project and install Scout and the Typesense Scout driver according to the documentation.
  2. Add a model and make it searchable.
  3. Add a large dataset (~50,000+ records containing "lorem ipsum" text).
  4. Query the dataset and try to display the results in a paginated way:
$items = Item::search($request->input('query', ''))
    ->paginate(10);

Expected Behavior

I expect the search to succeed even when there are more than 250 hits, since that is exactly why I use pagination on my frontend.

Actual Behavior

The following exception is thrown:

Typesense\Exceptions\ObjectUnprocessable(code: 0): Only upto 250 hits can be fetched per page
@karakhanyans
Contributor

@razvaniacob working on it. 👌

@driesvints
Member

@jasonbosco do you maybe know more about this?

@jasonbosco
Contributor

jasonbosco commented Feb 21, 2024

@driesvints Typesense has a max of 250 hits per page, and beyond that we'd have to use the page parameter to fetch additional pages. It looks like we're not managing this limit automatically within the Scout driver's pagination mechanism.

@karakhanyans is looking into this.
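To make the "page parameter" point concrete, here is a minimal PHP sketch of how a request for more than 250 hits could be split into several Typesense searches. It assumes only the 250-hits-per-page cap described above and Typesense's 1-based `page`/`per_page` search parameters; `planTypesensePages` is a hypothetical helper, not part of Scout or the Typesense PHP client, and it only plans the requests rather than sending them.

```php
<?php

/**
 * Hypothetical helper (not part of Scout or the Typesense client):
 * plan the page requests needed to fetch $wanted hits when the server
 * caps per_page at $maxPerPage.
 *
 * Typesense pages are 1-based, and page P with per_page K covers hits
 * (P - 1) * K + 1 .. P * K, so per_page is kept the same for every
 * request. The last page may return more hits than are still needed,
 * so the plan records how many of them to keep.
 *
 * @return list<array{page: int, per_page: int, keep: int}>
 */
function planTypesensePages(int $wanted, int $maxPerPage = 250): array
{
    if ($wanted <= 0) {
        return [];
    }

    $perPage = min($wanted, $maxPerPage);
    $pages = intdiv($wanted + $perPage - 1, $perPage); // ceil($wanted / $perPage)

    $plan = [];
    for ($page = 1; $page <= $pages; $page++) {
        $alreadyFetched = ($page - 1) * $perPage;
        $plan[] = [
            'page' => $page,
            'per_page' => $perPage,
            'keep' => min($perPage, $wanted - $alreadyFetched),
        ];
    }

    return $plan;
}

// Example: 600 hits need three requests with per_page=250,
// keeping only 100 hits from the last one.
foreach (planTypesensePages(600) as $request) {
    printf("page=%d per_page=%d keep=%d\n", $request['page'], $request['per_page'], $request['keep']);
}
```

Keeping `per_page` fixed across requests is the important design choice here: changing it between calls would shift the page offsets and skip or repeat hits.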

@karakhanyans
Contributor

Hi @razvaniacob

I set up a fresh project and could not reproduce the issue you are having.

  1. Set up the Typesense driver
  2. Added Searchable to the User model
  3. Configured the schema
  4. Imported 10K users with factories
  5. Searched with pagination: User::search('m')->paginate(10);

The results worked fine.

Here is the repo where I did all that. The results are returned by the /users endpoint.

https://github.com/karakhanyans/laravel-scout-typesense

Could you please fork the repo, add the steps you took, and push them so I can reproduce the error?

Thanks.

@razvaniacob
Author

Thanks @karakhanyans,

I've tested with your code and it works.

So I started experimenting with my own code, and when I do something like this:

return ImportedProperty::search('pipera')->paginate(10)->onEachSide(1)->withQueryString()
    ->through(fn ($obj) => [
        'name' => $obj->source_id,
    ]);

It works, but if I do it like this:

return ImportedProperty::search('pipera')
    ->query(fn (Builder $query) => $query->with(['imported_district', 'neighbourhood']))
    ->paginate(10)->onEachSide(1)->withQueryString()
    ->through(fn ($obj) => [
        'name' => $obj->source_id,
        'district' => $obj->imported_district?->name ?? ($obj->neighbourhood?->name ?? ''),
    ]);

It fails with this error:

[Screenshot 2024-02-27 at 4:15:29 PM]

Maybe it has something to do with the
->query(fn (Builder $query) => $query->with(['imported_district', 'neighbourhood']))
line?

Any thoughts?

@karakhanyans
Contributor

@razvaniacob have you tried writing ->query(fn (Builder $query) => $query->with(['imported_district', 'neighbourhood'])) as just ->with(['imported_district', 'neighbourhood']), without wrapping it in query(), since it's only a ->with?

@razvaniacob
Author

razvaniacob commented Feb 28, 2024

I followed the documentation found here: Laravel Scout Documentation

If I use just ->with(...), I get this error:
Method Laravel\Scout\Builder::with does not exist.

@thannaske

This is an issue I also encountered, see: typesense/laravel-scout-typesense-driver#86

@errand

errand commented Mar 25, 2024

Hey guys,
I'm getting the same issue when using a query Builder callback:

$posts = Post::search($text)
                ->query(fn (Builder $query) => $query->with(['images:id,name,path,extension']))

Removing it fixed the issue.


Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

@davidstoker

davidstoker commented Mar 27, 2024

I just ran into this as well when using paginate() with query(). It looks to me like the root cause is that getTotalCount doesn't use the engine's total when a queryCallback is set. Instead, if the current page holds fewer IDs than the total, it re-queries the engine for all matching IDs, calling take() with $totalCount when no limit is set. That ends up calling Typesense with a per_page value of $totalCount, which triggers the error as soon as there are more than 250 matches.

scout/src/Builder.php

Lines 480 to 509 in 6e5b47d

/**
 * Get the total number of results from the Scout engine, or fallback to query builder.
 *
 * @param  mixed  $results
 * @return int
 */
protected function getTotalCount($results)
{
    $engine = $this->engine();

    $totalCount = $engine->getTotalCount($results);

    if (is_null($this->queryCallback)) {
        return $totalCount;
    }

    $ids = $engine->mapIdsFrom($results, $this->model->getScoutKeyName())->all();

    if (count($ids) < $totalCount) {
        $ids = $engine->keys(tap(clone $this, function ($builder) use ($totalCount) {
            $builder->take(
                is_null($this->limit) ? $totalCount : min($this->limit, $totalCount)
            );
        }))->all();
    }

    return $this->model->queryScoutModelsByIds(
        $this, $ids
    )->toBase()->getCountForPagination();
}

As a workaround, because of the is_null($this->limit) ? $totalCount : min($this->limit, $totalCount) check, setting a limit alongside the paginate call keeps the per_page value under control.

For example, this should work around it without affecting pagination, since pagination doesn't look at the $limit value.

$items = Item::search($request->input('query', ''))
    ->take(10)
    ->paginate(10);
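The per_page arithmetic behind both the bug and the workaround can be checked in isolation. This is a minimal PHP sketch that only mirrors the `take()` expression from the quoted `getTotalCount()`; `perPageForCountQuery` and `TYPESENSE_MAX_PER_PAGE` are hypothetical names, and it assumes, as described above, that the `take()` value is what reaches Typesense as per_page.

```php
<?php

/**
 * Hypothetical stand-in for the take() argument computed in the
 * queryCallback branch of Builder::getTotalCount() quoted above.
 * Returns null when that branch does not re-query the engine at all.
 */
function perPageForCountQuery(int $idsOnPage, int $totalCount, ?int $limit): ?int
{
    // The current page already holds every match: no second engine call.
    if ($idsOnPage >= $totalCount) {
        return null;
    }

    // Mirrors: is_null($this->limit) ? $totalCount : min($this->limit, $totalCount)
    return is_null($limit) ? $totalCount : min($limit, $totalCount);
}

const TYPESENSE_MAX_PER_PAGE = 250; // cap quoted in the exception message

// paginate(10) with a query callback and 50,000 matches, no take():
$perPage = perPageForCountQuery(10, 50000, null);
var_dump($perPage > TYPESENSE_MAX_PER_PAGE); // bool(true): above the 250 cap

// The same search with ->take(10) added:
var_dump(perPageForCountQuery(10, 50000, 10)); // int(10)
```

Reading the quoted code, the take(10) workaround also appears to bound the count itself: in the query-callback branch, getCountForPagination() runs over at most min($limit, $totalCount) IDs, so the paginator's total may be capped at the limit. That is worth checking before relying on the workaround.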

It's not clear to me why the presence of a queryCallback should cause the $totalCount returned by the engine to be ignored. Why should it force a fallback to the query builder for the count, if the query callback's purpose is to be "invoked after the relevant models have already been retrieved from your application's search engine", as described in the docs?

@karakhanyans
Contributor

@davidstoker thanks for your input, it helped a lot in solving this.
@razvaniacob

cc: @driesvints @jasonbosco

#817

@driesvints linked a pull request on Apr 2, 2024 that will close this issue