
Results window is too large - Kaminari #577

Open
thorlando opened this issue Jun 7, 2016 · 13 comments
@thorlando

When using this gem with Kaminari (rather than passing from + size directly), the following error occurs when navigating to the last page of the paginated results:

ActionView::Template::Error ([500] {"error":{"root_cause":[{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [11880]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"users","node":"bB32lITgax0OBbQtim3hFbI","reason":{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [11880]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."}}]},"status":500}):

I'd limit my results to 10,000 if I knew how, but I can't seem to figure it out (similar to how in MySQL you could say SELECT * FROM users LIMIT 10000). Is this an issue with Kaminari?
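
One way to impose that kind of LIMIT yourself is to clamp the requested page number so that from + size never exceeds the 10,000 window. A minimal sketch, assuming a Rails controller, a User model backed by elasticsearch-rails, and a page size of 30 (all names here are illustrative, not from this thread):

# Clamp the Kaminari page so that from + size <= 10_000.
PER_PAGE   = 30
MAX_WINDOW = 10_000                     # Elasticsearch's default index.max_result_window
max_page   = MAX_WINDOW / PER_PAGE      # highest page that stays inside the window
page       = params[:page].to_i.clamp(1, max_page)

@users = User.search(params[:q]).page(page).per(PER_PAGE).records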

@Ashviniv

Ashviniv commented Jun 30, 2016

@realmadrid2727 I was also getting the same error. I fixed it by setting index.max_result_window: 100000 in the Elasticsearch configuration file, i.e. elasticsearch.yml. Basically, we need to set it to the maximum possible size of our result set.
This is not a Kaminari issue; it is one of Elasticsearch's configuration settings. You can read more about it at https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_21_search_changes.html
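
For illustration, the same setting can also be raised per index over the REST API instead of elasticsearch.yml (on recent Elasticsearch versions it is an index-level setting, so the YAML file may not accept it). A sketch using the elasticsearch-ruby client, with the users index name as an assumption:

require "elasticsearch"

client = Elasticsearch::Client.new
# Raise the result window for one existing index only.
client.indices.put_settings(
  index: "users",
  body: { index: { max_result_window: 100_000 } }
)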

@chikamichi

chikamichi commented Oct 13, 2016

Increasing index.max_result_window beyond 10000 (which is ES's default value, btw) might not be considered a proper fix. The failure occurs because the dataset has more than 10000 records and the user is trying to access a page beyond that limit, breaking the constraint that from + size cannot exceed index.max_result_window.

To make deep pagination work "properly", one can either:

  • increase max_result_window — which is bounded by memory/CPU consumption and cluster size
  • use search_after — but it may not be trivial to implement in one's code, for it basically works like the scroll API: one needs to specify a sort value for offsetting the search, and finding that value in a reliable way requires walking the pages, starting from the first one (see the sketch after this list)
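
For reference, a minimal search_after sketch, assuming an elasticsearch-ruby client and a users index (both names are assumptions):

require "elasticsearch"

client = Elasticsearch::Client.new
sort   = [{ created_at: "asc" }, { id: "asc" }]  # deterministic order with a tie-breaker

# First page: no cursor yet.
page   = client.search(index: "users", body: { size: 25, sort: sort, query: { match_all: {} } })
cursor = page["hits"]["hits"].last["sort"]       # sort values of the last hit

# Next page: same query and sort, plus the cursor. No from is involved,
# so index.max_result_window never comes into play.
next_page = client.search(
  index: "users",
  body: { size: 25, sort: sort, search_after: cursor, query: { match_all: {} } }
)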

@michniewicz

Can you please give us the status of this issue? Since increasing max_result_window is a bad idea, what is a better way to deal with large data sets (e.g. pagination with the scroll API)?

@c-moyer

c-moyer commented Aug 30, 2017

Same problem

@NickCraver

Is there any recommendation here on how to handle large page numbers in large result sets? We're now hitting this on Stack Overflow as well.

@spodlecki

bump on this... I'd love to hear what others are doing... It seems like search_after might be the easiest option, but it feels silly to have to pass through the last known ID on a search when that isn't pagination at all. Keeping a session in search just isn't an option.

@karmi
Contributor

karmi commented May 15, 2018

Argh, this went under my radar, sorry for the silence, everybody.

search_after is indeed the right way to work around the "result window too large" limit, because it is something that can be optimized on the Elasticsearch side, as opposed to just increasing the limit, which could potentially overload the server. (That would be a problem for any database, I imagine.)

As a note, the Scroll API is not meant to be used for "regular searching", especially in high-concurrency scenarios (many people searching at the same time), since it keeps an "open window" into the index, and that can break things again. The Scroll API is great for something like an "export this dataset" feature, which runs only now and then, triggered either by people or by a cron job.
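
For contrast, a minimal scroll-based export sketch along those lines, assuming an elasticsearch-ruby client (the index name and batch size are assumptions):

require "elasticsearch"

client   = Elasticsearch::Client.new
response = client.search(
  index: "users",
  scroll: "2m",                                    # keep the scroll context alive between batches
  body: { size: 1_000, query: { match_all: {} } }
)

while (hits = response["hits"]["hits"]).any?
  hits.each { |hit| puts hit["_id"] }              # stand-in for real export work
  response = client.scroll(body: { scroll_id: response["_scroll_id"], scroll: "2m" })
end

client.clear_scroll(body: { scroll_id: response["_scroll_id"] })  # free the context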

Supporting pagination over a very large dataset probably needs a bit of re-designing and re-implementing, going in a different direction than the regular from/size pattern. As I've mentioned on a couple of other issues, we have scheduled an effort to re-focus on the Rails integration in the upcoming months, so we'll try to have a look at this particular problem as well!

@spodlecki

To follow up here: we presented the business end of things with the different options, and after a bit of discussion we ended up just upping the limit to an acceptable level (~60k) to cover most of the most-viewed content. At the end of the day, we're still using from/size -- I'm hoping there are no plans to remove this entirely? It'd seem pretty silly, in the sense of actual pagination, to get rid of something like this.

@unavailabl3

unavailabl3 commented Apr 1, 2019

products  = []
after     = 0    # cursor: sort value (id) of the last hit seen so far
page_size = 100  # query_size was undefined in the original; any page size works

loop do
  query = {
    size: page_size,
    query: {
      multi_match: {
        query: "black",
        fields: [ "description", "title", "information", "params" ]
      }
    },
    search_after: [after],
    sort: [ {id: "asc"} ]
  }
  results = Product.search(query).records.to_a
  break if results.empty?

  products += results
  # Advance the cursor to the last hit's sort value. The original after += 10000
  # only works if ids are dense; search_after expects sort values, not offsets.
  after = results.last.id
end

@stale

stale bot commented Aug 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 31, 2020
@stale stale bot closed this as completed Sep 7, 2020
@montdidier

This really isn't stale.

@philsmy

philsmy commented Apr 9, 2021

Is it really possible that 5 years on there still isn't a good way to do front-end driven pagination of large result sets?

We want to let users scroll through a month's worth of orders. Sometimes that is 100,000+ orders. What are we supposed to do? We use datatables on the front end.

@lulessa

lulessa commented Apr 10, 2021

> Is it really possible that 5 years on there still isn't a good way to do front-end driven pagination of large result sets?
>
> We want to let users scroll through a month's worth of orders. Sometimes that is 100,000+ orders. What are we supposed to do? We use datatables on the front end.

@philsmy Limit/offset pagination is not a scalable pattern for a distributed system like Elasticsearch.

Use cursor-based pagination with search_after; see the example loop in #577 (comment) for a pattern. In pagination, the cursor for the next page is the sort value of the last doc on the current page. To detect whether there is a next page, make size the desired number of docs per page + 1.
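
A minimal sketch of that size + 1 trick, assuming an elasticsearch-ruby client and an orders index (both names are assumptions):

require "elasticsearch"

client   = Elasticsearch::Client.new
per_page = 25
cursor   = nil  # sort values of the last doc on the previous page, if any

body = {
  size: per_page + 1,                             # one extra hit, purely to detect a next page
  sort: [{ created_at: "desc" }, { id: "asc" }],  # tie-breaker keeps the order stable
  query: { match_all: {} }
}
body[:search_after] = cursor if cursor

hits          = client.search(index: "orders", body: body)["hits"]["hits"]
has_next_page = hits.size > per_page
page_hits     = hits.first(per_page)
next_cursor   = page_hits.last["sort"] if page_hits.any?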
