Support partial return of results #13

Open
ageorgou opened this issue Jul 16, 2018 · 3 comments
Labels
api Accessing the search and its results

Comments

@ageorgou
Contributor

We need to be able to retrieve results lazily, i.e. a "page" at a time, to avoid long loading times. See also oracc/oracc-search-front-end#11.

ElasticSearch offers several options for getting paged results:

  • The from and size fields; by default, from + size may not exceed 10000 (the index.max_result_window setting), which we may exceed. Requests past that limit are rejected, and raising the limit makes deep paging increasingly inefficient.
  • Scrolling; this is not meant to be used for real-time requests, but rather for processing large amounts of data on the back-end.
  • The search_after option seems like the best solution. It requires keeping track of the last result (glossary entry), but does not look too complex to implement. Performance is not clear, as it seems that the search is repeated each time, but that might also be true of the other options.
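As a rough illustration of the search_after option, here is a minimal sketch of how the request body for each page could be built. The function name, the `size` default and the `_id` tiebreaker are assumptions made for this example, not part of the existing code; which tiebreaker field is suitable depends on the ElasticSearch version.

```python
def page_body(query, sort_field, size=10, last_hit=None, tiebreaker="_id"):
    """Build a search body for one page of results (hypothetical helper).

    `last_hit` is the final hit of the previous page, as returned by
    ElasticSearch; when a sort is specified, each hit carries a "sort"
    list with the values it was ordered by.
    """
    body = {
        "query": query,
        "size": size,
        # The second sort key is a tiebreaker so that entries with the same
        # value in `sort_field` still have a well-defined order.
        # Assumption: `_id` is sortable in the version being used.
        "sort": [{sort_field: "asc"}, {tiebreaker: "asc"}],
    }
    if last_hit is not None:
        # Resume right after the last result of the previous page.
        body["search_after"] = last_hit["sort"]
    return body


# First page: no search_after.
first = page_body({"match_all": {}}, "cf.keyword", size=2)

# Next page: pass the last hit of the previous response
# (the hit below is made up for illustration).
previous_last_hit = {"_id": "id42", "sort": ["abu", "id42"]}
second = page_body({"match_all": {}}, "cf.keyword", size=2,
                   last_hit=previous_last_hit)
```

The only state to keep between requests is the "sort" list of the last hit, which the front end could send back with its next request.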

Regardless of the choice, we will also need to extend the search endpoints to accept a field on which to sort.

@ageorgou
Contributor Author

Note that newer (5+) versions of ElasticSearch have changed how text fields are indexed. Sorting on the text field itself is not supported, but it is possible to sort on e.g. cf.keyword. This is an automatically created field, so no change to the mapping should be required. See here or here, for example.

(It may be possible to do this with a single field by enabling fielddata during indexing, but that seems both more complex and not recommended, so I am not pursuing this for now.)
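A minimal sketch of how a requested sort field could be translated into a sort clause that uses the .keyword sub-field. The `TEXT_FIELDS` set and the function name are assumptions for illustration; only `cf` is taken from the discussion above.

```python
# Assumption: fields mapped as `text` that have an automatically created
# `.keyword` sub-field. Only "cf" is known from this thread.
TEXT_FIELDS = {"cf"}


def sort_clause(field, direction="asc"):
    """Return an ElasticSearch sort clause for `field` (hypothetical helper)."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    # Text fields cannot be sorted on directly, so use the keyword sub-field.
    target = field + ".keyword" if field in TEXT_FIELDS else field
    return [{target: {"order": direction}}]


clause = sort_clause("cf", "desc")
```

The result can be used as the "sort" entry of a search body, so the endpoint only needs to accept the plain field name.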

@ageorgou ageorgou mentioned this issue Jul 16, 2018
6 tasks
@ageorgou
Contributor Author

ageorgou commented Jul 17, 2018

Note for future reference: getting all results of a search (with scan) invalidates the sorting, as mentioned in the docs. This means that, if we're sorting, we should only try to retrieve paginated results, or work around this some other way.

EDIT: It seems there is a way of doing this by using the preserve_order parameter as explained here. However, this may cause loss of efficiency (source).
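For reference, a sketch of what this could look like with the `scan` helper from the Python client (`elasticsearch.helpers.scan`), which accepts a `preserve_order` argument. The wrapper function, its `scan_fn` parameter and the index name are assumptions for this example; `scan_fn` exists only so the wrapper can be tried without a running cluster.

```python
def iter_sorted_hits(client, index, body, scan_fn=None):
    """Iterate over all hits of a search while keeping the requested sort.

    Hypothetical wrapper: `scan_fn` defaults to elasticsearch.helpers.scan
    and can be replaced by a stand-in for testing.
    """
    if scan_fn is None:
        # Imported lazily so the module loads without the client installed.
        from elasticsearch.helpers import scan as scan_fn
    # preserve_order=True keeps the sort given in `body`, at a possible
    # cost in efficiency, as noted above.
    return scan_fn(client, query=body, index=index, preserve_order=True)


# Stand-in for the real helper, recording how it was called.
calls = []

def fake_scan(client, **kwargs):
    calls.append(kwargs)
    return iter([{"_id": "1"}, {"_id": "2"}])

body = {"query": {"match_all": {}}, "sort": [{"cf.keyword": "asc"}]}
# "glossary" is a placeholder index name.
hits = list(iter_sorted_hits(None, "glossary", body, scan_fn=fake_scan))
```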

@ageorgou ageorgou added the api Accessing the search and its results label May 17, 2019
@ageorgou
Contributor Author

ageorgou commented Jul 30, 2019

Dealt with in #14, but #15 will improve this, so leaving open until that is merged.
