Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Keeping or removing the check on index existence in list_splits method #3760

Open
fmassot opened this issue Aug 17, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@fmassot
Copy link
Contributor

fmassot commented Aug 17, 2023

Currently, in list_splits, we return filtered splits notably with a filter on index UIDs.

In cases where an index does not exist, we have a different behavior between the file-backed metastore and the PostgreSQL metastore.

  • file-backed metastore: as soon as an index is not found, it returns an error IndexesDoNotExist.
  • PostgreSQL metastore: we check only indexes' existence if no splits are returned. See the convoluted code.

(note that the PR on the multi-indexes #3734 introduced this discrepancy).

Also, with the multi indexes features, querying splits while a user deletes some indexes will more likely happen (scenario such as "a user makes a search query on a list of indexes, an admin user deletes one of the indexes, the user gets an error").

One pro for returning an error when an index does not exist is that the caller gets the right error immediately. This is useful in Quickwit's code base as we can react to this error and stop getting splits from the metastore for a given index.
This is why I'm currently reluctant to drop the existence check.

@fmassot fmassot added the enhancement New feature or request label Aug 17, 2023
@guilload
Copy link
Member

A left outer join can make the code less complex and more accurate. I hope the performance penalty will be negligible with indexes on the index_uid columns in the indexes and splits tables.

@fmassot
Copy link
Contributor Author

fmassot commented Nov 10, 2023

In #4109, I replaced the list splits method by a stream splits method.

So now I see two options:

  1. strict mode: check index existence before the select (I think it should be possible to avoid 2 calls to the db)
  2. lenient mode: no check on the index existence

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants