Incomplete issue search results in repository with many issues #24662

Closed
brechtvl opened this issue May 11, 2023 · 1 comment · Fixed by #26012
Labels
issue/critical, type/bug

Comments

@brechtvl
Contributor

Description

There are multiple ways to reproduce this. One way:

  • Create 50 issues with same title
  • Create 1 pull request with the same title as the issues
  • Searching for the title will return either 0 results in pull requests search, or only 49 results in issue search

Another way:

  • Create 60 issues with same title
  • Apply one label to 30 of them, and another label to the other 30
  • Search for the title, filter by one of the labels, and it will return fewer than 30 results

The reason is that the indexers only index the title, content and comments, and return up to 50 search results based on those. Filtering by issue or PR, open or closed, labels, author and so on happens afterwards, on that already truncated list. Note that pagination as in #22704 does not solve this problem.
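To make the first reproduction concrete, here is a minimal sketch in Go that simulates the limit-then-filter order. It is not Gitea code: `doc`, `searchIndexer`, `filterAfterwards` and `buildDocs` are hypothetical names, and the in-memory slice only stands in for a real indexer and database.

```go
package main

import "fmt"

// doc is a hypothetical indexed document; the fields are illustrative only.
type doc struct {
	ID    int64
	Title string
	IsPR  bool
}

// searchIndexer mimics an indexer that only knows the keyword: it returns
// at most limit IDs of documents whose title equals the keyword.
func searchIndexer(docs []doc, keyword string, limit int) []int64 {
	var ids []int64
	for _, d := range docs {
		if len(ids) == limit {
			break
		}
		if d.Title == keyword {
			ids = append(ids, d.ID)
		}
	}
	return ids
}

// filterAfterwards mimics the later database step: it keeps only the IDs
// whose issue/PR kind matches, so it can never recover truncated IDs.
func filterAfterwards(docs []doc, ids []int64, wantPR bool) []int64 {
	byID := make(map[int64]doc, len(docs))
	for _, d := range docs {
		byID[d.ID] = d
	}
	var out []int64
	for _, id := range ids {
		if byID[id].IsPR == wantPR {
			out = append(out, id)
		}
	}
	return out
}

// buildDocs creates 50 issues and 1 pull request that share one title.
func buildDocs() []doc {
	docs := make([]doc, 0, 51)
	for i := int64(1); i <= 50; i++ {
		docs = append(docs, doc{ID: i, Title: "same title"})
	}
	return append(docs, doc{ID: 51, Title: "same title", IsPR: true})
}

func main() {
	docs := buildDocs()
	ids := searchIndexer(docs, "same title", 50) // 51 matches, only 50 returned
	fmt.Println("pull requests found:", len(filterAfterwards(docs, ids, true)))
	fmt.Println("issues found:", len(filterAfterwards(docs, ids, false)))
}
```

In this simulation the pull request happens to be the document that is cut off, so the PR search finds nothing. If an issue were cut off instead, the issue search would show 49 results, which is the other outcome described above.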

The solution could be to make all indexers index and filter by all of these issue fields. That would require adding quite a bit of code to every indexer, though, since each filtering option would need to be implemented in each of them.

Alternative solutions with worse performance would be to get an unlimited number of results from the indexers, or to compute a list of issue IDs that match the filters and pass it to the indexers.

Gitea Version

60e7963 (main)

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

Own build, using meilisearch. But it should happen anywhere, with any indexer.

Database

None

@lunny
Member

lunny commented May 12, 2023

I think this is an indexer design problem. The indexer only stores content, repo_id and issue_id. So a search with a keyword, labels and other conditions takes two steps: first, search for the keyword in the indexer to get issue IDs; then combine those issue IDs with the other conditions and pagination to query the database.

So I think we may need to store almost all content in the indexer to resolve the problem.

@lunny added the issue/critical label May 13, 2023
techknowlogick pushed a commit that referenced this issue Jul 31, 2023
…ng and paging (#26012)

Fix #24662.

Replace #24822 and #25708 (although it has been merged)


## Background

In the past, Gitea supported issue searching with a keyword and
conditions in a less efficient way. It worked by searching for issues
with the keyword and obtaining limited IDs (as it is heavy to get all)
on the indexer (bleve/elasticsearch/meilisearch), and then querying with
conditions on the database to find a subset of the found IDs. This is
why the results could be incomplete.

To solve this issue, we need to store all fields that could be used as
conditions in the indexer and support both keyword and additional
conditions when searching with the indexer.
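As a rough illustration of this idea (not the actual Gitea implementation), the sketch below stores the filterable fields next to the title and applies the conditions before the limit. `indexedIssue`, `searchOptions`, `buildIndex` and `search` are hypothetical names, and the in-memory slice only stands in for a real indexer. It reuses the second reproduction from the issue: 60 issues with the same title, 30 per label.

```go
package main

import "fmt"

// indexedIssue is a hypothetical document that carries the filterable
// fields alongside the searchable title.
type indexedIssue struct {
	ID      int64
	Title   string
	LabelID int64
}

// searchOptions is a trimmed-down, hypothetical stand-in for the options
// struct described in this commit; the real one has more fields.
type searchOptions struct {
	Keyword string
	LabelID int64 // 0 means "do not filter by label"
	Limit   int   // values <= 0 mean "no limit" in this sketch
}

// buildIndex creates 60 issues with the same title, alternating two labels.
func buildIndex() []indexedIssue {
	index := make([]indexedIssue, 0, 60)
	for i := int64(1); i <= 60; i++ {
		label := int64(1)
		if i%2 == 0 {
			label = 2
		}
		index = append(index, indexedIssue{ID: i, Title: "same title", LabelID: label})
	}
	return index
}

// search applies the keyword and the label condition first and only then
// enforces the limit, so the limit counts matching issues only.
func search(index []indexedIssue, opts searchOptions) []int64 {
	var ids []int64
	for _, it := range index {
		if it.Title != opts.Keyword {
			continue
		}
		if opts.LabelID != 0 && it.LabelID != opts.LabelID {
			continue
		}
		ids = append(ids, it.ID)
		if len(ids) == opts.Limit {
			break
		}
	}
	return ids
}

func main() {
	ids := search(buildIndex(), searchOptions{Keyword: "same title", LabelID: 1, Limit: 50})
	fmt.Println("issues with label 1:", len(ids))
}
```

With the old order (take 50 keyword matches, then filter by label) the same data would yield only 25 results, because the first 50 matches contain 25 issues of each label.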

## Major changes

- Redefine `IndexerData` to include all fields that could be used as
filter conditions.
- Refactor `Search(ctx context.Context, kw string, repoIDs []int64,
limit, start int, state string)` to `Search(ctx context.Context, options
*SearchOptions)`, so it supports more conditions now.
- Change the data type stored in `issueIndexerQueue`. Use
`IndexerMetadata` instead of `IndexerData` in case the data has been
updated while it is in the queue. This also reduces the storage size of
the queue.
- Enhance searching with Bleve/Elasticsearch/Meilisearch, make them
fully support `SearchOptions`. Also, update the data versions.
- Keep most of the database indexer logic, but remove
`issues.SearchIssueIDsByKeyword` in `models` to avoid confusion about
where the entry point for searching issues is.
- Start a Meilisearch instance to test it in unit tests.
- Add unit tests with almost full coverage to test
Bleve/Elasticsearch/Meilisearch indexer.
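The queue change can be sketched as follows. This is a simplified model under assumptions, not the code from this commit: the fields of `indexerMetadata`, the `store` type and the `process` function are hypothetical. The point is that the queue carries only a reference to the issue, and the worker loads the current data when it processes the item.

```go
package main

import "fmt"

// indexerMetadata is a hypothetical queue item: it identifies the issue
// but does not carry its content. The real struct may differ.
type indexerMetadata struct {
	ID       int64
	IsDelete bool
}

// issueData is a hypothetical, minimal version of the indexed document.
type issueData struct {
	ID    int64
	Title string
}

// store stands in for the database in this sketch.
type store map[int64]issueData

// process drains the queue and reads each issue from the store at
// processing time, so edits made after queueing are still indexed.
func process(queue []indexerMetadata, db store, index map[int64]issueData) {
	for _, m := range queue {
		if m.IsDelete {
			delete(index, m.ID)
			continue
		}
		if data, ok := db[m.ID]; ok {
			index[m.ID] = data
		}
	}
}

func main() {
	db := store{1: {ID: 1, Title: "old title"}}
	queue := []indexerMetadata{{ID: 1}}

	// The issue is edited after it was queued but before the worker runs.
	db[1] = issueData{ID: 1, Title: "new title"}

	index := map[int64]issueData{}
	process(queue, db, index)
	fmt.Println(index[1].Title) // prints "new title"
}
```

Had the queue stored the full document at enqueue time, the index would have ended up with the stale "old title" in this scenario.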

---------

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 15, 2023