Support better search engines for indexing (elasticsearch, mysql/pgsql/sqlite fulltext or something else) #6648

Closed
1 task done
vitalif opened this issue Apr 16, 2019 · 8 comments
Labels
type/proposal The new feature has not been accepted yet but needs to be discussed first.

Comments

@vitalif
Contributor

vitalif commented Apr 16, 2019

  • Gitea 1.7.6
  • Git 2.11
  • Operating system: Linux
  • Database: all
  • Can you reproduce the bug at https://try.gitea.io:
    • Not relevant

Description

The Bleve indexer is very inefficient: it uses a lot of disk space and a lot of memory. It also keeps the whole index mmap'ed at all times, which makes Gitea crash when I enable it on my 32-bit server: with just 1.4 GB of git repositories, it crashes after generating 2 GB of index.

There are many more popular and more efficient full-text search engines, starting with the ones built into Postgres, MySQL, and SQLite (MySQL's is not the most efficient, but it still works). Then there is Elasticsearch, and so on. Index sizes are much smaller in Elasticsearch and Postgres (relative to the size of the indexed data).

@lunny lunny added the type/proposal The new feature has not been accepted yet but needs to be discussed first. label Apr 16, 2019
@lunny
Member

lunny commented Apr 16, 2019

We are refactoring the issue indexer; after that, we will start refactoring the code indexer. You can find some of the related PRs, e.g. #6150.

@adamcavendish

Hi, I've also seen in the configuration files that there are two ISSUE_INDEXER_TYPE values available. What is the difference between "db" and "bleve"? Is it safe to change from "bleve" to "db" in the config file and then simply restart?

@lunny
Member

lunny commented May 6, 2019

@adamcavendish db will use the database's LIKE to search issues. The change you describe is safe. But both types are inefficient.
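
To illustrate what a LIKE-based search amounts to, here is a minimal sketch using Python's built-in sqlite3 module. The `issue` table, its columns, and the function names are hypothetical and chosen only for the example; this is not Gitea's actual schema or query. Because the pattern starts with a wildcard, the database generally cannot use an ordinary index and has to scan every row, which is one reason this approach scales poorly.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical `issue` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE issue (id INTEGER PRIMARY KEY, title TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO issue (id, title, content) VALUES (?, ?, ?)",
        [
            (1, "Bleve indexer uses too much memory", "Crashes on a 32-bit server"),
            (2, "Add dark theme", "UI request"),
            (3, "Search is slow", "The code Indexer takes minutes"),
        ],
    )
    return conn


def search_issues_like(conn, keyword):
    """Return ids of issues whose title or content contains `keyword`."""
    # Escape LIKE wildcards so the keyword is matched literally.
    escaped = (
        keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    pattern = f"%{escaped}%"
    rows = conn.execute(
        "SELECT id FROM issue "
        "WHERE title LIKE ? ESCAPE '\\' OR content LIKE ? ESCAPE '\\' "
        "ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Calling `search_issues_like(make_demo_db(), "indexer")` matches both the title of issue 1 and the content of issue 3, since SQLite's LIKE is case-insensitive for ASCII characters by default.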

@alexanderadam

alexanderadam commented Jul 3, 2019

Proper search support could fix things like #5694, #5277, #3448, #2967, #2434, #8366, #8386, #7825, #10147 and #10764. Those might not be the "same", but their cause is [probably] the current indexing/searching implementation.

And I guess it would also help to lower the number of memory-related bugs (e.g. #4807).

@jeffliu27

Would love to help with the implementation of the Elasticsearch code search backend! @lunny @jeblair

@rcarmo

rcarmo commented May 24, 2020

Just a note that SQLite has FTS indexing, and that it is quite efficient (I have gigabytes of plain text files indexed that way).
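
As a rough sketch of what SQLite full-text indexing looks like in practice, the example below builds an FTS5 virtual table with Python's built-in sqlite3 module. It assumes the underlying SQLite library was compiled with the FTS5 extension (common, but not guaranteed). The table name `docs`, its columns, and the helper functions are hypothetical and only for illustration; nothing here reflects how Gitea stores or indexes data.

```python
import sqlite3


def build_fts_index(docs):
    """Build an in-memory FTS5 index from (path, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # FTS5 virtual table; requires SQLite built with the FTS5 extension.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")
    conn.executemany("INSERT INTO docs (path, body) VALUES (?, ?)", docs)
    return conn


def search(conn, query):
    """Return paths of documents matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]


def demo_index():
    """Index three small sample documents."""
    return build_fts_index(
        [
            ("a.txt", "The bleve indexer keeps the whole index mapped in memory"),
            ("b.txt", "SQLite ships a full-text search extension"),
            ("c.txt", "Elasticsearch is another indexer option"),
        ]
    )
```

With the sample data, `search(demo_index(), "indexer")` finds `a.txt` and `c.txt`, and `search(demo_index(), "full text")` finds `b.txt`, because the default tokenizer splits "full-text" into two terms and a multi-word query requires all of them. Unlike a LIKE scan, the lookup goes through an inverted index rather than reading every row.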

@rcarmo

rcarmo commented Sep 11, 2020

So no built-in, single-binary SQLite search?

@lunny
Member

lunny commented Sep 11, 2020

@rcarmo The issue index supports built-in SQLite search; the code index supports built-in bleve search.

@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020