Replies: 6 comments 27 replies
-
As you noted, thats not normal, so it's very hard for anyone here to tell you why this is happening but fundamentally it seems an issue with your database (installation) perhaps? |
Beta Was this translation helpful? Give feedback.
-
|
Same problem here. Advanced search: customer AND *Product takes 1:30min. 10000 documents 8 Core 16GB RAM SSD, Postgresql. load under 0.5 @ke-ma-fi your tweaks change nothing for me Unfortunately, I can't use paperless like this. It's a shame because it's otherwise great software. I currently use Devonthink but I would like to switch to paperless. I hope for a high-performance searchengine |
Beta Was this translation helpful? Give feedback.
-
|
I did a bit more testing. Specifically, I wanted to run the same query from the “Advanced Search” on the command line using a PostgreSQL query. To do this, I set up the pg_trgm extension and then created the corresponding index:
After that, I executed the same query as in Paperless and, I got the result in under a second. In Paperless it takes almost a minute, as you can see in the screenrecordings. Why does it take so long in Paperless, and is there any way to change that? paperless.movcli.mov |
Beta Was this translation helpful? Give feedback.
-
|
Hi, My comment will be about title & content search, not the advanced search with Whoosh. Indeed, the title & content search uses the Django ORM. Plus, the search cannot be indexed as it uses a Here is an attempt, working with the three backends (PostgreSQL, MariaDB, and SQLite). I didn't implement the "AND" and "OR" filters from the Whoosh search, but it might be possible: https://github.com/Merinorus/paperless-ngx/tree/feature-faster-title-content-search The Django ORM's advanced features (full-text search, etc.) are almost exclusively available to PostgreSQL. In Paperless-ngx, we have to support multiple database backends. This results in almost plain SQL. It may be error-prone in the future because it is not correlated anymore with the ORM model, but sometimes the ORM is not enough to deliver proper performance. class TitleContentFilter(Filter):
def filter(self, qs, value):
if not value:
return qs
tokens = split_tokens(value)
limit = 1000
fulltext_tokens = [t for t in tokens if len(t) >= FULLTEXT_MINIMAL_TOKEN_LENGTH]
vendor = connection.vendor
if vendor == "postgresql" and tokens:
# PostgreSQL fulltext search benefits from tokens less than 3 characters long
ft_search_exp = " & ".join(tokens) + ":*"
id_query = RawSQL(
"SELECT id FROM documents_document WHERE to_tsvector('simple', title) @@ to_tsquery('simple', %s) OR to_tsvector('simple', content) @@ to_tsquery('simple', %s) limit %s",
params=(ft_search_exp, ft_search_exp, limit),
)
return qs.filter(id__in=(id_query))
elif vendor in {"mysql", "mariadb"} and fulltext_tokens:
# MariaDB needs at least 3 characters to be able to use the full-text search
ft_search_exp = "+" + " +".join(fulltext_tokens) + "*"
like_search_exp = "%" + "_%".join(tokens) + "%"
id_query = RawSQL(
"SELECT id FROM documents_document WHERE (MATCH(title) AGAINST(%s IN BOOLEAN MODE) OR MATCH(content) AGAINST(%s IN BOOLEAN MODE)) AND (title LIKE %s OR content LIKE %s) limit %s",
params=(
ft_search_exp,
ft_search_exp,
like_search_exp,
like_search_exp,
limit,
),
)
return qs.filter(id__in=(id_query))
elif vendor == "sqlite" and fulltext_tokens:
ft_search_exp = " ".join(fulltext_tokens) + "*"
like_search_exp = "%" + "_%".join(tokens) + "%"
if len(fulltext_tokens) < len(tokens):
id_query = (
Document.objects.filter(
id__in=(
RawSQL(
"SELECT rowid FROM documents_document_fts WHERE title MATCH %s AND title LIKE %s OR content MATCH %s AND (length(content) > 100000 OR content LIKE %s) LIMIT %s",
params=(
ft_search_exp,
like_search_exp,
ft_search_exp,
like_search_exp,
limit,
),
)
),
)
.order_by()
.values_list("id", flat=True)
)
else:
# Full-text clause only seems much more performant than being combined with the LIKE %A% clause in SQLite,
# so we use this when possible
id_query = (
Document.objects.filter(
id__in=(
RawSQL(
"SELECT rowid FROM documents_document_fts WHERE title MATCH %s OR content MATCH %s LIMIT %s",
params=(ft_search_exp, ft_search_exp, limit),
)
),
)
.order_by()
.values_list("id", flat=True)
)
ids = [id for id in id_query.all()]
return qs.filter(id__in=ids)
else:
# Fallback to non-indexed legacy search method if no proper solution is available
return qs.filter(Q(title__icontains=value) | Q(content__icontains=value)) |
Beta Was this translation helpful? Give feedback.
-
|
This discussion has been automatically closed due to inactivity. Please see our contributing guidelines for more details. |
Beta Was this translation helpful? Give feedback.
-
|
This discussion has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion for related concerns. See our contributing guidelines for more details. |
Beta Was this translation helpful? Give feedback.






Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Hi there,
On a dedicated server with 10 cores and 90Go RAM (doing nothing else) the search is really slow: it takes 2 minutes to return the 50 results per page. I have 30 000 documents, 700 tags on Paperless installed via Docker.
When performing the search and/or displaying the documents, the server loads increases a little but nothing more than 0.8 average load. If I refresh the same page, it takes 2 min again.
The global search field works better.
I tried to tune the Mariadb settings in Docker but no luck.
I tried rebooting the containers as well as the server.
If anyone can point me to something, it would be nice, Thanks
EDIT : I enabled the MySQL slowlog. Scary ....
27 seconds for this simple query:
Beta Was this translation helpful? Give feedback.
All reactions