Not seeing huge performance gain - could be my query? #23
Comments
I think it is because of your prepare code_docs_fl_fts (tsquery) as
SELECT id, node_id,
ts_headline('mcc_natural_config', title, $1, 'highlightall=true') title,
ts_headline('mcc_natural_config', content, $1, 'maxfragments=1') fragment,
ancestors, product_id, product_name, client_name, state_abbr,
0 AS full_count, --window function for getting full count here but removed it for simplicity
tsv_natural_rum <=> $1 AS rank
FROM code_docs_fl
WHERE tsv_natural_rum @@ $1
ORDER BY tsv_natural_rum <=> $1
LIMIT 10 OFFSET 0;
explain analyze execute code_docs_fl_fts(ts_rewrite(plainto_tsquery('mcc_natural_config', 'code'),
'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'', ''code'') @> target'));
YESSSSSSSSSSSSSSSSSSSSSSSSSS
Execution time went from 100 seconds to 1 second. Thank you!!!!!! I was about to switch to Solr for full-text search because some queries would take over 8 minutes. Just for completeness, I tried using a prepared statement with ts_rank / gin index column, and still got over 100 seconds. Hitting index only is awesome! Do you mind briefly explaining why having the query in FROM clause was so bad?
On Tue, Aug 15, 2017 at 4:33 PM, Philip Holly ***@***.***> wrote:
YESSSSSSSSSSSSSSSSSSSSSSSSSS
Limit (cost=20.00..52.87 rows=10 width=435) (actual time=388.606..981.048 rows=10 loops=1)
-> Index Scan using code_docs_fl_tsv_natural_rum_idx on code_docs_fl (cost=20.00..528655.06 rows=160825 width=435) (actual time=388.599..980.999 rows=10 loops=1)
Index Cond: (tsv_natural_rum @@ '''code'''::tsquery)
Order By: (tsv_natural_rum <=> '''code'''::tsquery)
Execution time: 990.911 ms
Execution time went from 100 seconds to 1 second.
Thank you!!!!!! I was about to switch to Solr for full-text search because
some queries would take over 8 minutes.
Just for completeness, I tried using a prepared statement with ts_rank /
gin index column, and still got over 100 seconds. Hitting index only is
awesome!
Do you mind briefly explaining why having the query in FROM clause was so
bad?
Because of the unneeded join!
http://obartunov.livejournal.com/189806.html
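To illustrate the point about the unneeded join, here is a minimal sketch of the two query shapes. The original slow query is not shown in this thread, so the first form is a hypothetical reconstruction; the statement name docs_fts_sketch and the shortened column list are likewise assumptions made for illustration.

```sql
-- Form 1 (hypothetical reconstruction): the tsquery is produced in FROM.
-- The planner treats "query" as a second relation and joins it to the table.
-- The ordering value then comes from the other side of a join, so the planner
-- typically cannot hand the ORDER BY to the RUM index; it collects every
-- matching row and sorts them before applying LIMIT.
SELECT id, tsv_natural_rum <=> query AS rank
FROM code_docs_fl,
     plainto_tsquery('mcc_natural_config', 'code') AS query
WHERE tsv_natural_rum @@ query
ORDER BY tsv_natural_rum <=> query
LIMIT 10;

-- Form 2: the tsquery is passed as a parameter, so there is no join.
-- The index scan can return rows already ordered by <=> and stop after 10.
PREPARE docs_fts_sketch (tsquery) AS
SELECT id, tsv_natural_rum <=> $1 AS rank
FROM code_docs_fl
WHERE tsv_natural_rum @@ $1
ORDER BY tsv_natural_rum <=> $1
LIMIT 10;

EXECUTE docs_fts_sketch(plainto_tsquery('mcc_natural_config', 'code'));
```

Comparing the EXPLAIN output of both forms should show whether the plan contains a join and a separate sort step (Form 1) or a single index scan with an Order By line (Form 2), as in the plan quoted above.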
Excellent!
As Oleg said, when you put a
Interestingly, the prepared statement with ts_rank / gin index takes longer than when the query is in the FROM clause. It looks like the gin index isn't hit.
Thanks again. I'm still waiting for trial and licensing info for Postgres Pro Enterprise. When I hear back I'd like to try it out.
For
I checked, and our salesman sent you an email about Postgres Pro Enterprise for Windows yesterday. Did you receive it?
Thanks for checking. I found it. It went into my quarantine folder for some reason.
Good!
PostgreSQL 9.6.4
Windows Server 2016 with 32 GB RAM / SSD
160K hits, RUM index query time: 80+ seconds
When querying a table partition with 500K rows for a search term that returns 160K hits, I am not seeing a large performance increase using RUM tsvector <=> vs GIN ts_rank. The tsvector column includes weights. Each document is pretty large, similar to Wikipedia. They are all legal documents.
Judging by the explain (analyze, buffers) output, it looks like the majority of the time is spent reading, not sorting. Could this be why I am not seeing a performance gain? Also, the output shows there's still a bitmap heap scan performed when searching the RUM index.
I clear shared buffers and restart PostgreSQL (thanks https://stackoverflow.com/a/43186594/812610) so there's nothing in memory.
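For reference, a cold-cache measurement of this kind could be taken roughly as follows. This is only a sketch: the actual query text is not preserved in this post, so the statement below is an assumed, shortened form that uses the table, column, and text search configuration named in this issue.

```sql
-- Sketch only: assumed, shortened query (the original is not preserved here).
-- BUFFERS adds "Buffers: shared hit=... read=..." lines to each plan node:
--   hit  = pages found in PostgreSQL shared buffers
--   read = pages that had to be requested from the OS (cache or disk)
-- A large "read" count on the scan node points to I/O rather than sorting.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id,
       tsv_natural_rum <=> plainto_tsquery('mcc_natural_config', 'code') AS rank
FROM code_docs_fl
WHERE tsv_natural_rum @@ plainto_tsquery('mcc_natural_config', 'code')
ORDER BY tsv_natural_rum <=> plainto_tsquery('mcc_natural_config', 'code')
LIMIT 10;
```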
Appreciate any help and thanks for putting RUM on github!
RUM tsvector column:
tsv_natural_rum
Index:
CREATE INDEX code_docs_fl_tsv_natural_rum_idx on code_docs_fl using rum (tsv_natural_rum rum_tsvector_ops)
Query:
Result:
GIN ts_rank_cd tsvector column:
tsv_natural (exact same contents as tsv_natural_rum)
Index:
GIN index on column tsv_natural
Query:
Result: