
Slow degradation of index #1207

Open
alesji opened this issue May 29, 2024 · 6 comments
Labels
bug (Something isn't working) · pg_search (Issue related to `pg_search/`) · priority-1-high (High priority issue) · user-request (This issue was directly requested by a user)

Comments

alesji commented May 29, 2024

Bug Description
I am creating a full-text index on the recipe name over a recipes table. That table is updated frequently, and inserted into frequently too...

CALL paradedb.create_bm25(
        index_name => 'ngram_recipes_name_search',
        schema_name => 'public',
        table_name => 'recipes',
        key_field => 'id',
        text_fields => '{name_search: {tokenizer: {type: "ngram", min_gram: 3, max_gram: 3, prefix_only: false}}}'
);
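
For reference, searches against this index go through the search function in the schema that create_bm25 generates, roughly like this (a sketch assuming the schema-based search API of this ParadeDB version; 'chocolate' is just an example term):

-- query the generated search function; uses field:term query syntax
SELECT id, name_search
FROM ngram_recipes_name_search.search('name_search:chocolate');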

After a few thousand updates/inserts, the search index becomes incredibly slow to insert into and incredibly slow to search (from a few ms when fresh to whole seconds when left unchecked for a long time).

When I run the create_bm25 function again, it recreates the index and everything is fast again... I have no idea what is happening, but this is basically unusable. I tried it on a fresh Docker install (latest version) with a simple table and it happens there too...
Not sure how I can help, but if I can, I will try. I love the speed when the index is fresh, but the degradation seriously makes the index unusable for any dynamic data.

neilyio added the bug, priority-1-high, and pg_search labels on May 29, 2024

neilyio (Contributor) commented May 29, 2024

Thanks for reporting this. To make sure I understand, you're seeing this slowdown after a few thousand writes to the index.

Could you clarify whether that means a few thousand separate transactions, each with around one row being updated?

alesji (Author) commented May 29, 2024

> Thanks for reporting this. To make sure I understand, you're seeing this slowdown after a few thousand writes to the index.
>
> Could you clarify whether that means a few thousand separate transactions, each with around one row being updated?

Exactly, it is thousands of separate updates and inserts into the indexed column. Very often those updates carry exactly the same data (it's just how the software works; it updates columns even if they didn't change).

I let it run for the last few hours just now, doing updates without really changing the data, and it still slowed down - so maybe that's a clue? Not sure...
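
Roughly what my test loop looked like in psql, if it helps (a sketch using my table from above; any steady stream of single-row writes shows the same thing):

-- no-op update: writes back the same value, but still produces a new row
-- version and a new index write; \watch re-runs it as separate transactions
UPDATE recipes SET name_search = name_search WHERE id = 1;
\watch 0.01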

philippemnoel added the user-request label on Jun 7, 2024

alesji (Author) commented Jun 8, 2024

Hello, are you able to replicate this bug, or should I prepare exact step-by-step instructions to replicate it? I might have some time this weekend to do it if needed.

philippemnoel (Collaborator) commented

> Hello, are you able to replicate this bug, or should I prepare exact step-by-step instructions to replicate it? I might have some time this weekend to do it if needed.

If you have exact steps to replicate, that would be extremely helpful. We haven't managed to replicate it just yet.

alesji (Author) commented Jun 9, 2024

I just tried a VACUUM FULL when the degradation occurred, and it completely fixed the issue... so I am not sure if it is a bug or just expected behavior. It happens quite fast (after only a thousand inserts/updates) and the degradation is quite big (queries around 10x slower)... So I guess I will still prepare the step-by-step so you can see for yourself. It has never happened to me before with any other index that the slowdown would be so fast and so big, so maybe it should be checked anyway.
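
For anyone else hitting this, the workaround and the check I used are plain Postgres, nothing pg_search-specific (note that VACUUM FULL takes an exclusive lock and rewrites the whole table):

-- rewrites the table and its indexes; blocks reads and writes while it runs
VACUUM FULL recipes;

-- dead-tuple counts from the standard statistics views, to see whether
-- table bloat tracks the slowdown
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'recipes';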

rebasedming self-assigned this on Jun 12, 2024
rebasedming (Collaborator) commented

I've replicated this bug. Using the test table and index in the quickstart, if I run

INSERT INTO mock_items (description, rating, category) VALUES ('test', 3, 'Electronics');
\watch 0.001

then after 1000 rows have been inserted, the insert time has crept up by about 100ms.

I will prioritize fixing this asap.
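
A note for anyone reproducing this: with psql's \timing enabled, each \watch iteration prints its execution time, so the creep is visible without any external tooling:

\timing on
INSERT INTO mock_items (description, rating, category) VALUES ('test', 3, 'Electronics');
\watch 0.001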
