cs_hash_dup_count is too slow #13
Sorry, which SQL query are you executing?
The box has 128GB of memory and 24 cores, and I checked again: there is no swapping. This is my test result:

```
postgres=# select samp3_load();
 samp3_load
(1 row)
Time: 12031.358 ms

("int8:{1,1,1,1,9,1,2,1,3,3,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,2,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,4,3,1,1,7,1
Time: 1047903.148 ms

count 297208
Time: 118905.357 ms
```

The samp3 DDL and data are here:
Once again: thanks for your help. It was really a bug in imcs_dup_hash_initialize: one of the hash tables was not correctly extended. After fixing this bug, execution of this query takes about 1.5 seconds on my system (before it was 26921 seconds!)
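For readers unfamiliar with why a missed extension causes this kind of blow-up: a chained hash table that is never resized degrades from O(1) to O(n) per insert as the collision chains grow, turning the whole aggregation quadratic. Below is a minimal, hypothetical Python sketch (not the IMCS code; all names are invented) contrasting a table that resizes with one that does not:

```python
# Illustration (NOT the IMCS implementation) of how a chained hash table
# behaves when it is, or is not, extended as it fills up.

class ChainedHashTable:
    def __init__(self, n_buckets=8, resize=True):
        self.buckets = [[] for _ in range(n_buckets)]
        self.size = 0
        self.resize = resize   # resize=False models the "never extended" bug
        self.probes = 0        # total chain-traversal steps, our cost metric

    def insert(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k in bucket:               # walk the collision chain
            self.probes += 1
            if k == key:
                return                 # key already present
        bucket.append(key)
        self.size += 1
        # Double the table when the load factor exceeds 1.
        if self.resize and self.size > len(self.buckets):
            old = self.buckets
            self.buckets = [[] for _ in range(2 * len(old))]
            for b in old:
                for k in b:
                    self.buckets[hash(k) % len(self.buckets)].append(k)

def probe_cost(n, resize):
    """Total probes needed to insert n distinct keys (a very selective group key)."""
    table = ChainedHashTable(resize=resize)
    for i in range(n):
        table.insert(i)
    return table.probes
```

With resizing, total probes grow roughly linearly in the number of distinct keys; without it, they grow quadratically, which is the same shape of slowdown reported in this issue (seconds versus hours on the same data).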
For 5 million rows, PostgreSQL takes 1 minute, while IMCS takes more than 20 minutes. This is part of the whole data: https://github.com/amutu/data/blob/master/samp2.
I think the slowness is related to the data set: the group key is too selective, so perhaps computing the hash values or resolving hash collisions takes most of the time?
I looked at the PostgreSQL plan for the same query: it uses a sort. So should IMCS consult the group-by column statistics and then choose sort or hash for the plan?
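The suggestion above is essentially what the PostgreSQL planner does: it picks HashAggregate or GroupAggregate (sort-based) using column statistics such as the estimated number of distinct values. A hypothetical heuristic for such a decision might look like this sketch, where the function name, threshold, and cost constants are all invented for illustration:

```python
# Hypothetical planner-style choice between hash and sort aggregation,
# driven by an estimated distinct-key count from column statistics.
# Cost formulas here are simplified illustrations, not PostgreSQL's model.
import math

def choose_group_by_strategy(n_rows, n_distinct, work_mem_entries):
    """Return 'hash' or 'sort' for a GROUP BY over n_rows rows.

    n_distinct       -- estimated number of distinct group keys
    work_mem_entries -- how many hash-table entries fit in working memory
    """
    if n_distinct > work_mem_entries:
        # The hash table would not fit in memory; a sort can spill to
        # disk sequentially, so it is the safer plan.
        return "sort"
    hash_cost = n_rows                                   # ~O(1) probe per row
    sort_cost = n_rows * math.log2(max(n_rows, 2))       # comparison sort
    return "hash" if hash_cost < sort_cost else "sort"
```

For example, a few hundred distinct keys over millions of rows favors hashing, while a nearly-unique group key that overflows working memory (the situation described in this issue) favors sorting.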