Research improving postgresql counts performance #3586
Comments
helpful links: A note about count(1) vs count(*). One might think that count(1) would be faster because count(*) appears to consult the data for a whole row. However the opposite is true. The star symbol is meaningless here, unlike its use in SELECT *. PostgreSQL parses the expression count(*) as a special case taking no arguments. (Historically the expression ought to have been defined as count().) On the other hand count(1) takes an argument and PostgreSQL has to check at every row to see that its argument, 1, is indeed still not NULL.
If the user searches for CMTE_ID = 'C00401224', the cmte_id index will not be used, since ActBlue represents more than 56% of the data in sched_a_2017_2018 (and one cannot force the PostgreSQL planner to use an index as we do in Oracle). The cmte_id index is used when I query the other cmte_id values (for example, C00000935 or C00003418). The same holds for entity_tp = 'IND': the index will not be used, since more than 92% of the sched_a_2017_2018 data are 'IND'. But if one chooses entity_tp = 'PTY', the index is used and the result returns much faster. If users mostly query by contributor name, we might be OK, since I do not foresee any contributor name being as skewed in percentage as 'C00401224' or 'IND'. More database power is still needed. When I tried 'C00000935' with the biggest 5 cycles in stg-replica1, it still took 3 minutes to return, even though the index was indeed used. I tried the same query in prd and had to kill it after 10 minutes.
Example:
-- index not used in case of 'C00401224', but 'C00000935' will use this index
-- index not used in case of 'IND', but 'PTY' will use index
-- index used
-- index used
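One way to check which of the cases above actually hit an index is to run the query under EXPLAIN (FORMAT JSON) and walk the returned plan tree. The sketch below is only an illustration: the helper name `indexes_used`, the two sample plans, and the index name `hypothetical_cmte_id_idx` are made up for this example and are not taken from the project. Only the JSON keys ("Plan", "Plans", "Node Type", "Index Name") follow PostgreSQL's EXPLAIN output.

```python
from typing import Any, Dict, List

# Plan node types that read from an index in PostgreSQL's EXPLAIN output.
INDEX_NODE_TYPES = {"Index Scan", "Index Only Scan", "Bitmap Index Scan"}


def indexes_used(explain_json: List[Dict[str, Any]]) -> List[str]:
    """Return the index names referenced anywhere in an EXPLAIN (FORMAT JSON) plan."""
    found: List[str] = []

    def walk(node: Dict[str, Any]) -> None:
        if node.get("Node Type") in INDEX_NODE_TYPES and "Index Name" in node:
            found.append(node["Index Name"])
        # Child nodes, when present, are listed under "Plans".
        for child in node.get("Plans", []):
            walk(child)

    walk(explain_json[0]["Plan"])
    return found


# Hypothetical plans, shaped like EXPLAIN (FORMAT JSON) output.
# A very common value (such as 'C00401224' or 'IND') may yield a sequential scan:
seq_plan = [{"Plan": {"Node Type": "Seq Scan",
                      "Relation Name": "sched_a_2017_2018"}}]

# A selective value (such as 'C00000935' or 'PTY') may yield an index-backed scan:
index_plan = [{"Plan": {"Node Type": "Bitmap Heap Scan",
                        "Relation Name": "sched_a_2017_2018",
                        "Plans": [{"Node Type": "Bitmap Index Scan",
                                   "Index Name": "hypothetical_cmte_id_idx"}]}}]

print(indexes_used(seq_plan))
print(indexes_used(index_plan))
```

An empty list means the planner chose not to use any index for that filter value, which matches the behavior described for heavily skewed values.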
Currently we use the exact same query to get the number of records in the result set. In other words, we are running the exact same query twice. Even if we did not count using the exact same query, we would hit the same slowness when running that query to get the details. We cannot simply remove the counts; they are very useful to the user. Currently we have set a threshold of 500,000 rows. If the count is over that limit we return an estimate; otherwise we do an actual record count. We will open another issue to deal with slow queries by monitoring the logs to find out which queries users are having difficulties with, and deal with them on a daily basis.
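The threshold rule described above could be sketched roughly as follows. This is not the project's actual implementation: the function names are hypothetical, and taking the estimate from the planner's "Plan Rows" field in EXPLAIN (FORMAT JSON) output is an assumption, since the comment does not say where the estimate comes from. Only the 500,000 threshold is taken from the thread.

```python
from typing import Any, Callable, Dict, List, Tuple

COUNT_THRESHOLD = 500_000  # threshold quoted in the comment above


def planner_row_estimate(explain_json: List[Dict[str, Any]]) -> int:
    """Read the planner's estimated row count from EXPLAIN (FORMAT JSON) output.

    Assumption: the estimate is taken from the top-level "Plan Rows" field.
    """
    return int(explain_json[0]["Plan"]["Plan Rows"])


def resolve_count(estimated_rows: int,
                  exact_count: Callable[[], int],
                  threshold: int = COUNT_THRESHOLD) -> Tuple[int, bool]:
    """Return (count, is_estimate).

    Large result sets keep the cheap estimate; small ones pay for an
    exact count, which is only computed when it is actually needed.
    """
    if estimated_rows > threshold:
        return estimated_rows, True
    return exact_count(), False


# Usage with made-up numbers; the lambdas stand in for a real SELECT count(*).
print(resolve_count(750_000, lambda: 742_318))
print(resolve_count(1_200, lambda: 1_187))
```

Passing the exact count as a callable keeps the expensive query from running at all when the estimate already exceeds the threshold.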
Closing this issue in favor of #3641
Row counts are much faster after the AWS RDS server upgrade.
What we're after:
Research has shown that PostgreSQL counts could be dragging down performance. By researching ways to improve the performance and speed of PostgreSQL counts, we may find a viable path to removing the two-year restriction from our website.
Observations:
Completion criteria: