call partman.partition_data_proc during partman.run_maintenance #655

Closed
hvanderland opened this issue Apr 19, 2024 · 7 comments

@hvanderland

This is a very useful set of procedures.
One problem we have is that the run_maintenance task fails when there is data in the default partition that falls into the range of the new partition.

Could partman.run_maintenance call partman.partition_data_proc instead of generating a P001 error?

The code could call partman.check_default to see whether there is data in the default partition that needs to be moved for the new partition being added. It could then call partman.partition_data_proc and, if needed, partman.partition_gap_fill to generate the missing partitions.

After that, run the normal maintenance.
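
A rough sketch of that manual flow, with a placeholder parent table name (exact calls and ordering may vary by pg_partman version):

-- Check whether any managed default partitions currently contain rows
SELECT * FROM partman.check_default();

-- Move the rows out of the default into properly ranged child tables
CALL partman.partition_data_proc('myschema.mytable');

-- If needed, create any child tables still missing from the sequence
SELECT partman.partition_gap_fill('myschema.mytable');

-- Then run normal maintenance as usual
SELECT partman.run_maintenance('myschema.mytable');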

@keithf4
Collaborator

keithf4 commented Apr 19, 2024

The problem with this is that when the default gets data, it can often be A LOT of data. If that's the case, it could cause an extremely expensive write operation to kick off during normal maintenance.

I instead recommend setting up whatever monitoring application you have in your environment to run check_default() and alert if it detects any data. That way, if default data does appear, it can be handled appropriately for the situation.
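
For example, a minimal periodic check could be as simple as the call below (assuming the check_default() function in recent pg_partman releases, which returns one row per default table that contains data):

-- Alert if this returns any rows; the count column shows how many rows each default table holds
SELECT * FROM partman.check_default(true);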

@hvanderland
Author

Thank you for the quick response.

Understood; yes, we were planning to do these calls in our own scheduler.
I was thinking of adding a flag to run_maintenance to enable this, since it would only be used in an exception scenario, but I understand the reasoning for not doing it.

@keithf4
Collaborator

keithf4 commented Apr 19, 2024

Yeah, I've thought of doing this as well with a flag. But then I've given people a flag for something that could potentially be very disruptive. I'd rather provide the means to monitor for it and have users go out of their way to fix this situation properly.

The other thing is that if you're seeing data go into the default frequently enough that you feel this needs to be automated, I'd say there are likely other problems that need to be fixed:

  1. If the data is just slightly out of the normal time window, you may just need to adjust the premake value to make sure the necessary tables are there. There's nothing wrong with a premake of 20, 30, or more, since partman usually only makes one child table per partition set during maintenance.
  2. If the data is frequently far out of the normal range, it may be a bug that needs to be fixed to avoid that situation.
  3. If it's not a bug, it's pretty far out of the normal scenario for range partitioning, IMO. In that case you'd be better off writing your own procedure to monitor the default and take appropriate actions depending on how much data is there.

@hvanderland
Copy link
Author

Yes, but creating a high number of future partitions has an impact on performance. Queries like "give me all the rows newer than yesterday" will scan all of these partitions. For our implementations, rows ending up in the default will be an exception and, as you correctly remark, should be low volume, otherwise there is a design issue.

Thank you for the clarification

@keithf4
Collaborator

keithf4 commented Apr 19, 2024

But those tables are empty for the most part, and as of PG12+ the performance impact of having a higher number of partitions (1000+) is negligible until you start getting into REALLY high numbers. And I'd say if your partition numbers are getting that high, you may want to re-evaluate your partitioning interval and seriously consider retention options to remove unneeded data from that partition set.

I'd encourage you to test and see what that performance impact is. If it's not negligible, I'd write up the scenario and share it on the developer mailing lists so they can see what the problem is.

@hvanderland
Author

Adding more partitions increases the planning time. We prevent the use of generic plans for these tables to allow partition pruning.
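
(One common way to prevent generic plans, shown here only as an illustrative assumption, is the PostgreSQL 12+ setting below; it forces custom plans so the planner can prune partitions using the actual parameter values.)

-- PostgreSQL 12+: disable generic plans so plan-time partition pruning can use real parameter values
SET plan_cache_mode = force_custom_plan;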

Using the partman code, this can be tested simply:
-- Raise premake so the next maintenance run creates many future child tables
update partman.part_config set premake = 100 where parent_table = 'xxx.time_taptest_table';
select partman.run_maintenance('xxx.time_taptest_table', true);
-- Compare the planning time reported by EXPLAIN ANALYZE before and after
explain analyze select * from xxx.time_taptest_table where col3 > current_date;
explain analyze select * from xxx.time_taptest_table where col3 between current_date and current_date + interval '1 day';

It shows that the planning time goes up as we increase the number of partitions.
Postgres checks every partition to see whether the data falls in its range.
Yes, this is milliseconds, but at high volume it makes a difference.

@keithf4
Collaborator

keithf4 commented Apr 19, 2024

If that millisecond difference is demonstrably affecting your application, I can certainly understand that. In most cases I've seen myself, that difference didn't really matter versus the overhead of having to deal with cleaning up the default table.

One thing I may suggest, if you're really getting down to that level of performance being important, is to take advantage of pg_partman's predictable naming pattern and query the child tables directly. If you know the time condition you're asking for at the application level, dynamically generate the query there to hit the exact child tables you're targeting. That completely bypasses the need for partition pruning.
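
A rough sketch of that idea (the child table suffix shown is an assumption; the exact naming depends on the partition interval and the pg_partman version):

-- Query the daily child table for 2024-04-19 directly instead of the parent table
SELECT * FROM xxx.time_taptest_table_p20240419 WHERE col3 >= DATE '2024-04-19';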
