
long time to create partition tables #604

Closed
joyful88 opened this issue Dec 28, 2023 · 10 comments

@joyful88

PostgreSQL 15.4, pg_partman 4.7.0 (SSD storage)
There is a table with 583 daily partitions. Creating the next partitions takes a very long time:

2023-12-28 13:09:46 MSK [110811-3] 10.14.242.41(58786) streamer@vkstream CONTEXT: SQL statement "ALTER TABLE public."Post" ATTACH PARTITION public."Post_p2024_01_01" FOR VALUES FROM ('2024-01-01 00:00:00+03') TO ('2024-01-02 00:00:00+03')"
PL/pgSQL function create_partition_time(text,timestamp with time zone[],boolean,text) line 249 at EXECUTE
PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 284 at assignment
2023-12-28 13:09:46 MSK [110811-4] 10.14.242.41(58786) streamer@vkstream STATEMENT: SELECT partman.run_maintenance('public.Post');
2023-12-28 13:10:04 MSK [110811-5] 10.14.242.41(58786) streamer@vkstream LOG: process 110811 acquired AccessExclusiveLock on relation 182225334 of database 16622 after 18788.593 ms
2023-12-28 13:10:04 MSK [110811-6] 10.14.242.41(58786) streamer@vkstream CONTEXT: SQL statement "ALTER TABLE public."Post" ATTACH PARTITION public."Post_p2024_01_01" FOR VALUES FROM ('2024-01-01 00:00:00+03') TO ('2024-01-02 00:00:00+03')"
PL/pgSQL function create_partition_time(text,timestamp with time zone[],boolean,text) line 249 at EXECUTE
PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 284 at assignment
2023-12-28 13:10:04 MSK [110811-7] 10.14.242.41(58786) streamer@vkstream STATEMENT: SELECT partman.run_maintenance('public.Post');
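
(For reference, the OIDs in the "acquired AccessExclusiveLock" log line can be resolved back to names with a generic catalog lookup; this is just for context, not a pg_partman call, and should be run in the database named in the message:)

SELECT 182225334::regclass AS locked_relation,
       (SELECT datname FROM pg_database WHERE oid = 16622) AS database_name;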

Name |Value |
--------------------------+-----------------------------+
parent_table |public.Post |
control |timestamp |
partition_type |native |
partition_interval |1 day |
constraint_cols |NULL |
premake |14 |
optimize_trigger |4 |
optimize_constraint |30 |
epoch |none |
inherit_fk |true |
retention | |
retention_schema | |
retention_keep_table |true |
retention_keep_index |true |
infinite_time_partitions |true |
datetime_string |YYYY_MM_DD |
automatic_maintenance |on |
jobmon |true |
sub_partition_set_full |false |
undo_in_progress |false |
trigger_exception_handling|false |
upsert | |
trigger_return_null |true |
template_table |partman.vkstream_post_pattern|
publications |NULL |
inherit_privileges |false |
constraint_valid |true |
subscription_refresh | |
drop_cascade_fk |false |
ignore_default_data |false |
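
(These values are the row for this partition set in partman.part_config; a quick way to re-check the key ones, columns abbreviated here, is a plain lookup against that config table:)

SELECT parent_table, control, partition_type, partition_interval,
       premake, infinite_time_partitions, template_table
FROM partman.part_config
WHERE parent_table = 'public.Post';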

pg_stat_activity while the session was waiting:

Name |Value |
----------------+----------------------------------------------+
pid |110811 |
diff |00:28:01.531604 |
state |active |
query |SELECT partman.run_maintenance('public.Post');|
datid |16622 |
datname |vkstream |
pid |110811 |
leader_pid | |
usesysid |16385 |
usename |streamer |
application_name|psql |
client_addr |10.14.242.41 |
client_hostname | |
client_port |58786 |
backend_start |2023-12-28 13:59:59.471 +0400 |
xact_start |2023-12-28 14:00:16.939 +0400 |
query_start |2023-12-28 14:00:16.939 +0400 |
state_change |2023-12-28 14:00:16.939 +0400 |
wait_event_type |IO |
wait_event |DataFileRead |
state |active |
backend_xid |22577429 |
backend_xmin |22577151 |
query_id |1244527056530741779 |
query |SELECT partman.run_maintenance('public.Post');|
backend_type |client backend |

There is another table with 700+ partitions, but it shows no such delay. The tables are similar; only the column names differ.

@keithf4
Collaborator

keithf4 commented Dec 28, 2023

The wait event seems to indicate it is an IO-bound wait. Do you have any data in the default partition?
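
(For context, a generic catalog query along these lines shows which child is the default partition and roughly how much data it holds; the table name is taken from this issue:)

SELECT c.oid::regclass AS default_partition,
       c.reltuples::bigint AS approx_rows,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'public."Post"'::regclass
  AND pg_get_expr(c.relpartbound, c.oid) = 'DEFAULT';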

@joyful88
Author

In a table of almost 8 TB, the default partition held about 480 GB. After I cleared it, the new partitions were created successfully.
There is also a problem with the call SELECT * FROM partman.partition_gap_fill('public.Post');
It should create the missing partitions, but I waited for more than an hour and the query did not complete.
It seems the problem is either in the data itself or in its sheer volume.

@keithf4
Collaborator

keithf4 commented Dec 29, 2023

Just to clarify, there was data in the default?
If so, have you cleared out all of the data in the default table?
If there is a lot of data in the default table, any new table creation is going to be slower than normal until it has been emptied.
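
(Background: ATTACH PARTITION has to scan the default partition to verify that none of its rows fall inside the new partition's bounds, which is why a large default slows every attach. Below is a minimal manual sketch of moving such rows out of the default into a staging table, assuming the default child is named public."Post_default" and using the "timestamp" control column from the config above; this is a generic approach, not a pg_partman API call:)

BEGIN;

CREATE TABLE IF NOT EXISTS public."Post_default_staging"
    (LIKE public."Post" INCLUDING DEFAULTS);

-- Move rows that belong in future partitions so ATTACH no longer has to scan them
WITH moved AS (
    DELETE FROM public."Post_default"
    WHERE "timestamp" >= '2024-01-01 00:00:00+03'
    RETURNING *
)
INSERT INTO public."Post_default_staging"
SELECT * FROM moved;

COMMIT;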

@joyful88
Author

The step is by day and it was +4 ahead; the command to create new partitions hung on an IO operation, given that the default partition contained data (430 GB). It was clear from the processes that there was work with the disk, but not what kind. The operation lasted about 6-8 hours, after which I cancelled it. I then moved everything from the default table into a temporary table and cleared the default. Afterwards, the operation of creating new partitions completed in a couple of seconds.
I also wanted to create the missing partitions. With the default table cleared, SELECT * FROM partman.partition_gap_fill('public.Post'); behaved the same way as creating new partitions with an uncleared default: it also worked the disk but in fact did nothing.

@keithf4
Collaborator

keithf4 commented Jan 5, 2024

Sorry for the delay in response. I'm really not sure what is happening here if there is no data left in the default and normal maintenance seems to be working fine. Do you have an index on the control column?

Any chance you could capture the pg_stat_activity of the gap fill function while it's taking a long time to run? The wait events might help point in the right direction again.
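
(Something like this generic pg_stat_activity query should capture it while the gap fill is running:)

SELECT pid,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS running_for,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE query ILIKE '%partition_gap_fill%'
  AND pid <> pg_backend_pid();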

@keithf4
Collaborator

keithf4 commented Jan 9, 2024

Nice coincidence: this may be part of why the gap fill is taking so long:

#606

@aleszeleny

Consider something like the following query to check the wait events:

WITH psa_pids(pid) AS (
  -- Collect every pid involved in a blocking chain: each blocked session
  -- plus the sessions pg_blocking_pids() reports as blocking it.
  SELECT DISTINCT pids
  FROM
    pg_stat_activity psa
  , pg_blocking_pids(psa.pid) pbp
  , unnest(pid || pbp) AS pids
  WHERE cardinality(pbp) > 0
)
SELECT
    psa.pid
  , pg_blocking_pids(psa.pid)
  /* , state, state_change */
  , wait_event_type
  , wait_event
  , backend_type
  , pl.relation::regclass
  , pl.locktype
  , pl.mode
  , pl.granted
  , substr(query, 1, 10) AS qry
FROM psa_pids
INNER JOIN pg_stat_activity psa USING (pid)
LEFT JOIN pg_catalog.pg_locks pl ON psa.pid = pl.pid
-- Only show lock requests that have not been granted yet
WHERE pl.granted = false
ORDER BY pid, pl.relation::regclass, pl.locktype, pl.mode;

@keithf4
Collaborator

keithf4 commented Mar 19, 2024

I have a beta release up for version 5.1 which also includes a fix for the gap fill function in 4.8. This shouldn't be an issue anymore in version 5.0, so if you're able to upgrade and test, please let me know. Otherwise I should have version 4.8 out by the end of this month.
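
(To check which pg_partman version is currently installed and apply a packaged update, the standard extension commands apply; the target versions are just the ones mentioned in this thread:)

SELECT extversion FROM pg_extension WHERE extname = 'pg_partman';

-- after installing the new package on the server:
ALTER EXTENSION pg_partman UPDATE;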

@aleszeleny

aleszeleny commented Mar 19, 2024 via email

@keithf4
Collaborator

keithf4 commented Apr 5, 2024

Version 5.1 has been released with this fix.
