
Auto-partitioning built-in feature #62

Open
muntdan opened this issue Mar 28, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@muntdan

muntdan commented Mar 28, 2023

Hi,

TimescaleDB has built-in auto-partitioning on one or two dimensions (time and tenant/device ID/etc.).

Your documentation states that partitioning should be used for time-based data, but it can currently only be done manually.
It would be great if you implemented auto-creation of partitions based on a configured interval and/or a secondary criterion. This would help a lot with the GDPR-cleanup use case, where dropping older data would be much faster and easier.

@wuputah
Member

wuputah commented Mar 28, 2023

pg_partman can help with automating table-based partition management, but yes, user-configurable auto-partitioning is something we would like to add to the storage layer as well.

@wuputah added the enhancement (New feature or request) label Mar 28, 2023
@muntdan
Author

muntdan commented May 24, 2023

Based on the fact that Hydra as-is cannot cope with multiple queries in a timely fashion, and that TimescaleDB released version 2.11 with support for update/delete on compressed data, I have started comparing the two of them again from the point of view of reading the data.
My results are as follows, on the same volume of data (234 million rows):
Hydra: 1-month avg single query = 5-5.5 seconds / 7-8 cores (25-32% of 24 cores) / 2-8% IOPS
TimescaleDB: 1-month avg single query = 1-1.5 seconds / 5-10 cores (20-35% of 24 cores) / 2-8% IOPS
with chunk_time_interval & compression set to 1 hour.

The reason I am writing this here on this issue is that I believe another setting in my TimescaleDB setup helps a lot with this performance: space partitioning. In the above setup I have configured 10 tenant slices that divide the customers on each table, so a query for a given client only has to hit 1/10 of the total chunks.

In conclusion, I believe the auto-partitioning feature requested here might also help a lot with the actual query performance from the related issue. If a simple division could be applied (probably not), 1/10 x 5.5 seconds would put the equivalent performance at ~0.55 seconds, comparable to TimescaleDB.
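For context, the TimescaleDB setup described above (10 space partitions on the tenant column, 1-hour chunks) would be configured roughly like this; the table and column names here are hypothetical stand-ins, not the actual schema:

```sql
-- Sketch of a TimescaleDB hypertable with both a time dimension
-- and a space dimension (10 hash slices on tenant_id), assuming
-- TimescaleDB 2.x's create_hypertable signature.
SELECT create_hypertable(
    'measurements', 'recorded_at',
    partitioning_column => 'tenant_id',
    number_partitions   => 10,
    chunk_time_interval => INTERVAL '1 hour'
);
```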

@wuputah
Member

wuputah commented May 24, 2023

In the near term, native Postgres partitioning is the way to do this with Hydra. We don't plan to add this as a primitive to the storage layer for quite some time. We have pg_partman on the hosted product, but it is really only needed if you want "auto time partitioning": it helps create new partitions for every slice of time you define (e.g. week or month).
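To illustrate, a minimal pg_partman setup for monthly time partitioning might look like the following. This is a sketch assuming pg_partman 4.x's create_parent signature; the metrics table and its columns are hypothetical:

```sql
-- The parent table must already be declaratively partitioned
-- by the time column pg_partman will manage.
CREATE TABLE metrics (
    tenant_id   INTEGER,
    recorded_at TIMESTAMPTZ NOT NULL,
    value       NUMERIC
) PARTITION BY RANGE (recorded_at);

-- Register the table with pg_partman, which will pre-create and
-- maintain monthly child partitions from here on.
SELECT partman.create_parent(
    p_parent_table := 'public.metrics',
    p_control      := 'recorded_at',
    p_type         := 'native',
    p_interval     := 'monthly'
);
```

After this, pg_partman's background maintenance (run_maintenance) keeps creating partitions ahead of the current time window.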

For "partition this into X slices by ID" you can use hash partitioning. It's just a matter of writing the SQL, which is a bit repetitive but fortunately only needs to be done once. Here's an example:

CREATE TABLE customers (id INTEGER, status TEXT, arr NUMERIC) PARTITION BY HASH (id);
CREATE TABLE cust_part1 PARTITION OF customers FOR VALUES WITH (MODULUS 3, REMAINDER 0) USING columnar;
CREATE TABLE cust_part2 PARTITION OF customers FOR VALUES WITH (MODULUS 3, REMAINDER 1) USING columnar;
CREATE TABLE cust_part3 PARTITION OF customers FOR VALUES WITH (MODULUS 3, REMAINDER 2) USING columnar;

The thing Timescale does here that's pretty tricky to replicate is partitioning by both time and dimension; in this case, it would be like partitioning by time first, and then by hash on ID. This is possible with native partitioning, but it gets tricky to manage since there are many partitions to create, and I'm not aware of pg_partman helping with this case.

PostgreSQL declarative partitioning does support multilevel partitions, nested to any depth, and any combination of methods can be stacked: LIST-LIST, LIST-RANGE, LIST-HASH, RANGE-RANGE, RANGE-LIST, RANGE-HASH, HASH-HASH, HASH-LIST, and HASH-RANGE.
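As a sketch of the time-then-hash combination discussed above (hypothetical table and column names; only the leaf partitions carry USING columnar, since only leaves actually store rows):

```sql
-- Top level: range-partitioned by time.
CREATE TABLE events (
    tenant_id  INTEGER,
    created_at TIMESTAMPTZ NOT NULL,
    payload    TEXT
) PARTITION BY RANGE (created_at);

-- Second level: each time partition is itself hash-partitioned by ID.
CREATE TABLE events_2023_05 PARTITION OF events
    FOR VALUES FROM ('2023-05-01') TO ('2023-06-01')
    PARTITION BY HASH (tenant_id);

-- Leaf partitions: these hold the data, so they get the columnar AM.
CREATE TABLE events_2023_05_p0 PARTITION OF events_2023_05
    FOR VALUES WITH (MODULUS 3, REMAINDER 0) USING columnar;
CREATE TABLE events_2023_05_p1 PARTITION OF events_2023_05
    FOR VALUES WITH (MODULUS 3, REMAINDER 1) USING columnar;
CREATE TABLE events_2023_05_p2 PARTITION OF events_2023_05
    FOR VALUES WITH (MODULUS 3, REMAINDER 2) USING columnar;
```

The repetition is the pain point: every new time slice needs its own set of hash leaves, which is exactly the management burden mentioned above.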

@muntdan
Author

muntdan commented May 24, 2023

So if I define the 10-tenant manual partitioning on a columnar table and then insert the data via ETL every ~5 minutes into the main table, will the columnar stripes/chunks follow the rules and actually split the batch data into 10 chunks behind it?
Also, will a columnar query on the main table with a WHERE condition on the tenant know which chunks to read?

@wuputah
Member

wuputah commented May 24, 2023

Yes to both.

The only caveat I can think of is that VACUUM will need to be run on the individual partitions if you are trying to maintain optimal striping.
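Using the hash-partitioned customers table from the earlier example, that per-partition maintenance would look like:

```sql
-- Vacuum each leaf partition individually so Hydra's columnar AM can
-- re-pack small stripes left behind by frequent small ETL batches.
VACUUM cust_part1;
VACUUM cust_part2;
VACUUM cust_part3;
```

Running VACUUM on the parent also reaches the leaves in current PostgreSQL versions, but per-partition runs let you target only the partitions that received new data.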
