
Improve backend numeric partitioning on BigQuery #167

Open
nj1973 opened this issue Apr 25, 2024 · 0 comments
Labels
enhancement New feature or request


Take this example Oracle partition scheme:

PARTITION BY RANGE ("DT_KEY")
…
 (PARTITION "P2005"  VALUES LESS THAN (20060101),
 PARTITION "P2006"  VALUES LESS THAN (20070101)
…

Currently, in addition to the lower/upper bounds, we need to pass the Offload --partition-granularity option to define the partition step, for example:

./offload -t OWNER.FACT -x \
--partition-granularity=10000 \
--partition-lower-value=20000000 \
--partition-upper-value=30000000

This adds a synthetic partition key:

PARTITION BY RANGE_BUCKET(`GOE_PART_10000_DT_KEY`, GENERATE_ARRAY(20000000, 30000000, 10000))

And truncates the synthetic column values down to multiples of 10,000. This is a hangover from Hadoop partitioning.
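If I've described the current behaviour correctly, the synthetic key is just the real column value truncated down to a multiple of the granularity. A minimal Python sketch of that truncation (the function name is my own, not GOE code):

```python
def synthetic_partition_key(dt_key: int, granularity: int = 10000) -> int:
    # Truncate the real column value down to a multiple of the
    # --partition-granularity setting, mirroring how the synthetic
    # GOE_PART_10000_DT_KEY column appears to be derived.
    return (dt_key // granularity) * granularity

# e.g. 20060101 truncates to 20060000
print(synthetic_partition_key(20060101))
```

So every row in Oracle partition P2006 lands in one of a small number of synthetic buckets, but only via the extra derived column.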

Instead we should be able to use the real column for partitioning and add a new option to define the partition step, independent of the granularity. Then we could partition by:

PARTITION BY RANGE_BUCKET(`DT_KEY`, GENERATE_ARRAY(20000000, 30000000, 10000))
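With the real column, BigQuery's RANGE_BUCKET simply finds which interval of the generated boundary array the value falls into; no derived column is needed. A Python stand-in for that lookup (range_bucket here is my own sketch of the documented RANGE_BUCKET semantics, not GOE code):

```python
import bisect

def range_bucket(point: int, boundaries: list[int]) -> int:
    # Mirrors BigQuery's RANGE_BUCKET(point, boundaries_array):
    # returns i such that boundaries[i-1] <= point < boundaries[i],
    # or 0 when the point is below the first boundary.
    return bisect.bisect_right(boundaries, point)

# Equivalent of GENERATE_ARRAY(20000000, 30000000, 10000)
boundaries = list(range(20000000, 30000001, 10000))

# A DT_KEY value from Oracle partition P2006 maps straight to a bucket.
print(range_bucket(20060101, boundaries))
```

The key point is that the bucketing happens on `DT_KEY` itself, so queries filtering on the real column can prune partitions without knowing about any synthetic key.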