
Improve backend numeric partitioning on BigQuery #167

Open
nj1973 opened this issue Apr 25, 2024 · 0 comments
Labels
enhancement New feature or request


Take this example Oracle partition scheme:

PARTITION BY RANGE ("DT_KEY")
…
 (PARTITION "P2005"  VALUES LESS THAN (20060101),
 PARTITION "P2006"  VALUES LESS THAN (20070101)
…

Currently, in addition to the lower/upper bounds, we need to pass the Offload --partition-granularity option to define the partition step, for example:

./offload -t OWNER.FACT -x \
--partition-granularity=10000 \
--partition-lower-value=20000000 \
--partition-upper-value=30000000

This adds a synthetic partition key:

PARTITION BY RANGE_BUCKET(`GOE_PART_10000_DT_KEY`, GENERATE_ARRAY(20000000, 30000000, 10000))

And truncates the synthetic column values down to multiples of 10,000. This is a hangover from Hadoop partitioning.
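If I've described the current behaviour correctly, the synthetic key is just the real column value truncated down to a multiple of the granularity. A minimal Python sketch of that truncation (the function name is my own, not GOE code):

```python
def synthetic_partition_key(dt_key: int, granularity: int = 10000) -> int:
    # Truncate the real column value down to a multiple of the
    # --partition-granularity setting, mirroring how the synthetic
    # GOE_PART_10000_DT_KEY column appears to be derived.
    return (dt_key // granularity) * granularity

# e.g. 20060101 truncates to 20060000
print(synthetic_partition_key(20060101))
```

So every row in Oracle partition P2006 lands in one of a small number of synthetic buckets, but only via the extra derived column.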

Instead we should be able to use the real column for partitioning and add a new option to define the partition step, independent of the granularity. Then we could partition by:

PARTITION BY RANGE_BUCKET(`DT_KEY`, GENERATE_ARRAY(20000000, 30000000, 10000))
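With the real column, BigQuery's RANGE_BUCKET simply finds which interval of the generated boundary array the value falls into; no derived column is needed. A Python stand-in for that lookup (range_bucket here is my own sketch of the documented RANGE_BUCKET semantics, not GOE code):

```python
import bisect

def range_bucket(point: int, boundaries: list[int]) -> int:
    # Mirrors BigQuery's RANGE_BUCKET(point, boundaries_array):
    # returns i such that boundaries[i-1] <= point < boundaries[i],
    # or 0 when the point is below the first boundary.
    return bisect.bisect_right(boundaries, point)

# Equivalent of GENERATE_ARRAY(20000000, 30000000, 10000)
boundaries = list(range(20000000, 30000001, 10000))

# A DT_KEY value from Oracle partition P2006 maps straight to a bucket.
print(range_bucket(20060101, boundaries))
```

The key point is that the bucketing happens on `DT_KEY` itself, so queries filtering on the real column can prune partitions without knowing about any synthetic key.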