feature request: Hybrid mode #40

Closed
adunstan opened this Issue Dec 5, 2014 · 13 comments

Projects

None yet

2 participants

@adunstan
adunstan commented Dec 5, 2014

I'd like a mode that would use non-dynamic SQL to manage data within premake intervals of the current timestamp and dynamic sql (possibly with partition table creation too) for events outside that range, rather than putting those events in the parent table. So this would be a hybrid of dynamic and static.

@keithf4
Owner
keithf4 commented Dec 5, 2014

Interesting idea. I may actually see about just making that the way dynamic works. Or, now that I think about it, wonder if I could just get rid of the separate modes all together? That would simplify a whole lot of code. See any downsides to only having this single hybrid trigger method?

So partition trigger would do this then:

  • First statically define insert statements for the premake interevals. These child tables should all exist, so no need to check if they do (the way static works now)
  • If none of those match, dynamically determine destination child table. If it exists, insert to it. If not, insert to parent (the way dynamic works now)

I'm not really a fan of the idea of dynamically creating tables based on incoming data however. Seems like a big foot-gun (accidentally create thousands of tables on bad data insertion) and will cause other contention issues waiting for child table creation on high traffic tables. I do automatically create new tables based on incoming data for ID partitioning, but that is just creating future partitions and someone already pointed out some contention issues with it which is why I added the option of allowing ID partitioning to use run_maintenance() instead (see #19) . If I did do it, it would most definitely be optional and default to off.

@adunstan
adunstan commented Dec 7, 2014

A single hybrid mode has a lot of attractions. I can do without dynamically creeating tables, although it would be nice to have it as an option. Especially for an initial load and then have it turned off.

@keithf4
Owner
keithf4 commented Dec 7, 2014

For an initial load, this can be done quite easily as is. Look at the partition_data.py script. The data load would arguably go a whole lot faster going into a single parent table first as well as being able to be a lot more efficient having batched commits.

@adunstan
adunstan commented Dec 7, 2014

Doesn't that steer all the inserts to the parent table and then redistribute them? I'd like to be able to avoid the double writing if possible.

@keithf4
Owner
keithf4 commented Dec 7, 2014

Not really any steering, it's just inserting the data to a normal table first and then redistributing it. So yes there is double-writing. Guess it really depends on what your goal is: getting the data loaded in as quickly as possible to be used as soon as possible, or getting the data in in the final format you'd like it in in one shot. I went with the first option. This allows you to delay index creation until all the data is inserted as well.

@adunstan
adunstan commented Dec 7, 2014

Ok, I can see that. Anyway, doing things the way you suggest would be a significant gain.

@keithf4
Owner
keithf4 commented Dec 7, 2014

I'll have that next on my list because I really like the idea as well. Seems so obvious now, wish I'd thought of it when I first wrote it! Will likely take a while to implement because I both have to change a lot of code and script out a migration process for existing users.
Currently working on the sub-partitioning issue.

@keithf4 keithf4 self-assigned this Mar 4, 2015
@keithf4
Owner
keithf4 commented Mar 18, 2015

I'd initially had an option so you could avoid having the dynamic fallback portion of the trigger and stick with the old static method. Now that I'm looking things over, I'm wondering if that's really even needed at all and I should just force all triggers to be this new hybrid? Thought I'd run that question by you in case you could think of any scenario where that dynamic fallback wouldn't be desired. Without it, the data would just go to the parent anyway.

@adunstan

Presumably it will still go into the parent if the partition doesn't exist. For my purposes I cant think of any good reason not to do this, but I'm not familiar with the way everyone else is using the package. If I were designing it from scratch that's what I'd do, though.

@keithf4
Owner
keithf4 commented Mar 18, 2015

Yup, it goes to the parent. So I have my update with the dynamic option backed up. Will do the release without it for now and see how it goes. Thanks for the quick response!

Just FYI, my next version will be 2.0.0 and 9.4+ only with this feature. I'm adding a background worker and using some other newer features and figured this drastic of a change was a good a time as any to introduce this newer version requirement. So apologies if you won't be able to use this hybrid feature on current clients on older versions. More incentive to upgrade! :D

@adunstan

I can manage without for existing clients. But thanks.

@keithf4
Owner
keithf4 commented May 29, 2015

Apologize for how long this took. If you'd like to test this out, I've got a release candidate for v2.0.0 posted with this feature added. You can migrate existing partition sets to this hybrid trigger as well.

https://github.com/keithf4/pg_partman_v2RC1

@keithf4
Owner
keithf4 commented Jun 11, 2015

Version 2.0.0 has been released with this feature. Let me know if you have any issues.

@keithf4 keithf4 closed this Jun 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment