Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Use PG cursors + FETCH to partition tables on commit.
Instead of doing a pre-flight query to first figure out partition boundaries,
then chunking the table by filtering on those boundaries (poor performance on
tables without PKs and foreign tables), use a PG server-side cursor and run
FETCHes to add data to CStore files.
Two changes to behaviour: not making objects in parallel and moving the object
hashing to be after we've grabbed the data from the cursor into a temporary
object (after which we actually rename it).
Other simplifications to object creation:
When renaming the object, do a pre-flight check to make sure it doesn't
already exist instead of setting up savepoints. If we do get a race, it's fine
because we'll just reupsert existing meta(data) with the same values.
Also simplify
create_base_fragment
(delete now-unneeded chunking logic)Benchmarking
Setup: I had a CSV file with some data after running
sgr cloud download
:Without a primary key
Old sgr: started a tqdm progressbar with a high-variance ETA (because objects are committed in parallel, so we get 8 objects bumping the pbar at the same time). Terminated after 25 minutes with tqdm estimating a total time of 1h30m.
New sgr: 3 trials, 5m36s, 5m3s, 5m29s (avg 5m23s)
With a primary key
Old sgr: 5m32s, 6m14s, 5m49s (avg 5m52s)
New sgr: 3m23s, 3m39s, 3m15s (avg 3m26s)