Make compatible with psycopg 3. #18

Open · wants to merge 1 commit into base: master
Conversation

hwalinga

I made the code compatible with psycopg 3; it now works with both psycopg2 and psycopg 3.

I added a compat_copy_from helper, since cursor.copy_from is no longer available in psycopg 3, and use it in the copy_from function.
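A minimal sketch of what such a wrapper can look like (the helper name matches the description above, but the body, column quoting and buffer size are illustrative, not the exact code of this change):

```python
def compat_copy_from(cursor, file, table, columns):
    """Run COPY ... FROM STDIN on either a psycopg2 or a psycopg 3 cursor."""
    if hasattr(cursor, "copy_from"):
        # psycopg2 still ships copy_from directly on the cursor
        cursor.copy_from(file, table, columns=columns)
    else:
        # psycopg 3 replaced it with the copy() context manager
        cols = ", ".join(f'"{c}"' for c in columns)
        with cursor.copy(f'COPY "{table}" ({cols}) FROM STDIN') as copy:
            while chunk := file.read(65536):
                copy.write(chunk)
```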

I made a function to get the connection's encoding on both platforms.
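A sketch of how this can be probed, assuming psycopg 3's connection.info.encoding and psycopg2's connection.encoding (the attribute fallback is illustrative):

```python
def compat_encoding(connection):
    """Return the client encoding of a psycopg2 or psycopg 3 connection."""
    try:
        # psycopg 3: ConnectionInfo exposes the Python codec name
        return connection.info.encoding
    except AttributeError:
        # psycopg2: the connection carries the Postgres encoding name
        return connection.encoding
```

Note that the two report different spellings (e.g. 'utf-8' vs. 'UTF8'), so callers may have to normalize the result.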

In psycopg 3 the dedicated range types are all replaced by a single Range class, so I added a compatibility block that maps the different psycopg2 range types to Range on psycopg 3.
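Such a block can be as small as aliasing the psycopg2 names to the generic class (illustrative sketch; the actual change may alias a different set of types):

```python
try:
    # psycopg 3: one generic Range class for all range columns
    from psycopg.types.range import Range
    NumericRange = DateRange = DateTimeTZRange = Range
except ImportError:
    # psycopg2: one dedicated class per range type
    from psycopg2.extras import NumericRange, DateRange, DateTimeTZRange
```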

In the tests

Binary fields now return plain bytes, so there is a compatibility function tobytes that yields bytes on both platforms.
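Since bytes() accepts both psycopg2's memoryview and psycopg 3's bytes, the helper can stay trivial (sketch):

```python
def tobytes(value):
    # psycopg2 returns binary columns as memoryview, psycopg 3 as bytes;
    # bytes() normalizes both for direct comparison in assertions
    return bytes(value)
```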

Empty nested arrays are no longer reduced on psycopg 3, so the affected tests are skipped there (see the sketch below).
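One way to express such a skip, assuming the test suite can probe for psycopg 3 by import (the test and flag names are made up for illustration):

```python
import unittest

try:
    import psycopg  # noqa: F401 -- only importable with psycopg 3
    USING_PSYCOPG3 = True
except ImportError:
    USING_PSYCOPG3 = False

class ArrayFieldTests(unittest.TestCase):
    @unittest.skipIf(USING_PSYCOPG3,
                     "psycopg 3 keeps the shape of empty nested arrays")
    def test_empty_nested_array(self):
        ...
```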

Performance

I am unsure if this is as performant as possible on psycopg 3, but at least it is now compatible with both platforms.


jerch commented Apr 19, 2023

Oh wow, thx for looking into this 😸. Will check it out later.


jerch commented Apr 25, 2023

@hwalinga
I looked through your changes - looks really good to me.

Still I am hesitant to add this, for the following reason: in my early tests of different insert/update tricks I found the new copy.write_row() to outperform my manual value formatting done in Python by 2-5 times*. And my value formatting only comes close to that speed by a really bad trick: it omits explicit formatting for types whose string repr I found to be understood by postgres' TEXT format, so it is definitely less reliable than the default formatters provided by psycopg3. Thus I think the whole value formatting as currently done is at stake for psycopg3.

For reference also see https://gist.github.com/jerch/fd0fae0107ce7b153b7540111b2e89ab?permalink_comment_id=4097804#gistcomment-4097804.

TL;DR: With copy.write_row() we get safer value formatting and a speedup of 2-5 times*, depending on the data types in the records.


[*] insert time on the interim temp table
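For context, write_row() plugs into psycopg3's copy() context manager and runs every value through the proper adapter (connection string, table and records here are placeholders):

```python
import psycopg

records = [(1, "a"), (2, "b")]

with psycopg.connect("dbname=test") as conn, conn.cursor() as cur:
    cur.execute("CREATE TEMP TABLE tmp_update (id int, value text)")
    with cur.copy("COPY tmp_update (id, value) FROM STDIN") as copy:
        for record in records:
            # each value is formatted by psycopg3's own adapters,
            # no manual TEXT formatting involved
            copy.write_row(record)
```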


jerch commented Nov 7, 2023

Did some preliminary tests with an early psycopg3 rewrite using write_row() (values are the time in seconds to update 10k instances):

10000 instances
bulk_update: 69.83318078517914
fast_update: 2.205981731414795
copy_update: 0.37508416175842285

That's ~186x faster than bulk_update and ~6x faster than fast_update, while the psycopg2 variant is only ~2.5x faster than fast_update. So by using write_row we gain another perf bonus of ~3x, plus its implicit type checks. Definitely the way to go.

Edit: The binary format gives another 5-10% perf bonus in my first tests. So we should get BINARY working, maybe with an option to fall back to TEXT later on (in case someone has created a complex field without a suitable adapter).
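The BINARY variant only differs in the COPY options and in that the column types have to be announced up front via set_types() (same placeholder table as in the sketch above):

```python
with cur.copy(
    "COPY tmp_update (id, value) FROM STDIN (FORMAT BINARY)"
) as copy:
    # binary COPY cannot infer types from strings, so declare them
    copy.set_types(["int4", "text"])
    for record in records:
        copy.write_row(record)
```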


jerch commented Nov 21, 2023

More numbers on the psycopg3 rewrite, testing different settings like BINARY vs. TEXT transport and the use of django methods to prepare values.

Trying to utilize django's get_db_prep_save method on fields for value adaptation is quite expensive:

  • with get_db_prep_save:
    # format: TEXT
    10000 instances
    fast_update: 2.045956015586853
    copy_update: 0.48638129234313965
    copy3_update: 0.7911456823348999
    
    # format: BINARY
    10000 instances
    fast_update: 2.0687363147735596
    copy_update: 0.5054255723953247
    copy3_update: 0.6903678178787231
  • w'o get_db_prep_save:
    # format: TEXT
    10000 instances
    fast_update: 2.0186638832092285
    copy_update: 0.5016790628433228
    copy3_update: 0.3683037757873535
    
    # format: BINARY
    10000 instances
    fast_update: 2.128391742706299
    copy_update: 0.5029968023300171
    copy3_update: 0.2735722064971924  <-- almost 8x faster than fast_update!

(Note that I had to disable some fields on the model for the update runtime comparison, as json and inet values must be decorated with their psycopg3 types beforehand, which get_db_prep_save does. With these values decorated manually on the source data, the difference between copy_update and copy3_update gets even bigger.)

Performance-wise the best combination is relying on psycopg3's auto adaptation (not using get_db_prep_save) with the BINARY protocol. That's a bit unfortunate, as get_db_prep_save is django's official way to adjust field values at db level, which we should also use here. (Note that copy_update doesn't use it either, as the encoders there are meant as a faster replacement for value translation into TEXT format.)
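For reference, the django route means running every value through get_db_prep_save() before handing the row to write_row() (the helper name is made up):

```python
def prepare_row(fields, row, connection):
    # django's official db-level value adaptation -- measurably slower,
    # but honors custom field conversions
    return [
        field.get_db_prep_save(value, connection=connection)
        for field, value in zip(fields, row)
    ]
```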

In general it is a good thing to stay in line with django's default features, so we probably should use get_db_prep_save by default. Since this lib is about "as fast as possible", we could introduce two more arguments to deal with that (a hypothetical signature follows the list):

  • skip_value_preparation: whether to skip django's value preparation mechanism; if enabled, the values must already be in the correct format (e.g. Jsonb(value) for a jsonb field); should default to False
  • use_binary: whether to use the BINARY transport format. Still unclear if we get this working for all field types; if so, it probably should be enabled by default, as it gives a further perf bonus
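A hypothetical sketch of the resulting call surface (nothing here is implemented yet; the use_binary default is tentative, as discussed above):

```python
def copy_update(objs, fieldnames, *,
                skip_value_preparation=False,  # True: values already adapted
                use_binary=True):              # False: fall back to TEXT
    ...
```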

Edit:
More perf squeezing: with the sanity checks also skipped, BINARY + skip_value_preparation ends up being really fast:

copy3_update: 0.22316670417785645  <-- now almost 10x faster than fast_update

Maybe skip_value_preparation should also skip the sanity checks, as the values must already be well-formed anyway...
