Make compatible with psycopg 3. #18

Open · wants to merge 1 commit into base: master
Conversation

hwalinga

I made the code compatible with psycopg 3; it now works with both psycopg2 and psycopg 3.

I added a compat_copy_from helper, since cursor.copy_from is no longer available in psycopg 3, and use it in the copy_from function.
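A minimal sketch of what such a wrapper can look like (the helper name matches the description above, but the body, column quoting and buffer size are illustrative, not the exact code of this change):

```python
def compat_copy_from(cursor, file, table, columns):
    """Run COPY ... FROM STDIN on either a psycopg2 or a psycopg 3 cursor."""
    if hasattr(cursor, "copy_from"):
        # psycopg2 still ships copy_from directly on the cursor
        cursor.copy_from(file, table, columns=columns)
    else:
        # psycopg 3 replaced it with the copy() context manager
        cols = ", ".join(f'"{c}"' for c in columns)
        with cursor.copy(f'COPY "{table}" ({cols}) FROM STDIN') as copy:
            while chunk := file.read(65536):
                copy.write(chunk)
```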

I made a function to get the connection's encoding on both platforms.
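A sketch of how this can be probed, assuming psycopg 3's connection.info.encoding and psycopg2's connection.encoding (the attribute fallback is illustrative):

```python
def compat_encoding(connection):
    """Return the client encoding of a psycopg2 or psycopg 3 connection."""
    try:
        # psycopg 3: ConnectionInfo exposes the Python codec name
        return connection.info.encoding
    except AttributeError:
        # psycopg2: the connection carries the Postgres encoding name
        return connection.encoding
```

Note that the two report different spellings (e.g. 'utf-8' vs. 'UTF8'), so callers may have to normalize the result.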

In psycopg 3 the dedicated range types are all replaced by a single Range class, so I added a compatibility block that maps the different psycopg2 range types to Range on psycopg 3.
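Such a block can be as small as aliasing the psycopg2 names to the generic class (illustrative sketch; the actual change may alias a different set of types):

```python
try:
    # psycopg 3: one generic Range class for all range columns
    from psycopg.types.range import Range
    NumericRange = DateRange = DateTimeTZRange = Range
except ImportError:
    # psycopg2: one dedicated class per range type
    from psycopg2.extras import NumericRange, DateRange, DateTimeTZRange
```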

In the tests

Binary fields now return plain bytes, so there is a compatibility function tobytes that yields bytes on both platforms.
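Since bytes() accepts both psycopg2's memoryview and psycopg 3's bytes, the helper can stay trivial (sketch):

```python
def tobytes(value):
    # psycopg2 returns binary columns as memoryview, psycopg 3 as bytes;
    # bytes() normalizes both for direct comparison in assertions
    return bytes(value)
```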

Empty nested arrays are no longer reduced on psycopg 3, so the affected tests are skipped there (see the sketch below).
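One way to express such a skip, assuming the test suite can probe for psycopg 3 by import (the test and flag names are made up for illustration):

```python
import unittest

try:
    import psycopg  # noqa: F401 -- only importable with psycopg 3
    USING_PSYCOPG3 = True
except ImportError:
    USING_PSYCOPG3 = False

class ArrayFieldTests(unittest.TestCase):
    @unittest.skipIf(USING_PSYCOPG3,
                     "psycopg 3 keeps the shape of empty nested arrays")
    def test_empty_nested_array(self):
        ...
```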

Performance

I am unsure if this is as performant as possible on psycopg 3, but at least it is now compatible with both platforms.


jerch commented Apr 19, 2023

Oh wow, thx for looking into this 😸. Will check it out later.


jerch commented Apr 25, 2023

@hwalinga
I looked through your changes - looks really good to me.

Still I am hesitant to add this, for the following reason: in my early tests of different insert/update tricks I found the new copy.write_row() to outperform my manual value formatting done in Python by 2-5 times*. And my value formatting only comes close to that speed by a really bad trick: it omits explicit formatting for types whose string repr I found to be understood by postgres' TEXT format, so it is definitely less reliable than the default formatters provided by psycopg3. Thus I think the whole value formatting as currently done is at stake for psycopg3.

For reference also see https://gist.github.com/jerch/fd0fae0107ce7b153b7540111b2e89ab?permalink_comment_id=4097804#gistcomment-4097804.

TL;DR: With copy.write_row() we get safer value formatting and a speedup of 2-5 times*, depending on the data types in the records.


[*] insert time on the interim temp table
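For context, write_row() plugs into psycopg3's copy() context manager and runs every value through the proper adapter (connection string, table and records here are placeholders):

```python
import psycopg

records = [(1, "a"), (2, "b")]

with psycopg.connect("dbname=test") as conn, conn.cursor() as cur:
    cur.execute("CREATE TEMP TABLE tmp_update (id int, value text)")
    with cur.copy("COPY tmp_update (id, value) FROM STDIN") as copy:
        for record in records:
            # each value is formatted by psycopg3's own adapters,
            # no manual TEXT formatting involved
            copy.write_row(record)
```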


jerch commented Nov 7, 2023

Did some preliminary tests with an early psycopg3 rewrite using write_row() (values are the time in seconds to update 10k instances):

10000 instances
bulk_update: 69.83318078517914
fast_update: 2.205981731414795
copy_update: 0.37508416175842285

That's ~186x faster than bulk_update and ~6x faster than fast_update, while the psycopg2 variant is only ~2.5x faster than fast_update. So by using write_row we gain another perf bonus of ~3x, plus its implicit type checks. Definitely the way to go.

Edit: The binary format gives another 5-10% perf bonus in my first tests. So we should get BINARY working, maybe with an option to fall back to TEXT later on (in case someone has created a complex field without a suitable adapter).
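The BINARY variant only differs in the COPY options and in that the column types have to be announced up front via set_types() (same placeholder table as in the sketch above):

```python
with cur.copy(
    "COPY tmp_update (id, value) FROM STDIN (FORMAT BINARY)"
) as copy:
    # binary COPY cannot infer types from strings, so declare them
    copy.set_types(["int4", "text"])
    for record in records:
        copy.write_row(record)
```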


jerch commented Nov 21, 2023

More numbers on the psycopg3 rewrite, testing different settings like BINARY vs. TEXT transport and the use of django methods to prepare values.

Trying to utilize django's get_db_prep_save method on fields for value adaptation is quite expensive:

  • with get_db_prep_save:
    # format: TEXT
    10000 instances
    fast_update: 2.045956015586853
    copy_update: 0.48638129234313965
    copy3_update: 0.7911456823348999
    
    # format: BINARY
    10000 instances
    fast_update: 2.0687363147735596
    copy_update: 0.5054255723953247
    copy3_update: 0.6903678178787231
  • w'o get_db_prep_save:
    # format: TEXT
    10000 instances
    fast_update: 2.0186638832092285
    copy_update: 0.5016790628433228
    copy3_update: 0.3683037757873535
    
    # format: BINARY
    10000 instances
    fast_update: 2.128391742706299
    copy_update: 0.5029968023300171
    copy3_update: 0.2735722064971924  <-- almost 8x faster than fast_update!

(Note that I had to disable some fields on the model for the update runtime comparison, as json and inet values must be decorated with their psycopg3 types beforehand, which get_db_prep_save does. With these values decorated manually on the source data, the difference between copy_update and copy3_update gets even bigger.)

Performance-wise the best combination is relying on psycopg3's auto adaptation (not using get_db_prep_save) with the BINARY protocol. That's a bit unfortunate, as get_db_prep_save is django's official way to adjust field values at db level, which we should also use here. (Note that copy_update doesn't use it either, as the encoders there are meant as a faster replacement for value translation into TEXT format.)
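For reference, the django route means running every value through get_db_prep_save() before handing the row to write_row() (the helper name is made up):

```python
def prepare_row(fields, row, connection):
    # django's official db-level value adaptation -- measurably slower,
    # but honors custom field conversions
    return [
        field.get_db_prep_save(value, connection=connection)
        for field, value in zip(fields, row)
    ]
```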

In general it is a good thing to stay in line with django's default features, so we probably should use get_db_prep_save by default. Since this lib is about "as fast as possible", we could introduce two more arguments to deal with that (a hypothetical signature follows the list):

  • skip_value_preparation: whether to skip django's value preparation mechanism; if enabled, the values must already be in the correct format (e.g. Jsonb(value) for a jsonb field); should default to False
  • use_binary: whether to use the BINARY transport format. Still unclear if we get this working for all field types; if so, it probably should be enabled by default, as it gives a further perf bonus
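A hypothetical sketch of the resulting call surface (nothing here is implemented yet; the use_binary default is tentative, as discussed above):

```python
def copy_update(objs, fieldnames, *,
                skip_value_preparation=False,  # True: values already adapted
                use_binary=True):              # False: fall back to TEXT
    ...
```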

Edit:
More perf squeezing: with the sanity checks also skipped, BINARY + skip_value_preparation ends up being really fast:

copy3_update: 0.22316670417785645  <-- now almost 10x faster than fast_update

Maybe skip_value_preparation should also skip the sanity checks, as the values must already be well-formed anyway...
