
Optimize batch encoder #250

Open · wants to merge 2 commits into base: devel
Conversation

seriyps (Member) commented Jan 15, 2021

Previously, the only way to encode an Erlang term into the PostgreSQL binary format was epgsql_wire:encode_parameters, which calls epgsql_binary:encode/3 in a loop. The problem is that epgsql_binary:encode/3 does quite a lot of codec() database lookups and structure transformations, and this work is repeated for every parameter.
That was acceptable for a normal equery, because you usually don't send many parameters with it. But now, with the introduction of COPY .. FROM STDIN WITH (FORMAT binary), we might need to encode a lot of rows that all have the same structure, so it makes sense to do all the OID table lookups in advance.
The same optimization was applied to execute_batch/3, because there we also use the same statement for multiple rows.
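For illustration, here is a hedged usage sketch of execute_batch/3 (the connection parameters, table and column names are invented, and the result shape is not asserted). Every row in the batch is bound to the same prepared statement, so all rows share one set of parameter types, which is exactly what makes resolving the codecs once per batch worthwhile:

```erlang
batch_insert_example() ->
    {ok, C} = epgsql:connect("localhost", "user", "pass", #{database => "testdb"}),
    {ok, Stmt} = epgsql:parse(C, "INSERT INTO purchases (id, amount) VALUES ($1, $2)"),
    %% Every row goes through the same prepared statement, so every row has the
    %% same parameter types - the codec/OID lookups only need to happen once.
    Results = epgsql:execute_batch(C, Stmt, [[1, 10], [2, 20], [3, 30]]),
    ok = epgsql:close(C),
    Results.
```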

The 1st commit optimizes COPY, the 2nd optimizes execute_batch.

The optimization is done by creating a "row encoder" structure that can be used to
encode one whole row at a time, so there is no need to look up the OID tables
for every row (a sketch follows below).

Also, some edoc improvements.
This structure can be reused to encode rows of the same shape without doing
OID database lookups for each element.
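To make the idea concrete, here is a minimal, self-contained sketch of such a row encoder. All names here (make_row_encoder/1, encode_row/2, the toy codec/1 funs) are illustrative only and are not the actual epgsql API; the real implementation resolves codecs through epgsql_binary and supports the full set of types:

```erlang
-module(row_encoder_sketch).
-export([demo/0]).

%% Toy codec lookup: maps a column type to an encode fun producing the COPY
%% binary field layout (int32 byte length followed by the field data).
codec(int4) ->
    fun(V) -> <<4:32/big-signed, V:32/big-signed>> end;
codec(text) ->
    fun(V) ->
        Bin = unicode:characters_to_binary(V),
        <<(byte_size(Bin)):32/big-signed, Bin/binary>>
    end.

%% Build the "row encoder" once: one encode fun per column.
make_row_encoder(Types) ->
    [codec(T) || T <- Types].

%% Encode a single row with the pre-built encoder - no codec lookups here.
encode_row(Encoder, Row) ->
    Fields = [Encode(Value) || {Encode, Value} <- lists:zip(Encoder, Row)],
    %% Every row ("tuple") in COPY binary format starts with a 16-bit column count.
    [<<(length(Encoder)):16/big-signed>> | Fields].

demo() ->
    Encoder = make_row_encoder([int4, text]),
    %% The same encoder is reused for every row of the same shape.
    [iolist_to_binary(encode_row(Encoder, Row))
     || Row <- [[1, <<"foo">>], [2, <<"bar">>]]].
```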
@@ -48,6 +48,7 @@ init(Batch) when is_list(Batch) ->
#batch{batch = Batch}.

execute(Sock, #batch{batch = Batch, statement = undefined} = State) ->
%% Each query has it's own statement
Member commented on this line:
Linguistic nitpick: "its", not "it's". "It's" == "it is"

seriyps (Member, Author) commented Jan 21, 2021

I'm not merging this yet, because I want to wait for my colleagues to run some benchmarks first.

seriyps (Member, Author) commented Feb 1, 2021

OK, we tried to import a multi-terabyte dump using both the "current" and the "optimized" version, but didn't notice any difference, primarily because the import happened to be disk-bound 😄. At least it hasn't made anything worse.
I'll either try to add some microbenchmarks or merge it as-is a bit later.

daironm-hillrom commented
Is this ready to merge?
