getindex(::Column) performance #252

willtebbutt · 2022-07-31T11:08:40Z

Starts to address #250 . Entirely fixes the bug as described in that issue (Float64s, each call to getindex allocates nothing).

The changes include:

introducing a layer of indirection (thanks @tpgillam for this suggestion) in the PQValue type to prevent the need to allocate when result (which is a mutable struct) is used to construct a PQValue.
moving a lot of type information into the type of ~~PQValue and~~ Column so that it can be statically resolved.
Replacing the parse_type closure with the ParseType{T} callable to force the compiler to specialise.

I need advice as to how to proceed (@iamed2 I would like to take you up on your offer in #250 :) ) :

Have I gone about this in a sensible way -- I don't have the overview of the package's architecture that you do @iamed2, so I might be missing something obvious.
how should I write performance tests for this? Is there a good place in runtests into which I should hook? It's a rather large file, so I'm not completely sure what a good entrance point would be.
What types should I be testing for performance here? Clearly floats / ints / dates etc are good targets -- do we have a collection of examples columns instantiated somewhere in the tests that I could attach some performance tests to?

edit: interestingly, it actually looks like the indirection wasn't necessary in order to remove the allocations. It appears to be the case that everything is sorted by being slightly more assertive in the specialisations that the compiler is made / allowed to perform. Even more interestingly, it doesn't even look like we need to specialise on the typeof jl_result in PQValue.

src/parsing.jl

src/tables.jl

src/parsing.jl

iamed2 · 2022-08-12T17:30:24Z

Have I gone about this in a sensible way -- I don't have the overview of the package's architecture that you do @iamed2, so I might be missing something obvious.

This is excellent, I love how small and targeted it is now.

iamed2 · 2022-08-12T18:22:45Z

What types should I be testing for performance here? Clearly floats / ints / dates etc are good targets -- do we have a collection of examples columns instantiated somewhere in the tests that I could attach some performance tests to?

We should test with binary format and with text format, and with custom conversion functions (the conversions argument to execute) and custom types (type_map and column_types arguments to execute). We should also check with not_null and without. Testing with those types are good, as well as strings, and it might make sense to test with array types or interval types since those are a bit more complicated type-wise.

What you could do is create a single row with the data of types you care about (in text), then create a temporary table and insert that row N times. You can then select the contents of the table (several times for different args to execute) and measure parsing on each column or together. The test_data variables in the test file have examples of PostgreSQL literals and what they parse to.

how should I write performance tests for this? Is there a good place in runtests into which I should hook? It's a rather large file, so I'm not completely sure what a good entrance point would be.

Given that the benefits are so very huge, it could be worth just recording the memory estimate and time from a benchmark generated on this branch now and just testing that we haven't exceeded it by ~2-3x. You can add a new testset in the results section. Long term it would be good to set up a proper benchmark suite using https://juliaci.github.io/PkgBenchmark.jl/stable/.

willtebbutt · 2022-08-17T09:02:48Z

What you could do is create a single row with the data of types you care about (in text), then create a temporary table and insert that row N times. You can then select the contents of the table (several times for different args to execute) and measure parsing on each column or together. The test_data variables in the test file have examples of PostgreSQL literals and what they parse to.

I'm actually not sure how to proceed with this. Could you provide an example?

iamed2 · 2022-08-17T22:30:39Z

There are examples of all the pieces in the nebulous runtests.jl so it's just a matter of pulling them together in the right way. Collecting them:

This creates a temporary table that will be destroyed when you close the connection. This particular table has one identity column (you can ignore it) and then one string column, for other types you can just add items to the column list replace varchar with e.g. timestamptz.

        conn = LibPQ.Connection("dbname=postgres user=$DATABASE_USER")

        result = execute(conn, """
            CREATE TEMPORARY TABLE libpqjl_test (
                id    bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
                my_string_column   varchar
            );
        """)

This code inserts a row (you can repeat the line to insert multiple rows; they will get a different id):

    result = execute(conn, "INSERT INTO libpqjl_test (my_string_column) VALUES (\$1)", ["foo"])

or

    result = execute(conn, "INSERT INTO libpqjl_test (my_string_column) VALUES ('foo')")

In the latter version, anything in this variable test_data in the first tuple element can be substituted for 'foo'. When SELECTing, it should parse as the second tuple element by default.

In this other test_data, we have a middle column. When you specify that type in a type_map or column_types argument, you should expect the third tuple element as the result when SELECTing.

I think that's a good place to start, I can explain the other bits once you've made it through the above, so that we have that shared context.

Happy to have a call or a pair-programming session to help you get started. But it's worth saying I don't think performance tests should prevent this MR from going in.

willtebbutt · 2022-08-18T10:48:09Z

Thanks for the advice @iamed2 . I've added some regression tests (wound up only using one of the two test data sets). Let me know if you think I've missed anything!

test/runtests.jl

test_bug/test_bug.jl

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

test/runtests.jl

willtebbutt · 2022-08-19T12:19:37Z

Note: we should definitely squash + merge this one when it goes in, because I accidentally committed a bunch of larger files that we don't want in the history (just profiling-related stuff)

test/runtests.jl

Co-authored-by: Eric Davies <iamed2@gmail.com>

willtebbutt · 2022-08-19T21:40:45Z

Thanks @iamed2 . Will merge when CI passes

test/runtests.jl

willtebbutt · 2022-08-19T21:53:40Z

Having said that, I don't have merge right.

@iamed2 pls squash + merge when you get a minute :) Thanks again for the review and helping me get this across the line!

willtebbutt added 5 commits July 30, 2022 20:22

Initial refactor

fb6e698

Remove allocations

f196715

Remove commented code

3a4a426

Revert code movement

8ea7537

Bump patch

0acd3d1

github-actions bot reviewed Jul 31, 2022

View reviewed changes

willtebbutt added 3 commits July 31, 2022 12:14

Remove indirection

fac8933

Revert wrapper

b047948

Sort out formatting

343158a

github-actions bot reviewed Jul 31, 2022

View reviewed changes

src/tables.jl Outdated Show resolved Hide resolved

src/tables.jl Outdated Show resolved Hide resolved

Remove formatting issues

2abf15c

willtebbutt changed the title ~~[WIP] getindex performance~~ getindex(::Column) performance Aug 12, 2022

iamed2 reviewed Aug 12, 2022

View reviewed changes

src/tables.jl Show resolved Hide resolved

iamed2 reviewed Aug 12, 2022

View reviewed changes

src/parsing.jl Show resolved Hide resolved

iamed2 approved these changes Aug 17, 2022

View reviewed changes

willtebbutt added 2 commits August 18, 2022 10:45

Initial performance regression tests

a861dc9

Uncommnt tests

f2173d7

github-actions bot reviewed Aug 18, 2022

View reviewed changes

Make allocation tests more robust

6a53642

github-actions bot reviewed Aug 18, 2022

View reviewed changes

iamed2 reviewed Aug 18, 2022

View reviewed changes

test/runtests.jl Outdated Show resolved Hide resolved

test/runtests.jl Outdated Show resolved Hide resolved

test/runtests.jl Outdated Show resolved Hide resolved

test/runtests.jl Outdated Show resolved Hide resolved

test/runtests.jl Outdated Show resolved Hide resolved

willtebbutt added 4 commits August 18, 2022 19:26

Naming and count_allocs and binary_format

91dff36

Add string test

eb2039a

Uncomment code

135daa0

Remove test code

1f65639

github-actions bot reviewed Aug 19, 2022

View reviewed changes

willtebbutt and others added 2 commits August 19, 2022 08:07

Revert test data extraction

b4e7a9b

Formatting

aaf9419

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

github-actions bot reviewed Aug 19, 2022

View reviewed changes

Merge reversion

e6bff36

iamed2 reviewed Aug 19, 2022

View reviewed changes

test/runtests.jl Outdated Show resolved Hide resolved

Update test/runtests.jl

c7b9636

Co-authored-by: Eric Davies <iamed2@gmail.com>

github-actions bot reviewed Aug 19, 2022

View reviewed changes

test/runtests.jl Show resolved Hide resolved

test/runtests.jl Show resolved Hide resolved

test/runtests.jl Show resolved Hide resolved

test/runtests.jl Show resolved Hide resolved

test/runtests.jl Show resolved Hide resolved

iamed2 merged commit dca8d75 into master Aug 22, 2022

iamed2 deleted the wct/getindex-performance branch August 22, 2022 18:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

getindex(::Column) performance #252

getindex(::Column) performance #252

willtebbutt commented Jul 31, 2022 •

edited

Loading

iamed2 commented Aug 12, 2022

iamed2 commented Aug 12, 2022 •

edited

Loading

willtebbutt commented Aug 17, 2022 •

edited

Loading

iamed2 commented Aug 17, 2022 •

edited

Loading

willtebbutt commented Aug 18, 2022

willtebbutt commented Aug 19, 2022 •

edited

Loading

willtebbutt commented Aug 19, 2022

willtebbutt commented Aug 19, 2022

getindex(::Column) performance #252

getindex(::Column) performance #252

Conversation

willtebbutt commented Jul 31, 2022 • edited Loading

iamed2 commented Aug 12, 2022

iamed2 commented Aug 12, 2022 • edited Loading

willtebbutt commented Aug 17, 2022 • edited Loading

iamed2 commented Aug 17, 2022 • edited Loading

willtebbutt commented Aug 18, 2022

willtebbutt commented Aug 19, 2022 • edited Loading

willtebbutt commented Aug 19, 2022

willtebbutt commented Aug 19, 2022

willtebbutt commented Jul 31, 2022 •

edited

Loading

iamed2 commented Aug 12, 2022 •

edited

Loading

willtebbutt commented Aug 17, 2022 •

edited

Loading

iamed2 commented Aug 17, 2022 •

edited

Loading

willtebbutt commented Aug 19, 2022 •

edited

Loading