Improve performance of Record.into(Table) and related calls #11058

Open · alacoste opened this issue Dec 1, 2020 · 38 comments

@alacoste commented Dec 1, 2020

Hi, and first of all thanks for this great tool. We've been able to do almost everything we wanted so far on top of the clean abstraction that is jOOQ!

We are starting to run into performance issues as our DB fills up, and I tracked part of them down to what I believe is an inefficiency when converting "untyped" Records into "typed" MyTableRecords.

During my debugging, I found out that Record.into(Table) takes a small but non-negligible amount of time, ~200µs in my debugging case. Unfortunately, on a query returning 200 results this quickly adds up to ~40ms, a very large overhead for a simple query that itself takes ~10ms.

Profiling and looking at the code itself, it seems that most of the time is spent computing the mapping between the fields in the resulting Record and the target MyTableRecord:

[screenshot: flame graph of the Record.into(Table) call]

From that observation, I think it should be possible to greatly reduce the overhead for queries returning relatively large result sets by computing this mapping only once and re-using it for all resulting records. I had hoped this was how Result<Record>.into(Table) is implemented, but unfortunately all it does is loop over all resulting records and call Record.into(Table).
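To illustrate, the conversion we perform is essentially of this shape (a minimal sketch with assumed names; result is the fetched Result<Record> and MESSAGE the generated table):

// Each into(MESSAGE) call re-derives the source-to-target field mapping
// from scratch, so the cost grows with records x fields.
List<MessageRecord> typed = new ArrayList<>(result.size());
for (Record r : result)
    typed.add(r.into(MESSAGE)); // ~200µs per record, ~40ms for 200 records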

Obviously I know quite little of the jOOQ codebase, so please let me know if I missed something obvious, or if there is a reason preventing the mapping computation from being factored out to happen only once per query, as opposed to once per record.

I'm planning to implement an alternative version of Result<>.into as a PoC/workaround when I have the time, but obviously it would be lovely if the library itself supported this, and if I'm right, a lot of other jOOQ users would benefit too!

Thanks,
-Arthur

Note:
I had posted a probably better written version of this as a comment on https://www.jooq.org/doc/latest/manual/sql-execution/performance-considerations/ a couple of weeks ago, but it seems the discussion threads on the documentation pages were entirely removed basically right after :(
Sorry if this issue is a bit less precise than it could be; I'm reconstructing my findings mostly from memory.

@lukaseder (Member)

Thanks a lot for your report, and I'm sorry for the inconvenience about the removed comment on the manual. We didn't remove just your comment, but all comments from the website, see #10962. I replied to your comment, and you should have received a notification message. You can still access it on disqus.com:
https://disqus.com/home/discussion/jooq/performance_considerations/#comment-5156026924

If you haven't received a notification, this just shows once more that disqus does not work well enough for us. I've lost way too many notifications myself, with messages having gone unanswered for months!

You're right, we should extract a mapper for the duration of the loop instead of looking it up on every record. We already do that for a variety of other mappings (e.g. into(Class)), but we're not doing that for into(Table) yet. This will be a pretty simple fix, I think. Will try to fix it right away.
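In other words, the pattern would be roughly this (a sketch with a hypothetical lookup, not the actual implementation):

// Resolve the record mapper once for the whole result...
RecordMapper<Record, MessageRecord> mapper = resolveMapperOnce(); // hypothetical

// ... then apply it to every record, so nothing is re-derived per record.
List<MessageRecord> mapped = new ArrayList<>(result.size());
for (Record r : result)
    mapped.add(mapper.map(r));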

@lukaseder (Member)

Related: #6639

@lukaseder (Member)

I'm not sure how to read your flame graph, but in a profiling session, it seems obvious that a lot of CPU cycles are wasted in wrapping the Table.fields() array in a temporary Row instance:

[profiler screenshots]

Another hot spot is looking up fields in the fields array with the O(N) lookup algorithm:

[profiler screenshot]

Both can obviously be improved.

@lukaseder (Member)

Well, caching the fieldsRow() lookup in the table seems to be very low-hanging fruit.

Before

Benchmark                                Mode  Cnt        Score       Error  Units
BenchmarkRecordMapper.intoTableCache    thrpt    5  1458359.918 ±  9696.187  ops/s
BenchmarkRecordMapper.intoTableNoCache  thrpt    5  1479192.078 ± 64283.812  ops/s

After

Benchmark                                Mode  Cnt        Score       Error  Units
BenchmarkRecordMapper.intoTableCache    thrpt    5  1762523.256 ±  8437.748  ops/s
BenchmarkRecordMapper.intoTableNoCache  thrpt    5  1780473.360 ±  8192.952  ops/s

Other mapping benchmarks seem unaffected by the change, neither positively nor negatively.
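For context, these are JMH throughput benchmarks. A minimal sketch of their shape could look like the following (setup and names are assumptions, not the actual benchmark source; how the Cache/NoCache variants differ is not shown in this thread):

import org.jooq.Record;
import org.jooq.Result;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class BenchmarkRecordMapper {

    Result<Record> result;

    @Setup
    public void setup() {
        // Assumption: fetch a representative result once, outside the measurement,
        // e.g. result = dsl.select(MESSAGE.fields()).from(MESSAGE).fetch();
    }

    @Benchmark
    public Result<MessageRecord> intoTable() {
        // The measured call: map the untyped result into typed table records.
        return result.into(MESSAGE);
    }
}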

@lukaseder (Member)

With the cached Table.fieldsRow(), we can now attempt to avoid calling Tools.row0() in the AbstractRecord constructor:

[profiler screenshot]

Further improvement can be seen:

Benchmark                                Mode  Cnt        Score       Error  Units
BenchmarkRecordMapper.intoTableCache    thrpt    5  1918579.192 ± 17475.524  ops/s
BenchmarkRecordMapper.intoTableNoCache  thrpt    5  1909308.100 ± 16318.026  ops/s

@lukaseder (Member)

These were the low-hanging fruit, and they brought a 30% improvement, which isn't bad. Now the elephant in the room is the unnecessary, repeated call to Row.indexOf(field), an O(N) field lookup used to map between source and target fields. This can definitely be cached for the duration of the loop within the Result, but it's not as simple as the other improvements:

[profiler screenshot]
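In user-space terms, the caching idea looks roughly like this (a sketch with assumed names; the raw set() call glosses over generics):

// Resolve each target field's index in the source row once, up front.
Row source = result.fieldsRow();
Field<?>[] target = MESSAGE.fields();
int[] sourceIndex = new int[target.length];
for (int i = 0; i < target.length; i++)
    sourceIndex[i] = source.indexOf(target[i]); // O(N) lookup, but done once

// Then move the data with plain index arithmetic, once per record.
for (Record r : result) {
    MessageRecord m = new MessageRecord();
    for (int i = 0; i < target.length; i++)
        if (sourceIndex[i] >= 0)
            m.set((Field<Object>) target[i], r.get(sourceIndex[i]));
}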

@lukaseder (Member)

It's not worth looking at Result.into(Table) in isolation. There are many other looping mapping methods on Result that could profit from this, including e.g. Result.intoMap(Table), Result.intoMap(Table, Table), and many more. I'll need to think about this a bit more.

> I'm planning to implement an alternative version of Result<>.into as a PoC/workaround when I have the time, but obviously it would be lovely if the library itself supported this, and if I'm right, a lot of other jOOQ users would benefit too!

Did you have anything specific in mind?

lukaseder changed the title from "Performance of Record.into(Table) is problematic on large result sets." to "Improve performance of Record.into(Table) and related calls" on Dec 1, 2020
@lukaseder (Member)

Some improvements will be backported to 3.14.5 (#11060)

@alacoste (Author) commented Dec 1, 2020

I probably missed the notification, but I had guessed you migrated away from that comment system. GitHub has its many flaws, but it does an OK job for reporting and discussing issues!

Thanks for your super-prompt action on this. It seems I'd be getting in your way more than anything else, but please do let me know if I can do anything that would help!

@alacoste (Author) commented Dec 1, 2020

> It's not worth looking at Result.into(Table) in isolation. There are many other looping mapping methods on Result that could profit from this, including e.g. Result.intoMap(Table), Result.intoMap(Table, Table), and many more. I'll need to think about this a bit more.
>
> > I'm planning to implement an alternative version of Result<>.into as a PoC/workaround when I have the time, but obviously it would be lovely if the library itself supported this, and if I'm right, a lot of other jOOQ users would benefit too!
>
> Did you have anything specific in mind?

Well, nothing specific, obviously, since I don't know the inner workings of jOOQ very well. But I basically hoped to compute the "mapping" from the kind of records returned by the actual query to the kind of records I want once, then apply that mapping 200 times, as my investigation led me to believe constructing the mapping was the slow part, and moving the data around was fast.

@lukaseder (Member)

Hmm, something that isn't currently being measured in our benchmarks is your usage of non-TableField fields in your projection, which leads to the less efficient TableFieldImpl::equals call. I wonder how that came to be... It would be interesting to see your jOOQ query. This doesn't happen when you use the table fields from the code generator, as those produce short-circuiting identity comparisons in Row.indexOf(Field)...
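To illustrate the difference (illustrative pseudocode, not jOOQ's actual implementation): a shared, code-generated field instance is found by a cheap identity check, while any other field falls through to the equals() path, which may transitively render SQL to compare query parts:

// Illustrative only: fast identity scan first, expensive equals() fallback.
static int indexOf(Field<?>[] fields, Field<?> field) {
    for (int i = 0; i < fields.length; i++)
        if (fields[i] == field)      // hit for code-generated, shared instances
            return i;
    for (int i = 0; i < fields.length; i++)
        if (fields[i].equals(field)) // may render SQL to compare query parts
            return i;
    return -1;
}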

@lukaseder (Member)

> as my investigation led me to believe constructing the mapping was the slow part, and moving the data around was fast.

Yes, the remaining bottleneck is the lookup of source column index -> target column index, which is done on each iteration.

@lukaseder (Member)

> but please do let me know if I can do anything that would help!

If you could add the query leading to your flame graph, that would be very useful.

@alacoste (Author) commented Dec 1, 2020

Here is the SQL query in my debugging case. It lets us do a batch query for all messages in several conversations ("experiences", here 60 of them), with pagination (order by ... limit 6) of messages inside each conversation.

SELECT "tmp_table"."target_id",
       "message"."id",
       "message"."uuid",
       "message"."created_at",
       "message"."event_time",
       "message"."text_content",
       "message"."experience_id",
       "message"."type",
       "message"."starred",
       "message"."doctor_id",
       "message"."visible_to_patient",
       "message"."file_upload_id",
       "message"."retry_counter",
       "message"."reply_message_id",
       "message"."is_reply",
       "message"."sender_type",
       "message"."is_automatic",
       "message"."has_deleted_content"
FROM (VALUES (1),
             (2),
             (3),
             (4),
             (5),
             (6),
             (7),
             (8),
             (9),
             (10),
             (11),
             (12),
             (13),
             (14),
             (15),
             (16),
             (17),
             (18),
             (19),
             (20),
             (21),
             (22),
             (23),
             (24),
             (25),
             (26),
             (27),
             (28),
             (29),
             (30),
             (31),
             (32),
             (33),
             (34),
             (35),
             (36),
             (37),
             (38),
             (39),
             (40),
             (41),
             (42),
             (43),
             (44),
             (45),
             (46),
             (48),
             (47),
             (49),
             (50),
             (51),
             (52),
             (53),
             (54),
             (55),
             (56),
             (57),
             (58),
             (59),
             (60)) AS "tmp_table" ("target_id"),
     LATERAL (SELECT "public"."message"."id",
                     "public"."message"."uuid",
                     "public"."message"."created_at",
                     "public"."message"."event_time",
                     "public"."message"."text_content",
                     "public"."message"."experience_id",
                     "public"."message"."type",
                     "public"."message"."starred",
                     "public"."message"."doctor_id",
                     "public"."message"."visible_to_patient",
                     "public"."message"."file_upload_id",
                     "public"."message"."retry_counter",
                     "public"."message"."reply_message_id",
                     "public"."message"."is_reply",
                     "public"."message"."sender_type",
                     "public"."message"."is_automatic",
                     "public"."message"."has_deleted_content"
              FROM "public"."message"
              WHERE "public"."message"."experience_id" = "tmp_table"."target_id"
              ORDER BY "public"."message"."event_time" DESC, "public"."message"."created_at" DESC
              LIMIT 6) AS "message";

This query returns 168 records here:

2020-12-01 15:56:32.487 [DefaultDispatcher-worker-4 @coroutine#31] DEBUG ace.db.jooq.utils.JooqLoggerListener - Fetched row(s)           : 168

Then we manually group the resulting rows by conversation (target_id) and convert them into MessageRecords, which is the slow step.
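Roughly, the grouping is a plain collector over the result (a sketch with assumed names; the target_id column is read by its alias):

// Group the flat result rows by conversation id, converting each row into
// a MessageRecord along the way -- the into(MESSAGE) call is the slow step.
Map<Integer, List<MessageRecord>> byExperience = result.stream()
    .collect(Collectors.groupingBy(
        r -> r.get("target_id", Integer.class),
        Collectors.mapping(r -> r.into(MESSAGE), Collectors.toList())));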

I can also try to inspect the jOOQ Query object right before executing it, if that would help.

@lukaseder (Member)

A minor, related improvement would be if AbstractKey had a reference to an AbstractRow instead of TableField[], see: #7451

@lukaseder (Member)

A lot of improvements have been committed to 3.15 and 3.14 branches.

If any of you, @alacoste and @arlampin, are interested in verifying whether this fixes your issues, early feedback prior to releasing would be very useful. You can build the jOOQ Open Source Edition from GitHub, or access snapshot builds from here, if you're licensed:
https://www.jooq.org/download/versions

  • 3.15.0-SNAPSHOT will be published by tomorrow
  • 3.14.5-SNAPSHOT has been built just now

I think this issue here is still not fixed, as I haven't benchmarked any TableAlias comparisons from derived tables yet, but the RecordImpl[N] issues from #11064 should hopefully be gone.

@alacoste (Author) commented Dec 4, 2020

Will give it a try this afternoon!

@alacoste (Author) commented Dec 4, 2020

OK, I am running the same block of code linked earlier, the one exhibiting the performance issue, 100 times, measuring on each run the time taken by Result<Record>.into(Table) for the problematic query.

On that metric I can't see any major difference between jOOQ v3.14.4 and my locally built v3.15.0-SNAPSHOT. There is quite a bit of variance, but both take ~50ms on average.

Here are the profiling results on 3.15.0-SNAPSHOT: server_[run]_2020-12-04-200049.jfr.zip

Screenshot of the interesting part:
[screenshot: profiling results on 3.15.0-SNAPSHOT]

@lukaseder (Member)

Thanks for the feedback, @alacoste. I'll be looking into this further. There are a few other related issues I'm currently benchmarking, including #11099 and #8762. It seems the caching of fieldsRow is not effective for aliased tables, which are created afresh all the time, not shared like the generated tables. This also impacts the performance of the implicit join algorithm (see #8762).

@lukaseder (Member)

Looking into the JFR file now with VisualVM.

As mentioned elsewhere, we should absolutely try to avoid calling AbstractQueryPart::equals and, by consequence, DefaultDSLContext::renderInlined. That's a big task: when throwing an exception from AbstractQueryPart::equals, 84 integration tests fail, showing how often we transitively call this equals method.

Lower-hanging fruit might be to use a cached version of AbstractQueryPart::configuration, instead of creating a new one all the time, just for equals (!):

[profiler screenshot]

We already have a global instance in Tools.CTX, introduced with #6025 in jOOQ 3.10. We could re-use this instance.

@lukaseder (Member)

Will fix this in a separate issue: #11125.

@alacoste (Author)

Thanks again @lukaseder, please do let me know if I can be of assistance!

@lukaseder (Member)

Well, if you have time, a self-contained MCVE (we have a template here: https://github.com/jOOQ/jOOQ-mcve) for the reproducer here would be very useful: #11058 (comment)

Otherwise, I'll be able to look into generally improvable things, but maybe not the exact thing you're running into...

@lukaseder (Member)

For the record, I'll investigate all the locations where we call AbstractQueryPart::equals from our integration tests in a separate issue: #11126. Chances are, this will also cover your case, "by accident".

@lukaseder (Member) commented Dec 14, 2020

Without a full MCVE, you could put a breakpoint in AbstractQueryPart::equals, and report all the types of this and that that you encounter on your side to #11126

@alacoste (Author) commented Jan 14, 2021

Hi @lukaseder and sorry for the long radio silence over the holiday break!

I just checked again, since you already merged a couple of fixes, but our use case still exhibits the performance problem, so I finally got around to creating an MCVE :)

Here it is: https://github.com/nabla/jOOQ-mcve

I'm running it against a local postgres DB, and the setup is the following:

  • A table EXPERIENCE, of which I create 10 rows
  • A table MESSAGE with an EXPERIENCE_ID column, of which I create 20 rows per experience (100 total).
  • The other columns of MESSAGE are approximately what we have in our real use case, because I think without them the .into() slowness might not be as apparent, since there would be fewer columns to "match".
  • Repeat the perf measurement 5 times:
    • Perform our query using "lateral" to fetch the messages for the experiences (a DSL sketch follows below; the lateral normally serves to paginate messages inside each experience, but pagination was removed here as it was not necessary to reproduce the issue)
    • Perform Result.into(MESSAGE), measuring the time taken.

On my machine it takes ~100ms to perform Result.into(MESSAGE) for the 100 rows of the result set.
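For reference, the query is roughly of this shape in the jOOQ DSL (a sketch, not the MCVE's literal code; dsl is assumed to be the DSLContext, and the id list is shortened):

// Build the VALUES table of experience ids (shortened to 3 ids for brevity).
Table<Record1<Integer>> tmp = DSL
    .values(DSL.row(1), DSL.row(2), DSL.row(3))
    .as("tmp_table", "target_id");
Field<Integer> targetId = tmp.field("target_id", Integer.class);

// Cross join it with a LATERAL subquery fetching each experience's messages
// (pagination removed, as in the MCVE).
Result<Record> result = dsl
    .select(targetId)
    .select(MESSAGE.fields())
    .from(tmp)
    .crossJoin(DSL.lateral(
        DSL.selectFrom(MESSAGE)
           .where(MESSAGE.EXPERIENCE_ID.eq(targetId))))
    .fetch();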

As always let me know if I can be of any additional help :)
And thanks again for the attention you are dedicating to this issue!
