Improve performance of Record.into(Table) and related calls #11058

Open · alacoste opened this issue Dec 1, 2020 · 38 comments

@alacoste commented Dec 1, 2020

Hi, and first of all thanks for this great tool. We've been able to do almost everything we wanted so far on top of the clean abstraction that is jOOQ!

We are starting to run into performance issues as our DB fills up, and I tracked part of them down to what I believe is an inefficiency when converting "untyped" Records into "typed" MyTableRecords.

During my debugging, I found out that Record.into(Table) takes a small but non-negligible amount of time, ~200µs in my debugging case. Unfortunately, on a query returning 200 results this quickly adds up to ~40ms, a very large overhead for a simple query that itself takes ~10ms.

Profiling and looking at the code itself, it seems that most of the time is spent computing the mapping between the fields in the resulting Record and the target MyTableRecord:

[screenshot: flame graph of the Record.into(Table) call]

From that observation, I think it should be possible to greatly reduce the overhead for queries returning relatively large result sets by computing this mapping only once and re-using it for all resulting records. I had hoped this was how Result<Record>.into(Table) is implemented, but unfortunately all it does is loop over all resulting records and call Record.into(Table).
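To illustrate, the conversion we perform is essentially of this shape (a minimal sketch with assumed names; result is the fetched Result<Record> and MESSAGE the generated table):

// Each into(MESSAGE) call re-derives the source-to-target field mapping
// from scratch, so the cost grows with records x fields.
List<MessageRecord> typed = new ArrayList<>(result.size());
for (Record r : result)
    typed.add(r.into(MESSAGE)); // ~200µs per record, ~40ms for 200 records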

Obviously I know quite little of the jOOQ codebase, so please let me know if I missed something obvious, or if there is a reason preventing the mapping computation from being factored out to happen only once per query, as opposed to once per record.

I'm planning to implement an alternative version of Result<>.into as a PoC/workaround when I have the time, but obviously it would be lovely if the library itself supported this, and if I'm right, a lot of other jOOQ users would benefit too!

Thanks,
-Arthur

Note:
I had posted a probably better written version of this as a comment on https://www.jooq.org/doc/latest/manual/sql-execution/performance-considerations/ a couple of weeks ago, but it seems the discussion threads on the documentation pages were entirely removed basically right after :(
Sorry if this issue is a bit less precise than it could be; I'm reconstructing my findings mostly from memory.

@lukaseder (Member)

Thanks a lot for your report, and I'm sorry for the inconvenience about the removed comment on the manual. We didn't remove just your comment, but all comments from the website, see #10962. I replied to your comment, and you should have received a notification message. You can still access it on disqus.com:
https://disqus.com/home/discussion/jooq/performance_considerations/#comment-5156026924

If you haven't received a notification, this just shows once more that disqus does not work well enough for us. I've lost way too many notifications myself, with messages having gone unanswered for months!

You're right, we should extract a mapper for the duration of the loop instead of looking it up on every record. We already do that for a variety of other mappings (e.g. into(Class)), but we're not doing that for into(Table) yet. This will be a pretty simple fix, I think. Will try to fix it right away.
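In other words, the pattern would be roughly this (a sketch with a hypothetical lookup, not the actual implementation):

// Resolve the record mapper once for the whole result...
RecordMapper<Record, MessageRecord> mapper = resolveMapperOnce(); // hypothetical

// ... then apply it to every record, so nothing is re-derived per record.
List<MessageRecord> mapped = new ArrayList<>(result.size());
for (Record r : result)
    mapped.add(mapper.map(r));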

@lukaseder (Member)

Related: #6639

@lukaseder (Member)

I'm not sure how to read your flame graph, but in a profiling session, it seems obvious that a lot of CPU cycles are wasted in wrapping the Table.fields() array in a temporary Row instance:

[profiler screenshots]

Another hot spot is looking up fields in the fields array with the O(N) lookup algorithm:

[profiler screenshot]

Both can obviously be improved.

@lukaseder (Member)

Well, caching the fieldsRow() lookup in the table seems to be very low-hanging fruit.

Before

Benchmark                                Mode  Cnt        Score       Error  Units
BenchmarkRecordMapper.intoTableCache    thrpt    5  1458359.918 ±  9696.187  ops/s
BenchmarkRecordMapper.intoTableNoCache  thrpt    5  1479192.078 ± 64283.812  ops/s

After

Benchmark                                Mode  Cnt        Score       Error  Units
BenchmarkRecordMapper.intoTableCache    thrpt    5  1762523.256 ±  8437.748  ops/s
BenchmarkRecordMapper.intoTableNoCache  thrpt    5  1780473.360 ±  8192.952  ops/s

Other mapping benchmarks seem unaffected by the change, neither positively nor negatively.
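For context, these are JMH throughput benchmarks. A minimal sketch of their shape could look like the following (setup and names are assumptions, not the actual benchmark source; how the Cache/NoCache variants differ is not shown in this thread):

import org.jooq.Record;
import org.jooq.Result;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class BenchmarkRecordMapper {

    Result<Record> result;

    @Setup
    public void setup() {
        // Assumption: fetch a representative result once, outside the measurement,
        // e.g. result = dsl.select(MESSAGE.fields()).from(MESSAGE).fetch();
    }

    @Benchmark
    public Result<MessageRecord> intoTable() {
        // The measured call: map the untyped result into typed table records.
        return result.into(MESSAGE);
    }
}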

@lukaseder (Member)

With the cached Table.fieldsRow(), we can now attempt to avoid calling Tools.row0() in the AbstractRecord constructor:

[profiler screenshot]

Further improvement can be seen:

Benchmark                                Mode  Cnt        Score       Error  Units
BenchmarkRecordMapper.intoTableCache    thrpt    5  1918579.192 ± 17475.524  ops/s
BenchmarkRecordMapper.intoTableNoCache  thrpt    5  1909308.100 ± 16318.026  ops/s

@lukaseder (Member)

These were the low-hanging fruit, and they brought a 30% improvement, which isn't bad. Now the elephant in the room is the unnecessary, repeated call to Row.indexOf(field), an O(N) field lookup used to map between source and target fields. This can definitely be cached for the duration of the loop within the Result, but it's not as simple as the other improvements:

[profiler screenshot]
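In user-space terms, the caching idea looks roughly like this (a sketch with assumed names; the raw set() call glosses over generics):

// Resolve each target field's index in the source row once, up front.
Row source = result.fieldsRow();
Field<?>[] target = MESSAGE.fields();
int[] sourceIndex = new int[target.length];
for (int i = 0; i < target.length; i++)
    sourceIndex[i] = source.indexOf(target[i]); // O(N) lookup, but done once

// Then move the data with plain index arithmetic, once per record.
for (Record r : result) {
    MessageRecord m = new MessageRecord();
    for (int i = 0; i < target.length; i++)
        if (sourceIndex[i] >= 0)
            m.set((Field<Object>) target[i], r.get(sourceIndex[i]));
}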

@lukaseder (Member)

It's not worth looking at Result.into(Table) in isolation. There are many other looping mapping methods on Result that could profit from this, including e.g. Result.intoMap(Table), Result.intoMap(Table, Table), and many more. I'll need to think about this a bit more.

> I'm planning to implement an alternative version of Result<>.into as a PoC/workaround when I have the time, but obviously it would be lovely if the library itself supported this, and if I'm right, a lot of other jOOQ users would benefit too!

Did you have anything specific in mind?

lukaseder changed the title from "Performance of Record.into(Table) is problematic on large result sets." to "Improve performance of Record.into(Table) and related calls" on Dec 1, 2020
@lukaseder (Member)

Some improvements will be backported to 3.14.5 (#11060)

@alacoste (Author) commented Dec 1, 2020

I probably missed the notification, but I had guessed you migrated away from that comment system. GitHub has its many flaws, but it does an OK job for reporting and discussing issues!

Thanks for your super-prompt action on this. It seems I'd be getting in your way more than anything else, but please do let me know if I can do anything that would help!

@alacoste (Author) commented Dec 1, 2020

> It's not worth looking at Result.into(Table) in isolation. There are many other looping mapping methods on Result that could profit from this, including e.g. Result.intoMap(Table), Result.intoMap(Table, Table), and many more. I'll need to think about this a bit more.
>
> > I'm planning to implement an alternative version of Result<>.into as a PoC/workaround when I have the time, but obviously it would be lovely if the library itself supported this, and if I'm right, a lot of other jOOQ users would benefit too!
>
> Did you have anything specific in mind?

Well, nothing specific, obviously, since I don't know the inner workings of jOOQ very well. But I basically hoped to compute the "mapping" from the kind of records returned by the actual query to the kind of records I want once, then apply that mapping 200 times, as my investigation led me to believe constructing the mapping was the slow part, and moving the data around was fast.

@lukaseder (Member)

Hmm, something that isn't currently being measured in our benchmarks is your usage of non-TableField fields in your projection, which leads to the less efficient TableFieldImpl::equals call. I wonder how that came to be... It would be interesting to see your jOOQ query. This doesn't happen when you use the table fields from the code generator, as those produce short-circuiting identity comparisons in Row.indexOf(Field)...
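To illustrate the difference (illustrative pseudocode, not jOOQ's actual implementation): a shared, code-generated field instance is found by a cheap identity check, while any other field falls through to the equals() path, which may transitively render SQL to compare query parts:

// Illustrative only: fast identity scan first, expensive equals() fallback.
static int indexOf(Field<?>[] fields, Field<?> field) {
    for (int i = 0; i < fields.length; i++)
        if (fields[i] == field)      // hit for code-generated, shared instances
            return i;
    for (int i = 0; i < fields.length; i++)
        if (fields[i].equals(field)) // may render SQL to compare query parts
            return i;
    return -1;
}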

@lukaseder (Member)

> as my investigation led me to believe constructing the mapping was the slow part, and moving the data around was fast.

Yes, the remaining bottleneck is the lookup of source column index -> target column index, which is done on each iteration.

@lukaseder (Member)

> but please do let me know if I can do anything that would help!

If you could add the query leading to your flame graph, that would be very useful.

@alacoste (Author) commented Dec 1, 2020

Here is the SQL query in my debugging case. It lets us do a batch query for all messages in several conversations ("experiences", here 60 of them), with pagination (order by ... limit 6) of messages inside each conversation.

SELECT "tmp_table"."target_id",
       "message"."id",
       "message"."uuid",
       "message"."created_at",
       "message"."event_time",
       "message"."text_content",
       "message"."experience_id",
       "message"."type",
       "message"."starred",
       "message"."doctor_id",
       "message"."visible_to_patient",
       "message"."file_upload_id",
       "message"."retry_counter",
       "message"."reply_message_id",
       "message"."is_reply",
       "message"."sender_type",
       "message"."is_automatic",
       "message"."has_deleted_content"
FROM (VALUES (1),
             (2),
             (3),
             (4),
             (5),
             (6),
             (7),
             (8),
             (9),
             (10),
             (11),
             (12),
             (13),
             (14),
             (15),
             (16),
             (17),
             (18),
             (19),
             (20),
             (21),
             (22),
             (23),
             (24),
             (25),
             (26),
             (27),
             (28),
             (29),
             (30),
             (31),
             (32),
             (33),
             (34),
             (35),
             (36),
             (37),
             (38),
             (39),
             (40),
             (41),
             (42),
             (43),
             (44),
             (45),
             (46),
             (48),
             (47),
             (49),
             (50),
             (51),
             (52),
             (53),
             (54),
             (55),
             (56),
             (57),
             (58),
             (59),
             (60)) AS "tmp_table" ("target_id"),
     LATERAL (SELECT "public"."message"."id",
                     "public"."message"."uuid",
                     "public"."message"."created_at",
                     "public"."message"."event_time",
                     "public"."message"."text_content",
                     "public"."message"."experience_id",
                     "public"."message"."type",
                     "public"."message"."starred",
                     "public"."message"."doctor_id",
                     "public"."message"."visible_to_patient",
                     "public"."message"."file_upload_id",
                     "public"."message"."retry_counter",
                     "public"."message"."reply_message_id",
                     "public"."message"."is_reply",
                     "public"."message"."sender_type",
                     "public"."message"."is_automatic",
                     "public"."message"."has_deleted_content"
              FROM "public"."message"
              WHERE "public"."message"."experience_id" = "tmp_table"."target_id"
              ORDER BY "public"."message"."event_time" DESC, "public"."message"."created_at" DESC
              LIMIT 6) AS "message";

This query returns 168 records here:

2020-12-01 15:56:32.487 [DefaultDispatcher-worker-4 @coroutine#31] DEBUG ace.db.jooq.utils.JooqLoggerListener - Fetched row(s)           : 168

Then we manually group the resulting rows by conversation (target_id) and convert them into MessageRecords, which is the slow step.
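Roughly, the grouping is a plain collector over the result (a sketch with assumed names; the target_id column is read by its alias):

// Group the flat result rows by conversation id, converting each row into
// a MessageRecord along the way -- the into(MESSAGE) call is the slow step.
Map<Integer, List<MessageRecord>> byExperience = result.stream()
    .collect(Collectors.groupingBy(
        r -> r.get("target_id", Integer.class),
        Collectors.mapping(r -> r.into(MESSAGE), Collectors.toList())));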

I can also try to inspect the jOOQ Query object right before executing it, if that would help.

@lukaseder (Member)

A minor, related improvement would be if AbstractKey had a reference to an AbstractRow instead of TableField[], see: #7451

@lukaseder (Member)

A lot of improvements have been committed to 3.15 and 3.14 branches.

If any of you, @alacoste and @arlampin, are interested in verifying whether this fixes your issues, early feedback prior to releasing would be very useful. You can build the jOOQ Open Source Edition from GitHub, or access snapshot builds from here, if you're licensed:
https://www.jooq.org/download/versions

  • 3.15.0-SNAPSHOT will be published by tomorrow
  • 3.14.5-SNAPSHOT has been built just now

I think this issue here is still not fixed, as I haven't benchmarked any TableAlias comparisons from derived tables yet, but the RecordImpl[N] issues from #11064 should hopefully be gone.

@alacoste (Author) commented Dec 4, 2020

Will give it a try this afternoon!

@alacoste (Author) commented Dec 4, 2020

OK, I am running the same block of code linked earlier, the one exhibiting the performance issue, 100 times, measuring on each run the time taken by Result<Record>.into(Table) for the problematic query.

On that metric I can't see any major difference between jOOQ v3.14.4 and my locally built v3.15.0-SNAPSHOT. There is quite a bit of variance, but both take ~50ms on average.

Here are the profiling results on 3.15.0-SNAPSHOT: server_[run]_2020-12-04-200049.jfr.zip

Screenshot of the interesting part:
[screenshot: profiling results on 3.15.0-SNAPSHOT]

@lukaseder (Member)

Thanks for the feedback, @alacoste. I'll be looking into this further. There are a few other related issues I'm currently benchmarking, including #11099 and #8762. It seems the caching of fieldsRow is not effective for aliased tables, which are created afresh all the time, not shared like the generated tables. This also impacts the performance of the implicit join algorithm (see #8762).

@lukaseder (Member)

Looking into the JFR file now with VisualVM.

As mentioned elsewhere, we should absolutely try to avoid calling AbstractQueryPart::equals and, by consequence, DefaultDSLContext::renderInlined. That's a big task: when throwing an exception from AbstractQueryPart::equals, 84 integration tests fail, showing how often we transitively call this equals method.

Lower-hanging fruit might be to use a cached version of AbstractQueryPart::configuration, instead of creating a new one all the time, just for equals (!):

[profiler screenshot]

We already have a global instance in Tools.CTX, introduced with #6025 in jOOQ 3.10. We could re-use this instance.

@lukaseder (Member)

Will fix this in a separate issue: #11125.

@alacoste (Author)

Thanks again @lukaseder, please do let me know if I can be of assistance!

@lukaseder (Member)

Well, if you have time, a self-contained MCVE (we have a template here: https://github.com/jOOQ/jOOQ-mcve) for the reproducer here would be very useful: #11058 (comment)

Otherwise, I'll be able to look into generally improvable things, but maybe not the exact thing you're running into...

@lukaseder (Member)

For the record, I'll investigate all the locations where we call AbstractQueryPart::equals from our integration tests in a separate issue: #11126. Chances are, this will also cover your case, "by accident".

@lukaseder (Member) commented Dec 14, 2020

Without a full MCVE, you could put a breakpoint in AbstractQueryPart::equals, and report all the types of this and that that you encounter on your side to #11126

@alacoste (Author) commented Jan 14, 2021

Hi @lukaseder and sorry for the long radio silence over the holiday break!

I just checked again, since you already merged a couple of fixes, but our use case still exhibits the performance problem, so I finally got around to creating an MCVE :)

Here it is: https://github.com/nabla/jOOQ-mcve

I'm running it against a local postgres DB, and the setup is the following:

  • A table EXPERIENCE, of which I create 10 rows
  • A table MESSAGE with an EXPERIENCE_ID column, of which I create 20 rows per experience (100 total).
  • The other columns of MESSAGE are approximately what we have in our real use case, because I think without them the .into() slowness might not be as apparent, since there would be fewer columns to "match".
  • Repeat the perf measurement 5 times:
    • Perform our query using "lateral" to fetch the messages for the experiences (a DSL sketch follows below; the lateral normally serves to paginate messages inside each experience, but pagination was removed here as it was not necessary to reproduce the issue)
    • Perform Result.into(MESSAGE), measuring the time taken.

On my machine it takes ~100ms to perform Result.into(MESSAGE) for the 100 rows of the result set.
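For reference, the query is roughly of this shape in the jOOQ DSL (a sketch, not the MCVE's literal code; dsl is assumed to be the DSLContext, and the id list is shortened):

// Build the VALUES table of experience ids (shortened to 3 ids for brevity).
Table<Record1<Integer>> tmp = DSL
    .values(DSL.row(1), DSL.row(2), DSL.row(3))
    .as("tmp_table", "target_id");
Field<Integer> targetId = tmp.field("target_id", Integer.class);

// Cross join it with a LATERAL subquery fetching each experience's messages
// (pagination removed, as in the MCVE).
Result<Record> result = dsl
    .select(targetId)
    .select(MESSAGE.fields())
    .from(tmp)
    .crossJoin(DSL.lateral(
        DSL.selectFrom(MESSAGE)
           .where(MESSAGE.EXPERIENCE_ID.eq(targetId))))
    .fetch();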

As always let me know if I can be of any additional help :)
And thanks again for the attention you are dedicating to this issue!
