Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefilter join build side when it's too large #22667

Merged
merged 1 commit into from
May 15, 2024

Conversation

kaikalur
Copy link
Contributor

@kaikalur kaikalur commented May 3, 2024

Description

Optimize the build side of join using the distinct keys from left when the right (and left too) are large.

Motivation and Context

SELECT .. FROM T1 JOIN T2 USING(x)

can be very slow/memory intense when T2 is big (and T1 is also big). So the idea is to do something like dynamic filter except on the build side! So the above query becomes:

SELECT ... FROM T1 LEFT JOIN (SELECT * FROM T2 WHERE x IN (SELECT DISTINCT x FROM T1)) T2 USING(x)

This has helped us tremendously in some of our production workloads. So making it an optimization.

Impact

Test Plan

Added tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

Added a new optimization for prefiltering the build side of a join with distinct keys from the probe side.  This can be enabled with the ``join_prefilter_build_side `` session property. :pr:`22667`

     join_prefilter_build_side

@kaikalur kaikalur requested review from jaystarshot, feilong-liu and a team as code owners May 3, 2024 23:43
@kaikalur kaikalur requested a review from presto-oss May 3, 2024 23:43
@kaikalur kaikalur force-pushed the prefilter branch 6 times, most recently from 0eea294 to 6b56668 Compare May 5, 2024 03:22
@agrawaldevesh
Copy link
Contributor

Awesome ! Is this strictly opt in or can it be hbo or cbo'd ?

Is there also a way to fail the select distinct early if its cardinality is too big ?

Finally, I did not follow: can it be applied to multiple equi join keys too ?

Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

@kaikalur kaikalur force-pushed the prefilter branch 4 times, most recently from 0c67ec3 to 8cf9cdc Compare May 5, 2024 22:09
Copy link
Contributor

@ClarenceThreepwood ClarenceThreepwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you share some performance numbers that you see in your workloads? Maybe even add a SqlBenchmark that showcases this optimization?

IIUC - this optimization reduces the size of the hash table that is built out of T2. In order to do this it adds a second table scan on T1 and then builds a second hash table to compute the distinct join key from T1. I'm curious where the benefit comes from? Is it just the improved performance of the semijoin?

Any thoughts on how this can be used in practice? I ask because this is not a cost based decision and can degrade performance in many usecases

Copy link
Contributor

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, if we already know the distinct keys in T1, why not just make it as the build side? No need to calculate the hash values, just use the distinct values as the hash values. This way there is no need to scan T1 twice.

@steveburnett
Copy link
Contributor

suggest minor revision of the release notes entry

== RELEASE NOTES ==

General Changes
* Add optimization for prefiltering the build side of a join with distinct keys from the probe side. This can be enabled with the ``join_prefilter_enabled`` session property. :pr:`22667`

@kaikalur
Copy link
Contributor Author

kaikalur commented May 9, 2024

Can you share some performance numbers that you see in your workloads? Maybe even add a SqlBenchmark that showcases this optimization?

IIUC - this optimization reduces the size of the hash table that is built out of T2. In order to do this it adds a second table scan on T1 and then builds a second hash table to compute the distinct join key from T1. I'm curious where the benefit comes from? Is it just the improved performance of the semijoin?

Any thoughts on how this can be used in practice? I ask because this is not a cost based decision and can degrade performance in many usecases

Two potential cases - a) build side is very big and only a few keys actually match so we shuffle a lot less right side, b) after the semijoin, the build side becomes small enough to broadcast which can eliminate shuffling the full left side which could have a lot of payload.

@kaikalur
Copy link
Contributor Author

kaikalur commented May 9, 2024

I'm curious, if we already know the distinct keys in T1, why not just make it as the build side? No need to calculate the hash values, just use the distinct values as the hash values. This way there is no need to scan T1 twice.

You need to get the rest of the fields!

@kaikalur kaikalur force-pushed the prefilter branch 2 times, most recently from 2db1b6e to 74cf802 Compare May 9, 2024 15:06
@kaikalur
Copy link
Contributor Author

kaikalur commented May 9, 2024

Can you share some performance numbers that you see in your workloads? Maybe even add a SqlBenchmark that showcases this optimization?
IIUC - this optimization reduces the size of the hash table that is built out of T2. In order to do this it adds a second table scan on T1 and then builds a second hash table to compute the distinct join key from T1. I'm curious where the benefit comes from? Is it just the improved performance of the semijoin?
Any thoughts on how this can be used in practice? I ask because this is not a cost based decision and can degrade performance in many usecases

Two potential cases - a) build side is very big and only a few keys actually match so we shuffle a lot less right side, b) after the semijoin, the build side becomes small enough to broadcast which can eliminate shuffling the full left side which could have a lot of payload.

Added benchmark with:

select count(1) from part join lineitem using (partkey) where part.name like '%x%'

Original:
          join_prefilter_build_side :: 2158.693 cpu ms :: 4.17MB peak memory :: in 75.2K,      0B,   34.8K/s,      0B/s :: out   381,  26.7KB,     176/s,  12.4KB/s
With optimization:
          join_prefilter_build_side :: 2189.438 cpu ms :: 2.02MB peak memory :: in 90.2K,      0B,   41.2K/s,      0B/s :: out   381,  26.7KB,     174/s,  12.2KB/s

See the memory reduction
@kaikalur
Copy link
Contributor Author

kaikalur commented May 9, 2024

Add a task for making it cost based - needs tracking some new stats in hbo:

https://fburl.com/2aui8j1i

@kaikalur
Copy link
Contributor Author

@ClarenceThreepwood - can you take a look when you get a chance?

Copy link
Contributor

@ClarenceThreepwood ClarenceThreepwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the release note with the new name of the session property

@ClarenceThreepwood
Copy link
Contributor

Add a task for making it cost based - needs tracking some new stats in hbo:

https://fburl.com/2aui8j1i

This is meta internal only?

@kaikalur
Copy link
Contributor Author

Add a task for making it cost based - needs tracking some new stats in hbo:
https://fburl.com/2aui8j1i

This is meta internal only?

Oops. Sorry. here correct link:

#22706

@kaikalur
Copy link
Contributor Author

Please update the release note with the new name of the session property

Done

@kaikalur
Copy link
Contributor Author

addressed comments

@ClarenceThreepwood
Copy link
Contributor

ClarenceThreepwood commented May 14, 2024

Please update the release note with the new name of the session property

Done

It still has the old name here
"Added a new optimization for prefiltering the build side of a join with distinct keys from the probe side. This can be enabled with the join_prefilter_enabled session property. :pr:22667"

@kaikalur
Copy link
Contributor Author

Please update the release note with the new name of the session property

Done

It still has the old name here Added a new optimization for prefiltering the build side of a join with distinct keys from the probe side. This can be enabled with the join_prefilter_enabled session property. :pr:22667

Please update the release note with the new name of the session property

Done

It still has the old name here Added a new optimization for prefiltering the build side of a join with distinct keys from the probe side. This can be enabled with the join_prefilter_enabled session property. :pr:22667

OK for real this time lol - damn scrollbar!

@kaikalur
Copy link
Contributor Author

OK all comments addressed (again)

@kaikalur kaikalur requested a review from pranjalssh May 15, 2024 00:03
@kaikalur kaikalur merged commit 20f6640 into prestodb:master May 15, 2024
56 checks passed
@wanglinsong wanglinsong mentioned this pull request Jun 25, 2024
36 tasks
@yingsu00
Copy link
Contributor

I'm curious, if we already know the distinct keys in T1, why not just make it as the build side? No need to calculate the hash values, just use the distinct values as the hash values. This way there is no need to scan T1 twice.

You need to get the rest of the fields!

@kaikalur Could you explain a bit more? If we just make T1 the build side, we can still get the rest of the fields, can't we? For example, select part.partkey, lineitem.quantity from part join lineitem using (partkey) where part.name like '%x%' if we make part the build side, the dynamic filter on partkey can be propergated to lineitem, and part.partkey, lineitem.quantity can be output as well. It's still inner equijoin so the semantics should not change. Does the SQL standard say T2 must be the build side in T1 join T2?

@kaikalur
Copy link
Contributor Author

I'm curious, if we already know the distinct keys in T1, why not just make it as the build side? No need to calculate the hash values, just use the distinct values as the hash values. This way there is no need to scan T1 twice.

You need to get the rest of the fields!

@kaikalur Could you explain a bit more? If we just make T1 the build side, we can still get the rest of the fields, can't we? For example, select part.partkey, lineitem.quantity from part join lineitem using (partkey) where part.name like '%x%' if we make part the build side, the dynamic filter on partkey can be propergated to lineitem, and part.partkey, lineitem.quantity can be output as well. It's still inner equijoin so the semantics should not change. Does the SQL standard say T2 must be the build side in T1 join T2?

No but that's a join reordering problem?

@yingsu00
Copy link
Contributor

No but that's a join reordering problem?

Yes, that's what I was asking. Can't we just reorder the join? Or does the SQL standard say the table on the right side of the JOIN keyword has to be the build side? It seems to me the join order is a decision of the engine, isn't it?

@feilong-liu
Copy link
Contributor

No but that's a join reordering problem?

Yes, that's what I was asking. Can't we just reorder the join? Or does the SQL standard say the table on the right side of the JOIN keyword has to be the build side? It seems to me the join order is a decision of the engine, isn't it?

My understanding is that it works when the probe side has large payload to output. For example:
select T1.k, T1.map, T1.row, T1.array, T2.k, T2.v from T1 Join T2 using(k)
here T1.map, T1.row, T1.array are large columns, if we put T1 on build side, we will pay a large memory cost to store these payloads, which can OOM.

@yingsu00
Copy link
Contributor

My understanding is that it works when the probe side has large payload to output. For example:
select T1.k, T1.map, T1.row, T1.array, T2.k, T2.v from T1 Join T2 using(k)
here T1.map, T1.row, T1.array are large columns, if we put T1 on build side, we will pay a large memory cost to store these payloads, which can OOM.

@feilong-liu Thanks for your message. It makes sense. Maybe we can keep the pointers instead of copying all these fields to the hash table in that case.

@feilong-liu
Copy link
Contributor

My understanding is that it works when the probe side has large payload to output. For example:
select T1.k, T1.map, T1.row, T1.array, T2.k, T2.v from T1 Join T2 using(k)
here T1.map, T1.row, T1.array are large columns, if we put T1 on build side, we will pay a large memory cost to store these payloads, which can OOM.

@feilong-liu Thanks for your message. It makes sense. Maybe we can keep the pointers instead of copying all these fields to the hash table in that case.

Keep pointers sounds an interesting idea. Can you elaborate more? One challenge I can think of is that, even we keep pointers, we still need to read these payload and attach to the result later, which still need to load the payload from memory.

@yingsu00
Copy link
Contributor

yingsu00 commented Jul 3, 2024

The payload is already in memory from the incoming pages. But you're right, if both sides are too big to be held in memory, then the side that produces smaller hash table (including the output payload) should be the build side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants