Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize element_at for maps with complex type keys #7365

Closed
wants to merge 1 commit into from

Conversation

mbasmanova
Copy link
Contributor

Implementation of element_at(map, search) uses linear search to find matching
entry. This is slow when maps are large (100s of entries). This optimization
sorts map keys and uses binary search producing 4x speedup on a real-world
workload. The optimization applies only if there is a single map (constant or
dictionary encoded) since in this case the cost of sorting the map amortizes
across many searches.

There is a standard query optimization technique that generates element_at
over a single map.

A simple join with a small lookup table u:

SELECT * FROM t, u WHERE t.id = u.id

Can be written as

SELECT
   element_at(
      (SELECT map_agg(u.id) FROM u), t.id
   ) 
FROM t

This ensures that records from u will be collected into a single map and
broadcasted to all workers to be "joined" with t. The original join query may
run as partitioned join (if optimizer is not smart enough) and be very slow.

Copy link

netlify bot commented Nov 1, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 7c768af
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6543c73a335272000863f796

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 1, 2023
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@Yuhta
Copy link
Contributor

Yuhta commented Nov 1, 2023

How is this related to #7191? CC: @laithsakka

size_t offset = rawOffsets[mapIndex];
// Fast path for the case of a single map. It may be constant or dictionary
// encoded. Sort map keys, then use binary search.
if (baseMap->size() == 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laithsakka How does this criteria compare to your method to check identity between batches?

@mbasmanova Laith is using a method to check if map vector between batches are the same. Your method looks simpler and less error-prone, while his method covers more cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta I'm aware of Laith's work. This optimization is complimentary as it covers maps with complex-type keys while #7191 applies only to maps with primitive type keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, #7191 applies to cases where there are many maps, not just one.

Copy link
Contributor

@Yuhta Yuhta Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's possible to expand this one to all key types and use Laith's triggering criteria to detect MapVector identity. Then we can use one single optimization to cover all cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta @laithsakka Indeed, it should be straightforward to extend this optimization to primitive types. We can also optimize Nested Loop Join to detect the case of single-row build side and use Constant encoding for build-side columns + sort build-side maps. If we do that, we'll have maps coming to element_at already sorted and that will make this optimization even more effective.

Laith, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta @laithsakka There is another pattern where map is a constant defined in the query. In this case it is not produced by the NLJ. We can add logic to always sort constant map literals so that element_at receives maps sorted in that case too. If we do that, we won't need to make element_at stateful and can keep it stateless and relatively simple.

Copy link
Contributor

@laithsakka laithsakka Nov 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about both ideas initially and discussed with @oerling and @kevinwilfong .
My concern was that we have to sort for every batch because I do not have so much confidence
in the isSorted flag is reset whenever a map is updated. we can do the same game of holding a reference though to make sure the map is not updated and hence assert that if it was sorted in one batch it will always be sorted when received.

I do not have string opinion for either, maybe run this opt on the query i am working on also and check the results?
if you strongly feel this is better you can take over and abandon the other PR but also make sure that you double check with @oerling

@Yuhta
Copy link
Contributor

Yuhta commented Nov 1, 2023

If we just sort everything (not just complex typed keys) and binary search if the input is a single map, how does the result compare to the one in #7191?

@laithsakka
Copy link
Contributor

laithsakka commented Nov 1, 2023

one more thing i recall from the discussions with orri is that those base maps can be shared across threads, we need to add locks in sorting if we are doing that. in the functions . unless we sort pre before distributing the vector

@mbasmanova
Copy link
Contributor Author

If we just sort everything (not just complex typed keys) and binary search if the input is a single map, how does the result compare to the one in #7191?

@Yuhta I tried this on Laith's query and it didn't work very well. In his case, map has 70K entries and each batch has only 1K rows. It seems that sorting 70K entry map doesn't get amortized over 1K binary searches and cached hash table works better.

@@ -23,6 +23,22 @@

namespace facebook::velox::functions {

namespace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No anonymous namespace in header. Either put it in detail or the main class SubscriptImpl

const vector_size_t baseIndex;
const vector_size_t index;

static bool lessThen(const MapKey& left, const MapKey& right) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lessThan, or you can just overload operator< since it's a custom struct already

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 34aa5ac.

Copy link

Conbench analyzed the 1 benchmark run on commit 34aa5ac9.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants