Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partition Pruning - SQL Part [HZ-2515] #24813

Merged

Conversation

ivanthescientist
Copy link
Contributor

@ivanthescientist ivanthescientist commented Jun 14, 2023

Fixes HZ-2515

@hazelcast hazelcast deleted a comment from hz-devops-test Jul 17, 2023
Copy link
Member

@Fly-Style Fly-Style left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I have nothing to add/fix for MVP version, I am approving this.

}
allPartitions = partitionAssignment.values().stream()
.flatMapToInt(a -> Arrays.stream(a))
.sorted()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fly-Style sorted increases complexity from O(N) to O(NlogN) N - number of partitions. Any comments on that? TDD update? Big clusters have lots of partitions and this change impacts every Jet job.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implementation was quick and simple for demo but also important assumption changes here, allPartitions javadoc has to be updated. Also localPartitions no longer may contain all local partitions. What is the impact of that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorted increases complexity from O(N) to O(NlogN) N - number of partitions. Any comments on that?

You wrote this, can't comment it much. I will remove sorted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also localPartitions no longer may contain all local partitions.

Nothing wrong happens here from "contain all local partitions" perspective.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrote this

this was quick fix for the demo to be reviewed by you if it is correct and makes sense. If you did not change it I assumed that you analyzed it and determined that it is correct - right?

I will remove sorted.

You restored it (BTW, we assume that some arrays are sorted even though it is not documented as guaranteed), so my question is still valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that you avoided analysis of the impact or other solutions, but at least now it does not increase runtime for non-pruned jobs with large partitions numbers.
I will close this comment when this is described in TDD.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will close this comment when this is described in TDD

It is not related to design, it's implementation detail. Why is it worth describing it in TDD?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

increased complexity is not a small deal (everything > O(N) should be treated with care). When using member pruning with thousands of partitions this could be a place that limits performace. It should be at least mentioned that some code in case of partition pruning has worse than linear runtime.

@hazelcast hazelcast deleted a comment from hz-devops-test Jul 17, 2023
@hazelcast hazelcast deleted a comment from hz-devops-test Jul 17, 2023
@hazelcast hazelcast deleted a comment from hz-devops-test Jul 17, 2023
@hazelcast hazelcast deleted a comment from hz-devops-test Jul 17, 2023
@hazelcast hazelcast deleted a comment from hz-devops-test Jul 17, 2023
@hazelcast hazelcast deleted a comment from hz-devops-test Jul 17, 2023
@hazelcast hazelcast deleted a comment from hz-devops-test Jul 17, 2023
Signed-off-by: Sasha Syrotenko <oleksandr.syrotenko@hazelcast.com>
@hazelcast hazelcast deleted a comment from hz-devops-test Jul 17, 2023
@hazelcast hazelcast deleted a comment from hz-devops-test Jul 18, 2023
@hazelcast hazelcast deleted a comment from hz-devops-test Jul 19, 2023
@Fly-Style Fly-Style requested a review from k-jamroz July 19, 2023 12:57
@@ -108,6 +108,9 @@ public class CreateTopLevelDagVisitor extends CreateDagVisitorBase<Vertex> {

private final DagBuildContextImpl dagBuildContext;

private Integer requiredRootPartitionId;
private Object coordinatorPartitioningKey;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fly-Style requiredRootPartitionId and coordinatorPartitioningKey will be embedded in the DAG, so will be part of cached plan. However, when partition migration happens they may no longer belong to local member and the job will crash.
I am afraid that we have to determine this dynamically, we cannot give up SQL plan caching.

Copy link
Collaborator

@k-jamroz k-jamroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved in order to unblock other PRs, however problem with plan caching and migrations has to be fixed.

@Fly-Style Fly-Style merged commit 2aea4a1 into hazelcast:master Jul 21, 2023
8 checks passed
@ivanthescientist ivanthescientist deleted the partition_pruning_sql_draft branch July 24, 2023 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants