Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BatchStage.sort() #2469

Merged
merged 39 commits into from Sep 16, 2020
Merged

Conversation

MohamedMandouh
Copy link
Contributor

@MohamedMandouh MohamedMandouh commented Aug 27, 2020

Introduces sorting feature for batch use cases of hazelcast jet.
Usage:
BatchStage.sort() for sorting based on natural ordering of items.
BatchStage.sort(ComparatorEx<V> comparator) for sorting based on user-defined comparator.

Breaking change: this changes the Edge and EdgeDef serialization format, breaking the backwards compatibility of a suspended job if it resumes on a newer Jet version.

Checklist

  • Tags Set
  • Milestone Set
  • New public APIs have @Nonnull/@Nullable annotations
  • New public APIs have @since tags in Javadoc

…y-sorting

# Conflicts:
#	hazelcast-jet-core/src/main/java/com/hazelcast/jet/core/Edge.java
#	hazelcast-jet-core/src/main/java/com/hazelcast/jet/impl/execution/init/EdgeDef.java
@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

5 similar comments
@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

headItem = headObject;
}
if (headItem == null) {
return NO_PROGRESS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may already have made progress by consuming some items in previous while loop iterations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@cangencer
Copy link
Contributor

This also needs adding to /site/docs/api/stateful-transforms.md

}

/**
* Returns the comparator defined on this edge.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add a reference back to monoticOrder here

* @since 4.3
*/
@Nonnull
BatchStage<T> sort(@Nonnull ComparatorEx<T> comparator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be ? super T

import static com.hazelcast.jet.core.processor.Processors.sortPrepareP;


public class SortTransform<V> extends AbstractTransform {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use T here instead of V

import java.util.TreeSet;

public class SortPrepareP<V> extends AbstractProcessor {
private final SortedSet<V> set;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think using SortedSet is correct here. It will skip duplicates. PriorityQueue would be more correct.

@cangencer
Copy link
Contributor

@MohamedMandouh if possible, you should do partialSort in another PR to keep the scope small for this one.

@mtopolnik
Copy link
Contributor

Just stage.aggregate(topN(10)) will give you a sorted list as output, there is no need for any other upstream action. Theoretically, for optimization purposes, we could have every upstream processor limit its output to 10 items, and this could then become a higher-level stage operation that could be captured with apply.

@mtopolnik
Copy link
Contributor

verify

@mtopolnik mtopolnik merged commit 73fe09d into hazelcast:master Sep 16, 2020
@mtopolnik mtopolnik changed the title In-Memory Sorting BatchStage.sort() Sep 16, 2020
@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

5 similar comments
@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

@devOpsHazelcast
Copy link
Contributor

Can one of the admins verify this patch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants