ordering #19

Open
piccolbo opened this issue Mar 2, 2014 · 0 comments

piccolbo commented Mar 2, 2014

In a distributed context, where the data is generally partitioned, it's not clear exactly what it means to sort. In the case of terasort, it means that the data is partitioned by range and then sorted within each partition, so that reading the partitions in range order yields a total order. What are the alternatives to ordering, and what use cases do they solve? Hive has both an order by and a sort by. Sort by sorts within each reducer only. Order by (in strict mode) has to be followed by a limit clause, which makes it equivalent to top k, which plyrmr already has.
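
To make the three notions concrete, here is a minimal single-process sketch in Python. It is not plyrmr, Hive, or terasort code: the function names (`sample_boundaries`, `range_partition`, `total_order_sort`, `local_sort`, `top_k`) are hypothetical, partitions are plain lists standing in for reducers, and the boundary sampling is a simplified stand-in for what a real total-order partitioner would do.

```python
import bisect
import heapq
import random


def sample_boundaries(records, n_partitions, sample_size=100, seed=0):
    """Pick n_partitions - 1 split points from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    if not sample:
        return []
    return [sample[i * len(sample) // n_partitions] for i in range(1, n_partitions)]


def range_partition(records, boundaries):
    """Send each record to the partition whose key range contains it."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        partitions[bisect.bisect_right(boundaries, r)].append(r)
    return partitions


def total_order_sort(records, n_partitions):
    """Terasort-style: partition by range, then sort within each partition.

    Concatenating the partitions in order gives a globally sorted sequence.
    """
    boundaries = sample_boundaries(records, n_partitions)
    return [sorted(p) for p in range_partition(records, boundaries)]


def local_sort(records, n_partitions):
    """'sort by'-style: arbitrary (hash) partitioning, sorted per partition.

    Each partition is sorted, but there is no order across partitions.
    """
    partitions = [[] for _ in range(n_partitions)]
    for r in records:
        partitions[hash(r) % n_partitions].append(r)
    return [sorted(p) for p in partitions]


def top_k(partitions, k):
    """'order by ... limit k'-style: k smallest per partition, then merged."""
    candidates = [x for p in partitions for x in heapq.nsmallest(k, p)]
    return heapq.nsmallest(k, candidates)
```

Under these assumptions, only `total_order_sort` needs to know the key distribution up front (to choose boundaries); `local_sort` and `top_k` work with whatever partitioning the data already has, which may be why they are the cheaper primitives to offer.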
