In a distributed context, where the data is generally partitioned, it is not clear what exactly sorting should mean. In the case of terasort, it means that the data is partitioned by range and then sorted within each partition, so that reading the partitions in range order yields a total order. What are the alternatives to a total ordering, and what use cases do they solve? Hive has both ORDER BY and SORT BY. SORT BY sorts within each reducer only. ORDER BY has to be followed by a LIMIT clause (at least in strict mode), which makes it equivalent to top k, which plyrmr already has.
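To make the three notions concrete, here is a minimal single-process sketch in Python. It is not plyrmr, Hive, or terasort code; the function names (`choose_splitters`, `range_partition`, `total_order_sort`, `sort_within_partitions`, `top_k`) are hypothetical, and plain lists stand in for partitions. It assumes keys are comparable values and that range boundaries are picked from a random sample.

```python
import bisect
import heapq
import random


def choose_splitters(records, num_partitions, sample_size=100, seed=0):
    """Pick up to num_partitions - 1 boundary keys from a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    if not sample:
        return []  # no data: everything falls into a single partition
    step = len(sample) / num_partitions
    return [sample[int(i * step)] for i in range(1, num_partitions)]


def range_partition(records, splitters):
    """Route each record to the partition whose key range contains it."""
    partitions = [[] for _ in range(len(splitters) + 1)]
    for record in records:
        partitions[bisect.bisect_right(splitters, record)].append(record)
    return partitions


def total_order_sort(records, num_partitions):
    """Terasort-style: range partition, then sort each partition locally.

    Concatenating the partitions in order gives a globally sorted sequence.
    """
    splitters = choose_splitters(records, num_partitions)
    return [sorted(p) for p in range_partition(records, splitters)]


def sort_within_partitions(partitions):
    """SORT BY-style: each partition is sorted, with no global guarantee."""
    return [sorted(p) for p in partitions]


def top_k(partitions, k):
    """ORDER BY ... LIMIT k-style: local top k, then a final merge."""
    candidates = [x for p in partitions for x in heapq.nsmallest(k, p)]
    return heapq.nsmallest(k, candidates)
```

The difference shows up in what the caller may assume: `total_order_sort` returns partitions that can be read back to back, `sort_within_partitions` only orders records inside each partition, and `top_k` moves at most `k` records per partition to the final step.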