Preserve "arrangement" in collapse and compute #2281
Comments
@hadley: I wonder if we should preserve the order after collapse() and compute().
Hmmmm, I think that's reasonable as row order is not fixed in database tables.
I would not expect compute() to preserve order (as databases don't have that concept). However, the documentation of collapse() doesn't seem to claim the result is landed as a table: "collapse doesn't force computation, but collapses a complex tbl into a form that additional restrictions can be placed on." I was reading "complex tbl" as shorthand for "complex tbl calculation." Wouldn't you want an invariant that adding compute() and collapse() doesn't change the semantics of a calculation? The workflow I am thinking of is grouped ranking.
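The grouped-ranking concern can be sketched at the SQL level. This is only a rough illustration in Python with the standard-library sqlite3 module, not the snippet from the comment: the table `d`, its columns `g` and `x`, and the temp table `landed` are hypothetical names, and modelling compute() as `CREATE TEMPORARY TABLE ... AS SELECT` is an assumption. It needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (g TEXT, x INTEGER)")
conn.executemany(
    "INSERT INTO d VALUES (?, ?)",
    [("a", 3), ("b", 5), ("a", 1), ("b", 2), ("a", 2)],
)

# Land a sorted intermediate result as a temp table (assumed stand-in for
# compute()). The ORDER BY here is not remembered by the new table.
conn.execute("CREATE TEMPORARY TABLE landed AS SELECT g, x FROM d ORDER BY g, x")

# Rank within each group. The ordering is restated inside the window, so the
# ranks do not depend on how the rows happen to be stored in `landed`.
ranked = conn.execute(
    """
    SELECT g, x, ROW_NUMBER() OVER (PARTITION BY g ORDER BY x) AS rnk
    FROM landed
    ORDER BY g, rnk
    """
).fetchall()

for row in ranked:
    print(row)
```

The point of the sketch is that the ranking stays well defined only because the order is spelled out again after the landing step, which is the information the thread proposes dplyr carry across collapse() and compute().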
I meant it's reasonable for dplyr to preserve the row order, since the database doesn't. It's just a matter of copying the ordering attribute.
Ah, sorry, I misread that as "it is reasonable to not have row-order." Actually, another thing I am clamoring for is to make the ordering attribute user-visible, like "Groups" is. Seeing that would really help users going forward.
Trying to land results with dplyr::collapse and dplyr::compute appears to lose dplyr::arrange ordering on PostgreSQL (this will probably happen on other DB backends too, including sparklyr). At the very least, the annotation that an ordering is present is lost (hence the warning messages in the example below), and it also seems likely that the order itself is lost (though that isn't obvious in this example).
Details here and pasted below.
Check durability of dplyr::arrange through dplyr::compute.
Notice below: no warnings in frame or runtime.
Notice below: warning "Warning: Windowed expression 'sum("ccol")' does not have explicit order.". Result may appear the same, but we do not seem to be able to depend on that.
Notice below: warning "Warning: Windowed expression 'sum("ccol")' does not have explicit order.". Result may appear the same, but we do not seem to be able to depend on that.
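Why a windowed sum without an explicit order cannot be relied on can be shown outside dplyr. The following is a minimal sketch in Python with the standard-library sqlite3 module, not the PostgreSQL session from this report: the table `t`, the column `id`, and the temp table `landed` are hypothetical, the `ccol` column name is borrowed from the warning text, and treating compute() as `CREATE TEMPORARY TABLE ... AS SELECT` is an assumption. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, ccol INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(3, 30), (1, 10), (2, 20)])

# Materialise a sorted query. The resulting table keeps no ordering metadata.
conn.execute("CREATE TEMPORARY TABLE landed AS SELECT id, ccol FROM t ORDER BY id")

# Running sum with no ORDER BY in the window: the per-row values follow
# whatever order the engine scans the rows in, which SQL does not guarantee.
implicit = conn.execute(
    """
    SELECT id,
           SUM(ccol) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run
    FROM landed
    """
).fetchall()

# Running sum with the order stated in the window: well defined regardless
# of storage order.
explicit = conn.execute(
    """
    SELECT id,
           SUM(ccol) OVER (ORDER BY id
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run
    FROM landed
    ORDER BY id
    """
).fetchall()

print("implicit:", implicit)
print("explicit:", explicit)
```

On a small table the two results will often look identical, which matches the observation above that the result "may appear the same"; only the second query is guaranteed by its text to produce the running sum in `id` order.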