Group by adjacent values (i.e. no sort) #237
Comments
Not sure I understand what is needed here. I can't really make too much sense of the SO thread.
Basically we need a function with a good name that does this:
group <- function(x) cumsum(c(1, diff(x) != 0))
group(c(2,2,3,4,2))
i.e. it increments the group number every time the value changes. Or:
Rcpp::cppFunction("IntegerVector group(NumericVector x) {
  int n = x.size();
  IntegerVector y(n);
  if (n == 0) return y;  // guard: y[0] below would be out of bounds
  int grp = 1;
  y[0] = 1;
  for (int i = 1; i < n; ++i) {
    if (x[i] != x[i - 1]) grp++;  // value changed: start the next group
    y[i] = grp;
  }
  return y;
}")
but it needs to work for all basic vector types.
Thanks. I understand better now. Could this be one of these new ways to group? Something like:
This way we could then summarise, etc.
Ooh, yes, and it would be a really nice performance optimisation if the data was already sorted into groups.
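To illustrate why adjacency helps: when every group is a contiguous run, a per-group summary needs only one pass, with no hashing or sorting. This is a rough sketch of that idea, not how dplyr implements `summarise`; `sum_by_run` is a hypothetical name and it assumes `keys` and `values` have the same length.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: one (key, sum) pair per run of equal adjacent keys.
// Assumes keys.size() == values.size(); this is not checked.
template <typename K>
std::vector<std::pair<K, double>> sum_by_run(const std::vector<K>& keys,
                                             const std::vector<double>& values) {
    std::vector<std::pair<K, double>> out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            out.emplace_back(keys[i], 0.0);  // key changed: open a new run
        }
        out.back().second += values[i];      // accumulate into the current run
    }
    return out;
}
```

A key that reappears after a different key produces a second, separate entry, which is the "no sort" behaviour this issue asks for.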
Maybe
Some initial code:
As for Indeed one interesting thing with these is that for a given group, all data is adjacent. This could lead to interesting optimizations, essentially bringing back what we used to have when group_by used to arrange the data. I would not make these optimisations a priority though, as they are likely to need quite some code and care. Perhaps it is best to first experiment with various grouping strategies.
This doesn't seem so important to me now. We might come back to it again if we invest more in performance.
Is there a status or up-vote on this feature? I would like the ability to capture the next group in a group_by sequence and use that in a summarise or mutate. Syntax aside, it would be nice to get lead_by(.x, type = group) or lead(group_id + 1) to capture the next group. I have solved this in data.table with .I before, and on similarly shaped data frames with rle() and custom functions, but both get ugly switching back and forth.
What I have are long-running tasks where I'm trying to create a rollup date.
df <- data.frame(polling_date = c(rep(as.Date("2016-10-16"), 3),
Ideally I'd like to be able to lead to the next group and mutate a completed_date to look like below.
polling_date task_id completed_date
But a naive group_by() and lead() does not work.
df %>% group_by(polling_date) %>%
polling_date task_id completed_date
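The "lead to the next group" request can be sketched independently of any data frame library. The helper below is hypothetical (`lead_by_run` is not an existing function), works on a plain `std::vector`, and uses a caller-supplied `fill` value for the last run, which has no successor. It says nothing about the commenter's actual data, which is truncated above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: for each element, the key of the next run of equal
// adjacent keys; elements of the last run get `fill`.
template <typename T>
std::vector<T> lead_by_run(const std::vector<T>& keys, const T& fill) {
    std::vector<T> out(keys.size(), fill);
    // Walk backwards so the key of the following run is always known.
    T next = fill;
    for (std::size_t i = keys.size(); i-- > 0;) {
        if (i + 1 < keys.size() && keys[i] != keys[i + 1]) {
            next = keys[i + 1];  // crossed a run boundary
        }
        out[i] = next;
    }
    return out;
}
```

With made-up keys 1, 1, 1, 2, 2, 3 and a fill of -1, the result is 2, 2, 2, 3, 3, -1: every row sees the key of the run that follows its own.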
a la http://stackoverflow.com/questions/21511257. Used to work with only adjacent groups. Easy enough to do with cumsum() and diff(), but probably not very efficient, and definitely not expressive.