Think about rowwise grouping #270

Closed
hadley opened this issue Feb 19, 2014 · 8 comments
Labels
feature a feature request or enhancement
Milestone

Comments

@hadley
Member

hadley commented Feb 19, 2014

library(dplyr)

# Build a grouped_df in which every row is its own group, by filling in
# the grouping attributes directly (dplyr's grouped_df internals at the time).
rowwise <- function(df) {
  n <- nrow(df)

  attr(df, "indices") <- as.list(seq_len(n) - 1) # 0-based row index for each group
  attr(df, "drop") <- FALSE
  attr(df, "group_sizes") <- rep(1L, n)          # every group has exactly one row
  attr(df, "biggest_group_size") <- 1L
  attr(df, "labels") <- data.frame(row = seq_len(n))
  attr(df, "vars") <- list(quote(row)) # list(substitute(bootstrap(m)))
  class(df) <- c("grouped_df", "tbl_df", "tbl", "data.frame")

  df
}

# Not useful for vectorised functions, but can be useful for
# non-vectorised ones
summarise(rowwise(mtcars), vs + am)
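To make the comment above concrete, here is a minimal base-R sketch that assumes nothing about dplyr internals. `vs + am` is vectorised, so computing it row by row gives exactly the same answer as computing it on whole columns. A function that uses `if` on a single value is not vectorised, and it only works when applied one row at a time, which is what per-row groups would give `summarise()`. The helper `gear_label()` is hypothetical, made up for illustration.

```r
# Vectorised: one call over whole columns ...
whole <- mtcars$vs + mtcars$am
# ... gives the same result as one call per row, so rowwise grouping adds nothing.
per_row <- mapply(function(v, a) v + a, mtcars$vs, mtcars$am)

# Hypothetical non-vectorised helper: `if` needs a single TRUE/FALSE,
# so it only makes sense for one value at a time.
gear_label <- function(g) {
  if (g > 3) "many" else "few"
}

# gear_label(mtcars$gear) is an error in R >= 4.2.0 (condition of length > 1;
# earlier versions only warn and use the first element).
# Applying it one row at a time works:
labels <- vapply(mtcars$gear, gear_label, character(1))
```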
@hadley hadley added this to the v0.2 milestone Mar 17, 2014
@hadley
Member Author

hadley commented Mar 26, 2014

@romainfrancois any thoughts on implementing this? We could either abuse the existing grouping mechanism (as in the code example) or create a new class.

I think the main use case will be in conjunction with do(), so that you can work with data frames whose columns are lists containing (e.g.) linear models.
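As a rough base-R illustration of the kind of data frame meant here (not how `do()` is implemented), one can fit one model per value of `cyl` and store the fits in a list column. Working with such a column then needs one call per row, with `[[` to reach each model. The names `by_cyl`, `models` and `rsq` are chosen for illustration.

```r
# Split mtcars into one data frame per cylinder count (levels sorted: 4, 6, 8).
by_cyl <- split(mtcars, mtcars$cyl)

# One row per group; `mod` is a list column holding an lm fit in each cell.
models <- data.frame(cyl = as.numeric(names(by_cyl)))
models$mod <- lapply(by_cyl, function(d) lm(mpg ~ vs, data = d))

# Each cell has to be pulled out with [[ before it can be summarised.
rsq <- vapply(seq_len(nrow(models)),
              function(i) summary(models$mod[[i]])$r.squared,
              numeric(1))
```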

@romainfrancois
Member

Abusing the existing grouping mechanism is cheap, and most functions will work (although we might need to be careful about vars).

Conceptually, it might be interesting to structure the data differently so that internally we know there is only one index per group, but implementing versions of the functions that take advantage of that knowledge might be quite an investment of time.
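A small base-R sketch of the "one index per group" point. It only mimics the 0-based `indices` attribute from the first comment and does not touch dplyr. For an ordinary grouping, each group holds a vector of row indices of varying length; for rowwise grouping, every group holds exactly one index, its own row number, so the list carries no information beyond `seq_len(n)`.

```r
n <- nrow(mtcars)

# Ordinary grouping by cyl: 0-based row indices per group, of varying sizes.
grouped_idx <- split(seq_len(n) - 1L, mtcars$cyl)
lengths(grouped_idx)  # groups "4", "6", "8" have 11, 7 and 14 rows

# Rowwise grouping: group i contains only row i - 1 (0-based).
rowwise_idx <- as.list(seq_len(n) - 1L)
all(lengths(rowwise_idx) == 1L)                  # TRUE
identical(unlist(rowwise_idx), seq_len(n) - 1L)  # TRUE: the list is redundant
```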

Perhaps at least we could do:

class(df) <- c("rowwise_df", "grouped_df", "tbl_df", "tbl", "data.frame")

so that we can adapt do() to it.

@hadley
Member Author

hadley commented Mar 26, 2014

I think we also need to specialise summarise for that case (so that we can conceptually use [[ instead of [ internally), but everything else is probably the same. That is, it should be possible to do:

models <- mtcars %.% group_by(cyl) %.% do(mod = lm(mpg ~ vs, data = .))
models %.% summarise(rsq = summary(mod)$r.squared)

Currently you need:

models %.% summarise(rsq = summary(mod[[1]])$r.squared)
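The `[` versus `[[` distinction this relies on can be seen in base R alone. In the sketch below, `mods` is just a plain list of fits standing in for the `mod` column. `[` keeps the list wrapper, so `summary()` would never see the model, while `[[` returns the `lm` object itself. That is why `summary(mod)` has to be written as `summary(mod[[1]])` when each group holds exactly one row.

```r
# A list of fitted models, one per cylinder count (stands in for the `mod` column).
mods <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ vs, data = d))

one_row <- mods[1]    # `[` returns a list of length 1 that contains the model
one_fit <- mods[[1]]  # `[[` returns the model itself

class(one_row)  # "list"
class(one_fit)  # "lm"
summary(one_fit)$r.squared
```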

@romainfrancois
Member

mod can only be expanded to mod[[1]] when we know there is only one row per group, so rowwise and do need a way to pass this information on.

Here, do would create a data frame with 3 rows, right? One for each value of cyl?

@hadley
Copy link
Member Author

hadley commented Mar 26, 2014

Right. We could either add a flag or just use the additional class, as you suggested.

@romainfrancois
Member

I've put some code in so that:

> models <- mtcars %.% group_by(cyl) %.% do(mod = lm(mpg ~ vs, data = .) )
>
> rowwise(models) %>% mutate( rsq = summary(mod)$r.squared )
Source: local data frame [3 x 3]
Groups: cyl

  cyl     mod         rsq
1   4 <S3:lm> 0.002381953
2   6 <S3:lm> 0.281055142
3   8 <S3:lm> 0.000000000
> rowwise(models) %>% summarise( rsq = summary(mod)$r.squared )
Source: local data frame [3 x 1]

          rsq
1 0.002381953
2 0.281055142
3 0.000000000

With:

> rowwise
function(data){
  structure( data, class = c("rowwise_df", "tbl_df", "data.frame") )
}
<environment: namespace:dplyr>
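The benefit of a dedicated class is that generics can dispatch on it. The following is a minimal base-R sketch of that idea. `rowwise_sketch()` and `summarise_rows()` are hypothetical stand-ins, not dplyr functions. The `rowwise_df` method calls `f` once per row and extracts every column with `[[`, so a list-column cell arrives as the model itself. The `data.frame` method instead hands `f` whole columns.

```r
# Hypothetical stand-in for rowwise(): just prepend a marker class.
rowwise_sketch <- function(data) {
  class(data) <- c("rowwise_df", class(data))
  data
}

# Hypothetical generic, not dplyr's summarise().
summarise_rows <- function(.data, f) UseMethod("summarise_rows")

# Plain data frame: f sees whole columns (a list column stays a list).
summarise_rows.data.frame <- function(.data, f) f(as.list(.data))

# Rowwise: f sees one row at a time, each column pulled out with [[.
summarise_rows.rowwise_df <- function(.data, f) {
  vapply(seq_len(nrow(.data)), function(i) {
    row <- lapply(as.list(.data), function(col) col[[i]])
    f(row)
  }, numeric(1))
}

# Usage: a list column of lm fits, one per cylinder count.
models <- data.frame(cyl = c(4, 6, 8))
models$mod <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ vs, data = d))

# Rowwise: summary() receives each lm object directly, no [[1]] needed.
rsq <- summarise_rows(rowwise_sketch(models), function(r) summary(r$mod)$r.squared)

# Plain data frame: f receives the whole list column.
n_models <- summarise_rows(models, function(cols) length(cols$mod))  # 3
```

Dispatching on the class keeps the per-row behaviour out of the generic itself, which matches the idea of adapting individual verbs to `rowwise_df`.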

@romainfrancois
Member

rowwise should probably respect existing groupings a bit more. I'm not sure what to do with the other verbs: at the moment, verb( rowwise(data) ) does verb( ungroup(data) ), which might not be ideal.

But most of the internals already know about rowwise:

> mtcars %>% rowwise() %>% summarise( a = n() )
Source: local data frame [32 x 1]

   a
1  1
2  1
3  1
4  1
5  1
6  1
7  1
8  1
9  1
10 1
11 1
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1
22 1
23 1
24 1
25 1
26 1
27 1
28 1
29 1
30 1
31 1
32 1

@hadley
Member Author

hadley commented Apr 16, 2014

Thanks - this looks great! I'll fill in the missing pieces and then we should be good to submit to CRAN :)

@hadley hadley closed this as completed Apr 16, 2014
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018
2 participants