vec_split() #196
Comments
Or would it return a tibble? One row would be the keys ( |
This can do some pretty neat things, especially if using data frames as
suppressPackageStartupMessages({
  library(vctrs)
  library(gapminder)
  library(tidyr)
  library(dplyr)
})
#> Warning: package 'dplyr' was built under R version 3.5.2
vec_split <- function(x, f) {
  keys <- vec_unique(f)
  prxy <- vec_duplicate_id(f)
  prxy_keys <- vec_unique(prxy)
  values <- vctrs:::map(prxy_keys, function(prxy_key) {
    vec_slice(x, vec_equal(prxy, prxy_key))
  })
  tibble::tibble(.keys = keys, .values = values)
}
vec_split(iris, iris$Species)
#> # A tibble: 3 x 2
#>   .keys      .values
#>   <fct>      <list>
#> 1 setosa     <data.frame [50 × 5]>
#> 2 versicolor <data.frame [50 × 5]>
#> 3 virginica  <data.frame [50 × 5]>
vec_split(iris, 1)
#> # A tibble: 1 x 2
#>   .keys .values
#>   <dbl> <list>
#> 1     1 <data.frame [150 × 5]>
vec_split(iris, NA)
#> # A tibble: 1 x 2
#>   .keys .values
#>   <lgl> <list>
#> 1 NA    <data.frame [150 × 5]>
vec_split(iris, NULL)
#> Error: All columns in a tibble must be 1d or 2d objects:
#> * Column `.keys` is NULL
#> Backtrace:
#>     █
#>  1. └─global::vec_split(iris, NULL)
#>  2.   └─tibble::tibble(.keys = keys, .values = values)
#>  3.     └─tibble:::lst_to_tibble(xlq$output, .rows, .name_repair, lengths = xlq$lengths)
#>  4.       └─tibble:::check_valid_cols(x)
# split by unique combinations of 2 columns
gap_nest <- nest(gapminder, continent, country)
vec_split(gap_nest, gap_nest$data)
#> # A tibble: 142 x 2
#>    .keys            .values
#>    <list>           <list>
#>  1 <tibble [1 × 2]> <tibble [12 × 5]>
#>  2 <tibble [1 × 2]> <tibble [12 × 5]>
#>  3 <tibble [1 × 2]> <tibble [12 × 5]>
#>  4 <tibble [1 × 2]> <tibble [12 × 5]>
#>  5 <tibble [1 × 2]> <tibble [12 × 5]>
#>  6 <tibble [1 × 2]> <tibble [12 × 5]>
#>  7 <tibble [1 × 2]> <tibble [12 × 5]>
#>  8 <tibble [1 × 2]> <tibble [12 × 5]>
#>  9 <tibble [1 × 2]> <tibble [12 × 5]>
#> 10 <tibble [1 × 2]> <tibble [12 × 5]>
#> # … with 132 more rows
# or this way
vec_split(gapminder, select(gapminder, continent, country))
#> # A tibble: 142 x 2
#>    .keys$continent $country    .values
#>    <fct>           <fct>       <list>
#>  1 Asia            Afghanistan <tibble [12 × 6]>
#>  2 Europe          Albania     <tibble [12 × 6]>
#>  3 Africa          Algeria     <tibble [12 × 6]>
#>  4 Africa          Angola      <tibble [12 × 6]>
#>  5 Americas        Argentina   <tibble [12 × 6]>
#>  6 Oceania         Australia   <tibble [12 × 6]>
#>  7 Europe          Austria     <tibble [12 × 6]>
#>  8 Asia            Bahrain     <tibble [12 × 6]>
#>  9 Asia            Bangladesh  <tibble [12 × 6]>
#> 10 Europe          Belgium     <tibble [12 × 6]>
#> # … with 132 more rows
mat <- matrix(1:50, 10, 5)
mat_f <- rep(1:2, times = 5)
vec_split(mat, mat_f)
#> # A tibble: 2 x 2
#>   .keys .values
#>   <int> <list>
#> 1     1 <int [5 × 5]>
#> 2     2 <int [5 × 5]>
Created on 2019-02-27 by the reprex package (v0.2.1.9000)
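For anyone following how the three primitives in the reprex fit together, here is a small illustration on a toy character vector. The vector `f` and its values are made up for this example; only `vec_unique()`, `vec_duplicate_id()` and `vec_equal()` come from the code above.

```r
library(vctrs)

# Hypothetical grouping vector, chosen only for illustration
f <- c("b", "a", "b", "c", "a")

# Unique values, in order of first appearance
keys <- vec_unique(f)
# For every element, the location of its first occurrence in `f`
ids <- vec_duplicate_id(f)
# Logical mask selecting every element that belongs to the first group ("b")
in_first_group <- vec_equal(ids, 1L)

keys            # "b" "a" "c"
ids             # 1 2 1 4 2
in_first_group  # TRUE FALSE TRUE FALSE FALSE
```

The ids act as a proxy for `f`: comparing integers is enough to find group members, whatever type `f` has.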
And the fact that the implementation is so simple is a good signal that the underlying primitives are correct. At some point, we will need to rewrite it in C to avoid creating the internal dictionary twice (once for the unique values and once for the duplicates).
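Until that C rewrite exists, the double hashing of `f` could perhaps be avoided at the R level. The following is only a rough sketch, not the planned implementation: the name `vec_split_once()` is hypothetical, and it assumes `f` has the same size as `x` (so, unlike the reprex, a length-1 `f` is not recycled). It hashes `f` once with `vec_duplicate_id()` and recovers the keys by slicing rather than by a second `vec_unique(f)` call.

```r
library(vctrs)

# Hypothetical helper: assumes vec_size(f) == vec_size(x)
vec_split_once <- function(x, f) {
  # Single pass over `f`: location of the first occurrence of each element
  first <- vec_duplicate_id(f)
  # Distinct first-occurrence locations, in order of appearance (integers only)
  first_locs <- unique(first)
  # Keys are sliced out of `f` instead of hashing `f` a second time
  keys <- vec_slice(f, first_locs)
  # Row locations of each group, ordered to match `keys`
  locs <- unname(split(seq_along(first), factor(first, levels = first_locs)))
  values <- lapply(locs, function(i) vec_slice(x, i))
  tibble::tibble(.keys = keys, .values = values)
}

out <- vec_split_once(iris, iris$Species)
```

The grouping on the integer ids still does some work in base R, so this is no substitute for doing everything in one pass in C.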
But implemented more efficiently internally, and using vec_slice()