set_intersect(), set_union(), set_diff(), and set_equal() #548

Closed
DavisVaughan opened this issue Aug 27, 2019 · 7 comments · Fixed by #1755
Labels
feature a feature request or enhancement

Comments

@DavisVaughan
Member

I really could have used these when doing some Date set comparisons, since base R support is not great. As the example shows, intersect() drops the Date class and returns the underlying day count.

intersect(as.Date("2019-01-01"), as.Date(c("2019-01-01", "2019-01-02")))
#> [1] 17897

Created on 2019-08-27 by the reprex package (v0.2.1)

@lionel- mentioned that it may make sense to use this new namespaced naming scheme rather than vec_intersect(), vec_union(), vec_set_diff(), and vec_set_equal().

@DavisVaughan
Member Author

DavisVaughan commented Aug 27, 2019

First implementation pass. Open to improvement!

library(vctrs)

set_union <- function(x, y) {
  vec_unique(vec_c(x, y))
}

set_diff <- function(x, y) {
  x_in_y <- vec_in(x, y)
  vec_slice(x, !x_in_y)
}

set_intersect <- function(x, y) {
  loc_in_y <- vec_match(x, y)
  loc_in_y <- vec_slice(loc_in_y, !is.na(loc_in_y))
  vec_slice(y, loc_in_y)
}

set_equal <- function(x, y) {
  x_in_y <- vec_in(x, y)
  all_x_in_y <- all(x_in_y)
  
  if (all_x_in_y) {
    y_in_x <- vec_in(y, x)
    all_y_in_x <- all(y_in_x)
    all_y_in_x
  } else {
    FALSE
  }
}

set_union(1:5, 5:10)
#>  [1]  1  2  3  4  5  6  7  8  9 10
set_union(1:5, c(5, 5, 6))
#> [1] 1 2 3 4 5 6

set_diff(1:5, 1:5)
#> integer(0)
set_diff(1:5, 1:2)
#> [1] 3 4 5
set_diff(1, 2:3)
#> [1] 1
set_diff(1, 1:2)
#> numeric(0)

set_intersect(1:5, 1:6)
#> [1] 1 2 3 4 5
set_intersect(1, 2)
#> numeric(0)
set_intersect(1:3, 1:2)
#> [1] 1 2

set_equal(1:5, 1:5)
#> [1] TRUE
set_equal(1:6, 1:5)
#> [1] FALSE
set_equal(1:5, 1:6)
#> [1] FALSE

Created on 2019-08-27 by the reprex package (v0.2.1)
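
For the Date case that motivated this issue, a minimal sketch reusing the first-pass set_intersect() from this comment suggests the class is kept, since vec_slice() returns a vector of the same type as its input. The variable names `x`, `y`, and `common` are illustrative only.

```r
library(vctrs)

# First-pass definition from this comment, repeated so the example is self-contained.
set_intersect <- function(x, y) {
  loc_in_y <- vec_match(x, y)
  loc_in_y <- vec_slice(loc_in_y, !is.na(loc_in_y))
  vec_slice(y, loc_in_y)
}

x <- as.Date("2019-01-01")
y <- as.Date(c("2019-01-01", "2019-01-02"))

# Unlike the base intersect() call shown earlier, the result stays a Date.
common <- set_intersect(x, y)
class(common)
#> [1] "Date"
```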

@lionel-
Member

lionel- commented Oct 21, 2019

The implementation of set_intersect() would be simpler with a nomatch parameter (cf #568):

set_intersect <- function(x, y) {
  vctrs::vec_slice(x, match(y, x, 0))
}

Also if we keep the names we probably should be consistently slicing the LHS.

@lionel-
Member

lionel- commented Nov 7, 2019

Should not return duplicate elements, as in:

setdiff(c(1, 1), 0)
#> [1] 1

But then if we keep the names, which ones do we keep? Maybe that's why we shouldn't keep them. Alternatively, the set could be made of name-value pairs, but will that be surprising or error prone?
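
One possible way to get the base-R-like behaviour for duplicates, sketched under the assumption that names are not a concern, is to wrap the first-pass set_diff() in vec_unique(). This is an illustration, not the implementation that was eventually merged.

```r
library(vctrs)

# Sketch only: drop elements of `x` found in `y`, then remove duplicates.
set_diff <- function(x, y) {
  x_in_y <- vec_in(x, y)
  vec_unique(vec_slice(x, !x_in_y))
}

set_diff(c(1, 1), 0)
#> [1] 1
```

The naming question raised above is left open here; the sketch makes no attempt to decide which names should survive.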

@lionel-
Member

lionel- commented May 8, 2020

set_intersect() needs to sort the matched locations to preserve the order of the first input; see r-lib/tidyselect#186.
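
A rough sketch of that idea, assuming the result should follow the order of `x` and contain no duplicates: match `y` into `x`, drop the non-matches, then sort the unique locations before slicing the left-hand side. The intermediate name `loc` is hypothetical, and this is not necessarily how the final implementation works.

```r
library(vctrs)

# Sketch only: slice the LHS at sorted, de-duplicated match locations.
set_intersect <- function(x, y) {
  loc <- vec_match(y, x)            # first location in `x` of each element of `y`
  loc <- loc[!is.na(loc)]           # drop elements of `y` that are absent from `x`
  vec_slice(x, sort(unique(loc)))   # sorting restores the order of `x`
}

set_intersect(c(3, 1, 2, 1), c(1, 3))
#> [1] 3 1
```

Because vec_match() reports the first matching location, repeated values in `x` collapse to a single position, so the result is also free of duplicates.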

@DavisVaughan
Member Author

Should these go in funs?

@lionel-
Member

lionel- commented May 11, 2021

I think so too.

@DavisVaughan
Member Author

Moved to tidyverse/funs#67
