better operators for interval-to-interval relations #6

mgacc0 · 2016-12-06T22:37:08Z

It would be really great to implement some new operators for interval-to-interval overlapping.

Instead of having the %[o]% operator for checking the overlapping between the closed intervals [a, b] and [x, y]
expressed as c(a, b) %[o]% c(x, y),
it would be great having the %[]o[]% operator
expressed as c(a, b) %[]o[]% c(x, y).

And then, for example:

c(1, 2) %[]o[]% c(2, 3)
# TRUE

c(2, 3) %[]o[]% c(1, 2)
# TRUE

c(1, 2) %()o()% c(2, 3)
# FALSE

c(2, 3) %()o()% c(1, 2)
# FALSE

Consequently, that would imply creating 32 operators (for all the combinations of open/closed endpoints):

interval-to-interval operator
`%[]o[]%`
`%[]o[)%`
`%[]o(]%`
`%[]o()%`
`%[)o[]%`
`%[)o[)%`
`%[)o(]%`
`%[)o()%`
... (continues)

The text was updated successfully, but these errors were encountered:

psolymos · 2016-12-07T01:22:03Z

It crossed my mind to implement these, however, here is why I did not in the 1st go round:

I found the %[]o[]% notation too long (almost looks like a TIE fighter), but it is kind of unavoidable.
It is a lot of functions (4x4=16) that needs a lot of unit-tests.
I did not see an immediate use case.

Out of these, point 3 is the most important, the other 2 are just stylistic/technical details. Do you have any suggestions on point 3?

The symmetric cases could be shortened, e.g. %[]o[]%=%[o]%, %()o()%=%(o)%.

mgacc0 · 2016-12-07T08:39:30Z

The notation would be a bit long but, anyway, is intuitive (and quite equivalent to the mathematical notation).
So that would make it easy to adopt.
They are 16 functions. (I don't know why I wrote 32 previously...)
In administrative data processing we use a lot of datetime ranges.
And it's very usual to write them as:
"from 01.12.2016 to 08.12.2016",
meaning the semiopen interval ["2016-12-01 00:00:00", "2016-12-08 00:00:00").

It would be a bit tedious:
- having to write ["2016-12-01 00:00:00", "2016-12-07 23:59:59"]
- or having to make two comparisons: one for overlapping with closed ranges and the next to check for the coincidence of endpoints

mgacc0 · 2016-12-07T09:39:56Z

I propose something like this:

.intrval3 <-
  function(interval1, type1, interval2, type2)
  {
    ab <- .get_intrval(interval1)
    cd <- .get_intrval(interval2)
    
    type1 <- match.arg(type1, c("[]", "[)", "(]", "()"))
    type2 <- match.arg(type2, c("[]", "[)", "(]", "()"))

    if (ab$a > cd$a) {
      tmp <- ab
      ab <- cd
      cd <- tmp

      tmp <- type1
      type1 <- type2
      type2 <- tmp
    }

    ! .lssthan(ab$b, cd,
               ifelse(substr(type1, 2L, 2L)=="]" & substr(type2, 1L, 1L)=="[",
                      "[", "("))
  }

.intrval3(c(1, 2), "()", c(2, 3), "()")
#  [1] FALSE

.intrval3(c(1, 2), "[]", c(2, 3), "[)")
#  [1] TRUE

psolymos · 2016-12-07T16:43:14Z

OK, I will make a branch and start development with your proposal.

psolymos · 2016-12-08T02:55:37Z

@mgacc0 see here what I have so far in overlap-operators branch: https://github.com/psolymos/intrval/blob/overlap-operators/extras/playground.R

psolymos · 2016-12-08T04:22:13Z

@mgacc0 I could test your function and it passed all the comparisons I came up with. The only issue right now is that it works for simple case but not yet vectorized (i.e. cannot accept matrices, lists because of the if (iv1$a > iv2$a) {...} statement).

mgacc0 · 2016-12-08T21:44:02Z

Firstly, a necessary correction: checking for empty intervals.

is_empty_interval <- function(interval, type) {
  interval[[1]]==interval[[2]] & type=="()"
}

overlaps <- function (interval1, interval2, type1, type2)
{
    stopifnot(is.vector(interval1) && length(interval1)<=2)
    stopifnot(is.vector(interval2) && length(interval2)<=2)

    iv1 <- .get_intrval(interval1)
    iv2 <- .get_intrval(interval2)

    type1 <- match.arg(type1, c("[]", "[)", "(]", "()"))
    type2 <- match.arg(type2, c("[]", "[)", "(]", "()"))

    if (is_empty_interval(iv1, type1) | is_empty_interval(iv2, type2)) {
      return(FALSE)
    }

    
    if (iv1$a > iv2$a) {
        tmp <- iv1
        iv1 <- iv2
        iv2 <- tmp

        tmp <- type1
        type1 <- type2
        type2 <- tmp
    }

    !.lssthan(iv1$b, iv2,
        ifelse(substr(type1, 2L, 2L)=="]" && substr(type2, 1L, 1L)=="[",
        "[", "("))
}

I would rename the .intrval3 function to overlaps and make it visible.

For easier testing, I would recommend the testthat package:

library(testthat)
expect_false(overlaps(c(2, 4), c(2, 2), "[]", "()"))
expect_false(overlaps(c(2, 2), c(2, 4), "()", "[]"))

For calling over dataframes, you could do something like this:

library(dplyr, purrr)

mm <- expand.grid(
  iv1_a=c(1, 2, 3, 4, 5),
  iv1_b=c(1, 2, 3, 4, 5),
  iv2_a=c(1, 2, 3, 4, 5),
  iv2_b=c(1, 2, 3, 4, 5),
  type1=c("[]", "[)", "(]", "()"),
  type2=c("[]", "[)", "(]", "()"),
  stringsAsFactors=FALSE)
nrow(mm)
# [1] 10000

mm %>%
  sample_n(4)
#      iv1_a iv1_b iv2_a iv2_b type1 type2
# 694      4     4     3     1    [)    []
# 612      2     3     5     5    []    []
# 1320     5     4     3     1    (]    []
# 6273     3     5     1     1    (]    (]

mm %>%
  sample_n(10) %>%
  by_row(~ overlaps(c(.[[1]], .[[2]]), c(.[[3]], .[[4]]),
                    .[[5]], .[[6]]), .collate="cols")
# # A tibble: 10 × 7
#    iv1_a iv1_b iv2_a iv2_b type1 type2  .out
#    <dbl> <dbl> <dbl> <dbl> <chr> <chr> <lgl>
# 1      5     2     2     2    (]    []  TRUE
# 2      1     4     4     5    ()    () FALSE
# 3      3     5     2     5    ()    []  TRUE
# 4      4     2     4     3    (]    (]  TRUE
# 5      5     3     2     1    (]    (] FALSE
# 6      2     3     2     4    (]    []  TRUE
# 7      2     1     4     2    ()    [) FALSE
# 8      5     2     1     4    (]    (]  TRUE
# 9      4     1     2     5    [)    []  TRUE
# 10     4     2     3     5    (]    [)  TRUE

mm %>%
  .[, 1:4] %>%
  sample_n(10) %>%
  by_row(~ c(.[[1]], .[[2]]) %[]o[]% c(.[[3]], .[[4]]), .collate="cols")
# # A tibble: 10 × 5
#    iv1_a iv1_b iv2_a iv2_b  .out
#    <dbl> <dbl> <dbl> <dbl> <lgl>
# 1      5     3     3     4  TRUE
# 2      3     1     2     2  TRUE
# 3      3     2     5     4 FALSE
# 4      3     2     2     4  TRUE
# 5      2     3     4     2  TRUE
# 6      3     4     2     5  TRUE
# 7      3     3     3     3  TRUE
# 8      3     3     4     1  TRUE
# 9      1     5     2     3  TRUE
# 10     2     5     2     4  TRUE

The notation for single vectors (intervals) is intuitive.
But I think that the intrval library doesn't need to provide an operator for data.frames.
You could place an example (on the README) of using the function purrr::by_row and overlaps.

Anyway, if you want to know how a function for data.frames would be:

intrval4 <- function (df1)
{
  stopifnot(is.data.frame(df1))
  stopifnot(nrow(df1)==1)

  iv1 <- .get_intrval(c(df1$iv1_a, df1$iv1_b))
  iv2 <- .get_intrval(c(df1$iv2_a, df1$iv2_b))

  type1 <- match.arg(df1$type1, c("[]", "[)", "(]", "()"))
  type2 <- match.arg(df1$type2, c("[]", "[)", "(]", "()"))

  if (is_empty_interval(iv1, type1) | is_empty_interval(iv1, type1)) {
    return (FALSE)
  }

  if (iv1$a > iv2$a) {
    tmp <- iv1
    iv1 <- iv2
    iv2 <- tmp

    tmp <- type1
    type1 <- type2
    type2 <- tmp
  }

  !.lssthan(iv1$b, iv2,
            ifelse(substr(type1, 2L, 2L)=="]" && substr(type2, 1L, 1L)=="[",
                   "[", "("))
}

mm %>%
  sample_n(20) %>%
  by_row(intrval4, .collate=c("cols"))

psolymos · 2016-12-10T03:22:59Z

Thanks. I just wanted to make the operators consistent with the %[o]% cases that can accept different formats on both ends and recycle values if needed. I agree it is not absolutely necessary for all use cases, but still nice to have.

I like your testing approach, but want to avoid too much dependencies. Even if I ignore /tests for CRAN versions, CI will complain about missing Suggests declarations.

psolymos · 2016-12-10T04:49:13Z

This function seems to do the job:

.intrval3 <-
function(interval1, interval2, type1, type2)
{
    iv1 <- .get_intrval(interval1)
    iv2 <- .get_intrval(interval2)

    type1 <- match.arg(type1, c("[]", "[)", "(]", "()"))
    type2 <- match.arg(type2, c("[]", "[)", "(]", "()"))

    b1 <- ifelse(iv1$a < iv2$a, iv1$b, iv2$b)
    a2 <- ifelse(iv1$a < iv2$a, iv2$a, iv1$a)
    type1v <- ifelse(iv1$a < iv2$a, substr(type1, 2L, 2L), substr(type2, 2L, 2L))
    type2v <- ifelse(iv1$a < iv2$a, substr(type2, 1L, 1L), substr(type1, 1L, 1L))

    ifelse(type1v == "]" & type2v == "[",
        b1 >= a2,
        b1 > a2)
}

The idea is similar to your proposal:

sort intervals based on lower endpoints (a1 < a2)
compare b1 to a2 based on open/closed endpoints.

mgacc0 · 2016-12-10T16:18:48Z

Could you have a look at ../pull/13 ?

psolymos · 2016-12-22T00:27:10Z

@mgacc0 I merged the overlap-operators branch, see comments on #15

Next steps include more testing and improving code coverage (it dropped because right now only internals are tested).

psolymos added the enhancement label Dec 7, 2016

psolymos closed this as completed Jan 22, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

better operators for interval-to-interval relations #6

better operators for interval-to-interval relations #6

mgacc0 commented Dec 6, 2016

psolymos commented Dec 7, 2016

mgacc0 commented Dec 7, 2016 •

edited

mgacc0 commented Dec 7, 2016 •

edited

psolymos commented Dec 7, 2016

psolymos commented Dec 8, 2016

psolymos commented Dec 8, 2016

mgacc0 commented Dec 8, 2016

psolymos commented Dec 10, 2016

psolymos commented Dec 10, 2016

mgacc0 commented Dec 10, 2016

psolymos commented Dec 22, 2016

better operators for interval-to-interval relations #6

better operators for interval-to-interval relations #6

Comments

mgacc0 commented Dec 6, 2016

psolymos commented Dec 7, 2016

mgacc0 commented Dec 7, 2016 • edited

mgacc0 commented Dec 7, 2016 • edited

psolymos commented Dec 7, 2016

psolymos commented Dec 8, 2016

psolymos commented Dec 8, 2016

mgacc0 commented Dec 8, 2016

psolymos commented Dec 10, 2016

psolymos commented Dec 10, 2016

mgacc0 commented Dec 10, 2016

psolymos commented Dec 22, 2016

mgacc0 commented Dec 7, 2016 •

edited

mgacc0 commented Dec 7, 2016 •

edited