New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

last.default() isn't working properly for data.frames with indices as row names #226

Closed
DavisVaughan opened this Issue Mar 1, 2018 · 7 comments

Comments

Projects
None yet
2 participants
@DavisVaughan

DavisVaughan commented Mar 1, 2018

Super quick reprex:

library(xts)
ex <- data.frame(x = c(1,2), row.names = c("2017-01-01", "2017-01-02"))
last(ex)
#> Error in `[.data.frame`(x, ((NROW(x) - n + 1):NROW(x))): undefined columns selected

Theoretically, if a data.frame has row names that correspond to indices it can be converted to xts with no issue. Because of this, this block of the if statement in xts:::last.default() is run:

xx <- try.xts(x, error = FALSE)
if (is.xts(xx)) {
    xx <- last.xts(x, n = n, keep = keep, ...)
    return(reclass(xx))
}

However, inside last.xts(x, ...) the row-wise subsetting of xts is used. This causes problems with data.frames as you can see in the above error.

Is there any reason you can't just pass xx to last.xts() rather than x? The reclass() should bring it back to a data.frame() in the end. There may be issues with other objects that can be converted to xts that I am not immediately seeing.

So maybe this?

xx <- try.xts(x, error = FALSE)
if (is.xts(xx)) {
    xxx <- last.xts(xx, n = n, keep = keep, ...)
    return(reclass(xxx))
}
@DavisVaughan

This comment has been minimized.

Show comment
Hide comment
@DavisVaughan

DavisVaughan Mar 1, 2018

FWIW, I forked and changed that bit and your unit tests all still pass. However, I don't know if anything was specifically testing for this.

Yay

library(xts)
ex <- data.frame(x = c(1,2), row.names = c("2017-01-01", "2017-01-02"))
ex_last <- last(ex, n = "day")
ex_last
#>            x
#> 2017-01-02 2

class(ex_last)
#> [1] "data.frame"

DavisVaughan commented Mar 1, 2018

FWIW, I forked and changed that bit and your unit tests all still pass. However, I don't know if anything was specifically testing for this.

Yay

library(xts)
ex <- data.frame(x = c(1,2), row.names = c("2017-01-01", "2017-01-02"))
ex_last <- last(ex, n = "day")
ex_last
#>            x
#> 2017-01-02 2

class(ex_last)
#> [1] "data.frame"
@joshuaulrich

This comment has been minimized.

Show comment
Hide comment
@joshuaulrich

joshuaulrich Mar 2, 2018

Owner

Thanks for the report! I don't believe there are any unit tests for this... so I'll investigate the code and change history to see if I can uncover any lurking dragons.

Owner

joshuaulrich commented Mar 2, 2018

Thanks for the report! I don't believe there are any unit tests for this... so I'll investigate the code and change history to see if I can uncover any lurking dragons.

@DavisVaughan

This comment has been minimized.

Show comment
Hide comment
@DavisVaughan

DavisVaughan Mar 2, 2018

I appreciate it!

DavisVaughan commented Mar 2, 2018

I appreciate it!

@joshuaulrich

This comment has been minimized.

Show comment
Hide comment
@joshuaulrich

joshuaulrich Mar 2, 2018

Owner

Notes to future me: This first appears in xts in commit 17ca5ea, which appears to be a direct move from quantmod somewhere close to commit joshuaulrich/quantmod@8f9c1bc. Code was finally deleted in joshuaulrich/quantmod@5b2197c.

Owner

joshuaulrich commented Mar 2, 2018

Notes to future me: This first appears in xts in commit 17ca5ea, which appears to be a direct move from quantmod somewhere close to commit joshuaulrich/quantmod@8f9c1bc. Code was finally deleted in joshuaulrich/quantmod@5b2197c.

@joshuaulrich

This comment has been minimized.

Show comment
Hide comment
@joshuaulrich

joshuaulrich Mar 6, 2018

Owner

first.default() is also broken, but in an insidious way:

ex <- data.frame(x = 1:9, y = 9:1, row.names = format(Sys.Date()-9:1))
first(ex)
#            x
# 2018-02-25 1
# 2018-02-26 2
# 2018-02-27 3
# 2018-02-28 4
# 2018-03-01 5
# 2018-03-02 6
# 2018-03-03 7
# 2018-03-04 8
# 2018-03-05 9
first(as.xts(ex))
#            x y
# 2018-02-25 1 9
Owner

joshuaulrich commented Mar 6, 2018

first.default() is also broken, but in an insidious way:

ex <- data.frame(x = 1:9, y = 9:1, row.names = format(Sys.Date()-9:1))
first(ex)
#            x
# 2018-02-25 1
# 2018-02-26 2
# 2018-02-27 3
# 2018-02-28 4
# 2018-03-01 5
# 2018-03-02 6
# 2018-03-03 7
# 2018-03-04 8
# 2018-03-05 9
first(as.xts(ex))
#            x y
# 2018-02-25 1 9

@joshuaulrich joshuaulrich self-assigned this Mar 6, 2018

@joshuaulrich joshuaulrich added the bug label Mar 6, 2018

@DavisVaughan

This comment has been minimized.

Show comment
Hide comment
@DavisVaughan

DavisVaughan Mar 6, 2018

Seems to be from xx <- x[1:n] in first.xts(). So the same rowwise subsetting is the culprit.

I guess the "easiest" fix is to convert all x[1:n] style calls in first.xts and last.xts to the explicit x[1:n,] style. This would work on xts and data.frames. Not sure if you guys get performance boosts from using the x[1:n] style call though?

Update: Speed looks about the same.

library(xts)
data("sample_matrix")
ex <- as.xts(sample_matrix)

idx <- 1:30

identical(ex["2007-01"], ex[idx])
#> [1] TRUE

microbenchmark::microbenchmark(
  ex["2007-01",],
  ex["2007-01"],
  ex[idx,],
  ex[idx],
  times = 1000
)
#> Unit: microseconds
#>             expr     min       lq      mean  median       uq       max
#>  ex["2007-01", ] 652.028 713.3365 833.68885 752.507 856.4175  6895.690
#>    ex["2007-01"] 650.126 711.9830 900.25484 749.143 843.0755 74472.160
#>        ex[idx, ]  14.795  19.0500  25.52245  24.656  27.2795   122.928
#>          ex[idx]  14.997  18.8765  25.11938  24.213  26.7725   161.784
#>  neval
#>   1000
#>   1000
#>   1000
#>   1000

DavisVaughan commented Mar 6, 2018

Seems to be from xx <- x[1:n] in first.xts(). So the same rowwise subsetting is the culprit.

I guess the "easiest" fix is to convert all x[1:n] style calls in first.xts and last.xts to the explicit x[1:n,] style. This would work on xts and data.frames. Not sure if you guys get performance boosts from using the x[1:n] style call though?

Update: Speed looks about the same.

library(xts)
data("sample_matrix")
ex <- as.xts(sample_matrix)

idx <- 1:30

identical(ex["2007-01"], ex[idx])
#> [1] TRUE

microbenchmark::microbenchmark(
  ex["2007-01",],
  ex["2007-01"],
  ex[idx,],
  ex[idx],
  times = 1000
)
#> Unit: microseconds
#>             expr     min       lq      mean  median       uq       max
#>  ex["2007-01", ] 652.028 713.3365 833.68885 752.507 856.4175  6895.690
#>    ex["2007-01"] 650.126 711.9830 900.25484 749.143 843.0755 74472.160
#>        ex[idx, ]  14.795  19.0500  25.52245  24.656  27.2795   122.928
#>          ex[idx]  14.997  18.8765  25.11938  24.213  26.7725   161.784
#>  neval
#>   1000
#>   1000
#>   1000
#>   1000
@joshuaulrich

This comment has been minimized.

Show comment
Hide comment
@joshuaulrich

joshuaulrich Mar 7, 2018

Owner

I guess the "easiest" fix is to convert all x[1:n] style calls in first.xts and last.xts to the explicit x[1:n,] style. This would work on xts and data.frames. Not sure if you guys get performance boosts from using the x[1:n] style call though?

While hypotheses about various fixes are potentially helpful, it's even more helpful to have unit tests that validate the intended functionality. For example, see my commits on the 226-first-last branch. The tests for ts came from failing examples after my first attempted fix idea. So even though that fix didn't work, the attempt did provide a new unit test case.

Owner

joshuaulrich commented Mar 7, 2018

I guess the "easiest" fix is to convert all x[1:n] style calls in first.xts and last.xts to the explicit x[1:n,] style. This would work on xts and data.frames. Not sure if you guys get performance boosts from using the x[1:n] style call though?

While hypotheses about various fixes are potentially helpful, it's even more helpful to have unit tests that validate the intended functionality. For example, see my commits on the 226-first-last branch. The tests for ts came from failing examples after my first attempted fix idea. So even though that fix didn't work, the attempt did provide a new unit test case.

joshuaulrich added a commit that referenced this issue Jul 9, 2018

Add unit tests for first() and last()
These tests show that first() and last() work as expected for zoo
objects and vectors. The tests for data.frame and matrix objects are
deactivated because they either error or fail.

Note that first(x, -n) returns the same result as tail(x, -n), and
last(x, -n) returns the same result as head(x, -n).

See #226.

joshuaulrich added a commit that referenced this issue Jul 9, 2018

Do not drop dims from row subset
Make first() and last() keep dims when a regular row subset would
normally drop them. This is consistent with head() and tail().

Re-activate all unit tests for first() and last().

See #226.

@joshuaulrich joshuaulrich added this to the Release 0.11-0 milestone Jul 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment