Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

last.default() isn't working properly for data.frames with indices as row names #226

Closed
DavisVaughan opened this issue Mar 1, 2018 · 7 comments
Assignees
Labels
bug

Comments

@DavisVaughan
Copy link

@DavisVaughan DavisVaughan commented Mar 1, 2018

Super quick reprex:

library(xts)
ex <- data.frame(x = c(1,2), row.names = c("2017-01-01", "2017-01-02"))
last(ex)
#> Error in `[.data.frame`(x, ((NROW(x) - n + 1):NROW(x))): undefined columns selected

Theoretically, if a data.frame has row names that correspond to indices it can be converted to xts with no issue. Because of this, this block of the if statement in xts:::last.default() is run:

xx <- try.xts(x, error = FALSE)
if (is.xts(xx)) {
    xx <- last.xts(x, n = n, keep = keep, ...)
    return(reclass(xx))
}

However, inside last.xts(x, ...) the row-wise subsetting of xts is used. This causes problems with data.frames as you can see in the above error.

Is there any reason you can't just pass xx to last.xts() rather than x? The reclass() should bring it back to a data.frame() in the end. There may be issues with other objects that can be converted to xts that I am not immediately seeing.

So maybe this?

xx <- try.xts(x, error = FALSE)
if (is.xts(xx)) {
    xxx <- last.xts(xx, n = n, keep = keep, ...)
    return(reclass(xxx))
}
@DavisVaughan
Copy link
Author

@DavisVaughan DavisVaughan commented Mar 1, 2018

FWIW, I forked and changed that bit and your unit tests all still pass. However, I don't know if anything was specifically testing for this.

Yay

library(xts)
ex <- data.frame(x = c(1,2), row.names = c("2017-01-01", "2017-01-02"))
ex_last <- last(ex, n = "day")
ex_last
#>            x
#> 2017-01-02 2

class(ex_last)
#> [1] "data.frame"
@joshuaulrich
Copy link
Owner

@joshuaulrich joshuaulrich commented Mar 2, 2018

Thanks for the report! I don't believe there are any unit tests for this... so I'll investigate the code and change history to see if I can uncover any lurking dragons.

@DavisVaughan
Copy link
Author

@DavisVaughan DavisVaughan commented Mar 2, 2018

I appreciate it!

@joshuaulrich
Copy link
Owner

@joshuaulrich joshuaulrich commented Mar 2, 2018

Notes to future me: This first appears in xts in commit 17ca5ea, which appears to be a direct move from quantmod somewhere close to commit joshuaulrich/quantmod@8f9c1bc. Code was finally deleted in joshuaulrich/quantmod@5b2197c.

@joshuaulrich
Copy link
Owner

@joshuaulrich joshuaulrich commented Mar 6, 2018

first.default() is also broken, but in an insidious way:

ex <- data.frame(x = 1:9, y = 9:1, row.names = format(Sys.Date()-9:1))
first(ex)
#            x
# 2018-02-25 1
# 2018-02-26 2
# 2018-02-27 3
# 2018-02-28 4
# 2018-03-01 5
# 2018-03-02 6
# 2018-03-03 7
# 2018-03-04 8
# 2018-03-05 9
first(as.xts(ex))
#            x y
# 2018-02-25 1 9
@joshuaulrich joshuaulrich self-assigned this Mar 6, 2018
@joshuaulrich joshuaulrich added the bug label Mar 6, 2018
@DavisVaughan
Copy link
Author

@DavisVaughan DavisVaughan commented Mar 6, 2018

Seems to be from xx <- x[1:n] in first.xts(). So the same rowwise subsetting is the culprit.

I guess the "easiest" fix is to convert all x[1:n] style calls in first.xts and last.xts to the explicit x[1:n,] style. This would work on xts and data.frames. Not sure if you guys get performance boosts from using the x[1:n] style call though?

Update: Speed looks about the same.

library(xts)
data("sample_matrix")
ex <- as.xts(sample_matrix)

idx <- 1:30

identical(ex["2007-01"], ex[idx])
#> [1] TRUE

microbenchmark::microbenchmark(
  ex["2007-01",],
  ex["2007-01"],
  ex[idx,],
  ex[idx],
  times = 1000
)
#> Unit: microseconds
#>             expr     min       lq      mean  median       uq       max
#>  ex["2007-01", ] 652.028 713.3365 833.68885 752.507 856.4175  6895.690
#>    ex["2007-01"] 650.126 711.9830 900.25484 749.143 843.0755 74472.160
#>        ex[idx, ]  14.795  19.0500  25.52245  24.656  27.2795   122.928
#>          ex[idx]  14.997  18.8765  25.11938  24.213  26.7725   161.784
#>  neval
#>   1000
#>   1000
#>   1000
#>   1000
@joshuaulrich
Copy link
Owner

@joshuaulrich joshuaulrich commented Mar 7, 2018

I guess the "easiest" fix is to convert all x[1:n] style calls in first.xts and last.xts to the explicit x[1:n,] style. This would work on xts and data.frames. Not sure if you guys get performance boosts from using the x[1:n] style call though?

While hypotheses about various fixes are potentially helpful, it's even more helpful to have unit tests that validate the intended functionality. For example, see my commits on the 226-first-last branch. The tests for ts came from failing examples after my first attempted fix idea. So even though that fix didn't work, the attempt did provide a new unit test case.

joshuaulrich added a commit that referenced this issue Jul 9, 2018
These tests show that first() and last() work as expected for zoo
objects and vectors. The tests for data.frame and matrix objects are
deactivated because they either error or fail.

Note that first(x, -n) returns the same result as tail(x, -n), and
last(x, -n) returns the same result as head(x, -n).

See #226.
joshuaulrich added a commit that referenced this issue Jul 9, 2018
Make first() and last() keep dims when a regular row subset would
normally drop them. This is consistent with head() and tail().

Re-activate all unit tests for first() and last().

See #226.
@joshuaulrich joshuaulrich added this to the Release 0.11-0 milestone Jul 30, 2018
joshuaulrich added a commit that referenced this issue Jul 4, 2019
The fix for #226 introduced a regression for the case when 'x' is an
xtsible vector and 'n' is a character period (e.g. "1 day"). This
occurred because the default methods for first() and last() call the
xts methods--on the *original* object--if 'x' can be coerced to xts.

For example, a call to
    first(.Date(1:3), "1 day")
would call
    first.xts(.Date(1:3), "1 day")

This was a problem because first.xts() and last.xts() assumed 'x'
would always have dims.

Fixes #303.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.