Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time of day subsetting by hour only giving unexpected result #326

Closed
claymoremarshall opened this issue Feb 10, 2020 · 2 comments · Fixed by #327
Closed

Time of day subsetting by hour only giving unexpected result #326

claymoremarshall opened this issue Feb 10, 2020 · 2 comments · Fixed by #327
Milestone

Comments

@claymoremarshall
Copy link

claymoremarshall commented Feb 10, 2020

The recent time of day subsetting changes have generated some unexpected results.

library(quantstrat)  # get some intraday bar data
.from='2002-10-21'
.to='2002-10-31'
getSymbols.FI(Symbols='GBPUSD',
              dir=system.file('extdata',package='quantstrat'),
              from=.from, to=.to
              , extension = 'rda'
              , use_identifier=NA
)

Subsetting by just hour used to work, but now gives incorrect results, and also seems to still depend on indexTZ:


head(GBPUSD["T02/T03"])
# Open   High    Low  Close Volume
# 2002-10-22 1.5429 1.5430 1.5429 1.5430      0
# 2002-10-23 1.5464 1.5464 1.5464 1.5464      0
# 2002-10-24 1.5473 1.5473 1.5472 1.5472      0
# 2002-10-25 1.5544 1.5544 1.5544 1.5544      0
# 2002-10-28 1.5474 1.5474 1.5474 1.5474      0
# 2002-10-29 1.5583 1.5583 1.5583 1.5583      0
# Warning message:
#   'indexTZ' is deprecated.
# Use 'tzone' instead.
# See help("Deprecated") and help("xts-deprecated"). 

But we do at least get expected results if including the minutes (but still have the indexTZ issue)

head(GBPUSD["T02:00/T03:00"])
# Open   High    Low  Close Volume
# 2002-10-21 02:00:00 1.5480 1.5480 1.5480 1.5480      0
# 2002-10-21 02:01:00 1.5480 1.5480 1.5480 1.5480      0
# 2002-10-21 02:02:00 1.5479 1.5480 1.5479 1.5480      0
# 2002-10-21 02:03:00 1.5482 1.5482 1.5482 1.5482      0
# 2002-10-21 02:04:00 1.5481 1.5482 1.5481 1.5482      0
# 2002-10-21 02:05:00 1.5481 1.5482 1.5481 1.5482      0
# Warning message:
#   'indexTZ' is deprecated.
# Use 'tzone' instead.
# See help("Deprecated") and help("xts-deprecated"). 

Happy to work on a PR to fix this issue if it hasn't been identified yet?

@joshuaulrich
Copy link
Owner

joshuaulrich commented Feb 11, 2020

Thanks for the report! You are the first to identify and report this bug. Some background: this functionality used to be handled by .parseISO8601(), but is now handled by .subsetTimeOfDay().

I just pushed a change to replace the indexTZ() call with tzone(), so that warning should go away. Thanks for catching that too.

It would be great if you could try to fix this, and even better if you add unit tests to cover this case. As you suggested in an email, it would make sense to add some intraday data to the package to use for examples and tests. Similar to how we use sample_matrix.

I imagine there are going to be several edge cases we need to cover. So please don't hesitate to discuss them here before spending time coding a solution.

@claymoremarshall
Copy link
Author

claymoremarshall commented Feb 17, 2020

A couple of edge cases related to a "standard" time of day subset, e.g.

x["T02:00/T03:00"]

might be:

  1. x["T0200/T0300"] (currently fails because the as.POSIXct conversion in timestringToSeconds fails for "1970-01-01 0200").

Does the subsetting want to support this, or enforce users to do "T02:00/T03:00"?

  1. Do you want to support full support for odd edge cases such as:
    i) x["T02/T02"] to return all data for the second hour
    ii) Just throwing it out there, but would support for x["T02"] to return all data with 02:xx:xx timestamps (which would be same as subsetting with .indexhour(x) == 2)? I'd be inclined not to permit this, but maybe there would be some value in it

In the case of just supplying the hours, would you now expect
x["T02/T03"] to be equivalent to x[.indexhour(x) %in% c(2,3) or x[.indexhour(x) ==2 | .indexhour(x) == 3 & .indexmin(x) == 0 & .indexsec(x) == 0]. IIRC xts supported the former previously. If we use the former, we'll have to modify timeStringToSeconds for the start and end supplied timepoints.

claymoremarshall pushed a commit to claymoremarshall/xts that referenced this issue Feb 19, 2020
e.g.  subsetting with "T01/T03" does not currently give expected
results.

This requires fixing .subsetTimeOfDay()

Unit tests also updated for time of day subsetting.

Fixes joshuaulrich#326
joshuaulrich added a commit to claymoremarshall/xts that referenced this issue Sep 7, 2020
Remove support for non-zero-padded minutes and seconds. The ISO 8601
standard requires all time components be zero-padded. That said, we
do support hours that aren't zero-padded, for convenience.

Also make the validation regex stricter. The first digit for minutes
and seconds cannot be > 5. Separate regex into multiple pieces, to
make them easier to understand later.

See joshuaulrich#326. See joshuaulrich#327.
joshuaulrich added a commit to claymoremarshall/xts that referenced this issue Sep 7, 2020
The initial change for this fix only supports the extended format.
(hh:mm:ss.sss). We supported the basic format (hhmmss.sss) previously.
This restores the original functionality, making the colon optional.

I noticed how lastof() was being used to get the last timestamp for
the time string, and thought it would be good to find the first
timestamp in a similar manner. The getTimeComponents() function
extracts the hours, minutes, seconds, and sub-seconds into a list
that we can use with firstof() and lastof().

Note that firstof() returns the first timestamp in 1970, and lastof()
returns a timestamp at the end of 1970. The difference in years,
months, and days does not matter because we're only concerned with the
time of day components.

See joshuaulrich#326. See joshuaulrich#327.
joshuaulrich pushed a commit to claymoremarshall/xts that referenced this issue Sep 7, 2020
e.g.  subsetting with "T01/T03" does not currently give expected
results.

This requires fixing .subsetTimeOfDay()

Unit tests also updated for time of day subsetting.

Fixes joshuaulrich#326
joshuaulrich added a commit to claymoremarshall/xts that referenced this issue Sep 7, 2020
Remove support for non-zero-padded minutes and seconds. The ISO 8601
standard requires all time components be zero-padded. That said, we
do support hours that aren't zero-padded, for convenience.

Also make the validation regex stricter. The first digit for minutes
and seconds cannot be > 5. Separate regex into multiple pieces, to
make them easier to understand later.

See joshuaulrich#326. See joshuaulrich#327.
joshuaulrich added a commit to claymoremarshall/xts that referenced this issue Sep 7, 2020
The initial change for this fix only supports the extended format.
(hh:mm:ss.sss). We supported the basic format (hhmmss.sss) previously.
This restores the original functionality, making the colon optional.

I noticed how lastof() was being used to get the last timestamp for
the time string, and thought it would be good to find the first
timestamp in a similar manner. The getTimeComponents() function
extracts the hours, minutes, seconds, and sub-seconds into a list
that we can use with firstof() and lastof().

Note that firstof() returns the first timestamp in 1970, and lastof()
returns a timestamp at the end of 1970. The difference in years,
months, and days does not matter because we're only concerned with the
time of day components.

See joshuaulrich#326. See joshuaulrich#327.
joshuaulrich pushed a commit that referenced this issue Sep 7, 2020
e.g.  subsetting with "T01/T03" does not currently give expected
results.

This requires fixing .subsetTimeOfDay()

Unit tests also updated for time of day subsetting.

Fixes #326
joshuaulrich added a commit that referenced this issue Sep 7, 2020
Remove support for non-zero-padded minutes and seconds. The ISO 8601
standard requires all time components be zero-padded. That said, we
do support hours that aren't zero-padded, for convenience.

Also make the validation regex stricter. The first digit for minutes
and seconds cannot be > 5. Separate regex into multiple pieces, to
make them easier to understand later.

See #326. See #327.
joshuaulrich added a commit that referenced this issue Sep 7, 2020
The initial change for this fix only supports the extended format.
(hh:mm:ss.sss). We supported the basic format (hhmmss.sss) previously.
This restores the original functionality, making the colon optional.

I noticed how lastof() was being used to get the last timestamp for
the time string, and thought it would be good to find the first
timestamp in a similar manner. The getTimeComponents() function
extracts the hours, minutes, seconds, and sub-seconds into a list
that we can use with firstof() and lastof().

Note that firstof() returns the first timestamp in 1970, and lastof()
returns a timestamp at the end of 1970. The difference in years,
months, and days does not matter because we're only concerned with the
time of day components.

See #326. See #327.
@joshuaulrich joshuaulrich added this to the 0.12-1 milestone Sep 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants