Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission tsbox: Class-Agnostic Time Series #464

Open
10 of 11 tasks
christophsax opened this issue Sep 17, 2021 · 29 comments
Open
10 of 11 tasks

Submission tsbox: Class-Agnostic Time Series #464

christophsax opened this issue Sep 17, 2021 · 29 comments

Comments

@christophsax
Copy link

@christophsax christophsax commented Sep 17, 2021

Submitting Author Name: Christoph Sax
Submitting Author Github Handle: @christophsax
Repository: https://github.com/christophsax/tsbox
Version submitted: 3.1.9001
Submission type: Stats
Badge grade: silver
Editor: @rkillick
Reviewers: @chamberlinc, @brunaw

Due date for @chamberlinc: 2021-11-23

Due date for @brunaw: 2021-11-23
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: tsbox
Type: Package
Title: Class-Agnostic Time Series
Version: 0.3.1.9001
Authors@R: person("Christoph", "Sax", email = "christoph.sax@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-7192-7044"))
Description: Time series toolkit with identical behavior for all
  time series classes: 'ts','xts', 'data.frame', 'data.table', 'tibble', 'zoo',
  'timeSeries', 'tsibble', 'tis' or 'irts'. Also converts reliably between these classes.
Imports:
    data.table (>= 1.10.0),
    anytime
Suggests:
    testthat,
    dplyr,
    tibble,
    tidyr,
    forecast,
    seasonal,
    dygraphs,
    xts,
    ggplot2,
    scales,
    knitr,
    rmarkdown,
    tsibble (>= 0.8.2),
    tsibbledata,
    tibbletime,
    tseries,
    units,
    zoo,
    tis,
    timeSeries,
    nycflights13,
    imputeTS,
    spelling
License: GPL-3
Encoding: UTF-8
URL: https://www.tsbox.help, https://github.com/christophsax/tsbox
BugReports: https://github.com/christophsax/tsbox/issues
Roxygen: list(markdown = TRUE, roclets = c ("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.1.2
VignetteBuilder: knitr
Depends:
    R (>= 2.10)
Config/testthat/parallel: true
Config/testthat/edition: 3
Language: en-US

Pre-submission Inquiry

  • A pre-submission inquiry has been approved in issue#457

General Information

  • Who is the target audience and what are scientific applications of this package?

Anyone who works with time series. Many statistical packages require time series to be in a certain object (ts, xts, tsibble, data.frame). tsbox facilitates the conversion between these objects. It also provides a general toolkit that works the same way with all time series classes. {tsbox} is also mentioned in the rOpenSci Statistical Software Peer Review Section on Time Series.

  • Paste your responses to our General Standard G1.1 here, describing whether your software is:

    • The first implementation of a novel algorithm; or
    • The first implementation within R of an algorithm which has previously been implemented in other languages or contexts; or
    • An improvement on other implementations of similar algorithms in R.

In the rOpenSci classification, this package is An improvement on other implementations of similar algorithms in R. Many time series packages, e.g., zoo or tsibble contain converter functions from one class to another. They often convert from their class to ts objects and back, but lack converters to other time series class.

In most cases, tsbox transforms an object into an augmented data.table. And uses the data.table infrastructure for efficient joining and reshaping. After computation, it restores the original input class. This restoring feature is
was also used in the xts::reclass() function of the xts package.

  • Please include hyperlinked references to all other relevant software.

data.table: For efficient joining and reshaping
xts: Similar reclassing mechanism
tsibble: Tidy Temporal Data Frames and Tools

Not applicable.

Badging

I probably need some advice on this. I think that tsbox complies with most standards that are applicable. Generality of usage is a particular feature that should qualify the package for silver.

The most outstanding point is probably the one on generality:

Have a demonstrated generality of usage beyond one single envisioned use case.

The package facilitates work with time series in general. It can also be used to ease the burden of object testing and time series conversion for other time series packages.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

    Already published on CRAN.

  • Do you intend for this package to go on Bioconductor?

Code of conduct

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Sep 17, 2021

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Sep 17, 2021

🚀

Editor check started

👋

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Sep 17, 2021

Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.

@mpadge
Copy link
Member

@mpadge mpadge commented Sep 17, 2021

@christophsax Sorry about the mess up here - the checks server is currently running R4.0.3, so doesn't recognise native pipes which you use in some test code. Will upgrade asap and deliver your checks.

@christophsax
Copy link
Author

@christophsax christophsax commented Sep 17, 2021

But I did't intend to use native pipes, that was an accident. I will remove them right now.

@christophsax
Copy link
Author

@christophsax christophsax commented Sep 17, 2021

I did so now. Perhaps rerunning is sufficient.

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Sep 18, 2021

Checks for tsbox (v0.3.1.9001)

git hash: aad57cea

  • ✔️ Package uses 'roxygen2'.
  • ✔️ Package has a 'contributing.md' file.
  • ✔️ Package has a 'CITATION' file.
  • ✔️ Package has a 'codemeta.json' file.
  • ✔️ All functions have examples.
  • ✔️ Package has at least one HTML vignette
  • ✔️ Package 'DESCRIPTION' has a URL field.
  • ✔️ Package 'DESCRIPTION' has a BugReports field.
  • ✔️ Package is already on CRAN.
  • ✔️ Package has continuous integration checks.
  • ✖️ Package coverage failed
  • ✖️ R CMD check found 1error.
  • ✔️ R CMD check found no warnings.
  • ✔️ All applicable standards [v0.0.1] have been documented in this package.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL-3


1. rOpenSci Statistical Standards (srr package)

This package is in the following category:

  • Time Series

✔️ All applicable standards [v0.0.1] have been documented in this package

Click here to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 53 files) and
  • 1 authors
  • 3 vignettes
  • no internal data file
  • 2 imported packages
  • 63 exported functions (median 7 lines of code)
  • 277 non-exported functions in R (median 9 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 53 96.0
files_vignettes 3 90.8
files_tests 44 98.7
loc_R 2649 88.3
loc_vignettes 344 80.6
loc_tests 1951 92.1
num_vignettes 3 93.1
n_fns_r 340 93.4
n_fns_r_exported 63 90.4
n_fns_r_not_exported 277 93.8
n_fns_per_file_r 4 51.4
num_params_per_fn 1 1.1 TRUE
loc_per_fn_r 8 29.0
loc_per_fn_r_exp 7 14.2
loc_per_fn_r_not_exp 9 41.9
rel_whitespace_R 25 91.6
rel_whitespace_vignettes 32 87.8
rel_whitespace_tests 28 98.0 TRUE
doclines_per_fn_exp 41 51.1
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 475 94.6

2a. Network visualisation

Interactive network visualisation of calls between objects in package can be viewed by clicking here


3. goodpractice and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

github

GitHub Workflow Results

name conclusion sha date
R-CMD-check success aad57c 2021-09-18

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following error:

  1. checking tests ...
    Running ‘spelling.R’
    Running ‘testthat.R’
    ERROR
    Running the tests in ‘tests/testthat.R’ failed.
    Last 13 lines of output:
    1. ├─fl[i]
    2. │ └─tsbox::ts_apply(x, ff, ...)
    3. │ └─tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
    4. │ ├─x[, fun(.SD, ...), by = eval(.by)]
    5. │ └─data.table:::[.data.table(x, , fun(.SD, ...), by = eval(.by))
    6. └─tsbox:::fun(.SD, ...)
    7. └─(function(x, ...) seasonal::final(seasonal::seas(x, ...)))(...)
    8. ├─seasonal::final(seasonal::seas(x, ...))
      
    9. └─seasonal::seas(x, ...)
      
    10.   └─seasonal::checkX13(fail = TRUE, fullcheck = FALSE, htmlcheck = FALSE)
      

[ FAIL 1 | WARN 8 | SKIP 17 | PASS 644 ]
Error: Test failures
Execution halted

R CMD check generated the following test_fail:

  1. library(testthat)
    Warning message:
    package 'testthat' was built under R version 4.1.0

library(tsbox)

test_check("tsbox")
Starting 2 test processes
══ Skipped tests ═══════════════════════════════════════════════════════════════
• On CRAN (17)

══ Failed tests ════════════════════════════════════════════════════════════════
── Error (test-units.R:41:5): tsbox works with units ───────────────────────────
Error: Process terminated
Backtrace:

  1. ├─fl[i]
  2. │ └─tsbox::ts_apply(x, ff, ...)
  3. │ └─tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
  4. │ ├─x[, fun(.SD, ...), by = eval(.by)]
  5. │ └─data.table:::[.data.table(x, , fun(.SD, ...), by = eval(.by))
  6. └─tsbox:::fun(.SD, ...)
  7. └─(function(x, ...) seasonal::final(seasonal::seas(x, ...)))(...)
  8. ├─seasonal::final(seasonal::seas(x, ...))
    
  9. └─seasonal::seas(x, ...)
    
  10.   └─seasonal::checkX13(fail = TRUE, fullcheck = FALSE, htmlcheck = FALSE)
    

[ FAIL 1 | WARN 8 | SKIP 17 | PASS 644 ]
Error: Test failures
Execution halted

R CMD check generated the following check_fail:

  1. rcmdcheck_tests_pass

Test coverage with covr

ERROR: Test Coverage Failed

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
ts_span 34
time_shift 21
ts_plot 21
copy_class 19

Static code analyses with lintr

lintr found the following 178 potential issues:

message number of times
Lines should not be more than 80 characters. 178

✔️ Package has at least one HTML vignette


Package Versions

package version
pkgstats 0.0.0.311
pkgcheck 0.0.1.486
srr 0.0.1.107


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@christophsax
Copy link
Author

@christophsax christophsax commented Sep 18, 2021

R CMD check

There seems to be an issue with the package seasonal on your system. The package is special since it uses x13binary which downloads pre-built binaries from CRAN. Usually this works fine and on most CRAN platforms.

Is there anything special about your system? There was an issue for Mac M1.

If you want to check, this should works:

library(seasonal)
seas(AirPassengers)
checkX13()

But I don't want you to drag into this. I can also remove the few tests that are using seasonal, which should then resolve the issue?

Coverage

Perhaps this is just because of the failing R CMD check? If not, is there a way to see what ist the measured coverage? I had 81.2% on my system, which passed the threshold (see below).

Report on my side

If I run pkgcheck::pkgcheck(".") on my side, I get:

#### CoveragePackage uses 'roxygen2'.Package has a 'contributing.md' file.Package has a 'CITATION' file.Package has a 'codemeta.json' file.All functions have examples.Package has at least one HTML vignettePackage 'DESCRIPTION' has a URL field.Package 'DESCRIPTION' has a BugReports field.Package is already on CRAN.Package has continuous integration checks.Package coverage is 81.2%.R CMD check found no errors.R CMD check found no warnings.All applicable standards [v0.0.1] have been documented in this package.Current status:This package may be submitted.

@noamross
Copy link
Contributor

@noamross noamross commented Sep 20, 2021

@ropensci-review-bot assign @rkillick as editor

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Sep 20, 2021

Assigned! @rkillick is now the editor

@rkillick
Copy link

@rkillick rkillick commented Oct 19, 2021

The goodpractice() checks came back with:

  • Use <- instead of =
    ts_frequency.R L120
  • Lines longer than 80 characters:
    ts_bind.R L58, L62
    ts_first_of_period.R L12, L13, L18
    test-ts_frequency.R L33

I had problems running covr() but see that it is passing on the checks above. Please fix the above and I'll look to assign reviewers as these are small changes.

@rkillick
Copy link

@rkillick rkillick commented Oct 28, 2021

Reviewer: @chamberlinc
Due date: 2021-11-18

@rkillick
Copy link

@rkillick rkillick commented Nov 1, 2021

Reviewer: @brunaw
Due date: 2021-11-22

@rkillick
Copy link

@rkillick rkillick commented Nov 2, 2021

@ropensci-review-bot add @chamberlinc to reviewers

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Nov 2, 2021

I'm sorry @rkillick, I'm afraid I can't do that. That's something only editors are allowed to do.

@maelle
Copy link
Member

@maelle maelle commented Nov 2, 2021

@rkillick I've now added you to the editors team, sorry about this! You can now repeat the comment and it should work. Thanks for your patience!

@rkillick
Copy link

@rkillick rkillick commented Nov 2, 2021

@ropensci-review-bot add @chamberlinc to reviewers

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Nov 2, 2021

@chamberlinc added to the reviewers list. Review due date is 2021-11-23. Thanks @chamberlinc for accepting to review! Please refer to our reviewer guide.

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Nov 2, 2021

@chamberlinc: If you haven't done so, please fill this form for us to update our reviewers records.

@rkillick
Copy link

@rkillick rkillick commented Nov 2, 2021

@ropensci-review-bot add @brunaw to reviewers

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Nov 2, 2021

@brunaw added to the reviewers list. Review due date is 2021-11-23. Thanks @brunaw for accepting to review! Please refer to our reviewer guide.

@ropensci-review-bot
Copy link
Collaborator

@ropensci-review-bot ropensci-review-bot commented Nov 2, 2021

@brunaw: If you haven't done so, please fill this form for us to update our reviewers records.

@chamberlinc
Copy link

@chamberlinc chamberlinc commented Dec 6, 2021

@rkillick @christophsax , thank you for the opportunity to review, and my apologies that this has taken so long. Below is my review with some comments. I also have uploaded the knitted html notebook that details the tests I ran at https://github.com/chamberlinc/tsbox-review. The code I used to test the package functions is also available in this repo.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.

Errors encountered in ts_plot(), ts_xts() and ts_long(). I am including the structure of the data frame I was working with in the code below:
Errors with ts_xts() and ts_long():

> Conn_discharge_DO <- dataRetrieval::readNWISuv(
+   siteNumbers = "01193050",
+   parameterCd = c("00060", "00300"),
+   startDate = "2019-01-01",
+   endDate = "2021-01-01"
+ )
> 
> str(Conn_discharge_DO)
'data.frame':	70519 obs. of  8 variables:
 $ agency_cd       : chr  "USGS" "USGS" "USGS" "USGS" ...
 $ site_no         : chr  "01193050" "01193050" "01193050" "01193050" ...
 $ dateTime        : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
 $ X_00060_00000   : num  39200 41300 40400 40400 41100 43000 41800 42300 41700 42400 ...
 $ X_00060_00000_cd: chr  "A" "A" "A" "A" ...
 $ X_00300_00000   : num  13.9 13.9 13.9 14 14 14 14 14 14 14 ...
 $ X_00300_00000_cd: chr  "A" "A" "A" "A" ...
 $ tz_cd           : chr  "UTC" "UTC" "UTC" "UTC" ...
 - attr(*, "url")= chr "https://nwis.waterservices.usgs.gov/nwis/iv/?site=01193050&format=waterml,1.1&ParameterCd=00060,00300&startDT=2"| __truncated__
 - attr(*, "siteInfo")='data.frame':	1 obs. of  13 variables:
  ..$ station_nm          : chr "CONNECTICUT RIVER AT MIDDLE HADDAM, CT"
  ..$ site_no             : chr "01193050"
  ..$ agency_cd           : chr "USGS"
  ..$ timeZoneOffset      : chr "-05:00"
  ..$ timeZoneAbbreviation: chr "EST"
  ..$ dec_lat_va          : num 41.5
  ..$ dec_lon_va          : num -72.6
  ..$ srs                 : chr "EPSG:4326"
  ..$ siteTypeCd          : chr "ST-TS"
  ..$ hucCd               : chr "01080205"
  ..$ stateCd             : chr "09"
  ..$ countyCd            : chr "09007"
  ..$ network             : chr "NWIS"
 - attr(*, "variableInfo")='data.frame':	2 obs. of  7 variables:
  ..$ variableCode       : chr [1:2] "00060" "00300"
  ..$ variableName       : chr [1:2] "Streamflow, ft&#179;/s" "Dissolved oxygen, water, unfiltered, mg/L"
  ..$ variableDescription: chr [1:2] "Discharge, cubic feet per second" "Dissolved oxygen, water, unfiltered, milligrams per liter"
  ..$ valueType          : chr [1:2] "Derived Value" "Derived Value"
  ..$ unit               : chr [1:2] "ft3/s" "mg/l"
  ..$ options            : chr [1:2] "" ""
  ..$ noDataValue        : logi [1:2] NA NA
 - attr(*, "disclaimer")= chr "Provisional data are subject to revision. Go to http://waterdata.usgs.gov/nwis/help/?provisional for more information."
 - attr(*, "statisticInfo")='data.frame':	1 obs. of  2 variables:
  ..$ statisticCd  : chr "00000"
  ..$ statisticName: chr ""
 - attr(*, "queryTime")= POSIXct[1:1], format: "2021-12-06 11:53:57"
> 
> # Try viewing both parameters together
> try(str(ts_xts(Conn_discharge_DO)))
more than one [value] column detected after [time] - using the outermost.
Are you using a wide data frame? To convert, use 'ts_long()'.

[time]: 'dateTime' [value]: 'X_00300_00000' 
Error : cannot allocate vector of size 4.5 Gb
> try(str(ts_xts(ts_long(Conn_discharge_DO))))
[id] columns left of [time] column: 'agency_cd', 'site_no'
[time]: 'dateTime' 
Error : 'value' column [value] is not numeric.

Errors with ts_plot():

> character_date_dat <- data.frame(
+   DateTime = c(
+   "2011-11-11 11:11:11", "2012-12-12 12:12:12", "2021-09-25 20:07:00"
+   ),
+   Value = c(1, 2, 8)
+ )
> try({ts_plot(character_date_dat)})
[time]: 'DateTime' [value]: 'Value' 
Error in colnamesInt(x, neworder, check_dups = FALSE) : 
  argument specifying columns specify non existing column(s): cols[3]='Value'
> # I am downloading hydrological data timeseries from the USGS. This is my most frequent way of accessing timeseries data. Data comes as a data frame.
> Eno_discharge <- dataRetrieval::readNWISuv(
+   siteNumbers = "02085070",
+   parameterCd = "00060",
+   startDate = "2019-01-01",
+   endDate = "2021-01-01"
+ )
> # View the structure of the data frame
> str(Eno_discharge)
'data.frame':	70267 obs. of  6 variables:
 $ agency_cd       : chr  "USGS" "USGS" "USGS" "USGS" ...
 $ site_no         : chr  "02085070" "02085070" "02085070" "02085070" ...
 $ dateTime        : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
 $ X_00060_00000   : num  424 421 424 428 428 428 428 428 435 435 ...
 $ X_00060_00000_cd: chr  "A" "A" "A" "A" ...
 $ tz_cd           : chr  "UTC" "UTC" "UTC" "UTC" ...
 - attr(*, "url")= chr "https://nwis.waterservices.usgs.gov/nwis/iv/?site=02085070&format=waterml,1.1&ParameterCd=00060&startDT=2019-01"| __truncated__
 - attr(*, "siteInfo")='data.frame':	1 obs. of  13 variables:
  ..$ station_nm          : chr "ENO RIVER NEAR DURHAM, NC"
  ..$ site_no             : chr "02085070"
  ..$ agency_cd           : chr "USGS"
  ..$ timeZoneOffset      : chr "-05:00"
  ..$ timeZoneAbbreviation: chr "EST"
  ..$ dec_lat_va          : num 36.1
  ..$ dec_lon_va          : num -78.9
  ..$ srs                 : chr "EPSG:4326"
  ..$ siteTypeCd          : chr "ST"
  ..$ hucCd               : chr "03020201"
  ..$ stateCd             : chr "37"
  ..$ countyCd            : chr "37063"
  ..$ network             : chr "NWIS"
 - attr(*, "variableInfo")='data.frame':	1 obs. of  7 variables:
  ..$ variableCode       : chr "00060"
  ..$ variableName       : chr "Streamflow, ft&#179;/s"
  ..$ variableDescription: chr "Discharge, cubic feet per second"
  ..$ valueType          : chr "Derived Value"
  ..$ unit               : chr "ft3/s"
  ..$ options            : chr ""
  ..$ noDataValue        : logi NA
 - attr(*, "disclaimer")= chr "Provisional data are subject to revision. Go to http://waterdata.usgs.gov/nwis/help/?provisional for more information."
 - attr(*, "statisticInfo")='data.frame':	1 obs. of  2 variables:
  ..$ statisticCd  : chr "00000"
  ..$ statisticName: chr ""
 - attr(*, "queryTime")= POSIXct[1:1], format: "2021-12-06 11:59:00"
> # Try viewing the discharge data using the ts_plot function
> try({ts_plot(Eno_discharge)})
[time]: 'dateTime' [value]: 'X_00060_00000' 
Error in setnames(x, c(cid, ctime, cvalue), c("id", "time", "value")) : 
  Items of 'old' not found in column names: [X_00060_00000]. Consider skip_absent=TRUE.
> try({ts_plot(Eno_discharge, skip_absent=TRUE)})
Error in FUN(X[[i]], ...) : ts_boxable(x) is not TRUE
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
> library(testthat)
> library(tsbox)
> 
> test_check("tsbox")
Starting 2 test processes
== Skipped tests ===============================================================
* empty test (1)
== Warnings ====================================================================
-- Warning (test-ts_first_of_period.R:18:3): ts_first_of_period works ----------
no non-missing arguments to min; returning Inf
Backtrace:
 1. tsbox::ts_first_of_period(x)
 2. tsbox::ts_apply(x, dts_first_of_period)
 3. tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
 4. tsbox:::fun(x, ...)
 8. base::NextMethod("[")
== Failed tests ================================================================
-- Error (test-ts_first_of_period.R:18:3): ts_first_of_period works ------------
Error in `max(which(time <= smry$start)):min(which(time >= smry$end))`: result would be too long a vector
Backtrace:
    x
 1. \-tsbox::ts_first_of_period(x)
 2.   \-tsbox::ts_apply(x, dts_first_of_period)
 3.     \-tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
 4.       \-tsbox:::fun(x, ...)
 5.         +-time[(max(which(time <= smry$start)):min(which(time >= smry$end)))]
 6.         +-base::`[.POSIXct`(...)
 7.         | \-base::.POSIXct(NextMethod("["), attr(x, "tzone"), oldClass(x))
 8.         \-base::NextMethod("[")
[ FAIL 1 | WARN 1 | SKIP 1 | PASS 770 ]
Error: Test failures
Execution halted
1 error x | 1 warning x | 0 notes √
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 7

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

First, I enjoyed the opportunity to review this code. This is the first code review I have done for rOpenSci, and I hope the comments below are helpful.

I like the general purpose of this package and agree there is probably a lot of use for a package that allows easy conversion between R object types. The documentation I think is all ok, though I think it could be improved with a little more description of the expected structure of data. I did not have any issues with installation. There was one test that failed on my machine (see details in the index.nb.html file in the linked repo, https://github.com/chamberlinc/tsbox-review). The code snippets in the three vignettes all worked on my machine.

I will preface my comments by saying I am not an experienced user of data.tables, so following the code was a little tricky for me. I usually work with timeseries as data frames, and I was having trouble following the code and figuring out which columns were being assigned as data, timestamps, or identifiers. This caused a few issues when I had data frames of multiple concurrent timeseries and was getting error messages indicating that the ordering of the columns somehow may have mattered. When working with data frames that only had one timeseries, the functions all worked simply and, it seemed, as intended.

I was mostly experimenting with the ts_plot(), ts_long(), and ts_xts() functions. ts_xts() produced an error while working with the data frame that had multiple timeseries. Following the prompting, I tried using ts_long(), however this also produced an error that I did not understand, because it seemed to be accessing columns that I did not intend. I didn't see any way to input the correct column names to the function. Once I manually pivoted the data frame using pivot_longer() though, I was able to use ts_xts() and it performed everything correctly I believe. ts_plot() produced error messages that it might have mattered how the columns were organized, and that would have been helpful to know. I was not able to get this function to work for several formats of data frame data, even when following the suggested prompts in the error messages.

The code I used to access data and experiment is provided as well at the linked repo. I hope this is helpful!

@rkillick
Copy link

@rkillick rkillick commented Dec 7, 2021

@chamberlinc thank you for your valuable review of the package.

@rkillick
Copy link

@rkillick rkillick commented Dec 7, 2021

@brunaw are you able to provide your review soon? It was due on 23 Nov.

@christophsax
Copy link
Author

@christophsax christophsax commented Dec 8, 2021

@chamberlinc Thank you so much for your review! I will look into it in more detail, but the error messages/feedback in non-standard situations certainly seems worth to be addressed.

@christophsax
Copy link
Author

@christophsax christophsax commented Apr 19, 2022

First of all, please apologize for my late answer. I originally planned to wait for the second review and lost track afterward.

@chamberlinc, thanks again for your very helpful review! I dealt with your points in PR #211.

Most of your problems arose when applying the functions to non-standard data frames, especially wide ones. In some cases, the messages already pointed to the use use of ts_long(), but the error messages were not too helpful. In PR #211, I tried to address some of these problems. The error messages are improved, and ts_long() can handle some more complex cases. It works now with a toy example of your data set.

A bug in ts_plot() also caused a weird error message. This bug is now resolved as well.

To sum up, I addressed the following three points from your review:

description of the expected data structure

I think it could be improved with a little more description of the expected structure of data.

Extended paragraph on data frames: https://www.tsbox.help/articles/tsbox.html#time-series-in-data-frames-1

Failing test

There was one test that failed on my machine (see details in the index.nb.html file in the linked repo, https://github.com/chamberlinc/tsbox-review).

The error and the warning did not occur on my system, but the computation in ts_first_of_period() did not work as expected. I added a new test and a fix. Both are green now on my systems (local and GHA tests in the repository).

Error Messages on wide data frames

This caused a few issues when I had data frames of multiple concurrent timeseries and was getting error messages indicating that the ordering of the columns somehow may have mattered.

I improved some of the error messages when functions were applied to wide data structures.

library(nycflights13)
suppressPackageStartupMessages(library(dplyr))
library(tsbox)
packageVersion("tsbox")
#> [1] '0.3.1.9002'

d3 <- weather |>
  select(origin, time_hour, temp, humid, precip)
d3
#> # A tibble: 26,115 × 5
#>    origin time_hour            temp humid precip
#>    <chr>  <dttm>              <dbl> <dbl>  <dbl>
#>  1 EWR    2013-01-01 01:00:00  39.0  59.4      0
#>  2 EWR    2013-01-01 02:00:00  39.0  61.6      0
#>  3 EWR    2013-01-01 03:00:00  39.0  64.4      0
#>  4 EWR    2013-01-01 04:00:00  39.9  62.2      0
#>  5 EWR    2013-01-01 05:00:00  39.0  64.4      0
#>  6 EWR    2013-01-01 06:00:00  37.9  67.2      0
#>  7 EWR    2013-01-01 07:00:00  39.0  64.4      0
#>  8 EWR    2013-01-01 08:00:00  39.9  62.2      0
#>  9 EWR    2013-01-01 09:00:00  39.9  62.2      0
#> 10 EWR    2013-01-01 10:00:00  41    59.6      0
#> # … with 26,105 more rows

This is a wide data frame, so tsbox errors with a meaningful message:

ts_ts(d3)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'time_hour' [value]: 'precip'
#> Error: series has no regular pattern

This also works if the order is different

d4 <- weather |>
  select(origin, temp, humid, precip, time_hour)
d4
#> # A tibble: 26,115 × 5
#>    origin  temp humid precip time_hour          
#>    <chr>  <dbl> <dbl>  <dbl> <dttm>             
#>  1 EWR     39.0  59.4      0 2013-01-01 01:00:00
#>  2 EWR     39.0  61.6      0 2013-01-01 02:00:00
#>  3 EWR     39.0  64.4      0 2013-01-01 03:00:00
#>  4 EWR     39.9  62.2      0 2013-01-01 04:00:00
#>  5 EWR     39.0  64.4      0 2013-01-01 05:00:00
#>  6 EWR     37.9  67.2      0 2013-01-01 06:00:00
#>  7 EWR     39.0  64.4      0 2013-01-01 07:00:00
#>  8 EWR     39.9  62.2      0 2013-01-01 08:00:00
#>  9 EWR     39.9  62.2      0 2013-01-01 09:00:00
#> 10 EWR     41    59.6      0 2013-01-01 10:00:00
#> # … with 26,105 more rows

ts_ts(d4)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'time_hour' [value]: 'precip'
#> Error: series has no regular pattern

Using ts_plot() on these malformed data frames has returned the following nonsensical error:

[time]: 'time_hour' [value]: 'precip'
Error in setnames(x, c(cid, ctime, cvalue), c("id", "time", "value")) :
  Items of 'old' not found in column names: [precip]. Consider skip_absent=TRUE.
>

The skip_absent=TRUE is a data.table suggestion, not meant for use in tsbox. I fixed this in commit 50e694b8. The message now makes more sense:

ts_plot(d3)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> 
#> [time]: 'time_hour' [value]: 'precip' 
#> too many series. Only showing the first 20.

reflecting the fact that we have a large number of absurdly short time series (since value columns were classified as id columns)

I also tried to improve a bit on your examples. I construct a data frame that looks similar to yours:

# > str(Conn_discharge_DO)
# 'data.frame':   70519 obs. of  8 variables:
#  $ agency_cd       : chr  "USGS" "USGS" "USGS" "USGS" ...
#  $ site_no         : chr  "01193050" "01193050" "01193050" "01193050" ...
#  $ dateTime        : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
#  $ X_00060_00000   : num  39200 41300 40400 40400 41100 43000 41800 42300 41700 42400 ...
#  $ X_00060_00000_cd: chr  "A" "A" "A" "A" ...

Conn_discharge_DO <- tibble(
  agency_cd = c("USGS", "USGS", "USGS", "USGS"),
  site_no = c("01193050", "01193050", "01193050", "01193050"),
  dateTime = as.POSIXct(c("2019-01-01 05:00:00", "2019-01-01 05:15:00", "2019-01-01 05:30:00", "2019-01-01 05:45:00")),
  unit = "ft3/s",
  X_00060_00000 = 1:4,
  noDataValue = NA,
  X_00060_000002 = 11:14,
  X_00060_00000_cd = "A"
)

tsbox now detects X_00060_00000 as an id column. (As it says in the documentation, it starts on the right). And it suggests using ts_long().

ts_xts(Conn_discharge_DO)
#> Found numeric [id] column(s): 'X_00060_00000'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'dateTime' [value]: 'X_00060_000002'
#> Loading required namespace: xts
#>                     USGS_01193050_ft3/s_1_NA_A USGS_01193050_ft3/s_2_NA_A
#> 2019-01-01 05:00:00                         11                         NA
#> 2019-01-01 05:15:00                         NA                         12
#> 2019-01-01 05:30:00                         NA                         NA
#> 2019-01-01 05:45:00                         NA                         NA
#>                     USGS_01193050_ft3/s_3_NA_A USGS_01193050_ft3/s_4_NA_A
#> 2019-01-01 05:00:00                         NA                         NA
#> 2019-01-01 05:15:00                         NA                         NA
#> 2019-01-01 05:30:00                         13                         NA
#> 2019-01-01 05:45:00                         NA                         14

ts_long() works now on this. It classifies character and factor columns right of the time column as id columns (with a message).

ts_long(Conn_discharge_DO)
#> found columns right to the [time] column that will be treated as [id] columns (character or factor): 'unit', 'X_00060_00000_cd'.
#> Additional [id] column(s): 'agency_cd', 'site_no', 'unit', 'X_00060_00000_cd'
#> [time]: 'dateTime'
#> # A tibble: 12 × 7
#>    agency_cd site_no  unit  X_00060_00000_cd id        dateTime            value
#>    <chr>     <chr>    <chr> <chr>            <chr>     <dttm>              <int>
#>  1 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:00:00     1
#>  2 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:15:00     2
#>  3 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:30:00     3
#>  4 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:45:00     4
#>  5 USGS      01193050 ft3/s A                noDataVa… 2019-01-01 05:00:00    NA
#>  6 USGS      01193050 ft3/s A                noDataVa… 2019-01-01 05:15:00    NA
#>  7 USGS      01193050 ft3/s A                noDataVa… 2019-01-01 05:30:00    NA
#>  8 USGS      01193050 ft3/s A                noDataVa… 2019-01-01 05:45:00    NA
#>  9 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:00:00    11
#> 10 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:15:00    12
#> 11 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:30:00    13
#> 12 USGS      01193050 ft3/s A                X_00060_… 2019-01-01 05:45:00    14

@chamberlinc
Copy link

@chamberlinc chamberlinc commented May 28, 2022

Hello Christoph,
Apologies for the delay again. These updates I think are very good. The functions worked as expected this time with the data I typically work with. I had a few automated tests fail this time though, one because I have an older version of R that does not support the |> operator, and the rest all seemingly because of some issue accessing the function ts_dts.

The details of the automated test output you can find here: https://github.com/chamberlinc/tsbox-review/blob/master/tsbox_R2.md. The rest of my review is below:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • [X ] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • [ X] A statement of need: clearly stating problems the software is designed to solve and its target audience in README

  • [X ] Installation instructions: for the development version of package and any non-standard dependencies in README

  • [X ] Vignette(s): demonstrating major functionality that runs successfully locally

Caveat: I am running R 4.0.5 on my machine, and so I do not have native pipes. For the tsbox vignette, the code did work, but I first had to replace all instances |> with %>%. I’d recommend using the older syntax to be compatable with more versions of R.

  • [X ] Function Documentation: for all exported functions

I did not check all of the functions’ documentation, but the subset I checked looked good.

  • [X ] Examples: (that run successfully locally) for all exported functions

Again, only checked a subset, but all worked.

  • [X \ Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • [X ] Installation: Installation succeeds as documented.
  • [X ] Functionality: Any functional claims of the software been confirmed.
  • [X ] Performance: Any performance claims of the software been confirmed.
  • [ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine. Several checks and tests still seem to fail, at least one of which is likely due to native pipes with my version of R. The other has to do with the ts_dts function.
sessionInfo()

## R version 4.0.5 (2021-03-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.0.5  magrittr_2.0.3  fastmap_1.1.0   cli_3.3.0      
##  [5] tools_4.0.5     htmltools_0.5.2 rstudioapi_0.13 yaml_2.3.5     
##  [9] stringi_1.7.6   rmarkdown_2.14  knitr_1.39      stringr_1.4.0  
## [13] xfun_0.31       digest_0.6.29   rlang_1.0.2     evaluate_0.15

pkg_dir <- "C:/Users/Cathy/Documents/PeerReviewing/tsbox/R2/tsbox/"
try(devtools::check(pkg_dir))

## i Updating tsbox documentation

## i Loading tsbox

## Error in parse(text = lines, keep.source = TRUE, srcfile = srcfilecopy(srcref_path %||%  : 
##   C:\Users\Cathy\Documents\PeerReviewing\tsbox\R2\tsbox/tests/testthat/test-date_utils.R:9:41: unexpected '>'
## 8: 
## 9:   x1 <- ts_tbl(ts_c(mdeaths, fdeaths)) |>
##                                            ^

devtools::test(pkg_dir)

## Output removed for clarity. View at https://github.com/chamberlinc/tsbox-review/blob/master/tsbox_R2.md ##
  • [X ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 3

  • [X ] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer (“rev” role) in the package DESCRIPTION file.

Review Comments

The functions now work as expected for me for the types of data I typically use. Thanks for the updates! I found the messages on the ts_long function to be very helpful.

@christophsax
Copy link
Author

@christophsax christophsax commented Aug 8, 2022

Very sorry for the slow turnaround.

Native Pipe

I changed the native pipe (|>) back to the magrittr one (%>%), so this should now work on your system with the latest version (0.3.1.9003).

I introduced the native pipe deliberately in the vignettes since I always wanted to keep the dependency graph very simple (tsbox only imports data.table and anytime) and wanted to make the vignettes available for non-magrittr/dplyr users. But I see that there is a trade-off between this aim and the support for older versions of R. The native pipe was introduced in R 4.1 on 2021-05-18 and is now available for almost 15 months. With more time passing, only a few older versions will be around. I changed it back for now but will go to the native pipe again at some point in the future.

Other Test Errors

I have a hard time reproducing them. First, all the tests run through on my system (macOS, R release) and all systems on GitHub Actions (ubuntu-20.04, release, devel), (windows-latest, release), (macOS-latest, release). So I wonder if this is another problem related to the old R version?

Your error says:

 Error in `UseMethod("ts_dts")`: no applicable method for 'ts_dts' applied to an object of class "data.frame"

But there clearly is such a method:

https://github.com/christophsax/tsbox/blob/dd0447c66405da6101419c31b0fe3f0cffdfadc1/R/to_from_data.frame.R#L14-L18

If you still encounter the error in the tests, could you try to run one the failing example outside of the tests?
Does this fail? It really shouldn't.

library(tsbox)
with_id <- wo_id <- ts_df(mdeaths)
with_id$id <- "mdeaths"
ts_c(wo_id, with_id)

Thank you so much for your work! I guess we are alomost done now. I will answer quickly this time :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants