This repository has been archived by the owner on Apr 25, 2019. It is now read-only.

Undocumented datasets #4

Closed
ellisvalentiner opened this issue Aug 15, 2015 · 12 comments

@ellisvalentiner
Collaborator

* checking for missing documentation entries ... WARNING
Undocumented data sets:
  ‘FSO2014’ ‘FSO2014H’ ‘FSO2014_1’ ‘data_10_time’ ‘leftJoin1’
  ‘rightJoin1’ ‘xdata_10’
All user-level objects in a package should have documentation entries.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
@ClaytonJY ClaytonJY added this to the RQt 0.1: Prepare for CRAN milestone Aug 16, 2015
@ClaytonJY
Contributor

The fix here is to write Roxygen2 doc blocks for each dataset. They should probably be given more descriptive and consistent names as well.

While we're on the topic of data, it may be better to put these in a data-raw folder and have a build process which converts them to serialized *.rda files for better loading. We can probably get away with not doing that for now.
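A rough sketch of what such a data-raw build step could look like, assuming the raw files are plain CSV. The helper name `csv_to_rda` and the toy file are hypothetical and only illustrate the idea; they are not part of RQt.

```r
# Hypothetical helper: convert one raw CSV file into a serialized .rda file.
csv_to_rda <- function(csv_path, out_dir,
                       name = tools::file_path_sans_ext(basename(csv_path))) {
  dataset <- utils::read.csv(csv_path, stringsAsFactors = FALSE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_path <- file.path(out_dir, paste0(name, ".rda"))

  # Save the data frame under its dataset name, so that load() later
  # restores a variable with that name.
  env <- new.env()
  assign(name, dataset, envir = env)
  save(list = name, envir = env, file = out_path)
  invisible(out_path)
}

# Demo on a throwaway file in a temporary directory (made-up values).
raw_dir <- tempfile("data-raw-")
dir.create(raw_dir)
csv <- file.path(raw_dir, "toy_weather.csv")
writeLines(c("hour,temp", "0,12.5", "1,12.1"), csv)

rda <- csv_to_rda(csv, file.path(tempdir(), "data"))
```

In a real package the script would live in data-raw/ and write into data/; the temporary directories here just keep the sketch self-contained.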

@ClaytonJY
Contributor

Instructions on dataset documentation from Hadley can be found here.

The datasets could be documented right in R/RQt.R, but I recommend against putting everything in there. Two options:

  • Make a file R/datasets.R for all dataset documentation (I do this at work)
  • Move all the rqt.* functions into one or more new R/*.R file(s), and add dataset documentation to R/RQt.R underneath the package block (this is what ggplot2 does)
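For the first option, a minimal sketch of what one entry in R/datasets.R might look like, using the roxygen2 convention of documenting the dataset's name as a string. The title, description, format, and source below are placeholders (assumptions), since the contents of each dataset still need to be filled in by someone who knows them.

```r
# R/datasets.R (sketch; every text field below is a placeholder)

#' Placeholder title for the FSO2014 dataset
#'
#' Placeholder description: what the dataset contains and how it is used.
#'
#' @format A data frame. Column names and meanings to be filled in.
#' @source To be filled in.
"FSO2014"
```

One such block per name listed in the check warning should be enough to clear it.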

What do you think @javadch @ellisvalentiner?

@javadch
Owner

javadch commented Aug 16, 2015

I would go with @ClaytonJY's first option, R/datasets.R. Later, I'd like to break RQt.R into multiple files, each containing one function (or a small number of functions).

@javadch
Owner

javadch commented Aug 16, 2015

FSO2014_1.csv: hourly weather data at San Francisco airport for the year 2014, to be used directly by R.
fso2014h: hourly weather data at San Francisco airport for the year 2014, to be used by RQt. The difference is that the headers are enhanced to accommodate data types and constraints and are external to the data file. Also, adding a # in front of a record causes the engine to treat it as a comment line and ignore it.
data_10M: a big soil nitrogen dataset (about 500 MB) that showcases the filtering and offsetting features of XQt. What do you think, @ellisvalentiner and @ClaytonJY? Should we remove it and its corresponding example too?
The other datasets in the data folder are not used and can be safely removed. Keep a local copy for your tests, by the way!

@javadch
Owner

javadch commented Aug 16, 2015

Regarding the datasets: they are read by the XQt engine, so converting them to rda would cause the engine to fail! So I guess inst/extdata is a better place, provided it is accessible at runtime.

@ClaytonJY
Contributor

I agree this data belongs in inst/extdata. I think that will require some editing in the Examples/ folder as well; when the package is installed, extdata will move to the top level of the package folder, so any references to data need to be changed to extdata. A nice side effect is that we won't need to document the data after this move.
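One way the updated references could look is to resolve files through base R's `system.file()`, which searches the installed package and so never mentions inst/. This is only a sketch: the wrapper name `extdata_path` is hypothetical, and the file name in the usage line is taken from the dataset list above on the assumption that it ends up in inst/extdata under that name.

```r
# Hypothetical wrapper: resolve a file shipped in inst/extdata of a package.
# system.file() returns "" when the file (or the package) cannot be found.
extdata_path <- function(file, package = "RQt") {
  system.file("extdata", file, package = package)
}

# Usage in an example, once RQt is installed with the file in inst/extdata:
weather_csv <- extdata_path("FSO2014_1.csv")
if (nzchar(weather_csv)) {
  message("Found data file at: ", weather_csv)
} else {
  message("File not found; is the package installed?")
}
```

Checking for the empty string keeps examples from failing with an obscure error when the file is missing.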

@javadch
Owner

javadch commented Aug 17, 2015

What about CHECK time?

@ClaytonJY
Contributor

What do you mean by "CHECK time"?

@javadch
Owner

javadch commented Aug 17, 2015

You said that the folder is moved up one level when the package is installed, which means that during development, when I CHECK the package, extdata is still in the inst folder! So should the examples refer to inst/extdata or to extdata, or does the R build system take care of it?

@ClaytonJY
Contributor

The magic of the inst folder is that when the package is installed (which happens as part of R CMD check), everything in it is copied up one level, to the package root (RQt/). This means you can't have, for example, an inst/R/ folder, because that would conflict with the RQt/R/ folder, which contains the R code. It also means that to reference data in extdata/, you start with extdata/ rather than inst/extdata/, because by the time the examples are run, the package has been installed, so extdata/ has moved from RQt/inst/extdata/ to RQt/extdata/.

I'm hopeful that using inst/ might also solve issues with things like config/ not being where they should be post-build. I'll start working on a branch to move stuff into inst/ and update the appropriate references, but I'll need to rely on you for testing and confirmation until I can get things working on Linux.
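To make the inst/ behaviour concrete, here is a small simulation using temporary directories. It only mimics the resulting layout with `file.copy()`; it is not how R performs installation internally, and the directory and file names are made up for illustration.

```r
# Simulated source tree: <src>/inst/extdata/example.csv
src <- tempfile("RQt-src-")
dir.create(file.path(src, "inst", "extdata"), recursive = TRUE)
writeLines(c("a,b", "1,2"), file.path(src, "inst", "extdata", "example.csv"))

# Simulated installed package: the contents of inst/ are copied to the root.
installed <- tempfile("RQt-installed-")
dir.create(installed)
file.copy(
  from = list.files(file.path(src, "inst"), full.names = TRUE),
  to = installed,
  recursive = TRUE
)

# After "installation" the file sits under extdata/, and no inst/ level exists.
in_extdata <- file.exists(file.path(installed, "extdata", "example.csv"))
in_inst <- file.exists(file.path(installed, "inst", "extdata", "example.csv"))
```

This is why examples that run against the installed package should refer to extdata/ and not to inst/extdata/.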

@javadch
Owner

javadch commented Aug 17, 2015

Sounds feasible. Good idea for the config. I'll test the Java environment variables and let you know.

@ClaytonJY
Contributor

Fixed with #13.
