Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rnaturalearth #22

Closed
6 tasks done
andysouth opened this issue Nov 26, 2015 · 55 comments
Closed
6 tasks done

rnaturalearth #22

andysouth opened this issue Nov 26, 2015 · 55 comments

Comments

@andysouth
Copy link

    1. What does this package do? (explain in 50 words or less)
      rnaturalearth provides :
  • a pre-downloaded subset of Natural Earth vectors commonly used in world mapping.
  • functions to download other Natural Earth vectors.
  • an open, reproducible workflow from www.naturalearthdata.com enabling updating as new versions become available.
  • flexibility to get maps classified by countries, sovereign states and map-units.
    1. Paste the full DESCRIPTION file inside a code block (bounded by ``` on either end).
Package: rnaturalearth
Title: World Vector Map Data from Natural Earth
Version: 0.0.0.9000
Authors@R: person("Andy", "South", , "southandy@gmail.com", role = c("aut", "cre"))
Description: Facilitates mapping by making natural earth map data from http://
    www.naturalearthdata.com/ more easily available to R users. Focuses on vector
    data.
License: CC0
LazyData: true
LazyDataCompression: xz
URL: https://github.com/AndySouth/rnaturalearth
BugReports: https://github.com/AndySouth/rnaturalearth/issues
Depends:
    R (>= 3.1.1)
Imports:
    sp (>= 1.0.15),
    rgdal
Suggests:
    knitr,
    testthat (>= 0.9.1),
    httr
VignetteBuilder: knitr
RoxygenNote: 5.0.1
    1. URL for the package (the development repository, not a stylized html page)
      https://github.com/AndySouth/rnaturalearth
    1. What data source(s) does it work with (if applicable)?
      www.naturalearthdata.com
    1. Who is the target audience?
      Anyone who wants to use Natural Earth vector data to make a map or perform other geographical analyses.
    1. Are there other R packages that accomplish the same thing? If so, what is different about yours?
      More limited subsets of Natural Earth data are available in at least the following CRAN packages rworldmap, rworldxtra, choroplethr, maps, oce, tmap. rnaturalearth is different in that it provides a comprehensive reproducible solution for access to Natural Earth vector map data either pre-downloaded or by facilitating download. By separating data access from visualisation this package provides a resource that can be used by other visualisation packages.
    1. Check the box next to each policy below, confirming that you agree. These are mandatory.
  • This package does not violate the Terms of Service of any service it interacts with.
  • The repository has continuous integration with Travis and/or another service
  • The package contains a vignette
  • The package contains a reasonably complete readme with devtools install instructions
  • The package contains unit tests
  • The package only exports functions to the NAMESPACE that are intended for end users
    1. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
  • [no] Are there any package dependencies not on CRAN?
  • [yes] Do you intend for this package to go on CRAN?
  • [yes] Does the package have a CRAN accepted license?
  • [no] Did devtools::check() produce any errors or warnings? If so paste them below.
    One NOTE
  • checking installed package size ... NOTE
    installed size is 26.0Mb
    sub-directories of 1Mb or more:
    data 25.8Mb
    1. Please add explanations below for any exceptions to the above:
      The package is currently too big because of the fine scale (10m) data. Hadley suggested :
      "I think you need to keep the data under five meg. I'd suggest that you put the fine level data a separate package - you can then make that available via a drat repo (perhaps the ROpenSci one?), or just via github."
      I don't know anything about drat repos. I'm happy to take advice and modify the package(s) as recommended.
@sckott
Copy link
Contributor

sckott commented Nov 26, 2015

@andysouth Thanks for this! Seeking reviewer(s)

@Robinlovelace
Copy link
Member

I could take a look

@andysouth
Copy link
Author

Thanks both!

On 26 November 2015 at 23:44, Robin notifications@github.com wrote:

I could take a look


Reply to this email directly or view it on GitHub
#22 (comment).

@sckott
Copy link
Contributor

sckott commented Nov 30, 2015

Reviewers: @Robinlovelace, @lmullen

@andysouth
Copy link
Author

Thanks all. I appreciate your efforts.
If you have questions feel free to ask.

On 30 November 2015 at 16:17, Scott Chamberlain notifications@github.com
wrote:

@Robinlovelace https://github.com/Robinlovelace and @lmullen
https://github.com/lmullen reviewing - thanks both!


Reply to this email directly or view it on GitHub
#22 (comment).

@lmullen
Copy link
Member

lmullen commented Dec 14, 2015

First, let me say that this package is very useful. It will be a very helpful addition to the mapping packages on CRAN and in rOpenSci. I already have plans to use it for my own work and in assignments for students. Well done, @andysouth.

It takes about 2 minutes to install the dev version from GitHub and the installed version is 26 MB. In addition to the fact that CRAN won't accept the package until the data size is smaller, the weight of this package also hinders you as the developer from doing updates as necessary, and it makes it difficult for users to install the package. To solve this it will be necessary to break this package into two or more other packages as detailed below. For the purposes of this review I've focused mainly on package-level issues. Once you've solved the size problem and the package design has stabilized, I'll review the code more granularly.

Structural changes

This package should be at least two packages, and probably three. Here is a suggestion for how you should break this up:

  • The main package would be rnaturalearth and it would be on CRAN. It would be the only package that a user would ever install or load directly. It should contain nothing but the code to download NaturalEarth files, functions to access the data from the other packages, and the vignettes and package level documentation. The absence of data from this package means that you can update it with impunity, since the size of the package will be very small. (I've found this also helps speed up development.)
  • The second package would be rnaturalearthData or the like, and it should also be on CRAN. It would contain nothing but the most essential data: as much as you can fit while keeping the package under the 5MB limit set by CRAN. It will also contain the documentation for the datasets. This package will be imported by rnaturalearth, so users will install it whenever they install rnaturalearth. But since this data will changed infrequently, users will seldom need to update this package.
  • The third package will be rnaturalearthHiRes or the like. This package would reside solely in the rOpenSci drat repository. It would be a suggested package of rnaturalearth, using the AdditionalRepos field in the DESCRIPTION. When users call a function that would use this pre-downloaded data, the rnaturalearth package would check if rnaturalearthHiRes is installed. If it is not installed, then it will ask to install the package from the rOpenSci drat repo. Since this would be a non-CRAN package, you can include as much data as you want. Whether it would ever make sense to break this high resolution package into several different packages would be up to you.

A compromise solution would be to put minimal data into rnaturalearth, and have all the other data in a single rOpenSci drat repository package. But if you're going to do this, I think you may as well get as well break it up into one further package as suggested above.

Now that is a lot of breaking up into multiple packages what will remain, as far as users are concerned, a single user-facing package. But this is the accepted practice for packages which depend on a lot of data. For instance, the openNLP package installs very large packages containing trained models for different langauges from Datacube. I've had to do this in gender and genderdata, even though the data there is much smaller than the geographic data you are working with. And I'm in the process of redoing USAboundaries into exactly the same package structure that I'm suggesting here. If it is a helpful model, you might take a look at gender and genderdata. Here are the relevant functions.

When you do this, you should feel free to give the data packages the CC0 license, but to give the rnaturalearth package which contains only your own work whatever open source license you prefer.

@sckott, @karthik, @cboettig: To be able to test these changes effectively, @andysouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.

Tooling

  • The README.md should be converted to README.Rmd via devtools::use_readme_rmd(). As a bonus, this will let the README include images easily.
  • The package should use Appveyor's CI for Windows via devtools::use_appveyor().

DESCRIPTION

  • Specify a version number for rgdal, httr, and knitr.

Tests

All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran().

Vignette

The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE.

Code style

There are places where the code style can be improved. See the suggestions in this chapter of Hadley Wickham's R Packages book or the style chapter in his Advanced R book.

  • In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).
  • Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").
  • Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.) E.g.,
# This is more legible
  • Most important, all code which has been commented out needs to be deleted.
  • The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.
  • Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this
myfunc <- function(type = "a") {
  if(type == "a") {
    # ...
  } else if (type =="b") {

  } else {
    stop("Error.")
  }
}

Should be changed to this:

myfunc <- function(type = c("a", "b")) {
  type <- match.arg(type)
  # Logic checking type here.
}
  • get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?

Human language style

  • In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)
  • This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()

@andysouth
Copy link
Author

Thanks @lmullen for a very helpful review and for giving me clear
guidelines to follow. I knew some of these needed doing, now I know how.
I'll wait for feedback from others and then get stuck in.

On 14 December 2015 at 22:31, Lincoln Mullen notifications@github.com
wrote:

First, let me say that this package is very useful. It will be a very
helpful addition to the mapping packages on CRAN and in rOpenSci. I already
have plans to use it for my own work and in assignments for students. Well
done, @andysouth https://github.com/AndySouth.

It takes about 2 minutes to install the dev version from GitHub and the
installed version is 26 MB. In addition to the fact that CRAN won't accept
the package until the data size is smaller, the weight of this package also
hinders you as the developer from doing updates as necessary, and it makes
it difficult for users to install the package. To solve this it will be
necessary to break this package into two or more other packages as detailed
below. For the purposes of this review I've focused mainly on package-level
issues. Once you've solved the size problem and the package design has
stabilized, I'll review the code more granularly.
Structural changes

This package should be at least two packages, and probably three. Here is
a suggestion for how you should break this up:

  • The main package would be rnaturalearth and it would be on CRAN. It
    would be the only package that a user would ever install or load directly.
    It should contain nothing but the code to download NaturalEarth files,
    functions to access the data from the other packages, and the vignettes and
    package level documentation. The absence of data from this package means
    that you can update it with impunity, since the size of the package will be
    very small. (I've found this also helps speed up development.)
  • The second package would be rnaturalearthData or the like, and it
    should also be on CRAN. It would contain nothing but the most essential
    data: as much as you can fit while keeping the package under the 5MB limit
    set by CRAN. It will also contain the documentation for the datasets. This
    package will be imported by rnaturalearth, so users will install it
    whenever they install rnaturalearth. But since this data will changed
    infrequently, users will seldom need to update this package.
  • The third package will be rnaturalearthHiRes or the like. This
    package would reside solely in the rOpenSci drat repository
    http://packages.ropensci.org/. It would be a suggested package of
    rnaturalearth, using the AdditionalRepos field in the DESCRIPTION.
    When users call a function that would use this pre-downloaded data, the
    rnaturalearth package would check if rnaturalearthHiRes is installed.
    If it is not installed, then it will ask to install the package from the
    rOpenSci drat repo. Since this would be a non-CRAN package, you can include
    as much data as you want. Whether it would ever make sense to break this
    high resolution package into several different packages would be up to you.

A compromise solution would be to put minimal data into rnaturalearth,
and have all the other data in a single rOpenSci drat repository package.
But if you're going to do this, I think you may as well get as well break
it up into one further package as suggested above.

Now that is a lot of breaking up into multiple packages what will remain,
as far as users are concerned, a single user-facing package. But this is
the accepted practice for packages which depend on a lot of data. For
instance, the openNLP package installs very large packages containing
trained models for different langauges from Datacube. I've had to do this
in gender and genderdata, even though the data there is much smaller than
the geographic data you are working with. And I'm in the process of redoing
USAboundaries into exactly the same package structure that I'm suggesting
here. If it is a helpful model, you might take a look at gender
https://github.com/ropensci/gender and genderdata
https://github.com/ropensci/genderdata. Here are the relevant functions
https://github.com/ropensci/gender/blob/master/R/install-genderdata-package.R
.

When you do this, you should feel free to give the data packages the CC0
license, but to give the rnaturalearth package which contains only your
own work whatever open source license you prefer.

@sckott https://github.com/sckott, @karthik https://github.com/karthik,
@cboettig https://github.com/cboettig: To be able to test these changes
effectively, @andysouth https://github.com/AndySouth will need to use
the rOpenSci drat repository. I'd suggest that the package be provisionally
accepted and moved into rOpenSci GitHub and drat repositories.
Tooling

  • The README.md should be converted to README.Rmd via
    devtools::use_readme_rmd(). As a bonus, this will let the README
    include images easily.
  • The package should use Appveyor's CI for Windows via
    devtools::use_appveyor().

DESCRIPTION

  • Specify a version number for rgdal, httr, and knitr.

Tests

All the tests pass. I'll check coverage more carefully once the package
structure has stabilized. In the meantime, all of the tests which download
files should have skip_on_cran().
Vignette

The vignette should not need download anything from Natural Earth. The
code block at the end with two downloads should be set to eval = FALSE.
Code style

There are places where the code style can be improved. See the suggestions
in this chapter of Hadley Wickham's R Packages book
http://r-pkgs.had.co.nz/r.html or the style chapter
http://adv-r.had.co.nz/Style.html in his Advanced R book.

  • In the README and vignettes, in place of require(rnaturalearth) use
    library(rnaturalearth).
  • Spacing is often inconsistent around = using in function definitions
    and calls. There should always be a single space on both sides of =.
    For example, in the README vignette("rnaturalearth",
    package="rnaturalearth") should be vignette("rnaturalearth", package =
    "rnaturalearth").
  • Comments at the start of the line should be followed by a space. (I
    prefer to user sentence capitalization; but that's just a preference.) E.g.,

This is more legible

  • Most important, all code which has been commented out needs to be

    deleted.

    The same goes for the data download scripts in data-raw. At the
    moment, they have a lot investigating the data. Now that you've settled on
    what to do, you can retain that information in comments but trim

    unnecessary code.

    Functions which have limited categorical options should specify all
    the options in the function signature and then use match.args(). An
    example of this is get_data() and the type parameter. For example, a
    function that looks like this

myfunc <- function(type = "a") {
if(type == "a") {
# ...
} else if (type =="b") {

} else {
stop("Error.")
}
}

Should be changed to this:

myfunc <- function(type = c("a", "b")) {
type <- match.arg(type)

Logic checking type here.

}

  • get_data() should be renamed to something less generic, like
    ne_get_data(). Or, can you avoid exporting it?

Human language style

  • In the headings of the vignettes, prefer sentence-style
    capitalization. E.g. 1. Maps in the package. (I'd suggest the same for
    comments as well, but that's entirely up to you.)
  • This line in the vignette should be revised, perhaps into a bulleted
    list, since a sentence can only contain one colon. contains
    pre-downloaded vector maps for countries: ne_countries(), states:
    ne_states() and coastline: ne_coastline()


Reply to this email directly or view it on GitHub
#22 (comment).

@Robinlovelace
Copy link
Member

Review

A number of packages provide geographic data, but none has yet made the high quality datasets within the naturalearth project easily available to R users. rnaturalearth does this with a user-friendly command-line interface, which encourages exploration of what the collection has to offer, in addition to simply providing country and 'state' outlines at various level of geographical resolution. The package will very useful for anyone wanting to make simple yet accurate maps quickly.

ROpenSci, with its emphasis on data access, is an ideal community space to host this package.

Useability of the functions

Overall I found the package intuitive and easy to use. The preface of ne_ is useful for quickly accessing the package's functions (in a similar way that stringr functions begin with st_) and I think the order and description of the function arguments makes sense, given the structure of the data on the naturalearth data repository.

The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here. The what-is-a-country vignette provides a useful discussion of the subtleties surrounding up how to divide up geographical space along national, state and regional borders.

Minor comments/suggestions:

  • I think the description of the @param type in ne_download could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical'). To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")
  • To assist with the 'making it clear what data is available' issue, I think that ne_download could usefully be split into ne_download10, ne_download50 and ne_download110, while keeping the generic ne_download function intact. The data available varies depending on scale, so ne_download10, for example, could provide descriptions of the very useful roads and railways datasets. It was not immediately obvious that these very interesting datasets were made available with - this could usefully be better documented.
roads <- ne_download(scale = 10, type = "roads", category = "cultural")
railroads <- ne_download(scale = 10, type = "railroads", category = "cultural")
  • sp is an import but I think the package would benefit from making it a dependency (as it is with stplanr after some debate): all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.
  • scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.
  • It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.
  • I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...
  • Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.

Thoughts on reducing the package size

In-line with the previous reviewer I think the size of the package could be problematic. I think the suggestion of splitting it into other packages is a good one. An alternative would be to provide a default destination within the package for data download, not host any data on the package itself directly, and simply use the package to access the data hosted on rnaturalearth. Other than package install requirements, the clear advantage of that is that the data won't have to be updated. The disadvantage is that the user would need internet access the first time they accessed many datasets and possible complexities surrounding fresh installs wiping previously downloaded data.

if(file.exists(...)) could be used to decide whether the data should be downloaded or not. Below is an illustrative example for what the call could look like something like this, for ne_countries():

ne_countries_10 <- function(){
  file_path <- file.path(system.file("data", package = "rnaturalearth"), "countries10.rda")
  if(file.exists(file_path)){
    x <- readRDS(file_path)
  }else{
    x <- ne_download()
    saveRDS(x, file_path)
  }
  x
}

Note that data may need to change to exdata in the above code and that this is just a preliminary idea - would be interested to hear thoughts from others. In any case I can see a disadvantage of hosting raw data on CRAN or ROpenSci in terms of the DRY principle.

I'm not sure about specifying providing the same data with data(countries110) is necessary given t countries10 <- ne_countries(110) does the same job.

I'd like to point out that I'm not an expert in this area, think the previous reviewers' suggestions in relation to package size is also another viable option and would be interested in hearing views from others, especially in relation to the DRY principle as it applies to data.

@andysouth
Copy link
Author

Thanks @Robinlovelace https://github.com/Robinlovelace for a considered
and useful review.

On 17 December 2015 at 00:15, Robin notifications@github.com wrote:

Review

A number of packages provide geographic data, but none has yet made the
high quality datasets within the naturalearth project easily available to R
users. rnaturalearth does this with a user-friendly command-line interface,
which encourages exploration of what the collection has to offer, in
addition to 'simply' providing country and 'state' outlines at various
level of geographical resolution. The package will very useful for anyone
wanting to make simple yet accurate maps quickly.

ROpenSci, with its emphasis on data access, is an ideal community space to
host this package.
Useability of the functions

Overall I found the package intuitive and easy to use. The preface of ne_
is useful for quickly accessing the package's functions (in a similar way
that stringr functions begin with st_) and I think the order and
description of the function arguments makes sense, given the structure of
the data on the naturalearth data repository.

The main vignette
https://github.com/AndySouth/rnaturalearth/blob/master/vignettes/rnaturalearth.Rmd
succintly describes the core purpose and functions of the package, although
I think more on what extra can be downloaded via ne_download() would be
interesting here. The what-is-a-country vignette provides a useful
discussion of the subtleties surrounding up how to divide up geographical
space along national, state and regional borders.

Minor comments/suggestions:

I think the description of the @param type in ne_download could be
better at conveying to the user what datasets are available. It is not
clear from the function documentation, for example, that lakes can be
downloaded with ne_download(type = 'lakes', category = 'physical'). To
resolve this issue I suggest that a link to the dataset types is make more
explicit (e.g. "see
naturalearthdata.com/downloads/110m-cultural-vectors/
http://www.naturalearthdata.com/downloads/110m-cultural-vectors/ for
the types of data that can be downloaded for cultural 110 m resolution
data.")

To assist with the 'making it clear what data is available' issue, I
think that ne_download could usefully be split into ne_download10,
ne_download50 and ne_download110, while keeping the generic ne_download
function intact. The data available varies depending on scale, so
ne_download10, for example, could provide descriptions of the very
useful roads and railways datasets. It was not immediately obvious that
these very interesting datasets were made available with - this could
usefully be better documented.

roads <- ne_download(scale = 10, type = "railroads", category = "cultural")railroads <- ne_download(scale = 10, type = "railroads", category = "cultural")

sp is a dependency but I think the package would benefit from making
it a dependency: all the data loaded by the package that I've seen is in an
sp S4 object. Yet these will not work (e.g. plot) properly unless sp is
loaded, as illustrated in the vignette. This would also make the plot
commands in the vignette simpler.

scale = 10 isn't listed as an argument of ne_countries("scale of map
to return, one of 110, 50, 'small', 'medium'") but it works so should be
added.

It is not clear that category is a needed argument. airports <-
ne_download(scale = 10, type = "airports"), for example, works the
same as airports <- ne_download(scale = 10, type = "airports") - there
are no types I can see that duplicate names between categories.

I couldn't see how to download ice shelf data succesfully. I was
expecting ne_download(scale = 10, type = "antarctic_ice_shelves",
category = "physical") or ne_download(scale = 10, type = "antarctic
ice shelves", category = "physical") to work but neither seemed to...

Links to other data sources such as the SRTM (some of which can be
accessed through the raster package) would be useful, perhaps in the
vignette.

Thoughts on reducing the package size

In-line with the previous reviewer I think the size of the package could
be problematic. I think the suggestion of splitting it into other packages
is a good one. An alternative would be to provide a default destination
within the package for data download, not host any data on the package
itself directly, and simply use the package to access the data hosted on
rnaturalearth. Other than package install requirements, the clear advantage
of that is that the data won't have to be updated. The disadvantage is that
the user would need internet access the first time they accessed many
datasets and possible complexities surrounding fresh installs wiping
previously downloaded data.

if(file.exists(...)) could be used to decide whether the data should be
downloaded or not. Below is an illustrative example for what the call could
look like something like this, for ne_countries():

ne_countries_10 <- function(){
file_path <- file.path(system.file("data", package = "rnaturalearth"), "countries10.rda")
if(file.exists(file_path)){
x <- readRDS(file_path)
}else{
x <- ne_download()
saveRDS(x, file_path)
}
x
}

Note that data may need to change to exdata in the above code and that
this is just a preliminary idea - would be interested to hear thoughts from
others. In any case I can see a disadvantage of hosting raw data on CRAN or
ROpenSci in terms of the DRY principle.

I'm not sure about specifying a function to generate data ne_countries()
while
providing the same data with data(countries110), and the type argument is
necessary, especially given the suggestion above of a way to reduce the
package's size whilst maintaining the downloaded data in a default and
accessible location.

I'd like to point out that I'm not an expert in this area, think the
previous reviewers' suggestions in relation to package size is also another
viable option and would be interested in hearing views from others,
especially in relation to the DRY principle as it applies to data.


Reply to this email directly or view it on GitHub
#22 (comment).

@richfitz
Copy link
Member

Another option for reducing package size (though experimental and could delay CRAN submission).

In another context, I'm developing a framework for using GitHub to distribute evolving data sources without putting them in a package; see early vignette, and beginnings of manuscript which include a little about where we're thinking of taking this.

In short, the data don't go in the package, or even in the repository but live in "github releases" and are fetched as they're used. This decouples the development of the code of the package from the data source which avoids the issues of duplicating the data over and over on CRAN and makes for fast package downloads.

@lmullen
Copy link
Member

lmullen commented Dec 22, 2015

@richfitz: datastorr looks really great. I'm looking forward to the release.

@sckott
Copy link
Contributor

sckott commented Feb 11, 2016

hey @andysouth let us know if there's anything we can do to help...

@andysouth
Copy link
Author

Thanks @sckott https://github.com/sckott. I've been meaning to get back
to this.
I have a plan to start by following parts of the suggestions of both
reviewers. I'll try to get something to you in the next few weeks.

On 11 February 2016 at 19:34, Scott Chamberlain notifications@github.com
wrote:

hey @andysouth https://github.com/AndySouth let us know if there's
anything we can do to help...


Reply to this email directly or view it on GitHub
#22 (comment).

@sckott
Copy link
Contributor

sckott commented Feb 12, 2016

thanks! sounds good

@andysouth
Copy link
Author

Thanks for your patience, finally found spare time to finish addressing these useful recomendations from the reviewers.

It fails travis checks currently due to the relationships between the 3 packages, but passes checks locally.

responses to @lmullen

This package should be at least two packages, and probably three.

done. converted to 3 packages rnaturalearth, rnaturalearthdata & rnaturalearthhires. The first 2 are aimed at CRAN and the 3rd the rOpenSci drat repo)

@sckott, @karthik, @cboettig: To be able to test these changes effectively, @andysouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.

I'm ready for that stage now if you are.

The README.md should be converted to README.Rmd via devtools::use_readme_rmd().

done

The package should use Appveyor's CI for Windows via devtools::use_appveyor().

done

Specify a version number for rgdal, httr, and knitr.

I don't know what version number is needed for these packages. Feel like I would just be making one up. Where might I look ?

All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran().

done

The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE.

done

Code style
In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).

done

Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").

I recognise this, but unless it's essential would prefer to keep my sometimes idiosyncratic use of spaces which I use to enhance my readability (and out of habit too).

Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.)

done

Most important, all code which has been commented out needs to be deleted.

done (mostly)

The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.

I have kept some commented code as these are the areas where it is most likely I will need to adapt old code in repsonse to potential future data changes, and least likely that users need to access.

Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this

myfunc <- function(type = "a") {
if(type == "a") {
# ...
} else if (type =="b") {

} else {
stop("Error.")
}
}
Should be changed to this:

myfunc <- function(type = c("a", "b")) {
type <- match.arg(type)

Logic checking type here.

}

Done, except where I wanted to give the user more guidance if they submitted an incorrect option.

get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?

no longer exported

In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)

done in vignettes

This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()

done

responses to @Robinlovelace

The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here.

started. More could definitely be added, and I will but would be good to get out first.

I think the description of the @param type in ne_download could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical'). To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")

Thanks for pointing this out. I've added a table to the start of the man page for ne_download() which outlines the main maps available at different scales

To assist with the 'making it clear what data is available' issue, I think that ne_download could usefully be split into ne_download10, ne_download50 and ne_download110, while keeping the generic ne_download function intact. The data available varies depending on scale, so ne_download10, for example, could provide descriptions of the very useful roads and railways datasets.

I think that the improved documentation in ne_download() makes splitting it into the 3 functions unnecessary.

sp is an import but I think the package would benefit from making it a dependency (as it is with stplanr after some debate): all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.

I'm open to considering making sp a dependency, how do others feel ? I wanted to keep this as light on dependencies as possible. If a user was using fortify with ggplot2 they wouldn't need sp, right ?

scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.

done

It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.

Whilst you are correct, category is needed to construct the download url. If I were to dispense with it I would need to create a list of all the maps and which category they are in. I wanted to keep the link to the Natural Earth data as simple as possible so that if it changes I don't have too much work to do.

I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...

ne_download(scale=50, type="antarctic_ice_shelves_polys", category="physical")
I am following natural earth filenaming directly to try to keep this sustainable.

Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.

I added a link to seealso for ne_download()

additions

I've added support for raster downloads and tiny_countries points.

Thanks @richfitz for the data storage suggestion. I chose the path more well travelled for now, but am interested in that approach for the future.

@lmullen
Copy link
Member

lmullen commented Mar 22, 2016

@andysouth Glad to see the next version of this package.

@karthik and @sckott: I can review this package in more detail now (not anticipating any problems) but perhaps it would be best to have the high resolution data in drat so I can check that too.

@andysouth: The version numbers for rgdal etc should be whatever the current version is, since that is what you are developing against. You can get the current versions from packageVersion() or better yet from looking at CRAN.

@sckott
Copy link
Contributor

sckott commented Mar 22, 2016

hmm, I don't know the process for adding repos to our drat, asked over there ropensci/drat#10

@sckott
Copy link
Contributor

sckott commented Mar 22, 2016

@lmullen so we just need rnaturalearthhires in drat for now?

@lmullen
Copy link
Member

lmullen commented Mar 22, 2016

Yes, I think that should work, right @andysouth? The other two will be on
CRAN, so I can install them as if they were on CRAN. Then test with and
without the high res package.

On Tue, Mar 22, 2016 at 2:38 PM, Scott Chamberlain <notifications@github.com

wrote:

@lmullen https://github.com/lmullen so we just need rnaturalearthhires
in drat for now?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#22 (comment)

Lincoln Mullen, http://lincolnmullen.com
Assistant Professor, Department of History & Art History
George Mason University

@andysouth
Copy link
Author

@lmullen @sckott Yes, thanks that sounds good.

@sckott
Copy link
Contributor

sckott commented Mar 22, 2016

Working on it now

sckott added a commit to ropensci/drat that referenced this issue Mar 23, 2016
@andysouth
Copy link
Author

@lmullen @sckott any news ?

@sckott
Copy link
Contributor

sckott commented May 4, 2016

@andysouth looking back through comments, was I supposed to do something? this does work now install.packages("rnaturalearthhires", repos = c("http://packages.ropensci.org", "http://cran.rstudio.com"))

@lmullen
Copy link
Member

lmullen commented May 4, 2016

@sckott @andysouth: Sorry I've been slow. Got hung up with the end of the semester. I'll review the revised package soon.

@andysouth
Copy link
Author

@lmullen @sckott Thanks and no worries, I look forward to it.

@lmullen
Copy link
Member

lmullen commented May 19, 2016

@andysouth: Nice work on this package. I think it is very close to being done and accepted into rOpenSci, then on its way to CRAN. Here are a few minor fixes or questions I found.

Separation of packages

The separation of packages seems to work fine. I was a little surprised that the package installs the data packages using devtools instead of using the rOpenSci repository. What is the reason for that choice? I'd prefer that the rOpenSci repository was used if possible.

The DESCRIPTION will need this line added:

Additional_repositories: http://packages.ropensci.org

I'd still like to see package version numbers for the dependencies in the DESCRIPTION.

R CMD check

I get the following note, which should be easily corrected:

* checking R code for possible problems ... NOTE
ne_download: no visible global function definition for ‘download.file’
ne_download: no visible global function definition for ‘unzip’
Undefined global functions or variables:
  download.file unzip
Consider adding
  importFrom("utils", "download.file", "unzip")
to your NAMESPACE file.

In addition, R CMD check fails for me when the rnaturalearthhires package is not available. This is because it can't build the vignette.

The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add eval = FALSE to the chunks in the vignette that use the high resolution data. It is sufficient to show what the code looks like without actually running it.

I get a warning message when running the code in the examples in ne_download(). When I run

spdf_world2 <-    ne_load( scale = 110, type = 'countries' )

I get this warning message which does not seem relevant to the command run.

Warning message:
In if (category == "raster") { :
  the condition has length > 1 and only the first element will be used

The reason is that the default argument for category is a vector of three choices. This function should probably have the line

category <- match.arg(category)

Testing / continuous integration

I noticed that the Travis and Appveyor tests are failing. At least the Travis tests should be passing. It looks like the problem is that the installation of rgdal fails because you don't have the rgdal dev package installed on Ubuntu. I'm not sure why the binary package as specified in the Travis file is not working. But you should be able to install rgdal from source if you install the apt package libgdal-dev. As for Appveyor, it looks like the problem has to do with package installation as well.

Not a requirement, but you might consider adding code coverage with covr.

Stylistic considerations

I'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth."

The package is inconsistent in how it handles spaces around = in function arguments. All user-facing documentation (i.e., examples and vignettes) should be made consistent in having a single space on both sides of the =. There are a few places with inconsistent spacing around parentheses as well.

That's about it. Nice work!

@andysouth
Copy link
Author

Many thanks @lmullen for the thorough 2nd review. I'll sort these issues over the next couple of weeks.

andysouth added a commit to ropensci/rnaturalearth that referenced this issue Jun 2, 2016
@andysouth
Copy link
Author

@sckott I've addressed all issues except one. There's a remaining problem with Travis rgdal install that I've fought with and failed, be good to get some input as it seems you may have had similar with rnoaa. Appveyor is now building successfully.

responses to 2nd rOpenSci review from @lmullen

Separation of packages

Using rOpenSci repository for loading data packages instead of devtools.

Done for rnaturalearthhires, but rnaturalearthdata is not yet on rOpenSCi or CRAN, I have put in the code ready to change once rnaturalearthdata is on rOpenSci.

               #utils::install.packages("rnaturalearthdata",
               #                        repos = c("http://packages.ropensci.org", "http://cran.rstudio.com"),
               #                        type = "source")

I'd still like to see package version numbers for the dependencies in the DESCRIPTION.

Done

R CMD check

* checking R code for possible problems ... NOTE

ne_download: no visible global function definition for ‘download.file’

ne_download: no visible global function definition for ‘unzip’

done

The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add eval = FALSE to the chunks in the vignette that use the high resolution data.

done

I get a warning message when running the code in the examples in ne_download()

In if (category == "raster") { :

the condition has length > 1 and only the first element will be used

done

Testing / continuous integration

Travis and Appveyor tests are failing. At least the Travis tests should be passing. It looks like the problem is that the installation of rgdal fails because you don't have the rgdal dev package installed on Ubuntu. I'm not sure why the binary package as specified in the Travis file is not working. But you should be able to install rgdal from source if you install the apt package libgdal-dev. As for Appveyor, it looks like the problem has to do with package installation as well.

Now build success on appveyor

For Travis I've tried various fixes but still can't get rgdal to install despite the binary installation seemingly working for travis on similar packages e.g. https://github.com/Hackout2/repijson/blob/master/.travis.yml

I'd appreciate any input on this.

Not a requirement, but you might consider adding code coverage with covr.

added a github issue to look at covr for future release

Stylistic considerations

I'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth."

done

All user-facing documentation (i.e., examples and vignettes) should be made consistent in having a single space on both sides of the =. There are a few places with inconsistent spacing around parentheses as well.

done

@sckott
Copy link
Contributor

sckott commented Jun 2, 2016

asking about rgdal, i've been using old language: c setup, which works, but means you don't get the faster builds, caching, container based infrastucture, etc. - asking jim hester

@sckott
Copy link
Contributor

sckott commented Jun 2, 2016

jim pointed me to this issue on travis-ci repo travis-ci/travis-ci#5852

So Edzer uses this setup https://github.com/edzer/gstat/blob/master/.travis.yml to install gdal/geos from source

worth trying dist: trusty in your travis file e.g. https://github.com/jhollist/quickmapr/blob/master/.travis.yml#L3

@lmullen
Copy link
Member

lmullen commented Jun 2, 2016

@andysouth I looked at the Travis logs and your comments in your Travis config file. Two ideas:

First, Travis does install the Ubuntu r-cran-rgdal package as it should, but then it also tries to install rgdal from source. That seems to me like a bug in Travis.

Second, I couldn't exactly tell which combination of options you used at any given time. I just tried it on my local version of Ubuntu (16.04) and all that I needed to install was libgdal-dev to compile rgdal from source. It looks like one of your option has conflicting versions of libgdal. Maybe try it with both the binary version r-cran-rgdal as you have now plus installing just libgdal-dev with before_install:?

@lmullen
Copy link
Member

lmullen commented Jun 2, 2016

@andysouth I think you might also want to use install_github: to install rnaturalearthdata.

@andysouth
Copy link
Author

Thanks @lmullen @sckott I've tried both of your suggestions and various other permutations except for the c setup .

I don't fully understand how travis works. There are 2 setups (1-using binary and 2-installing from source ) that apparently should work and seem to work for other packages (1-https://github.com/Hackout2/repijson/blob/master/.travis.yml, 2-https://github.com/jhollist/quickmapr/blob/master/.travis.yml#L3) but don't work for mine. I'm reluctant to try the 3rd potential c solution.

Currently I've got it set seemingly the same as in quickmapr (2) but it is still failing with The following packages have unmet dependencies:
libgdal-dev : Depends: libhdf5-serial-dev

The comment on the end of here travis-ci/travis-ci#5215 prompted me to try removing the version dependency on rgdal in DESCRIPTION, however that seemed not to solve it either.

Am I missing anything obvious ?

@sckott
Copy link
Contributor

sckott commented Jun 8, 2016

I've forked to my account to see if I can sort it out, will report back if successful

@andysouth
Copy link
Author

@sckott thankyou thankyou

@andysouth
Copy link
Author

andysouth commented Jun 14, 2016

Hi @sckott does it seem that Edzers solution didn't work either ?
https://github.com/sckott/rnaturalearth/commit/68f30e781d42ecd5c312d1caf2483e35c7dd74c3

Because it succeeds on appveyor does this suggest it's a specific Travis problem ? Should this stop proceeding to rOpenSci & CRAN ? I'm keen to proceed if possible.

@sckott
Copy link
Contributor

sckott commented Jun 14, 2016

@andysouth Yeah, let's move on, couldn't sort it out yet, but not crucial as it seems to pass elsewhere just not on travis.

Approved (see new label on issue)

@sckott
Copy link
Contributor

sckott commented Jun 14, 2016

  • Add this footer to your README:
[![ropensci\_footer](http://ropensci.org/public_images/github_footer.png)](http://ropensci.org)
  • Update installation of dev versions to ropenscilabs/rnaturalearth and any urls for the github repo to ropenscilabs instead of AndySouth
  • Go to the Repo Settings --> Transfer Ownership and transfer to ropenscilabs - Note that all our newer pkgs go to ropenscilabs first, then when more mature we'll move to ropensci

@andysouth
Copy link
Author

@sckott

  1. hurrah ! Thanks.
  2. at the transfer on github I get : You don’t have admin rights to ropenscilabs

@sckott
Copy link
Contributor

sckott commented Jun 14, 2016

you should get an invitation to a team within ropenscilabs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants