rnaturalearth #22

andysouth · 2015-11-26T15:35:21Z

1. What does this package do? (explain in 50 words or less)
  rnaturalearth provides :
a pre-downloaded subset of Natural Earth vectors commonly used in world mapping.
functions to download other Natural Earth vectors.
an open, reproducible workflow from www.naturalearthdata.com enabling updating as new versions become available.
flexibility to get maps classified by countries, sovereign states and map-units.
1. Paste the full DESCRIPTION file inside a code block (bounded by ``` on either end).

Package: rnaturalearth
Title: World Vector Map Data from Natural Earth
Version: 0.0.0.9000
Authors@R: person("Andy", "South", , "southandy@gmail.com", role = c("aut", "cre"))
Description: Facilitates mapping by making natural earth map data from http://
    www.naturalearthdata.com/ more easily available to R users. Focuses on vector
    data.
License: CC0
LazyData: true
LazyDataCompression: xz
URL: https://github.com/AndySouth/rnaturalearth
BugReports: https://github.com/AndySouth/rnaturalearth/issues
Depends:
    R (>= 3.1.1)
Imports:
    sp (>= 1.0.15),
    rgdal
Suggests:
    knitr,
    testthat (>= 0.9.1),
    httr
VignetteBuilder: knitr
RoxygenNote: 5.0.1

1. URL for the package (the development repository, not a stylized html page)
  https://github.com/AndySouth/rnaturalearth
1. What data source(s) does it work with (if applicable)?
  www.naturalearthdata.com
1. Who is the target audience?
  Anyone who wants to use Natural Earth vector data to make a map or perform other geographical analyses.
1. Are there other R packages that accomplish the same thing? If so, what is different about yours?
  More limited subsets of Natural Earth data are available in at least the following CRAN packages rworldmap, rworldxtra, choroplethr, maps, oce, tmap. rnaturalearth is different in that it provides a comprehensive reproducible solution for access to Natural Earth vector map data either pre-downloaded or by facilitating download. By separating data access from visualisation this package provides a resource that can be used by other visualisation packages.
1. Check the box next to each policy below, confirming that you agree. These are mandatory.
This package does not violate the Terms of Service of any service it interacts with.
The repository has continuous integration with Travis and/or another service
The package contains a vignette
The package contains a reasonably complete readme with devtools install instructions
The package contains unit tests
The package only exports functions to the NAMESPACE that are intended for end users
1. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
[no] Are there any package dependencies not on CRAN?
[yes] Do you intend for this package to go on CRAN?
[yes] Does the package have a CRAN accepted license?
[no] Did devtools::check() produce any errors or warnings? If so paste them below.
One NOTE
checking installed package size ... NOTE
installed size is 26.0Mb
sub-directories of 1Mb or more:
data 25.8Mb
1. Please add explanations below for any exceptions to the above:
  The package is currently too big because of the fine scale (10m) data. Hadley suggested :
  "I think you need to keep the data under five meg. I'd suggest that you put the fine level data a separate package - you can then make that available via a drat repo (perhaps the ROpenSci one?), or just via github."
  I don't know anything about drat repos. I'm happy to take advice and modify the package(s) as recommended.

The text was updated successfully, but these errors were encountered:

sckott · 2015-11-26T22:33:24Z

@andysouth Thanks for this! Seeking reviewer(s)

Robinlovelace · 2015-11-26T23:44:05Z

I could take a look

andysouth · 2015-11-30T10:24:26Z

Thanks both!

On 26 November 2015 at 23:44, Robin notifications@github.com wrote:

I could take a look

—
Reply to this email directly or view it on GitHub
#22 (comment).

sckott · 2015-11-30T16:17:45Z

Reviewers: @Robinlovelace, @lmullen

andysouth · 2015-12-02T11:39:25Z

Thanks all. I appreciate your efforts.
If you have questions feel free to ask.

On 30 November 2015 at 16:17, Scott Chamberlain notifications@github.com
wrote:

@Robinlovelace https://github.com/Robinlovelace and @lmullen
https://github.com/lmullen reviewing - thanks both!

—
Reply to this email directly or view it on GitHub
#22 (comment).

lmullen · 2015-12-14T22:31:43Z

First, let me say that this package is very useful. It will be a very helpful addition to the mapping packages on CRAN and in rOpenSci. I already have plans to use it for my own work and in assignments for students. Well done, @andysouth.

It takes about 2 minutes to install the dev version from GitHub and the installed version is 26 MB. In addition to the fact that CRAN won't accept the package until the data size is smaller, the weight of this package also hinders you as the developer from doing updates as necessary, and it makes it difficult for users to install the package. To solve this it will be necessary to break this package into two or more other packages as detailed below. For the purposes of this review I've focused mainly on package-level issues. Once you've solved the size problem and the package design has stabilized, I'll review the code more granularly.

Structural changes

This package should be at least two packages, and probably three. Here is a suggestion for how you should break this up:

The main package would be rnaturalearth and it would be on CRAN. It would be the only package that a user would ever install or load directly. It should contain nothing but the code to download NaturalEarth files, functions to access the data from the other packages, and the vignettes and package level documentation. The absence of data from this package means that you can update it with impunity, since the size of the package will be very small. (I've found this also helps speed up development.)
The second package would be rnaturalearthData or the like, and it should also be on CRAN. It would contain nothing but the most essential data: as much as you can fit while keeping the package under the 5MB limit set by CRAN. It will also contain the documentation for the datasets. This package will be imported by rnaturalearth, so users will install it whenever they install rnaturalearth. But since this data will changed infrequently, users will seldom need to update this package.
The third package will be rnaturalearthHiRes or the like. This package would reside solely in the rOpenSci drat repository. It would be a suggested package of rnaturalearth, using the AdditionalRepos field in the DESCRIPTION. When users call a function that would use this pre-downloaded data, the rnaturalearth package would check if rnaturalearthHiRes is installed. If it is not installed, then it will ask to install the package from the rOpenSci drat repo. Since this would be a non-CRAN package, you can include as much data as you want. Whether it would ever make sense to break this high resolution package into several different packages would be up to you.

A compromise solution would be to put minimal data into rnaturalearth, and have all the other data in a single rOpenSci drat repository package. But if you're going to do this, I think you may as well get as well break it up into one further package as suggested above.

Now that is a lot of breaking up into multiple packages what will remain, as far as users are concerned, a single user-facing package. But this is the accepted practice for packages which depend on a lot of data. For instance, the openNLP package installs very large packages containing trained models for different langauges from Datacube. I've had to do this in gender and genderdata, even though the data there is much smaller than the geographic data you are working with. And I'm in the process of redoing USAboundaries into exactly the same package structure that I'm suggesting here. If it is a helpful model, you might take a look at gender and genderdata. Here are the relevant functions.

When you do this, you should feel free to give the data packages the CC0 license, but to give the rnaturalearth package which contains only your own work whatever open source license you prefer.

@sckott, @karthik, @cboettig: To be able to test these changes effectively, @andysouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.

Tooling

The README.md should be converted to README.Rmd via devtools::use_readme_rmd(). As a bonus, this will let the README include images easily.
The package should use Appveyor's CI for Windows via devtools::use_appveyor().

DESCRIPTION

Specify a version number for rgdal, httr, and knitr.

Tests

All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran().

Vignette

The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE.

Code style

There are places where the code style can be improved. See the suggestions in this chapter of Hadley Wickham's R Packages book or the style chapter in his Advanced R book.

In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).
Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").
Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.) E.g.,

# This is more legible

Most important, all code which has been commented out needs to be deleted.
The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.
Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this

myfunc <- function(type = "a") {
  if(type == "a") {
    # ...
  } else if (type =="b") {

  } else {
    stop("Error.")
  }
}

Should be changed to this:

myfunc <- function(type = c("a", "b")) {
  type <- match.arg(type)
  # Logic checking type here.
}

get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?

Human language style

In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)
This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()

andysouth · 2015-12-15T21:16:43Z

Thanks @lmullen for a very helpful review and for giving me clear
guidelines to follow. I knew some of these needed doing, now I know how.
I'll wait for feedback from others and then get stuck in.

On 14 December 2015 at 22:31, Lincoln Mullen notifications@github.com
wrote:

First, let me say that this package is very useful. It will be a very
helpful addition to the mapping packages on CRAN and in rOpenSci. I already
have plans to use it for my own work and in assignments for students. Well
done, @andysouth https://github.com/AndySouth.

It takes about 2 minutes to install the dev version from GitHub and the
installed version is 26 MB. In addition to the fact that CRAN won't accept
the package until the data size is smaller, the weight of this package also
hinders you as the developer from doing updates as necessary, and it makes
it difficult for users to install the package. To solve this it will be
necessary to break this package into two or more other packages as detailed
below. For the purposes of this review I've focused mainly on package-level
issues. Once you've solved the size problem and the package design has
stabilized, I'll review the code more granularly.
Structural changes

This package should be at least two packages, and probably three. Here is
a suggestion for how you should break this up:

The main package would be rnaturalearth and it would be on CRAN. It
would be the only package that a user would ever install or load directly.
It should contain nothing but the code to download NaturalEarth files,
functions to access the data from the other packages, and the vignettes and
package level documentation. The absence of data from this package means
that you can update it with impunity, since the size of the package will be
very small. (I've found this also helps speed up development.)

The second package would be rnaturalearthData or the like, and it
should also be on CRAN. It would contain nothing but the most essential
data: as much as you can fit while keeping the package under the 5MB limit
set by CRAN. It will also contain the documentation for the datasets. This
package will be imported by rnaturalearth, so users will install it
whenever they install rnaturalearth. But since this data will changed
infrequently, users will seldom need to update this package.

The third package will be rnaturalearthHiRes or the like. This
package would reside solely in the rOpenSci drat repository
http://packages.ropensci.org/. It would be a suggested package of
rnaturalearth, using the AdditionalRepos field in the DESCRIPTION.
When users call a function that would use this pre-downloaded data, the
rnaturalearth package would check if rnaturalearthHiRes is installed.
If it is not installed, then it will ask to install the package from the
rOpenSci drat repo. Since this would be a non-CRAN package, you can include
as much data as you want. Whether it would ever make sense to break this
high resolution package into several different packages would be up to you.

A compromise solution would be to put minimal data into rnaturalearth,
and have all the other data in a single rOpenSci drat repository package.
But if you're going to do this, I think you may as well get as well break
it up into one further package as suggested above.

Now that is a lot of breaking up into multiple packages what will remain,
as far as users are concerned, a single user-facing package. But this is
the accepted practice for packages which depend on a lot of data. For
instance, the openNLP package installs very large packages containing
trained models for different langauges from Datacube. I've had to do this
in gender and genderdata, even though the data there is much smaller than
the geographic data you are working with. And I'm in the process of redoing
USAboundaries into exactly the same package structure that I'm suggesting
here. If it is a helpful model, you might take a look at gender
https://github.com/ropensci/gender and genderdata
https://github.com/ropensci/genderdata. Here are the relevant functions
https://github.com/ropensci/gender/blob/master/R/install-genderdata-package.R
.

When you do this, you should feel free to give the data packages the CC0
license, but to give the rnaturalearth package which contains only your
own work whatever open source license you prefer.

@sckott https://github.com/sckott, @karthik https://github.com/karthik,
@cboettig https://github.com/cboettig: To be able to test these changes
effectively, @andysouth https://github.com/AndySouth will need to use
the rOpenSci drat repository. I'd suggest that the package be provisionally
accepted and moved into rOpenSci GitHub and drat repositories.
Tooling

The README.md should be converted to README.Rmd via
devtools::use_readme_rmd(). As a bonus, this will let the README
include images easily.

The package should use Appveyor's CI for Windows via
devtools::use_appveyor().

DESCRIPTION

Specify a version number for rgdal, httr, and knitr.

Tests

All the tests pass. I'll check coverage more carefully once the package
structure has stabilized. In the meantime, all of the tests which download
files should have skip_on_cran().
Vignette

The vignette should not need download anything from Natural Earth. The
code block at the end with two downloads should be set to eval = FALSE.
Code style

There are places where the code style can be improved. See the suggestions
in this chapter of Hadley Wickham's R Packages book
http://r-pkgs.had.co.nz/r.html or the style chapter
http://adv-r.had.co.nz/Style.html in his Advanced R book.

In the README and vignettes, in place of require(rnaturalearth) use
library(rnaturalearth).

Spacing is often inconsistent around = using in function definitions
and calls. There should always be a single space on both sides of =.
For example, in the README vignette("rnaturalearth",
package="rnaturalearth") should be vignette("rnaturalearth", package =
"rnaturalearth").

Comments at the start of the line should be followed by a space. (I
prefer to user sentence capitalization; but that's just a preference.) E.g.,

This is more legible

Most important, all code which has been commented out needs to be

deleted.

The same goes for the data download scripts in data-raw. At the
moment, they have a lot investigating the data. Now that you've settled on
what to do, you can retain that information in comments but trim

unnecessary code.

Functions which have limited categorical options should specify all
the options in the function signature and then use match.args(). An
example of this is get_data() and the type parameter. For example, a
function that looks like this

myfunc <- function(type = "a") {
if(type == "a") {
# ...
} else if (type =="b") {

} else {
stop("Error.")
}
}

Should be changed to this:

myfunc <- function(type = c("a", "b")) {
type <- match.arg(type)

Logic checking type here.

}

get_data() should be renamed to something less generic, like
ne_get_data(). Or, can you avoid exporting it?

Human language style

In the headings of the vignettes, prefer sentence-style
capitalization. E.g. 1. Maps in the package. (I'd suggest the same for
comments as well, but that's entirely up to you.)

This line in the vignette should be revised, perhaps into a bulleted
list, since a sentence can only contain one colon. contains
pre-downloaded vector maps for countries: ne_countries(), states:
ne_states() and coastline: ne_coastline()

—
Reply to this email directly or view it on GitHub
#22 (comment).

Robinlovelace · 2015-12-17T00:15:53Z

Review

A number of packages provide geographic data, but none has yet made the high quality datasets within the naturalearth project easily available to R users. rnaturalearth does this with a user-friendly command-line interface, which encourages exploration of what the collection has to offer, in addition to simply providing country and 'state' outlines at various level of geographical resolution. The package will very useful for anyone wanting to make simple yet accurate maps quickly.

ROpenSci, with its emphasis on data access, is an ideal community space to host this package.

Useability of the functions

Overall I found the package intuitive and easy to use. The preface of ne_ is useful for quickly accessing the package's functions (in a similar way that stringr functions begin with st_) and I think the order and description of the function arguments makes sense, given the structure of the data on the naturalearth data repository.

The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here. The what-is-a-country vignette provides a useful discussion of the subtleties surrounding up how to divide up geographical space along national, state and regional borders.

Minor comments/suggestions:

I think the description of the @param type in ne_download could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical'). To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")
To assist with the 'making it clear what data is available' issue, I think that ne_download could usefully be split into ne_download10, ne_download50 and ne_download110, while keeping the generic ne_download function intact. The data available varies depending on scale, so ne_download10, for example, could provide descriptions of the very useful roads and railways datasets. It was not immediately obvious that these very interesting datasets were made available with - this could usefully be better documented.

roads <- ne_download(scale = 10, type = "roads", category = "cultural")
railroads <- ne_download(scale = 10, type = "railroads", category = "cultural")

sp is an import but I think the package would benefit from making it a dependency (as it is with stplanr after some debate): all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.
scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.
It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.
I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...
Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.

Thoughts on reducing the package size

In-line with the previous reviewer I think the size of the package could be problematic. I think the suggestion of splitting it into other packages is a good one. An alternative would be to provide a default destination within the package for data download, not host any data on the package itself directly, and simply use the package to access the data hosted on rnaturalearth. Other than package install requirements, the clear advantage of that is that the data won't have to be updated. The disadvantage is that the user would need internet access the first time they accessed many datasets and possible complexities surrounding fresh installs wiping previously downloaded data.

if(file.exists(...)) could be used to decide whether the data should be downloaded or not. Below is an illustrative example for what the call could look like something like this, for ne_countries():

ne_countries_10 <- function(){
  file_path <- file.path(system.file("data", package = "rnaturalearth"), "countries10.rda")
  if(file.exists(file_path)){
    x <- readRDS(file_path)
  }else{
    x <- ne_download()
    saveRDS(x, file_path)
  }
  x
}

Note that data may need to change to exdata in the above code and that this is just a preliminary idea - would be interested to hear thoughts from others. In any case I can see a disadvantage of hosting raw data on CRAN or ROpenSci in terms of the DRY principle.

I'm not sure about specifying providing the same data with data(countries110) is necessary given t countries10 <- ne_countries(110) does the same job.

I'd like to point out that I'm not an expert in this area, think the previous reviewers' suggestions in relation to package size is also another viable option and would be interested in hearing views from others, especially in relation to the DRY principle as it applies to data.

andysouth · 2015-12-17T16:40:01Z

Thanks @Robinlovelace https://github.com/Robinlovelace for a considered
and useful review.

On 17 December 2015 at 00:15, Robin notifications@github.com wrote:

Review

A number of packages provide geographic data, but none has yet made the
high quality datasets within the naturalearth project easily available to R
users. rnaturalearth does this with a user-friendly command-line interface,
which encourages exploration of what the collection has to offer, in
addition to 'simply' providing country and 'state' outlines at various
level of geographical resolution. The package will very useful for anyone
wanting to make simple yet accurate maps quickly.

ROpenSci, with its emphasis on data access, is an ideal community space to
host this package.
Useability of the functions

Overall I found the package intuitive and easy to use. The preface of ne_
is useful for quickly accessing the package's functions (in a similar way
that stringr functions begin with st_) and I think the order and
description of the function arguments makes sense, given the structure of
the data on the naturalearth data repository.

The main vignette
https://github.com/AndySouth/rnaturalearth/blob/master/vignettes/rnaturalearth.Rmd
succintly describes the core purpose and functions of the package, although
I think more on what extra can be downloaded via ne_download() would be
interesting here. The what-is-a-country vignette provides a useful
discussion of the subtleties surrounding up how to divide up geographical
space along national, state and regional borders.

Minor comments/suggestions:

I think the description of the @param type in ne_download could be
better at conveying to the user what datasets are available. It is not
clear from the function documentation, for example, that lakes can be
downloaded with ne_download(type = 'lakes', category = 'physical'). To
resolve this issue I suggest that a link to the dataset types is make more
explicit (e.g. "see
naturalearthdata.com/downloads/110m-cultural-vectors/
http://www.naturalearthdata.com/downloads/110m-cultural-vectors/ for
the types of data that can be downloaded for cultural 110 m resolution
data.")

To assist with the 'making it clear what data is available' issue, I
think that ne_download could usefully be split into ne_download10,
ne_download50 and ne_download110, while keeping the generic ne_download
function intact. The data available varies depending on scale, so
ne_download10, for example, could provide descriptions of the very
useful roads and railways datasets. It was not immediately obvious that
these very interesting datasets were made available with - this could
usefully be better documented.

roads <- ne_download(scale = 10, type = "railroads", category = "cultural")railroads <- ne_download(scale = 10, type = "railroads", category = "cultural")

sp is a dependency but I think the package would benefit from making
it a dependency: all the data loaded by the package that I've seen is in an
sp S4 object. Yet these will not work (e.g. plot) properly unless sp is
loaded, as illustrated in the vignette. This would also make the plot
commands in the vignette simpler.

scale = 10 isn't listed as an argument of ne_countries("scale of map
to return, one of 110, 50, 'small', 'medium'") but it works so should be
added.

It is not clear that category is a needed argument. airports <-
ne_download(scale = 10, type = "airports"), for example, works the
same as airports <- ne_download(scale = 10, type = "airports") - there
are no types I can see that duplicate names between categories.

I couldn't see how to download ice shelf data succesfully. I was
expecting ne_download(scale = 10, type = "antarctic_ice_shelves",
category = "physical") or ne_download(scale = 10, type = "antarctic
ice shelves", category = "physical") to work but neither seemed to...

Links to other data sources such as the SRTM (some of which can be
accessed through the raster package) would be useful, perhaps in the
vignette.

Thoughts on reducing the package size

In-line with the previous reviewer I think the size of the package could
be problematic. I think the suggestion of splitting it into other packages
is a good one. An alternative would be to provide a default destination
within the package for data download, not host any data on the package
itself directly, and simply use the package to access the data hosted on
rnaturalearth. Other than package install requirements, the clear advantage
of that is that the data won't have to be updated. The disadvantage is that
the user would need internet access the first time they accessed many
datasets and possible complexities surrounding fresh installs wiping
previously downloaded data.

if(file.exists(...)) could be used to decide whether the data should be
downloaded or not. Below is an illustrative example for what the call could
look like something like this, for ne_countries():

ne_countries_10 <- function(){
file_path <- file.path(system.file("data", package = "rnaturalearth"), "countries10.rda")
if(file.exists(file_path)){
x <- readRDS(file_path)
}else{
x <- ne_download()
saveRDS(x, file_path)
}
x
}

Note that data may need to change to exdata in the above code and that
this is just a preliminary idea - would be interested to hear thoughts from
others. In any case I can see a disadvantage of hosting raw data on CRAN or
ROpenSci in terms of the DRY principle.

I'm not sure about specifying a function to generate data ne_countries()
while
providing the same data with data(countries110), and the type argument is
necessary, especially given the suggestion above of a way to reduce the
package's size whilst maintaining the downloaded data in a default and
accessible location.

I'd like to point out that I'm not an expert in this area, think the
previous reviewers' suggestions in relation to package size is also another
viable option and would be interested in hearing views from others,
especially in relation to the DRY principle as it applies to data.

—
Reply to this email directly or view it on GitHub
#22 (comment).

richfitz · 2015-12-18T11:07:36Z

Another option for reducing package size (though experimental and could delay CRAN submission).

In another context, I'm developing a framework for using GitHub to distribute evolving data sources without putting them in a package; see early vignette, and beginnings of manuscript which include a little about where we're thinking of taking this.

In short, the data don't go in the package, or even in the repository but live in "github releases" and are fetched as they're used. This decouples the development of the code of the package from the data source which avoids the issues of duplicating the data over and over on CRAN and makes for fast package downloads.

lmullen · 2015-12-22T15:21:50Z

@richfitz: datastorr looks really great. I'm looking forward to the release.

sckott · 2016-02-11T19:34:07Z

hey @andysouth let us know if there's anything we can do to help...

andysouth · 2016-02-12T21:02:15Z

Thanks @sckott https://github.com/sckott. I've been meaning to get back
to this.
I have a plan to start by following parts of the suggestions of both
reviewers. I'll try to get something to you in the next few weeks.

On 11 February 2016 at 19:34, Scott Chamberlain notifications@github.com
wrote:

hey @andysouth https://github.com/AndySouth let us know if there's
anything we can do to help...

—
Reply to this email directly or view it on GitHub
#22 (comment).

sckott · 2016-02-12T21:18:27Z

thanks! sounds good

andysouth · 2016-03-22T00:33:40Z

Thanks for your patience, finally found spare time to finish addressing these useful recomendations from the reviewers.

It fails travis checks currently due to the relationships between the 3 packages, but passes checks locally.

responses to @lmullen

This package should be at least two packages, and probably three.

done. converted to 3 packages rnaturalearth, rnaturalearthdata & rnaturalearthhires. The first 2 are aimed at CRAN and the 3rd the rOpenSci drat repo)

@sckott, @karthik, @cboettig: To be able to test these changes effectively, @andysouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.

I'm ready for that stage now if you are.

The README.md should be converted to README.Rmd via devtools::use_readme_rmd().

done

The package should use Appveyor's CI for Windows via devtools::use_appveyor().

done

Specify a version number for rgdal, httr, and knitr.

I don't know what version number is needed for these packages. Feel like I would just be making one up. Where might I look ?

All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran().

done

The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE.

done

Code style

In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).

done

Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").

I recognise this, but unless it's essential would prefer to keep my sometimes idiosyncratic use of spaces which I use to enhance my readability (and out of habit too).

Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.)

done

Most important, all code which has been commented out needs to be deleted.

done (mostly)

The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.

I have kept some commented code as these are the areas where it is most likely I will need to adapt old code in repsonse to potential future data changes, and least likely that users need to access.

Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this

myfunc <- function(type = "a") {
if(type == "a") {
# ...
} else if (type =="b") {

} else {
stop("Error.")
}
}
Should be changed to this:

myfunc <- function(type = c("a", "b")) {
type <- match.arg(type)

Logic checking type here.

}

Done, except where I wanted to give the user more guidance if they submitted an incorrect option.

get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?

no longer exported

In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)

done in vignettes

This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()

done

responses to @Robinlovelace

The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here.

started. More could definitely be added, and I will but would be good to get out first.

I think the description of the @param type in ne_download could be better at conveying to the user what datasets are available. It is not clear from the function documentation, for example, that lakes can be downloaded with ne_download(type = 'lakes', category = 'physical'). To resolve this issue I suggest that a link to the dataset types is make more explicit (e.g. "see naturalearthdata.com/downloads/110m-cultural-vectors/ for the types of data that can be downloaded for cultural 110 m resolution data.")

Thanks for pointing this out. I've added a table to the start of the man page for ne_download() which outlines the main maps available at different scales

To assist with the 'making it clear what data is available' issue, I think that ne_download could usefully be split into ne_download10, ne_download50 and ne_download110, while keeping the generic ne_download function intact. The data available varies depending on scale, so ne_download10, for example, could provide descriptions of the very useful roads and railways datasets.

I think that the improved documentation in ne_download() makes splitting it into the 3 functions unnecessary.

sp is an import but I think the package would benefit from making it a dependency (as it is with stplanr after some debate): all the data loaded by the package that I've seen is in an sp S4 object. Yet these will not work (e.g. plot) properly unless sp is loaded, as illustrated in the vignette. This would also make the plot commands in the vignette simpler.

I'm open to considering making sp a dependency, how do others feel ? I wanted to keep this as light on dependencies as possible. If a user was using fortify with ggplot2 they wouldn't need sp, right ?

scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.

done

It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.

Whilst you are correct, category is needed to construct the download url. If I were to dispense with it I would need to create a list of all the maps and which category they are in. I wanted to keep the link to the Natural Earth data as simple as possible so that if it changes I don't have too much work to do.

I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...

ne_download(scale=50, type="antarctic_ice_shelves_polys", category="physical")
I am following natural earth filenaming directly to try to keep this sustainable.

Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.

I added a link to seealso for ne_download()

additions

I've added support for raster downloads and tiny_countries points.

Thanks @richfitz for the data storage suggestion. I chose the path more well travelled for now, but am interested in that approach for the future.

lmullen · 2016-03-22T00:41:39Z

@andysouth Glad to see the next version of this package.

@karthik and @sckott: I can review this package in more detail now (not anticipating any problems) but perhaps it would be best to have the high resolution data in drat so I can check that too.

@andysouth: The version numbers for rgdal etc should be whatever the current version is, since that is what you are developing against. You can get the current versions from packageVersion() or better yet from looking at CRAN.

sckott · 2016-03-22T16:11:35Z

hmm, I don't know the process for adding repos to our drat, asked over there ropensci/drat#10

sckott · 2016-03-22T18:38:21Z

@lmullen so we just need rnaturalearthhires in drat for now?

lmullen · 2016-03-22T19:15:16Z

Yes, I think that should work, right @andysouth? The other two will be on
CRAN, so I can install them as if they were on CRAN. Then test with and
without the high res package.

On Tue, Mar 22, 2016 at 2:38 PM, Scott Chamberlain <notifications@github.com

wrote:

@lmullen https://github.com/lmullen so we just need rnaturalearthhires
in drat for now?

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#22 (comment)

Lincoln Mullen, http://lincolnmullen.com
Assistant Professor, Department of History & Art History
George Mason University

andysouth · 2016-03-22T19:19:23Z

@lmullen @sckott Yes, thanks that sounds good.

sckott · 2016-03-22T20:48:51Z

Working on it now

related to ropensci/software-review#22

andysouth · 2016-05-04T16:56:23Z

@lmullen @sckott any news ?

sckott · 2016-05-04T17:03:14Z

@andysouth looking back through comments, was I supposed to do something? this does work now install.packages("rnaturalearthhires", repos = c("http://packages.ropensci.org", "http://cran.rstudio.com"))

lmullen · 2016-05-04T17:11:13Z

@sckott @andysouth: Sorry I've been slow. Got hung up with the end of the semester. I'll review the revised package soon.

andysouth · 2016-05-04T17:30:19Z

@lmullen @sckott Thanks and no worries, I look forward to it.

lmullen · 2016-05-19T02:21:08Z

@andysouth: Nice work on this package. I think it is very close to being done and accepted into rOpenSci, then on its way to CRAN. Here are a few minor fixes or questions I found.

Separation of packages

The separation of packages seems to work fine. I was a little surprised that the package installs the data packages using devtools instead of using the rOpenSci repository. What is the reason for that choice? I'd prefer that the rOpenSci repository was used if possible.

The DESCRIPTION will need this line added:

Additional_repositories: http://packages.ropensci.org

I'd still like to see package version numbers for the dependencies in the DESCRIPTION.

R CMD check

I get the following note, which should be easily corrected:

* checking R code for possible problems ... NOTE
ne_download: no visible global function definition for ‘download.file’
ne_download: no visible global function definition for ‘unzip’
Undefined global functions or variables:
  download.file unzip
Consider adding
  importFrom("utils", "download.file", "unzip")
to your NAMESPACE file.

In addition, R CMD check fails for me when the rnaturalearthhires package is not available. This is because it can't build the vignette.

The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add eval = FALSE to the chunks in the vignette that use the high resolution data. It is sufficient to show what the code looks like without actually running it.

I get a warning message when running the code in the examples in ne_download(). When I run

spdf_world2 <-    ne_load( scale = 110, type = 'countries' )

I get this warning message which does not seem relevant to the command run.

Warning message:
In if (category == "raster") { :
  the condition has length > 1 and only the first element will be used

The reason is that the default argument for category is a vector of three choices. This function should probably have the line

category <- match.arg(category)

Testing / continuous integration

I noticed that the Travis and Appveyor tests are failing. At least the Travis tests should be passing. It looks like the problem is that the installation of rgdal fails because you don't have the rgdal dev package installed on Ubuntu. I'm not sure why the binary package as specified in the Travis file is not working. But you should be able to install rgdal from source if you install the apt package libgdal-dev. As for Appveyor, it looks like the problem has to do with package installation as well.

Not a requirement, but you might consider adding code coverage with covr.

Stylistic considerations

I'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth."

The package is inconsistent in how it handles spaces around = in function arguments. All user-facing documentation (i.e., examples and vignettes) should be made consistent in having a single space on both sides of the =. There are a few places with inconsistent spacing around parentheses as well.

That's about it. Nice work!

andysouth · 2016-05-20T10:08:48Z

Many thanks @lmullen for the thorough 2nd review. I'll sort these issues over the next couple of weeks.

andysouth · 2016-06-02T11:08:25Z

@sckott I've addressed all issues except one. There's a remaining problem with Travis rgdal install that I've fought with and failed, be good to get some input as it seems you may have had similar with rnoaa. Appveyor is now building successfully.

responses to 2nd rOpenSci review from @lmullen

Separation of packages

Using rOpenSci repository for loading data packages instead of devtools.

Done for rnaturalearthhires, but rnaturalearthdata is not yet on rOpenSCi or CRAN, I have put in the code ready to change once rnaturalearthdata is on rOpenSci.

               #utils::install.packages("rnaturalearthdata",
               #                        repos = c("http://packages.ropensci.org", "http://cran.rstudio.com"),
               #                        type = "source")

I'd still like to see package version numbers for the dependencies in the DESCRIPTION.

Done

R CMD check

* checking R code for possible problems ... NOTE

ne_download: no visible global function definition for ‘download.file’

ne_download: no visible global function definition for ‘unzip’

done

The vignette fails to build if rnaturalearthhires is not available. Since that package won't be CRAN, I don't think this is going to pass CRAN checks. You should probably add eval = FALSE to the chunks in the vignette that use the high resolution data.

done

I get a warning message when running the code in the examples in ne_download()

In if (category == "raster") { :

the condition has length > 1 and only the first element will be used

done

Testing / continuous integration

Travis and Appveyor tests are failing. At least the Travis tests should be passing. It looks like the problem is that the installation of rgdal fails because you don't have the rgdal dev package installed on Ubuntu. I'm not sure why the binary package as specified in the Travis file is not working. But you should be able to install rgdal from source if you install the apt package libgdal-dev. As for Appveyor, it looks like the problem has to do with package installation as well.

Now build success on appveyor

For Travis I've tried various fixes but still can't get rgdal to install despite the binary installation seemingly working for travis on similar packages e.g. https://github.com/Hackout2/repijson/blob/master/.travis.yml

I'd appreciate any input on this.

Not a requirement, but you might consider adding code coverage with covr.

added a github issue to look at covr for future release

Stylistic considerations

I'd suggest changing the title of the introductory vignette to "Introduction to rnaturalearth."

done

All user-facing documentation (i.e., examples and vignettes) should be made consistent in having a single space on both sides of the =. There are a few places with inconsistent spacing around parentheses as well.

done

sckott · 2016-06-02T16:34:39Z

asking about rgdal, i've been using old language: c setup, which works, but means you don't get the faster builds, caching, container based infrastucture, etc. - asking jim hester

sckott · 2016-06-02T17:53:27Z

jim pointed me to this issue on travis-ci repo travis-ci/travis-ci#5852

So Edzer uses this setup https://github.com/edzer/gstat/blob/master/.travis.yml to install gdal/geos from source

worth trying dist: trusty in your travis file e.g. https://github.com/jhollist/quickmapr/blob/master/.travis.yml#L3

lmullen · 2016-06-02T19:38:31Z

@andysouth I looked at the Travis logs and your comments in your Travis config file. Two ideas:

First, Travis does install the Ubuntu r-cran-rgdal package as it should, but then it also tries to install rgdal from source. That seems to me like a bug in Travis.

Second, I couldn't exactly tell which combination of options you used at any given time. I just tried it on my local version of Ubuntu (16.04) and all that I needed to install was libgdal-dev to compile rgdal from source. It looks like one of your option has conflicting versions of libgdal. Maybe try it with both the binary version r-cran-rgdal as you have now plus installing just libgdal-dev with before_install:?

lmullen · 2016-06-02T19:44:10Z

@andysouth I think you might also want to use install_github: to install rnaturalearthdata.

andysouth · 2016-06-06T11:00:49Z

Thanks @lmullen @sckott I've tried both of your suggestions and various other permutations except for the c setup .

I don't fully understand how travis works. There are 2 setups (1-using binary and 2-installing from source ) that apparently should work and seem to work for other packages (1-https://github.com/Hackout2/repijson/blob/master/.travis.yml, 2-https://github.com/jhollist/quickmapr/blob/master/.travis.yml#L3) but don't work for mine. I'm reluctant to try the 3rd potential c solution.

Currently I've got it set seemingly the same as in quickmapr (2) but it is still failing with The following packages have unmet dependencies:
libgdal-dev : Depends: libhdf5-serial-dev

The comment on the end of here travis-ci/travis-ci#5215 prompted me to try removing the version dependency on rgdal in DESCRIPTION, however that seemed not to solve it either.

Am I missing anything obvious ?

sckott · 2016-06-08T16:37:56Z

I've forked to my account to see if I can sort it out, will report back if successful

andysouth · 2016-06-09T08:21:30Z

@sckott thankyou thankyou

andysouth · 2016-06-14T13:08:07Z

Hi @sckott does it seem that Edzers solution didn't work either ?
https://github.com/sckott/rnaturalearth/commit/68f30e781d42ecd5c312d1caf2483e35c7dd74c3

Because it succeeds on appveyor does this suggest it's a specific Travis problem ? Should this stop proceeding to rOpenSci & CRAN ? I'm keen to proceed if possible.

sckott · 2016-06-14T14:16:10Z

@andysouth Yeah, let's move on, couldn't sort it out yet, but not crucial as it seems to pass elsewhere just not on travis.

Approved (see new label on issue)

sckott · 2016-06-14T14:17:47Z

Add this footer to your README:

[![ropensci\_footer](http://ropensci.org/public_images/github_footer.png)](http://ropensci.org)

Update installation of dev versions to ropenscilabs/rnaturalearth and any urls for the github repo to ropenscilabs instead of AndySouth
Go to the Repo Settings --> Transfer Ownership and transfer to ropenscilabs - Note that all our newer pkgs go to ropenscilabs first, then when more mature we'll move to ropensci

andysouth · 2016-06-14T14:46:56Z

@sckott

hurrah ! Thanks.
at the transfer on github I get : You don’t have admin rights to ropenscilabs

sckott · 2016-06-14T14:55:12Z

you should get an invitation to a team within ropenscilabs

sckott added package topic:geospatial topic:data-access 2/seeking-reviewer(s) labels Nov 26, 2015

sckott added 1/editor-checks and removed 2/seeking-reviewer(s) labels Nov 30, 2015

sckott added the 4/review(s)-in-awaiting-changes label Dec 18, 2015

sckott mentioned this issue Mar 23, 2016

Add a pkg? ropensci/drat#10

Closed

sckott added a commit to ropensci/drat that referenced this issue Mar 23, 2016

added AndySouth/rnaturalearthhires to packages.txt

c020216

related to ropensci/software-review#22

noamross mentioned this issue Apr 4, 2016

onboarding ideas #32

Closed

andysouth added a commit to ropensci/rnaturalearth that referenced this issue Jun 2, 2016

update to 2nd revies response, to paste ropensci/software-review#22

ab78275

sckott added 6/approved and removed 4/review(s)-in-awaiting-changes labels Jun 14, 2016

sckott closed this as completed Jun 14, 2016

sckott mentioned this issue Aug 2, 2016

dbhydroR package #61

Closed

13 tasks

noamross added topic:data-only-pkg topic:data-visualisation and removed 1/editor-checks topic:data-access labels Jan 17, 2017

noamross assigned sckott Jan 17, 2017

rnaturalearth #22

rnaturalearth #22

Comments

andysouth commented Nov 26, 2015

sckott commented Nov 26, 2015

Robinlovelace commented Nov 26, 2015

andysouth commented Nov 30, 2015

sckott commented Nov 30, 2015

andysouth commented Dec 2, 2015

lmullen commented Dec 14, 2015

Structural changes

Tooling

DESCRIPTION

Tests

Vignette

Code style

Human language style

andysouth commented Dec 15, 2015

This is more legible

deleted.

unnecessary code.

Logic checking type here.

Robinlovelace commented Dec 17, 2015

Review

Useability of the functions

Thoughts on reducing the package size

andysouth commented Dec 17, 2015

scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.

It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.

I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...

richfitz commented Dec 18, 2015

lmullen commented Dec 22, 2015

sckott commented Feb 11, 2016

andysouth commented Feb 12, 2016

sckott commented Feb 12, 2016

andysouth commented Mar 22, 2016

responses to @lmullen

This package should be at least two packages, and probably three.

@sckott, @karthik, @cboettig: To be able to test these changes effectively, @andysouth will need to use the rOpenSci drat repository. I'd suggest that the package be provisionally accepted and moved into rOpenSci GitHub and drat repositories.

The README.md should be converted to README.Rmd via devtools::use_readme_rmd().

The package should use Appveyor's CI for Windows via devtools::use_appveyor().

Specify a version number for rgdal, httr, and knitr.

All the tests pass. I'll check coverage more carefully once the package structure has stabilized. In the meantime, all of the tests which download files should have skip_on_cran().

The vignette should not need download anything from Natural Earth. The code block at the end with two downloads should be set to eval = FALSE.

Code style

In the README and vignettes, in place of require(rnaturalearth) use library(rnaturalearth).

Spacing is often inconsistent around = using in function definitions and calls. There should always be a single space on both sides of =. For example, in the README vignette("rnaturalearth", package="rnaturalearth") should be vignette("rnaturalearth", package = "rnaturalearth").

Comments at the start of the line should be followed by a space. (I prefer to user sentence capitalization; but that's just a preference.)

Most important, all code which has been commented out needs to be deleted.

The same goes for the data download scripts in data-raw. At the moment, they have a lot investigating the data. Now that you've settled on what to do, you can retain that information in comments but trim unnecessary code.

Functions which have limited categorical options should specify all the options in the function signature and then use match.args(). An example of this is get_data() and the type parameter. For example, a function that looks like this

Logic checking type here.

get_data() should be renamed to something less generic, like ne_get_data(). Or, can you avoid exporting it?

In the headings of the vignettes, prefer sentence-style capitalization. E.g. 1. Maps in the package. (I'd suggest the same for comments as well, but that's entirely up to you.)

This line in the vignette should be revised, perhaps into a bulleted list, since a sentence can only contain one colon. contains pre-downloaded vector maps for countries: ne_countries(), states: ne_states() and coastline: ne_coastline()

responses to @Robinlovelace

The main vignette succintly describes the core purpose and functions of the package, although I think more on what extra can be downloaded via ne_download() would be interesting here.

scale = 10 isn't listed as an argument of ne_countries("scale of map to return, one of 110, 50, 'small', 'medium'") but it works so should be added.

It is not clear that category is a needed argument. airports <- ne_download(scale = 10, type = "airports"), for example, works the same as airports <- ne_download(scale = 10, type = "airports") - there are no types I can see that duplicate names between categories.

I couldn't see how to download ice shelf data succesfully. I was expecting ne_download(scale = 10, type = "antarctic_ice_shelves", category = "physical") or ne_download(scale = 10, type = "antarctic ice shelves", category = "physical") to work but neither seemed to...

Links to other data sources such as the SRTM (some of which can be accessed through the raster package) would be useful, perhaps in the vignette.

additions

lmullen commented Mar 22, 2016

sckott commented Mar 22, 2016

sckott commented Mar 22, 2016

lmullen commented Mar 22, 2016

andysouth commented Mar 22, 2016

sckott commented Mar 22, 2016

andysouth commented May 4, 2016

sckott commented May 4, 2016

lmullen commented May 4, 2016

andysouth commented May 4, 2016

lmullen commented May 19, 2016 • edited Loading

Separation of packages

R CMD check

Testing / continuous integration

Stylistic considerations

andysouth commented May 20, 2016

andysouth commented Jun 2, 2016

responses to 2nd rOpenSci review from @lmullen

scale = 10 isn't listed as an argument of ne_countries("scale of map
to return, one of 110, 50, 'small', 'medium'") but it works so should be
added.

It is not clear that category is a needed argument. airports <-
ne_download(scale = 10, type = "airports"), for example, works the
same as airports <- ne_download(scale = 10, type = "airports") - there
are no types I can see that duplicate names between categories.

I couldn't see how to download ice shelf data succesfully. I was
expecting ne_download(scale = 10, type = "antarctic_ice_shelves",
category = "physical") or ne_download(scale = 10, type = "antarctic
ice shelves", category = "physical") to work but neither seemed to...

lmullen commented May 19, 2016 •

edited

Loading

andysouth commented Jun 14, 2016 •

edited

Loading