New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osmdata package #103

Closed
mpadge opened this Issue Mar 8, 2017 · 26 comments

Comments

Projects
None yet
6 participants
@mpadge
Contributor

mpadge commented Mar 8, 2017

Summary

  • What does this package do? (explain in 50 words or less):

Download OpenStreetMap (OSM) data and convert to native R objects defined by sf or sp packages. OSM datasets are extracted from the overpass API and processed with efficient C++ routines.

  • Paste the full DESCRIPTION file inside a code block below:
Package: osmdata
Version: 0.0.0
Authors@R: c(
    person("Mark", "Padgham", email="mark.padgham@email.com", role=c("aut", "cre")),
    person("Bob", "Rudis", role="aut"),
    person("Robin", "Lovelace", role="aut"), 
    person("Maëlle", "Salmon", role="aut"),
    person("Andrew", "Smith", role="ctb"),
    person("Marcin", "Kalicinski", role=c("ctb", "cph"),
        comment="Author of included RapidXML code"),
    person("Finkelstein", "Noam", role=c("ctb","cph"),
        comment="Author of included stub.R code"),
    person ("Bartnik", "Lukasz", role=c("ctb","cph"),
        comment="Author of included stub.R code"))
Maintainer: Mark Padgham <mark.padgham@email.com>
Title: Download OpenStreetMap Data with the Overpass API
Description: Extraction of OpenStreetMap (OSM) data as 'sf' or 'sp' objects.
    OSM data are extracted from the overpass API and processed with very fast
    C++ routines for return to R.
Depends:
    R (>= 3.2.4)
License: GPL-3 + file LICENSE
SystemRequirements: C++11
NeedsCompilation: yes
LazyData: true
Imports:
    curl,
    httr,
    lubridate,
    methods,
    Rcpp (>= 0.12.4),
    rvest,
    jsonlite,
    sp,
    utils,
    xml2
Suggests:
    devtools,
    knitr,
    microbenchmark,
    rmarkdown,
    roxygen2,
    sf,
    testthat
LinkingTo: Rcpp
URL: https://github.com/osmdatar/osmdata
BugReports: https://github.com/osmdatar/osmdata/issues
Encoding: UTF-8
RoxygenNote: 6.0.1
VignetteBuilder: knitr
  • URL for the package (the development repository, not a stylized html page):

https://github.com/osmdatar/osmdata

  • Who is the target audience?

Anyone interested in geospatial analysis and visualisation, especially geographers, urban and transport modellers and planners.

  • Are there other R packages that accomplish the same thing? If so, what is different about yours?

The package is similar to GDAL through the interface provided by sf, yet produces fundamentally different results, as briefly described in the main vignette, and detailed at length in a second vignette. osmdata also reproduces much functionality available in the CRAN package osmar, yet does so enormously faster (like at least thousands of times faster!) and with a more intuitive user interface. The package also imports data faster than sf. osmar only returns data defined by a bounding box, while the osmdata interface to the overpass API allows highly customised and specific queries to be constructed (see examples in main vignette). osmdata is also designed to complement the ropensci package osmplotr, and the latter package will be modified immediately following CRAN acceptance of osmdata.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration with Travis CI and/or another service.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package contains a paper.md with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)

NOTE: As per the JOSS guidelines, the package will be submitted to a long-term repo subsequent to the review process

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

    • Jochen Topf / joto - Main developer of osmcode and author of libosmium
    • Edzer Pebesma / edzer - Developer of the R packages sp and sf
@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge Mar 9, 2017

Contributor

One further comment prior to review: The goodpractice output about <- instead of = and also about sapply() mostly arise from fragments of sf code which are copied verbatim to ensure absolute consistency with sf functionality. I do not intend to modify these fragments (except to reflect changes in sf itself). Other than this, sapply()is only used in sapply (x, is.null)-type statements, to which the goodpractice remarks do not apply.

Contributor

mpadge commented Mar 9, 2017

One further comment prior to review: The goodpractice output about <- instead of = and also about sapply() mostly arise from fragments of sf code which are copied verbatim to ensure absolute consistency with sf functionality. I do not intend to modify these fragments (except to reflect changes in sf itself). Other than this, sapply()is only used in sapply (x, is.null)-type statements, to which the goodpractice remarks do not apply.

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik Mar 21, 2017

Member

Looking for reviewers. Will update thread when I do.

Member

karthik commented Mar 21, 2017

Looking for reviewers. Will update thread when I do.

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik Mar 23, 2017

Member

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thank you for your submission! Out of curiosity, can you speak to how this relates to functionality from the Openstreetmap package in terms of overlap?

While I seek reviewers I want to bring your attention to a few checks that were reuturned by goodpractice. I have noted your comments about the notes around code copied over from sf (so I've removed those). There is no urgency to address them before the review, but it would be good to fix during.

  • Here are outputs from automated testing using goodpractice::gp(). I'd also recommend using devtools::spell_check() to examine documentation.
── GP osmdata ───────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all
    package code in general. 84% of code lines are
    covered by test cases.

    R/features.R:24:NA
    R/features.R:55:NA
    R/opq.R:75:NA
    R/opq.R:76:NA
    R/osm-extract.R:55:NA
    ... and 275 more lines

  ✖ avoid long code lines, it is bad for readability.
    Also, many people prefer editor windows that are
    about 80 characters wide. Try make your lines
    shorter than 80 characters

    tests/testthat/test-extract.R:74:1
    tests/testthat/test-extract.R:84:1
    tests/testthat/test-extract.R:121:1
    tests/testthat/test-extract.R:171:1
    tests/testthat/test-osmdata.R:21:1
    ... and 15 more lines

 

  ✖ not import packages as a whole, as this can cause
    name clashes between the imported packages.
    Instead, import only the specific functions you
    need.
Member

karthik commented Mar 23, 2017

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thank you for your submission! Out of curiosity, can you speak to how this relates to functionality from the Openstreetmap package in terms of overlap?

While I seek reviewers I want to bring your attention to a few checks that were reuturned by goodpractice. I have noted your comments about the notes around code copied over from sf (so I've removed those). There is no urgency to address them before the review, but it would be good to fix during.

  • Here are outputs from automated testing using goodpractice::gp(). I'd also recommend using devtools::spell_check() to examine documentation.
── GP osmdata ───────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all
    package code in general. 84% of code lines are
    covered by test cases.

    R/features.R:24:NA
    R/features.R:55:NA
    R/opq.R:75:NA
    R/opq.R:76:NA
    R/osm-extract.R:55:NA
    ... and 275 more lines

  ✖ avoid long code lines, it is bad for readability.
    Also, many people prefer editor windows that are
    about 80 characters wide. Try make your lines
    shorter than 80 characters

    tests/testthat/test-extract.R:74:1
    tests/testthat/test-extract.R:84:1
    tests/testthat/test-extract.R:121:1
    tests/testthat/test-extract.R:171:1
    tests/testthat/test-osmdata.R:21:1
    ... and 15 more lines

 

  ✖ not import packages as a whole, as this can cause
    name clashes between the imported packages.
    Instead, import only the specific functions you
    need.
@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge Mar 23, 2017

Contributor

Thanks Karthik!

In response to your Q, OpenStreetMap simply produces raster layers representing OSM data, and is a purely graphical thing; osmdata returns the actual data, and is not a graphical thing at all. I see no real relationship between them.

As for the goodpractice results:

  • This is a package which can not be fully covered in tests, but is actually > 90% in total, and will increase slightly beyond that
  • The import warning is for sp, and can't be avoided because the /src code needs an [[Rcpp::Depends(sp)]], requiring the whole package to be imported even though it's not actually required in the R code at all.
  • Long lines are only for tests, and are all for long strings containing warning/error messages being tested. Can be eliminated by reducing indentation within each test block, but believe me, that makes the tests much harder to read, not easier. They are also almost all only 81 or 82 characters. I would strongly prefer not to change these, if that's acceptable.
Contributor

mpadge commented Mar 23, 2017

Thanks Karthik!

In response to your Q, OpenStreetMap simply produces raster layers representing OSM data, and is a purely graphical thing; osmdata returns the actual data, and is not a graphical thing at all. I see no real relationship between them.

As for the goodpractice results:

  • This is a package which can not be fully covered in tests, but is actually > 90% in total, and will increase slightly beyond that
  • The import warning is for sp, and can't be avoided because the /src code needs an [[Rcpp::Depends(sp)]], requiring the whole package to be imported even though it's not actually required in the R code at all.
  • Long lines are only for tests, and are all for long strings containing warning/error messages being tested. Can be eliminated by reducing indentation within each test block, but believe me, that makes the tests much harder to read, not easier. They are also almost all only 81 or 82 characters. I would strongly prefer not to change these, if that's acceptable.
@noamross

This comment has been minimized.

Show comment
Hide comment
@noamross

noamross Mar 24, 2017

Collaborator

FYI, while we don't strictly enforce line limits, putting #nolint after a long message line or URL in code suppresses linting that line and gives me peace of mind and clean gp() results. A .lintr file is useful for excluding files in some cases (such as auto-generated Rcpp code), but of course easily abused ;)

Collaborator

noamross commented Mar 24, 2017

FYI, while we don't strictly enforce line limits, putting #nolint after a long message line or URL in code suppresses linting that line and gives me peace of mind and clean gp() results. A .lintr file is useful for excluding files in some cases (such as auto-generated Rcpp code), but of course easily abused ;)

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik Mar 24, 2017

Member

@mpadge Perfect, your responses make sense to me.
@noamross Thanks for those useful tips! 💯

Member

karthik commented Mar 24, 2017

@mpadge Perfect, your responses make sense to me.
@noamross Thanks for those useful tips! 💯

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik Mar 24, 2017

Member

Reviewer 1 is: @edzer
Review due: April 14, 2017

Member

karthik commented Mar 24, 2017

Reviewer 1 is: @edzer
Review due: April 14, 2017

@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge Mar 24, 2017

Contributor

Thanks @noamross, i didn't know about # nolint and will definitely modify accordingly. I had syntasticed all source files, but forgot about the tests, which actually have lots more tidying issues than just the gp ones ... whoops

Thanks @karthik for arranging one reviewer so quickly, and thanks @edzer for agreeing to review.

Contributor

mpadge commented Mar 24, 2017

Thanks @noamross, i didn't know about # nolint and will definitely modify accordingly. I had syntasticed all source files, but forgot about the tests, which actually have lots more tidying issues than just the gp ones ... whoops

Thanks @karthik for arranging one reviewer so quickly, and thanks @edzer for agreeing to review.

@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge Mar 26, 2017

Contributor

Okay, gp() no longer raises any issues other than unit tests and whole-package import, the latter unavoidable for reasons described above. And only had to cheat with #nolint 4 times.

Contributor

mpadge commented Mar 26, 2017

Okay, gp() no longer raises any issues other than unit tests and whole-package import, the latter unavoidable for reasons described above. And only had to cheat with #nolint 4 times.

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik Mar 27, 2017

Member

Reviewer 2: @ramnathv
Review due: May 3rd (due to travel)

Member

karthik commented Mar 27, 2017

Reviewer 2: @ramnathv
Review due: May 3rd (due to travel)

@edzer

This comment has been minimized.

Show comment
Hide comment
@edzer

edzer Apr 10, 2017

Package Review of osmdata

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

It would be good to mention in the first sentence of the pkg description that it returns vector data, and not images (like pkg OpenStreetMaps does): not everyone knows what's behind the Overpass API.

  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION
Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.

the target audience is not addressed in paper.md: is it end-users interested in exploring and plotting, or package developers using osmdata to ingest raw data?

  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

paper.md does not give references to the packages used (imported) by osmdata, to packages or interfaces to which this package can be compared, or to the technology interfaced (open streetmaps, overpass api).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

I didn't check for automated tests, or the guidelines.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6-8


Review Comments

Design

This package imports openstreetmap vector data in R. It has functionality similar to osmar, which has not seen any development since 2013; osmar can export to sp, osmdata to sp and sf. osmar, on the other hand, can do shortest paths through a network, and exposes the street network to R users (combining sp and igraph), osmdata does not expose the network. The package claims it is fast compared to sf/GDAL, but the link supporting this claim is dead.

The package advertises itself mostly mentioning the overpass api, but not everyone familiar with R and (vaguely with) openstreetmap will know what that means; it would be good to also mention that osmdata reads openstreetmap vector data, as opposed to e.g. package OpenStreetMap which imports openstreetmap raster data (images).

Some of the functions are rather verbose; e.g. function osmdata checks all data for missing, and if they are, assigns NULL. It would be much shorter to make NULL their default value; the function can also be written as

osmdata <- function (bbox = NULL, overpass_call = NULL, osm_points =
	NULL, osm_lines = NULL, osm_polygons = NULL, osm_multilines = NULL, 
	osm_multipolygons = NULL, timestamp = NULL, ...) {

    structure(bbox = bbox, overpass_call = overpass_call, timestamp = timestamp, 
        osm_points = osm_points, osm_lines = osm_lines, osm_polygons = osm_polygons, 
        osm_multilines = osm_multilines, osm_multipolygons = osm_multipolygons,
		class = append(class(list()), "osmdata"))
}

which reduces 19 expressions to 1. Also, it is rather un R-like to make a class c("list", "osmdata"), instead of c("osmdata", "list") or even c("osmdata"), sincelistis the default S3 superclass, and that would allow to add methods forosmdata, and fall back tolistmethods where useful. The...is not used (andosmdata` is not a generic), and should be omitted, so that mis-typing of arguments lead to an error.

Function osmdata is supposed to be called internally only (not directly by users), and all the calls I could see look like osmdata () (why the space before ()?), so the whole list of arguments looks obsolete. The help page of osmdata gives an introduction to the whole package, to only reveal at the end that osmdata() should not be called. It mentions osm class def, but uses S3, where classes do not have a definition; somewhat confusing to those living in S3. I'm also not so sure whether the constructor function is needed at all: the class members are overwritten later without checking on using the $ operator; for a list L, a member that was not overwritten, like L$whatever, results in NULL also when it does not exist:

> x = list()
> x$bbox
# NULL

this implies that structure(list(), class = "osmdata") could replace calls to osmdata(), allowing for removal of osmdata() altogether.

The functions starting with osmdata_ export an overpass query to sp, sf or (write to file as) xml. It's somewhat confusing that

osmdata_sf(query, doc)

writes from doc into an object of class sf, where

osmdata_xml(query, doc)

writes the query to doc. Also, I had expected that option quiet = TRUE would suppress the message

q missing: osmdata object will not include query

The return of osmdata_sp, following the example, gives

> hampi_sp
Object of class 'osmdata' with:
                 $bbox : 15.3159769,76.4406199,15.3559769,76.4806199
        $overpass_call : The call submitted to the overpass API
            $timestamp : [ Sun 1 Mar 2017 20:04:23 ]
           $osm_points : 'sp' SpatialpointsDataFrame with 109 points
            $osm_lines : 'sp' SpatiallinesDataFrame with 0 lines
         $osm_polygons : 'sp' SpatialpolygonsDataFrame with 18 polygons
       $osm_multilines : 'sp' SpatialmultilinesDataFrame with 0 multilines
    $osm_multipolygons : 'sp' SpatialmultipolygonsDataFrame with 0 multipolygons

where it might be better to use the class names of sp classes (SpatialpointsDataFrame etc. are not class names of sp, or of something else). Package sp does not distinguish between osm_lines and osm_multilines, or osm_polygons and osm_multipolygon; the list above confusingly suggests this. Why this distinction?

Function names

Now, osmdata_sf, osmdata_sp and osmdata_xml have a large overlap; osmdata_sf and osmdata_sp could call osmdata_xml without the file argument to avoid this.

The vignette/README mention that

Or data can be read from a previously downloaded file:

x <- osmdata_sf(q1, "data.xml")

... but what is q1 used for in this call? It seems that q1 is added to the returned object, which takes its data from data.xml, but there is no guarantee that the two belong together. When the query call is added to the returned object (e.g. by osmdata_xml) this guarantee can be given, and ambiguity is excluded.

For the function names, it would help to memorize what they do if they would be, or contain, verbs; now, only getbb has a verb.

osmdata_sf is described as "Return an OSM Overpass query as an ‘osmdata’ object in ‘sf’ format." I would have opted for "convert the result of an OSM Overpass query to a list with ‘sf’ objects".

Install

During install, I see:

./configure: 5: [: Linux: unexpected operator

which points to a bug. The configure script aims at doing a coverage test on travis, but by checking for Linux. It might be better (esp. for linux users!) if it would test whether USER equals travis to create a conditional configuration for travis.

Performance, comparison

Since the the paper mentions data is "processed with very fast C++ routines for return to R", I compared the speed to that of the GDAL/OGR driver, using sf::st_read:

library(osmdata)
library(sf)

bb = c(11.561541, 48.14102, 11.582852, 48.1530)
q = opq(bbox = bb)
osmdata_xml(q, "x.osm")
system.time(o <- osmdata_sf(doc = "x.osm"))
layers = st_layers("x.osm")$name

osm_rd = function() { osmdata_sf(doc = "x.osm", quiet = TRUE) }
sf_rd = function() { system.time(lst <- lapply(layers, function(x) st_read("x.osm", x, quiet = TRUE))) }

library(microbenchmark)
microbenchmark::microbenchmark(osm_rd(), sf_rd(), times = 20)
# Unit: seconds
#      expr      min       lq     mean   median       uq      max neval
#  osm_rd() 2.393081 2.567723 2.645534 2.620662 2.708159 2.961327    20
#   sf_rd() 2.948154 3.195789 3.351276 3.318939 3.479391 3.760206    20

and see that the results are quite similar. Of interest is the difference between the objects returned:

o
# Object of class 'osmdata' with:
#                  $bbox : 11.3675909042358 48.0759544372559 11.7228775024414 48.2664070129395
#         $overpass_call : The call submitted to the overpass API
#             $timestamp : [ Sun 1 Mar 2017 20:02:02 ]
#            $osm_points : 'sf' Simple Features Collection with 37997 points
#             $osm_lines : 'sf' Simple Features Collection with 4251 linestrings
#          $osm_polygons : 'sf' Simple Features Collection with 2682 polygons
#        $osm_multilines : 'sf' Simple Features Collection with 248 multilinestrings
#     $osm_multipolygons : 'sf' Simple Features Collection with 65 multipolygons

lst <- lapply(st_layers("x.osm")$name, function(x) st_read("x.osm", x, quiet = TRUE))
structure(sapply(lst, nrow), names = st_layers("x.osm")$name)
#          points            lines multilinestrings    multipolygons 
#            6445             4447               52             2427 
# other_relations 
#             192 

since both functions read the same xml file, I was suprised to see that the number of features returned differs pretty much. I then looked a bit further into the GDAL OSM driver, which turns out to be highly configurable. Changing a few fields in the config file I got

#          points            lines multilinestrings    multipolygons 
#           37997             4569               52             2427 
# other_relations 
#             192 

Which at least gets the same number of points, and even more lines and polygons, than osmdata. The number of options did scare me somewhat, and made me wonder which osmdata.ini configuruation would correspond to the osmdata query result, and how these configuration options are handled by osmdata. By the query call? If so, do R users need to learn the OSM Overpass API query call language?

Minor things

The link to sfa (vignette 2) is http://www.opengeospatial.org/standards/sfa, not http://www.opengeospatial.org/standards/sfo

At several points, the vignettes misses the point that the word simple in simple features refers to "2D geometry with linear interpolation between vertices": simple (straight), as opposed to curved.

Concluding

This is a nice package that shows a significant programming effort to get OpenStreetMap vector data into R, using the R datastructures of either sp or sf. It does not give much directions what users can, or should do with these data. Compared to package osmar, it is not clear whether (nor how) the network topology is obtained from an osmdata query result, how e.g. the results from a query can be used to solve a shortest route problem in R.

The package claims to be considerably faster and to get more data, compared to the GDAL driver that is interfaced through e.g. R packages rgdal and sf. I could confirm neither of these claims: reading through sf::st_read had a very similar speed, and the GDAL driver is highly configurable, allowing it to also read e.g. all the points that are vertices in lines and polygons. It would be good if osmdata could document to which GDAL driver options its default query corresponds: OSM API queries seem to be extremely configurable.

I wonder, for an end-users, what the value is of getting a point set that consists of not only point features (such as traffic lights, or way signs) but also all vertices in lines and polygons. The vignettes of osmdata do not point out how, from an osmdata query result, one could for instance isolate the point features for plotting purposes, nor other user benefits of having the ID of every possible point available.

edzer commented Apr 10, 2017

Package Review of osmdata

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

It would be good to mention in the first sentence of the pkg description that it returns vector data, and not images (like pkg OpenStreetMaps does): not everyone knows what's behind the Overpass API.

  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION
Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.

the target audience is not addressed in paper.md: is it end-users interested in exploring and plotting, or package developers using osmdata to ingest raw data?

  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

paper.md does not give references to the packages used (imported) by osmdata, to packages or interfaces to which this package can be compared, or to the technology interfaced (open streetmaps, overpass api).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

I didn't check for automated tests, or the guidelines.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6-8


Review Comments

Design

This package imports openstreetmap vector data in R. It has functionality similar to osmar, which has not seen any development since 2013; osmar can export to sp, osmdata to sp and sf. osmar, on the other hand, can do shortest paths through a network, and exposes the street network to R users (combining sp and igraph), osmdata does not expose the network. The package claims it is fast compared to sf/GDAL, but the link supporting this claim is dead.

The package advertises itself mostly mentioning the overpass api, but not everyone familiar with R and (vaguely with) openstreetmap will know what that means; it would be good to also mention that osmdata reads openstreetmap vector data, as opposed to e.g. package OpenStreetMap which imports openstreetmap raster data (images).

Some of the functions are rather verbose; e.g. function osmdata checks all data for missing, and if they are, assigns NULL. It would be much shorter to make NULL their default value; the function can also be written as

osmdata <- function (bbox = NULL, overpass_call = NULL, osm_points =
	NULL, osm_lines = NULL, osm_polygons = NULL, osm_multilines = NULL, 
	osm_multipolygons = NULL, timestamp = NULL, ...) {

    structure(bbox = bbox, overpass_call = overpass_call, timestamp = timestamp, 
        osm_points = osm_points, osm_lines = osm_lines, osm_polygons = osm_polygons, 
        osm_multilines = osm_multilines, osm_multipolygons = osm_multipolygons,
		class = append(class(list()), "osmdata"))
}

which reduces 19 expressions to 1. Also, it is rather un R-like to make a class c("list", "osmdata"), instead of c("osmdata", "list") or even c("osmdata"), sincelistis the default S3 superclass, and that would allow to add methods forosmdata, and fall back tolistmethods where useful. The...is not used (andosmdata` is not a generic), and should be omitted, so that mis-typing of arguments lead to an error.

Function osmdata is supposed to be called internally only (not directly by users), and all the calls I could see look like osmdata () (why the space before ()?), so the whole list of arguments looks obsolete. The help page of osmdata gives an introduction to the whole package, to only reveal at the end that osmdata() should not be called. It mentions osm class def, but uses S3, where classes do not have a definition; somewhat confusing to those living in S3. I'm also not so sure whether the constructor function is needed at all: the class members are overwritten later without checking on using the $ operator; for a list L, a member that was not overwritten, like L$whatever, results in NULL also when it does not exist:

> x = list()
> x$bbox
# NULL

this implies that structure(list(), class = "osmdata") could replace calls to osmdata(), allowing for removal of osmdata() altogether.

The functions starting with osmdata_ export an overpass query to sp, sf or (write to file as) xml. It's somewhat confusing that

osmdata_sf(query, doc)

writes from doc into an object of class sf, where

osmdata_xml(query, doc)

writes the query to doc. Also, I had expected that option quiet = TRUE would suppress the message

q missing: osmdata object will not include query

The return of osmdata_sp, following the example, gives

> hampi_sp
Object of class 'osmdata' with:
                 $bbox : 15.3159769,76.4406199,15.3559769,76.4806199
        $overpass_call : The call submitted to the overpass API
            $timestamp : [ Sun 1 Mar 2017 20:04:23 ]
           $osm_points : 'sp' SpatialpointsDataFrame with 109 points
            $osm_lines : 'sp' SpatiallinesDataFrame with 0 lines
         $osm_polygons : 'sp' SpatialpolygonsDataFrame with 18 polygons
       $osm_multilines : 'sp' SpatialmultilinesDataFrame with 0 multilines
    $osm_multipolygons : 'sp' SpatialmultipolygonsDataFrame with 0 multipolygons

where it might be better to use the class names of sp classes (SpatialpointsDataFrame etc. are not class names of sp, or of something else). Package sp does not distinguish between osm_lines and osm_multilines, or osm_polygons and osm_multipolygon; the list above confusingly suggests this. Why this distinction?

Function names

Now, osmdata_sf, osmdata_sp and osmdata_xml have a large overlap; osmdata_sf and osmdata_sp could call osmdata_xml without the file argument to avoid this.

The vignette/README mention that

Or data can be read from a previously downloaded file:

x <- osmdata_sf(q1, "data.xml")

... but what is q1 used for in this call? It seems that q1 is added to the returned object, which takes its data from data.xml, but there is no guarantee that the two belong together. When the query call is added to the returned object (e.g. by osmdata_xml) this guarantee can be given, and ambiguity is excluded.

For the function names, it would help to memorize what they do if they would be, or contain, verbs; now, only getbb has a verb.

osmdata_sf is described as "Return an OSM Overpass query as an ‘osmdata’ object in ‘sf’ format." I would have opted for "convert the result of an OSM Overpass query to a list with ‘sf’ objects".

Install

During install, I see:

./configure: 5: [: Linux: unexpected operator

which points to a bug. The configure script aims at doing a coverage test on travis, but by checking for Linux. It might be better (esp. for linux users!) if it would test whether USER equals travis to create a conditional configuration for travis.

Performance, comparison

Since the the paper mentions data is "processed with very fast C++ routines for return to R", I compared the speed to that of the GDAL/OGR driver, using sf::st_read:

library(osmdata)
library(sf)

bb = c(11.561541, 48.14102, 11.582852, 48.1530)
q = opq(bbox = bb)
osmdata_xml(q, "x.osm")
system.time(o <- osmdata_sf(doc = "x.osm"))
layers = st_layers("x.osm")$name

osm_rd = function() { osmdata_sf(doc = "x.osm", quiet = TRUE) }
sf_rd = function() { system.time(lst <- lapply(layers, function(x) st_read("x.osm", x, quiet = TRUE))) }

library(microbenchmark)
microbenchmark::microbenchmark(osm_rd(), sf_rd(), times = 20)
# Unit: seconds
#      expr      min       lq     mean   median       uq      max neval
#  osm_rd() 2.393081 2.567723 2.645534 2.620662 2.708159 2.961327    20
#   sf_rd() 2.948154 3.195789 3.351276 3.318939 3.479391 3.760206    20

and see that the results are quite similar. Of interest is the difference between the objects returned:

o
# Object of class 'osmdata' with:
#                  $bbox : 11.3675909042358 48.0759544372559 11.7228775024414 48.2664070129395
#         $overpass_call : The call submitted to the overpass API
#             $timestamp : [ Sun 1 Mar 2017 20:02:02 ]
#            $osm_points : 'sf' Simple Features Collection with 37997 points
#             $osm_lines : 'sf' Simple Features Collection with 4251 linestrings
#          $osm_polygons : 'sf' Simple Features Collection with 2682 polygons
#        $osm_multilines : 'sf' Simple Features Collection with 248 multilinestrings
#     $osm_multipolygons : 'sf' Simple Features Collection with 65 multipolygons

lst <- lapply(st_layers("x.osm")$name, function(x) st_read("x.osm", x, quiet = TRUE))
structure(sapply(lst, nrow), names = st_layers("x.osm")$name)
#          points            lines multilinestrings    multipolygons 
#            6445             4447               52             2427 
# other_relations 
#             192 

since both functions read the same xml file, I was suprised to see that the number of features returned differs pretty much. I then looked a bit further into the GDAL OSM driver, which turns out to be highly configurable. Changing a few fields in the config file I got

#          points            lines multilinestrings    multipolygons 
#           37997             4569               52             2427 
# other_relations 
#             192 

Which at least gets the same number of points, and even more lines and polygons, than osmdata. The number of options did scare me somewhat, and made me wonder which osmdata.ini configuruation would correspond to the osmdata query result, and how these configuration options are handled by osmdata. By the query call? If so, do R users need to learn the OSM Overpass API query call language?

Minor things

The link to sfa (vignette 2) is http://www.opengeospatial.org/standards/sfa, not http://www.opengeospatial.org/standards/sfo

At several points, the vignettes misses the point that the word simple in simple features refers to "2D geometry with linear interpolation between vertices": simple (straight), as opposed to curved.

Concluding

This is a nice package that shows a significant programming effort to get OpenStreetMap vector data into R, using the R datastructures of either sp or sf. It does not give much directions what users can, or should do with these data. Compared to package osmar, it is not clear whether (nor how) the network topology is obtained from an osmdata query result, how e.g. the results from a query can be used to solve a shortest route problem in R.

The package claims to be considerably faster and to get more data, compared to the GDAL driver that is interfaced through e.g. R packages rgdal and sf. I could confirm neither of these claims: reading through sf::st_read had a very similar speed, and the GDAL driver is highly configurable, allowing it to also read e.g. all the points that are vertices in lines and polygons. It would be good if osmdata could document to which GDAL driver options its default query corresponds: OSM API queries seem to be extremely configurable.

I wonder, for an end-users, what the value is of getting a point set that consists of not only point features (such as traffic lights, or way signs) but also all vertices in lines and polygons. The vignettes of osmdata do not point out how, from an osmdata query result, one could for instance isolate the point features for plotting purposes, nor other user benefits of having the ID of every possible point available.

@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge Apr 13, 2017

Contributor

Thanks @edzer for the useful comments and your detailed consideration of osmdata! 😸 These responses are divided into into two classes:

  • Points which we can easily respond to and in some cases already have.
  • Those for which we feel more discussion (with the reviewer and potentially others) and thinking (on our side) is needed.

Note that collective pronouns refer here to @mpadge, @Robinlovelace, and @maelle, unless otherwise explicitly indicated.

easy to solve issues 😎

First start with the easy bits 🍰, mostly already done with this commit:

  1. NULL assignment in function def rather than verbose re-def of missing -> NULL: Already done; thanks!

  2. Message produced on quiet = TRUE - agreed, ought not to have been there and has already been removed.

  3. Return of osmdata_sp - agreed: an inappropriate mis-interpretation of sp, and has already been rectified. osmdata objects nevertheless remain divided into the two respective osmdata types of lines / multilines and polygons / multipolygons, with sp forms now simply declaring these as SpatialLines and SpatialPolygons regardless of multi_ or not. The lack of distinction in sp between lines / multilines and polys / multipolys is a non-issue, because the OSM itself makes a fundamental distinction between these two to which osmdata adheres.

  4. configure message re: unexpected operator - agreed, that's no longer required for codecov anyway, so will be fixed.

  5. Link to claim of being faster than sf/GDAL is dead: Already fixed - thanks! (RL: I hope to add a benchmark comparing with osmar which in my experience is painfully slow. The example benchmark is useful, would be interesting to build on it by a) downloading some data also and b) comparing against larger datasets.)

  6. Link to sfa in vignette#2: Fixed - thanks once more!

open issues pending further discussion / debate 👷

Now on to those points regarding which we are uncertain how best to respond. 🌰

Code

  1. Class definition: we entirely agree with you here, but are unsure how best to resolve the issue. While a simple structure (list(), class = 'osmdata') is a great suggestion, the class def is strictly required (as far as we understand it) in order to provide the print.osmdata and c.osmdata functionality. These are both utterly necessary: The first to avoid all data being dumped to screen; and the second because it's enormously useful for combining query results within R. If you - or anyone else - can suggest ways to effectively print and c without defining a class, we'd be more than happy to remove the class def entirely. A solution is obviously strongly desired, because we agree entirely that it can be nothing other than confusing that the class constructor osmdata() is present yet neither can nor should be used.

  2. It seems that q1 (in the call x <- osmdata_sf(q1, "data.xml")) is added to the return object which takes its data from "data.xml"

    Yes, that's correct.

    but there is no guarantee that the two belong together.

    There are abundant checks within osmdata_sf() to ensure the forms and classes of those objects are appropriate to the call, with extensive tests throwing a range of incompatible objects in there, and generating appropriately informative messages.

    When the query call is added to the returned object (e.g. by osmdata_xml) this guarantee can be given, and ambiguity is excluded.

    The query call, q1 is added to the returned object, so I (MP) am unsure what to make of this. Note that osmdata_xml() really just returns the .xml/.osm data directly from overpass with no post-processing, and so is a fundamentally different function to osmdata_sf() and osmdata_sp(). The latter two really don't use or rely on the functionality of osmdata_xml().

  3. Comment that it's confusing that osmdata_sf (q, doc) writes from doc into sf, yet osmdata_xml (q, doc) writes to doc. I (MP) hadn't thought of it that way, but that could certainly be somewhat confusing. I am, however, unsure of how to resolve this. I like the simplicity of a mere three main functions all of form osmdata_abc(), with similar arguments in all cases. All functions can or do return data of the form specified after that subscript, so that's consistent. The only inconsistency is that osmdata_xml() also enables doc to be written. Making this any neater would likely require adding an additional step, so osmdata_xml() would purely serve to import into R, with an additional stage required to potentially write back to xml. I'm not sure what would be less confusing: the admittedly present yet likely quite mild confusion highlighted here, or unnecessarily breaking a currently very simple one-liner into two distinct functions? I simply don't know at present, but would welcome any suggestions.

  4. It would help to memorize what [the functions] do if [the function names]
    contain verbs

    yeah, we entirely agree as a general point, but as mentioned above, we also favour the simplicity of osmdata_abc() for the three primary functions. Extra verbs would likely best be incorporated by simply renaming these to osmdata_get_abc(), but we don't really see that as adding anything constructive. We'll nevertheless have a ponder in regard to other function names.

Package functionality

  1. osmar ... can do shortest paths through a network, and
    exposes the street network to R users (combining sp and igraph), osmdata does
    not expose the network.

    That's true, and mainly because our current vision of osmdata is as a coherent package that serves the singular purpose of getting OSM data into R in a way that most faithfully represents the structure and intent of OSM itself.

    We encourage others to add functionality building on osmdata for routing, e.g. building on the idea of stplanr::SpatialLinesNetwork() or a spnetwork package (RL).

    'Networks' are not a part of OSM - they are a post-imposition by others wishing to use OSM for routing (e.g. OSRM). We're unsure whether such arguably non-OSM functionality belongs in this package. A function to 'expose the network' has already been constructed as described in this osmdata issue. Potential incorporation depends on the ever-evolving discussions of the extent to which R packages should best encapsulate singularly focussed functionality, or broader ranges thereof. The upcoming fragmentation of devtools into numerous individual packages certainly reifies Hadley's view on that, by which we're inclined to abide. The concept that packages should 'do one thing unambiguously and do it well' is a sensible design philosophy we hope osmdata follows.

  2. The very important question raised just before the list of 'Minor things': > Do R users need to learn the OSM Overpass API query call language?

    We cannot provide a clear answer to this question. A analogous question would be 'can sf::st_read() be used/understood without learning GDAL?'. Yes it can be used and it mercifully makes life easier, but understanding the back-end code-base helps greatly. We have tried to communicate this in the vignette but are open to ideas for improving the communication of this message.

    The osmdata package is an attempt to facilitate direct access to OSM data without the need to understand this sometimes non-trivial query language (QL). It comes down to a compromise between breadth and depth. We've aimed for a good breadth, and so hope the vast majority of typical use cases can be executed just by examining the osmdata documentation and forgetting entirely about overpass. The depth has hopefully nevertheless been retained so truly complex queries nevertheless remain possible. Do R users need to learn the QL? We would certainly hope not. Would learning the QL enable more powerful use of osmdata? We would certainly hope so. Where can we best strike the balance there between? ... We don't know, but we've given it a good shot.

  3. it would be good to also mention that osmdata reads openstreetmap vector data, as opposed to e.g. package OpenStreetMap which imports openstreetmap raster data (images).

    We don't agree here because osmdata simply reads data from OpenStreetMap in a form that reflects as accurately as possible the underlying form of OSM, which is a vector form. Raster images have nothing to do with OSM data - they are only used by OSM to generate tiles for their web interface, which in turn merely serves to visualise the underlying data. That's why the package is called osmdata and not osm (or whatever). It's about the data. The R package OpenStreetMap is in no way directly connected with OSM, and does not directly deliver OSM data; it merely renders them in visual form.

    We nevertheless acknowledge in response to this concern the importance of mentioning how osmdata differs from the very different OpenStreetMap package, in order to avoid cofusion around names, and have flagged this task in an osmdata issue.

  4. I wonder, for an end-users, what the value is of getting a point set that consists of not only point features (such as traffic lights, or way signs) but also all vertices in lines and polygons.

    The package simply aims to bring OSM data into R in a form that resembles as closely as possible the structure and design of OSM data itself. Points would only be removed because of some arbitrary decision on our part that they are somehow not as important as other data components - it is an assuredly far more neutral design decision to simply leave the data in as intact a form as possible, and thereby enable the widest possible range of potential usage. OSM demands unique ID labels on all components so that they can always be connected together: all points within a line, all lines which contain any given point, all whatever which are within or contain bits of whatever else. We see no need to remove or reduce the ability of osmdata to transfer as directly as possible such abilities to assemble and dissemble any and all data components within an R environment. osmdata brings OSM data into R it its entirety; the GDAL OSM driver does not extract complete data because even with config set to full volume, GDAL strips all object IDs, and so prevents any ability to assemble and dissemble geometrically distinct components.

    Nevertheless, in response to this legitimate concern, we have decided on the following package modifications:

    • The primary osmdata_...() functions shall remain as they are and import all OSM data, to ensure that they remain as true as possible to the underlying data model of OSM itself; but
    • We will implement a new function (see this osmdata issue) that enables reduction of a given osmdata object to include only unique elements of each geometric class. The result of this will then more closely resemble the output of GDAL, while retaining the primary difference described above that osmdata will still retain all individual OSM IDs for all components.

    We nevertheless note in direct response to the above concern that a use case would be routing to minimse numbers of traffic lights, which could only be done through storing all points as point objects in addition to their presence as line and polygon vertices.

Comparison with GDAL

  1. ''Performance, comparison'' and speed: this is a useful benchmark, which I (RL) think a modified version of would be useful in the vignette to illustrate speed and usage differences with the 'raw' way of doing things.

    With due apology for the now-rectified dead link mentioned above, the presented figures nevertheless reveal that osmdata is indeed >20% faster than sf/GDAL. Is GDAL very fast? Surely so. More importantly, are either sf or osmdata very fast in comparison to the long-standing singular alternative for getting OSM data into R? We can't even given numbers, but the answer is yes by factors of thousands. Both sf/GDAL and osmdata are very fast, yet osmdata remains nevertheless the fastest. Surely that's a justification for claiming ''very fast''? Compared to all other currently possible alternatives, it is so. And finally, 20% faster than the current fastest surely represents an improvement?

  2. Changing the config file of the GDAL OSM driver: Other than explicit customisation of which key fields ought to be returned, the only options that the GDAL config file enables are potentially returning all points and all ways. There still remains no way within sf/GDAL to connect any of these ways (whether line or polygon objects) to the multi.. objects of which they may or may not be members. osmdata retains all such membership, and through the recursive searching functions enables immediate reconstruction of all members of any given object, or of all objects which share a set of members. That is simply not possible with sf/GDAL, no matter how much the GDAL config file is customised. In short, there is no GDAL config setting that can reconstruct how osmdata extracts data, and we stand by the claims of our second vignette that osmdata remains generally ''truer'' to the underlying structure of OSM data (with due acknowledgement in that vignette that the entire SF scheme was never intended for such purposes, so that claim ought not be interpreted as a criticism!).

Thankfulness

Note that we consciously didn't acknowledge @edzer's contributions prior to review but will definitely do it after review with a double role:

  • cph of sf code

  • rev because of the improvements of the package thanks to his review 👌

Contributor

mpadge commented Apr 13, 2017

Thanks @edzer for the useful comments and your detailed consideration of osmdata! 😸 These responses are divided into into two classes:

  • Points which we can easily respond to and in some cases already have.
  • Those for which we feel more discussion (with the reviewer and potentially others) and thinking (on our side) is needed.

Note that collective pronouns refer here to @mpadge, @Robinlovelace, and @maelle, unless otherwise explicitly indicated.

easy to solve issues 😎

First start with the easy bits 🍰, mostly already done with this commit:

  1. NULL assignment in function def rather than verbose re-def of missing -> NULL: Already done; thanks!

  2. Message produced on quiet = TRUE - agreed, ought not to have been there and has already been removed.

  3. Return of osmdata_sp - agreed: an inappropriate mis-interpretation of sp, and has already been rectified. osmdata objects nevertheless remain divided into the two respective osmdata types of lines / multilines and polygons / multipolygons, with sp forms now simply declaring these as SpatialLines and SpatialPolygons regardless of multi_ or not. The lack of distinction in sp between lines / multilines and polys / multipolys is a non-issue, because the OSM itself makes a fundamental distinction between these two to which osmdata adheres.

  4. configure message re: unexpected operator - agreed, that's no longer required for codecov anyway, so will be fixed.

  5. Link to claim of being faster than sf/GDAL is dead: Already fixed - thanks! (RL: I hope to add a benchmark comparing with osmar which in my experience is painfully slow. The example benchmark is useful, would be interesting to build on it by a) downloading some data also and b) comparing against larger datasets.)

  6. Link to sfa in vignette#2: Fixed - thanks once more!

open issues pending further discussion / debate 👷

Now on to those points regarding which we are uncertain how best to respond. 🌰

Code

  1. Class definition: we entirely agree with you here, but are unsure how best to resolve the issue. While a simple structure (list(), class = 'osmdata') is a great suggestion, the class def is strictly required (as far as we understand it) in order to provide the print.osmdata and c.osmdata functionality. These are both utterly necessary: The first to avoid all data being dumped to screen; and the second because it's enormously useful for combining query results within R. If you - or anyone else - can suggest ways to effectively print and c without defining a class, we'd be more than happy to remove the class def entirely. A solution is obviously strongly desired, because we agree entirely that it can be nothing other than confusing that the class constructor osmdata() is present yet neither can nor should be used.

  2. It seems that q1 (in the call x <- osmdata_sf(q1, "data.xml")) is added to the return object which takes its data from "data.xml"

    Yes, that's correct.

    but there is no guarantee that the two belong together.

    There are abundant checks within osmdata_sf() to ensure the forms and classes of those objects are appropriate to the call, with extensive tests throwing a range of incompatible objects in there, and generating appropriately informative messages.

    When the query call is added to the returned object (e.g. by osmdata_xml) this guarantee can be given, and ambiguity is excluded.

    The query call, q1 is added to the returned object, so I (MP) am unsure what to make of this. Note that osmdata_xml() really just returns the .xml/.osm data directly from overpass with no post-processing, and so is a fundamentally different function to osmdata_sf() and osmdata_sp(). The latter two really don't use or rely on the functionality of osmdata_xml().

  3. Comment that it's confusing that osmdata_sf (q, doc) writes from doc into sf, yet osmdata_xml (q, doc) writes to doc. I (MP) hadn't thought of it that way, but that could certainly be somewhat confusing. I am, however, unsure of how to resolve this. I like the simplicity of a mere three main functions all of form osmdata_abc(), with similar arguments in all cases. All functions can or do return data of the form specified after that subscript, so that's consistent. The only inconsistency is that osmdata_xml() also enables doc to be written. Making this any neater would likely require adding an additional step, so osmdata_xml() would purely serve to import into R, with an additional stage required to potentially write back to xml. I'm not sure what would be less confusing: the admittedly present yet likely quite mild confusion highlighted here, or unnecessarily breaking a currently very simple one-liner into two distinct functions? I simply don't know at present, but would welcome any suggestions.

  4. It would help to memorize what [the functions] do if [the function names]
    contain verbs

    yeah, we entirely agree as a general point, but as mentioned above, we also favour the simplicity of osmdata_abc() for the three primary functions. Extra verbs would likely best be incorporated by simply renaming these to osmdata_get_abc(), but we don't really see that as adding anything constructive. We'll nevertheless have a ponder in regard to other function names.

Package functionality

  1. osmar ... can do shortest paths through a network, and
    exposes the street network to R users (combining sp and igraph), osmdata does
    not expose the network.

    That's true, and mainly because our current vision of osmdata is as a coherent package that serves the singular purpose of getting OSM data into R in a way that most faithfully represents the structure and intent of OSM itself.

    We encourage others to add functionality building on osmdata for routing, e.g. building on the idea of stplanr::SpatialLinesNetwork() or a spnetwork package (RL).

    'Networks' are not a part of OSM - they are a post-imposition by others wishing to use OSM for routing (e.g. OSRM). We're unsure whether such arguably non-OSM functionality belongs in this package. A function to 'expose the network' has already been constructed as described in this osmdata issue. Potential incorporation depends on the ever-evolving discussions of the extent to which R packages should best encapsulate singularly focussed functionality, or broader ranges thereof. The upcoming fragmentation of devtools into numerous individual packages certainly reifies Hadley's view on that, by which we're inclined to abide. The concept that packages should 'do one thing unambiguously and do it well' is a sensible design philosophy we hope osmdata follows.

  2. The very important question raised just before the list of 'Minor things': > Do R users need to learn the OSM Overpass API query call language?

    We cannot provide a clear answer to this question. A analogous question would be 'can sf::st_read() be used/understood without learning GDAL?'. Yes it can be used and it mercifully makes life easier, but understanding the back-end code-base helps greatly. We have tried to communicate this in the vignette but are open to ideas for improving the communication of this message.

    The osmdata package is an attempt to facilitate direct access to OSM data without the need to understand this sometimes non-trivial query language (QL). It comes down to a compromise between breadth and depth. We've aimed for a good breadth, and so hope the vast majority of typical use cases can be executed just by examining the osmdata documentation and forgetting entirely about overpass. The depth has hopefully nevertheless been retained so truly complex queries nevertheless remain possible. Do R users need to learn the QL? We would certainly hope not. Would learning the QL enable more powerful use of osmdata? We would certainly hope so. Where can we best strike the balance there between? ... We don't know, but we've given it a good shot.

  3. it would be good to also mention that osmdata reads openstreetmap vector data, as opposed to e.g. package OpenStreetMap which imports openstreetmap raster data (images).

    We don't agree here because osmdata simply reads data from OpenStreetMap in a form that reflects as accurately as possible the underlying form of OSM, which is a vector form. Raster images have nothing to do with OSM data - they are only used by OSM to generate tiles for their web interface, which in turn merely serves to visualise the underlying data. That's why the package is called osmdata and not osm (or whatever). It's about the data. The R package OpenStreetMap is in no way directly connected with OSM, and does not directly deliver OSM data; it merely renders them in visual form.

    We nevertheless acknowledge in response to this concern the importance of mentioning how osmdata differs from the very different OpenStreetMap package, in order to avoid cofusion around names, and have flagged this task in an osmdata issue.

  4. I wonder, for an end-users, what the value is of getting a point set that consists of not only point features (such as traffic lights, or way signs) but also all vertices in lines and polygons.

    The package simply aims to bring OSM data into R in a form that resembles as closely as possible the structure and design of OSM data itself. Points would only be removed because of some arbitrary decision on our part that they are somehow not as important as other data components - it is an assuredly far more neutral design decision to simply leave the data in as intact a form as possible, and thereby enable the widest possible range of potential usage. OSM demands unique ID labels on all components so that they can always be connected together: all points within a line, all lines which contain any given point, all whatever which are within or contain bits of whatever else. We see no need to remove or reduce the ability of osmdata to transfer as directly as possible such abilities to assemble and dissemble any and all data components within an R environment. osmdata brings OSM data into R it its entirety; the GDAL OSM driver does not extract complete data because even with config set to full volume, GDAL strips all object IDs, and so prevents any ability to assemble and dissemble geometrically distinct components.

    Nevertheless, in response to this legitimate concern, we have decided on the following package modifications:

    • The primary osmdata_...() functions shall remain as they are and import all OSM data, to ensure that they remain as true as possible to the underlying data model of OSM itself; but
    • We will implement a new function (see this osmdata issue) that enables reduction of a given osmdata object to include only unique elements of each geometric class. The result of this will then more closely resemble the output of GDAL, while retaining the primary difference described above that osmdata will still retain all individual OSM IDs for all components.

    We nevertheless note in direct response to the above concern that a use case would be routing to minimse numbers of traffic lights, which could only be done through storing all points as point objects in addition to their presence as line and polygon vertices.

Comparison with GDAL

  1. ''Performance, comparison'' and speed: this is a useful benchmark, which I (RL) think a modified version of would be useful in the vignette to illustrate speed and usage differences with the 'raw' way of doing things.

    With due apology for the now-rectified dead link mentioned above, the presented figures nevertheless reveal that osmdata is indeed >20% faster than sf/GDAL. Is GDAL very fast? Surely so. More importantly, are either sf or osmdata very fast in comparison to the long-standing singular alternative for getting OSM data into R? We can't even given numbers, but the answer is yes by factors of thousands. Both sf/GDAL and osmdata are very fast, yet osmdata remains nevertheless the fastest. Surely that's a justification for claiming ''very fast''? Compared to all other currently possible alternatives, it is so. And finally, 20% faster than the current fastest surely represents an improvement?

  2. Changing the config file of the GDAL OSM driver: Other than explicit customisation of which key fields ought to be returned, the only options that the GDAL config file enables are potentially returning all points and all ways. There still remains no way within sf/GDAL to connect any of these ways (whether line or polygon objects) to the multi.. objects of which they may or may not be members. osmdata retains all such membership, and through the recursive searching functions enables immediate reconstruction of all members of any given object, or of all objects which share a set of members. That is simply not possible with sf/GDAL, no matter how much the GDAL config file is customised. In short, there is no GDAL config setting that can reconstruct how osmdata extracts data, and we stand by the claims of our second vignette that osmdata remains generally ''truer'' to the underlying structure of OSM data (with due acknowledgement in that vignette that the entire SF scheme was never intended for such purposes, so that claim ought not be interpreted as a criticism!).

Thankfulness

Note that we consciously didn't acknowledge @edzer's contributions prior to review but will definitely do it after review with a double role:

  • cph of sf code

  • rev because of the improvements of the package thanks to his review 👌

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik Apr 21, 2017

Member

@ramnathv Gentle reminder that we hope to have your review in a couple of weeks (May 3rd) 🙏

Member

karthik commented Apr 21, 2017

@ramnathv Gentle reminder that we hope to have your review in a couple of weeks (May 3rd) 🙏

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik May 9, 2017

Member

@ramnathv Another quick ping.

Member

karthik commented May 9, 2017

@ramnathv Another quick ping.

@ramnathv

This comment has been minimized.

Show comment
Hide comment
@ramnathv

ramnathv May 10, 2017

@karthik I need a few more days to wrap this up. Apologies for the delay.

@karthik I need a few more days to wrap this up. Apologies for the delay.

@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge May 17, 2017

Contributor

@ramnathv note that tests currently fail because of this issue with the current CRAN version of sf. It's already fixed with sf 0.5.0, but tests won't pass again until that's on CRAN. temporary workaround implemented so tests pass again 😄

Contributor

mpadge commented May 17, 2017

@ramnathv note that tests currently fail because of this issue with the current CRAN version of sf. It's already fixed with sf 0.5.0, but tests won't pass again until that's on CRAN. temporary workaround implemented so tests pass again 😄

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik May 23, 2017

Member

Hi @mpadge, even though we did not receive the second review, we are pleased with Edzer's and also note that the package has been successfully submitted to CRAN. So I am pleased to accept this package into the rOpenSci suite! 🎉

I've created a new team called osmdata with you and me as members. Please go ahead and request a transfer of the pacakge the rOpenSci github account.

Once you transfer, please add the rOpenSci footer badge at the bottom of your README.

![](https://ropensci.org/public_images/github_footer.png)

Then you'll need to update CI links and badge links and also quickly reactivate CI services. We can help with this part.

Please reach out with any questions.

Member

karthik commented May 23, 2017

Hi @mpadge, even though we did not receive the second review, we are pleased with Edzer's and also note that the package has been successfully submitted to CRAN. So I am pleased to accept this package into the rOpenSci suite! 🎉

I've created a new team called osmdata with you and me as members. Please go ahead and request a transfer of the pacakge the rOpenSci github account.

Once you transfer, please add the rOpenSci footer badge at the bottom of your README.

![](https://ropensci.org/public_images/github_footer.png)

Then you'll need to update CI links and badge links and also quickly reactivate CI services. We can help with this part.

Please reach out with any questions.

@karthik karthik closed this May 23, 2017

@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge May 23, 2017

Contributor

Thanks @karthik - i've requested the transfer and just have to presume you get it (github notifications not so good there - i just got told i didn't have permission to do that). And thanks especially for boldly going ahead while still technically waiting for rev#2. I'll update CI guff as soon as transferred, and will also do the required DOI archiving. Two questions in that regard:

  1. Will just one review be sufficient for JOSS, or will a second need to be requested?
  2. Do or will I need to do anything further in regard to JOSS consideration, notably including informing them of DOI?
Contributor

mpadge commented May 23, 2017

Thanks @karthik - i've requested the transfer and just have to presume you get it (github notifications not so good there - i just got told i didn't have permission to do that). And thanks especially for boldly going ahead while still technically waiting for rev#2. I'll update CI guff as soon as transferred, and will also do the required DOI archiving. Two questions in that regard:

  1. Will just one review be sufficient for JOSS, or will a second need to be requested?
  2. Do or will I need to do anything further in regard to JOSS consideration, notably including informing them of DOI?
@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge May 24, 2017

Contributor

@karthik All CI links and badges updated but I can't push becuase

# remote: Permission to ropensci/osmdata.git denied to mpadge.
Contributor

mpadge commented May 24, 2017

@karthik All CI links and badges updated but I can't push becuase

# remote: Permission to ropensci/osmdata.git denied to mpadge.
@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik May 24, 2017

Member

Will just one review be sufficient for JOSS, or will a second need to be requested?

Yes, that's sufficient. I'll handle your manuscript at the JOSS end.

Do or will I need to do anything further in regard to JOSS consideration, notably including informing them of DOI?

Right before initiating the JOSS submission, let's make sure that the version you have submitted to CRAN is archived in Zenodo and has a DOI ready.

# remote: Permission to ropensci/osmdata.git denied to mpadge.

bummer. Can you give me (@karthik) admin access on your original repo and I will quickly make the move and give you admin here?

Member

karthik commented May 24, 2017

Will just one review be sufficient for JOSS, or will a second need to be requested?

Yes, that's sufficient. I'll handle your manuscript at the JOSS end.

Do or will I need to do anything further in regard to JOSS consideration, notably including informing them of DOI?

Right before initiating the JOSS submission, let's make sure that the version you have submitted to CRAN is archived in Zenodo and has a DOI ready.

# remote: Permission to ropensci/osmdata.git denied to mpadge.

bummer. Can you give me (@karthik) admin access on your original repo and I will quickly make the move and give you admin here?

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik May 24, 2017

Member

@mpadge Please try pushing now. I was able to add your team with admin permission.

Member

karthik commented May 24, 2017

@mpadge Please try pushing now. I was able to add your team with admin permission.

@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge May 31, 2017

Contributor

@karthik slight teething probs with CRAN - as soon as 0.0.2 is up, i'll archive it and let you know. (Submitted to CRAN over a week ago ...)

Contributor

mpadge commented May 31, 2017

@karthik slight teething probs with CRAN - as soon as 0.0.2 is up, i'll archive it and let you know. (Submitted to CRAN over a week ago ...)

@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge Jun 6, 2017

Contributor

@karthik finally deposited and doi'd here. This version was submitted to CRAN on 22nd May, and should hopefully appear there sometime soon.

Contributor

mpadge commented Jun 6, 2017

@karthik finally deposited and doi'd here. This version was submitted to CRAN on 22nd May, and should hopefully appear there sometime soon.

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik Jun 17, 2017

Member

@mpadge excellent. I'll follow up with next steps for JOSS.

Member

karthik commented Jun 17, 2017

@mpadge excellent. I'll follow up with next steps for JOSS.

@karthik

This comment has been minimized.

Show comment
Hide comment
@karthik

karthik Jun 21, 2017

Member

@mpadge Please submit your paper to JOSS and assign me as the editor on the submission screen please. http://joss.theoj.org/papers/new

Member

karthik commented Jun 21, 2017

@mpadge Please submit your paper to JOSS and assign me as the editor on the submission screen please. http://joss.theoj.org/papers/new

@mpadge

This comment has been minimized.

Show comment
Hide comment
@mpadge

mpadge Jun 25, 2017

Contributor

Sorry for slight delay @karthik - now submitted and good to go.

Contributor

mpadge commented Jun 25, 2017

Sorry for slight delay @karthik - now submitted and good to go.

@arfon arfon referenced this issue Jun 26, 2017

Closed

[REVIEW]: osmdata #305

18 of 18 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment