# geoparser: geoparsing with the geoparser.io API #43

Closed
opened this Issue May 8, 2016 · 16 comments


### maelle commented May 8, 2016 • edited

**What does this package do? (explain in 50 words or less)**

This package is an interface to the geoparser.io API that identifies places mentioned in text, disambiguates those places, and returns data about the places found in the text.

**Paste the full DESCRIPTION file inside a code block below.**

```
Package: geoparser
Type: Package
Title: Interface to the Geoparser.io API
Version: 0.1.0
Authors@R: person("Maëlle", "Salmon", email = "maelle.salmon@yahoo.se",
    role = c("aut", "cre"))
Description: This packages accesses the Geoparser.io API which is a web service
    that identifies places mentioned in text, disambiguates those places, and
    returns detailed data about the places found in the text.
License: GPL (>= 2)
LazyData: TRUE
URL: http://github.com/masalmon/geoparser
BugReports: http://github.com/masalmon/geoparser/issues
Encoding: UTF-8
RoxygenNote: 5.0.1
Suggests: testthat,
    knitr,
    rmarkdown
Depends: dplyr,
    httr,
    jsonlite,
    lazyeval,
    tidyr
VignetteBuilder: knitr
```

**URL for the package (the development repository, not a stylized html page)**

https://github.com/masalmon/geoparser

**What data source(s) does it work with (if applicable)?**

The geoparser.io API, which is in beta. They said:

> We expect it to remain largely the same for the immediate future. We'd try to make any changes backwards compatible, and support older versions of the API for an extended period of time to allow users to migrate at their convenience. However, the most likely things to change are:
> - The possible values for Feature.properties.type in the GeoJSON response data
> - How we calculate values for Feature.properties.confidence in the GeoJSON response data
> - The endpoint URI for the API itself

**Who is the target audience?**

People interested in Natural Language Processing, or anyone with text where they would like to identify geographical information.

**Are there other R packages that accomplish the same thing? If so, what is different about yours?**

I don't think so. Named Entity Recognition (including placenames) in R is currently offered by the openNLP package, which requires Java, but then the results are not geolocated.

**Check the box next to each policy below, confirming that you agree. These are mandatory.**

- [x] This package does not violate the Terms of Service of any service it interacts with.
- [x] The repository has continuous integration with Travis CI and/or another service
- [x] The package contains a vignette
- [x] The package contains a reasonably complete README with devtools install instructions
- [x] The package contains unit tests
- [x] The package only exports functions to the NAMESPACE that are intended for end users

**Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.**

**Are there any package dependencies not on CRAN?**

- [x] Do you intend for this package to go on CRAN?
- [x] Does the package have a CRAN accepted license?

**Did devtools::check() produce any errors or warnings? If so paste them below.**

**Please add explanations below for any exceptions to the above:**

**If this is a resubmission following rejection, please explain the change in circumstances.**

### sckott commented May 9, 2016

 Thanks for your submission @masalmon Seeking reviewers now


### noamross commented May 9, 2016

 Reviewers: @hrbrmstr Due date: 2016-05-30



### sckott commented May 30, 2016

 @hrbrmstr Due date: 2016-05-30 - hey there, it's been 21 days, please get your review in soon, thanks 😺 (ropensci-bot)

### sckott commented Jun 4, 2016

 @hrbrmstr Due date: 2016-05-30 - hey there, it's been 26 days, please get your review in soon, thanks 😺 (ropensci-bot)

### sckott commented Jun 9, 2016

 @hrbrmstr Due date: 2016-05-30 - hey there, it's been 31 days, please get your review in soon, thanks 😺 (ropensci-bot)

### hrbrmstr commented Jun 10, 2016

 on it now (apologies)

 thx

### Package Name

• geoparser completely meets the rOpenSci requirements.

### Function/variable Naming

• geoparser completely meets the rOpenSci requirements.

### README

• 𐄂 The rOpenSci footer is missing.
• README.md is created from README.Rmd

### Code of Conduct

• geoparser completely meets the rOpenSci requirements.

### Documentation

• roxygen2 is used
• 𐄂 #' @noRd is missing from internal functions
• 𐄂 NEWS.md is missing

### Authorship

• geoparser completely meets the rOpenSci requirements.

### Testing

• geoparser makes good use of tests for various API components
• 𐄂 testthat::skip_on_cran() is missing from the tests that make API calls

### Versioning

• geoparser completely meets the rOpenSci requirements.

### Continuous integration

• geoparser completely meets the rOpenSci requirements (uses AppVeyor and Travis)

### Examples

• geoparser completely meets the rOpenSci requirements.

### Package dependencies

• 𐄂 The items in Depends: should really be in Imports:
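
A minimal sketch of the suggested change, assuming the field list from the DESCRIPTION in the submission above and no other edits:

```
Imports:
    dplyr,
    httr,
    jsonlite,
    lazyeval,
    tidyr
```

Packages under Imports: are loaded but not attached when geoparser is loaded, so they do not land on the user's search path; Depends: is meant for the rare case where attaching another package is genuinely required.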

### DESCRIPTION format

• 𐄂 In the Description: field, 'This packages' will (depending on the day) get a nastygram from the CRAN overlords. A suggested alternative wording is: "A wrapper for the Geoparser.io API, which is a web service that identifies places mentioned in text, disambiguates those places, and returns detailed data about the places found in the text. Basic, limited API access is free, with paid plans to accommodate larger workloads."

### Recommended scaffolding

• geoparser completely meets the rOpenSci requirements.

### Console messages

• geoparser completely meets the rOpenSci requirements.

### CRAN Check Issues

• 𐄂 Undefined global functions or variables: start caused by NSE dplyr calls. It's explainable to CRAN but you may want to consider switching to standard evaluation or putting in a start <- NULL to avoid the CRAN check complaint.
• 𐄂 Non-standard file/directory found at top level: ‘README.Rmd’ caused by a lack of ^README\.Rmd$ in .Rbuildignore.

### General Review

• NOTE: ~90% of the issues noted above and below are fixed in a PR to @masalmon's repo. I owed her and y'all at least that for the delay.
• This is a super neat package, and I now respect the API service more than I did a few months ago :-) I tried it with some interesting texts and it came back with good results (their algorithm is impressive). I had no issues understanding how to get an API key; everything (apart from the CRAN checks) "just worked" as advertised.
• All vignettes and tests run perfectly on OS X & Linux.
• Clever emoji use in the GitHub repo title!
• Nice use (in many places) of standard-evaluation alternatives to NSE functions. This is super helpful in packages to avoid CRAN check issues. Really nice job.
• Nice use of geoparser_url() vs hardcoding it in every function call that does URL access.
• Nice use of the "standard"/modern R way to access APIs requiring an API key.
• Great instructions & references for obtaining an API key and installing it.
• General exemplary use of httr for API calls, however…
• There is no purrr::safely() or tryCatch() usage, which would help to trap errors in the event of a geoparser.io service disruption or a local lack of an internet connection. This is probably not critical for the 0.1.0 release but could be a nice addition for a 0.2.0 release.
• It's safer to call status_code() than to test the individual $ elements.
• Your future self might remember what was going on in function_df() and geoparser_parse(), but she would probably also not mind a short description of what's going on in the transformation, especially if the API changes substantially and it's been a while since she's seen the code. :-)
• While it is highly unlikely that references to functions such as tbl_df() are going to have namespace collisions, it is usually better to prefix them with the package name, i.e. tibble::tbl_df() or dplyr::tbl_df(), vs leaving them bare. This was done for httr::content(), dplyr::bind_rows() & jsonlite::fromJSON() but not for other function calls.
• The title of the main function could be more "directive". i.e. Issue a geoparser query vs Geoparser query (not super important)
• URLs in roxygen documentation should be wrapped either in \url{} or \href{}{}. This is done in some places but not others.
• The use of \code{} is required vs markdown back-ticks (at least for now; there are updates coming to roxygen2 that will enable back-ticks for marking monospaced font segments). This happens quite a bit (to me, at least) when copying from vignettes or Rmd files to roxygen.
• While there is no inherent limitation on roxygen comment line length, it may be better to wrap them at the code-line-length margin (80 chars)
• In the Value section of geoparser_q() the column names of the data.frame element item lists should be in monospace (like the ones you get for free with roxygen when using @param). Also, where there are qualifiers such as see GeoNames.org for a complete list. Subject to change, it might be useful to wrap those in \emph{}.
• It might be helpful to add a: #' @references \href{https://geoparser.io/docs.html}{Geoparser.io documentation} to the geoparser_q() documentation.
• The reference to the opencage package could be converted to an \href{}{} vs \url{}.
• The documentation (both the package and the geoparser.io site) states 8KB as a max limit for the request header, but you test for 8,000 vs 8,192. It might be useful to clarify this (though I suspect there won't be many users who would hit this condition and have ambiguity).
• Using vapply or one of the purrr classed alternatives over lapply is recommended for more guaranteed type-safety. Without actually tracing the execution of those functions (see "future self" bullet) it may not buy you much in terms of safety (i.e. if most of those just return lists). A cursory iteration through both functions (once) suggests that this could be a 0.2.0 internal feature.
• URLencode should be utils::URLencode and the associated @importFrom utils URLencode added.
• In the httr::POST() call in geoparser_get() you can use httr::accept_json() vs adding the header manually. I see why you manually did the encoding for the body content, though (that was a good addition and I suspect was super-annoying to figure out; again, really nice job!)
• Spaces around = are missing in a few spots (and there are spaces around = in the vast majority of uses, hence the exceptions being noticeable).
• prettify is not used but was imported. This was most likely used during development (I've done that as well).
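
To illustrate the start <- NULL suggestion from the CRAN Check Issues section above — a minimal sketch, with an invented data frame standing in for the package's real data:

```r
# Either of these, placed at the top level of a package file
# (e.g. a hypothetical R/globals.R), silences the R CMD check NOTE
# "no visible binding for global variable 'start'":
start <- NULL                      # dummy binding
# utils::globalVariables("start")  # or declare it to the checker instead

# The standard-evaluation alternative avoids the bare symbol entirely;
# base R is shown here to keep the example self-contained:
df  <- data.frame(start = c(3L, 1L), end = c(5L, 4L))
res <- df[order(df$start), ]       # equivalent of dplyr::arrange(df, start)
res$start                          # 1 3
```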
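The purrr::safely()/tryCatch() suggestion could look like the sketch below; safe_query() and failing_call() are hypothetical names invented for illustration, not functions from the package:

```r
# Hypothetical wrapper: trap errors from an API call so that a service
# disruption or a missing internet connection yields NULL plus a message
# instead of a hard error.
safe_query <- function(query_fun, ...) {
  tryCatch(
    query_fun(...),
    error = function(e) {
      message("geoparser.io request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Simulate a network failure to show the behaviour:
failing_call <- function() stop("could not resolve host geoparser.io")
result <- safe_query(failing_call)
is.null(result)  # TRUE
```

Relatedly, on the status_code() point: httr::status_code(response) works on the response object directly and is more robust than reaching into response$status_code by hand.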
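The vapply() point can be shown with a toy example (the placenames are invented for illustration):

```r
# vapply() pins down the type and length of each element's result;
# sapply() simplifies opportunistically and lapply() always returns a list.
places <- list("Paris", "Embu", "Timbuktu")
n <- vapply(places, nchar, FUN.VALUE = integer(1))
n  # 5 4 8

# If a callback returns the wrong type or length, vapply() errors
# immediately instead of propagating a malformed result downstream:
bad <- try(vapply(places, function(x) c(x, x), character(1)), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```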

### hrbrmstr commented Jun 16, 2016

 aye. i also re-looked at the code and it's #spiffy.

> On Thu, Jun 16, 2016 at 11:10 AM, Noam Ross wrote: Thanks, @masalmon! @hrbrmstr, please let us know if the response addresses everything to your satisfaction.


### noamross commented Jun 16, 2016 • edited

 Thank you @hrbrmstr. I have done an editors' check and you are approved! Please go ahead and transfer the repository to ropenscilabs and add the rOpenSci footer. 👏

### maelle commented Jun 16, 2016

 Awesome, many thanks to you both, I'll do this ASAP!

### maelle commented Jun 16, 2016

 Transfer done. Next step will be the CRAN submission!