New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geoparser: geoparsing with the geoparser.io API #43

Closed
maelle opened this Issue May 8, 2016 · 16 comments

Comments

Projects
None yet
4 participants
@maelle
Copy link
Member

maelle commented May 8, 2016

    1. What does this package do? (explain in 50 words or less)
      This package is an interface to the geoparser.io API that identifies places mentioned in text, disambiguates those places, and returns data about the places found in the text.
    1. Paste the full DESCRIPTION file inside a code block below.
Package: geoparser
Type: Package
Title: Interface to the Geoparser.io API
Version: 0.1.0
Authors@R: person("Maëlle", "Salmon", email = "maelle.salmon@yahoo.se", role = c("aut", "cre"))
Description: This packages accesses the Geoparser.io API which is a web service
    that identifies places mentioned in text, disambiguates those places, and
    returns detailed data about the places found in the text.
License: GPL (>= 2)
LazyData: TRUE
URL: http://github.com/masalmon/geoparser
BugReports: http://github.com/masalmon/geoparser/issues
Encoding: UTF-8
RoxygenNote: 5.0.1
Suggests: testthat,
    knitr,
    rmarkdown
Depends: dplyr, httr, jsonlite, lazyeval, tidyr
VignetteBuilder: knitr

    1. URL for the package (the development repository, not a stylized html page)
      https://github.com/masalmon/geoparser
    1. What data source(s) does it work with (if applicable)?
      The geoparser.io API which is version beta. They said "We expect it to remain largely the same for the immediate future. We'd try to make any changes backwards compatible, and support older versions of the API for an extended period of time to allow users to migrate at their convenience. However, the most likely things to change are:
    2. The possible values for Feature.properties.type in the GeoJSON response data
    3. How we calculate values for Feature.properties.confidence in the GeoJSON response data
    4. The endpoint URI for the API itself"
    1. Who is the target audience?
      People interested in Natural Language Processing, or anyone with text where they would like to identify geographical information.
    1. Are there other R packages that accomplish the same thing? If so, what is different about yours?
      I don't think so. Named Entity Recognition (including placenames) in R is currently offered by the openNLP package that requires Java but then the results are not geolocated.
    1. Check the box next to each policy below, confirming that you agree. These are mandatory.
    • [x ] This package does not violate the Terms of Service of any service it interacts with.
    • [ x] The repository has continuous integration with Travis CI and/or another service
    • [ x] The package contains a vignette
    • [ x] The package contains a reasonably complete README with devtools install instructions
    • [ x] The package contains unit tests
    • [ x] The package only exports functions to the NAMESPACE that are intended for end users
    1. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
    • Are there any package dependencies not on CRAN?
    • [ x] Do you intend for this package to go on CRAN?
    • [ x] Does the package have a CRAN accepted license?
    • Did devtools::check() produce any errors or warnings? If so paste them below.
    1. Please add explanations below for any exceptions to the above:
    1. If this is a resubmission following rejection, please explain the change in circumstances.
@sckott

This comment has been minimized.

Copy link
Member

sckott commented May 9, 2016

Thanks for your submission @masalmon Seeking reviewers now

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented May 9, 2016

Reviewers: @hrbrmstr
Due date: 2016-05-30

@sckott

This comment has been minimized.

Copy link
Member

sckott commented May 30, 2016

@hrbrmstr
Due date: 2016-05-30 - hey there, it's been 21 days, please get your review in soon, thanks 😺 (ropensci-bot)

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jun 4, 2016

@hrbrmstr
Due date: 2016-05-30 - hey there, it's been 26 days, please get your review in soon, thanks 😺 (ropensci-bot)

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jun 9, 2016

@hrbrmstr
Due date: 2016-05-30 - hey there, it's been 31 days, please get your review in soon, thanks 😺 (ropensci-bot)

@hrbrmstr

This comment has been minimized.

Copy link

hrbrmstr commented Jun 10, 2016

on it now (apologies)

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jun 10, 2016

thx

@hrbrmstr

This comment has been minimized.

Copy link

hrbrmstr commented Jun 10, 2016

Package Name

  • geoparser completely meets the rOpenSci requirements.

Function/variable Naming

  • geoparser completely meets the rOpenSci requirements.

README

  • 𐄂 The rOpenSci footer is missing.
  • README.md is created from README.Rmd

Code of Conduct

  • geoparser completely meets the rOpenSci requirements.

Documentation

  • roxygen2 is used
  • 𐄂 #' @noRd is missing from internal functions
  • 𐄂 NEWS.md is missing

Authorship

  • geoparser completely meets the rOpenSci requirements.

Testing

  • geoparser makes good use of tests for various API components
  • 𐄂 testthat::skip_on_cran() is mising from the tests that make API calls

Versioning

  • geoparser completely meets the rOpenSci requirements.

Continuous integration

  • geoparser completely meets the rOpenSci requirements (uses appveyor and travis)

Examples

  • geoparser completely meets the rOpenSci requirements.

Package dependencies

  • 𐄂 The items in Depends: should really be in Imports:

DESCRIPTION format

  • 𐄂 In the Description: field, 'This packages' will (depending on the day) get a nastygram from CRAN overlords. A suggested altrnative wording is A wrapper for the the Geoparser.io API, which is a web service that identifies places mentioned in text, disambiguates those places, and returns detailed data about the places found in the text. Basic, limited API access is free with paid plans to accomodate larger workloads.

Recommended scaffolding

  • geoparser completely meets the rOpenSci requirements.

Console messages

  • geoparser completely meets the rOpenSci requirements.

CRAN Check Issues

  • 𐄂 Undefined global functions or variables: start caused by NSE dplyr calls. It's explainable to CRAN but you may want to consider switching to standard evaluation or putting in a start <- NULL to avoid the CRAN check complaint.
  • 𐄂 Non-standard file/directory found at top level: ‘README.Rmd’ caused by a lack of ^README\.Rmd$ in .Rbuildignore.

General Review

  • NOTE: (~90% of issues noted above and below are fixed in a PR to @masalmon's repo. I owed her and y'all at least that for the delay)
  • This is a super neat package and I now respect the API service more than I did a few months ago :-) I tried it with some interesting texts and it came back with good results (their algorithm is impressive). I had no issues understanding how to get an API key everything (apart from the CRAN checks) "just worked" as advertised.
  • All vignettes and tests run perfectly on OS X & Linux.
  • Clever emoji use in the GitHub repo title!
  • Nice use of (in many places) standard evaluation alternatives to NSE functions. This is super helpful in packages to avoid CRAN check issues. Really nice job.
  • Nice use of geoparser_url() vs hardcoding it in every function call that does URL access.
  • Nice use of the "standard"/modern R way to access APIs requring an API key.
  • Great instructions & references for obtaining an API key and installing it.
  • General exemplary use of httr for API calls, however…
  • There is no purrr::safely() or tryCatch() usage which would help to trap errors in the event of a geoparser.io service disruption or local lack of an internet connction. This is prbly not critical for 0.1.0 release but could be a nice addition for a 0.2.0 release.
  • It's safer to call status_code() than test the individual $ elements.
  • Your future self might remember what was going on in function_df() and geoparser_parse() but she would also prbly not mind a short description of what's going on in the transformation, especially if the API changes substantially and it's been a while since she's seen the code. :-)
  • While it is highly unlikely references to functions such as tbl_df() are going to have namespace collisions, it is usually better to prefix them with the package name; i.e. tibble::tbl_df() or dplyr::tbl_df() vs leave them bare. This was done for httr::content(), dplyr::bind_rows() & jsonlite::fromJSON()but not for other function calls.
  • The title of the main function could be more "directive". i.e. Issue a geoparser query vs Geoparser query (not super important)
  • URLs in roxygen documentation should be wrapped either in \url{} or \href{}{}. This is done in some places but not others.
  • The use of \code{} is required vs markdown back-ticks (at least for now, there are updates coming to roxygen2 that will enable backticks for marking monosapaced font segments). This happens quite a bit (to me, at least) when I'm copying from vignettes or Rmd files to roxygen
  • While there is no inherent limitation on roxygen comment line length, it may be better to wrap them at the code-line-length margin (80 chars)
  • In the Value section of geoparser_q() the column names of the data.frame element item lists should be in monospace (like the ones you get for free with roxygen when using @param). Also, where there are qualifiers such as see GeoNames.org for a complete list. Subject to change, it might be useful to wrap those in \emph{}.
  • It might be helpful to add a: #' @references \href{https://geoparser.io/docs.html}{Geoparser.io documentation} to the geoparser_q() documentation.
  • The reference to the opencage package could be converted to an \href{}{} vs \url{}.
  • The documentation (both the package and the geoparser.io site) states 8KB as a max limit for the request header but you test for 8,000 vs 8,192. It might be useful to clarify this (though I suspect there won't be many users who would hit this condition and have amiguity).
  • Using vapply or one of the purrr classed alternatives over lapply is recommended for more guaranteed type-safety. Without actually tracing the execution of those functions (see "future self" bullet) it may not buy you much in terms of safety (i.e. if most of those just return lists). A cursory iteration thorough both functions (once) suggests that this could be a 0.2.0 internal feature.
  • URLencode should be utils::URLencode and the associated @importFrom utils URLencode added.
  • In the httr::POST() call in geoparser_get() you can use httr::accept_json() vs add the header manually. I see why you manually did the encoding for the body content, though (that was a good addition and I suspect was super-annoying to figure out; again, really nice job!)
  • Spaces around = are missing in a few spots (and there are spaces around = in the vast majority of uses of them, hence the exceptions being noticeable.
  • prettify is not used but was imported. This was most likely used during development (I've done that as well).
  • In some places columns are modified via $ but in others mutate_() is used. This really isn't an issue, per se, but consistency might be better in the long run.

Other Suggestions

  • Possibly include an example in the vignette on how issue multiple calls to the API and bind the results together OR (alternatively)…
  • Vectorize geoparser_q() to take a character vector for text_input vs just a single string
  • It might be cool to replicate "Example request #2" from the main geoparser.io site API examples as a second @examples item and/or in the vignette, especially since it's a longer bit of text.

Review time: ~2.5 hours (I forgot to clock myself earlier today)

@maelle

This comment has been minimized.

Copy link
Member

maelle commented Jun 10, 2016

Thank you so much, @hrbrmstr! I look forward to making all the suggested changes! 😸

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Jun 10, 2016

Thanks for the thorough review @hrbrmstr.

@maelle

This comment has been minimized.

Copy link
Member

maelle commented Jun 10, 2016

I don't think you owed me the PR, @hrbrmstr so many thanks! It really was useful! I have learnt a lot. 👍

Reg.: "Nice use of the "standard"/modern R way to access APIs requring an API key." At least part of it is my copy-pasting the function @noamross had added to opencage. 😄

  • The encoding thing was in the sample code the API contact person had sent to me when advertising for their service (I had commented on a GH issue about geoparsing and that's how they found me)

I'm going to summarize what I did:

  • I have looked at all comments and changes. Thank you!
  • I have added you as rev and ctb in the authors list.
  • I have commented my code a bit more. I am already future me from the perspective of the coder who submitted the package a month ago... This coder was a bit too optimistic about her memory.
  • I have replaced some apply-things by purrr functions but honestly I will need to come back to it later because it's clumsy and when I use map(blabla, unlist), I don't make any progress compared to the earlier version.
  • The function is now vectorized! For identifying each input text in the output I added a column with the MD5 hash of the text. Maybe there is a better solution? I added the MD5 hash because digest seemed to be a light dependency and I did not want to add the whole text.
  • I added the Example request number 2 from the site in the vignette and in the README, thanks!
  • "8,000 vs 8,192" change made.
  • I have used mutate_ instead of "$" for latitude/longitude. Other $ are for lists.
  • I now use purrr::safely... Is what I wrote what you had in mind in your comment?

Once you all (@hrbrmstr, @noamross & @sckott ) will be happy with the package, I'll transfer the repo to ropenscilabs but also submit it to CRAN. I had discussed the possible API changes with geoparser.io and I think it's ok to submit this version. I have added the API version number to DESCRIPTION.

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Jun 16, 2016

Thanks, @masalmon! @hrbrmstr, please let us know if the response addresses everything to your satisfaction.

@hrbrmstr

This comment has been minimized.

Copy link

hrbrmstr commented Jun 16, 2016

aye. i also re-looked at the code and it's #spiffy.

On Thu, Jun 16, 2016 at 11:10 AM, Noam Ross notifications@github.com
wrote:

Thanks, @masalmon https://github.com/masalmon! @hrbrmstr
https://github.com/hrbrmstr, please let us know if the response
addresses everything to your satisfaction.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#43 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAfHtt5d79VJ4YiREio_79tgbK6gjPN_ks5qMWd8gaJpZM4IZlmP
.

@noamross noamross added the 6/approved label Jun 16, 2016

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Jun 16, 2016

Thank you @hrbrmstr.

I have done an editors' check and you are approved! Please go ahead and transfer the repository to ropenscilabs and add the rOpenSci footer. 👏

@maelle

This comment has been minimized.

Copy link
Member

maelle commented Jun 16, 2016

Awesome, many thanks to you both, I'll do this ASAP!

@maelle

This comment has been minimized.

Copy link
Member

maelle commented Jun 16, 2016

Transfer done. Next step will be the CRAN submission!

@noamross noamross self-assigned this Jan 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment