Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to crul from httr? #83

Closed
maelle opened this issue Apr 11, 2018 · 11 comments
Closed

Switch to crul from httr? #83

maelle opened this issue Apr 11, 2018 · 11 comments
Milestone

Comments

@maelle
Copy link
Member

maelle commented Apr 11, 2018

Out of loyalty to rOpenSci + bc crul has a cool helper for #68 it its dev version ☺

@cboettig
Copy link
Member

👍 switch away!

@maelle
Copy link
Member Author

maelle commented Apr 12, 2018

I see httr use in this ORCID notebook that I will ignore.

For which functionality is ORCID needed? I'd be happy to add rorcid code if needed 😉

maelle added a commit that referenced this issue Apr 12, 2018
removes httr dependency in favor of crul cf #83
@maelle maelle closed this as completed Apr 12, 2018
@cboettig
Copy link
Member

Re ORCID / ORCID API, my thought was that it would be nice to have a utility to make it easier to look up a contributor's ORCID ID. I don't think this could be fully automated ever though, since something like 99% of ORCID users don't make their email address public, you basically have to search on full names to get a list of ORCID ids, and then leave it up to the user to match ID to the person; at which point you may as well just use the ORCID web search.

It seems the ORCID search always treats a query of givenName familyName as givenName OR familyName, which is super annoying. @sckott or maybe @mfenner might know if there's a way to do a less stupid search to look up orcid ids...

@maelle maelle reopened this Apr 12, 2018
@sckott
Copy link

sckott commented Apr 12, 2018

will have a look, i think they support solr dismax, so i think should be able to change to AND etc. 🤔

@sckott
Copy link

sckott commented Apr 12, 2018

maybe

first_last <- function(..., rows = 10) {
  tt <- rorcid::orcid(..., rows = rows)
  dplyr::bind_rows(lapply(tt$`orcid-identifier.path`, function(w) {
    rr <- rorcid::orcid_id(w)
    data.frame(
      first = rr[[1]]$name$`given-names`$value,
      last = rr[[1]]$name$`family-name`$value,
      stringsAsFactors = FALSE
    )
  }))
}
#> first_last(query = "carl boettiger")
#>           first               last
#> 1          Carl          Boettiger
#> 2      Alistair          Boettiger
#> 3         David          Boettiger
#> 4        Camila Boettiger Philipps
#> 5          Carl              Kruse
#> 6           Ian               Carl
#> 7      Fr. Carl               Carl
#> 8     Carl Elie           Saroufim
#> 9  Carl-Fredrik             Westin
#> 10         Carl         Liebersohn


#> first_last(query = "given-names:carl AND family-name:boettiger")
#>   first      last
#> 1  Carl Boettiger

@maelle
Copy link
Member Author

maelle commented Apr 18, 2018

Closing in favor of a specific issue #90

@maelle maelle closed this as completed Apr 18, 2018
@cboettig
Copy link
Member

@sckott yes, that looks good! Though interface wise I think it we might tweak it a bit -- I'd prefer a function with a verb name like orcid_search() that takes arguments like given_name and family_name rather than having the user have to guess how to construct the query.

@sckott
Copy link

sckott commented Apr 18, 2018

@cboettig okay, what about this:

field_match_list <- list(
  orcid = 'orcid',
  given_name = 'given-names',
  family_name = 'family-name',
  past_inst = 'past-institution-affiliation-name',
  current_prim_inst = 'current-primary-institution-affiliation-name',
  current_inst = 'current-institution-affiliation-name',
  credit_name = 'credit-name',
  other_name = 'other-names',
  email = 'email',
  digital_object_ids = 'digital-object-ids',
  work_title = 'work-titles',
  grant_number = 'grant-numbers',
  patent_number = 'patent-numbers',
  keywords = 'keywords',
  text = 'text'
)

ocom <- function(l) Filter(Negate(is.null), l)

orcid_search <- function(given_name = NULL, family_name = NULL, 
    past_inst = NULL, current_prim_inst = NULL, current_inst = NULL, 
    credit_name = NULL, other_name = NULL, email = NULL, 
    digital_object_ids = NULL, work_title = NULL, grant_number = NULL, 
    patent_number = NULL, keywords = NULL, text = NULL, rows = 10) {

  query <- ocom(list(given_name = given_name, family_name = family_name, 
    past_inst = past_inst, current_prim_inst = current_prim_inst, 
    current_inst = current_inst,
    credit_name = credit_name, other_name = other_name, email = email, 
    digital_object_ids = digital_object_ids, work_title = work_title, 
    grant_number = grant_number, patent_number = patent_number, 
    keywords = keywords, text = text))
  if (length(query) == 0) stop("must pass at least one param")
  names(query) <- vapply(names(query), function(z) field_match_list[[z]], "")

  # by default, combine with 'AND'
  query <- paste(names(query), unname(query), sep = ":", collapse = " AND ")

  tt <- rorcid::orcid(query = query, rows = rows)
  dplyr::bind_rows(lapply(tt$`orcid-identifier.path`, function(w) {
    rr <- rorcid::orcid_id(w)
    data.frame(
      first = rr[[1]]$name$`given-names`$value,
      last = rr[[1]]$name$`family-name`$value,
      orcid = w,
      stringsAsFactors = FALSE
    )
  }))
}
orcid_search(given_name = "carl", family_name = "boettiger")
#>   first      last               orcid
#> 1  Carl Boettiger 0000-0002-1642-628X

orcid_search(given_name = "carl")
#>    first       last               orcid
#> 1   Carl       Lind 0000-0001-5979-5504
#> 2   Carl       Mika 0000-0003-2348-4290
#> 3   Carl  Bergstrom 0000-0002-2070-385X
#> 4   Carl      Brown 0000-0002-1559-3238
#> 5   Carl      Death 0000-0002-1601-9473
#> 6   Carl   Puylaert 0000-0003-4269-8964
#> 7   Carl Wesolowski 0000-0003-0134-9346
#> 8   Carl  Shanholtz 0000-0003-3938-178X
#> 9   Carl     Snyder 0000-0002-3480-1354
#> 10  Carl     Wadell 0000-0002-2809-9386

The drawback here is that you don't have the flexibility of different SOLR combinators - AND vs. OR, etc. - so above I just by default combine multiple queries with AND

@cboettig
Copy link
Member

@sckott Thanks! Yes, I think the above API makes sense. It might be worth providing a lower-level function allowing a more direct SOLR query, but IMO we do want to default to AND for an R-based API. (For instance, this matches familiar behavior to R users such as dplyr::filter.

In general I think an R user expects additional arguments to be a refinement of search, while in R it is natural to think of OR combinations as separate function calls, which is much more flexible anyway. For instance, if I had a list of researcher names for whom I wanted to look up ORCIDs for, I would not attempt this with some combination of OR, but rather make multiple calls.

It is very difficult to imagine a valid use case for search results such as given_name: Carl OR family_name: Boettiger.

@sckott
Copy link

sckott commented Apr 18, 2018

well, the lower level function is already there: orcid() - we can change that one if you like?

right, okay, so we're good on the AND ✔️

yes hard to imagine wanting to do carl OR boetigger

@sckott
Copy link

sckott commented Apr 18, 2018

okay function added to rorcid on master. see ropensci-archive/rorcid#54

cboettig pushed a commit that referenced this issue Apr 23, 2018
* add pre-commit hook for DESCRIPTION vs codemeta.json

* oops had forgotten the dependencies 😱

* had also forgotten to import the function

* puts the code in the right place and documents what the first call to write_codemeta will do

* add rOpenSci and myself as authors

* remove MIT licence

* change licence

* add that the code is GPL-3

* replaces devtools with usethis where possible

* mostly French snobism 😉

* add uses_git origin

* cf #62

* #62

* only adds hook once!

* start using desc cf #41

* various fixes

* better example?

* document

* removes reference to deleted function

* cf #63

* cf #64, parses more possible roles

* updates codemeta.json in particular more people/orgs appear

* start work on opinions cf #76

* document

* oops

* oops again

* gives opinion when verbose=TRUE and otherwise just uses robust code

* cleans up tests

* more tests of plain authors&maintainer

* work on tests

* corrects documentation

* better if the pkg exists 😁

* update codemeta

* adds a message to get a devtools release question

* fix?

* new try

* removes httr dependency in favor of crul cf #83

* checks URLs in DESCRIPTION cf #68

* oops fixes test

* uses dev version of jsonld

* adds coercion to character to repair bug introduced by jsonld new version cf #88

* clean up cache

* yay encoding

* close #84 by deleting now useless licences.R file

* appveyor

* oh, Appveyor

* start filling NEWS.md

* better checks when several URLs

* more space

* generate review metadata cf #23

* oops

* cf #63

* @jeroen said that this might help 🙏

* thanks again @jeroen

* test on patched R version

* CRAN and Bioconductor links for dependencies cf #81

* add tests of dependencies URL creation

* add canonic URL for the package itself cf #81

* borrows jsonlite code cf #84

* makes it a bit more specific

* badge parsing cf #130

* uses badge parsing function in guess_metadata

* opinions about README cf 98

* add check of provider cf #81

* oops

* oops again

* R CMD Check NOTEs

* oops

* update contributor list cf #95

* several relatedLinks cf #99

* add the URL only once

* oops repairs test

* update NEWS

* add ability to provide relatedLink for packages installed from CRAN or Bioconductor

* add link to commit if available

* only one maintainer currently cf #109

* oops this was wrong!

* mmmh there was a mistake here

* Travis fix?

* remotes cf #96

* Travis fix?

* export the badge extraction function cf #107 and update docs and correct a test

* status as URL cf #102

* now one can extract lifecycle status

* not only Travis CI as contIntegration cf #111

* update NEWs

* update NEWS

* repairs handling of additional terms cf #112 and adds corresponding test

* correct test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants