orcid_peer_reviews() not getting journal name? #52

gorkang · 2018-04-08T14:06:27Z

I am trying to get the peer review activity from ORCID profiles using orcid_peer_reviews(). Everything seems to work fine, but I cannot find the journal names of the reviews.

For example, to get the following review from an ORCID profile...

I use the code below, but the closest I can get to the journal name is the Publons website URL. I can't see it in the output of either the general orcid_peer_reviews(id) call or the orcid_peer_reviews(id, put_code) call.

id = "0000-0001-7678-8656"

# Get reviews  
temp_reviews = orcid_peer_reviews(id)[[1]]

# Get details of specific review
temp_reviews_2 = orcid_peer_reviews(id, put_code = "220419")[[1]]

# Using the publons website I can get to the journal name, but I'd need to scrape it or similar...
temp_reviews_2$`review-identifiers`$`external-id`$`external-id-url.value`

Session details are below.

Session info -------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, linux-gnu           
 ui       RStudio (1.1.442)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Santiago            
 date     2018-04-08                  

Packages -----------------------------------------------------------------------
 package    * version    date       source                          
 assertthat   0.2.0      2017-04-11 CRAN (R 3.4.1)                  
 backports    1.1.2      2017-12-13 CRAN (R 3.4.3)                  
 base       * 3.4.4      2018-03-16 local                           
 bindr        0.1.1      2018-03-13 CRAN (R 3.4.3)                  
 bindrcpp     0.2        2017-06-17 CRAN (R 3.4.1)                  
 bookdown     0.7        2018-02-18 CRAN (R 3.4.3)                  
 compiler     3.4.4      2018-03-16 local                           
 crul         0.5.2      2018-02-24 CRAN (R 3.4.3)                  
 curl         3.1        2017-12-12 CRAN (R 3.4.3)                  
 datasets   * 3.4.4      2018-03-16 local                           
 devtools     1.13.5     2018-02-18 CRAN (R 3.4.3)                  
 digest       0.6.15     2018-01-28 CRAN (R 3.4.3)                  
 dplyr      * 0.7.4      2017-09-28 CRAN (R 3.4.2)                  
 evaluate     0.10.1     2017-06-24 CRAN (R 3.4.1)                  
 glue         1.2.0      2017-10-29 CRAN (R 3.4.2)                  
 graphics   * 3.4.4      2018-03-16 local                           
 grDevices  * 3.4.4      2018-03-16 local                           
 htmltools    0.3.6      2017-04-28 CRAN (R 3.4.1)                  
 httr         1.3.1      2017-08-20 CRAN (R 3.4.1)                  
 jsonlite     1.5        2017-06-01 CRAN (R 3.4.1)                  
 knitr        1.20       2018-02-20 CRAN (R 3.4.3)                  
 magrittr     1.5        2014-11-22 CRAN (R 3.4.1)                  
 memoise      1.1.0      2017-04-21 CRAN (R 3.4.1)                  
 methods    * 3.4.4      2018-03-16 local                           
 openssl      1.0.1      2018-03-03 CRAN (R 3.4.3)                  
 pacman     * 0.4.6      2017-05-14 CRAN (R 3.4.1)                  
 pillar       1.2.1      2018-02-27 CRAN (R 3.4.3)                  
 pkgconfig    2.0.1      2017-03-21 CRAN (R 3.4.1)                  
 plyr         1.8.4      2016-06-08 CRAN (R 3.4.1)                  
 R6           2.2.2      2017-06-17 CRAN (R 3.4.1)                  
 Rcpp         0.12.16    2018-03-13 CRAN (R 3.4.3)                  
 rlang        0.2.0.9000 2018-03-19 Github (tidyverse/rlang@1b81816)
 rmarkdown    1.9        2018-03-01 CRAN (R 3.4.3)                  
 rorcid     * 0.4.1.9210 2018-04-05 Github (ropensci/rorcid@c393ad0)
 rprojroot    1.3-2      2018-01-03 CRAN (R 3.4.3)                  
 rscopus    * 0.5.3      2017-10-11 CRAN (R 3.4.2)                  
 stats      * 3.4.4      2018-03-16 local                           
 stringi      1.1.7      2018-03-12 CRAN (R 3.4.3)                  
 stringr      1.3.0      2018-02-19 CRAN (R 3.4.3)                  
 tibble       1.4.2      2018-01-22 CRAN (R 3.4.3)                  
 tools        3.4.4      2018-03-16 local                           
 triebeard    0.3.0      2016-08-04 CRAN (R 3.4.1)                  
 urltools     1.7.0      2018-01-20 CRAN (R 3.4.3)                  
 utils      * 3.4.4      2018-03-16 local                           
 withr        2.1.2      2018-03-19 Github (jimhester/withr@79d7b0d)
 xfun         0.1        2018-01-22 CRAN (R 3.4.3)                  
 yaml         2.1.18     2018-03-08 CRAN (R 3.4.3)


sckott · 2018-04-10T03:01:05Z

thanks for the question @gorkang

@rcpeters another question for you. It seems the journal name is shown in the ORCID UI for peer reviews, but I can't find it in the API response either. Any guidance?

alainna · 2018-04-10T03:15:50Z

For peer reviews, the journal name (or publisher name, or organisation, etc) is going to be found in the group data.

https://pub.orcid.org/v2.1/0000-0001-7678-8656/peer-review/220419 ->

<peer-review:review-group-id>issn:1939-2222</peer-review:review-group-id>

The peer review won't necessarily be grouped by the publisher or journal name -- review groups can be as specific or general as the review posting party desires.

Generally the convening organisation will also be the party which has organised the review, which could be e.g. the journal or publisher. However, I notice that Publons lists itself as the convening organisation -- we'll be following up with them on that. Another example, posted by AGU (GEMS), has AGU listed as the convening party:

https://pub.orcid.org/v2.1/0000-0002-7363-4552/peer-review/146242

sckott · 2018-04-10T04:42:10Z

thanks @alainna for that, review-group-id is what we needed, sorry i missed it

id = "0000-0001-7678-8656"
x = orcid_peer_reviews(id, put_code = "220419")[[1]]
rcrossref::cr_journals(strsplit(x$`review-group-id`, ":")[[1]][[2]])$data$title
#> [1] "Journal of Experimental Psychology General"

rcpeters · 2018-04-10T05:01:30Z

Just a note: group IDs are not required to be ISSNs.

  select split_part(group_id,':',1) as prefix, count(*) from group_id_record group by prefix;
       prefix      | count 
  -----------------+-------
   publons         |  1297
   orcid-generated |   100
   ringgold        |     1
   issn            | 13181
  (4 rows)
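
The same kind of prefix breakdown can be approximated on the client side from `review-group-id` values already returned by the API. The following is a minimal base R sketch, not part of rorcid: `group_id_prefix` is a hypothetical helper, and the values in `group_ids` are made up for illustration (only the `issn:1939-2222` form appears in this thread).

```r
# Hypothetical helper: return the text before the first ":" of each group ID.
# An ID without a ":" is returned unchanged.
group_id_prefix <- function(group_ids) {
  sub(":.*$", "", group_ids)
}

# Made-up example values; only the "issn:" form is confirmed by the thread.
group_ids <- c("issn:1939-2222", "publons:12345", "issn:1234-5679",
               "orcid-generated:abc", "ringgold:999")

# Count how many group IDs carry each prefix.
prefix_counts <- table(group_id_prefix(group_ids))
print(prefix_counts)
```

A count like this shows how many of a researcher's reviews can be resolved through an ISSN lookup at all before any remote call is made.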

sckott · 2018-04-10T17:46:22Z

Thanks @rcpeters - well i guess we can try to detect if it's an ISSN, and if so, we can try to grab the journal name
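
One way to detect whether a group ID is an ISSN is to check the `issn:` prefix, the `NNNN-NNNC` shape, and the standard ISSN check digit. The sketch below is an illustration only: `is_issn_group_id` and `issn_from_group_id` are hypothetical names, not rorcid functions, and the sketch assumes the prefix is always lowercase `issn:` as in the examples in this thread. Both helpers take a single group ID.

```r
# Hypothetical helper: TRUE when a single group ID looks like "issn:NNNN-NNNC"
# and its last character matches the ISSN check digit.
is_issn_group_id <- function(group_id) {
  if (!grepl("^issn:[0-9]{4}-[0-9]{3}[0-9Xx]$", group_id)) return(FALSE)
  issn <- toupper(sub("^issn:", "", group_id))
  chars <- strsplit(gsub("-", "", issn), "")[[1]]
  digits <- as.integer(chars[1:7])
  # Weighted sum of the first seven digits (weights 8 down to 2), modulo 11.
  check <- (11 - sum(digits * 8:2) %% 11) %% 11
  expected <- if (check == 10) "X" else as.character(check)
  identical(chars[8], expected)
}

# Hypothetical helper: the bare ISSN, or NA when the ID is not a valid ISSN.
issn_from_group_id <- function(group_id) {
  if (is_issn_group_id(group_id)) {
    toupper(sub("^issn:", "", group_id))
  } else {
    NA_character_
  }
}

issn_from_group_id("issn:1939-2222")  # ISSN taken from this thread
issn_from_group_id("publons:12345")   # made-up non-ISSN group ID
```

Only IDs that pass this check would then be handed to a title lookup such as rcrossref::cr_journals(). The check-digit test is a design choice: it rejects mistyped ISSNs early, and could be dropped if malformed ISSNs should still be looked up.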

sckott · 2018-05-11T22:36:49Z

@gorkang does this solution #52 (comment) work for you? I don't think we want to integrate rcrossref into this pkg, but we could document how to work with it to get publication title names. Thoughts?

gorkang · 2018-05-12T09:57:23Z

Thanks @sckott for checking back on this.

Yes, looking up the journal name by ISSN works, although it is very slow: it adds ~15s for each researcher I have (see code below).

get_orcid_reviews <- function(id) {

  # id = "0000-0001-7678-8656" #

  library(pacman)
  p_load(dplyr, rorcid)

  tictoc::tic()

  # Get reviews ---------------------------------------------------------------
  temp_reviews = orcid_peer_reviews(id)[[1]]$group$`peer-review-summary` %>%
    bind_rows()

  years_reviews = temp_reviews %>%
    # filter(`completion-date.year.value` >= from_year) %>% # we only ask for the records we need to minimize # of calls.
    pull(`completion-date.year.value`) #`put-code`

  # Get journal titles --------------------------------------------------------

  # Get put-codes
  put_codes = temp_reviews %>% pull(`put-code`)

  # Get details for reviews using put-codes
  list_orcid_reviews <- orcid_peer_reviews(id, put_code = put_codes)

  # Get issn
  issn_reviews = 1:length(list_orcid_reviews) %>% purrr::map(~strsplit(list_orcid_reviews[[.x]]$`review-group-id`, ":")[[1]][[2]]) %>% unlist()

  # Get journal name using issn
  journal_names = rcrossref::cr_journals(issn_reviews)$data$title

  # Tidy data -----------------------------------------------------------------

  df_reviews = years_reviews %>% as_tibble() %>%
    mutate(orcid_id = id) %>%
    left_join(df_orcid_names, by = "orcid_id") %>%
    rename(year = value) %>%
    mutate(journal_name = journal_names) %>%
    select(-other_names)

  tictoc::toc()
  df_reviews

}

get_orcid_reviews(id = "0000-0001-7678-8656")

Taking those extra 15s for each researcher feels particularly wasteful, as the journal name is shown on the ORCID website (but for some reason is not in the ORCID data).

Any idea to make it faster would be greatly appreciated.

Thanks!
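
One generic way to cut lookup time, assuming a researcher reviews for the same journal more than once, is to query each distinct ISSN only once and then expand the results back to the original order. The sketch below is not from rorcid or rcrossref: `lookup_unique` is a hypothetical helper, `fake_lookup` is a stand-in for a slow remote call, and the second ISSN is made up. How much it helps depends on how many ISSNs repeat and on how much of the 15s is spent in the title lookup rather than in the ORCID calls.

```r
# Hypothetical helper: call `lookup_fn` once per distinct ISSN, then expand
# the results back to the original order and length of `issns`.
lookup_unique <- function(issns, lookup_fn) {
  uniq <- unique(issns)
  titles <- vapply(uniq, lookup_fn, character(1), USE.NAMES = FALSE)
  titles[match(issns, uniq)]
}

# Stand-in for a slow remote lookup; it counts how often it is called.
n_calls <- 0
fake_lookup <- function(issn) {
  n_calls <<- n_calls + 1
  paste("Title for", issn)
}

# Four reviews, but only two distinct journals.
issns <- c("1939-2222", "1234-5679", "1939-2222", "1939-2222")
titles <- lookup_unique(issns, fake_lookup)

print(titles)
print(n_calls)  # number of lookups actually performed
```

In real use, `lookup_fn` could wrap the rcrossref::cr_journals() call shown above for a single ISSN.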

sckott · 2018-06-14T03:21:29Z

@gorkang just took another look at this.

i can't replicate your function above because the object df_orcid_names is missing, but I think i have a solution.

I just added a dataset of ISSNs and journal titles gathered from Crossref. I still need to work out a process for updating it, or for letting users do so, but the lookup is much faster, e.g.,

system.time({
  id = "0000-0001-7678-8656"
  x = orcid_peer_reviews(id, put_code = "220419")[[1]]
  issn <- strsplit(x$`review-group-id`, ":")[[1]][[2]]
  rcrossref::cr_journals(issn)$data$title
})

 user  system elapsed
0.071   0.003   0.774

system.time({
  id = "0000-0001-7678-8656"
  x = orcid_peer_reviews(id, put_code = "220419")[[1]]
  issn <- strsplit(x$`review-group-id`, ":")[[1]][[2]]
  issn_title[[issn]]
})
 user  system elapsed
0.010   0.001   0.102


gorkang · 2018-06-23T15:20:50Z

Thanks @sckott for taking another look at this.

The new method does work better, but fails when the issn is not in issn_title.rda (btw, I had to download it manually. Maybe it does not load with the package?)

So, to solve the first point, I created a function to get the title with the best available method:

get_title_from_issn <- function(issn) {
   load("issn_title.rda") # CHANGE PATH AS NEEDED
   tryCatch(issn_title[[issn]], error = function(e) {rcrossref::cr_journals(issn)$data$title})
 }
 journal_names = issn_reviews %>% purrr::map( ~ get_title_from_issn(.x)) %>% unlist()

In the specific case I am trying, there are 6 out of 20 issn not present in issn_title.rda. The time it takes goes down from ~31 to ~17 seconds.

Please, see the full code below. I adapted the get_orcid_reviews() function so you can select the "method" (new or old). Sorry for leaving df_orcid_names in the previous code. Now it should work.

get_orcid_reviews <- function(id, method = "new") {
   
  library(pacman)
  p_load(dplyr, rorcid)
  
  tictoc::tic()
  
  # Get reviews ---------------------------------------------------------------
  temp_reviews = orcid_peer_reviews(id)[[1]]$group$`peer-review-summary` %>%
    bind_rows() 
  
  years_reviews = temp_reviews %>% 
    # filter(`completion-date.year.value` >= from_year) %>% # we only ask for the records we need to minimize # of calls.
    pull(`completion-date.year.value`) #`put-code`
  
  # Get journal titles ------------------------------------------------------
  
  # Get put-codes
  put_codes = temp_reviews %>% pull(`put-code`)
  
  # Get details for reviews using put-codes
  list_orcid_reviews <- orcid_peer_reviews(id, put_code = put_codes)
  
  # Get issn
  issn_reviews = 1:length(list_orcid_reviews) %>% purrr::map(~strsplit(list_orcid_reviews[[.x]]$`review-group-id`, ":")[[1]][[2]]) %>% unlist()
  
 
  # GET JOURNAL NAMES -------------------
  
  # METHOD A (slow) Get journal name using issn
  if (method == "old") {
    journal_names = rcrossref::cr_journals(issn_reviews)$data$title
 
  # METHOD B (new) Get journal name using issn
  } else if (method == "new") {
      get_title_from_issn <- function(issn) {
        load("dev/BUGS/BUG - reviews slow/issn_title.rda")
        tryCatch(issn_title[[issn]], error = function(e) {rcrossref::cr_journals(issn)$data$title})
      }
      journal_names = issn_reviews %>% purrr::map( ~ get_title_from_issn(.x)) %>% unlist()
  }
 
  # Tidy data ---------------------------------------------------------------
  
  df_reviews = years_reviews %>% as_tibble() %>% 
    mutate(orcid_id = id) %>% 
    # left_join(df_orcid_names, by = "orcid_id") %>% 
    rename(year = value) %>% 
    mutate(journal_name = journal_names) #%>% select(-other_names)
  
  tictoc::toc()
  df_reviews
  
}  

get_orcid_reviews(id = "0000-0001-7678-8656", method = "old")

get_orcid_reviews(id = "0000-0001-7678-8656", method = "new")

Thanks!

sckott · 2018-09-27T16:43:52Z

sorry for the long delay in responding @gorkang - it's not clear from your last reply whether you are happy with the changes, or whether there are still some improvements we can make?

gorkang · 2018-09-27T17:08:42Z

No problem @sckott . Last time I checked, there were two problems:

1. The function failed when the issn was not in issn_title.rda
2. I had to download issn_title.rda manually

Cheers.

sckott · 2018-09-27T17:36:44Z

I'm not having that problem. just removed rorcid then reinstalled from github, loaded rorcid and issn_title is there in the session. will keep thinking about what the problem could be

gorkang · 2018-10-03T15:52:10Z

Regarding the first issue:

If an ISSN exists, it works great. If it does not exist, it gives an error:

issn_title[["1939-2222"]]
[1] "Journal of Experimental Psychology General"

issn_title[["0000-2222"]]
Error in issn_title[["0000-2222"]] : subscript out of bounds

With a function such as the following, we can avoid the error:

  get_title_from_issn <- function(issn) {
    tryCatch(issn_title[[issn]], error = function(e) {rcrossref::cr_journals(issn)$data$title})
  }
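
An alternative to catching the error is to test for the ISSN explicitly, so that unrelated errors are not swallowed by tryCatch(). The sketch below is an illustration under stated assumptions: `titles_by_issn` is a one-entry stand-in for the issn_title dataset (assumed here to hold titles named by ISSN), and `get_title_or_fallback` is a hypothetical helper, not part of rorcid.

```r
# Stand-in for the issn_title dataset (assumption: titles named by ISSN).
titles_by_issn <- c("1939-2222" = "Journal of Experimental Psychology General")

# Hypothetical helper: use the local table when the ISSN is present,
# otherwise defer to `fallback` (which returns NA by default).
get_title_or_fallback <- function(issn, titles,
                                  fallback = function(issn) NA_character_) {
  if (issn %in% names(titles)) titles[[issn]] else fallback(issn)
}

get_title_or_fallback("1939-2222", titles_by_issn)  # present in the table
get_title_or_fallback("0000-2222", titles_by_issn)  # absent, so fallback runs
```

In real use, `fallback` could be a function that wraps the rcrossref::cr_journals() call shown above.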

Regarding the second issue: after uninstalling through the GUI it wasn't working, but using the remove.packages() function worked:

remove.packages("rorcid")
devtools::install_github("ropensci/rorcid")
library('rorcid')

Also, a final comment: for a single researcher with 20 review records (6 not in the issn_title file), it takes about 10s to fetch the journal titles. It is much better than the ~30s it used to take, but hopefully, there is still some room for improvement.

Thanks!

sckott · 2018-10-04T18:37:27Z

thanks - i'll take another look at the issn issue.

> hopefully, there is still some room for improvement.

we'll continue to look for performance improvements 👍

sckott · 2018-10-04T21:54:15Z

note: still no ISSNs in the Crossref API /journals route, so can't work on update flow for the issn titles dataset

make script for updating in inst/ignore/issn_title_collect.R

sckott · 2019-06-05T23:22:34Z

closing for now - added the script for updating the issn_title dataset in inst/ignore/issn_title_collect.R

sckott added a commit that referenced this issue · Jun 14, 2018
484fe83: #52 use an internal vector of issns and journal titles b/c orcid doesnt have titles

sckott added this to the v0.5 milestone · Oct 22, 2018

sckott added a commit that referenced this issue · Jun 5, 2019
c8d3453: #52 update issn_title dataset; make script for updating in inst/ignore/issn_title_collect.R

sckott closed this as completed · Jun 5, 2019
