orcid_peer_reviews() not getting journal name? #52

Closed
gorkang opened this issue Apr 8, 2018 · 16 comments

@gorkang

gorkang commented Apr 8, 2018

I am trying to get the peer review activity from ORCID profiles using orcid_peer_reviews(). Everything seems to work fine, but I cannot find the journal names of the reviews.

For example, to get the following review from an ORCID profile...
[Screenshot from 2018-04-08: the review as shown on the ORCID profile]

I use the code below, but the closest I can get to the journal name is through the Publons website URL. I can't see it in the output of either the general orcid_peer_reviews(id) call or the orcid_peer_reviews(id, put_code) call.

id = "0000-0001-7678-8656"

# Get reviews  
temp_reviews = orcid_peer_reviews(id)[[1]]

# Get details of specific review
temp_reviews_2 = orcid_peer_reviews(id, put_code = "220419")[[1]]

# Using the publons website I can get to the journal name, but I'd need to scrape it or similar...
temp_reviews_2$`review-identifiers`$`external-id`$`external-id-url.value`

Below the session details.

Session info -------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, linux-gnu           
 ui       RStudio (1.1.442)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Santiago            
 date     2018-04-08                  

Packages -----------------------------------------------------------------------
 package    * version    date       source                          
 assertthat   0.2.0      2017-04-11 CRAN (R 3.4.1)                  
 backports    1.1.2      2017-12-13 CRAN (R 3.4.3)                  
 base       * 3.4.4      2018-03-16 local                           
 bindr        0.1.1      2018-03-13 CRAN (R 3.4.3)                  
 bindrcpp     0.2        2017-06-17 CRAN (R 3.4.1)                  
 bookdown     0.7        2018-02-18 CRAN (R 3.4.3)                  
 compiler     3.4.4      2018-03-16 local                           
 crul         0.5.2      2018-02-24 CRAN (R 3.4.3)                  
 curl         3.1        2017-12-12 CRAN (R 3.4.3)                  
 datasets   * 3.4.4      2018-03-16 local                           
 devtools     1.13.5     2018-02-18 CRAN (R 3.4.3)                  
 digest       0.6.15     2018-01-28 CRAN (R 3.4.3)                  
 dplyr      * 0.7.4      2017-09-28 CRAN (R 3.4.2)                  
 evaluate     0.10.1     2017-06-24 CRAN (R 3.4.1)                  
 glue         1.2.0      2017-10-29 CRAN (R 3.4.2)                  
 graphics   * 3.4.4      2018-03-16 local                           
 grDevices  * 3.4.4      2018-03-16 local                           
 htmltools    0.3.6      2017-04-28 CRAN (R 3.4.1)                  
 httr         1.3.1      2017-08-20 CRAN (R 3.4.1)                  
 jsonlite     1.5        2017-06-01 CRAN (R 3.4.1)                  
 knitr        1.20       2018-02-20 CRAN (R 3.4.3)                  
 magrittr     1.5        2014-11-22 CRAN (R 3.4.1)                  
 memoise      1.1.0      2017-04-21 CRAN (R 3.4.1)                  
 methods    * 3.4.4      2018-03-16 local                           
 openssl      1.0.1      2018-03-03 CRAN (R 3.4.3)                  
 pacman     * 0.4.6      2017-05-14 CRAN (R 3.4.1)                  
 pillar       1.2.1      2018-02-27 CRAN (R 3.4.3)                  
 pkgconfig    2.0.1      2017-03-21 CRAN (R 3.4.1)                  
 plyr         1.8.4      2016-06-08 CRAN (R 3.4.1)                  
 R6           2.2.2      2017-06-17 CRAN (R 3.4.1)                  
 Rcpp         0.12.16    2018-03-13 CRAN (R 3.4.3)                  
 rlang        0.2.0.9000 2018-03-19 Github (tidyverse/rlang@1b81816)
 rmarkdown    1.9        2018-03-01 CRAN (R 3.4.3)                  
 rorcid     * 0.4.1.9210 2018-04-05 Github (ropensci/rorcid@c393ad0)
 rprojroot    1.3-2      2018-01-03 CRAN (R 3.4.3)                  
 rscopus    * 0.5.3      2017-10-11 CRAN (R 3.4.2)                  
 stats      * 3.4.4      2018-03-16 local                           
 stringi      1.1.7      2018-03-12 CRAN (R 3.4.3)                  
 stringr      1.3.0      2018-02-19 CRAN (R 3.4.3)                  
 tibble       1.4.2      2018-01-22 CRAN (R 3.4.3)                  
 tools        3.4.4      2018-03-16 local                           
 triebeard    0.3.0      2016-08-04 CRAN (R 3.4.1)                  
 urltools     1.7.0      2018-01-20 CRAN (R 3.4.3)                  
 utils      * 3.4.4      2018-03-16 local                           
 withr        2.1.2      2018-03-19 Github (jimhester/withr@79d7b0d)
 xfun         0.1        2018-01-22 CRAN (R 3.4.3)                  
 yaml         2.1.18     2018-03-08 CRAN (R 3.4.3)  
@sckott
Contributor

sckott commented Apr 10, 2018

thanks for the question @gorkang

@rcpeters another question for you. It seems the journal name is shown in the ORCID UI for peer reviews, but I can't find it in the API response either. Any guidance?

@alainna

alainna commented Apr 10, 2018

For peer reviews, the journal name (or publisher name, or organisation, etc) is going to be found in the group data.

https://pub.orcid.org/v2.1/0000-0001-7678-8656/peer-review/220419 ->

<peer-review:review-group-id>issn:1939-2222</peer-review:review-group-id>

The peer review won't necessarily be grouped by the publisher or journal name -- review groups can be as specific or general as the review posting party desires.

Generally the convening organisation will also be the party which has organised the review, which could be e.g. the journal or publisher. However I notice for Publons that they list this as Publons -- we'll be following up with them on that. Another example posted by AGU (GEMS) which has AGU listed as the convening party:

https://pub.orcid.org/v2.1/0000-0002-7363-4552/peer-review/146242

@sckott
Contributor

sckott commented Apr 10, 2018

thanks @alainna for that, review-group-id is what we needed, sorry i missed that

id = "0000-0001-7678-8656"
x = orcid_peer_reviews(id, put_code = "220419")[[1]]
rcrossref::cr_journals(strsplit(x$`review-group-id`, ":")[[1]][[2]])$data$title
#> [1] "Journal of Experimental Psychology General"

@rcpeters

Just a note: not all group IDs are required to be ISSNs.

  select split_part(group_id,':',1) as prefix, count(*) from group_id_record group by prefix;
       prefix      | count 
  -----------------+-------
   publons         |  1297
   orcid-generated |   100
   ringgold        |     1
   issn            | 13181
  (4 rows)

@sckott
Contributor

sckott commented Apr 10, 2018

Thanks @rcpeters - well i guess we can try to detect if it's an ISSN, and if so, we can try to grab the journal name
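
As a rough illustration of the detection step mentioned above, here is a minimal sketch that splits a review-group-id into its prefix and value, and only returns an ISSN when the prefix is `issn` and the value passes the standard ISSN check-digit test. The function names `parse_group_id`, `is_valid_issn` and `issn_from_group_id` are hypothetical (they are not part of rorcid), and the `publons:` value is made up for the example.

```r
# Split a review-group-id such as "issn:1939-2222" into prefix and value.
# (Hypothetical helper, not part of rorcid.)
parse_group_id <- function(group_id) {
  parts <- strsplit(group_id, ":", fixed = TRUE)[[1]]
  # Re-join the remainder in case the value itself contains a colon
  list(prefix = parts[1], value = paste(parts[-1], collapse = ":"))
}

# Check the ISSN pattern (NNNN-NNNC) and its check digit.
is_valid_issn <- function(x) {
  if (!grepl("^[0-9]{4}-[0-9]{3}[0-9Xx]$", x)) return(FALSE)
  chars  <- strsplit(gsub("-", "", toupper(x)), "")[[1]]
  digits <- as.integer(chars[1:7])
  # Weights 8..2 on the first seven digits, modulus 11; 10 is written "X"
  check    <- (11 - sum(digits * 8:2) %% 11) %% 11
  expected <- if (check == 10) "X" else as.character(check)
  identical(chars[8], expected)
}

# Return the ISSN if the group id carries a valid one, otherwise NA.
issn_from_group_id <- function(group_id) {
  p <- parse_group_id(group_id)
  if (identical(p$prefix, "issn") && is_valid_issn(p$value)) {
    p$value
  } else {
    NA_character_
  }
}

issn_from_group_id("issn:1939-2222")  # group id from this thread
issn_from_group_id("publons:12345")   # made-up non-ISSN group id
```

Only group IDs for which this returns a non-NA value would then be passed on to a journal title lookup.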

@sckott
Contributor

sckott commented May 11, 2018

@gorkang does this solution #52 (comment) work for you? I don't think we want to integrate rcrossref into this pkg, but we could document how to work with it to get publication title names. Thoughts?

@gorkang
Author

gorkang commented May 12, 2018

Thanks @sckott for checking back on this.

Yes, looking for the journal name using issn works, although it is very slow, so it adds ~15s for each researcher I have (see code below).

get_orcid_reviews <- function(id) {

 # id = "0000-0001-7678-8656" #
  
  library(pacman)
  p_load(dplyr, rorcid)
  
  tictoc::tic()
  
  # Get reviews ---------------------------------------------------------------
  temp_reviews = orcid_peer_reviews(id)[[1]]$group$`peer-review-summary` %>%
    bind_rows() 
  
    years_reviews = temp_reviews %>% 
      # filter(`completion-date.year.value` >= from_year) %>% # we only ask for the records we need to minimize # of calls.
      pull(`completion-date.year.value`) #`put-code`
    

    # Get journal titles ------------------------------------------------------

      # Get put-codes
      put_codes = temp_reviews %>% pull(`put-code`)
      
        # Get details for reviews using put-codes
        list_orcid_reviews <- orcid_peer_reviews(id, put_code = put_codes)
        
          # Get issn
          issn_reviews = 1:length(list_orcid_reviews) %>% purrr::map(~strsplit(list_orcid_reviews[[.x]]$`review-group-id`, ":")[[1]][[2]]) %>% unlist()
          
            # Get journal name using issn
            journal_names = rcrossref::cr_journals(issn_reviews)$data$title
            
  
    # Tidy data ---------------------------------------------------------------
            
    df_reviews = years_reviews %>% as_tibble() %>% 
      mutate(orcid_id = id) %>% 
      left_join(df_orcid_names, by = "orcid_id") %>% 
      rename(year = value) %>% 
      mutate(journal_name = journal_names) %>% 
      select(-other_names)
 
    tictoc::toc()
    df_reviews
  
}  

get_orcid_reviews( id = "0000-0001-7678-8656")

Taking those extra 15s for each researcher feels particularly wasteful as the journal name is on the ORCID website (but for some reason not in the ORCID data):

[Screenshot from 2018-05-12: the ORCID website entry showing the journal name]

Any idea to make it faster would be greatly appreciated.
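
One possible way to cut the per-researcher cost, sketched under assumptions: look up each distinct ISSN only once and remember the result for later calls. `make_cached_lookup` and `slow_lookup` are hypothetical names. `slow_lookup` is a stand-in for the remote call (in the code above that would be `rcrossref::cr_journals(issn)$data$title`), and the sketch assumes the lookup returns exactly one string per ISSN and that no input is NA.

```r
# Wrap any ISSN -> title function so each distinct ISSN is fetched only once.
# `lookup_fn` is a placeholder for the slow call (e.g. a Crossref request).
make_cached_lookup <- function(lookup_fn) {
  cache <- new.env(parent = emptyenv())
  function(issns) {
    for (issn in unique(issns)) {
      if (!exists(issn, envir = cache, inherits = FALSE)) {
        assign(issn, lookup_fn(issn), envir = cache)
      }
    }
    # Expand back to the original order, duplicates included
    vapply(
      issns,
      function(i) get(i, envir = cache, inherits = FALSE),
      character(1),
      USE.NAMES = FALSE
    )
  }
}

# Stand-in for the slow remote lookup; it counts how often it is called.
n_calls <- 0
slow_lookup <- function(issn) {
  n_calls <<- n_calls + 1
  paste("Title for", issn)
}

cached_titles <- make_cached_lookup(slow_lookup)
titles <- cached_titles(c("1939-2222", "1939-2222", "0096-3445"))
n_calls  # the stand-in ran twice, once per distinct ISSN
```

Because the cache lives in the closure, reusing the same `cached_titles` function across researchers would also skip journals that were already resolved for an earlier one.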

Thanks!

@sckott
Contributor

sckott commented Jun 14, 2018

@gorkang just took another look at this.

i can't replicate your function above because the object df_orcid_names is missing, but I think i have a solution.

I just added a dataset of ISSNs and journal titles gathered from Crossref. I still need to work out a process for updating it, or letting users do so, but it is much faster, e.g.,

system.time({
  id = "0000-0001-7678-8656"
  x = orcid_peer_reviews(id, put_code = "220419")[[1]]
  issn <- strsplit(x$`review-group-id`, ":")[[1]][[2]]
  rcrossref::cr_journals(issn)$data$title
})

 user  system elapsed
0.071   0.003   0.774

system.time({
  id = "0000-0001-7678-8656"
  x = orcid_peer_reviews(id, put_code = "220419")[[1]]
  issn <- strsplit(x$`review-group-id`, ":")[[1]][[2]]
  issn_title[[issn]]
})
 user  system elapsed
0.010   0.001   0.102

sckott added a commit that referenced this issue Jun 14, 2018
@gorkang
Author

gorkang commented Jun 23, 2018

Thanks @sckott for taking another look at this.

The new method does work better, but fails when the issn is not in issn_title.rda (btw, I had to download it manually. Maybe it does not load with the package?)

So, to solve the first point, I created a function to get the title with the best available method:

get_title_from_issn <- function(issn) {
  load("issn_title.rda") # CHANGE PATH AS NEEDED
  tryCatch(issn_title[[issn]], error = function(e) {rcrossref::cr_journals(issn)$data$title})
}
journal_names = issn_reviews %>% purrr::map( ~ get_title_from_issn(.x)) %>% unlist()

In the specific case I am trying, there are 6 out of 20 issn not present in issn_title.rda. The time it takes goes down from ~31 to ~17 seconds.

Please, see the full code below. I adapted the get_orcid_reviews() function so you can select the "method" (new or old). Sorry for leaving df_orcid_names in the previous code. Now it should work.

get_orcid_reviews <- function(id, method = "new") {
   
  library(pacman)
  p_load(dplyr, rorcid)
  
  tictoc::tic()
  
  # Get reviews ---------------------------------------------------------------
  temp_reviews = orcid_peer_reviews(id)[[1]]$group$`peer-review-summary` %>%
    bind_rows() 
  
  years_reviews = temp_reviews %>% 
    # filter(`completion-date.year.value` >= from_year) %>% # we only ask for the records we need to minimize # of calls.
    pull(`completion-date.year.value`) #`put-code`
  
  # Get journal titles ------------------------------------------------------
  
  # Get put-codes
  put_codes = temp_reviews %>% pull(`put-code`)
  
  # Get details for reviews using put-codes
  list_orcid_reviews <- orcid_peer_reviews(id, put_code = put_codes)
  
  # Get issn
  issn_reviews = 1:length(list_orcid_reviews) %>% purrr::map(~strsplit(list_orcid_reviews[[.x]]$`review-group-id`, ":")[[1]][[2]]) %>% unlist()
  
 
  # GET JOURNAL NAMES -------------------
  
  # METHOD A (slow) Get journal name using issn
  if (method == "old") {
    journal_names = rcrossref::cr_journals(issn_reviews)$data$title
 
  # METHOD B (new) Get journal name using issn
  } else if (method == "new") {
      get_title_from_issn <- function(issn) {
        load("dev/BUGS/BUG - reviews slow/issn_title.rda")
        tryCatch(issn_title[[issn]], error = function(e) {rcrossref::cr_journals(issn)$data$title})
      }
      journal_names = issn_reviews %>% purrr::map( ~ get_title_from_issn(.x)) %>% unlist()
  }
 
  # Tidy data ---------------------------------------------------------------
  
  df_reviews = years_reviews %>% as_tibble() %>% 
    mutate(orcid_id = id) %>% 
    # left_join(df_orcid_names, by = "orcid_id") %>% 
    rename(year = value) %>% 
    mutate(journal_name = journal_names) #%>% select(-other_names)
  
  tictoc::toc()
  df_reviews
  
}  

get_orcid_reviews(id = "0000-0001-7678-8656", method = "old")

get_orcid_reviews(id = "0000-0001-7678-8656", method = "new")

Thanks!

@sckott
Contributor

sckott commented Sep 27, 2018

sorry for the long delay in responding @gorkang - it's not clear from your last reply if you are happy with the changes, or if there are still some improvements we can make?

@gorkang
Author

gorkang commented Sep 27, 2018

No problem @sckott . Last time I checked, there were two problems:

  1. The function failed when the issn was not in issn_title.rda
  2. I had to download issn_title.rda manually

Cheers.

@sckott
Contributor

sckott commented Sep 27, 2018

I'm not having that problem. just removed rorcid then reinstalled from github, loaded rorcid and issn_title is there in the session. will keep thinking about what the problem could be

@gorkang
Author

gorkang commented Oct 3, 2018

Regarding the first issue:

If an ISSN exists in issn_title, it works great. If it does not exist, it gives an error:

issn_title[["1939-2222"]]
[1] "Journal of Experimental Psychology General"

issn_title[["0000-2222"]]
Error in issn_title[["0000-2222"]] : subscript out of bounds

With a function such as the following, we can avoid the error:

  get_title_from_issn <- function(issn) {
    tryCatch(issn_title[[issn]], error = function(e) {rcrossref::cr_journals(issn)$data$title})
  }
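
A possible alternative that avoids one `tryCatch` per ISSN, as a sketch only: index the lookup table with the whole ISSN vector at once and call the slower fallback only for the misses. This assumes `issn_title` is a named character vector (the "subscript out of bounds" error above suggests a named atomic vector, but that is an inference). `issn_title_demo` is a one-entry stand-in for the real dataset and `titles_from_issn` is a hypothetical name. In practice the `fallback` argument could be `function(issn) rcrossref::cr_journals(issn)$data$title`, as in the function above.

```r
# Stand-in for the packaged `issn_title` dataset: a named character vector
# mapping ISSN -> title (the single entry is copied from this thread).
issn_title_demo <- c("1939-2222" = "Journal of Experimental Psychology General")

# Look up many ISSNs at once; call `fallback` only for ISSNs not in `table`.
titles_from_issn <- function(issns, table,
                             fallback = function(issn) NA_character_) {
  # Single-bracket indexing by name yields NA for unknown names, no error
  out <- unname(table[issns])
  misses <- which(is.na(out))
  for (i in misses) {
    # A failing fallback yields NA instead of aborting the whole batch
    res <- tryCatch(fallback(issns[i]), error = function(e) NA_character_)
    out[i] <- if (length(res) == 1) as.character(res) else NA_character_
  }
  out
}

titles_from_issn(c("1939-2222", "0000-2222"), issn_title_demo)
```

With the default fallback, ISSNs missing from the table simply come back as NA, so the result always has the same length and order as the input.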

Regarding the second issue: after uninstalling using the GUI it wasn't working, but using the remove.packages() function worked:

remove.packages("rorcid")
devtools::install_github("ropensci/rorcid")
library('rorcid')

Also, a final comment, for a single researcher with 20 review records (6 not in the issn_title file) it takes about 10s to fetch the journal titles. It is much better than the ~30s it used to take, but hopefully, there is still some room for improvement.

Thanks!

@sckott
Contributor

sckott commented Oct 4, 2018

thanks - i'll take another look at the issn issue.

> hopefully, there is still some room for improvement.

we'll continue to look for performance improvements 👍

@sckott
Contributor

sckott commented Oct 4, 2018

note: still no ISSNs in the Crossref API /journals route, so can't work on update flow for the issn titles dataset

@sckott sckott added this to the v0.5 milestone Oct 22, 2018
sckott added a commit that referenced this issue Jun 5, 2019
make script for updating in inst/ignore/issn_title_collect.R
@sckott
Contributor

sckott commented Jun 5, 2019

closing for now - added the script for updating the issn_title dataset in inst/ignore/issn_title_collect.R

@sckott sckott closed this as completed Jun 5, 2019