This repository has been archived by the owner on Mar 27, 2023. It is now read-only.

add plots and analysis from _cr_springer.Rmd again #211

Closed
maxheld83 opened this issue Jun 2, 2020 · 3 comments

Comments

@maxheld83
Contributor

Same as #210: also temporarily removed to debug #82.

Must check whether there's something in here we're missing elsewhere.

@maxheld83 maxheld83 added the plot something colorful label Jun 2, 2020
@maxheld83 maxheld83 self-assigned this Jun 2, 2020
@maxheld83
Contributor Author


---
title: "Comparing the indexing coverage of SpringerLink with that of Crossref"
author: "Najko Jahn"
output:
  html_document:
    keep_md: true
    df_print: paged
---

knitr::opts_chunk$set(echo = TRUE, message=FALSE)

In its blog post, the INTACT project compares the indexing coverage of Crossref, a DOI registration agency for scholarly works, with that of SpringerLink, a digital library dedicated to content published by Springer. Examining five journals including European Radiology, they found that the article coverage differs between these two sources and concluded:

The results are clear: When it comes to journal metrics (both OA and total), Crossref data is too sketchy to rely on.

This is a harsh verdict, given how important Crossref is for studying the prevalence of open access and for open access monitoring. So let's examine whether we reach the same conclusion.

Analyses

To do so, I first downloaded the yearly article listings for the journal European Radiology from SpringerLink, starting in 2015.

Let's load these metadata into R and obtain information about when and in which volumes articles were published:

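library(tidyverse)
# `pattern` is a regular expression, so escape the dot and anchor the extension
my_files <- list.files(pattern = "\\.csv$")
# read every SpringerLink export and row-bind them into one data frame
springer <- purrr::map_df(my_files, readr::read_csv)
# number of records per publication year and journal volume
springer %>%
  count(`Publication Year`, `Journal Volume`)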

Four records seem to represent journal information rather than articles. There are also online-first articles published in 2017 and 2018 that have not yet appeared in a printed volume.

Now, let's obtain metadata via the Crossref API using the rcrossref package, and check whether Crossref's and SpringerLink's indexing coverage of articles published in European Radiology in 2015 and 2016 is identical. To this end, I first used the from-pub-date filter, as the INTACT study did, and then the from-print-pub-date filter, to avoid confusing online-first and print publication dates.

library(rcrossref)
# R call representing from-pub-date query
cr_from_online <- rcrossref::cr_works(filter = c(issn = "0938-7994", 
                                        from_pub_date = "2015-01-01", 
                                        until_pub_date = "2016-12-31",
                                        type = "journal-article"),
                             limit = 1000, cursor = "*", cursor_max = 5)

# R call representing from-print-pub-date query
cr_from_print <- rcrossref::cr_works(filter = c(issn = "0938-7994", 
                                        from_print_pub_date = "2015-01-01", 
                                        until_print_pub_date = "2016-12-31",
                                        type = "journal-article"),
                             limit = 1000, cursor = "*", cursor_max = 5)

Do the two queries return different result sets?

Dataset obtained from querying by first date of publication:

cr_from_online$data %>% 
  count(volume)

Dataset obtained from querying by date of publication in a printed volume:

cr_from_print$data %>% 
  count(volume)

While the articles returned by the from-pub-date query span three different yearly volumes, filtering with from-print-pub-date yields the same number of articles as obtained from SpringerLink.
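
The per-volume counts from the two sources can also be put side by side instead of being compared by eye. A minimal sketch in base R, assuming one volume label per article; the vectors and the helper `count_by_volume()` are made up for illustration and are not results from this analysis:

```r
# Hypothetical volume labels, one entry per article (illustrative data only)
springer_volumes <- c("25", "25", "26", "26", "26")
crossref_volumes <- c("25", "25", "26", "26", "27")

# Count articles per volume and name the count column after the source
count_by_volume <- function(volumes, name) {
  as.data.frame(table(volume = volumes),
                responseName = name,
                stringsAsFactors = FALSE)
}

springer_counts <- count_by_volume(springer_volumes, "springer")
crossref_counts <- count_by_volume(crossref_volumes, "crossref")

# Full outer join keeps volumes that appear in only one source
merged <- merge(springer_counts, crossref_counts, by = "volume", all = TRUE)
merged[is.na(merged)] <- 0  # a missing volume means zero articles there
merged$difference <- merged$crossref - merged$springer

print(merged)
```

Rows with a non-zero `difference` point to the volumes where the two result sets diverge.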

Finally, let's check whether the SpringerLink 2015-2016 and Crossref from_print_pub_date sets are equal using DOIs:

# filter 2015 and 2016 publications
springer_15_16 <- springer %>%
  filter(`Publication Year` %in% c(2015, 2016))
setequal(springer_15_16$`Item DOI`, cr_from_print$data$DOI)
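
If `setequal()` returns `FALSE`, it helps to see which DOIs are missing on which side. A minimal sketch in base R, using made-up DOIs as stand-ins for the `Item DOI` and `DOI` columns above; `normalize_doi()` and `compare_dois()` are hypothetical helpers. Both sides are lower-cased first because DOI names are case-insensitive:

```r
# Hypothetical DOI vectors standing in for the two sources (not real records)
springer_dois <- c("10.1234/abc.001", "10.1234/ABC.002", "10.1234/abc.003")
crossref_dois <- c("10.1234/abc.001", "10.1234/abc.002", "10.1234/abc.004")

# DOIs are case-insensitive, so compare them in lower case without padding
normalize_doi <- function(x) tolower(trimws(x))

# Report whether the sets match and which DOIs are unique to each side
compare_dois <- function(a, b) {
  a <- unique(normalize_doi(a))
  b <- unique(normalize_doi(b))
  list(
    equal     = setequal(a, b),
    only_in_a = setdiff(a, b),
    only_in_b = setdiff(b, a)
  )
}

res <- compare_dois(springer_dois, crossref_dois)
print(res)
```

With the real data, the call would presumably be `compare_dois(springer_15_16$`Item DOI`, cr_from_print$data$DOI)`.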

Conclusion

In conclusion, for articles published in European Radiology in 2015 and 2016, no differences in article coverage could be found between Crossref and SpringerLink. However, when comparing the indexing coverage of the two sources, the query parameters must be harmonized so that both queries select the same set of articles.

Session info

sessionInfo()

@njahn82
Collaborator

njahn82 commented Jun 2, 2020 via email

@maxheld83
Contributor Author

thanks!
