Najko Jahn 10/7/2019
To start with, a sample of grant-supported publications indexed in OpenAIRE with EC-funding acknwoledgement was used. Only H2020 EC-projects with participation from the University of Göttingen were investigated. These data were already obtained for the OpenAIRE Institutional Dashboard Pilot: https://subugoe.shinyapps.io/openaire_ugoe/
Load publications:
library(tidyverse)
library(jsonlite)
ugoe_pubs <- jsonlite::stream_in(file("data/pubs_ugoe.json"), verbose = FALSE) %>%
as_tibble()
In total, the sample consists of 1542 distinct records.
For matching, the DOI article identifier is used:
ugoe_dois_df <- ugoe_pubs %>%
select(pid, access, openaire_project_id, openaire_id) %>%
unnest(cols = c(pid)) %>%
mutate(pid_type = ifelse(grepl("^10.", pid), "doi", NA)) %>%
mutate(pid = tolower(pid)) %>%
mutate(openaire_project_id = as.character(openaire_project_id))
In total, 1313 records have a DOI, representing a share of 85.15 %.
Store records with DOIs in separate data.frame
ugoe_dois <- ugoe_dois_df %>%
filter(pid_type == "doi") %>%
distinct(pid) %>%
mutate(pid = trimws(pid))
Connect to WOS-KB and store data.frame in KB Table Space
require(RJDBC)
require(rJava)
.jinit()
jdbcDriver <-
JDBC(driverClass = "oracle.jdbc.OracleDriver", classPath = "inst/jdbc_driver/ojdbc8.jar")
jdbcConnection <-
dbConnect(
jdbcDriver,
"jdbc:oracle:thin:@//biblio-p-db01:1521/bibliodb01.fiz.karlsruhe",
Sys.getenv("kb_user"),
Sys.getenv("kb_pwd")
)
dbWriteTable(conn = jdbcConnection,
name = "openaire_ugoe_dois",
value = ugoe_dois,
overwrite = TRUE)
#> [1] TRUE
Matching using Oracle SQL statement:
select
wos_b_2019.items.doi,
wos_b_2019.items.doctype
from
wos_b_2019.items
inner join
openaire_ugoe_dois
on lower(wos_b_2019.items.doi) = openaire_ugoe_dois.pid
Summary stats
openaire_wos_doi %>%
group_by(DOCTYPE) %>%
summarise(n= n_distinct(DOI)) %>%
arrange(desc(n))
#> # A tibble: 6 x 2
#> DOCTYPE n
#> <chr> <int>
#> 1 Article 693
#> 2 Proceedings Paper 41
#> 3 Review 20
#> 4 Correction 3
#> 5 Editorial Material 2
#> 6 Letter 1
In total 759 DOIs were both indexed in OpenAIRE and Wos-KB.
Oracle SQL statement to obtain funding information from WoS-KB for the OpenAIRE-DOI-sample
select
wos_b_2019.fundingorganizations.fundingorganization,
wos_b_2019.grantnumbers.grantnumber,
wos_b_2019.items.doi
from
wos_b_2019.items
inner join
wos_b_2019.grantnumbers
on wos_b_2019.grantnumbers.fk_items = wos_b_2019.items.pk_items
inner join
wos_b_2019.fundingorganizations
on wos_b_2019.fundingorganizations.pk_fundingorganizations = wos_b_2019.grantnumbers.fk_fundingorganizations
inner join
openaire_ugoe_dois
on lower(wos_b_2019.items.doi) = openaire_ugoe_dois.pid
Identify the European Commission as funding organization using pattern matching.
openaire_wos %>%
as_tibble() %>%
mutate(eu_funding = ifelse(grepl("^EU|^Eur|^ERC", FUNDINGORGANIZATION), "EU", NA)) %>%
filter(eu_funding == "EU")
#> # A tibble: 1,128 x 4
#> FUNDINGORGANIZATION GRANTNUMBER DOI eu_funding
#> <chr> <chr> <chr> <chr>
#> 1 European ITN project (FP7-PEOPL… PITNGA-2011-289… 10.1007/JH… EU
#> 2 European Union 690575 10.1103/Ph… EU
#> 3 EU PITN-GA-2011-28… 10.1103/Ph… EU
#> 4 European Union 674896 10.1103/Ph… EU
#> 5 EU FP7 ITN INVISIB… 10.1103/Ph… EU
#> 6 European Research Council under… 617143 10.1103/Ph… EU
#> 7 European Union's Horizon resear… 690575 10.1088/14… EU
#> 8 European Union's Horizon resear… 674896 10.1088/14… EU
#> 9 European Union's Horizon resear… 674896 10.1007/JH… EU
#> 10 European Union's Horizon resear… 690575 10.1007/JH… EU
#> # … with 1,118 more rows
In total, 628 Web of Science records acknowledge the European Commission
as funder (wos_b_2019.fundingorganizations.fundingorganization
).
There is a certain variation how EU support is acknowledged in
wos_b_2019.fundingorganizations.fundingorganization
: 313 variants were
found based on our sample.
Example
openaire_wos %>%
as_tibble() %>%
mutate(eu_funding = ifelse(grepl("^EU|^Eur|^ERC", FUNDINGORGANIZATION), "EU", NA)) %>%
filter(eu_funding == "EU") %>%
distinct(FUNDINGORGANIZATION, GRANTNUMBER) %>%
head()
#> # A tibble: 6 x 2
#> FUNDINGORGANIZATION GRANTNUMBER
#> <chr> <chr>
#> 1 European ITN project (FP7-PEOPLE-ITN) PITNGA-2011-289442-I…
#> 2 European Union 690575
#> 3 EU PITN-GA-2011-289442
#> 4 European Union 674896
#> 5 EU FP7 ITN INVISIBLES
#> 6 European Research Council under the European Union… 617143
Extract EU Grant IDs, which end with a six-digit number
openaire_wos_grant <- openaire_wos %>%
mutate(eu_funding = ifelse(grepl("^EU|^Eur|^ERC", FUNDINGORGANIZATION), "EU", NA)) %>%
filter(eu_funding == "EU") %>%
mutate(eu_grant_id = str_extract(GRANTNUMBER, "[0-9]{6,6}")) %>%
mutate(doi = tolower(DOI)) %>%
inner_join(ugoe_dois_df, by = c("doi" = "pid", "eu_grant_id" = "openaire_project_id"))
In total, 538 WoS-KB records have identical Grant IDs.
How many GRANTNUMBER
values consists of a six-digit number:
openaire_wos %>%
mutate(eu_funding = ifelse(grepl("^EU|^Eur|^ERC", FUNDINGORGANIZATION), "EU", NA)) %>%
filter(eu_funding == "EU") %>%
filter(grepl("^[0-9]{6,6}", GRANTNUMBER)) %>%
as_tibble()
#> # A tibble: 858 x 4
#> FUNDINGORGANIZATION GRANTNUMBER DOI eu_funding
#> <chr> <chr> <chr> <chr>
#> 1 European Union 690575 10.1103/Phys… EU
#> 2 European Union 674896 10.1103/Phys… EU
#> 3 European Research Council under th… 617143 10.1103/Phys… EU
#> 4 European Union's Horizon research … 690575 10.1088/1475… EU
#> 5 European Union's Horizon research … 674896 10.1088/1475… EU
#> 6 European Union's Horizon research … 674896 10.1007/JHEP… EU
#> 7 European Union's Horizon research … 690575 10.1007/JHEP… EU
#> 8 European Union's Horizon research … 690575 10.1007/JHEP… EU
#> 9 European Union's Horizon research … 674896 10.1007/JHEP… EU
#> 10 European Union's Horizon research … 674896 10.1007/JHEP… EU
#> # … with 848 more rows