PubMedLagR

The goal of PubMedLagR is to analyse the lag time between the publication of a scientific article and its indexing in PubMed. This package provides functions to retrieve publication data from PubMed, process it, and visualize the lag time trends over the years.

It can also be used to retrieve PubMed data into R for other purposes, such as bibliometric analyses, text mining, or any research that requires access to PubMed records.

Installation

You can install the development version of PubMedLagR from GitHub with:

# install.packages("pak")
pak::pak("quantixed/PubMedLagR")

Example

Once the package is installed, in a new project you can use the following code to retrieve PubMed records for a list of journals and years, and then convert the retrieved XML files into a data frame for analysis:

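library(PubMedLagR)
# journals and publication years to query
jrnl_list <- c("EMBO J", "J Cell Biol", "Nat Cell Biol")
yrs <- 2015:2025
# retrieve the PubMed records for each journal and year as XML files
retrieve_journal_year_records(jrnl_list, yrs, batch_size = 250)
# convert the retrieved XML files into a single data frame
pprs <- pubmed_xmls_to_df()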
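
Once the records are in a data frame, the lag itself is simple date arithmetic. The following is a minimal sketch in base R, not the package's own method: the toy data frame `pprs_demo` and its column names (`pmid`, `year`, `pub_date`, `indexed_date`) are assumptions made for illustration, and the columns actually returned by pubmed_xmls_to_df() may be named differently.

```r
# Toy stand-in for `pprs`; all column names here are hypothetical
pprs_demo <- data.frame(
  pmid = c("1", "2", "3", "4"),
  year = c(2015, 2015, 2016, 2016),
  pub_date = as.Date(c("2015-01-10", "2015-03-01", "2016-02-15", "2016-06-30")),
  indexed_date = as.Date(c("2015-01-20", "2015-03-31", "2016-02-20", "2016-07-30")),
  stringsAsFactors = FALSE
)

# lag in days between publication and indexing
pprs_demo$lag_days <- as.numeric(
  difftime(pprs_demo$indexed_date, pprs_demo$pub_date, units = "days")
)

# median lag for each publication year
lag_by_year <- aggregate(lag_days ~ year, data = pprs_demo, FUN = median)
print(lag_by_year)
```

The median is used here rather than the mean because a few records with very long lags would otherwise dominate the yearly summary.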

If you have many XML files, you may prefer to save the data frames as CSVs rather than combining them into a single data frame in R. You can do this with the pubmed_xmls_to_csvs() function:

pubmed_xmls_to_csvs()
# load all CSVs in Output/Data and combine them into one data frame
# (pattern is a regular expression, not a glob)
csv_files <- list.files("Output/Data", pattern = "\\.csv$", full.names = TRUE)
data_list <- lapply(csv_files, read.csv)
pprs <- do.call(rbind, data_list)

Note

By default, retrieval includes papers and excludes reviews; use papers_only = FALSE to disable this filter.

Similarly, when parsing the XML files to a data frame, there is a clean-up step which removes duplicates, filters out unwanted publication types, and ensures that only journal articles (i.e. papers) are included. You can disable this clean-up step by using clean = FALSE when calling pubmed_xmls_to_df(). When using pubmed_xmls_to_csvs(), the clean-up step is not applied, so all records in the XML files will be included in the resulting CSVs (and must be cleaned manually if desired).
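
If you combine the CSVs yourself, a manual clean-up might look something like the sketch below. It is an illustration in base R and not the package's own clean-up logic: the toy data frame `pprs_raw` and the column names `pmid` and `ptype` are assumptions, and it simplifies by giving each record a single publication type.

```r
# Toy stand-in for the combined CSV data; column names are hypothetical
pprs_raw <- data.frame(
  pmid = c("101", "101", "102", "103"),
  ptype = c("Journal Article", "Journal Article", "Review", "Journal Article"),
  stringsAsFactors = FALSE
)

# drop duplicate records, keeping the first occurrence of each PMID
pprs_clean <- pprs_raw[!duplicated(pprs_raw$pmid), ]

# keep only records labelled as journal articles
pprs_clean <- pprs_clean[pprs_clean$ptype == "Journal Article", ]
print(pprs_clean)
```

Check the column names in your own CSVs before adapting this, and adjust the publication-type filter to match how the types are stored there.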
