R client for the PLoS Journals API

rplos

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Install

This package is available on CRAN. Install it from within R with

install.packages("rplos")

Or install the development version from GitHub

install.packages("devtools")
devtools::install_github("ropensci/rplos")
library("rplos")

What is this?

rplos is a package for accessing full text articles from the Public Library of Science journals using their API.

Information

You used to need a key to use rplos - you no longer do as of 2015-01-13 (or v0.4.5.999).

rplos tutorial: http://ropensci.org/tutorials/rplos_tutorial.html

PLOS API documentation: http://api.plos.org/

PLOS Solr schema is at https://gist.github.com/openAccess/9e76aa7fa6135be419968b1372c86957 but is 1.5 years old so may not be up to date.

See also the Crossref API documentation. Note that we are working on a new package, rcrossref (on CRAN), with a much fuller implementation of R functions for all Crossref endpoints.

Throttling

Beware: PLOS has recently started throttling requests. When you make too many requests in a given time period, the API returns error messages like "(503) Service Unavailable - The server cannot process the request due to a high load". Here's what PLOS says on the matter:

Please limit your API requests to 7200 requests a day, 300 per hour, 10 per minute and allow 5 seconds for your search to return results. If you exceed this threshold, we will lock out your IP address. If you're a high-volume user of the PLOS Search API and need more API requests a day, please contact us at api@plos.org to discuss your options. We currently limit API users to no more than five concurrent connections from a single IP address.
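The limits quoted above translate into a minimum spacing between requests. The sketch below works that spacing out in base R and wraps it in a small helper. `throttled_lapply` is a hypothetical name used for illustration (it is not part of rplos), and the 12-second gap is only what the quoted numbers imply for sustained use.

```r
# Limits quoted above, expressed as a maximum number of requests per window
limits <- data.frame(
  window       = c("minute", "hour", "day"),
  seconds      = c(60, 3600, 86400),
  max_requests = c(10, 300, 7200)
)

# Minimum spacing (in seconds) needed to stay inside each window
limits$min_gap <- limits$seconds / limits$max_requests

# The strictest window decides the spacing for long-running jobs
min_gap <- max(limits$min_gap)

# Hypothetical helper: apply f to each element of xs, pausing between calls
throttled_lapply <- function(xs, f, gap = min_gap, ...) {
  out <- vector("list", length(xs))
  for (i in seq_along(xs)) {
    if (i > 1) Sys.sleep(gap)           # no pause before the first call
    out[i] <- list(f(xs[[i]], ...))     # list() keeps NULL results in place
  }
  out
}
```

A call such as `throttled_lapply(dois, function(d) searchplos(q = d))` would then space requests out, assuming `dois` is a character vector you supply.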

Quick start

Search

Search for the term ecology, and return id (DOI) and publication date, limiting to 5 items

searchplos('ecology', 'id,publication_date', limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1    44198     0
#> 
#> $data
#> # A tibble: 5 x 2
#>   id                           publication_date    
#>   <chr>                        <chr>               
#> 1 10.1371/journal.pone.0001248 2007-11-28T00:00:00Z
#> 2 10.1371/journal.pone.0059813 2013-04-24T00:00:00Z
#> 3 10.1371/journal.pone.0155019 2016-05-11T00:00:00Z
#> 4 10.1371/journal.pone.0080763 2013-12-10T00:00:00Z
#> 5 10.1371/journal.pone.0150648 2016-03-03T00:00:00Z
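The `$meta` element reports how many records matched (`numFound`) and the offset of the first record returned (`start`). A minimal sketch of how paging offsets could be computed from those values follows. `page_starts` is a hypothetical helper, and the commented usage assumes that searchplos() accepts a `start` argument matching the `start` column shown above.

```r
# Hypothetical helper: offsets for paging through num_found records,
# page_size records at a time (offsets are zero-based, like `start` above)
page_starts <- function(num_found, page_size) {
  if (num_found <= 0) return(integer(0))
  seq(0, num_found - 1, by = page_size)
}

page_starts(44198, 10000)
# offsets: 0 10000 20000 30000 40000

# Assumed usage (needs network access and rplos), not run here:
# first  <- searchplos('ecology', 'id,publication_date', limit = 5)
# starts <- page_starts(first$meta$numFound, 5)
# page2  <- searchplos('ecology', 'id,publication_date', start = starts[2], limit = 5)
```

Keep the throttling limits in mind when looping over many pages.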

Get DOIs for full articles in PLOS ONE

searchplos(q="*:*", fl='id', fq=list('journal_key:PLoSONE',
   'doc_type:full'), limit=5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1   204158     0
#> 
#> $data
#> # A tibble: 5 x 1
#>   id                          
#>   <chr>                       
#> 1 10.1371/journal.pone.0044136
#> 2 10.1371/journal.pone.0155491
#> 3 10.1371/journal.pone.0168631
#> 4 10.1371/journal.pone.0058100
#> 5 10.1371/journal.pone.0168627

Query for some PLOS article-level metrics. Notice the difference between the two outputs: the second is sorted by counter_total_all in descending order

out <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'), fq='doc_type:full')
out_sorted <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'),
   fq='doc_type:full', sort='counter_total_all desc')
head(out$data)
#> # A tibble: 6 x 3
#>   id                           alm_twitterCount counter_total_all
#>   <chr>                                   <int>             <int>
#> 1 10.1371/journal.pone.0044136                2             15035
#> 2 10.1371/journal.pone.0155491                0              2025
#> 3 10.1371/journal.pone.0168631                0               703
#> 4 10.1371/journal.pone.0058100                0              5091
#> 5 10.1371/journal.pone.0168627                0              2392
#> 6 10.1371/journal.pone.0184491               10               745
head(out_sorted$data)
#> # A tibble: 6 x 3
#>   id                                     alm_twitterCount counter_total_a…
#>   <chr>                                             <int>            <int>
#> 1 10.1371/journal.pmed.0020124                       3281          2525131
#> 2 10.1371/annotation/80bd7285-9d2d-403a…                0          1235195
#> 3 10.1371/journal.pcbi.1003149                        195          1113107
#> 4 10.1371/journal.pone.0141854                       3437           878333
#> 5 10.1371/journal.pcbi.0030102                         64           752783
#> 6 10.1371/journal.pone.0088278                        964           606480

A list of articles about social networks that are popular on a social network

searchplos(q="*:*",fl=c('id','alm_twitterCount'),
   fq=list('doc_type:full','subject:"Social networks"','alm_twitterCount:[100 TO 10000]'),
   sort='counter_total_month desc')
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1       58     0
#> 
#> $data
#> # A tibble: 10 x 2
#>    id                           alm_twitterCount
#>    <chr>                                   <int>
#>  1 10.1371/journal.pone.0150989              241
#>  2 10.1371/journal.pbio.1002373              402
#>  3 10.1371/journal.pone.0183551              405
#>  4 10.1371/journal.pone.0175368             1114
#>  5 10.1371/journal.pone.0149777              217
#>  6 10.1371/journal.pone.0064841              168
#>  7 10.1371/journal.pone.0143611              104
#>  8 10.1371/journal.pone.0138717              180
#>  9 10.1371/journal.pone.0166570              677
#> 10 10.1371/journal.pone.0061981             2392
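The filter `alm_twitterCount:[100 TO 10000]` in the query above is a range filter. If you build such filters from variables, a small helper keeps the bracket syntax in one place. `solr_range` below is a hypothetical helper written for illustration; it is not part of rplos.

```r
# Hypothetical helper: build a range filter string like 'field:[lower TO upper]'
solr_range <- function(field, lower, upper) {
  sprintf(
    "%s:[%s TO %s]",
    field,
    format(lower, scientific = FALSE),  # avoid 1e+05-style output
    format(upper, scientific = FALSE)
  )
}

solr_range("alm_twitterCount", 100, 10000)
# returns "alm_twitterCount:[100 TO 10000]"
```

The result can be passed as one element of the `fq` list, as in the example above.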

Show articles in which these two words appear within about 15 words of each other

searchplos(q='everything:"sports alcohol"~15', fl='title', fq='doc_type:full', limit=3)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1      127     0
#> 
#> $data
#> # A tibble: 3 x 1
#>   title                                                                    
#>   <chr>                                                                    
#> 1 Alcohol Advertising in Sport and Non-Sport TV in Australia, during Child…
#> 2 Alcohol intoxication at Swedish football matches: A study using biologic…
#> 3 Symptoms of Insomnia and Sleep Duration and Their Association with Incid…

Narrow results to 7 words apart, changing the ~15 to ~7

searchplos(q='everything:"sports alcohol"~7', fl='title', fq='doc_type:full', limit=3)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1       71     0
#> 
#> $data
#> # A tibble: 3 x 1
#>   title                                                                    
#>   <chr>                                                                    
#> 1 Alcohol Advertising in Sport and Non-Sport TV in Australia, during Child…
#> 2 Alcohol intoxication at Swedish football matches: A study using biologic…
#> 3 Symptoms of Insomnia and Sleep Duration and Their Association with Incid…
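The two proximity queries above differ only in the number after the `~`. A sketch of a helper that assembles such a query string is shown below. `proximity_query` is a hypothetical name, and the default field `everything` is taken from the examples above.

```r
# Hypothetical helper: build a proximity query such as
# 'everything:"sports alcohol"~7'
proximity_query <- function(words, distance, field = "everything") {
  sprintf('%s:"%s"~%d', field, paste(words, collapse = " "), as.integer(distance))
}

proximity_query(c("sports", "alcohol"), 7)
# returns the string everything:"sports alcohol"~7
```

The string can be passed as `q`, for example `searchplos(q = proximity_query(c("sports", "alcohol"), 7), fl = 'title', fq = 'doc_type:full', limit = 3)`.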

Remove DOIs for annotations (i.e., corrections) and Viewpoints articles

searchplos(q='*:*', fl=c('id','article_type'),
   fq=list('-article_type:correction','-article_type:viewpoints'), limit=5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1  1994390     0
#> 
#> $data
#> # A tibble: 5 x 2
#>   id                                                  article_type    
#>   <chr>                                               <chr>           
#> 1 10.1371/journal.pone.0058099/materials_and_methods  Research Article
#> 2 10.1371/journal.pone.0030394/introduction           Research Article
#> 3 10.1371/journal.pone.0030394/results_and_discussion Research Article
#> 4 10.1371/journal.pone.0002157/materials_and_methods  Research Article
#> 5 10.1371/journal.pone.0030394/supporting_information Research Article
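Without the `doc_type:full` filter used in earlier examples, the returned ids refer to article sections (for example `/introduction`) rather than whole articles. If you only need the article DOIs, one option is to strip the section suffix. The sketch below uses the ids from the output above and assumes section suffixes consist only of lowercase letters and underscores; `article_doi` is a hypothetical helper.

```r
# ids copied from the output above
ids <- c(
  "10.1371/journal.pone.0058099/materials_and_methods",
  "10.1371/journal.pone.0030394/introduction",
  "10.1371/journal.pone.0030394/results_and_discussion",
  "10.1371/journal.pone.0002157/materials_and_methods",
  "10.1371/journal.pone.0030394/supporting_information"
)

# Hypothetical helper: drop a trailing "/section_name" if there is one.
# A bare DOI ends in a segment with dots and digits, so it is left unchanged.
article_doi <- function(id) sub("/[a-z_]+$", "", id)

unique(article_doi(ids))
# "10.1371/journal.pone.0058099" "10.1371/journal.pone.0030394" "10.1371/journal.pone.0002157"
```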

Faceted search

Facet on multiple fields

facetplos(q='alcohol', facet.field=c('journal','subject'), facet.limit=5)
#> $facet_queries
#> NULL
#> 
#> $facet_fields
#> $facet_fields$journal
#> # A tibble: 5 x 2
#>   term                             value
#>   <chr>                            <chr>
#> 1 plos one                         25190
#> 2 plos genetics                    572  
#> 3 plos medicine                    498  
#> 4 plos neglected tropical diseases 453  
#> 5 plos pathogens                   341  
#> 
#> $facet_fields$subject
#> # A tibble: 5 x 2
#>   term                          value
#>   <chr>                         <chr>
#> 1 biology and life sciences     26987
#> 2 medicine and health sciences  24098
#> 3 research and analysis methods 15433
#> 4 biochemistry                  13129
#> 5 physical sciences             10141
#> 
#> 
#> $facet_pivot
#> NULL
#> 
#> $facet_dates
#> NULL
#> 
#> $facet_ranges
#> NULL

Range faceting

facetplos(q='*:*', facet.range='counter_total_all',
 facet.range.start=5, facet.range.end=100, facet.range.gap=10)
#> $facet_queries
#> NULL
#> 
#> $facet_fields
#> NULL
#> 
#> $facet_pivot
#> NULL
#> 
#> $facet_dates
#> NULL
#> 
#> $facet_ranges
#> $facet_ranges$counter_total_all
#> # A tibble: 10 x 2
#>    term  value
#>    <chr> <chr>
#>  1 5     342  
#>  2 15    289  
#>  3 25    521  
#>  4 35    979  
#>  5 45    1475 
#>  6 55    1783 
#>  7 65    1904 
#>  8 75    1812 
#>  9 85    1686 
#> 10 95    1565
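Note that both `term` and `value` come back as character columns, so they need converting before any arithmetic or plotting. The sketch below rebuilds the range table shown above as a plain data frame (a stand-in for the tibble that facetplos() returns) and converts both columns.

```r
# Values copied from the output above; a data.frame stands in for the tibble
ranges <- data.frame(
  term  = c("5", "15", "25", "35", "45", "55", "65", "75", "85", "95"),
  value = c("342", "289", "521", "979", "1475", "1783", "1904", "1812", "1686", "1565"),
  stringsAsFactors = FALSE
)

# Convert the character columns to integers
ranges$term  <- as.integer(ranges$term)
ranges$value <- as.integer(ranges$value)

# Bucket with the most articles (the bucket starting at 65 views)
ranges[which.max(ranges$value), ]

# Total number of articles across all buckets
sum(ranges$value)
```

With a live result, the same conversion would apply to `out$facet_ranges$counter_total_all`, assuming `out` holds the return value of the call above.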

Highlight searches

Search for and highlight the term alcohol in the abstract field only

(out <- highplos(q='alcohol', hl.fl = 'abstract', rows=3))
#> $`10.1371/journal.pone.0201042`
#> $`10.1371/journal.pone.0201042`$abstract
#> [1] "\nAcute <em>alcohol</em> administration can lead to a loss of control over drinking. Several models argue"
#> 
#> 
#> $`10.1371/journal.pone.0185457`
#> $`10.1371/journal.pone.0185457`$abstract
#> [1] "Objectives: <em>Alcohol</em>-related morbidity and mortality are significant public health issues"
#> 
#> 
#> $`10.1371/journal.pone.0071284`
#> $`10.1371/journal.pone.0071284`$abstract
#> [1] "\n<em>Alcohol</em> dependence is a heterogeneous disorder where several signalling systems play important"

And you can browse the results in your default browser

highbrow(out)
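The snippets returned by highplos() mark matches with `<em>` tags. If you want plain text, or only the matched terms, base R regular expressions are enough. The sketch below works on one snippet copied from the output above; it assumes the tags are always literal `<em>` and `</em>`.

```r
# One snippet copied from the highplos() output above
snippet <- "\nAcute <em>alcohol</em> administration can lead to a loss of control over drinking. Several models argue"

# Remove the highlighting tags and surrounding whitespace
plain <- trimws(gsub("</?em>", "", snippet))

# Extract only the highlighted terms (text between <em> and </em>)
hits <- regmatches(snippet, gregexpr("(?<=<em>).*?(?=</em>)", snippet, perl = TRUE))[[1]]

plain
hits
```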


Full text URLs

Simple function to get full text URLs for a DOI

full_text_urls(doi='10.1371/journal.pone.0086169')
#> [1] "http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0086169&type=manuscript"

Full text XML given a DOI

(out <- plos_fulltext(doi='10.1371/journal.pone.0086169'))
#> 1 full-text articles retrieved 
#> Min. Length: 110717 - Max. Length: 110717 
#> DOIs: 10.1371/journal.pone.0086169 ... 
#> 
#> NOTE: extract xml strings like output['<doi>']

Then parse the XML any way you like, here getting the abstract

library("XML")
xpathSApply(xmlParse(out$`10.1371/journal.pone.0086169`), "//abstract", xmlValue)
#> [1] "Mammalian females pay high energetic costs for reproduction, the greatest of which is imposed by lactation. The synthesis of milk requires, in part, the mobilization of bodily reserves to nourish developing young. Numerous hypotheses have been advanced to predict how mothers will differentially invest in sons and daughters, however few studies have addressed sex-biased milk synthesis. Here we leverage the dairy cow model to investigate such phenomena. Using 2.39 million lactation records from 1.49 million dairy cows, we demonstrate that the sex of the fetus influences the capacity of the mammary gland to synthesize milk during lactation. Cows favor daughters, producing significantly more milk for daughters than for sons across lactation. Using a sub-sample of this dataset (N = 113,750 subjects) we further demonstrate that the effects of fetal sex interact dynamically across parities, whereby the sex of the fetus being gestated can enhance or diminish the production of milk during an established lactation. Moreover the sex of the fetus gestated on the first parity has persistent consequences for milk synthesis on the subsequent parity. Specifically, gestation of a daughter on the first parity increases milk production by ∼445 kg over the first two lactations. Our results identify a dramatic and sustained programming of mammary function by offspring in utero. Nutritional and endocrine conditions in utero are known to have pronounced and long-term effects on progeny, but the ways in which the progeny has sustained physiological effects on the dam have received little attention to date."

Search within a field

There is a series of convenience functions for searching within sections of articles.

  • plosauthor()
  • plosabstract()
  • plosfigtabcaps()
  • plostitle()
  • plossubject()

For example:

plossubject(q='marine ecology',  fl = c('id','journal'), limit = 10)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1     3892     0
#> 
#> $data
#> # A tibble: 10 x 2
#>    id                                        journal 
#>    <chr>                                     <chr>   
#>  1 10.1371/journal.pone.0167252              PLOS ONE
#>  2 10.1371/journal.pone.0167252/title        PLOS ONE
#>  3 10.1371/journal.pone.0167252/abstract     PLOS ONE
#>  4 10.1371/journal.pone.0167252/references   PLOS ONE
#>  5 10.1371/journal.pone.0167252/body         PLOS ONE
#>  6 10.1371/journal.pone.0149852/title        PLOS ONE
#>  7 10.1371/journal.pone.0149852/abstract     PLOS ONE
#>  8 10.1371/journal.pone.0149852/references   PLOS ONE
#>  9 10.1371/journal.pone.0149852/body         PLOS ONE
#> 10 10.1371/journal.pone.0149852/introduction PLOS ONE

However, you can always do this directly in searchplos(), for example searchplos(q = "subject:science"). See also the fq parameter. The convenience functions above are simply wrappers around searchplos(), so they take all the same parameters.

Search by article views

Search with term marine ecology, by field subject, and limit to 5 results

plosviews(search='marine ecology', byfield='subject', limit=5)
#>                             id counter_total_all
#> 5 10.1371/journal.pone.0201675                 0
#> 1 10.1371/journal.pone.0167252              1379
#> 2 10.1371/journal.pone.0021810              2883
#> 3 10.1371/journal.pone.0092590              9580
#> 4 10.1371/journal.pone.0149852             10494

Visualize

Visualize word use across articles

plosword(list('monkey','Helianthus','sunflower','protein','whale'), vis = TRUE)
#> $table
#>   No_Articles       Term
#> 1       12570     monkey
#> 2         548 Helianthus
#> 3        1529  sunflower
#> 4      142842    protein
#> 5        1758      whale
#> 
#> $plot


Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for rplos in R by running citation(package = 'rplos')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

This package is part of a richer suite called fulltext, along with several other packages, that provides the ability to search for and retrieve full text of open access scholarly articles. We recommend using fulltext as the primary R interface to rplos unless your needs are limited to this single source.

