fulltext

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Get full text research articles

Check out the package docs and the fulltext manual to get started.


rOpenSci has a number of R packages to get either full text, metadata, or both from various publishers. The goal of fulltext is to integrate these packages to create a single interface to many data sources.

fulltext makes it easy to do text-mining by supporting the following steps:

  • Search for articles - ft_search
  • Fetch articles - ft_get
  • Get links for full text articles (xml, pdf) - ft_links
  • Extract text from articles / convert formats - ft_extract
  • Collect all texts into a data.frame - ft_table
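The steps above can be chained into a single pipeline. This is a minimal sketch, not an official recipe. It assumes network access and uses PLOS as the source because its search API does not need a key. The query string and `limit` value are arbitrary illustrations, and the internal structure of the returned objects may vary between fulltext versions.

```r
library(fulltext)

# 1. Search PLOS for a few articles (query and limit are illustrative only)
res_search <- ft_search(query = "ecology", from = "plos", limit = 3)

# 2. Fetch full text (XML by default) for the search results into the local cache
res_get <- ft_get(res_search)

# 3. Read the cached files back into the ft_data object
res_text <- ft_collect(res_get)

# 4. Gather all texts currently in the cache into a single data.frame
tbl <- ft_table()
```

Because `ft_table()` reads from the cache rather than from `res_get`, it may also pick up articles fetched in earlier sessions that used the same cache path.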

Previously supported use cases that have since moved to other packages:

  • Collecting only the parts of articles you actually need has moved to the pubchunks package
  • Supplementary data from papers has moved to the suppdata package

It's easy to go from the outputs of ft_get to text-mining packages such as tm and quanteda.

Data sources in fulltext include:

Authentication: A number of publishers require authentication via an API key, and some use even more draconian processes, such as checking IP addresses. We are working on supporting the authentication schemes of the various publishers, but all open access (OA) content is already easily available. See the Authentication section in ?fulltext-package after loading the package.

We'd love your feedback. Let us know what you think in the issue tracker (https://github.com/ropensci/fulltext/issues).

Article full text formats by publisher: https://docs.ropensci.org/fulltext/articles/formats

Installation

Stable version from CRAN

install.packages("fulltext")

Development version from GitHub

remotes::install_github("ropensci/fulltext")

Load library

library('fulltext')

Interoperability with other packages downstream

Note: this example is not included in the vignettes because it would require adding the two packages below (readtext and quanteda) to Suggests. For many more examples and documentation, see the package docs and the fulltext manual.

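# store downloaded files in a cache folder named "foobar"
cache_options_set(path = "foobar")
# download the PDFs of two eLife articles into the cache
res <- ft_get(c('10.7554/eLife.03032', '10.7554/eLife.32763'), type = "pdf")
# read every cached PDF into a data.frame of texts
library(readtext)
x <- readtext::readtext(file.path(cache_options_get()$path, "*.pdf"))
# build a quanteda corpus from those texts
library(quanteda)
quanteda::corpus(x)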

Contributors

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for fulltext: citation(package = 'fulltext')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
