CRAN OpenData Task View

CRAN Task View: Open Data

Do not edit this README by hand. See CONTRIBUTING.md.

Maintainer: Jaime Ashander, Scott Chamberlain, Thomas Leeper
Contact: jashander at ucdavis.edu
Version: 2017-02-22
URL: https://CRAN.R-project.org/view=OpenData

This Task View contains information about using R to obtain, parse, manipulate, create, and share open data. Much open data is available on the web, and the WebTechnologies TaskView addresses how to obtain and parse web-based data. There is obvious overlap between the two TaskViews, so some packages are described on both. There is also a considerable amount of open data available as R packages on CRAN. We point readers to the crandatapkgs package to obtain information about currently available open data in R packages.

Another key issue in a data-focused TaskView is the meaning of "open" data. This TaskView covers many types of data that come with varying degrees of usage restrictions, from public domain (or CC-0) data that is usable for any purpose to "freely available" data that is available at no cost but may carry licenses that are not, strictly speaking, "open". Users should investigate the terms of use and licensing of any data referenced here before using it for any particular application. Additionally, the view lists wrappers for paid APIs, as well as those that require an account but are not necessarily subscription-only. These are marked ($) and (K) respectively.
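As a small illustration of the ($) and (K) convention, the sketch below classifies entry text by its marker using base R only. The function name access_marker() and the labels it returns are hypothetical and chosen for this example; the sample entries are copied from this view.

```r
# Classify Task View entries by their access marker.
# "($)" marks wrappers for paid APIs; "(K)" marks APIs that require an account.
# access_marker() is a hypothetical helper, not part of any package listed here.
access_marker <- function(entry) {
  paid  <- grepl("($)", entry, fixed = TRUE)  # fixed = TRUE: match literally, not as a regex
  keyed <- grepl("(K)", entry, fixed = TRUE)
  ifelse(paid, "paid", ifelse(keyed, "account", "unmarked"))
}

entries <- c(
  "IBrokers: Provides native R access to Interactive Brokers Trader Workstation API. ($)",
  "boxr is a lightweight, high-level R interface for the box.com API (K).",
  "marmap: Import, plot and analyze bathymetric and topographic data from NOAA."
)

access_marker(entries)
# [1] "paid"     "account"  "unmarked"
```

An "unmarked" result only means no marker is present in the text; the terms of use of the underlying data still need to be checked.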

If you have any comments or suggestions for additions, revisions, or improvements for this taskview, go to GitHub and submit an issue, or make some changes and submit a pull request. If you can't contribute on GitHub, send Jaime an email. If you have an issue with one of the packages discussed below, please contact the maintainer of that package.

If you know of a web service, API, data source, or other online resource that is not yet supported by an R package, consider adding it to the package development to do list on GitHub.

Data Sharing and Archiving

Data sharing involves the dissemination of data in draft form or for a temporary period of time. rdrop2 (GitHub) is a Dropbox.com interface from R, providing access to a full suite of file operations, including dir/copy/move/delete operations, account information (including quotas) and the ability to upload and download files from any Dropbox account (K). boxr is a lightweight, high-level R interface for the box.com API (K). RAmazonS3 provides an interface to the Amazon Simple Storage Service (S3) (K).

Data archiving involves the production and dissemination of open data that is persistently accessible, typically in public repositories. The tools below may be useful for both archiving data and retrieving extant data from public archives.

  • ckanr: A generic R client to interact with the CKAN data portal software API (http://ckan.org/). Allows users to swap out the base URL to use any CKAN instance. Source on GitHub.
  • dataone: Read/write access to data and metadata from the DataONE network of Member Node data repositories.
  • dvn (GitHub) provides access to Dataverse Network repositories. UNF implements the Universal Numeric Fingerprint, a format-independent data hashing algorithm used by Dataverse, to verify and cite a dataset.
  • factualR: Thin wrapper for the Factual.com server API.
  • The Rflickr package (not on CRAN) provides an R interface to the Flickr photo management and sharing application Web service. (K)
  • googlesheets (not on CRAN): Access private or public Google Sheets by title, key, or URL. Extract data or edit data. Create, delete, rename, copy, upload, or download spreadsheets and worksheets. Source on GitHub
  • gsheet: Download Google Sheets using just the sharing link. Spreadsheets can be downloaded as a data frame, or as plain text to parse manually. Source on GitHub
  • imguR (GitHub): A package to share plots using the image hosting service Imgur.com. knitr also has a function imgur_upload() to upload images from literate programming documents.
  • infochimps (archived on CRAN; GitHub) is an R wrapper for the infochimps.com API services, from Drew Conway.
  • internetarchive (not on CRAN): API client for internet archive metadata. Source on GitHub.
  • jSonarR: Enables users to access MongoDB by running queries and returning their results in R data frames. jSonarR uses data processing and conversion capabilities in the jSonar Analytics Platform and the JSON Studio Gateway, to convert JSON to a tabular format.
  • OAIHarvester: Harvest metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). oai is a more recent package for OAI.
  • Quandl: A package that interacts directly with the Quandl API to offer data in a number of formats usable in R, as well as the ability to upload and search.
  • rdatamarket: Fetches data from DataMarket.com, either as timeseries in zoo form (dmseries) or as long-form data frames (dmlist).
  • rerddap (not on CRAN): A generic R client to interact with any ERDDAP instance, which is a special case of OPeNDAP (https://en.wikipedia.org/wiki/OPeNDAP), or Open-source Project for a Network Data Access Protocol. Allows users to swap out the base URL to use any ERDDAP instance. Source on GitHub.
  • rfigshare: Programmatic interface for Figshare.com. Source on GitHub.
  • rscribd (not on CRAN): API client for publishing documents to Scribd.
  • RSocrata: (temporarily archived on CRAN for email bounce) Provided with a Socrata dataset resource URL, or a Socrata SoDA web API query, returns an R data frame. Converts dates to POSIX format. Supports CSV and JSON. Manages throttling by Socrata.
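
The ckanr entry above notes that the base URL can be swapped to target any CKAN instance. A rough sketch of why that works: CKAN instances expose their Action API under a common path (/api/3/action/), so only the host changes. The helper ckan_action_url() below is hypothetical (it is not part of ckanr), and demo.ckan.org is used only as a placeholder host; the sketch builds a request URL and does not send it.

```r
# Build a CKAN Action API URL for an arbitrary CKAN instance.
# ckan_action_url() is a hypothetical helper written for illustration only.
ckan_action_url <- function(base_url, action, ...) {
  base_url <- sub("/+$", "", base_url)  # drop any trailing slashes
  params <- list(...)
  query <- ""
  if (length(params) > 0) {
    # Percent-encode each value and join as key=value pairs
    values <- vapply(
      params,
      function(p) utils::URLencode(as.character(p), reserved = TRUE),
      character(1)
    )
    query <- paste0("?", paste(paste0(names(params), "=", values), collapse = "&"))
  }
  paste0(base_url, "/api/3/action/", action, query)
}

# Same action, different instance: only the base URL changes.
ckan_action_url("https://demo.ckan.org/", "package_search", q = "climate", rows = 5)
# [1] "https://demo.ckan.org/api/3/action/package_search?q=climate&rows=5"
```

In practice a client package such as ckanr handles the request and parsing; this only shows the URL structure that makes the base URL interchangeable.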

Web-based Open Data

Agriculture | Astronomy | Business | Chemistry | Climate | Earth Science | Ecology/Evolution | Finance | Genes/Genomes | Geocoding | Google Analytics | Google Web Services | Government and Economics | Literature/Text-mining | Maps | Marketing | NCBI | News | Other | Public Health | Social Media | Social Science | Sports | Web Analytics | Wikipedia |

Agriculture

  • cdlTools: download USDA National Agricultural Statistics Service (NASS) cropscape data for a specified state. Utilities for fips, abbreviation, and name conversion are also provided.
  • cimis: R package for retrieving data from CIMIS, the California Irrigation Management Information System. Available in CRAN archives only.
  • FAOSTAT: The package hosts a list of functions to download, manipulate, construct and aggregate agricultural statistics provided by the FAOSTAT (Food and Agricultural Organization of the United Nations) database.

Astronomy

  • RStars: Star-API provides API access to the American Museum of Natural History's Digital Universe Data, including positions, luminosity, color, and other data on over 100,000 stars as well as constellations, exo-planets, clusters and others. Source on GitHub.

Business

  • forbesListR (not on CRAN) offers access to a number of business-related datasets provided by Forbes.

Chemistry

  • rpubchem: Interface to the PubChem Collection.
  • webchem: Retrieve chemical information from a suite of web APIs for chemical information. Source on GitHub

Earth Science

  • dataRetrieval: Collection of functions to help retrieve USGS data from either web services or user-provided data files. Source on GitHub.
  • getlandsat: Obtain Landsat 8 data from Amazon Web Services public data sets (https://aws.amazon.com/public-data-sets/landsat/): list images and fetch them, with caching to prevent unnecessary additional requests.
  • hddtools: Hydrological data discovery tools - accesses data from NASA, Global Runoff Data Centre, Top-Down modelling Working Group. Source on GitHub
  • marmap: Import, plot and analyze bathymetric and topographic data from NOAA.
  • mregions: data from Marine Regions, including region metadata, GeoJSON data, as well as Shape files. Use cases include using data downstream to visualize geospatial data by marine region, mapping variation among different regions, and more.
  • raincpc: The Climate Prediction Center's (CPC) daily rainfall data for the entire world, from 1979 to the present, at a resolution of 50 km (0.5 degrees lat-lon). This package provides functionality to download and process the raw data from CPC.
  • rainfreq: Estimates of rainfall at desired frequency and desired duration are often required in the design of dams and other hydraulic structures, catastrophe risk modeling, environmental planning and management. One major source of such estimates for the USA is the NOAA National Weather Service's (NWS) division of Hydrometeorological Design Studies Center (HDSC). Raw data from NWS-HDSC is available at 1-km resolution and comes as a huge number of GIS files.
  • rFDSN: Search for and download seismic time series in miniSEED format (a minimalist version of the Standard for the Exchange of Earthquake Data) from International Federation of Digital Seismograph Networks repositories. This package can also be used to gather information about seismic networks (stations, channels, locations, etc) and find historical earthquake data (origins, magnitudes, etc).
  • rnrfa: Utility functions to retrieve data from the UK National River Flow Archive via an API (http://nrfa.ceh.ac.uk). There are functions to retrieve stations falling in a bounding box, generate a map, and extract time series and general information.
  • soilDB: A collection of functions for reading data from USDA-NCSS soil databases.
  • sos4R: A client for Sensor Observation Services (SOS) as specified by the Open Geospatial Consortium (OGC). It allows users to retrieve metadata from SOS web services and to interactively create requests for near real-time observation data based on the available sensors, phenomena, observations, etc. using thematic, temporal and spatial filtering.
  • waterData: An R Package for retrieval, analysis, and anomaly calculation of daily hydrologic time series data.
  • WaterML can retrieve WaterOneFlow Hydroserver data.

Climate

  • BerkeleyEarth: Data input for Berkeley Earth Surface Temperature. Archived on CRAN.
  • CHCN: A compilation of historical through contemporary climate measurements scraped from the Environment Canada website, including tools for scraping data, creating metadata and formatting temperature files.
  • clifro: Designed to minimise the hassle in downloading data from New Zealand's National Climate Database via CliFlo. Source on GitHub
  • crn: Provides the core functions required to download and format data from the Climate Reference Network. Both daily and hourly data are downloaded from the FTP site, a consolidated file of all stations is created, and station metadata is extracted. In addition, functions for selecting individual variables and creating R-friendly datasets for them are provided.
  • darksky: the Dark Sky API, which provides current or historical global weather conditions.
  • decctools (archived on CRAN) provides functions for retrieving energy statistics from the United Kingdom Department of Energy and Climate Change and related data sources. The current version focuses on total final energy consumption statistics at the local authority, MSOA, and LSOA geographies. Methods for calculating the generation mix of grid electricity and its associated carbon intensity are also provided.
  • getCRUCLdata: download and import climatology data from University of East Anglia Climate Research Unit (CRU) CL2.0 data into R; calculate minimum temperature and maximum temperature; formats the data into a tidy data frame or a list of raster stack objects for use in an R session. CRU CL2.0 data are a gridded climatology of 1961-1990 monthly means released in 2002 and cover all land areas (excluding Antarctica) at 10-minute resolution.
  • GhcnDaily (archived on CRAN) downloads and processes Global Historical Climatology Network (GHCN) daily data from the National Climatic Data Center (NCDC).
  • GSODR provides access to data from USA National Climatic Data Center (NCDC) Global Surface Summary of the Day (GSOD) weather stations, as well as functions for working with these data.
  • Metadata: Collates metadata for climate surface stations. Archived on CRAN.
  • meteoForecast: A package to access several Numerical Weather Prediction services, both in raster format and as a time series for a location. Currently it works with GFS, Meteogalicia, OpenMeteo, NAM, and RAP. Source on GitHub (https://github.com/oscarperpinan/meteoForecast/).
  • okmesonet: Retrieves Oklahoma (USA) Mesonet climatological data provided by the Oklahoma Climatological Survey.
  • prism (GitHub) provides access to Oregon State Prism climate data.
  • rclimateca: access Environment Canada data on temperature, precipitation, and wind for more than 8,000 locations.
  • rdefra (GitHub) retrieves UK air pollution data from DEFRA's UK-AIR website.
  • rdwd: Handle climate data from the 'DWD' ('Deutscher Wetterdienst')
  • RFc (GitHub) can retrieve weather data from the FetchClimate Web Service.
  • riem (GitHub) offers access to Automated Surface Observing System (ASOS) stations (airports) around the world via the Iowa Environmental Mesonet website.
  • RNCEP: Obtain, organize, and visualize NCEP weather data.
  • rnoaa: R interface to NOAA Climate data API. See also countyweather which uses this interface to aggregate data at the county level.
  • rNOMADS: An interface to the NOAA Operational Model Archive and Distribution System (NOMADS) that allows download of global and regional weather model data, and supports a variety of models ranging from global weather data to an altitude of 40 km, to high resolution regional weather models, to wave and sea ice models. It can also retrieve archived NOMADS models. Source: rnomads.
  • ropenaq (GitHub) provides air quality data from the OpenAQ platform.
  • rWBclimate: R interface for the World Bank climate data. Source on GitHub
  • rWind: Tools for downloading, editing and transforming wind data from Global Forecast System (GFS) of the USA's National Weather Service (NWS).
  • rwunderground: access historical weather information and forecasts from wunderground.com. Historical weather and forecast data include, but are not limited to, temperature, humidity, windchill, wind speed, dew point, and heat index. Additionally, the Weather Underground API includes information on sunrise/sunset, tidal conditions, satellite/webcam imagery, weather alerts, hurricane alerts and historical record high/low temperatures.
  • SkyWatchr: satellite imagery and climate/atmospheric datasets from the SkyWatch API. Search by wavelength (band), cloud cover, resolution, location, date, etc.
  • stationaRy can retrieve hourly weather data from various global weather stations.
  • weatherData: Functions that help in fetching weather data from websites. Given a location and a date range, these functions help fetch weather data (temperature, pressure etc.) for any weather related analysis.
  • weatherr combines data from multiple APIs to obtain instant weather forecasts.
  • worldmet: import data from more than 30,000 surface meteorological sites around the world managed by the National Oceanic and Atmospheric Administration (NOAA) Integrated Surface Database

Ecology and Evolutionary Biology

  • BAAD: a Biomass And Allometry Database for woody plants (not on CRAN): an interface to access data from a data paper published in Ecology. Full source for the database is also on GitHub.
  • biomart retrieves data from a number of public biological data repositories including http://www.biomart.org, NCBI refseq, Gene Ontology.
  • dismo: Species distribution modeling, with wrappers to Google APIs for maps and geocoding.
  • ecoengine (GitHub) provides access to more than 2 million georeferenced specimen records from ecoengine from the Berkeley Natural History Museums.
  • ecoretriever (GitHub) provides an R interface to the EcoData Retriever via the EcoData Retriever's command line interface. The EcoData Retriever automates the tasks of finding, downloading, and cleaning ecological datasets, and then stores them in a local database (including SQLite, MySQL, etc.).
  • icesDatras: the DATRAS trawl survey database from ICES (International Council for the Exploration of the Sea).
  • natserv: access NatureServe data, image metadata, search taxonomic names, and make maps.
  • neotoma (GitHub) offers programmatic R interface to the Neotoma Paleoecological Database.
  • paleobioDB: Functions to wrap each endpoint of the PaleobioDB API, plus functions to visualize and process the fossil data. The API documentation for the Paleobiology Database can be found at http://paleobiodb.org/data1.1/.
  • pleiades: interact with the Pleiades API for archeological data: get status data, places data, and make GeoJSON maps.
  • rbison (GitHub) is a wrapper to the USGS Bison API.
  • Rcolombos: This package provides programmatic access to Colombos, a web based interface for exploring and analyzing comprehensive organism-specific cross-platform expression compendia of bacterial organisms.
  • ridigbio (not on CRAN) is an interface for https://www.idigbio.org/.
  • rebird (GitHub) is a programmatic interface to the eBird database.
  • rdopa (not on CRAN): Access data from the Digital Observatory for Protected Areas (DOPA) REST API.
  • Reol (GitHub) is an R interface to the Encyclopedia of Life (EOL) API. Includes functions for downloading and extracting information off the EOL pages.
  • rfishbase (GitHub) is a programmatic interface to fishbase.org.
  • rfisheries (GitHub) interacts with fisheries databases at openfisheries.org.
  • rnpn (GitHub): Wrapper to the National Phenology Network database API.
  • rredlist (GitHub) is an API client for the IUCN red list of threatened and endangered species.
  • rvertnet (GitHub) is a wrapper to the VertNet collections database API.
  • rYoutheria: A programmatic interface to web-services of Youtheria, an online database of mammalian trait data. Development version on GitHub.
  • spocc (GitHub) offers a programmatic interface to many species occurrence data sources, including GBIF, USGS's BISON, iNaturalist, Berkeley Ecoinformatics Engine, eBird, AntWeb, and more as sources become easily available. rinat provides another interface to iNaturalist. spoccutils (GitHub) provides various utilities for working with data retrieved using spocc.
  • TR8 (GitHub) contains a set of tools which take care of retrieving trait data for plant species from publicly available databases via web services (including: Biolflor, The Ecological Flora of the British Isles, LEDA traitbase, Ellenberg values for Italian Flora, Mycorrhizal intensity database).
  • traits (Github) can retrieve species trait data from many online sources.
  • rusda connects to a large number of USDA databases, especially for fungal-host combinations.

Biodiversity and Taxonomy

  • ALA4R: Atlas of Living Australia (ALA) provides tools to enable users of biodiversity information to find, access, combine and visualise data on Australian plants and animals.
  • flora: Retrieve taxonomical information of botanical names from the Flora do Brasil website.
  • icesVocab: the ICES (International Council for the Exploration of the Sea) Vocabularies Database (RECO POX).
  • rbhl: R interface to the Biodiversity Heritage Library (BHL) API.
  • rgbif: Interface to the Global Biodiversity Information Facility API methods. Source on GitHub
  • rnbn (GitHub) is an R interface to the UK National Biodiversity Network.
  • rPlant: An R interface to the many computational resources iPlant offers through their RESTful application programming interface. Currently, rPlant functions interact with the iPlant foundational API, the Taxonomic Name Resolution Service API, and the Phylotastic Taxosaurus API. Before using rPlant, users will have to register with the iPlant Collaborative.
  • taxize: Taxonomic information from around the web. A single unified interface to many web APIs for taxonomic data, including NCBI, ITIS, Tropicos and more. Source on GitHub
  • The tpl package doesn't interact with the web directly, but queries locally stored data from theplantlist.org, and data will be updated when theplantlist updates, which is not very often. There is another package for interacting with this same data, called Taxonstand.
  • worrms: World Register of Marine Species including searching by name, date and common names, searching using external identifiers, fetching synonyms, as well as fetching taxonomic children and taxonomic classification.

Phylogenetics

Finance

  • BatchGetSymbols downloads and organizes financial data (from Yahoo or Google Finance) for multiple ticker symbols.
  • belex: historical financial data from the Belgrade Stock Exchange (Serbia).
  • dataonderivatives: Post-GFC derivatives reforms have lifted the veil off over-the-counter (OTC) derivative markets. Swap Execution Facilities (SEFs) and Swap Data Repositories (SDRs) now publish data on swaps that are traded on or reported to those facilities (respectively). This package provides the ability to get this data from supported sources.
  • Datastream2R (not on CRAN): Another package for accessing the Datastream service. This package downloads data from the Thomson Reuters DataStream DWE server, which provides XML access to the Datastream database of economic and financial information.
  • epidata: data from Economic Policy Institute on wages, inequality, and other economic indicators over time and among demographic groups. Data is usually updated monthly.
  • fImport: Environment for teaching "Financial Engineering and Computational Finance"
  • GetTDData downloads and aggregates data for Brazilian government issued bonds directly from the website of Tesouro Direto.
  • IBrokers: Provides native R access to Interactive Brokers Trader Workstation API. ($)
  • pdfetch: A package for downloading economic and financial time series from public sources.
  • quantmod: Functions for financial quantitative modelling as well as data acquisition, plotting and other utilities.
  • Rbitcoin: Interact with Bitcoin. Supports both public and private API calls and HTTP over SSL. Provides debug messages for Rbitcoin and RCurl, as well as error handling.
  • rbitcoinchartsapi: An R package for the BitCoinCharts.com API. From their website: "Bitcoincharts provides financial and technical data related to the Bitcoin network and this data can be accessed via a JSON application programming interface (API)."
  • Rblpapi: R client for Bloomberg Finance L.P. Source on GitHub ($)
  • RCryptsy wraps the API for the Cryptsy crypto-currency trading platform. Source on GitHub. ($)
  • RDatastream (not on CRAN): An R interface to the Thomson Dataworks Enterprise SOAP API, with some convenience functions for retrieving Datastream data specifically. ($)
  • RJSDMX and rsdmx both retrieve data and metadata from SDMX compliant data providers.
  • TFX: Connects to TrueFX(tm) for free streaming real-time and historical tick-by-tick market data for dealable interbank foreign exchange rates with millisecond detail.
  • Thinknum: Interacts with the Thinknum API. ($)
  • tseries: Includes the get.hist.quote() function for historical financial data.
  • ustyc: US Treasury yield curve data retrieval. Development version on GitHub here.

Genes and Genomes

  • aggRmesh: R client for the National Center for Integrative Biomedical Informatics (NCIBI) data.
  • cgdsr: R-Based API for accessing the MSKCC Cancer Genomics Data Server (CGDS).
  • chromer: A programmatic interface to the Chromosome Counts Database. Source on GitHub
  • The mygene.r package is an R client for accessing Mygene.info annotation and query services.
  • GoogleGenomics reads data from the Google Genomics API and returns BioConductor-compatible S4 classes.
  • primerTree: Visually Assessing the Specificity and Informativeness of Primer Pairs.
  • rsnps: This package is a programmatic interface to various SNP datasets on the web: openSNP, NCBI's dbSNP database, and Broad Institute SNP Annotation and Proxy Search. This package started as a library to interact with openSNP alone, so most functions deal with openSNP.
  • seq2R: Detect compositional changes in genomic sequences - with some interaction with GenBank. Archived on CRAN.
  • seqinr: Exploratory data analysis and data visualization for biological sequence (DNA and protein) data.
  • SoyNAM: Genomic and multi-environmental soybean data. Soybean Nested Association Mapping (SoyNAM) project dataset funded by the United Soybean Board (USB), pre-formatted for general analysis and genome-wide association analysis using the NAM package.
  • ungeneanno: collate annotation and summary information about a set of genes from the publicly available resources at Uniprot and NCBI; including publication information from a search of the NCBI Pubmed database.
  • NCBI EUtils web services: See the NCBI section

Geocoding

Google Analytics

Google Web Services

  • bigrquery: An interface to Google's bigquery from R. Source on GitHub (K)
  • ganalytics (not on CRAN): Interface to Google Analytics APIs. Source on GitHub (K)
  • GAR: Interface to Google Analytics APIs. Source on GitHub (K)
  • GFusionTables (not on CRAN): An R interface to Google Fusion Tables. Google Fusion Tables is a data management system in the cloud. This package provides R functions to browse Fusion Tables catalog, retrieve data from Fusion Tables storage to R and to upload data from R to Fusion Tables (K)
  • googlePublicData: An R library to build Google's public data explorer DSPL metadata files.
  • googleVis: Interface between R and the Google chart tools.
  • gooJSON (Archived on CRAN) is a Google JSON data interpreter for R which contains a suite of helper functions for obtaining data from the Google Maps API JSON objects.
  • plotGoogleMaps: Plot SP or SPT(STDIF,STFDF) data as HTML map mashup over Google Maps.
  • plotKML: Visualization of spatial and spatio-temporal objects in Google Earth.
  • RAdwords: A package for loading Google Adwords data. Source on GitHub
  • RGA: Provides functions for accessing and retrieving data from the Google Analytics APIs. Also, the RGA package provides a shiny app to explore data. There is another R package for the same service (RGoogleAnalytics); see next entry.
  • RGoogleAnalytics: Provides functions for accessing and retrieving data from the Google Analytics API. Source on GitHub. There is another R package for the same service (RGA); see previous entry.
  • The RGoogleDocs (not on CRAN) package is an example of using the RCurl and XML packages to quickly develop an interface to the Google Documents API.
  • RGoogleStorage (not on CRAN) provides programmatic access to the Google Storage API. This allows R users to access and store data on Google's storage. We can upload and download content, create, list and delete folders/buckets, and set access control permissions on objects and buckets.
  • RGoogleTrends (not on CRAN) provides programmatic access to Google Trends data. This is information about the popularity of a particular query.
  • translate: Bindings for the Google Translate API v2
  • translateR provides bindings for both Google and Microsoft translation APIs.

Government

There are a very large number of packages providing access to government data. Here is a list of these packages, arranged by country and/or other jurisdiction.

  • Australia: eechidna provides data from the 2013 Australian Federal Election (House of Representatives) and the 2011 Australian Census.
  • Brazil: BETS: Brazilian Economic Time Series from the Central Bank of Brazil, Getulio Vargas Foundation, and the Brazilian Institute of Geography. The package also provides tools for automated reporting (dynamic documents). ecoseries: interface to Bacen and Sidra APIs and data from IPEA in Brazil.
  • Denmark: dkstat (not on CRAN): A package to access the StatBank API from Statistics Denmark. taxdk (not on CRAN) provides tax information for Danish companies.
  • Europe :
  • Finland :
    • pxweb (GitHub) is a generic interface for the PX-Web/PC-Axis API. The PX-Web/PC-Axis API is used by organizations such as Statistics Sweden and Statistics Finland to disseminate data. The R package can interact with all PX-Web/PC-Axis APIs to fetch information about the data hierarchy, extract metadata and extract and parse statistics to R data.frame format.
    • sorvi (GitHub): Various tools for retrieving and working with Finnish open government data.
  • Germany: BerlinData (archived on CRAN): Easy access to http://daten.berlin.de. It allows you to search through the data catalogue and to download the data directly from within R. rdnb connects to resources of the German National Library.
  • India: usaqmindia provides data from the US air quality monitoring program in India for Delhi, Mumbai, Chennai, Hyderabad and Kolkata. Data source is the US government via this website.
  • Japan: govStatJPN offers functions to get public survey data in Japan. estatapi links to the Japanese government's e-Stat official statistics API. kokudosuuchi: interface with the Kokudo Suuchi API, the GIS data service of the Japanese government.
  • Mexico: inegiR (GitHub) can download official statistics for Mexico. Note: package functions and documentation are in Spanish. banxicoR scrapes IQY calls to the Bank of Mexico.
  • Netherlands: cbsodataR connects with the Statistics Netherlands datasets. Source on GitHub.
  • Poland :
    • saos (not on CRAN) is an interface to the API for SAOS, a repository of judgments from Polish common courts (district, regional and appellate) and the Supreme Court of Poland.
    • sejmRP (GitHub) provides data on deputies and voting in the Polish Diet.
  • Russia: sophisthse provides economic indicators from the Archive of Economic and Social Data
  • United States of America :
    • U.S. Census Bureau: acs can download, manipulate, and present data from the US Census American Community Survey. censusr connects to both ACS and SF1 datasets. idbr (GitHub) provides an interface to the U.S. Census Bureau international data base API. blsAPI (GitHub) can get data from the U.S. Bureau of Labor Statistics API. Users provide parameters as specified in http://www.bls.gov/developers/api_signature.htm and the function returns a JSON string. See also blscrapeR which also provides functions to analyze and visualize BLS data.
    • Education: LearnDC provides access to LearnDC's data on Washington DC charter schools.
    • Energy Department: EIAdata: U.S. Energy Information Administration (EIA) API client. See also eia (not on CRAN). energyr: Federal Energy Regulatory Commission data including electric company financials, natural gas company financials, hydropower plant data, liquefied natural gas plant data, oil company financials, and natural gas storage field data.
    • Elections: elexr is an R interface to the Python elex library, which provides access to Associated Press election results. openelections (not on CRAN) connects to the openelections API. pollstR (GitHub): An R client for the Huffpost Pollster API. pvsR: An R package to interact with the Project Vote Smart API for scientific research. ropensecretsapi: An R package for the OpenSecrets.org web services API.
    • Federal Reserve: FredR: R Interface to the Federal Reserve Economic Data API. Source on GitHub
    • Justice Department: bjs2r: Get Bureau of Justice Statistics (BJS) data in R.
    • csp (GitHub) provides the complete Correlates of State Policy data set.
    • federalregister: Client package for the U.S. Federal Register API. Development version on GitHub here.
    • polidata (GitHub): Access to various political data APIs, including e.g. Google Civic Information API or Sunlight Congress API for US Congress data, and POPONG API for South Korea National Assembly data.
    • rodham retrieves text of Hillary Rodham Clinton's emails from her time as U.S. Secretary of State.
    • RPublica (GitHub) is a ProPublica API Client.
    • rsunlight (GitHub): R client for the Sunlight Labs APIs. There are functions for Sunlight Labs Congress, Transparency, Open States, Real Time Congress, Capitol Words, and Influence Explorer APIs. Data outputs are R lists. There are also a few convenience functions for visualizing data and writing data to .csv.
    • rtimes (GitHub) links to the New York Times APIs, including the Congress, Article Search, Campaign Finance, and Geographic APIs. The focus is on the APIs that deal with political data, with Article Search and Geographic included for good measure.
    • seeclickfixr (GitHub) is a client for retrieving citizens' service requests made to local governments through SeeClickFix.
    • wethepeople: An R client for interacting with the White House's "We The People" petition API.
  • United Kingdom: ukgasapi contains one function which allows users to access UK gas market information via National Grid's API. mnis: An API package for the Members' Name Information Service operated by the UK Parliament. hansard downloads data from the UK Parliament API. ukpolice provides data from the UK police database.
  • Other or international:
    • enigma (GitHub): Enigma holds many public datasets from governments, companies, universities, and organizations. Enigma provides an API for data, metadata, and statistics on each of the datasets. enigma is an R client to interact with the Enigma API, including getting the data and metadata for datasets in Enigma, as well as collecting statistics on datasets. In addition, you can download a gzipped csv file of a dataset if you want the whole dataset. An API key from Enigma is required to use enigma.
    • hdr (GitHub) is an interface to United Nations Development Programme Human Development Report API.
    • IMF: both IMFData (on GitHub) and imfr (on GitHub) use the International Monetary Fund's API.
    • manifestoR is an R client to access data and documents of the Manifesto Project.
    • muckrock (GitHub) offers data from MuckRock about public domain information on FOIA requests in the U.S.
    • oec: Use the Observatory of Economic Complexity's API in R to download international trade data in csv and create D3Plus visualizations.
    • OECD: Search and extract data from the OECD (possibly via an old version of the API, which was in beta when the package was written). See OECD data.
    • PolitwoopsR (not on CRAN): Extract deleted tweet and politician data from the Politwoops project (tracks politicians on Twitter and records their deleted tweets).
    • psData (GitHub) provides access to various commonly used political science datasets, especially those providing country-level, comparative data.
    • World Bank: wbstats can extract data from the World Bank Data API and the World Bank Data Catalog API. WDI can search, extract and format data from the World Bank's World Development Indicators.

Literature, Metadata, Text, and Altmetrics

  • alm: R wrapper to the altmetrics API platform developed by PLoS.
  • aRxiv (GitHub): An R client for the arXiv API, a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics.
  • bibliometrix can import bibliographic data from SCOPUS and ISI Web of Science.
  • boilerpipeR: Generic Extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe Java library.
  • gutenbergr (GitHub) searches and downloads text from Project Gutenberg.
  • JSTORr (Not on CRAN): Simple text mining of journal articles from JSTOR's Data for Research service
  • lumendb can retrieve copyright takedown notices from Lumen Database (formerly, Chilling Effects).
  • ngramr (Archived on CRAN) retrieves and plots word frequencies through time from the Google Ngram Viewer.
  • pubmed.mineR: An R package for text mining of PubMed Abstracts. Supports fetching text and XML from PubMed. easyPubMed and rpubmed (not on CRAN) provide other tools.
  • rAltmetric: Query and visualize metrics from Altmetric.com.
  • rbhl: R interface to the Biodiversity Heritage Library (BHL) API.
  • RefManageR: Import and manage BibTeX and BibLaTeX references with RefManageR.
  • rentrez: Talk with NCBI entrez using R.
  • RMendeley: Implementation of the Mendeley API in R. Archived on CRAN temporarily until the package is updated for the new Mendeley API.
  • rmetadata (not on CRAN): Get scholarly metadata from around the web.
  • rorcid (not on CRAN): A programmatic interface to the Orcid.org API.
  • rplos: A programmatic interface to the Web Service methods provided by the Public Library of Science journals for search.
  • rscopus (GitHub) is an interface to the Elsevier Scopus API.
  • scholar provides functions to extract citation data from Google Scholar. Convenience functions are also provided for comparing multiple scholars and predicting future h-index values.
  • The Sxslt (not on CRAN) package is an R interface to Dan Veillard's libxslt translator. It allows R programmers to use XSLT directly from within R, and also allows XSL code to make use of R functions.
  • tm.plugin.webmining: Extensible text retrieval framework for news feeds in XML (RSS, ATOM) and JSON formats. Currently, the following feeds are implemented: Google Blog Search, Google Finance, Google News, NYTimes Article Search, Reuters News Feed, Yahoo Finance and Yahoo Inplay.
  • biorxivr: Interface with the bioRxiv preprint server.

Maps

  • FedData can download geospatial data from a number of U.S. and international data sources.
  • ggmap: Allows for the easy visualization of spatial data and models on top of Google Maps, OpenStreetMap, Stamen Maps, or CloudMade Maps using ggplot2.
  • leafletR: Allows you to display your spatial data on interactive web-maps using the open-source JavaScript library Leaflet.
  • osmar: This package provides infrastructure to access OpenStreetMap data from different sources, to work with the data in a common R manner, and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects).
  • osrm: Interface to OSRM, the OpenStreetMap-based routing service.
  • The R2GoogleMaps (not on CRAN) package - which is different from RgoogleMaps - provides a mechanism to generate JavaScript code from R that displays data using Google Maps.
  • rcanvec: Provides an interface to the National Topographic System (NTS), which is the way in which a number of freely available Canadian datasets are organized. CanVec and CanVec+ datasets, which include all data used to create Canadian topographic maps, are two such datasets that are useful in creating vector-based maps for locations across Canada.
  • RgoogleMaps: This package serves two purposes: It provides a comfortable R interface to query the Google server for static maps, and use the map as a background image to overlay plots within R.
  • RKML (not on CRAN) is an implementation that provides users with high-level facilities to generate KML, the Keyhole Markup Language, for display in, e.g., Google Earth.
  • RKMLDevice (not on CRAN) allows users to create R graphics in KML format in a manner that allows them to be displayed on Google Earth (or Google Maps).
  • olctools: Handles Google Open Location Codes.
  • rydn (not on CRAN): R package to interface with the Yahoo Developers network geolocation APIs.
  • tigris (not on CRAN) can read US Census Bureau TIGER/Line shapefiles.
  • USAboundaries provides spatial objects with the boundaries of states or counties in the United States of America from 1629 to 2000 (from the Atlas of Historical County Boundaries).

NCBI

  • hoardeR: Information retrieval from NCBI databases, with main focus on Blast.
  • NCBI2R: Annotates lists of SNPs and/or genes, with current information from NCBI. The CRAN version is archived.
  • rentrez (GitHub): Talk with NCBI Eutils API using R. This is probably the best package to interact with NCBI EUtils. You can get data across all the databases in NCBI EUtils.
  • reutils (GitHub): Interface with NCBI databases such as PubMed, Genbank, or GEO via the Entrez Programming Utilities (EUtils).
  • RISmed: Download content from NCBI databases. Intended for analyses of NCBI database content, not reference management. See rpubmed for more literature oriented stuff from NCBI.

News

  • GuardianR: Provides an interface to the Open Platform's Content API of the Guardian Media Group. It retrieves content from news outlets The Observer, The Guardian, and guardian.co.uk from 1999 to current day. rdian (GitHub) is another Guardian API client.
  • rtimes (not on CRAN): R client for the New York Times APIs, including the Congress, Article Search, Campaign Finance, and Geographic APIs.
  • ZEIT: diezeit wraps the ZEIT online content API (K).

Other

  • boxoffice: daily box office information (how much each movie earned in theaters) using data from either Box Office Mojo or The Numbers.
  • datamart: Provides an S4 infrastructure for unified handling of internal datasets and web based data sources. Examples include dbpedia, eurostat and sourceforge.
  • genderizeR: Uses the genderize.io API to predict gender from first names extracted from a text vector. Source on GitHub
  • mstranslator: An R wrapper for the Microsoft Translator API. Source on GitHub
  • MBTAr: Access Data from the Massachusetts Bay Transit Authority (MBTA) Web API
  • rechonest (GitHub) is an interface to the Echo Nest API. This package can be used to access data on artists, songs, and music genres. (K)
  • redcapAPI: Access data stored in REDCap databases using an API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. Source on GitHub.
  • RForcecom: RForcecom provides a connection to Force.com and Salesforce.com from R.
  • rwars (not on CRAN): A connector to the SWAPI service, a database of Star Wars metadata.
  • slackr: R client for Slack.com messaging platform. Source on GitHub
  • sos4R: R client for the OGC Sensor Observation Service.
  • stackr (not on CRAN): An unofficial wrapper for the read-only features of the Stack Exchange API.
  • TMDb can retrieve data from The Movie Database.
  • zendeskR: This package provides an R wrapper for the Zendesk API. ($)

Public Health

  • cdcfluview (not on CRAN): R client for CDC FluView data (WHO and ILINet).
  • nhanesA: Utility to retrieve data from the National Health and Nutrition Examination Survey (NHANES).
  • openfda (not on CRAN) is an R client for openFDA.
  • rClinicalCodes: R tools for integrating with the http://www.clinicalcodes.org web repository
  • rclinicaltrials (GitHub): ClinicalTrials.gov is a registry and results database of publicly and privately supported clinical studies of human participants conducted around the world. This is an R client for that data.
  • UScancer constructs U.S. cancer data at the county level from SEER, IARC, and the U.S. Census Bureau.
  • vaers (not on CRAN) provides U.S. vaccine adverse event data from the VAERS vaccine surveillance program. vaersvax provides a subset of these data for three months of 2016. vaersNDvax provides non-domestic data for the same period.
  • WHO: WHO (GitHub) provides an interface to the World Health Organization API. rgho (GitHub) connects to the WHO Global Health Observatory data.

Social media

  • Facebook: Rfacebook provides an interface to the Facebook API. (K)
  • Google+: plusser has been designed to facilitate the retrieval of Google+ profiles, pages and posts. It also provides search facilities. Currently a Google+ API key is required for accessing Google+ data. tuber provides bindings for the YouTube API. Only on GitHub for now. (K)
  • RedditExtractoR can retrieve data from the Reddit API.
  • Rlinkedin is an R client for the LinkedIn API.
  • tumblr: tumblR (GitHub): R client for the Tumblr API (https://www.tumblr.com/docs/en/api/v2). Tumblr is a microblogging platform and social networking website (https://www.tumblr.com). (K)
  • Twitter: RTwitterAPI (not on CRAN) and twitteR provide an interface to the Twitter web API. streamR: This package provides a series of functions that allow R users to access Twitter's filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported. (K) Additionally, RKlout is an interface to Klout API v2. It fetches Klout Score for a Twitter Username/handle in real time. Klout is a silly ranking of Twitter influence.
  • SocialMediaLab provides a convenient wrapper around many other social media clients and enables the construction of network structures from those data.
  • SocialMediaMineR is an analytic tool that returns information about the popularity of a URL on social media sites.

Social science

  • asdfree: analyze survey data for free (not a package) provides lots of code examples for analyzing survey data in R. Also on GitHub.
  • brewdata retrieves and parses graduate admissions survey data from the Grad Cafe website.
  • gdeltr2 (not on CRAN) connects to the Global Database of Events, Language, and Tone.
  • gesis provides access to the Leibniz-Institute for the Social Sciences Data Catalogue/Datenbestandkatalog (DBK).
  • icpsrdata offers programmatic retrieval of datasets from the Inter-university Consortium for Political and Social Research archive.
  • maddison (GitHub) provides GDP per capita data for all years AD 1 to 2010 from the Maddison Project.
  • ONETr searches and retrieves occupational data from O*NET Online. Development version on GitHub here.
  • pewdata uses RSelenium to retrieve datasets from the webpages of the Pew Research Center.
  • psidR contains functions to download and format longitudinal datasets from the Panel Study of Income Dynamics (PSID).
  • wordbankr (GitHub) connects to Wordbank, a database of children's developmental vocabulary.
  • The Zillow (not on CRAN) package provides an R interface to the Zillow Web Service API. It allows one to get the Zillow estimate for the price of a particular property specified by street address and ZIP code (or city and state), to find information (e.g. size of property and lot, number of bedrooms and bathrooms, year built) about a given property, and to get comparable properties.

Sports

  • abettor (not on CRAN): Online betting exchange, Betfair, API wrapper in R. (K)
  • ballr (not on CRAN) is a client for Basketball-Reference.com.
  • bbscrapeR (not on CRAN): Tools for Collecting Data from nba.com and wnba.com.
  • cricketr provides tools for working with the ESPN Cricinfo Statsguru. Source on GitHub.
  • fbRanks: Association Football (Soccer) Ranking via Poisson Regression - uses time dependent Poisson regression and a record of goals scored in matches to rank teams via estimated attack and defense strengths.
  • nflscrapR (not on CRAN) scrapes NFL data since 2009.
  • nhlscrapr: Compiling the NHL Real Time Scoring System Database for easy use in R.
  • pitchRx: Tools for Collecting and Visualizing Major League Baseball PITCHfx Data
  • fitbitScraper (GitHub) can retrieve Fitbit data, based on email/password authentication.
  • fantasysocceR (not on CRAN) connects to fantasy soccer data.
  • pinnacle.API: A wrapper for the Pinnacle Sports API.
  • retrosheet (GitHub) retrieves single-season baseball statistics from http://www.retrosheet.org.
  • yorkr provides access to cricket data from Cricsheet.

Web Analytics

  • GTrendsR (not on CRAN): R functions to perform and display Google Trends queries. Another GitHub package (rGtrends) is now deprecated, but supported a previous version of Google Trends and may still be useful for developers.
  • rgauges (Archived on CRAN): Provides functions to interact with the Gaug.es API. Gaug.es is a web analytics service, like Google Analytics. You have to have a Gaug.es account to use this package. ($) (K)
  • RGA: Provides functions for accessing and retrieving data from the Google Analytics APIs. Supports OAuth 2.0 authorization. Also, the RGA package provides a shiny app to explore data. There is another R package for the same service (RGoogleAnalytics); see next entry. (K)
  • RGoogleAnalytics (GitHub) provides functions for accessing and retrieving data from the Google Analytics API. There is another R package for the same service (RGA); see previous entry. (K)
  • RGoogleTrends (not on CRAN) provides programmatic access to Google Trends data. This is information about the popularity of a particular query.
  • RSiteCatalyst: Functions for accessing the Adobe Analytics (Omniture SiteCatalyst) Reporting API.

Wikipedia/Wikimedia

  • wikipediatrend (removed from CRAN): Provides access to Wikipedia page access statistics.
  • WikipediR: WikipediR is a wrapper for the MediaWiki API, aimed particularly at the Wikimedia 'production' wikis, such as Wikipedia. Source on GitHub
  • ores connects to ORES, an automated tool for detecting whether Wikimedia page edits are constructive.
  • pageviews retrieves page view data from Wikimedia-powered sites, including Wikipedia.
  • WikidataQueryServiceR and rwikidata (not on CRAN): Request data from (and some day probably edit data in) Wikidata.org, the free knowledgebase; the former uses the query service.
  • WikidataR: An R API wrapper for the Wikidata store of semantic data. Source on GitHub.

CRAN packages:

Related links: