Guy Yollin edited this page Mar 18, 2015 · 7 revisions

WRDS: A package to perform ECTL (Extract-Clean-Transform-Load) of Compustat and CRSP data from WRDS

Summary: Create a new R package for working with data from the CRSP/Compustat Merged (CCM) database from Wharton Research Data Services (WRDS).

Description: WRDS (https://wrds-web.wharton.upenn.edu/wrds/) is a web-based business data research service from The Wharton School at the University of Pennsylvania. It is a common portal for accessing the Compustat database of corporate fundamental data and the CRSP database of security prices and returns. These data are typically downloaded as large flat files, which require ECTL operations before they can be used for research and modeling. The wrds package is intended to automate and simplify this ECTL process, significantly reducing the time and effort required to begin research using CCM data.

Related work: Other projects facilitate downloading data from WRDS via MATLAB (http://www.mathworks.com/matlabcentral/fileexchange/48333-okomarov-wrds) and Python (https://github.com/edwinhu/wrds); however, the authors are not aware of an R package that supports this process.

Potential tasks:

  • Function(s) for downloading CCM data from WRDS
  • Function(s) for inserting downloaded CCM data into a local SQLite database
  • Function(s) for extracting CCM data from the local SQLite database based on asset class, index constituent, etc.
  • Function(s) for aggregating/disaggregating and aligning data to different frequencies (e.g. quarterly, monthly, weekly)
  • Function(s) for interpolating missing data
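
The load, extract and fill steps in the tasks above can be sketched in a few lines. This is only an illustration of the data flow, written in Python with the standard library for brevity; the proposed package itself would be written in R. Everything named here is an assumption made for the example: the table name fundamentals, the helper functions, the three-column layout, and the sample rows (which are made up, not real Compustat data). Forward filling is used as the simplest stand-in for the interpolation task, not as the intended method.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for a downloaded CCM flat file (not real data).
# The column names are illustrative assumptions.
SAMPLE_CSV = """gvkey,datadate,sale
001000,2014-03-31,10.0
001000,2014-06-30,
001000,2014-09-30,14.0
002000,2014-03-31,5.0
"""


def load_flat_file(conn, text):
    """Insert the rows of a CSV flat file into a hypothetical 'fundamentals' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fundamentals ("
        "gvkey TEXT, datadate TEXT, sale REAL, "
        "PRIMARY KEY (gvkey, datadate))"
    )
    rows = [
        # An empty field is stored as NULL so missing data stays visible.
        (r["gvkey"], r["datadate"], float(r["sale"]) if r["sale"] else None)
        for r in csv.DictReader(io.StringIO(text))
    ]
    # INSERT OR REPLACE keeps a repeated load from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO fundamentals VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def extract_series(conn, gvkey):
    """Return (date, value) pairs for one company, ordered by date."""
    cur = conn.execute(
        "SELECT datadate, sale FROM fundamentals WHERE gvkey = ? ORDER BY datadate",
        (gvkey,),
    )
    return cur.fetchall()


def fill_forward(series):
    """Replace each missing value with the most recent observed value."""
    filled, last = [], None
    for date, value in series:
        if value is not None:
            last = value
        filled.append((date, last))
    return filled


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
n_loaded = load_flat_file(conn, SAMPLE_CSV)
series = extract_series(conn, "001000")
filled = fill_forward(series)
```

Keeping the missing observation as NULL in the database, and filling it only at extraction time, leaves the choice of interpolation method open to the caller.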

WRDS requirement: Student must have access to Wharton Research Data Services through their university; unfortunately, project mentors cannot provide this required resource.

Skills required:

  • Knowledge of R and R package development
  • Ability to document R functions and data via Roxygen2
  • Ability to work with version control systems (R-Forge/GitHub)
  • Ability to write clear vignettes demonstrating function usage

Test: TBD

Mentor: Guy Yollin (gyollin {at} uw {dot} edu)