Skip to content
/ PPP Public

An R Package that downloads Paycheck Protection Program loan data and reads it in as a tidy data set

License

Notifications You must be signed in to change notification settings

kbmorales/PPP

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PPP

The goal of PPP is to download and assemble a unified, cleaned data set of Paycheck Protection Program loans issued in 2020.

The data is too large to share on GitHub, but this package will allow you to recreate the data locally.

The work for this package grew out of a project with DataKind’s DC chapter and the National Press Foundation.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("kbmorales/PPP")

Assembling PPP data

# Note: this function will download over 600MB of data to your local machine,
# and read a large dataframe into memory!
ppp_data = ppp_assemble()

Data Sources

PPP Data Dictionary

Though there are two versions of the PPP data released, both will have the same final structure as detailed below. The main structual difference between these versions is the removal of a JobsRetained variable, and the addition of JobsReported. Whether or not this is a simple semantic change is unknown at this point.

The final PPP dataset output by ppp_assemble() will contain the following columns:

variable original notes
LoanRange Yes no values for loan amounts under 150K
BusinessName Yes no values for loan amounts under 150K
Address Yes no values for loan amounts under 150K
City Yes
State Yes
Zip Yes
NAICSCode Yes
BusinessType Yes
RaceEthnicity Yes 89.3% “Unanswered”
Gender Yes 77.7% “Unanswered”
Veteran Yes 84.7% “Unanswered”
NonProfit Yes
JobsRetained Yes only in 2020-07-06 release
JobsReported Yes only in 2020-08-08 release
DateApproved Yes
Lender Yes
CD Yes
LoanAmount Yes no values for loan amounts over 150K
source_file No file name for where data was pulled
version No release date of PPP data
LoanRange_Unified No Loan ranges regardless of amount
JobsRetained_Grouped No brackets for # of ‘jobs retained’
JobsReported_Grouped No brackets for # of ‘jobs reported’
LoanRangeMin No minimum possible loan value
LoanRangeMax No maximum possible loan value
LoanRangeMid No middle estimated loan value
naics_lvl_1 No Most general NAICS industry class
naics_lvl_2 No Second level NAICS industry class
naics_lvl_3 No Third level NAICS industry class
naics_lvl_4 No Forth level NAICS industry class
naics_lvl_5 No Most specific NAICS industry class
NAICS_version No version of NAICS where code was found
NAICS_valid No was NAICS code matched?

About

An R Package that downloads Paycheck Protection Program loan data and reads it in as a tidy data set

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages