
shanejorr/LSACscrapeR


Scrape LSAC admissions files

R package that scrapes single-page LSAC CAS Summary Report PDF files and stores the results in a list of R data frames.

Installation

The package can be installed from GitHub:

devtools::install_github('shanejorr/LSACscrapeR')

Usage

The package has one function, scrape_lsac_report(). It takes the path to a PDF file and returns a named list in which each element is a data frame holding one section of information from the LSAC CAS summary report.
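Because the contents of the list depend on the report, it can help to inspect the result before exporting it. The sketch below uses a hypothetical stand-in for the return value of scrape_lsac_report(); the element names `applicant` and `schools` and their columns are invented for illustration, so with a real report you would instead start from `apps <- scrape_lsac_report("cas_report_2018.pdf")`.

```r
# Hypothetical stand-in for the list returned by scrape_lsac_report();
# the element names and columns are invented for illustration only.
apps <- list(
  applicant = data.frame(lsac_id = c("L1", "L2"), lsat = c(160, 165)),
  schools   = data.frame(lsac_id = c("L1", "L1", "L2"),
                         school  = c("A", "B", "C"))
)

# Names of the data frames in the list (these become file or table names)
names(apps)

# Confirm that every element is a data frame
all(vapply(apps, is.data.frame, logical(1)))

# Number of rows in each data frame
sapply(apps, nrow)
```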

Users can then export this list of data frames to the format of their choosing. For example, the following code blocks write the data frames to a series of csv files and to an in-memory SQLite database.

Extracting the data

The following code is common to both the csv and SQLite sections. It loads the packages used in the examples and reads the PDF file.

# common code for both csv and SQLite sections
library(LSACscrapeR)
library(purrr)    # walk()
library(readr)    # write_csv()
library(stringr)  # str_c()

# name of the single PDF file that contains all CAS summary reports for 2018
file_name <- "cas_report_2018.pdf"

# extract information from the PDF file into a list of data frames
apps <- scrape_lsac_report(file_name)

Converting data frames to csv files

Now, we'll create a separate csv file for each data frame.

# iterate through every data frame, writing each one to a csv file
# named after its list element
walk(names(apps),
     function(x) {
       write_csv(apps[[x]],
                 file = str_c(x, ".csv"))
     })
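For users who prefer not to load purrr, readr, and stringr, the same export can be done in base R. This is a sketch rather than part of the package: it uses a hypothetical stand-in list (invented element names and columns) in place of the output of scrape_lsac_report(), writes into a temporary folder instead of the working directory, and reads the files back to check the round trip.

```r
# Hypothetical stand-in for the output of scrape_lsac_report()
apps <- list(
  applicant = data.frame(lsac_id = c("L1", "L2"), lsat = c(160, 165)),
  schools   = data.frame(lsac_id = c("L1", "L1", "L2"),
                         school  = c("A", "B", "C"))
)

# Write the files to a dedicated folder (a temporary one here)
out_dir <- file.path(tempdir(), "cas_csv")
dir.create(out_dir, showWarnings = FALSE)

# One csv file per data frame, named after the list element
for (nm in names(apps)) {
  write.csv(apps[[nm]], file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}

# Read the files back into a named list to check the round trip
csv_files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
apps_back <- lapply(csv_files, read.csv)
names(apps_back) <- tools::file_path_sans_ext(basename(csv_files))
```

Setting `row.names = FALSE` keeps R's row indices out of the files, matching the behavior of write_csv().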

Converting data frames to an in-memory SQLite database

library(DBI)  # dbConnect(), dbWriteTable(), dbDisconnect()

# create a connection to a temporary in-memory database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# iterate through each data frame, adding it to the database as a table
walk(names(apps),
     function(x) {
       dbWriteTable(con, value = apps[[x]],
                    name = x, row.names = FALSE,
                    overwrite = FALSE, append = TRUE)
     })

# the tables exist only while this connection is open
dbDisconnect(con)
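An in-memory SQLite database is discarded as soon as dbDisconnect() runs. To keep the tables between sessions, a file path can be passed instead of ":memory:". The sketch below assumes DBI and RSQLite are installed; the database file name and the stand-in list (with invented element names and columns) are hypothetical.

```r
library(DBI)

# Hypothetical stand-in for the output of scrape_lsac_report()
apps <- list(
  applicant = data.frame(lsac_id = c("L1", "L2"), lsat = c(160, 165)),
  schools   = data.frame(lsac_id = c("L1", "L1", "L2"),
                         school  = c("A", "B", "C"))
)

# A file path instead of ":memory:" keeps the database after disconnecting
# (illustrative file name, placed in a temporary folder here)
db_path <- file.path(tempdir(), "cas_report_2018.sqlite")
con <- dbConnect(RSQLite::SQLite(), db_path)

# One table per data frame; overwrite = TRUE makes the script safe to re-run
for (nm in names(apps)) {
  dbWriteTable(con, name = nm, value = apps[[nm]], overwrite = TRUE)
}
dbDisconnect(con)

# Reconnect later: the tables are still in the file
con <- dbConnect(RSQLite::SQLite(), db_path)
tables <- dbListTables(con)
schools <- dbReadTable(con, "schools")
dbDisconnect(con)
```

Using `overwrite = TRUE` here, rather than `append = TRUE`, avoids duplicating rows if the same report is loaded twice into a persistent file.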

Fields:

  • Document Type: CAS
  • Assembly Output Format: PDF
  • Eapps or CAS Reports first?: CAS Reports
  • Bookmark Order Settings: Standard
  • JD/CAS Doc Type: Report Summary


License

MIT (see LICENSE and LICENSE.md)
