Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
EXECUTIVE BRANCH PRIVATELY FUNDED TRAVEL PARSER By Luke Rosiak The Washington Times Released into the public domain WARNING: THE DATA OUTPUTTED INTO processed.py BY THIS UTILITY LIKELY OMITS SOME RECORDS. The accuracy of the processed records is also not guaranteed. It is intended to make dealing with hundreds of variously formatted .xls and .xlsx forms detailing privately-funded travel of government officials in federal agencies easier (many of them are blank or contain many tabs, so they're very cumbersome). The links to the original Excel spreadsheets are included on each row so you can verify that the utility didn't somehow combine two records or something. Improvements and error checking are appreciated. Additionally, it omits PDFs, the addition of which would also be appreciated. Take a careful look at a handful of different Office of Government Ethics travel forms - even those filed on similar dates and with the same file extension - and you'll see why this is less than trivial... there are many different permutations of the form. My spot-checking didn't reveal any errors, but they could be there. I am pretty certain, meanwhile, that there are some records that are lost altogether when trying to shape the all.csv into the processed.csv because they depart from the major formats. The original files, from the new Ethics.gov, are here: https://explore.data.gov/Federal-Government-Finances-and-Employment/OGE-Travel-Reports/kxfh-um2n INSTRUCTIONS: *Run python download.py This will download all of the Excel files to your hard drive in the files/ directory, and contatenate all their contents into one CSV called all.csv for convenience. *Run python parseall.py This will turn all.csv, which is a mess of all kinds of differently-structured forms, into processed.csv, a flat, uniformly-structured CSV.