EXECUTIVE BRANCH PRIVATELY FUNDED TRAVEL PARSER
By Luke Rosiak
The Washington Times
Released into the public domain
WARNING: THE DATA WRITTEN TO processed.csv BY THIS UTILITY MAY CONTAIN ERRORS AND MAY OMIT SOME RECORDS.
It is a rough utility intended to make it easier to deal with hundreds of variously formatted .xls and .xlsx forms detailing privately funded travel
by officials in federal agencies (many of the files are blank or contain many tabs, which makes them very cumbersome to work with).
The link to the original Excel spreadsheet is included on each row so you can verify
that the utility didn't merge two records or introduce some other error.
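One way to do that kind of verification is to pull a few random rows from processed.csv and open the spreadsheet each one links to. The following is a minimal sketch, not part of this utility. It assumes Python 3 and that processed.csv has a header row, and it finds the link by looking for any cell that starts with http:// or https://, since the real column name is not documented here. The function names are made up for illustration.

```python
import csv
import random


def sample_rows(csv_file, k=5, seed=None):
    """Return up to k randomly chosen rows (as dicts) from an open CSV file.

    Assumes the first line of the file is a header row.
    """
    rows = list(csv.DictReader(csv_file))
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def link_values(row):
    """Return every cell in the row that looks like a URL."""
    return [
        value
        for value in row.values()
        if isinstance(value, str) and value.startswith(("http://", "https://"))
    ]


def spot_check(path="processed.csv", k=5):
    """Print k random rows alongside the link(s) found in each one."""
    with open(path, newline="") as f:
        for row in sample_rows(f, k):
            print(link_values(row), dict(row))
```

Calling spot_check() from the directory that holds processed.csv prints a handful of rows to compare by hand against the original forms.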
Improvements and error checking are appreciated. The utility also skips forms filed as PDFs; support for those would be appreciated too.
Take a careful look at a handful of different Office of Government Ethics travel forms, even ones filed on similar dates and with the
same file extension, and you'll see why this is less than trivial: there are many different permutations of the form. My spot-checking didn't
reveal any errors, but they could be there. I am fairly certain, meanwhile, that some records are lost altogether when all.csv is reshaped into
processed.csv.
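A rough way to look for lost records is to count how many rows in processed.csv point back to each source spreadsheet, then see which spreadsheets contributed nothing. The sketch below is an illustration only and is not part of this utility. It assumes Python 3, that each data row carries its source link in some cell, and that you can supply your own list of expected spreadsheet URLs (expected_urls is a hypothetical input). A source with zero rows may simply be a blank form, so treat the result as a list of files to check by hand.

```python
import csv
from collections import Counter


def is_url(value):
    """True if the cell looks like a web link."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def count_rows_per_source(csv_file):
    """Count rows per source URL in an open CSV file.

    Rows without any URL cell (such as a header row) are ignored.
    """
    counts = Counter()
    for row in csv.reader(csv_file):
        for cell in row:
            if is_url(cell):
                counts[cell] += 1
                break  # count each row once, under its first URL
    return counts


def missing_sources(expected_urls, counts):
    """Return the expected URLs that contributed no rows at all."""
    return sorted(url for url in expected_urls if counts[url] == 0)
```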
The files are here:
https://explore.data.gov/Federal-Government-Finances-and-Employment/OGE-Travel-Reports/kxfh-um2n
INSTRUCTIONS:
*Run python download.py
This will download all of the Excel files to the files/ directory on your hard drive and concatenate their contents into one CSV called all.csv for convenience.
*Run python parseall.py
This will turn all.csv, which is a mess of all kinds of differently-structured forms, into processed.csv, a flat, uniformly-structured CSV.
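The two steps above can also be run back to back from a small wrapper. This is a minimal sketch, not something shipped with the utility. It assumes both scripts sit in the current directory and that the interpreter you pass in is the one they were written for (the default is whichever Python runs the wrapper, which may not match). The names STEPS, build_commands and run_pipeline are made up for illustration.

```python
import subprocess
import sys

# The two scripts named in the instructions, in the order they must run.
STEPS = ["download.py", "parseall.py"]


def build_commands(python=sys.executable):
    """Return one command line per step, using the given interpreter."""
    return [[python, script] for script in STEPS]


def run_pipeline(python=sys.executable, runner=subprocess.run):
    """Run each step in order, stopping at the first one that fails."""
    for command in build_commands(python):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so parseall.py never runs against a failed download.
        runner(command, check=True)
```

Calling run_pipeline("python") mirrors typing the two commands by hand.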