Crump

A parser for the Virginia State Corporation Commission's business entity records, which are provided as a single, enormous fixed-width file. Named for Beverley T. Crump, the first member of the State Corporation Commission.

Crump retrieves the current SCC records (updated weekly) and turns them into CSV and JSON. It can also improve the quality of the data (formatting dates and ZIP codes, replacing internal status codes with human-readable translations, etc.), atomize the data into millions of individual JSON files, or create Elasticsearch-compatible bulk API data.
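To make the parse-and-transform idea concrete, here is a minimal sketch of reading one fixed-width record and tidying its date and ZIP fields. It is not Crump's actual code: the field names, offsets, and widths in `LAYOUT`, the raw date format, and the sample line are all assumptions for illustration, and the real SCC layout differs.

```python
from datetime import datetime

# HYPOTHETICAL layout: (field name, start offset, width).
# These positions are invented for illustration, not the real SCC layout.
LAYOUT = [
    ("id", 0, 8),
    ("name", 8, 20),
    ("incorporated", 28, 8),  # assumed to be stored as YYYYMMDD
    ("zip", 36, 9),           # assumed to be ZIP+4 with no hyphen
]


def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict, trimming padding."""
    return {name: line[start:start + width].strip()
            for name, start, width in layout}


def format_date(raw):
    """Turn YYYYMMDD into ISO 8601; leave unparseable values untouched."""
    try:
        return datetime.strptime(raw, "%Y%m%d").date().isoformat()
    except ValueError:
        return raw


def format_zip(raw):
    """Insert the hyphen into a nine-digit ZIP+4; leave anything else as-is."""
    if len(raw) == 9 and raw.isdigit():
        return f"{raw[:5]}-{raw[5:]}"
    return raw


def transform(record):
    """Return a copy of the record with date and ZIP fields cleaned up."""
    cleaned = dict(record)
    cleaned["incorporated"] = format_date(cleaned["incorporated"])
    cleaned["zip"] = format_zip(cleaned["zip"])
    return cleaned


# A made-up sample line matching the hypothetical layout above.
sample = "F0001234" + "ACME WIDGETS INC".ljust(20) + "19980115" + "232190001"
record = transform(parse_line(sample))
```

Keeping the layout as data rather than hard-coding slices means a change in column positions only touches the table, which suits a source file whose format is defined by column offsets.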

The most recent copy of the raw SCC data can be found at https://s3.amazonaws.com/virginia-business/current.zip.

Usage

usage: crump [-h] [-a] [-i file.txt] [-o output_dir] [-t] [-d] [-e] [-m]

optional arguments:
  -h, --help            show this help message and exit
  -a, --atomize         generate millions of per-record JSON files
  -i file.txt, --input file.txt
                        raw SCC data (default: cisbemon.txt)
  -o output_dir, --output output_dir
                        directory for JSON and CSV
  -t, --transform       format properly date, ZIP, etc. fields
  -d, --download        download the data file, if missing
  -e, --elasticsearch   create Elasticsearch bulk API data
  -m, --map             generate Elasticsearch index map

For general purposes, ./crump -td is probably the best way to invoke Crump. This downloads the current data file if it is missing and transforms the data so that it adheres to basic data quality norms.
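For readers unfamiliar with what the -e option targets, the Elasticsearch bulk API expects newline-delimited JSON: an action line followed by the document itself, with a trailing newline at the end of the payload. The sketch below shows that shape only. The index name `business`, the `id` field, and the sample record are assumptions, and Crump's own output may be organized differently.

```python
import json


def to_bulk(records, index="business", id_field="id"):
    """Build an Elasticsearch bulk payload (NDJSON) from a list of dicts.

    The index name and id field are hypothetical defaults for illustration.
    """
    lines = []
    for record in records:
        # Action line: tells Elasticsearch to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index,
                                           "_id": record[id_field]}}))
        # Source line: the document itself.
        lines.append(json.dumps(record))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


# A made-up record, not real SCC data.
payload = to_bulk([{"id": "F0001234", "name": "ACME WIDGETS INC"}])
```

A payload like this would typically be sent to the `_bulk` endpoint with the `application/x-ndjson` content type.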

License

Released under the MIT License.
