CT Name Cleaner
Resolve village names, colloquial names, and common misspellings of Connecticut towns to their official town names.
This is based on an R package of the same name by my colleague Andrew Ba Tran.
This installs a command-line script, ctclean, as well as a Python library.
by Jake Kara, email@example.com
pip install ctnamecleaner
Command line util
$ ctclean New\ Preston
WASHINGTON
$ ctclean "New Preston"
WASHINGTON
When nothing is found, return None:
$ ctclean NotGonnaFindItsVille
None
Set a custom value to return on error with the --error or -e flag:
$ ctclean NotGonnaFindItsVille --error "Ruh Roh"
Ruh Roh
Use with Pandas dataframes
See the demo/ folder in this repo for an example of translating an entire column with the Lookup.clean_dataframe() method. It uses pandas' DataFrame.join() method, so it's faster than applying the Lookup.clean() method to each row yourself with a lambda function.
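To show why a join beats a per-row lambda, here is a minimal sketch that uses plain pandas only. It does not call this package, and it is not necessarily how Lookup.clean_dataframe() is implemented. The column names (raw_name, clean_name, town, town_key), the tiny lookup table, and the uppercase normalization are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical lookup table, indexed by the raw (village or colloquial) name.
lookup = pd.DataFrame(
    {
        "raw_name": ["NEW PRESTON", "STORRS", "WASHINGTON"],
        "clean_name": ["WASHINGTON", "MANSFIELD", "WASHINGTON"],
    }
).set_index("raw_name")

# Hypothetical input data with a messy town column.
df = pd.DataFrame({"town": ["New Preston", "storrs", "NotGonnaFindItsVille"]})

# Normalize the key once for the whole column (an assumption about matching).
df["town_key"] = df["town"].str.strip().str.upper()

# Join approach: one vectorized operation against the lookup index.
joined = df.join(lookup, on="town_key")

# Per-row approach, for contrast: a Python function call for every row.
mapping = lookup["clean_name"].to_dict()
per_row = df["town_key"].apply(lambda key: mapping.get(key))

# Rows with no match have a missing value (NaN) in clean_name.
print(joined[["town", "clean_name"]])
```

Both approaches give the same matches on this data. The join pays off on large columns because the matching happens inside pandas rather than in a Python-level loop.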
Extending with other data
Not in CT? Want to map other things, like population? Just make a spreadsheet and put it anywhere, online or locally, that Pandas .read_csv() can open.
You can specify a spreadsheet (local or remote) to use as the lookup table when you instantiate a Lookup object. You have to pass the path or URL of the sheet, along with the names of the raw-name column and the clean-name column.
>>> l = lookup.Lookup(csv_url="http://path/to/your/sheet", raw_name_col="something", clean_name_col="something_else")
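As a rough picture of what such a sheet needs to contain, the sketch below builds a lookup from CSV text using only the standard library. It is not this package's code: build_lookup and clean are hypothetical helper names, the sample rows are illustrative, and the case-insensitive matching is an assumption. The column names mirror the placeholders in the call above.

```python
import csv
import io

# Hypothetical sheet contents; a real sheet would live in a file or at a URL.
SHEET = """something,something_else
New Preston,WASHINGTON
Storrs,MANSFIELD
"""


def build_lookup(csv_text, raw_name_col, clean_name_col):
    """Map normalized raw names to clean names from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[raw_name_col].strip().upper(): row[clean_name_col]
        for row in reader
    }


def clean(table, name, error=None):
    """Return the clean name, or `error` when the name is not in the table."""
    return table.get(name.strip().upper(), error)


table = build_lookup(SHEET, "something", "something_else")
print(clean(table, "new preston"))
print(clean(table, "NotGonnaFindItsVille"))
print(clean(table, "NotGonnaFindItsVille", error="Ruh Roh"))
```

The sheet only needs two columns: one holding the names as they appear in your data and one holding the value you want back, which can be a town name, a population figure, or anything else.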