forked from votinginfoproject/pythonvalidator
micahstubbs/pythonvalidator
filevalidator.py is a Python script that validates XML feeds. It accepts a list of files or a directory, along with several other parameters, and validates either XML documents or archived versions of XML documents. Every document to be validated must end with ".xml". For archived files, the base file must end with ".xml", but the file itself can be archived multiple times and may sit inside an archived folder.

The script creates a directory for validation logs, by default inside the current working directory. The logs for each feed are first written there and then archived. After validation is done, all extraneous files (such as extra log data and extracted archive contents) are removed from the system.

**************************
Command Line Parameters
**************************
The following command line parameters are accepted:

-v  Version number of the schema to validate against. Defaults to 2.3.

-s  Schema file to validate against. Only necessary when you want to validate a feed against a specific schema file, or when you have no internet connection to pull the version file from Google. Defaults to the schema for the version number, pulled from Google.

-d  Directory to scan for XML feeds and archived files. All subdirectories are searched and the files within them are validated. Can be combined with a list of files.

-o  Output location for feed logs. The folder is created if necessary, and all feed validation logs are stored inside it, zipped up per feed. Defaults to the current working directory + "/feedlogs".

-f  List of files to validate. Can be combined with a directory.

Either a file or a directory is required for the validation to work correctly.

**************************
Semantic Validation
**************************
The script starts by running a basic validation of the file against the schema, returning True if the file is valid and False otherwise.
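As a side note on the Command Line Parameters section, the options and defaults listed there could be declared with argparse roughly as follows. This is only a sketch under stated assumptions, not the parsing code filevalidator.py actually uses: the function names build_parser and parse_args, the dest names, and the choice to let -f take space-separated file names are all hypothetical.

```python
import argparse
import os


def build_parser():
    """Declare the five documented options (dest names are hypothetical)."""
    parser = argparse.ArgumentParser(prog="filevalidator.py")
    # -v: schema version, documented default is 2.3
    parser.add_argument("-v", dest="version", default="2.3")
    # -s: optional local schema file; None means "use the schema for -v"
    parser.add_argument("-s", dest="schema", default=None)
    # -d: directory to scan, including subdirectories
    parser.add_argument("-d", dest="directory", default=None)
    # -o: log output folder, documented default is <cwd>/feedlogs
    parser.add_argument("-o", dest="output",
                        default=os.path.join(os.getcwd(), "feedlogs"))
    # -f: one or more files (assumption: separated by spaces)
    parser.add_argument("-f", dest="files", nargs="+", default=[])
    return parser


def parse_args(argv):
    """Parse argv and enforce 'either a file or a directory is required'."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.directory and not args.files:
        # parser.error prints a usage message and exits with status 2
        parser.error("either -d or -f is required")
    return args
```

With this sketch, parse_args(["-d", "feeds"]) keeps the default version and log folder, while parse_args([]) exits with a usage error because neither -d nor -f was given.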
After the basic validation, the script runs a semantic check in which the following are validated:

Warnings:
    Fields with tags but no values
    State abbreviations that are not 2 characters long
    URLs that do not match the given regular expression

Errors:
    Non-numeric starting and ending house numbers
    Ending house numbers that are either 0 or less than the starting house number
    Zip codes that are invalid (both 5 digit and 9 digit zip codes are accepted)
    Emails that are invalid
    Localities that do not match any of the types in the specification
    Total votes that are not equal to valid votes + overvotes + blank votes
    Fields that the schema defines as integers but that raise an error when read as integers

**************************
Street Segment Validation
**************************
After the semantic validation, street segments are checked for duplicated or conflicting house numbers and precincts, such as a house number that falls within two different street segments. The following are validated:

Warnings:
    House numbers that overlap but have consistent precinct IDs

Errors:
    House numbers that overlap and have different precinct information
    Street segments with missing required fields

Duplicates:
    Street segments with different ids that contain all the same data (address, start/end house number, polling location, etc.)

**************************
Full Required Validation
**************************
The final validation check cycles through all fields looking for required values that are missing. The first validation check stops after finding the first error; this check finds all such errors and returns them to the user. It runs only if the first validation returned False, and it creates an error file listing all missing fields and their element ids.
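To make the warning, error, and duplicate rules under Street Segment Validation more concrete, here is a minimal sketch of how a pair of segments might be classified. It is an illustration under stated assumptions, not the script's actual implementation: the dictionary keys (id, street, start, end, precinct_id) and the function names are hypothetical, and house number ranges are treated as simple inclusive integer ranges on the same street.

```python
from itertools import combinations

# Hypothetical fields compared when deciding whether two segments carry the same data.
FIELDS = ("street", "start", "end", "precinct_id")


def ranges_overlap(a, b):
    """True when the inclusive house number ranges share at least one number."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def classify(a, b):
    """Return 'duplicate', 'warning', 'error', or None for one pair of segments."""
    if a["street"] != b["street"] or not ranges_overlap(a, b):
        return None
    if all(a[f] == b[f] for f in FIELDS):
        return "duplicate"  # different ids, identical data
    if a["precinct_id"] == b["precinct_id"]:
        return "warning"    # overlapping numbers, consistent precinct
    return "error"          # overlapping numbers, conflicting precincts


def check_segments(segments):
    """Compare every pair of segments and collect (kind, id_a, id_b) findings."""
    findings = []
    for a, b in combinations(segments, 2):
        kind = classify(a, b)
        if kind:
            findings.append((kind, a["id"], b["id"]))
    return findings
```

Comparing every pair is quadratic in the number of segments, which keeps the sketch short; a real feed with many segments would more likely group by street and sort by starting house number first.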