validate data stored in CSV, PRN, ODS or Excel files
Python Shell
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.idea/scopes
cutplace
docs
examples
tests
.coveragerc
.gitattributes
.gitignore
.landscape.yaml
.pre-commit-config.yaml
.travis.yml
AUTHORS.rst
LICENSE.txt
README.rst
build.xml
requirements-dev.txt
requirements.txt
setup.cfg
setup.py
tox.ini

README.rst

Build Status Test coverage Code Health

Cutplace is a tool and API to validate that tabular data stored in CSV, Excel, ODS and PRN files conform to a cutplace interface definition (CID).

As an example, consider the following customers.csv file that stores data about customers:

customer_id,surname,first_name,born,gender
1,Beck,Tyler,1995-11-15,male
2,Gibson,Martin,1969-08-18,male
3,Hopkins,Chester,1982-12-19,male
4,Lopez,Tyler,1930-10-13,male
5,James,Ana,1943-08-10,female
6,Martin,Jon,1932-09-27,male
7,Knight,Carolyn,1977-05-25,female
8,Rose,Tammy,2004-01-12,female
9,Gutierrez,Reginald,2010-05-18,male
10,Phillips,Pauline,1960-11-09,female

A CID can describe such a file in an easy to read way. It consists of three sections. First, there is the general data format:

  Property Value
D Format Delimited
D Encoding UTF-8
D Header 1
D Line delimiter LF
D Item delimiter ,

Next there are the fields stored in the data file:

  Name Example Empty Length Type Rule
F customer_id 3798     Integer 0...99999
F surname Miller   ...60    
F first_name John X ...60    
F date_of_birth 1978-11-27     DateTime YYYY-MM-DD
F gender male X   Choice female, male

Optionally you can describe conditions that must be met across the whole file:

  Description Type Rule
C customer must be unique IsUnique customer_id

The CID can be stored in common spreadsheet formats, in particular Excel and ODS, for example cid_customers.ods.

Cutplace can validate that the data file conforms to the CID:

$ cutplace cid_customers.ods customers.csv

Now add a new line with a broken date_of_birth:

73921,Harris,Diana,04.08.1953,female

Cutplace rejects this file with the error message:

customers.csv (R12C4): cannot accept field 'date_of_birth': date must match format YYYY-MM-DD (%Y-%m-%d) but is: '04.08.1953'

Additionally, cutplace provides an easy to use API to read and write tabular data files using a common interface without having to deal with the intrinsic of data format specific modules. To read and validate the above example:

import cutplace
import cutplace.errors

cid_path = 'cid_customers.ods'
data_path = 'customers.csv'
try:
    for row in cutplace.rows(cid_path, data_path):
        pass  # We could also do something useful with the data in ``row`` here.
except cutplace.errors.DataError as error:
    print(error)

For more information, read the documentation at http://cutplace.readthedocs.org/ or visit the project at https://github.com/roskakori/cutplace.