Types of use case:
- Report and manage issues - requires sharing
- Error logging and analytics - does this need sharing at all?
- Data quality reporting for data publishers - requires sharing
- Establish transparency about errors in open data apps (data quality reports)
Store/Log Low-level Errors
As a Data Wrangler I want to pipe my errors into an online system so that I can review them later
- What are errors? Example: cell B20 in sheet X is empty but should be a float
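A minimal sketch of producing the example error above and piping it out, assuming each error is a flat dict keyed by the error field names listed further down (record_id, source_path, source_field, value, level, error_type, message) and assuming a file-like stream written as JSON Lines stands in for the online system; no particular service is implied, and `check_float_cell`, `pipe_errors` and `workbook.xlsx` are hypothetical names.

```python
import io
import json


def check_float_cell(source_path, sheet, column, row, value):
    """Return an error dict if `value` is not a usable float, else None.

    Keys follow the error fields listed in these notes; the exact dict
    layout is an assumption.
    """
    cell = f"{column}{row}"
    if value is None or str(value).strip() == "":
        error_type = "ValidationError"  # missing value
        message = f"Cell {cell} in sheet {sheet} is empty but should be a float"
    else:
        try:
            float(value)
            return None  # valid float: nothing to report
        except ValueError:
            error_type = "ValueError"  # present but not parseable
            message = f"Cell {cell} in sheet {sheet} is {value!r}, should be a float"
    return {
        "source_path": source_path,
        "record_id": row,  # row number
        "source_field": column,
        "value": value,
        "level": "error",
        "error_type": error_type,
        "message": message,
    }


def pipe_errors(errors, stream):
    """Write one JSON object per line; `stream` stands in for the online system."""
    for err in errors:
        stream.write(json.dumps(err) + "\n")


errors = []
for row, value in [(19, "3.5"), (20, ""), (21, "n/a")]:
    err = check_float_cell("workbook.xlsx", "X", "B", row, value)
    if err is not None:
        errors.append(err)

out = io.StringIO()  # stand-in: swap for an HTTP client, log shipper, etc.
pipe_errors(errors, out)
```

One JSON object per line keeps the log append-only and easy to review or bulk-upload later.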
Associate an Error with an Issue
As a .... I want an error or set of errors to be associated with an issue (automatically?) so that I can fix them in bulk
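One way the "(automatically?)" part could work, sketched under assumptions: errors are dicts carrying the source_field and error_type fields listed below, errors sharing the same pair are grouped under one issue, and closing that issue marks all of its errors as resolved in one step. The grouping key, the `issue_id` numbering and the `resolved` flag are hypothetical.

```python
from collections import defaultdict


def group_errors_into_issues(errors):
    """Group errors by (source_field, error_type); each group becomes one issue.

    The grouping key is an illustrative assumption, not a fixed rule.
    """
    groups = defaultdict(list)
    for err in errors:
        groups[(err["source_field"], err["error_type"])].append(err)
    issues = []
    ordered = sorted(groups.items(), key=lambda kv: kv[0])  # deterministic order
    for n, ((field, etype), errs) in enumerate(ordered, start=1):
        issues.append({
            "issue_id": n,
            "title": f"{len(errs)} {etype} error(s) in field {field!r}",
            "status": "open",
            "errors": errs,
        })
    return issues


def close_issue(issue):
    """Closing an issue marks every associated error as resolved (the bulk fix)."""
    issue["status"] = "closed"
    for err in issue["errors"]:
        err["resolved"] = True


errors = [
    {"source_field": "price", "error_type": "ValueError", "record_id": 20},
    {"source_field": "price", "error_type": "ValueError", "record_id": 21},
    {"source_field": "date", "error_type": "ValidationError", "record_id": 5},
]
issues = group_errors_into_issues(errors)
close_issue(issues[1])  # the two "price" errors
```

A coarser key (e.g. error_type only) gives fewer, broader issues; a finer one gives more, narrower issues.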
Generate a Report of Errors
As a ... I want to generate an aggregate report of all the errors on a task and their associated issues so that I can see patterns
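A sketch of the aggregate report, assuming each error dict carries the error_type and source_field fields from the list below, plus a hypothetical `issue_id` key once it has been associated with an issue. `collections.Counter` does the counting, and `most_common()` puts the most frequent combinations first, which is where patterns tend to show up.

```python
from collections import Counter


def error_report(errors):
    """Count errors per (error_type, source_field) and per issue.

    Errors without an issue_id are counted under None (unassigned).
    """
    by_kind = Counter((e["error_type"], e["source_field"]) for e in errors)
    by_issue = Counter(e.get("issue_id") for e in errors)
    return {
        "total": len(errors),
        "by_kind": by_kind.most_common(),  # most frequent first
        "by_issue": dict(by_issue),
    }


errors = [
    {"error_type": "ValueError", "source_field": "price", "issue_id": 1},
    {"error_type": "ValueError", "source_field": "price", "issue_id": 1},
    {"error_type": "ValidationError", "source_field": "date"},
]
report = error_report(errors)
```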
Create an Issue
As a Data User I want to report a problem with a dataset so that it can be fixed by the owner and I can see that it was fixed (or not!)
- An Issue can be an Error (as above) but can also be higher-level, e.g. all dates are in yyyy-dd-mm format rather than yyyy-mm-dd
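The date example is a column-level problem that no single cell error captures. A sketch of how such a higher-level issue might be detected, assuming dates arrive as strings and using the standard library's `datetime.strptime` to test both orderings; `guess_date_order` and `date_format_issue` are hypothetical helpers, and the issue dict layout is an assumption.

```python
from datetime import datetime


def _parses(value, fmt):
    """True if `value` parses with the strptime format `fmt`."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def guess_date_order(values):
    """Classify a column of date strings by which ordering fits every value."""
    fits_mmdd = all(_parses(v, "%Y-%m-%d") for v in values)
    fits_ddmm = all(_parses(v, "%Y-%d-%m") for v in values)
    if fits_mmdd and fits_ddmm:
        return "ambiguous"  # e.g. every day value is <= 12 (or the column is empty)
    if fits_mmdd:
        return "yyyy-mm-dd"
    if fits_ddmm:
        return "yyyy-dd-mm"
    return "unknown"


def date_format_issue(dataset_url, field, values):
    """Return a hypothetical issue dict when the column looks like yyyy-dd-mm."""
    if guess_date_order(values) != "yyyy-dd-mm":
        return None
    return {
        "dataset_url": dataset_url,
        "source_field": field,
        "status": "open",
        "message": f"All dates in {field!r} are yyyy-dd-mm rather than yyyy-mm-dd",
    }
```

A column such as `["2016-25-03", "2016-01-02"]` only fits yyyy-dd-mm, so one issue is raised for the whole column instead of one error per cell.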
Close an Issue
As a Task Owner I want to close an issue so that I can indicate it has been fixed (or that it won't be fixed, etc.)
Be Notified of an Update to an Issue
- repo_url -
- dataset_url -
- run_id -
Info on the actual error:
- record_id - row number in most cases
- source_path - input file name
- dest_path - output file name - ??
- source_field/attribute -
- dest_field - ??
- query (XPath, SQL, ...) - the query that produced the value; when scraping this is an XPath expression, CSS selector, etc.
- value - erroneous value
- level - debug, info, warning, error
- error_type - ValidationError, TypeError, ValueError, ...
- message - JSON structured message with more info?
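The fields above could be captured in a single record type. A sketch, assuming a Python dataclass, treating every field as optional, restricting `level` to the four listed values, and treating `message` as a structured dict serialised to JSON (the notes leave that open). `ErrorRecord`, `LEVELS` and `to_json` are hypothetical names; dest_path and dest_field are kept although the notes mark them "??".

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional

LEVELS = ("debug", "info", "warning", "error")


@dataclass
class ErrorRecord:
    # Context of the run that produced the error
    repo_url: Optional[str] = None
    dataset_url: Optional[str] = None
    run_id: Optional[str] = None
    # Info on the actual error
    record_id: Optional[int] = None     # row number in most cases
    source_path: Optional[str] = None   # input file name
    dest_path: Optional[str] = None     # output file name (open question)
    source_field: Optional[str] = None
    dest_field: Optional[str] = None    # open question
    query: Optional[str] = None         # e.g. XPath, CSS selector or SQL
    value: Any = None                   # the erroneous value
    level: str = "error"
    error_type: Optional[str] = None    # e.g. "ValidationError", "TypeError"
    message: dict = field(default_factory=dict)  # structured details

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"level must be one of {LEVELS}, got {self.level!r}")

    def to_json(self) -> str:
        """Serialise to JSON, dropping fields that were left unset."""
        data = {k: v for k, v in asdict(self).items() if v is not None and v != {}}
        return json.dumps(data)


rec = ErrorRecord(run_id="run-1", source_path="workbook.xlsx", record_id=20,
                  source_field="B", value="", error_type="ValidationError",
                  message={"expected": "float"})
```

Dropping unset fields keeps each logged line short, since most errors will fill only a handful of these fields.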
status = 'open', 'closed'
action = comment || closing || reopening
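The status and action values suggest a small state machine. A sketch, assuming these transitions (closing applies only to an open issue, reopening only to a closed one, commenting to either) and assuming notifications are plain callbacks registered on the issue; a real system might send email or webhooks instead. This also covers the notification story, and the text of a closing action can carry the reason (fixed, won't fix, ...). `Issue`, `TRANSITIONS`, `apply` and `watchers` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Allowed (status, action) -> new status; "comment" leaves the status unchanged.
TRANSITIONS = {
    ("open", "closing"): "closed",
    ("closed", "reopening"): "open",
    ("open", "comment"): "open",
    ("closed", "comment"): "closed",
}


@dataclass
class Issue:
    title: str
    status: str = "open"
    history: List[dict] = field(default_factory=list)
    watchers: List[Callable[["Issue", dict], None]] = field(default_factory=list)

    def apply(self, action: str, user: str, text: Optional[str] = None) -> None:
        """Apply an action, record it in the history, and notify every watcher."""
        key = (self.status, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot apply {action!r} to a {self.status!r} issue")
        self.status = TRANSITIONS[key]
        event = {"action": action, "user": user, "text": text, "status": self.status}
        self.history.append(event)
        for notify in self.watchers:
            notify(self, event)


received = []
issue = Issue("Dates are yyyy-dd-mm rather than yyyy-mm-dd")
issue.watchers.append(lambda iss, ev: received.append((iss.title, ev["action"])))
issue.apply("comment", "data-user", "Also seen in last year's file")
issue.apply("closing", "task-owner", "won't fix: upstream format")
```

Rejecting invalid transitions (e.g. closing an already closed issue) keeps the history meaningful for anyone following the issue.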