Column-to-Field mapping strategy #123

roll · 2016-10-09T17:41:28Z

Overview

We have a list of columns and list of fields. There are two ways to link them:

based on order (I suppose it's how JTS says we should do)
based on name

There is a check for it - https://github.com/frictionlessdata/goodtables-py/blob/next/goodtables/checks/head/non_matching_header.py - any logic could be implemented.

For now it uses current goodtables approach checking names using order but removing column from inspection if there is non matching name. Also there is not implemented order_fields option. But this approach could lead to problems described below. It could be solved removing whitespaces from names.

But also may be instead of order_fileds to use option to set mapping strategy:

map fields to columns using order
map fields to columns using names

Problems

@rgrp has wrote:

e.g. column named "Number" and JTS has " Number" (edt: see space!) then number column is ignored.

Qu: Why do we even use column names vs JTS names - can't we use JTS order in checking or is the idea we could validate arbitrary JSON data?

Tasks

For now non_matching_headers(order_fields=True) raises NotImplementedError but it should re-order fields if order_fields option is True. Needed some tricky algorithm because headers and fields could be duplicated etc. [IF NEEDED]

The text was updated successfully, but these errors were encountered:

rufuspollock · 2016-10-14T11:22:05Z

@roll i get this is tricky. For example, we'd generally like to be able to validate a stream of rows where each row is either an array or a dict (basically either what you get from python csv.reader or csv.DictReader ...). My point in the original issue is not that we must accomodate order stuff but just that we should not ignore something if not there - if we expected a column called name and not there i think we should warn or error.

roll · 2016-10-31T12:50:50Z

So goodtables.next is going to do the next thing based on raised problem:

if there is any difference between header and field_name it will add non-matching-header error
but if slugified names are the same ("Number" and "/space/Number" -> "number") it will NOT stop the future inspection of the field (for other errors)

Also order_fields options will force the system to re-order fields as much as possible close to headers before check above.

* fixed column producing for body context * removed column from schema checks only if name slugs are different * implemented order_fields algo * improved comments

@check

* Rebased on goodtables.next codebase (#118) * removed current codebase * added updated codebase * fixed linting * updated readme * updated readme * updated readme * fixed source checks * added dataset checks * min style change * removed ecode filter from filter_checks * renamed cells back to columns + row_number * added dataset check stubs * implemented dataset checks * fixed linting * moved __inspect_table next to inspect for better reading * fixed list.clear for python2 * added error limit to dataset errors * updated readme * updated readme * added breaking note to readme * added custom checks support * implemented custom profiles * fixed linting * added options order_fields and infer_fields * fixed extra_header * added comments * updated spec * renamed unordered_headers to non_matching_header * renamed col-number to column-number * min * updated added dataset errors to readme * splitted error and check concpets * fixed linting * min * fixed readme * updated readme * updated readme * fixed readme * fixed readme * added guard assertion to checks * updated custom checks API * fixed linting, readme * fixed linting, readme * updated readme * updated readme * typo * moved table errors to Inspector, deleted checks * added ability to profilies to return errors * fixed head checks not columns break * rebased check on in-place erorrs update * rebased profiles on in-place errors * fixed readme * fixed readme * fixed readme * fixed readme * fixed readme * no extra-header error if infer_fields is True * implemented proper non-matching-header without ordering * added custom_profiles, custom_checks arguments instead of global registry * moved default args to signatures * removed make release (use github releases instead!) * added return code to cli * improved cli error formatting * improved error messages, tests * updated examples * added custom examples * added inspector tests * added limit tests * fixed tests * fixed spec link * added checks options to cli * fixed profiles * added description to setup * added entry_points, keywords * moved ckan profile to examples * removed report from spec * updated version to v1 * updated install instruction for now * Fixed jsontableschema-error message (#133) * Rebased on granular tabulator exceptions (#115) * updated dependencies * rebases on new tabulator exceptions * Renamed profile to preset with simplified API (#124) * renamed profile to preset * removed errors, tables arguments from preset * Added infer_schema option, updated preset API (#128) * minor improvements * added infer_schema option false by default * Added support for schema constraints (#55) * updated jsontableschema version * implemented all constraints except unique * implemented unique constraint check * updated to jsontableschema-v0.8.2 * fixed linting * Added tables preset (#125) * added tables preset * fixed linting * added tables test * Implemented order_fields option (#123) * fixed column producing for body context * removed column from schema checks only if name slugs are different * implemented order_fields algo * improved comments * Rebased on external spec (#131) Rebased on external spec * Improved tests (#127) * added prev version data examples * implemented feature tests * moved files to data * updated verson * added features * marked spec test as xfail * Rebased on spec-v1.0.0-alpha1 (#131) * updated spec, added spec to API * added config with checks order * rebased in inspector on updated spec * updated @check API * rebased on spec message templates * fixed cutom checks * fixed linting * Updated readme note

roll added bug Something isn't working branch-next labels Oct 9, 2016

roll added this to the goodtables-v1 milestone Oct 9, 2016

roll self-assigned this Oct 9, 2016

roll changed the title ~~Implement non-matching-header with order_fields=True~~ Implement non-matching-header check with order_fields=True Oct 9, 2016

roll changed the title ~~Implement non-matching-header check with order_fields=True~~ Implement non-matching-header with order_fields=True Oct 9, 2016

roll added the priority label Oct 9, 2016

roll changed the title ~~Implement non-matching-header with order_fields=True~~ Column-to-Field mapping strategy Oct 14, 2016

roll mentioned this issue Oct 14, 2016

If JTS field names differ from column names then columns are not validated #94

Closed

roll added priority general General improvements and removed priority bug Something isn't working labels Oct 18, 2016

roll mentioned this issue Oct 31, 2016

Implemented order_fields option #143

Merged

roll added review and removed current labels Oct 31, 2016

roll pushed a commit that referenced this issue Oct 31, 2016

Implemented order_fields option (#123)

65691f5

* fixed column producing for body context * removed column from schema checks only if name slugs are different * implemented order_fields algo * improved comments

roll closed this as completed Oct 31, 2016

roll removed the review label Oct 31, 2016

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Column-to-Field mapping strategy #123

Column-to-Field mapping strategy #123

roll commented Oct 9, 2016 •

edited

Loading

rufuspollock commented Oct 14, 2016

roll commented Oct 31, 2016 •

edited

Loading

Column-to-Field mapping strategy #123

Column-to-Field mapping strategy #123

Comments

roll commented Oct 9, 2016 • edited Loading

Overview

Problems

Tasks

rufuspollock commented Oct 14, 2016

roll commented Oct 31, 2016 • edited Loading

roll commented Oct 9, 2016 •

edited

Loading

roll commented Oct 31, 2016 •

edited

Loading