structure: how to report path to invalid data element #47

vlcinsky · 2018-10-24T21:21:10Z

cattrs version: 0.9.0
Python version: 3.6.5
Operating System: Debian 9

Description

I want cattrs to load complex nested data and, when some validation or conversion fails, to provide reasonable context information about which part of the data did not work properly.

What I Did

I have attrs-based classes: Config with attributes source, fetch and publish, each holding an instance of a dedicated attrs-based class (Source, Fetch and Publish).

If some data element is wrong (e.g. a number is expected and the string "5a" is provided), structuring fails, raising ValueError("could not convert string to float: '5a'",)

However, the error does not include any contextual information about which element of my nested input caused the problem.

It would be nice to get some sort of path in the exception that I could use. marshmallow and trafaret are examples of similar libraries that provide such contextual information.

vlcinsky · 2018-10-25T10:16:16Z

Here is a possibly quite crazy idea for reporting an error together with the path within the input data that leads to the failure.

Requirements:

- initially focus only on structure and assume the input data is a dict
- allow reporting the path to the input data element that caused the conversion error
- keep the required changes to code (structure_hook functions) to a bare minimum
- tolerate structure_hook implementations that do not implement the new approach (possibly at the cost of losing part of the path information)
- run fast; avoid any extra operations in the happy path

Concept:

- the path is a list of __getitem__ arguments, e.g. ["oak", 1] for data["oak"][1]
- no need to cover non-iterable data types
- focus on iterables: store the current position of the iteration in a local variable with the agreed name cattrs_i. This is the only required change to a structure_hook implementation.
- all path detection is done within the cattr.structure function:
  - detection works by traversing the traceback, inspecting the local variables of each frame and collecting the values of all cattrs_i variables into the resulting path list
  - the path is stored in the exception's .path attribute and the caught exception is re-raised

Here is code that demonstrates how to detect the path from an exception raised in a deeply nested call. If you save the code as test_path_detection.py, it can be run with pytest (Python 3.6+ is expected).

def int_structure_hook(val, dtype):
    return int(val)


def list_structure_hook(lst, dtype):
    return [int_structure_hook(itm, int) for cattrs_i, itm in enumerate(lst)]


def dict_structure_hook(dct, dtype):
    return [list_structure_hook(val, list) for cattrs_i, val in dct.items()]


def structure(val):
    try:
        return dict_structure_hook(val, dict)
    except ValueError as exc:
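        path = []
        tb = exc.__traceback__
        # Walk from the outermost frame to the innermost one.
        while tb:
            f_locals = tb.tb_frame.f_locals
            # Test for presence, not truthiness: index 0 is a valid path element.
            if "cattrs_i" in f_locals:
                path.append(f_locals["cattrs_i"])
            tb = tb.tb_next
        exc.path = path
        raise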


def test_it():
    try:
        res = structure({"oak": [1, "0aa", 3], "birch": [9, 2, 0]})
        print(f"Happy result is: {res}")
    except ValueError as exc:
        print(f"Path {exc.path}: has problem: {exc}")

When called:
$ pytest test_path_detection.py -sv
the printed output related to reported path is

Path ['oak', 1]: has problem: invalid literal for int() with base 10: '0aa'

What do you think? The results are not perfect, but in many cases they help to get close to the source of the problem. It would definitely require (small) modifications to existing converters.
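A variant of the sketch above, written with explicit for loops and a presence check on the marker variable. Two assumptions are worth flagging: index 0 is a legitimate path element, so the lookup tests whether cattrs_i exists rather than whether it is truthy; and explicit loops are used because comprehensions are inlined on newer Python versions (3.12+), where the loop variable may no longer be visible in the frame locals once the exception has propagated. The helper names (structure_list, structure_dict, collect_path) are hypothetical and not part of cattrs.

```python
def structure_list(lst):
    out = []
    for cattrs_i, itm in enumerate(lst):  # cattrs_i marks the current index
        out.append(int(itm))
    return out


def structure_dict(dct):
    out = {}
    for cattrs_i, val in dct.items():  # cattrs_i marks the current key
        out[cattrs_i] = structure_list(val)
    return out


def collect_path(exc, marker="cattrs_i"):
    """Walk the traceback from outermost to innermost frame, collecting markers."""
    path = []
    tb = exc.__traceback__
    while tb is not None:
        f_locals = tb.tb_frame.f_locals
        if marker in f_locals:  # presence check, so index 0 is kept
            path.append(f_locals[marker])
        tb = tb.tb_next
    return path


def structure(data):
    try:
        return structure_dict(data)
    except (ValueError, TypeError) as exc:
        exc.path = collect_path(exc)
        raise


try:
    structure({"oak": ["0aa", 2], "birch": [9]})
except ValueError as exc:
    failed_path = exc.path
    print(failed_path)  # ['oak', 0]
```

The happy path pays nothing extra here: the traceback is only inspected after a conversion has already failed.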

Tinche · 2018-10-25T13:35:43Z

This is something I would definitely like to support, since getting an error somewhere deep can be very annoying indeed. Need to think about it.

vlcinsky · 2018-10-25T14:37:51Z

@Tinche take your time, it is not an easy problem.

Here is an alternative method: pass the path via an explicit argument to the conversion function:

"""Alternative: pass the path context via the argument `path`.

Converters have the signature: func(val, dtype, *path)
where `path` is the path to the current element (a tuple of selectors).

When calling, one unpacks the original `path` value with * and appends the
new selector at the end:

fun(val, dtype, *path, index)

which results in an extended `path` value within the deeper function.
"""


def int_structure_hook(val, dtype, *path):
    return int(val)


def list_structure_hook(lst, dtype, *path):
    return [int_structure_hook(itm, int, *path, i) for i, itm in enumerate(lst)]


def dict_structure_hook(dct, dtype, *path):
    return [list_structure_hook(val, list, *path, key) for key, val in dct.items()]


def structure(val, dtype):
    try:
        return dict_structure_hook(val, dict)
    except ValueError as exc:
        path = ()
        tb = exc.__traceback__
        while tb:
            # The innermost frame with a non-empty `path` local holds the full path.
            deeper_path = tb.tb_frame.f_locals.get("path")
            if deeper_path:
                path = deeper_path
            tb = tb.tb_next
        exc.path = path
        raise


def test_it():
    try:
        res = structure({"oak": [1, "0aa", 3], "birch": [9, 2, 0]}, dict)
        print(f"Happy result is: {res}")
    except ValueError as exc:
        print(f"Path {exc.path}: has problem: {exc}")
        assert exc.args[0] == "invalid literal for int() with base 10: '0aa'"
        assert isinstance(exc, ValueError)
        assert exc.path == ("oak", 1)

To avoid confusion with intermediate functions that also use a path argument, the traversal of __traceback__ could check that the given locals belong to a function that is registered with the converter.
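One way to perform that check is to compare each frame's code object with the code objects of the registered hooks. The sketch below assumes a plain set, REGISTERED_CODES, standing in for the converter's registry; that set, the register decorator, the parse_int helper and find_path are hypothetical names and not part of cattrs.

```python
REGISTERED_CODES = set()  # hypothetical stand-in for the converter's registry


def register(hook):
    """Remember the hook's code object so its frames can be recognised."""
    REGISTERED_CODES.add(hook.__code__)
    return hook


def parse_int(text, *path):
    # Unregistered helper: its `path` holds unrelated values.
    return int(text)


@register
def int_hook(val, dtype, *path):
    return parse_int(val, "base", 10)


@register
def list_hook(lst, dtype, *path):
    return [int_hook(itm, int, *path, i) for i, itm in enumerate(lst)]


@register
def dict_hook(dct, dtype, *path):
    return {key: list_hook(val, list, *path, key) for key, val in dct.items()}


def find_path(exc):
    """Return `path` from the innermost frame that belongs to a registered hook."""
    path = ()
    tb = exc.__traceback__
    while tb is not None:
        frame = tb.tb_frame
        if frame.f_code in REGISTERED_CODES:
            path = frame.f_locals.get("path", path)
        tb = tb.tb_next
    return path


try:
    dict_hook({"oak": [1, "0aa", 3]}, dict)
except ValueError as exc:
    found = find_path(exc)
    print(found)  # ('oak', 1)
```

Without the f_code check, the innermost non-empty path would come from parse_int and the reported path would be ('base', 10).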

petergaultney · 2018-12-13T13:15:41Z

Incidentally, one of the hardest things to debug is when you have a NoneType that can't be converted into whatever the expected type is. Without a path, currently there's no way to even guess at which of the many nulls in your input it's failing on.
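A note on the None case, as a minimal sketch: int(None) raises TypeError rather than ValueError, so a path-reporting wrapper that only catches ValueError would let these failures through without a path. The structure_ints helper below is hypothetical; it only shows catching both exception types while tracking the position explicitly.

```python
def structure_ints(data):
    """Convert {key: [values]} to ints, attaching the failing path to the error."""
    result = {}
    for key, values in data.items():
        converted = []
        for index, value in enumerate(values):
            try:
                converted.append(int(value))
            except (ValueError, TypeError) as exc:
                exc.path = [key, index]  # position of the offending element
                raise
        result[key] = converted
    return result


try:
    structure_ints({"oak": [1, 2], "birch": [9, None, 0]})
except TypeError as exc:
    none_path = exc.path
    print(f"Path {none_path}: {type(exc).__name__}")  # Path ['birch', 1]: TypeError
```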

madsmtm · 2019-01-24T20:12:38Z

We could copy a few ideas from jsonschema, and potentially yield errors iteratively? Or maybe that's not really in scope, since cattrs needs to return the new result.

But have a look at their ValidationError, there are a few fields there that we could potentially use.
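To illustrate the "yield errors iteratively" idea, here is a stdlib-only sketch that keeps going after a failure and yields one record per bad element. The PathError class and the iter_errors function are hypothetical names chosen for this example (loosely inspired by the path and message fields of jsonschema's ValidationError); they are not part of cattrs.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List


@dataclass
class PathError:
    path: List[Any]  # __getitem__ arguments leading to the bad element
    message: str     # text of the underlying exception
    instance: Any    # the offending input value


def iter_errors(data: Any, path: tuple = ()) -> Iterator[PathError]:
    """Yield a PathError for every leaf that cannot be converted to int."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from iter_errors(value, path + (key,))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            yield from iter_errors(value, path + (index,))
    else:
        try:
            int(data)
        except (ValueError, TypeError) as exc:
            yield PathError(list(path), str(exc), data)


errors = list(iter_errors({"oak": [1, "0aa", 3], "birch": [None, 2]}))
for err in errors:
    print(err.path, err.message)
```

A structure call could still return a single result on success and raise one exception carrying all collected records on failure, which would keep the usual return contract intact.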

Tmpod · 2020-11-16T16:32:19Z

Hey, sorry for kinda necro-posting, but has there been any progress on this? It would really be very handy to have this feature :)

Tinche · 2020-11-19T01:02:07Z

This is probably the next big feature I work on :)

Tmpod · 2020-11-22T15:44:54Z

Nice to hear it! Sorry for the question, but do you have any ETA for it?

Tinche · 2023-07-08T01:08:35Z

So there's https://catt.rs/en/stable/validation.html#transforming-exceptions-into-error-messages in the last release, 23.1.x.

I'm going to close this as complete, let's open new tickets for any desired improvements!

Tinche closed this as completed Jul 8, 2023
