structure: how to report path to invalid data element #47

Closed
vlcinsky opened this issue Oct 24, 2018 · 9 comments

Comments

@vlcinsky

  • cattrs version: 0.9.0
  • Python version: 3.6.ř
  • Operating System: Debian 9

Description

I want cattrs to load complex nested data and, when some validation or conversion fails, to give me reasonable context about which part of the data did not work.

What I Did

I have attrs-based classes: Config with attributes source, fetch and publish, each holding an instance of a specific attrs-based class (Source, Fetch and Publish).

If some data element is wrong (e.g. a number is expected and the string "5a" is provided), structuring fails with ValueError("could not convert string to float: '5a'",)

However, the error carries no contextual information about where in my nested input the bad value came from.

It would be nice to get some sort of path in the exception that I could use. marshmallow and trafaret are examples of similar libraries that provide such contextual information.

@vlcinsky
Author

Here is a possibly quite crazy idea for reporting an error together with the path within the input data that led to the failure.

Requirements:

  • initially focus only on structure and assume the input data is a dict
  • allow reporting the path to the input data element that caused the conversion error
  • keep the required changes to code (structure_hook functions) to a bare minimum
  • tolerate structure_hook implementations that do not follow the new approach (possibly at the cost of losing part of the path information)
  • run fast, avoiding any extra operations in the happy scenario

Concept:

  • the path is a list of __getitem__ arguments, e.g. ["oak", 1] for data["oak"][1]
  • no need to cover non-iterable data types
  • focus on iterables. Store the current position of the iteration in a local variable with the agreed name cattrs_i. This is the only change required in a structure_hook function implementation.
  • all path detection is done within the cattr.structure function
    • detection works by traversing the traceback, inspecting the local variables of each frame and collecting all values of cattrs_i variables into the resulting path list.
    • store the path in the .path attribute of the exception and re-raise the caught exception

Here is code demonstrating how to detect the path from an exception raised in a deeply nested call. If you save the code as test_path_detection.py, it should run under pytest (Python 3.6+ expected).

def int_structure_hook(val, dtype):
    return int(val)


def list_structure_hook(lst, dtype):
    return [int_structure_hook(itm, int) for cattrs_i, itm in enumerate(lst)]


def dict_structure_hook(dct, dtype):
    return [list_structure_hook(val, list) for cattrs_i, val in dct.items()]


def structure(val):
    try:
        return dict_structure_hook(val, dict)
    except ValueError as exc:
        path = []
        tb = exc.__traceback__
        while tb:
            path_elm = tb.tb_frame.f_locals.get("cattrs_i")
            if path_elm is not None:  # a plain truth test would drop index 0
                path.append(path_elm)
            tb = tb.tb_next
        exc.path = path
        raise  # bare raise re-raises the caught exception unchanged


def test_it():
    try:
        res = structure({"oak": [1, "0aa", 3], "birch": [9, 2, 0]})
        print(f"Happy result is: {res}")
    except ValueError as exc:
        print(f"Path {exc.path}: has problem: {exc}")

When run with:
$ pytest test_path_detection.py -sv
the output for the reported path is:

Path ['oak', 1]: has problem: invalid literal for int() with base 10: '0aa'

What do you think? The results are not perfect, but in many cases they help navigate close to the source of the problem. It would definitely require (small) modifications to existing converters.
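For reference, here is a variant of the traceback-walking sketch written with explicit for loops and a sentinel default. It rests on two assumptions: `cattrs_i` is only the naming convention proposed in this comment (not an existing cattrs API), and loops replace the comprehensions because Python 3.12 inlines comprehensions (PEP 709), so their loop variable may no longer be visible in the frame locals once the exception has propagated. `failing_path` is a hypothetical helper added just to show the result.

```python
_MISSING = object()  # sentinel: tells "no cattrs_i here" apart from 0 or ""


def int_structure_hook(val, dtype):
    return int(val)


def list_structure_hook(lst, dtype):
    result = []
    for cattrs_i, itm in enumerate(lst):  # cattrs_i: current list index
        result.append(int_structure_hook(itm, int))
    return result


def dict_structure_hook(dct, dtype):
    result = {}
    for cattrs_i, val in dct.items():  # cattrs_i: current dict key
        result[cattrs_i] = list_structure_hook(val, list)
    return result


def structure(val):
    try:
        return dict_structure_hook(val, dict)
    except ValueError as exc:
        path = []
        tb = exc.__traceback__
        while tb is not None:
            # Collect cattrs_i from every frame between here and the failure.
            elm = tb.tb_frame.f_locals.get("cattrs_i", _MISSING)
            if elm is not _MISSING:
                path.append(elm)
            tb = tb.tb_next
        exc.path = path
        raise  # bare raise keeps the original exception and traceback


def failing_path(data):
    """Return the path reported for `data`, or None if it structures cleanly."""
    try:
        structure(data)
    except ValueError as exc:
        return exc.path
    return None


print(failing_path({"oak": ["0aa", 2], "birch": [9]}))    # ['oak', 0]
print(failing_path({"oak": [1, 2], "birch": [9, "x"]}))   # ['birch', 1]
```

The first call shows why the sentinel matters: the failing element sits at index 0, which a plain truth test on the local would silently drop from the path.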

@Tinche
Member

Tinche commented Oct 25, 2018

This is something I would definitely like to support, since getting an error somewhere deep can be very annoying indeed. Need to think about it.

@vlcinsky
Author

@Tinche take your time, it is not an easy problem.

Here is an alternative method: pass the path via an explicit argument to the conversion function:

"""alternative passing path context via argument `path`

Converters have the signature: func(val, dtype, *path)
where `path` is the path to the current element (a tuple of keys and indexes)

When calling a nested converter, unpack the original `path` with * and append the new selector

fun(val, dtype, *path, index)

which results in an extended `path` value within the deeper function.
"""


def int_structure_hook(val, dtype, *path):
    return int(val)


def list_structure_hook(lst, dtype, *path):
    return [int_structure_hook(itm, int, *path, i) for i, itm in enumerate(lst)]


def dict_structure_hook(dct, dtype, *path):
    return [list_structure_hook(val, list, *path, key) for key, val in dct.items()]


def structure(val, dtype):
    try:
        return dict_structure_hook(val, dict)
    except ValueError as exc:
        path = []
        tb = exc.__traceback__
        while tb:
            deeper_path = tb.tb_frame.f_locals.get("path")
            if deeper_path:
                path = deeper_path
            tb = tb.tb_next
        exc.path = path
        raise  # bare raise re-raises the caught exception unchanged


def test_it():
    try:
        res = structure({"oak": [1, "0aa", 3], "birch": [9, 2, 0]}, dict)
        print(f"Happy result is: {res}")
    except ValueError as exc:
        print(f"Path {exc.path}: has problem: {exc}")
        assert exc.args[0] == "invalid literal for int() with base 10: '0aa'"
        assert isinstance(exc, ValueError)
        assert exc.path == ("oak", 1)

To avoid confusion with intermediate functions that happen to use a path argument, the __traceback__ traversal could check that the given locals belong to a function registered with the converter.

@petergaultney
Contributor

Incidentally, one of the hardest things to debug is when you have a NoneType that can't be converted into whatever the expected type is. Without a path, currently there's no way to even guess at which of the many nulls in your input it's failing on.

@madsmtm
Contributor

madsmtm commented Jan 24, 2019

We could copy a few ideas from jsonschema, and potentially yield errors iteratively? Or maybe that's not really in scope, since cattrs needs to return the new result.

But have a look at their ValidationError; there are a few fields there that we could potentially use.
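As a rough illustration of that idea, the sketch below collects every conversion failure instead of stopping at the first one. `StructureError` and `iter_errors` are hypothetical names; the `message`, `path` and `instance` attributes are modelled loosely on the jsonschema ValidationError fields of the same names. It only handles a dict of lists of ints.

```python
from collections import deque


class StructureError(ValueError):
    """Hypothetical error carrying context, loosely modelled on jsonschema."""

    def __init__(self, message, path, instance):
        super().__init__(message)
        self.message = message
        self.path = deque(path)   # keys and indexes leading to the bad element
        self.instance = instance  # the offending input value


def iter_errors(data):
    """Lazily yield one StructureError per element that int() rejects."""
    for key, items in data.items():
        for index, item in enumerate(items):
            try:
                int(item)
            except (TypeError, ValueError) as exc:
                yield StructureError(str(exc), [key, index], item)


errors = list(iter_errors({"oak": [1, "0aa", None], "birch": [9, 2, 0]}))
for err in errors:
    print(list(err.path), repr(err.instance))
# ['oak', 1] '0aa'
# ['oak', 2] None
```

Since structuring still has to return a result, a generator like this would fit better as a separate validation pass than as a replacement for structure itself.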

@Tmpod

Tmpod commented Nov 16, 2020

Hey, sorry for the kinda necro-post, but has there been any progress on this? It would be really handy to have this feature :)

@Tinche
Member

Tinche commented Nov 19, 2020

This is probably the next big feature I work on :)

@Tmpod

Tmpod commented Nov 22, 2020

Nice to hear it! Sorry for the question, but do you have any ETA for it?

@Tinche
Member

Tinche commented Jul 8, 2023

So there's https://catt.rs/en/stable/validation.html#transforming-exceptions-into-error-messages in the last release, 23.1.x.

I'm going to close this as complete, let's open new tickets for any desired improvements!

@Tinche Tinche closed this as completed Jul 8, 2023