Skip to content

Conversation

Gr1N
Copy link
Contributor

@Gr1N Gr1N commented May 16, 2018

Hi!

This is my first try to improve errors format (discussion can be found in #162). I'd like to get review on changes and discuss some points.

At first here are simple example which will be used to show new errors format:

from typing import Dict, List, Union
from uuid import UUID

from pydantic import BaseModel, ValidationError


class TestDict(BaseModel):
    x: int
    y: int
    z: str


class TestModel(BaseModel):
    a: List[TestDict]
    b: int
    c: TestDict
    d: Union[int, UUID]
    e: Dict[int, int]
    f: List[Union[int, UUID]]


try:
    TestModel.parse_obj({
        'a': [
            {
                'x': 'not_int',
                'y': 42,
                'z': 'string',
            },
        ],
        'b': 'string',
        'c': {
            'y': 42,
        },
        'd': 'string',
        'e': {
            'foo': 'bar',
        },
        'f': [
            'string',
        ],
    })
except ValidationError as e:
    print(e.json())

And errors output:

[
  {
    "loc": "a.0.x",
    "msg": "invalid literal for int() with base 10: 'not_int'",
    "type": "value_error"
  },
  {
    "loc": "b",
    "msg": "invalid literal for int() with base 10: 'string'",
    "type": "value_error"
  },
  {
    "loc": "c.x",
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": "c.z",
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": "d",
    "msg": "invalid literal for int() with base 10: 'string'",
    "type": "value_error"
  },
  {
    "loc": "d",
    "msg": "badly formed hexadecimal UUID string",
    "type": "value_error"
  },
  {
    "loc": "e.key",
    "msg": "invalid literal for int() with base 10: 'foo'",
    "type": "value_error"
  },
  {
    "loc": "f.0",
    "msg": "invalid literal for int() with base 10: 'string'",
    "type": "value_error"
  },
  {
    "loc": "f.0",
    "msg": "badly formed hexadecimal UUID string",
    "type": "value_error"
  }
]

Questions:

  1. Is this format good to go (msg kept for backward compatibility and can be removed, context can be easily added)?
  2. As you can see we are using key postfix if error occurred in dictionary key, is it okay?
  3. As you can see there are two errors for field d (and f.0) because they are complex and contains sub fields, I think we need to expose this information in loc, maybe by using postfix, something like d.sub.0 and d.sub.1. Any ideas?
  4. Big question how to display errors in plain text? For now it's very simple:
a.0.x
  invalid literal for int() with base 10: 'not_int' (type=value_error)
b
  invalid literal for int() with base 10: 'string' (type=value_error)
c.x
  field required (type=value_error.missing)
c.z
  field required (type=value_error.missing)
d
  invalid literal for int() with base 10: 'string' (type=value_error)
d
  badly formed hexadecimal UUID string (type=value_error)
e.key
  invalid literal for int() with base 10: 'foo' (type=value_error)
f.0
  invalid literal for int() with base 10: 'string' (type=value_error)
f.0
  badly formed hexadecimal UUID string (type=value_error)

TODO:

  • Discuss all questions
  • Write tests for exceptions.py module
  • Write tests for new functions in utils.py module

@codecov
Copy link

codecov bot commented May 16, 2018

Codecov Report

Merging #179 into master will decrease coverage by 0.1%.
The diff coverage is 98.91%.

@@            Coverage Diff             @@
##           master     #179      +/-   ##
==========================================
- Coverage     100%   99.89%   -0.11%     
==========================================
  Files           9        9              
  Lines         988      992       +4     
  Branches      209      205       -4     
==========================================
+ Hits          988      991       +3     
- Partials        0        1       +1

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very good, thank you very much.

I haven't been through everything in detail, but here are my initial thoughts.

To answer your questions:

  1. I don't mind whether we do it in one PR or multiple but I think it'll cause less pain for users if we release all the changes in one go, eg. before we next release we should move to context etc. Maybe a new PR.
  2. Maybe we should use __key__ to reduce the chance of confusion with a real key?
  3. I think we should have one error per location. The point here is to provide the most usful information and duplicate error will be a brainfuck for end uers.
  4. I think this looks good


@property
def display_errors(self):
return '\n'.join(' ' * i + msg for i, msg in _render_errors(self.errors_dict))
def flatten_errors(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is a property it's name shouldn't be a verb, just flat_errors is good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree



Error = namedtuple('Error', ['exc', 'track', 'index'])
class Error:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a smaller performance gain, can we use __slots__ here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this class isn't required at all and we should just store a tuple, named tuple or dict?

class Error:
def __init__(self, exc: Exception, *, loc: Union[str, int] = None) -> None:
self.exc_info = exc
self.exc_type = type(exc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this isn't necessary as we can call type(self.exc_info) when it's needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

for i, v_ in v_iter:
single_result, single_errors = self._validate_singleton(v_, values, i, cls)
single_result, single_errors = self._validate_singleton(v_, values, f'{loc}.{i}', cls)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should keep loc as a tuple until it's rendered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@samuelcolvin
Copy link
Member

Also we will need to do something if a key includes a dot.

I'm not sure what the solution is, but I guess the first step is to store the loc as a tuple so users have the option to work on the raw tuple.

We could:

  • replace with \. using backslashes will be very ugly in json
  • replace with <dot> then we have to escape lt and gt
  • maybe clearer to just return a list for loc if it has more than one item.

not sure. ideas?

@Gr1N
Copy link
Contributor Author

Gr1N commented May 18, 2018

I don't mind whether we do it in one PR or multiple but I think it'll cause less pain for users if we release all the changes in one go, eg. before we next release we should move to context etc. Maybe a new PR.

Agree with you, better to ship all changes in one release. I want to introduce context in another PR, this PR is too big already and it can be hard to review if I add more changes.

Maybe we should use __key__ to reduce the chance of confusion with a real key?

Good idea, also I think in future we can add ability to change __key__ using config.

I think we should have one error per location. The point here is to provide the most usful information and duplicate error will be a brainfuck for end uers.

I think we can introduce one error, something like:

types = ' or '.join([field.type_ for field in self.sub_fields])
TypeError('f'value is not a valid {types}, got {display_as_type(v)}'')

What do you think?

maybe clearer to just return a list for loc if it has more than one item.

Let's go with this option, it's simple to implement and can be very useful for developers.

@Gr1N
Copy link
Contributor Author

Gr1N commented May 19, 2018

@samuelcolvin ping :)

@samuelcolvin
Copy link
Member

I'm away, will look on Wednesday.

@Gr1N
Copy link
Contributor Author

Gr1N commented May 20, 2018

One more example to discuss about sub fields:

class TestDict(BaseModel):
    x: int
    y: int
    z: str


class TestModel(BaseModel):
    d: Union[int, TestDict]


try:
    TestModel.parse_obj({
        'd': {
            'x': 42,
        },
    })
except ValidationError as e:
    print(e.json())
[
  {
    "loc": [
      "d"
    ],
    "msg": "int() argument must be a string, a bytes-like object or a number, not 'dict'",
    "type": "type_error"
  },
  {
    "loc": [
      "d",
      "y"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": [
      "d",
      "z"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  }
]

Looking at this example I have no idea what to do with errors.

@samuelcolvin
Copy link
Member

Good idea, also I think in future we can add ability to change key using config.

Regarding multiple errors in the same location. There are basically two cases:

1. all type errors

We get a type error when trying each sub-type. Here we should definitely try to return one saying not an instance of foo, bar or spam, I guess with a type like type_error.union.

This might require some changes to the errors we raise. For example in your example with Union[int, UUID] we should probably change the errors on both int and UUID to always be TypeErrors. This is really correct, since xyz is not of type UUID rather than a bad a value of UUID.

2. value errors / not all type errors.

I guess here we really have to return all the errors. It's ugly, but it's the only way of retaining all the information someone might need.

I'm sorry to change my mind but I guess the best way of doing this is to return each error separately, so have duplicate locations in the list of errors. We can do something pretty when displaying errors so we have

d
  invalid literal for int() with base 10: 'string' (type=value_error)
  badly formed hexadecimal UUID string (type=value_error)

I guess we can add {'union': True} to the context when we implement that to give extra information.

@Gr1N
Copy link
Contributor Author

Gr1N commented May 22, 2018

I guess here we really have to return all the errors. It's ugly, but it's the only way of retaining all the information someone might need.
I'm sorry to change my mind but I guess the best way of doing this is to return each error separately, so have duplicate locations in the list of errors.

No problem, it's already implemented so I don't need to change anything.

We can do something pretty when displaying errors so we have

I don't think that we need to merge error messages, let's keep it as simple as possible.

@samuelcolvin please review code and let me know if I need something to improve.

@samuelcolvin
Copy link
Member

ok, will do, busy today so might be tomorrow.

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, mostly small things.

I think the larger changes can all be left to other PRs.

return str(type_)

class Error:
__slots__ = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, can we keep this concise:

__slots__ = 'exc_info', 'loc'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed



class ValidationError(ValueError):
__slots__ = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed



def display_errors(errors):
display = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think

display = [
    ' -> '.join(e['loc']) + f'\n  {e["msg"]} (type={e["type"]})' for e in errors
]

or something would be more elegant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

return '\n'.join(display)


def flatten_errors(errors, *, loc=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cleaner to make this a generator then call list(flatten_errors(...)) where it's called above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please describe why you want to see this as generator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be much cleaner and more explicit with less code.

def flatten_errors(errors, *, loc=None):
    for error in errors:
        if isinstance(error, Error):
            if isinstance(error.exc_info, ValidationError):
                yield from flatten_errors(error.exc_info.errors, loc=error.loc)
            else:
                yield {
                    'loc': error.loc if loc is None else loc + error.loc,
                    'msg': error.msg,
                    'type': error.type_,
                }
        elif isinstance(error, list):
            yield from flatten_errors(error)
        else:
            raise RuntimeError(f'Unknown error object: {error}')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great! I forgot about yield from

elif isinstance(error, list):
flat.extend(flatten_errors(error))
else:
raise TypeError(f'Unknown error object: {error}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should raise another type exception here (eg.RuntimeError) to avoid confusion with type errors elsewhere in pydantic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, RuntimeError better in this case


@property
def display_errors(self):
return '\n'.join(' ' * i + msg for i, msg in _render_errors(self.errors_dict))
def flat_errors(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change this to be a method and internal. eg. _flat_errors()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, for example in my case I would like to access .flat_errors in my project and convert this simple representation to something else, also I don't want to use .json() (or .serialize()) I need Python object at first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, can you just make it a method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

'type': error.type_,
})
elif isinstance(error, list):
flat.extend(flatten_errors(error))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should only happen in the case of errors from a Union here we should check if all errors are type_errors and if so group them into on type_error.union.

Again maybe this should wait for another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, but I think better place to implement logic like this in method _validate_singleton, where we're iterating thought sub fields. And yes, let's discuss this in another PR.


result, errors = {}, []
for k, v_ in v_iter.items():
key_result, key_errors = self.key_field.validate(k, values, 'key', cls)
v_loc = loc, '__key__'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible for loc to already be a tuple here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be, found an issue, thanks!

None,
],
})
assert exc_info.value.display_errors == """a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easier to read if you do

    assert exc_info.value.display_errors == """\
a
  invalid literal for int() with base 10: 'not_int' (type=value_error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed



def test_validation_error_display_errors():
with pytest.raises(ValidationError) as exc_info:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe of these tests can be rewritten as one parameterized test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@Gr1N
Copy link
Contributor Author

Gr1N commented May 23, 2018

@samuelcolvin review it again please

@samuelcolvin samuelcolvin merged commit 31683f8 into pydantic:master May 23, 2018
@samuelcolvin
Copy link
Member

awesome, thank you very much. We'll need to remember to update history before the next release.

@Gr1N Gr1N deleted the errors-format branch May 23, 2018 14:00
@Gr1N Gr1N mentioned this pull request May 31, 2018
alexdrydew pushed a commit to alexdrydew/pydantic that referenced this pull request Dec 23, 2023
* changes to sequences to improve performance

* add list[any] pydantic benchmark
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants