New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors format #179

Merged
merged 13 commits into from May 23, 2018

Conversation

2 participants
@Gr1N
Collaborator

Gr1N commented May 16, 2018

Hi!

This is my first try to improve errors format (discussion can be found in #162). I'd like to get review on changes and discuss some points.

At first here are simple example which will be used to show new errors format:

from typing import Dict, List, Union
from uuid import UUID

from pydantic import BaseModel, ValidationError


class TestDict(BaseModel):
    x: int
    y: int
    z: str


class TestModel(BaseModel):
    a: List[TestDict]
    b: int
    c: TestDict
    d: Union[int, UUID]
    e: Dict[int, int]
    f: List[Union[int, UUID]]


try:
    TestModel.parse_obj({
        'a': [
            {
                'x': 'not_int',
                'y': 42,
                'z': 'string',
            },
        ],
        'b': 'string',
        'c': {
            'y': 42,
        },
        'd': 'string',
        'e': {
            'foo': 'bar',
        },
        'f': [
            'string',
        ],
    })
except ValidationError as e:
    print(e.json())

And errors output:

[
  {
    "loc": "a.0.x",
    "msg": "invalid literal for int() with base 10: 'not_int'",
    "type": "value_error"
  },
  {
    "loc": "b",
    "msg": "invalid literal for int() with base 10: 'string'",
    "type": "value_error"
  },
  {
    "loc": "c.x",
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": "c.z",
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": "d",
    "msg": "invalid literal for int() with base 10: 'string'",
    "type": "value_error"
  },
  {
    "loc": "d",
    "msg": "badly formed hexadecimal UUID string",
    "type": "value_error"
  },
  {
    "loc": "e.key",
    "msg": "invalid literal for int() with base 10: 'foo'",
    "type": "value_error"
  },
  {
    "loc": "f.0",
    "msg": "invalid literal for int() with base 10: 'string'",
    "type": "value_error"
  },
  {
    "loc": "f.0",
    "msg": "badly formed hexadecimal UUID string",
    "type": "value_error"
  }
]

Questions:

  1. Is this format good to go (msg kept for backward compatibility and can be removed, context can be easily added)?
  2. As you can see we are using key postfix if error occurred in dictionary key, is it okay?
  3. As you can see there are two errors for field d (and f.0) because they are complex and contains sub fields, I think we need to expose this information in loc, maybe by using postfix, something like d.sub.0 and d.sub.1. Any ideas?
  4. Big question how to display errors in plain text? For now it's very simple:
a.0.x
  invalid literal for int() with base 10: 'not_int' (type=value_error)
b
  invalid literal for int() with base 10: 'string' (type=value_error)
c.x
  field required (type=value_error.missing)
c.z
  field required (type=value_error.missing)
d
  invalid literal for int() with base 10: 'string' (type=value_error)
d
  badly formed hexadecimal UUID string (type=value_error)
e.key
  invalid literal for int() with base 10: 'foo' (type=value_error)
f.0
  invalid literal for int() with base 10: 'string' (type=value_error)
f.0
  badly formed hexadecimal UUID string (type=value_error)

TODO:

  • Discuss all questions
  • Write tests for exceptions.py module
  • Write tests for new functions in utils.py module

Gr1N added some commits May 14, 2018

@codecov

This comment has been minimized.

codecov bot commented May 16, 2018

Codecov Report

Merging #179 into master will decrease coverage by 0.1%.
The diff coverage is 98.91%.

@@            Coverage Diff             @@
##           master     #179      +/-   ##
==========================================
- Coverage     100%   99.89%   -0.11%     
==========================================
  Files           9        9              
  Lines         988      992       +4     
  Branches      209      205       -4     
==========================================
+ Hits          988      991       +3     
- Partials        0        1       +1
@samuelcolvin

Looking very good, thank you very much.

I haven't been through everything in detail, but here are my initial thoughts.

To answer your questions:

  1. I don't mind whether we do it in one PR or multiple but I think it'll cause less pain for users if we release all the changes in one go, eg. before we next release we should move to context etc. Maybe a new PR.
  2. Maybe we should use __key__ to reduce the chance of confusion with a real key?
  3. I think we should have one error per location. The point here is to provide the most usful information and duplicate error will be a brainfuck for end uers.
  4. I think this looks good
@property
def display_errors(self):
return '\n'.join(' ' * i + msg for i, msg in _render_errors(self.errors_dict))
def flatten_errors(self):

This comment has been minimized.

@samuelcolvin

samuelcolvin May 17, 2018

Owner

if this is a property it's name shouldn't be a verb, just flat_errors is good.

This comment has been minimized.

@Gr1N

Gr1N May 18, 2018

Collaborator

agree

Error = namedtuple('Error', ['exc', 'track', 'index'])
class Error:

This comment has been minimized.

@samuelcolvin

samuelcolvin May 17, 2018

Owner

for a smaller performance gain, can we use __slots__ here.

This comment has been minimized.

@samuelcolvin

samuelcolvin May 17, 2018

Owner

maybe this class isn't required at all and we should just store a tuple, named tuple or dict?

class Error:
def __init__(self, exc: Exception, *, loc: Union[str, int] = None) -> None:
self.exc_info = exc
self.exc_type = type(exc)

This comment has been minimized.

@samuelcolvin

samuelcolvin May 17, 2018

Owner

I guess this isn't necessary as we can call type(self.exc_info) when it's needed?

This comment has been minimized.

@Gr1N

Gr1N May 18, 2018

Collaborator

yep

for i, v_ in v_iter:
single_result, single_errors = self._validate_singleton(v_, values, i, cls)
single_result, single_errors = self._validate_singleton(v_, values, f'{loc}.{i}', cls)

This comment has been minimized.

@samuelcolvin

samuelcolvin May 17, 2018

Owner

perhaps we should keep loc as a tuple until it's rendered?

This comment has been minimized.

@Gr1N

Gr1N May 18, 2018

Collaborator

yep

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented May 17, 2018

Also we will need to do something if a key includes a dot.

I'm not sure what the solution is, but I guess the first step is to store the loc as a tuple so users have the option to work on the raw tuple.

We could:

  • replace with \. using backslashes will be very ugly in json
  • replace with <dot> then we have to escape lt and gt
  • maybe clearer to just return a list for loc if it has more than one item.

not sure. ideas?

@Gr1N

This comment has been minimized.

Collaborator

Gr1N commented May 18, 2018

I don't mind whether we do it in one PR or multiple but I think it'll cause less pain for users if we release all the changes in one go, eg. before we next release we should move to context etc. Maybe a new PR.

Agree with you, better to ship all changes in one release. I want to introduce context in another PR, this PR is too big already and it can be hard to review if I add more changes.

Maybe we should use __key__ to reduce the chance of confusion with a real key?

Good idea, also I think in future we can add ability to change __key__ using config.

I think we should have one error per location. The point here is to provide the most usful information and duplicate error will be a brainfuck for end uers.

I think we can introduce one error, something like:

types = ' or '.join([field.type_ for field in self.sub_fields])
TypeError('f'value is not a valid {types}, got {display_as_type(v)}'')

What do you think?

maybe clearer to just return a list for loc if it has more than one item.

Let's go with this option, it's simple to implement and can be very useful for developers.

@Gr1N

This comment has been minimized.

Collaborator

Gr1N commented May 19, 2018

@samuelcolvin ping :)

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented May 19, 2018

I'm away, will look on Wednesday.

@Gr1N

This comment has been minimized.

Collaborator

Gr1N commented May 20, 2018

One more example to discuss about sub fields:

class TestDict(BaseModel):
    x: int
    y: int
    z: str


class TestModel(BaseModel):
    d: Union[int, TestDict]


try:
    TestModel.parse_obj({
        'd': {
            'x': 42,
        },
    })
except ValidationError as e:
    print(e.json())
[
  {
    "loc": [
      "d"
    ],
    "msg": "int() argument must be a string, a bytes-like object or a number, not 'dict'",
    "type": "type_error"
  },
  {
    "loc": [
      "d",
      "y"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  },
  {
    "loc": [
      "d",
      "z"
    ],
    "msg": "field required",
    "type": "value_error.missing"
  }
]

Looking at this example I have no idea what to do with errors.

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented May 22, 2018

Good idea, also I think in future we can add ability to change key using config.

Regarding multiple errors in the same location. There are basically two cases:

1. all type errors

We get a type error when trying each sub-type. Here we should definitely try to return one saying not an instance of foo, bar or spam, I guess with a type like type_error.union.

This might require some changes to the errors we raise. For example in your example with Union[int, UUID] we should probably change the errors on both int and UUID to always be TypeErrors. This is really correct, since xyz is not of type UUID rather than a bad a value of UUID.

2. value errors / not all type errors.

I guess here we really have to return all the errors. It's ugly, but it's the only way of retaining all the information someone might need.

I'm sorry to change my mind but I guess the best way of doing this is to return each error separately, so have duplicate locations in the list of errors. We can do something pretty when displaying errors so we have

d
  invalid literal for int() with base 10: 'string' (type=value_error)
  badly formed hexadecimal UUID string (type=value_error)

I guess we can add {'union': True} to the context when we implement that to give extra information.

@Gr1N

This comment has been minimized.

Collaborator

Gr1N commented May 22, 2018

I guess here we really have to return all the errors. It's ugly, but it's the only way of retaining all the information someone might need.
I'm sorry to change my mind but I guess the best way of doing this is to return each error separately, so have duplicate locations in the list of errors.

No problem, it's already implemented so I don't need to change anything.

We can do something pretty when displaying errors so we have

I don't think that we need to merge error messages, let's keep it as simple as possible.

@samuelcolvin please review code and let me know if I need something to improve.

@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented May 22, 2018

ok, will do, busy today so might be tomorrow.

@samuelcolvin

Looking good, mostly small things.

I think the larger changes can all be left to other PRs.

return str(type_)
class Error:
__slots__ = (

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

nitpick, can we keep this concise:

__slots__ = 'exc_info', 'loc'

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

fixed

class ValidationError(ValueError):
__slots__ = (

This comment has been minimized.

@samuelcolvin

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

fixed

def display_errors(errors):
display = []

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

i think

display = [
    ' -> '.join(e['loc']) + f'\n  {e["msg"]} (type={e["type"]})' for e in errors
]

or something would be more elegant

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

updated

return '\n'.join(display)
def flatten_errors(errors, *, loc=None):

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

I think cleaner to make this a generator then call list(flatten_errors(...)) where it's called above.

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

Can you please describe why you want to see this as generator?

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

it will be much cleaner and more explicit with less code.

def flatten_errors(errors, *, loc=None):
    for error in errors:
        if isinstance(error, Error):
            if isinstance(error.exc_info, ValidationError):
                yield from flatten_errors(error.exc_info.errors, loc=error.loc)
            else:
                yield {
                    'loc': error.loc if loc is None else loc + error.loc,
                    'msg': error.msg,
                    'type': error.type_,
                }
        elif isinstance(error, list):
            yield from flatten_errors(error)
        else:
            raise RuntimeError(f'Unknown error object: {error}')

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

great! I forgot about yield from

elif isinstance(error, list):
flat.extend(flatten_errors(error))
else:
raise TypeError(f'Unknown error object: {error}')

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

Maybe we should raise another type exception here (eg.RuntimeError) to avoid confusion with type errors elsewhere in pydantic?

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

agree, RuntimeError better in this case

@property
def display_errors(self):
return '\n'.join(' ' * i + msg for i, msg in _render_errors(self.errors_dict))
def flat_errors(self):

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

I think we should change this to be a method and internal. eg. _flat_errors()

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

I don't think so, for example in my case I would like to access .flat_errors in my project and convert this simple representation to something else, also I don't want to use .json() (or .serialize()) I need Python object at first.

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

ok, can you just make it a method.

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

done

'type': error.type_,
})
elif isinstance(error, list):
flat.extend(flatten_errors(error))

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

I think this should only happen in the case of errors from a Union here we should check if all errors are type_errors and if so group them into on type_error.union.

Again maybe this should wait for another PR.

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

You're right, but I think better place to implement logic like this in method _validate_singleton, where we're iterating thought sub fields. And yes, let's discuss this in another PR.

result, errors = {}, []
for k, v_ in v_iter.items():
key_result, key_errors = self.key_field.validate(k, values, 'key', cls)
v_loc = loc, '__key__'

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

is it possible for loc to already be a tuple here?

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

it can be, found an issue, thanks!

None,
],
})
assert exc_info.value.display_errors == """a

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

easier to read if you do

    assert exc_info.value.display_errors == """\
a
  invalid literal for int() with base 10: 'not_int' (type=value_error)

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

fixed

def test_validation_error_display_errors():
with pytest.raises(ValidationError) as exc_info:

This comment has been minimized.

@samuelcolvin

samuelcolvin May 23, 2018

Owner

I think maybe of these tests can be rewritten as one parameterized test.

This comment has been minimized.

@Gr1N

Gr1N May 23, 2018

Collaborator

fixed

Gr1N added some commits May 23, 2018

@Gr1N

This comment has been minimized.

Collaborator

Gr1N commented May 23, 2018

@samuelcolvin review it again please

@samuelcolvin samuelcolvin merged commit 31683f8 into samuelcolvin:master May 23, 2018

3 checks passed

codecov/project 99.89% (-0.11%) compared to 9061cb2
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
deploy/netlify Deploy preview ready!
Details
@samuelcolvin

This comment has been minimized.

Owner

samuelcolvin commented May 23, 2018

awesome, thank you very much. We'll need to remember to update history before the next release.

@Gr1N Gr1N deleted the Gr1N:errors-format branch May 23, 2018

@Gr1N Gr1N referenced this pull request May 23, 2018

Closed

Errors format is too complex #162

@Gr1N Gr1N referenced this pull request May 31, 2018

Merged

Error context and message #183

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment