Treat NaN as None/null #1779

Closed
vkolotov opened this issue Jul 29, 2020 · 6 comments

Comments

@vkolotov

vkolotov commented Jul 29, 2020

Feature Request

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.6.1
            pydantic compiled: True
                 install path: /Users/*/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pydantic
               python version: 3.8.0 (default, Apr  1 2020, 05:50:17)  [Clang 10.0.1 (clang-1001.0.46.4)]
                     platform: macOS-10.14.6-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

In our project we have to deal with pandas. Pandas represents missing values (nulls) as NaN (which is odd in the first place, but we have to live with it). We use pydantic to validate data generated by pandas: we convert a pandas DataFrame to a dictionary (or a list of dictionaries) and then convert the dictionaries to pydantic dataclasses. The resulting dataclasses are serialised to JSON (and then stored in a DB or in Elasticsearch).

The problem is that DataFrames may contain NaN values. These get into the pydantic dataclasses without any error (NaN is an instance of float), and are then serialised to JSON as nulls, which is wrong because those fields are not optional.
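A quick check with the standard library illustrates both halves of this: NaN passes an isinstance check for float, and what ends up in the JSON depends on the encoder. Python's built-in json module, for instance, writes the non-standard token NaN rather than null by default, and raises ValueError if allow_nan=False is passed. This sketch uses neither pandas nor pydantic; it only shows the float/JSON behaviour the problem relies on.

```python
import json
import math

value = float("nan")

# NaN is a perfectly valid float, so a plain "is this a float?" check accepts it.
print(isinstance(value, float))  # True
print(math.isnan(value))  # True

# The stdlib encoder emits the non-standard NaN token by default...
print(json.dumps({"a": value}))  # {"a": NaN}

# ...and refuses the value when strict JSON is requested.
try:
    json.dumps({"a": value}, allow_nan=False)
except ValueError as exc:
    print("rejected:", exc)
```

Other encoders (for example the one used by a DB or Elasticsearch client) may map NaN to null instead, which is how the missing value silently turns into a null in the stored document.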

We would like to catch this "nullability" issue as part of normal pydantic validation: if a float field that is not Optional is assigned NaN, we would like to get a validation error (e.g. "none is not an allowed value").

Providing custom validators via @validator or via __get_validators__ is not an option, as we don't want to specify them for every dataclass. It would be great if we could either:

  1. specify a replacement for pydantic.validators.float_validator and make it the global default, or
  2. "configure" pydantic somehow so that the existing validators.float_validator handles NaNs correctly.

There is another solution that we currently use: defining a custom type for float. It works, but it requires using this type in every dataclass, so it is still not great.
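For reference, a custom float type of this kind can be written with pydantic v1's `__get_validators__` protocol: when a field is annotated with such a type, pydantic v1 calls the yielded validators instead of its built-in float validator. The sketch below is only an illustration. The name `NonNanFloat` is hypothetical, it only accepts int and float inputs (stricter than pydantic's own float coercion, which also parses strings), and it does not import pydantic, so the validation logic can be exercised on its own.

```python
import math
from typing import Any, Callable, Iterator


class NonNanFloat(float):
    """Hypothetical float type that rejects NaN (pydantic v1 custom-type protocol)."""

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[[Any], float]]:
        # pydantic v1 calls each yielded validator in turn.
        yield cls.validate

    @classmethod
    def validate(cls, v: Any) -> float:
        # Reject bools explicitly, since bool is a subclass of int.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"expected a number, got {type(v).__name__}")
        result = float(v)
        if math.isnan(result):
            # pydantic v1 reports a ValueError as a normal validation error.
            raise ValueError("NaN is not an allowed value")
        return result


print(NonNanFloat.validate(3))  # 3.0
try:
    NonNanFloat.validate(float("nan"))
except ValueError as exc:
    print(exc)  # NaN is not an allowed value
```

In a model this would be used as `a: NonNanFloat`, which is exactly the per-dataclass annotation the issue would like to avoid.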

@PrettyWood
Member

PrettyWood commented Aug 4, 2020

Hello @vkolotov
Thank you for your feedback and for thinking through different solutions. But your use case seems very specific, and I'm afraid the desired feature would add a lot of complexity for no real benefit.
Wouldn't a solution like this one help?

from math import isnan

from pydantic import BaseModel as PydanticBaseModel, validator


class BaseModel(PydanticBaseModel):
    @validator('*')
    def change_nan_to_none(cls, v, field):
        if field.outer_type_ is float and isnan(v):
            return None
        return v


class Model(BaseModel):
    a: float
    b: float


class Model2(BaseModel):
    m: Model
    n: float


model = Model(a=float('nan'), b=3.1)
print(repr(model))
print(repr(Model2(m=model, n=float('nan'))))

Output:

Model(a=None, b=3.1)
Model2(m=Model(a=None, b=3.1), n=None)

@stephprobst

stephprobst commented Nov 29, 2020

I think there is an error in the signature of change_nan_to_none, which will cause an error. The correct signature should be:

def change_nan_to_none(cls, v, values, config, field):

@ToddG

ToddG commented Jul 23, 2021

I think there is an error in the signature of change_nan_to_none, which will cause an error. The correct signature should be:

def change_nan_to_none(cls, v, values, config, field):

According to the docs, the original method signature is correct, and your additions are also correct, but optional:

A few things to note on validators:

    validators are "class methods", so the first argument value they receive is the UserModel class, not an instance of UserModel.
    the second argument is always the field value to validate; it can be named as you please
    you can also add any subset of the following arguments to the signature (the names must match):
        values: a dict containing the name-to-value mapping of any previously-validated fields
        config: the model config
        field: the field being validated. Type of object is pydantic.fields.ModelField.
        **kwargs: if provided, this will include the arguments above not explicitly listed in the signature

See: https://pydantic-docs.helpmanual.io/usage/validators/

@matthiaskoenig

How would I make the provided solution work with JSON serialization? Creating the models works, but reading a model back from JSON fails, because the NaN has been replaced by None and the data now contains null:

  File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Model
a
  none is not an allowed value (type=type_error.none.not_allowed)
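
import json
import tempfile
from pathlib import Path

# `Model` and `model` are the ones from the example above
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "example.json"
    with open(json_path, "w") as f_json:
        f_json.write(model.json())  # a=None is written as null

    with open(json_path, "r") as f_json:
        d = json.load(f_json)
        model2 = Model(**d)  # fails: null is not allowed for `a: float`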

I tried changing the field to `a: Optional[float]`, but then I get this error:

pydantic.error_wrappers.ValidationError: 1 validation error for Model
a
  must be real number, not NoneType (type=type_error)
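The wording of this second error matches what `math.isnan` raises when it is given `None`, so a likely explanation (not confirmed in this thread) is that the `'*'` validator from the example above now also runs for the `None` value of the `Optional[float]` field and calls `isnan(None)`; pydantic v1 reports a `TypeError` raised inside a validator as `type=type_error`. Testing the type before calling `isnan` avoids that. The helper `nan_to_none` below is hypothetical plain Python; inside the validator it could replace the inline check, e.g. `return nan_to_none(v)`.

```python
import math
from typing import Any


def nan_to_none(v: Any) -> Any:
    """Hypothetical helper: map float NaN to None, pass everything else through."""
    # Check the type first: math.isnan(None) raises TypeError.
    if isinstance(v, float) and math.isnan(v):
        return None
    return v


# The unguarded call raises the TypeError whose message appears in the error above.
try:
    math.isnan(None)
except TypeError as exc:
    print(exc)
```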

How would I change the model so that NaNs also work with JSON?

NaN/Inf/-Inf values are very common in scientific contexts.
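If the goal is to keep NaN/Inf/-Inf through a JSON round trip rather than turning them into null, note that Python's standard json module already writes and reads the non-standard tokens NaN, Infinity and -Infinity by default. Whether a given model's JSON output keeps them depends on the encoder and pydantic version in use, so treat this as a sketch of the stdlib behaviour only; strict JSON parsers (including many outside Python) reject such files.

```python
import json
import math

data = {"a": float("nan"), "b": float("inf"), "c": float("-inf"), "d": 3.1}

# The stdlib encoder writes the non-standard tokens NaN, Infinity and -Infinity.
text = json.dumps(data)
print(text)  # {"a": NaN, "b": Infinity, "c": -Infinity, "d": 3.1}

# The stdlib decoder accepts the same tokens and turns them back into floats.
restored = json.loads(text)
print(math.isnan(restored["a"]), restored["b"], restored["c"])  # True inf -inf
```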

@Moisan

Moisan commented Nov 2, 2021

It seems the root issue is that Pydantic cannot reject NaN without a field-specific validator. I think it would be useful to have a dedicated constrained float type that accepts every float except NaN.

@adriangb
Member

I think this workaround will be simpler in V2, because you can define a "constrained type" much more easily and have it integrate with type checkers, IDEs, etc.

from math import isnan
from typing import Annotated, Any, Optional, TypeVar

# In released pydantic v2, BeforeValidator is importable from the top-level package
from pydantic import BaseModel, BeforeValidator


def coerce_nan_to_none(x: Any) -> Any:
    # Only test real floats: isnan(None) or isnan("abc") would raise TypeError
    if isinstance(x, float) and isnan(x):
        return None
    return x

T = TypeVar('T')

# Runs before the normal float validation, so NaN arrives as None
NoneOrNan = Annotated[Optional[T], BeforeValidator(coerce_nan_to_none)]


class Model(BaseModel):
    a: NoneOrNan[float]
    b: NoneOrNan[float]


m = Model(a=float('nan'), b=3.1)
assert m.a is None
assert m.b == 3.1

7 participants