Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pydantic v2.0 #4750

Merged
merged 15 commits into from
Jan 17, 2024
Merged

Support pydantic v2.0 #4750

merged 15 commits into from
Jan 17, 2024

Conversation

adonath
Copy link
Member

@adonath adonath commented Sep 15, 2023

Description
This pull request starts the refactoring to support Pydantic v2.0. I'm running into many issues, but mostly because the Gammapy code base is very inconsistent. Pydantic complains correctly most of the time. I'll start collecting issues here...

Update:

  • Gammapy used None as default in many place, without declaring it as allowed type. I change to use Optional in these cases. I think we should rather change to more meaningful default values, bot not in this PR.
  • I consistently changed to a new introduce PathType, which does a minimal validation based on make_path
  • I had to change to arbitrary_types_allowed=True for now, because in Pydantiv v2.0 there is now support for __get_validators__. This would need to be supported using __get_pydantic_core_schema__ (https://docs.pydantic.dev/latest/usage/types/custom/#customizing-validation-with-__get_pydantic_core_schema__) instead. Which I consider to much work for now.
  • Beside the incompatibility with ray the transition seems to work.

@adonath
Copy link
Member Author

adonath commented Sep 15, 2023

Currently ray only supports pydantic<2.0 (https://github.com/ray-project/ray/blob/master/python/setup.py#L265), we cannot update right now. We have to wait for ray to update as well...

@adonath
Copy link
Member Author

adonath commented Sep 18, 2023

Corresponding issue in ray: ray-project/ray#39722

@dioptre
Copy link

dioptre commented Sep 25, 2023

Thanks for this. Probably many people waiting on it.

@adonath adonath force-pushed the support_pydantic_v2.0 branch 2 times, most recently from 3a9d708 to 1aea6f6 Compare November 3, 2023 15:16
@adonath
Copy link
Member Author

adonath commented Nov 3, 2023

The CI still fails here because of ray. Interestingly it works for me locally, so I can get succeeding test with ray[default] == 2.8.0 and pydantic=2.4.2. On closer inspection the versions are not compatible, because ray still requires pydantic < 2.0, but accidentally it seems to work. Not sure what to do about this...

@adonath
Copy link
Member Author

adonath commented Jan 10, 2024

I rebased and updated to ray 2.9. This should be ready to review now. Here are some questionable behaviors I found:

  • I think converting None to SkyCoord(np.nan, np.nan) is not a good idea. Either the object is optional or it is not.
  • In config.py we rely on the default values being defined, this required setting validate_defaults=True, even if the default is None, which make the value optional. I think we should replace them with meaningful defaults and probably use validate_defaults=False again.

@adonath adonath marked this pull request as ready for review January 10, 2024 17:47
@adonath
Copy link
Member Author

adonath commented Jan 10, 2024

I still have to fix some places that raise deprecation warnings...

@adonath
Copy link
Member Author

adonath commented Jan 11, 2024

The remaining test fails seems unrelated...

@maxnoe
Copy link
Member

maxnoe commented Jan 11, 2024

I think converting None to SkyCoord(np.nan, np.nan) is not a good idea. Either the object is optional or it is not.

I fully agree. We have a similar issue in the data model of pointing I think, where for some reason it is required to provide a ra/dec pointing but SkyCoord(nan, nan) is allowed

raise ValidationError(
f"Incorrect position. Expect SkyCoord got {type(v)} instead."
raise ValueError(
f"Incorrect position. Expect SkyCoord in altaz frame got {type(v)} instead."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the frame should be in icrs, and not alt/az... Maybe a copy/paste typo?

setup.cfg Outdated Show resolved Hide resolved
BeforeValidator(validate_angle),
]

EnergyType = Annotated[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Astropy Quantity supports generic typing over the physical quantity, maybe that's a better solution than individual definitions?

In [1]: from astropy.units import Quantity

In [2]: Quantity['length']
Out[2]: typing.Annotated[astropy.units.quantity.Quantity, PhysicalType('length')]

In [3]: Quantity['energy']
Out[3]: typing.Annotated[astropy.units.quantity.Quantity, PhysicalType({'energy', 'torque', 'work'})]

In [4]: Quantity['angle']
Out[4]: typing.Annotated[astropy.units.quantity.Quantity, PhysicalType('angle')]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was not aware of this. However in our case, we mostly do it for the custom validation and serialization methods and only the energy type is handled with a quantity object. I guess Pydantic does not understand the PhysicalType annotation, so we need a validator in any case.

Copy link
Member

@maxnoe maxnoe Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work nicely:

from typing import Annotated, get_args
from pydantic import BaseModel, BeforeValidator, ConfigDict, ValidationError
from pydantic.functional_validators import AfterValidator
import astropy.units as u
from functools import partial


def validate_physical_type(value: u.Quantity, physical_type: u.PhysicalType):
    if value.unit is None:
        raise ValueError(f"Expected physical_type: {physical_type}, but value had no unit")

    if (t := value.unit.physical_type) != physical_type:
        raise ValueError(f"Expected physical_type: {physical_type}, got {t}")

    return value


class ValidatedQuantity(u.Quantity):
    @classmethod
    def __class_getitem__(cls, physical_type: str):
        q = super().__class_getitem__(physical_type)
        validate = partial(validate_physical_type, physical_type=get_args(q)[1])
        return Annotated[
            q,
            BeforeValidator(lambda v: cls(v, copy=False)),
            AfterValidator(validate)
        ]


class Foo(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    length : ValidatedQuantity['length'] = 5 * u.m


Foo(length=5 * u.cm)

try:
    Foo(length=5 * u.deg)
except ValidationError as e:
    print(e)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this looks generally useful! I think for this PR it does not add any value, because currently we only have the EnergyType. But I'm pretty sure we will need it in future, so maybe you just want to contribute it in a follow up PR yourself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extending it for an optional required shape might also be interesting, such that the following works:

class Foo(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    arbitrary_length:  ValidatedQuantity['length'] = [[5, 5], [3, 3]]  * u.m
    scalar_length : ValidatedQuantity['length', ()] = 5 * u.m
    other_length: ValidatedQuantity['length', (2,)] = [5, 3] * u.m

When no shape is given, no validation is performed.

Co-authored-by: Bruno Khélifi <khelifi@in2p3.fr>
Copy link
Contributor

@registerrier registerrier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adonath . This looks really nice.

Regarding SkyCoord(np.nan, np.nan) as empty values returned by the validator, I think the point is to avoid errors when accessing some information like this. It also allows easy concatenation of such coordinates when stacking metadata at DL4.

For extra_allow=False, the main point is how to allow for flexibility for metadata class additions for specific experiments/observatories. I was assuming that we could allow addition of any Metadata derived object with its specific FITS serialization scheme. Could there be a way to allow only extra MetaData objects?

@@ -169,8 +131,8 @@ class ExcessMapConfig(GammapyBaseConfig):


class BackgroundConfig(GammapyBaseConfig):
method: BackgroundMethodEnum = None
exclusion: Path = None
method: Union[BackgroundMethodEnum, None] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not Optional here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks , I think I just forgot to use Optional here...

pointing: Optional[PointingInfoMetaData]
target: Optional[TargetMetaData]
location: Optional[Union[EarthLocation, str]]
_tag: ClassVar[Literal["observation"]] = "observation"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does two things:

  • _tag is currently not part of the model definition, because by convention Pydantic treats attributes starting with underscore as private attributes. This is also why the ClassVar is needed. The tag is assigned to the class not the instance.
  • The Literal only allows exactly this value. However here it has not a real function, except for documentation.

I would actually propose to make the tag part of the model definition and use the Literal. This will fix the value to the value defined by the Literal.

model_config = ConfigDict(
arbitrary_types_allowed=True,
validate_assignment=True,
extra="forbid",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to allow specific types, i.e. MetaData derived entries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know this is not possible. We would need to work with a container such as List[MetaData] or Dict[str, MetaData] to allow for the type validation.

return cls(creator=creator, date=date)
_tag: ClassVar[Literal["creator"]] = "creator"
creator: Optional[str] = f"Gammapy {version}"
date: Optional[TimeType] = Field(default_factory=Time.now)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here default_factory is a function that generates the default value? Hence the now instead of now()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly! This is the recommended Pydantic pattern for default factories.

instrument: Optional[str]
sub_array: Optional[str]
observation_mode: Optional[str]
obs_id: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether some formats would provide IDs that are not ints. It is probably simpler to assume a unique type indeed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was confused by this. We could certain support both if needed. But right now I'm not aware of any data that would require the string.

if v is None:
return SkyCoord(np.nan, np.nan, unit="deg", frame="icrs")
elif isinstance(v, SkyCoord):
@field_validator("position", mode="after")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the mode after here?
Why reimplementing validate_radec_mean? Rather call it differently no?


radec_mean: Optional[SkyCoord]
altaz_mean: Optional[Union[SkyCoord, AltAz]]
radec_mean: Optional[SkyCoord] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be practical to create a SkyCoordICRSType and SkyCoordAltAzType?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would be more elegant. I can introduce those.

@MRegeard
Copy link
Member

Don't we also need to update the environment-dev.yaml file, which currently says pydantic<2.0 ?

@bkhelifi
Copy link
Member

Don't we also need to update the environment-dev.yaml file, which currently says pydantic<2.0 ?

And the codemeta also

@adonath
Copy link
Member Author

adonath commented Jan 12, 2024

The last remaining test fail was related to the fact that we use Time objects with multiple values, instead of scalar. I think we should actually make explicit in future when we allow scalar values and when arrays. See the comment #4750 (comment), I made to @maxnoe 's suggestion.

Thanks @bkhelifi and @MRegeard, I also updated the meta and conda env file.

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (315f0ca) 75.69% compared to head (4369788) 75.69%.
Report is 13 commits behind head on main.

❗ Current head 4369788 differs from pull request most recent head 16d849d. Consider uploading reports for the commit 16d849d to get more accurate results

Files Patch % Lines
gammapy/utils/types.py 89.74% 8 Missing ⚠️
gammapy/analysis/core.py 0.00% 1 Missing ⚠️
gammapy/estimators/points/sed.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4750   +/-   ##
=======================================
  Coverage   75.69%   75.69%           
=======================================
  Files         228      229    +1     
  Lines       33841    33836    -5     
=======================================
- Hits        25616    25613    -3     
+ Misses       8225     8223    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@adonath
Copy link
Member Author

adonath commented Jan 12, 2024

@registerrier @QRemy PR is ready to merge from my side...

@adonath adonath added this to the 1.2 milestone Jan 12, 2024
Copy link
Contributor

@registerrier registerrier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adonath . This looks good.

there is a remaining failure visible in the tutorials. Could it be due to this definition that is not supported in the current scheme:

config.datasets.safe_mask.parameters = {"offset_max": 2.5 * u.deg}

Since there is no generic Quantity type defined with its serializer this has to fail when printing config .

@adonath
Copy link
Member Author

adonath commented Jan 15, 2024

@registerrier This is due to the generic parameters container. Those values are not validated and cannot really be serialized safely. Users can put arbitrary things and it will always fail...I think we should deprecate the parameters. The duplication of the maker config is maybe the lesser evil.

Copy link
Contributor

@registerrier registerrier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adonath . This looks good! No further comment.

@adonath
Copy link
Member Author

adonath commented Jan 16, 2024

@maxnoe or @bkhelifi please approve or make further comments. Thanks!

Copy link
Member

@bkhelifi bkhelifi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @adonath for this important upgrade!

@adonath
Copy link
Member Author

adonath commented Jan 17, 2024

Thanks @registerrier and @bkhelifi, I'll go ahead and merge now.

@adonath adonath merged commit 1438552 into gammapy:main Jan 17, 2024
27 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants