Support pydantic v2.0 #4750
Currently |
Corresponding issue in ray: ray-project/ray#39722 |
Thanks for this. Probably many people waiting on it. |
The CI still fails here because of ray. Interestingly it works for me locally, so I can get succeeding test with |
I rebased and updated to ray 2.9. This should be ready to review now. Here are some questionable behaviors I found:
|
I still have to fix some places that raise deprecation warnings... |
The remaining test failure seems unrelated... |
I fully agree. We have a similar issue in the data model of pointing I think, where for some reason it is required to provide a ra/dec pointing but |
gammapy/utils/metadata.py
Outdated
```diff
-    raise ValidationError(
-        f"Incorrect position. Expect SkyCoord got {type(v)} instead."
+    raise ValueError(
+        f"Incorrect position. Expect SkyCoord in altaz frame got {type(v)} instead."
```
the frame should be in icrs, and not alt/az... Maybe a copy/paste typo?
```python
    BeforeValidator(validate_angle),
]


EnergyType = Annotated[
```
Astropy Quantity supports generic typing over the physical quantity, maybe that's a better solution than individual definitions?
```
In [1]: from astropy.units import Quantity

In [2]: Quantity['length']
Out[2]: typing.Annotated[astropy.units.quantity.Quantity, PhysicalType('length')]

In [3]: Quantity['energy']
Out[3]: typing.Annotated[astropy.units.quantity.Quantity, PhysicalType({'energy', 'torque', 'work'})]

In [4]: Quantity['angle']
Out[4]: typing.Annotated[astropy.units.quantity.Quantity, PhysicalType('angle')]
```
Thanks, I was not aware of this. However, in our case we mostly do it for the custom validation and serialization methods, and only the energy type is handled with a quantity object. I guess Pydantic does not understand the `PhysicalType` annotation, so we need a validator in any case.
This seems to work nicely:
```python
from typing import Annotated, get_args
from pydantic import BaseModel, BeforeValidator, ConfigDict, ValidationError
from pydantic.functional_validators import AfterValidator
import astropy.units as u
from functools import partial


def validate_physical_type(value: u.Quantity, physical_type: u.PhysicalType):
    if value.unit is None:
        raise ValueError(f"Expected physical_type: {physical_type}, but value had no unit")
    if (t := value.unit.physical_type) != physical_type:
        raise ValueError(f"Expected physical_type: {physical_type}, got {t}")
    return value


class ValidatedQuantity(u.Quantity):
    @classmethod
    def __class_getitem__(cls, physical_type: str):
        q = super().__class_getitem__(physical_type)
        validate = partial(validate_physical_type, physical_type=get_args(q)[1])
        return Annotated[
            q,
            BeforeValidator(lambda v: cls(v, copy=False)),
            AfterValidator(validate)
        ]


class Foo(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    length: ValidatedQuantity['length'] = 5 * u.m


Foo(length=5 * u.cm)
try:
    Foo(length=5 * u.deg)
except ValidationError as e:
    print(e)
```
Thanks, this looks generally useful! I think it does not add any value for this PR, because currently we only have the `EnergyType`. But I'm pretty sure we will need it in the future, so maybe you want to contribute it in a follow-up PR yourself?
Extending it for an optional required shape might also be interesting, such that the following works:
```python
class Foo(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    arbitrary_length: ValidatedQuantity['length'] = [[5, 5], [3, 3]] * u.m
    scalar_length: ValidatedQuantity['length', ()] = 5 * u.m
    other_length: ValidatedQuantity['length', (2,)] = [5, 3] * u.m
```
When no shape is given, no validation is performed.
Co-authored-by: Bruno Khélifi <khelifi@in2p3.fr>
Thanks @adonath . This looks really nice.
Regarding `SkyCoord(np.nan, np.nan)` as empty values returned by the validator, I think the point is to avoid errors when accessing some information like this. It also allows easy concatenation of such coordinates when stacking metadata at DL4.
For `extra_allow=False`, the main point is how to allow for flexibility for metadata class additions for specific experiments/observatories. I was assuming that we could allow addition of any `Metadata` derived object with its specific FITS serialization scheme. Could there be a way to allow only extra `MetaData` objects?
gammapy/analysis/config.py
Outdated
```diff
@@ -169,8 +131,8 @@ class ExcessMapConfig(GammapyBaseConfig):


 class BackgroundConfig(GammapyBaseConfig):
-    method: BackgroundMethodEnum = None
-    exclusion: Path = None
+    method: Union[BackgroundMethodEnum, None] = None
```
Why not `Optional` here?
Thanks, I think I just forgot to use `Optional` here...
```python
    pointing: Optional[PointingInfoMetaData]
    target: Optional[TargetMetaData]
    location: Optional[Union[EarthLocation, str]]
    _tag: ClassVar[Literal["observation"]] = "observation"
```
What does this do?
This does two things:
- `_tag` is currently not part of the model definition, because by convention Pydantic treats attributes starting with an underscore as private attributes. This is also why the `ClassVar` is needed. The tag is assigned to the class, not the instance.
- The `Literal` only allows exactly this value. However, here it has no real function, except for documentation.

I would actually propose to make the tag part of the model definition and use the `Literal`. This will fix the value to the value defined by the `Literal`.
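A minimal sketch of the difference between the two variants, assuming Pydantic v2. The class names `WithClassVarTag` and `WithFieldTag` are hypothetical and not part of Gammapy; they only illustrate that a `ClassVar` tag is invisible to the model, while a `Literal` field is validated:

```python
from typing import ClassVar, Literal

from pydantic import BaseModel, ValidationError


class WithClassVarTag(BaseModel):
    # Class-level constant: not a model field, so Pydantic never validates it.
    _tag: ClassVar[Literal["observation"]] = "observation"


class WithFieldTag(BaseModel):
    # Regular model field: the Literal pins the only accepted value.
    tag: Literal["observation"] = "observation"


# The ClassVar variant has no fields at all, the field variant has one.
classvar_fields = set(WithClassVarTag.model_fields)
field_names = set(WithFieldTag.model_fields)

# Any other value for the Literal field is rejected at validation time.
try:
    WithFieldTag(tag="creator")
    wrong_tag_rejected = False
except ValidationError:
    wrong_tag_rejected = True
```

With the field variant the tag also becomes part of the serialized model, which may or may not be wanted for the FITS serialization.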
```python
    model_config = ConfigDict(
        arbitrary_types_allowed=True,
        validate_assignment=True,
        extra="forbid",
```
Would it be possible to allow specific types, i.e. `MetaData` derived entries?
As far as I know this is not possible. We would need to work with a container such as `List[MetaData]` or `Dict[str, MetaData]` to allow for the type validation.
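A rough sketch of the container approach, assuming Pydantic v2. Here `MetaData` is a simplified stand-in for the Gammapy base class, and `InstrumentMetaData`, `ObservationMetaData` and the `extras` field are hypothetical names used only for illustration:

```python
from typing import Dict

from pydantic import BaseModel, ConfigDict, ValidationError


class MetaData(BaseModel):
    # Stand-in base class; unknown top-level fields stay forbidden.
    model_config = ConfigDict(extra="forbid")


class InstrumentMetaData(MetaData):
    # Hypothetical experiment-specific metadata.
    name: str


class ObservationMetaData(MetaData):
    obs_id: int
    # Only MetaData (or subclass) instances are accepted as values.
    extras: Dict[str, MetaData] = {}


obs = ObservationMetaData(
    obs_id=1, extras={"instrument": InstrumentMetaData(name="hypothetical")}
)

# A value that is not a MetaData object fails validation.
try:
    ObservationMetaData(obs_id=1, extras={"instrument": 42})
    wrong_type_rejected = False
except ValidationError:
    wrong_type_rejected = True

# Unknown top-level fields are still rejected because of extra="forbid".
try:
    ObservationMetaData(obs_id=1, unknown=1)
    unknown_field_rejected = False
except ValidationError:
    unknown_field_rejected = True
```

This keeps `extra="forbid"` for the model itself and moves the flexibility into one explicitly typed field.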
```python
        return cls(creator=creator, date=date)
    _tag: ClassVar[Literal["creator"]] = "creator"
    creator: Optional[str] = f"Gammapy {version}"
    date: Optional[TimeType] = Field(default_factory=Time.now)
```
So here `default_factory` is a function that generates the default value? Hence the `now` instead of `now()`?
Yes, exactly! This is the recommended Pydantic pattern for default factories.
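A small sketch of why the callable matters, using the standard library `datetime` instead of `astropy.time.Time` to keep it self-contained. The names `utc_now`, `EagerDefault` and `LazyDefault` are hypothetical:

```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field


def utc_now() -> datetime:
    """Return the current UTC time; called once per model instance."""
    return datetime.now(timezone.utc)


class EagerDefault(BaseModel):
    # Evaluated once, when the class body is executed.
    date: datetime = datetime.now(timezone.utc)


class LazyDefault(BaseModel):
    # The callable is passed, not its result; it runs on every instantiation.
    date: datetime = Field(default_factory=utc_now)


eager_a, eager_b = EagerDefault(), EagerDefault()
lazy_a = LazyDefault()
lazy_b = LazyDefault()
```

Both `EagerDefault` instances share the timestamp from import time, while each `LazyDefault` instance gets the time of its own creation.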
```python
    instrument: Optional[str]
    sub_array: Optional[str]
    observation_mode: Optional[str]
    obs_id: int
```
I was wondering whether some formats would provide IDs that are not ints. It is probably simpler to assume a unique type indeed
Yes, I was confused by this. We could certainly support both if needed. But right now I'm not aware of any data that would require the string.
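If both were ever needed, one possible sketch (assuming Pydantic v2 and its default "smart" union handling) would be a `Union[int, str]` annotation. The model name `ObsInfo` and the example IDs are hypothetical:

```python
from typing import Union

from pydantic import BaseModel


class ObsInfo(BaseModel):
    # Accepts integer IDs as well as string IDs.
    obs_id: Union[int, str]


int_id = ObsInfo(obs_id=23523).obs_id
str_id = ObsInfo(obs_id="run-0042").obs_id
```

The downside is that all code consuming `obs_id` then has to handle two types, which is the argument for keeping a single type for now.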
```python
        if v is None:
            return SkyCoord(np.nan, np.nan, unit="deg", frame="icrs")
        elif isinstance(v, SkyCoord):
    @field_validator("position", mode="after")
```
Why is the mode `after` here?
Why reimplement `validate_radec_mean`? Rather call it differently, no?
gammapy/utils/metadata.py
Outdated
```diff
-    radec_mean: Optional[SkyCoord]
-    altaz_mean: Optional[Union[SkyCoord, AltAz]]
+    radec_mean: Optional[SkyCoord] = None
```
Would it be practical to create a `SkyCoordICRSType` and `SkyCoordAltAzType`?
Yes, this would be more elegant. I can introduce those.
Signed-off-by: Quentin Remy <quentin.remy@mpi-hd.mpg.de>
Don't we also need to update the |
And the codemeta also |
The last remaining test failure was related to the fact that we use
Thanks @bkhelifi and @MRegeard, I also updated the meta and conda env file. |
Codecov Report
Attention:
Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main    #4750   +/-   ##
=======================================
  Coverage   75.69%   75.69%
=======================================
  Files         228      229    +1
  Lines       33841    33836    -5
=======================================
- Hits        25616    25613    -3
+ Misses       8225     8223    -2
```
@registerrier @QRemy PR is ready to merge from my side... |
Thanks @adonath. This looks good.
There is a remaining failure visible in the tutorials. Could it be due to this definition that is not supported in the current scheme:
```python
config.datasets.safe_mask.parameters = {"offset_max": 2.5 * u.deg}
```
Since there is no generic `Quantity` type defined with its serializer, this has to fail when printing `config`.
@registerrier This is due to the generic |
Thanks @adonath . This looks good! No further comment.
Thanks a lot @adonath for this important upgrade!
Thanks @registerrier and @bkhelifi, I'll go ahead and merge now. |
Description
This pull request starts the refactoring to support Pydantic v2.0. I'm running into many issues, but mostly because the Gammapy code base is very inconsistent. Pydantic complains correctly most of the time. I'll start collecting issues here...
Update:
- `None` as default in many places, without declaring it as an allowed type. I changed to use `Optional` in these cases. I think we should rather change to more meaningful default values, but not in this PR.
- `PathType`, which does a minimal validation based on `make_path`
- `arbitrary_types_allowed=True` for now, because in Pydantic v2.0 there is no support for `__get_validators__`. This would need to be supported using `__get_pydantic_core_schema__` (https://docs.pydantic.dev/latest/usage/types/custom/#customizing-validation-with-__get_pydantic_core_schema__) instead, which I consider too much work for now.
- `ray` the transition seems to work.
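To illustrate the `None` default point from the update list, here is a minimal sketch assuming Pydantic v2. The classes `LooseConfig` and `FixedConfig` are hypothetical and only mirror the `exclusion` field from the config diff:

```python
from pathlib import Path
from typing import Optional

from pydantic import BaseModel, ValidationError


class LooseConfig(BaseModel):
    # None is used as default although the annotation does not allow it.
    # Defaults are not validated unless validate_default is enabled.
    exclusion: Path = None


class FixedConfig(BaseModel):
    # None is declared as an allowed value.
    exclusion: Optional[Path] = None


# The unvalidated default slips through silently.
loose_default = LooseConfig().exclusion

# Passing None explicitly is validated against Path and fails.
try:
    LooseConfig(exclusion=None)
    explicit_none_rejected = False
except ValidationError:
    explicit_none_rejected = True

# With Optional both the default and an explicit None are accepted.
fixed = FixedConfig(exclusion=None)
```

The same mismatch would also surface for assignments like `config.exclusion = None` when `validate_assignment=True` is set.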