Better annotations support #1166

Open
hgrecco opened this issue Aug 29, 2020 · 40 comments

@hgrecco
Owner

hgrecco commented Aug 29, 2020

With PEP560 we could now try to have a better annotations experience for Pint. Briefly, my proposal would be to do something like this

class Model:

    value: Quantity['m/s']

or

class Model:

    value: Quantity['[length]/[time]']

and then provide a nice API to check for this.

What do you think?
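
For context, here is a minimal sketch of the mechanism PEP 560 provides; QuantitySpec and its attributes are purely illustrative, not existing or proposed Pint API:

class QuantitySpec:
    """Illustrative object returned by Quantity[...]; it records what was requested."""

    def __init__(self, origin, spec):
        self.origin = origin  # the Quantity class itself
        self.spec = spec      # e.g. 'm/s' or '[length]/[time]'

    def __repr__(self):
        return f"{self.origin.__name__}[{self.spec!r}]"


class Quantity:
    # PEP 560: subscripting the class calls __class_getitem__, no metaclass needed.
    def __class_getitem__(cls, spec):
        return QuantitySpec(cls, spec)


print(Quantity['m/s'])  # Quantity['m/s']

A checking API could then compare QuantitySpec.spec against a quantity's units or dimensionality.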

@hgrecco
Owner Author

hgrecco commented Aug 30, 2020

Also, these examples would become:

>>> @ureg.awrap
... def mypp(length: Quantity['meter']) -> Quantity['second']:
...     return pendulum_period(length)

and

>>> @ureg.acheck
... def pendulum_period(length: Quantity['[length]']):
...     return 2*math.pi*math.sqrt(length/G)

where awrap and acheck are annotated equivalents of wrap and check.
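
Neither decorator exists yet; as a rough sketch of the idea, an annotation-driven awrap could be layered on top of the existing ureg.wraps(), here reading plain unit strings from the annotations for simplicity:

import inspect
import math

import pint

ureg = pint.UnitRegistry()


def awrap(func):
    """Sketch: take the unit strings from the annotations and delegate to ureg.wraps()."""
    sig = inspect.signature(func)
    arg_units = [
        p.annotation if p.annotation is not inspect.Parameter.empty else None
        for p in sig.parameters.values()
    ]
    ret_units = (
        sig.return_annotation
        if sig.return_annotation is not inspect.Signature.empty
        else None
    )
    return ureg.wraps(ret_units, arg_units)(func)


@awrap
def pendulum_period(length: "meter") -> "second":
    # ureg.wraps() passes in the magnitude in meters and re-attaches seconds.
    return 2 * math.pi * math.sqrt(length / 9.81)


print(pendulum_period(ureg.Quantity(1, "m")))  # ~2.006 second

An acheck equivalent could be built the same way on top of the existing ureg.check(), using dimensionality strings instead of units.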

@jmuhlich

jmuhlich commented Sep 1, 2020

It would also be fantastic to have Mypy support for checking these annotations statically! Happy to contribute where I can.

@dopplershift
Contributor

I haven't started using annotations in MetPy (yet), so I don't have any practical experience to rely on to see any obvious gotchas. In general, though, those look reasonable.

@hgrecco
Owner Author

hgrecco commented Dec 28, 2020

I was playing with this concept. Some things to discuss:

  1. Can annotations be done with units (e.g. m/s) and dimensions (e.g. [length]/[time])?
    Yes. As there are valid use cases for both (e.g. wrapping vs checking)
  2. What is the output type of Quantity['m/s']?
  • A str? No.
  • A Quantity?
  • A new class (e.g. TypedQuantity)?
  • A UnitContainer or similar?
  3. Can we annotate with ureg.meter?

@jules-ch
Collaborator

jules-ch commented Dec 28, 2020

It would be nice to set the expected type that magnitude should return: np.ndarray, float, or any other supported type. It is sometimes confusing for the user to guess when it is simply typed as Quantity.

Like collection types do with List[float] or Tuple[str, int]: you then know what's inside.

@claudiofinizio

Hello,
I am writing a webapp for designing rural water supplies and I made extensive use of both pint and mypy. I would therefore be glad to contribute to exposing Quantity to mypy.

My objective is to write code like follows:

class WaterPipeline:
    @property
    def get_pipeline_pathlength(self) -> Quantity['length']:
        ...

In my case, annotations should be done using dimensions: in the example above it is important to check that a Quantity['length'] is returned, but that length may be expressed in meters or kilometers.

I also agree with @jules-ch comment above about the expected type of magnitude.

@dopplershift
Contributor

@hgrecco NumPy has also been adding annotation support for ndarray inputs, so it would be important IMO to make sure whatever is done here is compatible/sensible with that.

@hgrecco
Owner Author

hgrecco commented Dec 29, 2020

How about something like

  • Quantity: any quantity
  • Quantity[t]: a quantity with a magnitude of type t, e.g. float, int, ndarray (or list)
  • Quantity[s] with s a string: a quantity with units (or dimensions) given by s, e.g. m/s, [length]/[time]
  • Quantity[t, s]
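
For illustration, the proposed forms might read like this in user code (a sketch only; the subscripted forms below are not implemented, hence the postponed-annotations import that keeps them unevaluated):

from __future__ import annotations

import numpy as np
from pint import Quantity


def speed(
    distance: Quantity[np.ndarray, "m"],
    time: Quantity[float, "[time]"],
) -> Quantity[np.ndarray, "m/s"]:
    return distance / time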

But I would be more worried about how to handle this.

  • What is the output type?
  • Which convenience functions or methods do we add to make this useful?
  • How do we explain this?

@jules-ch
Collaborator

jules-ch commented Dec 29, 2020

Type annotation of the magnitude should be the first thing we target, since the Quantity type is a container just like List or Tuple. Second should be the unit or dimension.
Just like you said @hgrecco, so something like

Quantity [type, unit]
Quantity [type, dimension]

Type annotation for mypy usage, and we can have the unit or dimension telling the user which unit or dimension to expect at first.
And we can go further with checking the unit or dimension at runtime.

@hgrecco
Owner Author

hgrecco commented Dec 29, 2020

For some internal projects, I have tried three different approaches to annotations for the output of Something[args] (where Something is a class):

  1. an instance of another class. This is what python 3.9 does for containers. e.g. list[str] returns GenericAlias(list, str). Two options branch here: (1a) use GenericAlias or (1b) create a new class with extra methods.
  2. an instance of TypedSomething which is a subclass of Something and args are stored as instance variables.
  3. a new class (a different one for every arg)

I would discourage (3) in pint, but I am not so sure about the other two.
Option 1a is the simple way to go but not so ergonomic. Option 1b is better, because new methods could be added to test for equivalence between annotations or whether a given quantity satisfies an annotation.
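
For reference, option 1a mirrors what built-in containers already do under PEP 585 in Python 3.9+:

>>> alias = list[str]
>>> type(alias)
<class 'types.GenericAlias'>
>>> alias.__origin__, alias.__args__
(<class 'list'>, (<class 'str'>,))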

Option 2 would allow for things like the following:

ScalarVelocityQ = Quantity[float, '[speed]']
q1 = ScalarVelocityQ(3, 'm/s')
q2 = ScalarVelocityQ(3, 's') # Exception is raised

In any case, I think we need to add good annotation introspection capability because we want to be able to evolve this without breaking everything. We need to avoid having to provide something like this https://stackoverflow.com/a/52664522/482819

@jules-ch
Collaborator

jules-ch commented Jan 4, 2021

We could take a look at https://docs.python.org/3/library/typing.html#typing.Annotated, which describes what we want to achieve, I think.

@claudiofinizio

Type annotation of the magnitude should be the first thing we target, since the Quantity type is a container just like List or Tuple. Second should be the unit or dimension.
Just like you said @hgrecco, so something like

Quantity [type, unit]
Quantity [type, dimension]

Type annotation for mypy usage, and we can have the unit or dimension telling the user which unit or dimension to expect at first.
And we can go further with checking the unit or dimension at runtime.

Referring to @jules-ch's comment, in my opinion Quantity is not just a container. My perception: if I read somebody's code, I would first like to see whether the return value of a function represents, say, a length, an energy, a pressure, or whatever. Only afterwards would I be interested in whether that energy is, say, an integer, a float, or some numpy type. Or at least, this is what I look for when I first glance at somebody's code.

In short, I think Quantity[dimension] should be the first info somebody looks for. Accordingly, I think "option 2" proposed by @hgrecco, ScalarVelocityQ = Quantity[float, '[speed]'], seems to me the best approach.

@tgpfeiffer

Just as a note, not sure how relevant it is to this issue: I tried to add type annotations to the python-measurement library a while ago, hoping that I could write something like l: Length = Length(2, "m") / 5 or v: Speed = Length(2, "m") / Time(1.5, "s") if there is an appropriate @overload annotation for Length.__div__. However, as I briefly summarized in coddingtonbear/python-measurement#43 (comment) (enum item (3)) and also discussed in python/mypy#4985 (comment), annotations for operators like __mul__ and __div__ are a bit trickier than for ordinary methods, because the resulting type of a * b is not only determined by the left operand's __mul__ method, but could also come from the right operand's __rmul__ method. As I wrote above, I'm not sure how relevant this is for annotating the Pint module, but you may hit this at some point, so I just wanted to leave a note here.
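
To make the gotcha concrete, here is a small self-contained sketch (Meter/SquareMeter are illustrative stand-ins, not python-measurement or Pint classes):

from typing import overload


class SquareMeter:
    ...


class Meter:
    @overload
    def __mul__(self, other: float) -> "Meter": ...
    @overload
    def __mul__(self, other: "Meter") -> SquareMeter: ...
    def __mul__(self, other):
        ...  # runtime implementation omitted

    # ``3.0 * Meter()`` is NOT covered by the overloads above: Python first tries
    # float.__mul__, which returns NotImplemented for Meter, and then falls back
    # to Meter.__rmul__, so __rmul__ needs its own annotation as well.
    def __rmul__(self, other: float) -> "Meter":
        ...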

@jules-ch
Collaborator

jules-ch commented Jan 6, 2021

There are multiple use cases that we should address:

  • Static type analysis with mypy (magnitude type falls under this)
    • Quantity should be a Generic.
  • Documentation (which dimension or unit to expect)
    • it's difficult (see @tgpfeiffer comment) to do static analysis with this info.
  • Runtime check which is related to @hgrecco comment.
    ScalarVelocityQ = Quantity[float, '[speed]']
    q1 = ScalarVelocityQ(3, 'm/s')
    q2 = ScalarVelocityQ(3, 's') # Exception is raised

IMO the best option is:

Make Quantity generic & use utility classes to return Annotated types with PEP 593 metadata that can be used for runtime checks.

  
T = TypeVar("T")
class Quantity(Generic[T],QuantityGeneric, PrettyIPython, SharedRegistryObject):
  ...
  
    @property
    def magnitude(self) -> T:
        """Quantity's magnitude. Long form for `m`"""
        return self._magnitude
  ...
    def __iter__(self) -> Iterator[T]:
  ...
    def to(self, other=None, *contexts, **ctx_kwargs) -> "Quantity[T]":

I tried something like this:

from typing import _tp_cache, _type_check
from typing import _AnnotatedAlias


class QuantityAlias(_AnnotatedAlias, _root=True):
    def __call__(self, *args, **kwargs):
        quantity = super().__call__(*args, **kwargs)
        
        if self.__metadata__:
            dim = quantity._REGISTRY.get_dimensionality(self.__metadata__[0])
            if not quantity.check(dim):
                raise TypeError("Dimensionality not matched")

        return quantity


class TypedQuantity:
    @_tp_cache
    def __class_getitem__(cls, params):
        from pint.quantity import Quantity
        msg = "TypedQuantity[t, ...]: t must be a type."
        origin = _type_check(Quantity[params[0]], msg)
        metadata = tuple(params[1:])
        return QuantityAlias(origin, metadata)

Here we make a simple runtime check of the dimension, just like in @hgrecco's example.

So TypedQuantity[float, "[length]"] will be translated to Annotated[Quantity[float], "[length]"].

We could go further like it is done here https://docs.python.org/3/library/typing.html#typing.Annotated.

We could translate to something like Annotated[Quantity[float], DimensionCheck("length")].

Those metadata can be added to the instance if needed.
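
A hedged sketch of what such a DimensionCheck metadata object could look like (the class and its API are hypothetical, following the PEP 593 pattern; only Quantity.check() is existing Pint API):

from dataclasses import dataclass
from typing import Annotated

import pint

ureg = pint.UnitRegistry()


@dataclass(frozen=True)
class DimensionCheck:
    """Hypothetical PEP 593 metadata carrying an expected dimensionality."""

    dimensionality: str

    def __call__(self, q: pint.Quantity) -> pint.Quantity:
        # Quantity.check() returns True if the quantity has this dimensionality.
        if not q.check(self.dimensionality):
            raise TypeError(f"expected {self.dimensionality}, got {q.units}")
        return q


Length = Annotated[pint.Quantity, DimensionCheck("[length]")]

A runtime checker would recover the metadata with typing.get_type_hints(..., include_extras=True) and call it on the value.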

I'll try to draft a PR.

@hgrecco
Owner Author

hgrecco commented Jan 8, 2021

@jules-ch I really like your proposal. I am eager to see the draft PR. Great discussion everybody!

@jules-ch jules-ch mentioned this issue Mar 7, 2021
@jamesbraza

I would like to make a plug within my company's software team to use pint for units. Having typing is a huge plus.

I see #1259 was merged; is that the only PR needed for typing, or is there more work to be done? When do you think a release incorporating that PR will be cut?

@jules-ch
Collaborator

We'll make the 0.18 release soon, probably by the end of the month.

Pint typing support will be experimental at first; I still need to document it.
I'll push for a new version of the documentation, I just haven't had the time lately.

@nunupeke

Hi. I'm currently experimenting with the new typing features in v0.18 (#1259). How would I annotate functions or classes that handle float / np.ndarray equivalently to Quantity[float] / Quantity[np.ndarray]? For example, how would I annotate the following generic function correctly:

from typing import TypeVar
import numpy as np
from pint import Quantity

A = TypeVar('A', np.ndarray, Quantity[np.ndarray])

def get_index(array: A, i: int) -> ???:
    return array[i]

I am aware that the same is relatively straightforward for example for lists,

from typing import TypeVar, List

T  = TypeVar('T')

def get_index(l: List[T], i: int) -> T:
    return l[i]

but I'm having a hard time translating it to the pint.Quantity context.

@tgpfeiffer

I think you'd need to use numpy.typing.NDArray[X] rather than numpy.ndarray and then you can return X, see https://stackoverflow.com/a/68817265/3663881 (although array[i] could be something else than X if array is a higher-dimensional array; I guess we need to wait for shape support in numpy.typing before you can actually write that safely).
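
Roughly what the linked answer suggests, as a sketch (how precisely mypy tracks the element type of array[i] depends on the numpy stubs in use):

from typing import TypeVar

import numpy as np
import numpy.typing as npt

X = TypeVar("X", bound=np.generic)


def get_index(array: npt.NDArray[X], i: int) -> X:
    return array[i]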

@nunupeke

Ok, you are right. My example function is not ideal. What I was really trying to find is an annotation that says: "if you use numpy arrays here, expect scalars there" and equivalently "if you use array quantities here, expect scalar quantities there" or vice versa. Another example:

from typing import TypeVar, Generic
import numpy as np
from pint import Quantity

A = TypeVar('A', np.ndarray, Quantity[np.ndarray])

class Converter(Generic[A]):
    def __init__(self, scale: "float in case A is np.ndarray / Quantity[float] in case A is Quantity[np.ndarray]"):
        self.scale = scale

    def convert(self, array: A) -> A:
        return array / self.scale

@tgpfeiffer

I see. I think in that case you are looking for typing.overload; there you can have multiple annotations for the same function that specify further what goes in and out.

For the function you are implementing I think you will need a type annotation like

def get_index(array: Union[np.ndarray, Quantity[np.ndarray]], i: int) -> Union[float, Quantity[float]]:
    return array[i]

but as you write, that's not specific enough; a mypy run on

data = np.asarray([3., 4.])
data_q = Q_(data, 'meter')

reveal_type(get_index(data, 0))
reveal_type(get_index(data_q, 0))

prints

test.py:20: note: Revealed type is "Union[builtins.float, pint.quantity.Quantity[builtins.float]]"
test.py:21: note: Revealed type is "Union[builtins.float, pint.quantity.Quantity[builtins.float]]"

If you add @overload declarations like

@overload
def get_index(array: np.ndarray, i: int) -> float: ...

@overload
def get_index(array: Quantity[np.ndarray], i: int) -> Quantity[float]: ...

then mypy prints

test.py:20: note: Revealed type is "builtins.float"
test.py:21: note: Revealed type is "pint.quantity.Quantity[builtins.float]"
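
Putting the pieces together, a complete version might look like this sketch (assuming Pint >= 0.18, where Quantity is subscriptable; Q_ is the registry's Quantity constructor as in the snippets above):

from typing import overload

import numpy as np
from pint import Quantity, UnitRegistry

ureg = UnitRegistry()
Q_ = ureg.Quantity


@overload
def get_index(array: np.ndarray, i: int) -> float: ...
@overload
def get_index(array: Quantity[np.ndarray], i: int) -> Quantity[float]: ...
def get_index(array, i):
    # Single runtime implementation; the overloads only guide mypy.
    return array[i]


data = np.asarray([3.0, 4.0])
data_q = Q_(data, "meter")

scalar = get_index(data, 0)      # mypy: builtins.float
scalar_q = get_index(data_q, 0)  # mypy: Quantity[builtins.float]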

@MichaelTiemannOSC
Collaborator

MichaelTiemannOSC commented Dec 27, 2021

I'm now suddenly interested in this. We have data providers handing us a mish-mash of TWh and PJ energy generation data and we'd like to keep our units straight. We are also using Pydantic. My first attempt to add a Quantity field resulted in this error message (using Pint 0.18):

TypeError: Fields of type "<class 'pint.quantity.Quantity'>" are not supported.

Worked around by adding

    class Config:
        arbitrary_types_allowed = True

to the models I'm enhancing with Quantity.
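
A minimal self-contained version of that workaround (Pydantic v1 API; EnergyRecord is just an illustrative model name):

from pint import Quantity
from pydantic import BaseModel  # Pydantic v1


class EnergyRecord(BaseModel):
    generation: Quantity

    class Config:
        # Pydantic falls back to a plain isinstance() check for this field.
        arbitrary_types_allowed = True


rec = EnergyRecord(generation=Quantity(3.6, "PJ"))
print(rec.generation.to("TWh"))  # ~1.0 terawatt_hour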

@shimwell

Super interested in the use of Pint type hinting with Pydantic types.

Wondering if you were able to add something like PositiveFloat or other Pydantic types to your example, @MichaelTiemannOSC?

from pydantic import BaseModel, PositiveFloat
from pint import Quantity

            
class PowerPlant(BaseModel):
    power_generation: Quantity['watt']
    class Config:
        arbitrary_types_allowed = True

noor_solar = PowerPlant(power_generation=Quantity(160, 'megawatt'))

noor_solar.power_generation

@MichaelTiemannOSC
Collaborator

Should be able to share some findings soon. I have an issue filed with pandas to sort out an ExtensionArray problem (pandas-dev/pandas#45240) and am working with some smart people (copied) on how to make this play well with both database connectors and REST APIs.

@erikerlandson @caldeirav @joriscram

@jules-ch
Collaborator

jules-ch commented Mar 2, 2022

@hgrecco astropy introduced something similar that we could implement, using the Annotated typing that I outlined in previous comments.

astropy/astropy@0deb5c5

@deeplook

deeplook commented May 3, 2022

Really curious about any progress on this, as I'm getting into this very topic and have some ugly workarounds like:

from pydantic import BaseModel, validator
import pint

ureg = pint.UnitRegistry()

class MyModel(BaseModel):
    distance: str
    
    @validator("distance")
    def is_length(cls, v):
        q = ureg.Quantity(v)
        assert q.check("[length]"), "dimensionality must be [length]"
        return q
>>> MyModel(distance="2 ly").distance
2 light_year

@mcleantom

mcleantom commented May 3, 2022

Really curious about any progress on this, as I'm getting into this very topic and have some ugly workarounds like:

from pydantic import BaseModel, validator
import pint

ureg = pint.UnitRegistry()

class MyModel(BaseModel):
    distance: str
    
    @validator("distance")
    def is_length(cls, v):
        q = ureg.Quantity(v)
        assert q.check("[length]"), "dimensionality must be [length]"
        return q
>>> MyModel(distance="2 ly").distance
2 light_year

I made a quick, slightly nicer workaround based on yours:

from pydantic import BaseModel
import pint


class PintType:
    Q = pint.Quantity

    def __init__(self, q_check: str):
        self.q_check = q_check

    def __get_validators__(self):
        yield self.validate

    def validate(self, v):
        q = self.Q(v)
        assert q.check(self.q_check), f"Dimensionality must be {self.q_check}"
        return q


Length = PintType("[length]")

class MyModel(BaseModel):
    distance: Length

    class Config:
        json_encoders = {
            pint.Quantity: str
        }

@deeplook

deeplook commented May 3, 2022

I made a quick, slightly nicer workaround based on yours

Indeed, thanks!

@sanbales

sanbales commented Sep 24, 2022

Thank you all for posting this, it has been incredibly helpful.

One thing I had to mention is that I was having issues with the example above because the fields were objects and not classes, so I tweaked things a bit to support jsonschema output and assignment validation.

Here is a public gist with a more complete example.

Open to any suggestions on how to improve this:

from pint import Quantity, Unit, UnitRegistry
from pydantic import BaseModel


registry = UnitRegistry()

schema_extra = dict(definitions=[
    dict(
        Quantity=dict(type="string"),
    )
])


def quantity(dimensionality: str) -> type:
    """A method for making a pydantic compliant Pint quantity field type."""

    try:
        registry.get_dimensionality(dimensionality)
    except KeyError:
        raise ValueError(f"{dimensionality} is not a valid dimensionality in pint!")

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, value):
        quantity = Quantity(value)
        assert quantity.check(cls.dimensionality), f"Dimensionality must be {cls.dimensionality}"
        return quantity

    @classmethod
    def __modify_schema__(cls, field_schema):
        field_schema.update(
            {"$ref": "#/definitions/Quantity"}
        )
    
    return type(
        "Quantity",
        (Quantity,),
        dict(
            __get_validators__=__get_validators__,
            __modify_schema__=__modify_schema__,
            dimensionality=dimensionality,
            validate=validate,
        ),
    )


class MyModel(BaseModel):

    distance: quantity("[length]")
    speed: quantity("[length]/[time]")

    class Config:
        validate_assignment = True
        schema_extra = schema_extra
        json_encoders = {
            Quantity: str,
        }
model = MyModel(distance="1.5 ly", speed="15 km/hr")
model
>>> MyModel(distance=<Quantity(1.5, 'light_year')>, speed=<Quantity(15.0, 'kilometer / hour')>)

# check the jsonschema, could make the definition for Quantity better...
print(MyModel.schema_json(indent=2))
>>> {
  "title": "MyModel",
  "type": "object",
  "properties": {
    "distance": {
      "$ref": "#/definitions/Quantity"
    },
    "speed": {
      "$ref": "#/definitions/Quantity"
    }
  },
  "required": [
    "distance",
    "speed"
  ],
  "definitions": [
    {
      "Quantity": {
        "type": "string"
      }
    }
  ]
}

# convert to a python dictionary
model.dict()
>>> {'distance': 1.5 <Unit('light_year')>, 'speed': 15.0 <Unit('kilometer / hour')>}

# serialize to json
print(model.json(indent=2))
>>> {
  "distance": "1.5 light_year",
  "speed": "15.0 kilometer / hour"
}

import json

# load from json
MyModel.parse_obj(json.loads(model.json()))
>>> MyModel(distance=<Quantity(1.5, 'light_year')>, speed=<Quantity(15.0, 'kilometer / hour')>)

# test that it raises error when assigning wrong quantity kind
model.distance = "2 m/s"

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In [14], line 1
----> 1 model.distance = "2 m/s"

File C:\mf\envs\jafte\lib\site-packages\pydantic\main.py:385, in pydantic.main.BaseModel.__setattr__()

ValidationError: 1 validation error for MyModel
distance
  Dimensionality must be [length] (type=assertion_error)

@MichaelTiemannOSC
Collaborator

MichaelTiemannOSC commented Oct 9, 2022

@sanbales that was incredibly helpful code! I'm now trying to build a production_quantity function that validates that a given Quantity is among the types of quantities that we deal with in "production". I have written this:

schema_extra = dict(definitions=[
    dict(
        Quantity=dict(type="string"),
        ProductionQuantity=dict(type="List[str]"),
    )
])

class ProductionQuantity(BaseModel):

    dims_list: List[str]

    @validator('dims_list')
    def units_must_be_registered(cls, v):
        for d in v:
            try:
                registry.get_dimensionality(d)
            except KeyError:
                raise ValueError(f"{d} is not a valid dimensionality in pint!")
        return v

    class Config:
        validate_assignment = True
        schema_extra = schema_extra
        json_encoders = {
            Quantity: str,
        }

def production_quantity(dims_list: List[str]) -> type:
    """A method for making a pydantic compliant Pint production quantity."""

    try:
        for dimensionality in dims_list:
            registry.get_dimensionality(dimensionality)
    except KeyError:
        raise ValueError(f"{dimensionality} is not a valid dimensionality in pint!")

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, value):
        quantity = Quantity(value)
        for dimensionality in cls.dims_list:
            if quantity.check(dimensionality):
                return quantity
        raise DimensionalityError(value.units, f"in [{cls.dims_list}]")

    @classmethod
    def __modify_schema__(cls, field_schema):
        field_schema.update(
            {"$ref": "#/definitions/ProductionQuantity"}
        )
    
    return type(
        "ProductionQuantity",
        (ProductionQuantity,),
        dict(
            __get_validators__=__get_validators__,
            __modify_schema__=__modify_schema__,
            dims_list=dims_list,
            validate=validate,
        ),
    )

But pydantic gives me this error, which I haven't been able to fully grok:

TypeError: The type of ProductionQuantity.dims_list differs from the new default value; if you wish to change the type of this field, please use a type annotation

@MichaelTiemannOSC
Collaborator

MichaelTiemannOSC commented Oct 9, 2022

I got past that error by changing the dims_list=dims_list to f"List[str] = {dims_list}" (based on a reading of pydantic/pydantic#757 (comment)).

I'm still working out some other bits, so please don't take the above as correct reference code. It's more a reference to my current state of problems than a solution.

@sanbales

Thanks @MichaelTiemannOSC, I was not familiar with the pydantic check for redefined field types; this is good to know! I didn't intend my code to be the correct reference code either, but it'd be nice to have a well-defined way of integrating pint with pydantic. I appreciate you looking into it and sharing your code. I've gone back and updated that gist a few more times since I posted this, trying to make it a bit cleaner, but it feels like it could be simplified further.

@MichaelTiemannOSC
Collaborator

MichaelTiemannOSC commented Oct 10, 2022

Cool. Here's a link to the code repository where I'm bringing together Pint, pydantic, uncertainties, and pandas: https://github.com/MichaelTiemannOSC/ITR/tree/template-v2

LecrisUT added commits to LecrisUT/tmt that referenced this issue (Mar–Apr 2024).
@edelmanjm

Sorry to necro this thread, but what's the status on type hints? The previously linked repo appears to be gone.

@MichaelTiemannOSC
Collaborator

MichaelTiemannOSC commented Jul 21, 2024

It has since been merged into the main repository: https://github.com/os-climate/ITR. Note that this repository doesn't itself contain pandas, pint, or pint-pandas. I have created some local versions of those, but things have drifted as uncertainties proved more challenging to bring into pint-pandas than expected.

@uellue

uellue commented Sep 3, 2024

Thank you for the examples! Here's an example on how to make it work with annotations in Pydantic 2: https://github.com/LiberTEM/LiberTEM-schema/blob/c096d5337f21c78232134ad9d9af19b8405b1992/src/libertem_schema/__init__.py#L1

(edit: code inlined below)

from typing import Any, Sequence

from typing_extensions import Annotated
from pydantic_core import core_schema
from pydantic import (
    BaseModel,
    GetCoreSchemaHandler,
    WrapValidator,
    ValidationInfo,
    ValidatorFunctionWrapHandler,
)

import pint


__version__ = '0.1.0.dev0'

ureg = pint.UnitRegistry()


class DimensionError(ValueError):
    pass


_pint_base_repr = core_schema.tuple_positional_schema(items_schema=[
    core_schema.float_schema(),
    core_schema.str_schema()
])


def to_tuple(q: pint.Quantity):
    base = q.to_base_units()
    return (float(base.magnitude), str(base.units))


class PintAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:
        return core_schema.json_or_python_schema(
            json_schema=_pint_base_repr,
            python_schema=core_schema.is_instance_schema(pint.Quantity),
            serialization=core_schema.plain_serializer_function_ser_schema(
                to_tuple
            ),
        )


_length_dim = ureg.meter.dimensionality
_angle_dim = ureg.radian.dimensionality
_pixel_dim = ureg.pixel.dimensionality


def _make_handler(dimensionality: str):
    def is_matching(
                q: Any, handler: ValidatorFunctionWrapHandler, info: ValidationInfo
            ) -> pint.Quantity:
        # Ensure target type
        if isinstance(q, pint.Quantity):
            pass
        elif isinstance(q, Sequence):
            magnitude, unit = q
            # Turn into Quantity: measure * unit
            q = magnitude * ureg(unit)
        else:
            raise ValueError(f"Don't know how to interpret type {type(q)}.")
        # Check dimension
        if not q.check(dimensionality):
            raise DimensionError(f"Expected dimensionality {dimensionality}, got quantity {q}.")
        # Return target type
        return q

    return is_matching


Length = Annotated[
    pint.Quantity, PintAnnotation, WrapValidator(_make_handler(_length_dim))
]
Angle = Annotated[
    pint.Quantity, PintAnnotation, WrapValidator(_make_handler(_angle_dim))
]
Pixel = Annotated[
    pint.Quantity, PintAnnotation, WrapValidator(_make_handler(_pixel_dim))
]


class Simple4DSTEMParams(BaseModel):
    '''
    Basic calibration parameters of a strongly simplified model
    of a 4D STEM experiment.

    See https://github.com/LiberTEM/Microscope-Calibration
    and https://arxiv.org/abs/2403.08538
    for the technical details.
    '''
    overfocus: Length
    scan_pixel_pitch: Length
    camera_length: Length
    detector_pixel_pitch: Length
    semiconv: Angle
    cy: Pixel
    cx: Pixel
    scan_rotation: Angle
    flip_y: bool

Usage from https://github.com/LiberTEM/LiberTEM-schema/blob/c096d5337f21c78232134ad9d9af19b8405b1992/tests/test_schemas.py#L1

def test_smoke():
    params = Simple4DSTEMParams(
        overfocus=0.0015 * ureg.meter,
        scan_pixel_pitch=0.000001 * ureg.meter,
        camera_length=0.15 * ureg.meter,
        detector_pixel_pitch=0.000050 * ureg.meter,
        semiconv=0.020 * ureg.radian,
        scan_rotation=330. * ureg.degree,
        flip_y=False,
        cy=(32 - 2) * ureg.pixel,
        cx=(32 - 2) * ureg.pixel,
    )
    as_json = params.model_dump_json()
    pprint.pprint(("as json", as_json))
    from_j = from_json(as_json)
    pprint.pprint(("from json", from_j))
    res = Simple4DSTEMParams.model_validate(from_j)
    pprint.pprint(("validated", res))
    assert isinstance(res.overfocus, Quantity)
    assert isinstance(res.flip_y, bool)
    assert res == params

  • Users can populate schema with plain pint.Quantity in any way they like, as long as the magnitude maps to float
  • Serialization and deserialization to (float, str)
  • Validator works as expected and catches dimension mismatch
  • Emit SI base units, accept all units supported by Pint

To be figured out:

  • Support other types than float, possibly with Unions and appropriate representations
  • Allow JSON validation of dimensionality, possibly by enforcing SI base units as string constants?

Is this useful? If yes, what would be a good way to make it easily available to others?

CC @sk1p

@blakeNaccarato

blakeNaccarato commented Sep 3, 2024

@uellue

Is this useful?

Yes!

If yes, what would be a good way to make it easily available to others?

I'm reminded of the project organization and code layout of https://github.com/p2p-ld/numpydantic, which exposes NumPy array shape validation as a Pydantic annotation similarly to how your code customizes Pint types. Could be useful to adopt some of its structure if you were to wrap up the Pint validator as its own pip installable thing!

Ping me if you need any pointers regarding PyPI packaging/release workflows (edit: Ah, I see over at LiberTEM you've already got release workflows down).

Edit

  • Allow JSON validation of dimensionality, possibly by enforcing SI base units as string constants?

I initially linked numpydantic above as an example for project layout for simple distribution of your annotated Pint types, but the abstract parent class numpydantic.interface.Interface may actually be directly useful in implementing some of your Pint semantics, as it handles general numeric types (e.g. Python floats/ints but also NumPy types).

Numpydantic is stable at 1.0, but the "coming soon" goals of general metadata and extensibility may make it easier to implement some of these Pint needs. I would say a standalone package that exposes the simple validator is the closer reach, then using the metadata/extensible bits of Numpydantic in the future for robustness without having to re-implement a bunch of machinery.

@uellue

uellue commented Sep 4, 2024

Good to hear that you like it! :-)

I'm reminded of the project organization and code layout of https://github.com/p2p-ld/numpydantic, which exposes NumPy array shape validation as a Pydantic annotation similarly to how your code customizes Pint types. Could be useful to adopt some of its structure if you were to wrap up the Pint validator as its own pip installable thing!

Oh, that one looks interesting! Indeed, this one could be a good template and address the magnitude portion.

Numpydantic is stable at 1.0, but the "coming soon" goals of general metadata and extensibility may make it easier to implement some of these Pint needs. I would say a standalone package that exposes the simple validator is the closer reach, then using the metadata/extensible bits of Numpydantic in the future for robustness without having to re-implement a bunch of machinery.

Hm ok, probably good to experiment a bit and explore options before releasing 0.1. It feels like numpydantic as-is can provide the magnitude, and the units are orthogonal to it, similar to how pint handles it, right? That would mean the schema composition "magnitude + units" makes sense as a separate package and not part of numpydantic or pint.

@uellue

uellue commented Sep 18, 2024

Here's a version that integrates with numpydantic. Thank you for the pointer!

LiberTEM/LiberTEM-schema#7

@uellue

uellue commented Sep 18, 2024

LiberTEM/LiberTEM-schema#7

Would the machinery make sense as part of pint, by the way? One could put it into pint.schema, for example. Would mostly require documentation, possibly more tests, and probably more default type definitions like Speed, Weight etc.
