update to pydantic 2 #625

thomas-maschler · 2024-02-10T01:48:10Z

Related Issue(s):

Update to pydantic 2.0 #593

Description:
This PR updates dependencies to Pydantic 2 and Stac-Pydantic 3
It removes the internal Stac, Search and Operator Types in favor of Types from Stac Pydantic

PR Checklist:

pre-commit hooks pass locally
Tests pass (run make test)
Documentation has been updated to reflect changes, if applicable, and docs build successfully (run make docs)
Changes are added to the CHANGELOG.

lossyrob · 2024-02-21T01:06:46Z

@thomas-maschler, moving the STAC types away from the TypedDict implementations in stac_fastapi.types to Pydantic types could have performance implications. Specifically, I'm worried about the STAC types encoding Client output. These were originally moved away from Pydantic models as they tended to be slow for large payloads, e.g. returning an ItemCollection from search that contains 1000 large Sentinel 2 Items. Running Pydantic validation against known valid Items that come out of PgSTAC as part of the response causes additional latency without a lot of value. I know the Pydantic 2 backend changes mean validation will be faster, but are we introducing the potential for increased latency in the API routes because of the move away from TypedDicts in e.g. BaseCoreClient in this PR? If so, could the upgrade be done but leave the TypedDict usage as is? Or have you seen that these changes don't introduce additional latency due to the magic of Rust 🪄?
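
For readers following the trade-off, here is a minimal stdlib-only sketch of why TypedDict-based response types add no runtime cost: a TypedDict is a plain dict at runtime and nothing is validated. The `Item` type below is a hypothetical, heavily simplified stand-in and not the type defined in stac_fastapi.types.

```python
from typing import Any, Dict, TypedDict


class Item(TypedDict):
    """Hypothetical, simplified stand-in for a STAC Item type."""

    id: str
    properties: Dict[str, Any]


# Calling a TypedDict builds an ordinary dict; no field is checked at runtime.
valid = Item(id="S2A_example", properties={"eo:cloud_cover": 12.5})

# A static type checker would flag this, but at runtime it passes silently.
invalid = Item(id=123, properties="not-a-dict")  # type: ignore[typeddict-item]

print(type(valid) is dict)  # True
print(invalid["id"])        # 123
```

The flip side is the one raised in the reply below: with TypedDicts, malformed payloads are only caught by a static type checker, never at response time.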

thomas-maschler · 2024-02-21T01:48:07Z

I can see your concerns regarding latency. I did not benchmark this myself. Others reported a 5x speed improvement between pydantic 1 and 2. I was assuming that this would be enough to address latency concerns.

The benefits I saw in keeping the pydantic models are peace of mind, due to basic checks and the setting of default values you wouldn't have to store in the DB. I have a use case where data are stored in a legacy system and I have to translate them to STAC at runtime. Being able to push part of this into Pydantic makes the job easier.

If latency is still the main concern, I could see several routes:

1. Keep TypedDicts. Be explicit in the documentation that responses are not type checked and that it is the responsibility of the user to do so.
2. Allow skipping validation for ItemCollections and Collections:

from pydantic import SkipValidation
from stac_pydantic import api
from typing import Sequence

class ItemCollectionUnsafe(api.ItemCollection):
    features: Sequence[SkipValidation[api.Item]]

class CollectionsUnsafe(api.Collections):
    collections: Sequence[SkipValidation[api.Collection]]

With minimal code changes, users can use the Unsafe subclasses as their return model, telling pydantic to skip validation for items and collections when making bulk requests.

3. Add some configuration options to let the user choose between the current implementation and options 1 or 2 above.
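
A small self-contained sketch of what the SkipValidation route does in practice, assuming Pydantic 2 is installed. The `Item` and `ItemCollection` models here are hypothetical stand-ins, not the stac-pydantic classes, so the snippet runs without stac-pydantic.

```python
from typing import List

from pydantic import BaseModel, SkipValidation, ValidationError


class Item(BaseModel):
    """Hypothetical stand-in for stac_pydantic's Item."""

    id: str
    cloud_cover: float


class ItemCollection(BaseModel):
    features: List[Item]


class ItemCollectionUnsafe(ItemCollection):
    # Elements are passed through without per-item validation.
    features: List[SkipValidation[Item]]


incomplete = [{"id": "a"}]  # missing "cloud_cover"

try:
    ItemCollection(features=incomplete)
    strict_rejected = False
except ValidationError:
    strict_rejected = True

unsafe = ItemCollectionUnsafe(features=incomplete)

print(strict_rejected)     # True
print(unsafe.features[0])  # {'id': 'a'}
```

Note that the skipped elements stay plain dicts rather than being turned into `Item` instances, which is where the saving comes from and also why the subclass is "unsafe".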

rhysrevans3 · 2024-02-21T09:47:56Z

Couple of notes from my pull request:

If typing List and Dict are being used, this part will need to be changed, as in its current form it will miss off the Optional typing of nested Lists & Dicts.
The types.links seems to be covered by stac-pydantic links and link_factory.
The stac-pydantic Item/Collection/ItemCollection types are not json serialisable so will need to be json encoded/converted to dictionaries before they're returned from the API.

thomas-maschler · 2024-02-21T19:15:29Z

@rhysrevans3, regarding your first point, can you give me an example of when this fails? I cannot replicate it. I tried this, where the filter attribute is defined as Optional[Dict]

@pytest.mark.parametrize(
    "filter,passes",
    [(None, True), ({"test": "test"}, True), ("test==test", False), ([], False)],
)
def test_create_post_request_model(filter, passes):
    extensions = [FilterExtension()]
    request_model = create_post_request_model(extensions, BaseSearchPostRequest)

    if not passes:
        with pytest.raises(ValidationError):
            model = request_model(filter=filter)
    else:
        model = request_model(
            collections=["test1", "test2"],
            ids=["test1", "test2"],
            bbox=[0, 0, 1, 1],
            datetime="2020-01-01T00:00:00Z",
            limit=10,
            filter=filter,
            **{"filter-crs": "epsg:4326", "filter-lang": "cql2-text"},
        )

        assert model.collections == ["test1", "test2"]
        assert model.filter_crs == "epsg:4326"
        assert model.filter == filter

lossyrob · 2024-02-21T19:31:27Z

@thomas-maschler understood about the value of validation in the response classes for other use cases.

I think a solution where users could skip validation based on configuration would be great. In my previous testing (a year or so ago) it was the Pydantic validation that was causing slowdowns. I'd still want to validate that there are no other performance hits besides validation from using Pydantic models over TypedDicts before integrating these changes, but I have enough confidence that this would address the performance implications, and any unforeseen issues could be addressed in follow-up work.

Would you be willing to add that SkipValidation configuration to this PR?

vincentsarago · 2024-02-21T19:47:50Z

About performance: in TiPg we also went the TypedDict way because pydantic validation was really slowing down the response (even with pydantic 2.0).

rhysrevans3 · 2024-02-22T09:34:58Z

@thomas-maschler I found it was an issue for nested List/Dicts types: for example the sortby field in the SortExtensionPostRequest is

sortby: Optional[List[PostSortModel]]

But when I check the FieldInfo shows required=True

>>> from stac_fastapi.extensions.core.sort.request import SortExtensionPostRequest
>>> SortExtensionPostRequest.model_fields['sortby']
FieldInfo(annotation=Union[List[SortExtension], NoneType], required=True)

thomas-maschler · 2024-02-22T16:01:20Z

@rhysrevans3 this is because we didn't set a default value. Right now the model requires either None or a list of PostSortModel. If you set the default to None, the FieldInfo will say required=False. However, None is filled in when sortby is not provided, so it still works even without a default value.
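
A minimal sketch of the required flag in plain Pydantic 2, using two hypothetical models rather than the real SortExtensionPostRequest. It only illustrates how the default changes the FieldInfo; how stac-fastapi fills in None for a missing sortby is outside this sketch.

```python
from typing import List, Optional

from pydantic import BaseModel


class WithoutDefault(BaseModel):
    # Optional only means "may be None"; without a default it is still required.
    sortby: Optional[List[str]]


class WithDefault(BaseModel):
    sortby: Optional[List[str]] = None


print(WithoutDefault.model_fields["sortby"].is_required())  # True
print(WithDefault.model_fields["sortby"].is_required())     # False

# Both accept an explicit None.
print(WithoutDefault(sortby=None).sortby)  # None
# With a default, the field can also be omitted entirely.
print(WithDefault().sortby)                # None
```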

Regarding your point 2.
The types.links are not Pydantic models and are not used anywhere in the package. I prefer to leave them in right now to not break too much. I am not sure if any of the backends currently rely on them.

Regarding your point 3.
All pydantic models are JSON serializable, either with .model_dump_json() (output as str) or .model_dump(mode="json") (output as dict). I added some tests that directly return the pydantic Item, Collection and ItemCollection, and this works for me.
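
For illustration, the two serialization calls side by side on a hypothetical model (not stac-pydantic's Item), assuming Pydantic 2:

```python
from datetime import datetime, timezone

from pydantic import BaseModel


class Item(BaseModel):
    """Hypothetical stand-in, not stac_pydantic's Item."""

    id: str
    acquired: datetime


item = Item(id="a", acquired=datetime(2020, 1, 1, tzinfo=timezone.utc))

as_dict = item.model_dump(mode="json")  # dict with JSON-compatible values
as_str = item.model_dump_json()         # the whole document as a str

print(type(as_dict).__name__, type(as_dict["acquired"]).__name__)  # dict str
print(type(as_str).__name__)                                       # str
```

With mode="json" the datetime is already converted to a string, so the resulting dict can be handed to any JSON encoder.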

I am still working on the switch between TypedDict and Pydantic response models and will probably push my updates later today.

rhysrevans3 · 2024-02-23T09:09:44Z

@thomas-maschler ah, that makes more sense, thank you for the explanation. I guess this just needs to be filled in by the backend I'm using.

Point 2 that sounds like a good idea. But don't the links in the api.landing_page used here need to be stac-pydantic Links? I guess with the validation removed this won't be an issue.

Point 3: sorry, yes, I meant that json.dumps or orjson won't serialise them, which I think fastapi uses for returning JSON/GeoJSON content. So it was more of a note that the other repos will need to serialise or convert to a dict before returning from the API routes. I think this affects the landing pages. However, I might be missing something obvious.
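
A short sketch of the json.dumps point, assuming Pydantic 2 and using a hypothetical `Link` model. It only covers the standard-library encoder, not what FastAPI or orjson do internally.

```python
import json

from pydantic import BaseModel


class Link(BaseModel):
    """Hypothetical stand-in for a link model."""

    rel: str
    href: str


link = Link(rel="self", href="https://example.com/")

# The stdlib encoder does not know how to handle a model instance.
try:
    json.dumps(link)
    direct_ok = True
except TypeError:
    direct_ok = False

# Converting to a JSON-compatible dict first works.
encoded = json.dumps(link.model_dump(mode="json"))

print(direct_ok)  # False
print(encoded)    # {"rel": "self", "href": "https://example.com/"}
```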

thomas-maschler · 2024-02-23T21:24:21Z

@lossyrob can you take a look at my recent changes, in particular 8118f10.

In types.config I added a parameter validate_response that defaults to False (same as now).
I added types.stac back in
I added types.response_model which implements the switch
In api.core I made sure we always use response_model classes and added some extra type-checking
I added additional tests to make sure this works correctly, in particular, test_response_model and test_app
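
As a rough illustration of the switch idea only (this is not the code in types.config or types.response_model), a stdlib sketch in which every name is hypothetical: a setting that is off by default, and a helper that validates a payload only when the setting is on.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


@dataclass
class Settings:
    # Off by default, matching the default described in this comment.
    validate_response: bool = False


def check_item(payload: Payload) -> Payload:
    """Hypothetical validator standing in for a Pydantic model."""
    if "id" not in payload:
        raise ValueError("item is missing 'id'")
    return payload


def prepare_response(
    payload: Payload, settings: Settings, validator: Callable[[Payload], Payload]
) -> Payload:
    """Validate only when the setting asks for it; otherwise pass through."""
    if settings.validate_response:
        return validator(payload)
    return payload


broken = {"type": "Feature"}  # no "id"

# Default settings: the payload is returned untouched.
passthrough = prepare_response(broken, Settings(), check_item)

# Validation switched on: the same payload is rejected.
try:
    prepare_response(broken, Settings(validate_response=True), check_item)
    rejected = False
except ValueError:
    rejected = True

print(passthrough is broken)  # True
print(rejected)               # True
```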

I stumbled over some other bugs regarding field aliases in the filter extension that I didn't manage to fix. I will file an issue for that.
I also noticed that there are still some pydantic extension models that are either out of sync with stac-pydantic or not present in stac-pydantic. I will port them over and open another PR once they make their way through the system.

jonhealy1 · 2024-04-10T10:23:27Z

@vincentsarago I thought we had results here that seemed to make sense?

thomas-maschler · 2024-04-10T10:33:14Z

I believe we now test for correct validation.

vincentsarago · 2024-04-10T10:35:32Z

> I believe we now test for correct validation.

@thomas-maschler Yes

> @vincentsarago I thought we had results #650 (comment) that seemed to make sense?

@jonhealy1 absolutely, so this means that validation does not impact the performance 🤯. I would still keep this optional, but it might be great to mention this somewhere.

thomas-maschler · 2024-04-10T10:36:49Z

Ok, I'll update the readme

vincentsarago · 2024-04-24T20:32:48Z

thanks @thomas-maschler, I was going to update this branch tomorrow morning 🙏

let's see if we can get this merged before the end of the week 😄

jonhealy1

Nice work. The readme looks great - just a couple of spelling things

CHANGES.md

README.md

stac_fastapi/api/setup.py

stac_fastapi/extensions/setup.py

stac_fastapi/types/setup.py

stac_fastapi/types/stac_fastapi/types/version.py

jonhealy1

Nice work!

vincentsarago

🚀 thanks @thomas-maschler

Notes:

we should wait for the next stac-pydantic release with Adjust ItemProperties Validation. stac-pydantic#131
I've created a maint-2.x branch (https://github.com/stac-utils/stac-fastapi/tree/maint-2.x) if we need to continue updating old stac-fastapi

jonhealy1 · 2024-04-26T07:17:03Z

@vincentsarago when you say wait do you mean wait to merge?

vincentsarago · 2024-04-26T07:18:14Z

@jonhealy1 sorry I meant, wait for the 3.0 release :-)

I'll merge this now 🥳

vincentsarago · 2024-04-26T07:27:56Z

🤯 😍

There is almost no difference with or without model validation ❤️

https://stac-utils.github.io/stac-fastapi/benchmarks.html

thomas-maschler added 8 commits February 9, 2024 20:39

update to pydantic 2

013caee

update changelog

669587e

typo

a168498

add CI for Python 3.12

cd9f75e

drop support for python 3.8

781c46e

update python version for docs

ce7f6e8

update python for docs docker container

19cce99

update python version in dockerfile

36044d6

thomas-maschler mentioned this pull request Feb 12, 2024

Updating to stac pydantic 3 #627

Closed

4 tasks

thomas-maschler added 7 commits February 12, 2024 11:53

handle post requests

9210501

test wrapper

1fa87b7

pass through StacBaseModel

653f3a6

keep py38

6727568

change install order

e01e95a

lint

db5cfb6

revert back to >=3.8 in setup.py

02f2702

This was referenced Feb 20, 2024

Update to pydantic 2.0 #593

Closed

Datetime as null is not accepted by the STAC FASTAPI #637

Closed

thomas-maschler added 2 commits February 23, 2024 16:04

add switch to use either TypeDict or StacPydantic Response

8118f10

lint and format with ruff

b52f216

vincentsarago added this to the 3.0.0 milestone Apr 9, 2024

thomas-maschler added 2 commits April 10, 2024 07:03

Add text about response validation to readme.

6b0949a

merge main

9b14b04

vincentsarago added this to In Progress in v3.0 Apr 11, 2024

vincentsarago mentioned this pull request Apr 24, 2024

Patch/fix doc urls in landing #673

Merged

thomas-maschler added 2 commits April 24, 2024 15:12

fix warning

6f1b478

merge main

0bb2019

thomas-maschler requested a review from jonhealy1 April 24, 2024 20:12

jonhealy1 requested changes Apr 25, 2024

vincentsarago and others added 4 commits April 26, 2024 08:44

update from main

6d943aa

remove versions

d46d287

fix

641614a

Update README.md

3b50a6d

jonhealy1 self-requested a review April 26, 2024 06:55

jonhealy1 approved these changes Apr 26, 2024

vincentsarago mentioned this pull request Apr 26, 2024

pin sub-modules (api/extensions) to specific version #678

Open

vincentsarago added 2 commits April 26, 2024 09:09

update changelog

37594ec

Merge branch 'pydantic2' of https://github.com/thomas-maschler/stac-f…

83d19f8

…astapi into HEAD

vincentsarago approved these changes Apr 26, 2024

vincentsarago merged commit 63cac39 into stac-utils:main Apr 26, 2024
7 checks passed

vincentsarago mentioned this pull request Apr 26, 2024

use Collection Pydantic model in PutCollection transaction #679

Merged

vincentsarago moved this from In Progress to Done in v3.0 May 6, 2024
