Mark schema fields with None default as Optional to pass pydantic v2 validation #651

Merged: 12 commits merged into materialsproject:main on Jan 6, 2024

danielzuegner
Contributor

Summary

This PR marks all Fields with a None default value as Optional[type] in the type annotation, so that deserialized documents pass Pydantic validation. For instance, the following fails for me with atomate2==0.0.12 and pydantic==2.5.2:

from atomate2.common.schemas.elastic import ElasticDocument
ElasticDocument.model_validate_json(ElasticDocument().json())

gives:

ValidationError: 6 validation errors for ElasticDocument
structure
  Value error, Must provide Structure, the as_dict form, or the proper [type=value_error, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/value_error
elastic_tensor
  Input should be an object [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/model_type
derived_properties
  Input should be an object [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/model_type
fitting_data
  Input should be an object [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/model_type
fitting_method
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/string_type
order
  Input should be a valid integer [type=int_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/int_type

This means that deserializing an ElasticDocument with validation fails unless every field is set to a non-None value. This PR fixes the issue.

The behavior change from pydantic v1 to v2 is also confirmed in this GitHub issue.
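The behavior change can be reproduced without atomate2. The sketch below assumes pydantic v2 and uses two hypothetical one-field models (ImplicitOptionalDoc and ExplicitOptionalDoc are illustrative names, not atomate2 classes). The JSON payload stands in for what dumping a default document writes for an unset field: pydantic v1 treated a None default as an implicit Optional, whereas v2 validates null strictly against the annotation.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class ImplicitOptionalDoc(BaseModel):
    # Pre-fix pattern: the default is None, but the annotation only admits int.
    order: int = Field(None, description="Order of the fit")


class ExplicitOptionalDoc(BaseModel):
    # Post-fix pattern: the annotation also admits the None default.
    order: Optional[int] = Field(None, description="Order of the fit")


# Roughly what dumping a default document writes for an unset field.
PAYLOAD = '{"order": null}'


def accepts_null(model_cls: type) -> bool:
    """Return True if ``model_cls`` validates PAYLOAD without error."""
    try:
        model_cls.model_validate_json(PAYLOAD)
    except ValidationError:
        return False
    return True


# Both models construct, because pydantic v2 does not validate defaults.
print(ImplicitOptionalDoc().order, ExplicitOptionalDoc().order)  # None None
print(accepts_null(ImplicitOptionalDoc))  # False
print(accepts_null(ExplicitOptionalDoc))  # True
```

Only the annotation differs between the two models, which mirrors the change in this PR: swapping `int` for `Optional[int]` is enough for the payload to validate.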

Checklist

Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request title.

Before a pull request can be merged, the following items must be checked:

  • Code is in the standard Python style. The easiest way to handle this is to run the following in the correct sequence on your local machine. Start by running ruff and ruff format on your new code. This will automatically reformat your code to PEP 8 conventions and fix many linting issues.
  • Docstrings have been added in the NumPy docstring format. Run ruff on your code.
  • Type annotations are highly encouraged. Run mypy to type check your code.
  • Tests have been added for any new functionality or bug fixes.
  • All linting and tests pass.

Note that the CI system will run all the above checks. But it will be much more
efficient if you already fix most errors prior to submitting the PR. It is highly
recommended that you use the pre-commit hook provided in the repository. Simply run
pre-commit install and a check will be run prior to allowing commits.

codecov bot commented Dec 16, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (554903c) 76.07% compared to head (335e30c) 75.92%.

❗ Current head 335e30c differs from pull request most recent head 58bea82. Consider uploading reports for the commit 58bea82 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #651      +/-   ##
==========================================
- Coverage   76.07%   75.92%   -0.16%     
==========================================
  Files          83       83              
  Lines        7031     7031              
  Branches     1042     1042              
==========================================
- Hits         5349     5338      -11     
- Misses       1362     1374      +12     
+ Partials      320      319       -1     
Files                                       Coverage Δ
src/atomate2/common/schemas/cclib.py        77.55% <100.00%> (ø)
src/atomate2/common/schemas/defects.py      87.91% <100.00%> (ø)
src/atomate2/common/schemas/elastic.py      88.65% <100.00%> (ø)
src/atomate2/common/schemas/phonons.py      97.14% <100.00%> (ø)

... and 1 file with indirect coverage changes

Member

janosh left a comment

@danielzuegner Are you able to add a test covering deserialization with optional fields missing?

Two review threads on src/atomate2/common/schemas/phonons.py (outdated, resolved)
@mkhorton
Member

mkhorton commented Jan 3, 2024

> Are you able to add a test covering deserialization with optional fields missing?

@janosh Is this strictly necessary, given that it's pydantic functionality?

@mkhorton
Member

mkhorton commented Jan 3, 2024

I'm assuming this is related to a change of behavior in Pydantic v2, although I have not been able to find documentation of this yet.

@janosh
Member

janosh commented Jan 3, 2024

> @janosh Is this strictly necessary, given that it's pydantic functionality?

I think it would be nice to have, but of course it's really up to @utf. Given that missing Optionals cause errors in user land, a test for this seems no less useful than tests elsewhere.

@mkhorton
Member

mkhorton commented Jan 3, 2024

To clarify, this PR does not add new functionality. Previously there was an implicit Optional (the default was None, but the type hint did not reflect this); this PR makes the Optional explicit, so that the default value no longer causes a validation error in pydantic v2. That is why I'm suggesting a test is not strictly required: the original functionality was also merged without a test, so a fix could be merged the same way, although a test is of course preferred.

More generally, this kind of functionality is implicitly tested by the validation performed by pydantic itself, which is how this error was detected. Given that this is a regression, and the currently released version of atomate2 is broken for elastic workflows (#659) until this is addressed, I would suggest we prioritize merging this PR, or fixing the issue another way if this PR is not sufficient. Perhaps there's something I'm missing here; I'm not sure.

My comment here is, in part, because I'm reluctant to add tests for areas I'm not as familiar with, but I would like to see the existing functionality restored.
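The point that the None default went unnoticed until deserialization can be made concrete. In pydantic v2, defaults are not validated unless a field opts in with `validate_default=True`, so a model whose annotation rejects its own default still constructs fine. The sketch below uses hypothetical models and turns that option on purely to expose the mismatch; it is not what this PR does.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class CheckedImplicitDoc(BaseModel):
    # validate_default=True makes pydantic v2 check the None default
    # against the annotation when an instance is created.
    order: int = Field(None, validate_default=True)


class CheckedExplicitDoc(BaseModel):
    order: Optional[int] = Field(None, validate_default=True)


def constructs_from_defaults(model_cls: type) -> bool:
    """Return True if ``model_cls()`` can be built from its defaults alone."""
    try:
        model_cls()
    except ValidationError:
        return False
    return True


print(constructs_from_defaults(CheckedImplicitDoc))  # False: None is not an int
print(constructs_from_defaults(CheckedExplicitDoc))  # True
```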

@JaGeo
Member

JaGeo commented Jan 3, 2024

@mkhorton While I completely agree that this needs to be urgently fixed, I would like to add that adding tests for this part would be very relevant (e.g., in a future PR). The changes in pydantic disrupted atomate2 quite a lot.
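One way to guard against this class of regression, complementary to round-trip tests, is a static scan that flags fields whose default is None but whose annotation does not admit None. This is a hedged sketch assuming pydantic v2's `model_fields` mapping; `admits_none`, `implicit_optional_fields`, and `ExampleDoc` are hypothetical names, and in atomate2 you would loop over the real schema classes instead.

```python
import typing
from typing import Optional

from pydantic import BaseModel, Field


def admits_none(annotation: object) -> bool:
    """Return True if a field annotation accepts None at the top level."""
    if annotation is typing.Any or annotation is type(None):
        return True
    # Optional[X] and Union[X, None] (and X | None on 3.10+) list NoneType in their args.
    return type(None) in typing.get_args(annotation)


def implicit_optional_fields(model_cls: type) -> list:
    """Names of fields whose default is None but whose annotation rejects None."""
    return [
        name
        for name, info in model_cls.model_fields.items()
        if info.default is None and not admits_none(info.annotation)
    ]


class ExampleDoc(BaseModel):
    # Hypothetical model mixing the pre-fix and post-fix patterns.
    fitting_method: str = Field(None, description="Fitting method")
    order: Optional[int] = Field(None, description="Order of the fit")
    name: str = "elastic"


print(implicit_optional_fields(ExampleDoc))  # ['fitting_method']
```

The check only inspects the top level of each annotation, so a field such as `list[Optional[int]]` with a None default is still (correctly) reported.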

@janosh
Member

janosh commented Jan 4, 2024

> More generally, this kind of functionality is implicitly tested by the validation performed by pydantic itself, which is how this error was detected.

Exactly, which should make the test code minimal. Just putting the 2 lines @danielzuegner posted above in a test function might do the trick and would allow us to catch breaking pydantic changes before we make a release in the future.

from atomate2.common.schemas.elastic import ElasticDocument
ElasticDocument.model_validate_json(ElasticDocument().json())
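Turning those two lines into a test might look like the sketch below, assuming pydantic v2 (where `model_dump_json()` replaces the deprecated `.json()`). `FittingDoc` is a hypothetical stand-in; in atomate2 the tuple would list the real documents, such as ElasticDocument.

```python
from typing import Optional

from pydantic import BaseModel, Field


class FittingDoc(BaseModel):
    # Hypothetical stand-in for a schema such as ElasticDocument.
    fitting_method: Optional[str] = Field(None, description="Fitting method")
    order: Optional[int] = Field(None, description="Order of the fit")


def assert_round_trips(model_cls: type) -> None:
    """Dump a default instance to JSON and validate it back."""
    doc = model_cls()
    restored = model_cls.model_validate_json(doc.model_dump_json())
    assert restored == doc


def test_default_docs_round_trip() -> None:
    # In atomate2, this tuple would list the real document classes.
    for model_cls in (FittingDoc,):
        assert_round_trips(model_cls)
```

Comparing the restored document to the original, rather than only checking that validation succeeds, also catches fields that silently change value on the round trip.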

@danielzuegner
Contributor Author

I'm on it; I'll push something within the next hour or so.

@danielzuegner
Contributor Author

@janosh Done: I added some tests, which pass locally on my machine.

Member

janosh left a comment

Tests look great! Thanks @danielzuegner!

I fixed the ruff error about a missing shebang, but some tests are failing with:

    return PhononBSDOSDoc.from_forces_born(
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/atomate2/common/schemas/phonons.py", line 300, in from_forces_born
    kpath_dict, kpath_concrete = cls.get_kpath(
AttributeError: type object 'ModelMetaclass' has no attribute 'get_kpath'

Maybe @JaGeo can comment if this has come up before.

@JaGeo
Member

JaGeo commented Jan 6, 2024

No, it hasn't.

@@ -190,6 +190,7 @@ class PhononBSDOSDoc(StructureMetadata):

uuids: Optional[PhononUUIDs] = Field("Field including all relevant uuids")

@classmethod
Member

Why are there two classmethods?

@JaGeo
Member

JaGeo commented Jan 6, 2024

get_kpath is a static method. Maybe that is why?
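A hedged reading of the traceback, assuming the line added in the hunk above is a second `@classmethod` stacked on `from_forces_born`: on Python 3.9, the version in the CI log, chained classmethod descriptors bind the inner method to the class's metaclass, so `cls` is `ModelMetaclass` rather than `PhononBSDOSDoc`, and looking up `get_kpath` fails because it is defined on the class, not on the metaclass. Since the behavior of stacked classmethods differs across Python versions (chaining was deprecated in 3.11 and removed in 3.13), the sketch reproduces the 3.9 binding by hand with hypothetical `Meta` and `Doc` classes.

```python
class Meta(type):
    """Hypothetical stand-in for pydantic's ModelMetaclass."""


class Doc(metaclass=Meta):
    @staticmethod
    def get_kpath() -> str:
        return "kpath"

    @classmethod
    def from_forces(cls) -> str:
        # get_kpath lives on Doc, so this works only when cls is Doc.
        return cls.get_kpath()


def bind_to_metaclass() -> str:
    """Call from_forces with type(Doc) as cls, as stacked classmethods do on 3.9."""
    try:
        Doc.from_forces.__func__(type(Doc))
    except AttributeError as exc:
        return str(exc)
    return "no error"


print(Doc.from_forces())  # kpath
print(bind_to_metaclass())  # type object 'Meta' has no attribute 'get_kpath'
```

Removing the duplicate decorator restores the normal binding shown by `Doc.from_forces()`.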

janosh added the testing (Test all the things) and fix (Bug fix PR) labels Jan 6, 2024
janosh added the schema (pydantic/emmet schemas) label Jan 6, 2024
Member

janosh left a comment

All green here. Thanks again @danielzuegner for this fix and @JaGeo for the pointer on where the CI error came from.

janosh merged commit c9e1376 into materialsproject:main on Jan 6, 2024
5 checks passed
@JaGeo
Member

JaGeo commented Jan 6, 2024

Thanks for fixing this, @danielzuegner, @mkhorton, and @janosh!

@mkhorton
Member

mkhorton commented Jan 8, 2024

Thanks @danielzuegner, and @janosh for merging!

@utf
Member

utf commented Jan 8, 2024

Thanks to everyone who helped get this merged. Glad to see this fixed.

Labels: fix (Bug fix PR), schema (pydantic/emmet schemas), testing (Test all the things)

6 participants