Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mashumaro DataClassORJSONMixin #2080

Merged

Conversation

quinten-flwls
Copy link
Contributor

@quinten-flwls quinten-flwls commented Jan 2, 2024

Why are the changes needed?

The mashumaro DataClassORJSONMixin class works like the mashumaro DataClassJSONMixin class. Additionally, it allows for serializing dataclasses with a wider variety of attributes types, including:

Currently, flytekit does not yet support the DataClassJSONMixin class and throws an error when classes derived from DataClassORJSONMixin are serialized.

What changes were proposed in this pull request?

This PR introduces support for classes derived from the mashumaro DataClassORJSONMixins class.

How was this patch tested?

Added DataClassORJSONMixin instances in test_type_engine.py

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Copy link

welcome bot commented Jan 2, 2024

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

For the CI to run, we need to add orjson as a developer dependency in https://github.com/flyteorg/flytekit/blob/master/dev-requirements.in

As for testing, can you add a test that checks the expected behavior for your NumPy use case?

@@ -24,6 +23,7 @@
from google.protobuf.struct_pb2 import Struct
from marshmallow_enum import EnumField, LoadDumpOptions
from mashumaro.mixins.json import DataClassJSONMixin
from mashumaro.mixins.orjson import DataClassORJSONMixin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to add orjson as a hard dependency for flytekit. Can you write the PR in way where orjson is a soft dependency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, orjson is a soft dependency now

@quinten-flwls
Copy link
Contributor Author

quinten-flwls commented Jan 17, 2024

Thank you for the PR!

For the CI to run, we need to add orjson as a developer dependency in https://github.com/flyteorg/flytekit/blob/master/dev-requirements.in

As for testing, can you add a test that checks the expected behavior for your NumPy use case?

Hi Thomas, thank you for your feedback! I will incorporate it. As for dev dependendies, we need both orjson and mashumaro>=3.11. What is the best way for me to specify a minimal version in the dev dependencies?

@Fatal1ty
Copy link

👋 Hi, author of mashumaro here.

Mashumaro 3.11 introduced the DataClassORJSONMixin class

As for dev dependendies, we need both orjson and mashumaro>=3.11. What is the best way for me to specify a minimal version in the dev dependencies?

To avoid all possible unwanted changes made on false conclusions, I’d like to make a correction. DataClassORJSONMixin was added earlier in 3.1 (see release notes). I hope it’s not that important in case of this pull request and I apologize for bothering you.

Signed-off-by: Quinten Roets <quinten.roets@flawlessai.com>
@quinten-flwls
Copy link
Contributor Author

👋 Hi, author of mashumaro here.

Mashumaro 3.11 introduced the DataClassORJSONMixin class

As for dev dependendies, we need both orjson and mashumaro>=3.11. What is the best way for me to specify a minimal version in the dev dependencies?

To avoid all possible unwanted changes made on false conclusions, I’d like to make a correction. DataClassORJSONMixin was added earlier in 3.1 (see release notes). I hope it’s not that important in case of this pull request and I apologize for bothering you.

Hi @Fatal1ty! Thank you for reaching out and reminding me that DataClassORJSONMixin does not require mashumaro 3.11. That is encredibly useful information indeed. I updated the PR description.

The reason why I had mashumaro 3.11 in mind is because we internally use json_encode and json_decode to process DataClassORJSONMixin instances. But that is completely unrelated to this PR.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

wider variety of attributes, including numpy arrays.

Does your NumPy array use case work out of the box after this PR? If so, can you add a test where a NumPy array is an attribute?

Copy link

codecov bot commented Jan 18, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (7996c2e) 82.58% compared to head (95ede28) 85.15%.
Report is 10 commits behind head on master.

Files Patch % Lines
flytekit/core/type_engine.py 63.63% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2080      +/-   ##
==========================================
+ Coverage   82.58%   85.15%   +2.56%     
==========================================
  Files         233      284      +51     
  Lines       19541    22197    +2656     
  Branches     3512     3520       +8     
==========================================
+ Hits        16138    18901    +2763     
+ Misses       2824     2686     -138     
- Partials      579      610      +31     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@quinten-flwls quinten-flwls changed the title Extend serialization support for custom classes Support mashumaro DataClassORJSONMixin Jan 18, 2024
Signed-off-by: Quinten Roets <quinten.roets@flawlessai.com>
Signed-off-by: Quinten Roets <quinten.roets@flawlessai.com>
@quinten-flwls
Copy link
Contributor Author

quinten-flwls commented Jan 18, 2024

Thanks for the update!

wider variety of attributes, including numpy arrays.

Does your NumPy array use case work out of the box after this PR? If so, can you add a test where a NumPy array is an attribute?

Thank you for reviewing! Unfortunately, Numpy array attributes do not work out of the box. However, other attribute types like datetime do. I added a datetime attribute as an example in the tests. I hope that suffices to demonstrate the use case. Let me know if it does not!

Signed-off-by: Quinten Roets <quinten.roets@flawlessai.com>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, but otherwise LGTM

flytekit/core/type_engine.py Outdated Show resolved Hide resolved
tests/flytekit/unit/core/test_type_engine.py Show resolved Hide resolved
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
@eapolinario eapolinario merged commit f115157 into flyteorg:master Jan 23, 2024
83 of 84 checks passed
Copy link

welcome bot commented Jan 23, 2024

Congrats on merging your first pull request! 🎉

@quinten-flwls
Copy link
Contributor Author

Thank you for the feedback and the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants