-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bytes data type not JSON serializable #158
Comments
Hello @richard-to I would like to contribute to this issue, this is my first time however I'm fairly confident about my skills to help this project. |
What if we use a json encoder and edit the dataclass_utils.py file to retrieve this function so that it is json serializable import json def decode_bytes(dct): |
Hi @zacharias1219, Thanks for wanting to contribute to Mesop. You're certainly welcome to contribute. Yes, I agree with your approach of serializing the bytes to base64. In terms of implementation, I think you can follow a similar strategy to what we did with the Pandas serialization, which was to convert the incompatible data type to a dictionary. Example for bytes: {"__python.bytes._": "base64-enoded-byte-string"} We also have some other data types that need custom encoding: #387. So we want to create a consistent pattern. We have a We also have For getting started, you can start here: https://google.github.io/mesop/internal/contributing/ and here: https://google.github.io/mesop/internal/development/ We also have a Github Codespaces we recently set up that you can test out. No instructions yet since we just added it and it probably needs some testing to work out some kinks. But it's available. |
Also forgot to mention. For running the unit tests for the changes here, you can do:
There should be a test file there that you can add your tests to. |
Hey thank you for commenting, I've gotten an idea that I'll post here mesop/dataclass_utils/dataclass_utils.py
This is what I could come up with at the moment, I would appreciate for any changes regarding this and it you like this then I can commit it and also add tests to it. |
And sorry for editing multiple times, this is my first time with issues interface on github. |
No problem about editing multiple times. Perfectly fine. In terms of your proposed changes. They look fine on first glance. I would recommend starting with just bytes first. Let's keep the number of changes small and self-contained. Easier for the reviewer and allows you to test more throughly. So feel free to post a pull request for bytes. Once you get that merged in you can add the other ones too. A few notes: We recently added a State Diff performance improvement. This covers some serialization stuff, so you'll need to make sure the following tests pass and have coverage for your change:
|
Just FYI @zacharias1219 Looks like someone posted a pull request for the set/tuple/datetime issue #387. So you'll only need to do the bytes. Sorry about that. Next time I'll do a better job marking issues as assigned. |
Hey this is what I came up with import json from deepdiff import DeepDiff, Delta from mesop.exceptions import MesopException _PANDAS_OBJECT_KEY = "pandas.DataFrame" C = TypeVar("C") def _check_has_pandas():
except ImportError: _has_pandas = _check_has_pandas() def dataclass_with_defaults(cls: Type[C]) -> Type[C]: return dataclass(cls) def serialize_dataclass(state: Any): def update_dataclass_from_json(instance: Any, json_string: str): def _recursive_update_dataclass_from_json_obj(instance: Any, json_dict: Any): class MesopJSONEncoder(json.JSONEncoder): Since we support Pandas DataFrames in the Mesop table, users may need to store the For simplicity we will convert the DataFrame to JSON within the JSON serialized state. def default(self, obj):
def decode_mesop_json_state_hook(dct): Since we support Pandas DataFrames in the Mesop table, users may need to store the One thing to note is that pandas.NA becomes numpy.nan during deserialization.
if _BYTES_OBJECT_KEY in dct: return dct class DataFrameOperator(BaseOperator): DeepDiff does not support diffing DataFrames. See seperman/deepdiff#394. This operator checks if the DataFrames are equal or not. It does not do a deep diff of def match(self, level) -> bool:
def give_up_diffing(self, level, diff_instance) -> bool: def diff_state(state1: Any, state2: Any) -> str: DeepDiff does not support DataFrames yet. See The custom_actions = [] Only use the
|
Can you create a Pull Request with your changes. It will be easier for us to review the changes that way. Thanks. |
I have created it. |
Ok. Great. We'll take a look tomorrow morning (for us). Thanks a lot. |
If you store bytes in Mesop state, it does not appear to be JSON serializable by default .
The text was updated successfully, but these errors were encountered: