Deserialize json to union of different classes by parameter #67

Closed
amirotin opened this issue Jan 30, 2022 · 14 comments

@amirotin

amirotin commented Jan 30, 2022

Hi!
I'm trying to find out whether it is possible to deserialize a nested dataclass based on a value in the enclosing dataclass.
I'm not sure how to explain it better, so my code will probably say more than I can.

I have JSON like this:

{
    "pointList": [{
        "r": {some data for pointType 1},
        "x": 1,
        "y": 1,
        "pointType": 1
    }, {
        "p": {some data for pointType 4},
        "x": 2,
        "y": 2,
        "pointType": 4
    }
    ]
}

So I defined a dataclass for each pointType I have. Serialization worked like a charm, but how can I deserialize such JSON and pick the correct dataclass for each point?

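# Import paths as in mashumaro 3.x
from dataclasses import dataclass
from typing import List, Optional, Union

from mashumaro.config import BaseConfig, TO_DICT_ADD_OMIT_NONE_FLAG
from mashumaro.mixins.json import DataClassJSONMixin

class BaseRequest(DataClassJSONMixin):
    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_OMIT_NONE_FLAG]

@dataclass(slots=True)
class MapPoint(BaseRequest):
    x: int
    y: int
    pointType: int

@dataclass(slots=True)
class Point4(MapPoint):
    r: Optional[Point4Data] = None  # Point4Data is defined elsewhere
    pointType: int = 4

@dataclass(slots=True)
class Point1(MapPoint):
    p: Optional[Point1Data] = None  # Point1Data is defined elsewhere
    pointType: int = 1

@dataclass(slots=True)
class MapData(BaseRequest):
    pointList: List[Union[Point1, Point4]]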

For example, I could build a dict that maps each pointType to its class and pass it to the deserialization function. Or is this impossible and I'm asking for too much? :)

@amirotin amirotin changed the title deserialize Deserialize json to union of different classes py parametr Jan 30, 2022
@amirotin amirotin changed the title Deserialize json to union of different classes py parametr Deserialize json to union of different classes by parameter Jan 30, 2022
@Fatal1ty
Owner

Hi @amirotin

There would be no problem if Point1 and Point4 had differences in the names or types of their fields. They have different p and r fields but since they both have a default value, these two classes could be deserialized from the same input data.
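
To make the ambiguity concrete, here is a minimal sketch that does not use mashumaro at all. It assumes a hypothetical from_dict helper that ignores unknown keys and a hypothetical first_match function that tries the union variants in order. Both names, the dict-typed p and r fields, and the sample payload are illustrative only.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


def from_dict(cls, data: Dict[str, Any]):
    """Hypothetical stand-in for a generated from_dict: unknown keys are ignored."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


@dataclass
class Point1:
    x: int
    y: int
    p: Optional[dict] = None  # optional, so its absence is not an error
    pointType: int = 1


@dataclass
class Point4:
    x: int
    y: int
    r: Optional[dict] = None
    pointType: int = 4


def first_match(variants, data):
    """Try each variant in order and return the first one that can be built."""
    for cls in variants:
        try:
            return from_dict(cls, data)
        except TypeError:  # raised when a required field is missing
            continue
    raise ValueError("no variant matched")


raw = {"x": 2, "y": 2, "r": {"radius": 3}, "pointType": 4}
point = first_match([Point1, Point4], raw)
print(type(point).__name__)  # Point1: the payload under "r" is silently dropped
```

Because every field that differs between the two classes is optional, the first variant always succeeds, which is why a discriminator such as pointType is needed.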

I'm going to implement discriminated unions, which will allow choosing the right class based on the value of a field and solve this problem completely. But for now there are a couple of workarounds:

  1. Change the type of the p and r fields to make them required
  2. Write a deserialization method for Union[Point1, Point4] yourself:
def deserialize_point(value: Dict) -> Union[Point1, Point4]:
    pointType = value.get("pointType")
    if pointType == 1:
        return Point1.from_dict(value)
    elif pointType == 4:
        return Point4.from_dict(value)
    else:
        raise ValueError(f"Unknown pointType {pointType}")

@dataclass(slots=True)
class MapData(BaseRequest):
    pointList: List[Union[Point1, Point4]]

    class Config(BaseConfig):
        serialization_strategy = {
            Union[Point1, Point4]: {
                "deserialize": deserialize_point
            }
        }
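
The same workaround can be driven by a lookup table instead of an if/elif chain, which is closer to the "dict of classes" idea from the question. This is only a sketch with plain dataclasses: POINT_CLASSES is a hypothetical name, and the constructor call stands in for mashumaro's from_dict.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Point1:
    x: int
    y: int
    pointType: int = 1


@dataclass
class Point4:
    x: int
    y: int
    pointType: int = 4


# Lookup table: discriminator value -> class to build.
POINT_CLASSES: Dict[int, type] = {1: Point1, 4: Point4}


def deserialize_point(value: Dict[str, Any]):
    point_type = value.get("pointType")
    try:
        point_cls = POINT_CLASSES[point_type]
    except KeyError:
        raise ValueError(f"Unknown pointType {point_type}") from None
    # With mashumaro this line would be point_cls.from_dict(value).
    return point_cls(**value)


points = [
    deserialize_point(v)
    for v in (
        {"x": 1, "y": 1, "pointType": 1},
        {"x": 2, "y": 2, "pointType": 4},
    )
]
print([type(p).__name__ for p in points])
```

Adding a new point type then only means adding one entry to the table.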

@amirotin
Author

@Fatal1ty thanks for the idea!
I've checked my input data: I have 55 different point types (there will be more later), and unfortunately many of them share the same keys p, r, m, b, etc. So option 1 won't work for me.
Actually, I was thinking about writing a custom SerializableType class for that, but serialization_strategy will be easier. As far as I understand, I can replace Union[Point1, Point2, ...] with Any and it will work the same way?

@Fatal1ty
Owner

As far as I understand, I can replace Union[Point1, Point2, ...] with Any and it will work the same way?

Yes, you can override deserialization of whatever you want. So, you can use List[Any] and override deserialization for Any.

@Fatal1ty
Owner

Fatal1ty commented Jan 30, 2022

Another solution is to use SerializableType, as you've noticed:

@dataclass(slots=True)
class MapPoint(BaseRequest, SerializableType):
    x: int
    y: int
    pointType: int

    def _serialize(self):
        return self.to_dict()

    @classmethod
    def _deserialize(cls, value: Dict):
        if value.get("pointType") == 1:
            return Point1.from_dict(value)
        elif value.get("pointType") == 4:
            return Point4.from_dict(value)
        ...

@dataclass(slots=True)
class MapData(BaseRequest):
    pointList: List[MapPoint]

You can keep a registry, such as a Dict[int, Type[MapPoint]] mapping, to store all point types. For example, each point type can be registered with a decorator.

@amirotin
Author

@Fatal1ty thanks! worked like a charm!

@Fatal1ty
Owner

With a bit of dirty black magic, a universal deserialization method for all point types could, in the simple case, look like this:

import sys

current_module = sys.modules[__name__]

...

@classmethod
def _deserialize(cls, value: Dict):
    pointType = value.get("pointType")
    try:
        # Look up a class named Point<N> in the current module
        point_cls = getattr(current_module, f'Point{pointType}')
    except AttributeError:
        # Hide the internal AttributeError from the caller's traceback
        raise ValueError(f"Unknown pointType: {pointType}") from None

    return point_cls.from_dict(value)

@amirotin
Author

If we're talking about JSON, shouldn't it be from_json instead of from_dict? And value is probably a str, not a Dict?

@amirotin amirotin reopened this Jan 30, 2022
@Fatal1ty
Owner

It should be from_dict. from_json is just a simple wrapper.
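
Roughly, the relationship between the two methods can be pictured like this. The Point class below is a hand-written stand-in, not mashumaro's generated code. It only illustrates that the JSON text is parsed first, so the value that reaches from_dict (and therefore a custom _deserialize) is already a dict.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Point:
    x: int
    y: int

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Point":
        # All the real work happens here, on an already parsed dict.
        return cls(x=int(data["x"]), y=int(data["y"]))

    @classmethod
    def from_json(cls, data: str) -> "Point":
        # Thin wrapper: parse the text, then delegate to from_dict.
        return cls.from_dict(json.loads(data))


point = Point.from_json('{"x": 1, "y": 2}')
print(point)
```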

@amirotin
Author

Thanks again 👍
Now I get the correct result in both directions!

@th0ger

th0ger commented Feb 19, 2023

You can keep a registry, such as a Dict[int, Type[MapPoint]] mapping, to store all point types. For example, each point type can be registered with a decorator.

Sounds interesting with a decorator, could you provide an example?

@Fatal1ty
Owner

Fatal1ty commented Feb 19, 2023

You can keep a registry, such as a Dict[int, Type[MapPoint]] mapping, to store all point types. For example, each point type can be registered with a decorator.

Sounds interesting with a decorator, could you provide an example?

Yes, sure. It might look something like this:

def register_point(point_number: int):
    def decorator(class_):
        MapPoint.__points__[point_number] = class_
        return class_
    return decorator


@dataclass(slots=True)
class MapPoint(BaseRequest, SerializableType):
    __points__ = {}
    
    ...

    @classmethod
    def _deserialize(cls, value: Dict):
        return cls.__points__[value["pointType"]].from_dict(value)

@register_point(1)
@dataclass(slots=True)
class Point1(MapPoint):
    pointType: int = 1
    ...

@dataclass(slots=True)
class MapData(BaseRequest):
    pointList: List[MapPoint]

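Another option is to use the __init_subclass__ method, since every PointX is meant to be a subclass of MapPoint:

from dataclasses import MISSING

@dataclass(slots=True)
class MapPoint(BaseRequest, SerializableType):
    __points__ = {}

    def __init_subclass__(cls, **kwargs):
        # Keep the mixins' own __init_subclass__ hooks in the chain
        super().__init_subclass__(**kwargs)
        # dataclass(slots=True) rebuilds the class, so this hook runs again
        # and the default is no longer in __dict__; read the field default then
        own_fields = cls.__dict__.get("__dataclass_fields__")
        if own_fields is not None:
            point_type = getattr(own_fields.get("pointType"), "default", MISSING)
        else:
            point_type = cls.__dict__.get("pointType", MISSING)
        if point_type is not MISSING:
            MapPoint.__points__[point_type] = cls

    ...

    @classmethod
    def _deserialize(cls, value: Dict):
        return cls.__points__[value["pointType"]].from_dict(value)

@dataclass(slots=True)
class Point1(MapPoint):
    pointType: int = 1
    ...

@dataclass(slots=True)
class MapData(BaseRequest):
    pointList: List[MapPoint]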

@th0ger

th0ger commented Feb 19, 2023

@Fatal1ty cool I like the decorator approach and it works. But can we get rid of the redundant pointType ("1") in either the decorator arg or default variable?

@Fatal1ty
Owner

@Fatal1ty cool I like the decorator approach and it works. But can we get rid of the redundant pointType ("1") in either the decorator arg or default variable?

You can use the pointType value:

def register_point(class_):
    MapPoint.__points__[class_.__dict__["pointType"]] = class_
    return class_


@register_point
class Point1:
    pointType: int = 1
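
One caveat, as far as I can tell from how the standard library builds slotted dataclasses: @dataclass(slots=True) recreates the class and removes field defaults from its __dict__, so looking the value up there may fail once the class has been decorated. A sketch that reads the default from dataclasses.fields instead is shown below. It uses plain dataclasses without mashumaro or slots, and POINTS is a hypothetical module-level registry standing in for MapPoint.__points__.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Dict

POINTS: Dict[int, type] = {}  # hypothetical module-level registry


def register_point(class_):
    """Register a dataclass under the default value of its pointType field."""
    by_name = {f.name: f for f in fields(class_)}
    default = by_name["pointType"].default
    if default is MISSING:
        raise TypeError(f"{class_.__name__} has no default pointType")
    POINTS[default] = class_
    return class_


@dataclass
class MapPoint:
    x: int
    y: int
    pointType: int


@register_point  # applied last, so it sees the finished dataclass
@dataclass
class Point1(MapPoint):
    pointType: int = 1


@register_point
@dataclass
class Point4(MapPoint):
    pointType: int = 4


print(sorted(POINTS))
```

The decorator has to sit above @dataclass so that it receives the finished dataclass, and the point type is then written only once, as the field default.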

@mishamsk
Contributor

mishamsk commented Mar 4, 2023

I may open-source a library I wrote on top of mashumaro. It includes a serialization extension that, among other things, solves this issue. Here is a snippet from the body of my serialization mixin:

    def __post_serialize__(self, d: dict[str, Any]) -> dict[str, Any]:
        out = {"class": self.__class__.__name__}
        for k, v in d.items():
            if k.startswith("_"):
                continue
            out[k] = v
        return out

    def _serialize(self) -> dict[str, Any]:
        if DataClassSerializeMixin.__mashumaro_dialect is not None:
            return self.to_dict(dialect=DataClassSerializeMixin.__mashumaro_dialect)
        else:
            return self.to_dict()

    @classmethod
    def _deserialize(cls: Type[T], value: dict[str, Any]) -> T:
        class_name = value.pop("class", None)

        if class_name is None:
            raise ValueError("Missing class name ('class' field) in source")

        clazz = TYPES.get(class_name, None)
        if clazz is None:
            raise ValueError(f"Unknown class name: {class_name}")

        if DataClassSerializeMixin.__mashumaro_dialect is not None:
            return cast(
                T, clazz.from_dict(value, dialect=DataClassSerializeMixin.__mashumaro_dialect)
            )
        else:
            return cast(T, clazz.from_dict(value))

    def __init_subclass__(cls, **kwargs: Any):
        if cls.__name__ in TYPES:
            package = "Unknown"
            module = inspect.getmodule(TYPES[cls.__name__])
            if module is not None:
                package = str(module.__package__)

            raise ValueError(
                f"DataClassSerializeMixin subclass <{cls.__name__}> is already defined in package <{package}>. Please use a different name."
            )

        TYPES[cls.__name__] = cls
        return super().__init_subclass__(**kwargs)

It also has some custom to/from_dict/json/msgpack wrappers with more complicated dialect support, which are not in that snippet. Works like a charm, no decorators necessary.
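
For readers who want to try the class-name tagging idea without the rest of the library, here is a stripped-down sketch using only the standard library. Tagged, Circle and Square are hypothetical names, to_dict and from_dict are hand-written stand-ins for the mashumaro-generated methods, and dialect handling is left out.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict

TYPES: Dict[str, type] = {}  # class name -> class


class Tagged:
    """Hypothetical base: registers subclasses by name and tags their output."""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if cls.__name__ in TYPES:
            raise ValueError(f"{cls.__name__} is already registered")
        TYPES[cls.__name__] = cls

    def to_dict(self) -> Dict[str, Any]:
        # Prepend the class name so the reader knows which class to rebuild.
        return {"class": type(self).__name__, **asdict(self)}

    @staticmethod
    def from_dict(value: Dict[str, Any]) -> "Tagged":
        data = dict(value)  # copy so the caller's dict is not mutated
        class_name = data.pop("class", None)
        if class_name not in TYPES:
            raise ValueError(f"Unknown class name: {class_name}")
        return TYPES[class_name](**data)


@dataclass
class Circle(Tagged):
    radius: float


@dataclass
class Square(Tagged):
    side: float


restored = Tagged.from_dict(Circle(radius=2.0).to_dict())
print(restored)
```

Unlike the snippet above, this version copies the input before popping the "class" key, and it avoids slots=True because a rebuilt class would register under the same name twice.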
