Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should dump be able to also pass class type to dictionary #14

Closed
casparjespersen opened this issue Feb 27, 2019 · 15 comments
Closed

Should dump be able to also pass class type to dictionary #14

casparjespersen opened this issue Feb 27, 2019 · 15 comments
Labels
feature A new feature that awaits implementation

Comments

@casparjespersen
Copy link

Say I have a class inheritance structure, and I want to generically be able to dump/load with the correct class types in the inheritance structure.

@dataclass
class A(object):
    foo: int

@dataclass
class B(A):
    bar: int

Assume that when loading, I do now know which class type it was dumped from. I would have to analyse the content to determine which class to load as, and then pass this to the cls property of jsons.load method.

Would it make sense to implement an optional flag when dumping to a dictionary that contains information of what class type it was dumped as, that can then be used for loading?

@ramonhagenaars
Copy link
Owner

Thanks Casparjespersen, for this issue.

You make a clear point, I can see why you might want this.
I am looking into how best this can be incorporated. Maybe something like this would do:

{
  'foo': 42,
  'bar': 24,
  '__meta__': {
    'type': 'B'
  }
}

Perhaps the type attribute should be a fully qualified name including its package(s) though.

@ramonhagenaars ramonhagenaars added the feature A new feature that awaits implementation label Feb 27, 2019
@casparjespersen
Copy link
Author

casparjespersen commented Feb 27, 2019

I agree, something like that. Then we would only need to work out a way to get the class from the string in a safe manner. This is difficult to make portable and secure.

From this post it appears one solution could be something using sys.modules:

import sys
from dataclasses import dataclass

@dataclass
class Foo(object):
    bar: int

if __name__ == "__main__":
    class_name = "Foo"
    var = getattr(sys.modules[__name__], class_name)
    print(var(1)) # output: Foo(bar=1)

However this is not portable at all, and only supports classes defined in the context in question.

Update

Inspired by this answer on a different post, one approach could be to let the user input a dictionary mapping a string representation with instantiators:

@dataclass
def A(object):
    foo: int

@dataclass
def B(A):
    bar: int

obj = A(0)
map = {"A": A, "B": B}
d = jsons.dump(obj, cls=A, instantiators=map) # output: {"foo": 0, "__instantiator__": "A"}
obj_reconstructed = jsons.load(d, instantiators=map) # reads __instantiator__ and does a lookup in the map

Something like that perhaps :-)

@ramonhagenaars
Copy link
Owner

ramonhagenaars commented Feb 27, 2019

The challenge here will indeed be obtaining the type from the str. Your updated answer could work. Or maybe some function to announce those types globally for all future loads? Something like:

jsons.announce_class(A)

When writing this, I just realized that the upcoming jsons 0.7.0 covers this issue to some extent. By combining the newly extended 'strict-mode' and the Union type, you could do:

@dataclass
class A:
    foo: int

@dataclass
class B(A):
    bar: int

jsons.load({'foo': 123, 'bar': 456}, Union[A, B], strict=True)

This would result in:

B(foo=123, bar=456)

@casparjespersen
Copy link
Author

@ramonhagenaars Do you have some documentation on, or can you briefly explain, how the Union operator is going to work? How will it decide which one to use? What would it output in case jsons.load({'foo': 123}, Union[A, B], strict=True) for the above example.

@ramonhagenaars
Copy link
Owner

For Unions, jsons will follow the definition of the typing module in that the object to be deserialized can be of any of the types in the Union. When a Union is provided, it will try to deserialize the given object to the types of that Union in the order from left to right, until a deserialization was successful.

So in the case of:

jsons.load({'foo': 123, 'bar': 456}, Union[A, B])

It will attempt jsons.load({'foo': 123, 'bar': 456}, A) first, and jsons.load({'foo': 123, 'bar': 456}, B) next if the former did not succeed.

By default, jsons is pretty forgiving and does not mind deserializing {'foo': 123, 'bar': 456} into A, even if A does not have a bar attribute in its class definition. So in the previous example, the resulting type would be A (and not B).

However, you can run jsons in 'strict-mode' to make jsons enforce the class definition on the object in question. This means that jsons.load({'foo': 123, 'bar': 456}, A, strict=True) would fail (an Exception is raised).

So if you instead wrote:

jsons.load({'foo': 123, 'bar': 456}, Union[A, B], strict=True)

It would result in an instance of B.

Remember that both the Union and this behavior of the 'strict-mode' are not available yet. They are included in the 0.7.0 release (which is coming soon).

@casparjespersen
Copy link
Author

casparjespersen commented Feb 28, 2019

Right. So that also means one would be able to achieve what I am asking for by under the following condition:

The solution only supports class inheritance structures where "deeper" classes contain additional required parameters in their initializer compared to parent classes. The differences in classes can be derived by examining whether or not the required parameters are available and using jsons.load(object, cls=Union[deepest, littleLessDeep, ..., baseClass], strict=True) for deserialization.

I think this is definitely something, but I also like the aforementioned idea of having a specific map, using some type of global announcing as you mentioned with jsons.announce_class(A). It would provide additional degrees of freedom. I imagine the use-case:

jsons.announce_class(Foo) # by default string-equivalent is automatically derived using str(class_name)
jsons.announce_class(Bar, "my_bar_representation") # optionally you can define your own)
obj_foo = jsons.dump(Foo(), cls=True) # output: { ..., "_cls_": "Foo" } or something similar
obj_bar = jsons.dump(Bar(), cls=True) # output: { ..., "_cls_": "my_bar_representation" }
foo = jsons.load(obj_foo, cls=True)
bar = jsons.load(obj_bar, cls=True)

@ramonhagenaars
Copy link
Owner

So that also means one would be able to achieve what I am asking for by under the following condition

Yes, as I understand it, this is achievable under that condition. But:

The solution only supports class inheritance structures where "deeper" classes contain additional required parameters

That's not entirely true, because "deeper" classes do not necessarily need additional required parameters; less required parameters would work as well. I would rephrase your sentence to something like:
"The solution only supports class inheritance structures where "deeper" classes have different constructor signatures than their parents."

but I also like the aforementioned idea of having a specific map ... It would provide additional degrees of freedom.

I agree with you on that, it would be a nice feature. I think I will put it on my TODO list. :-)

@casparjespersen
Copy link
Author

It should also work in a nested structure:

@dataclass
class Foo():
    my_field: str

@dataclass
class Bar():
    foo: Foo

jsons.announce_class(Foo)
jsons.announce_class(Bar)
jsons.dump(Bar(Foo("hello")), cls=True) 

The above should output:

{ 
  "_cls_": "Bar", 
  "foo": { 
    "_cls_": "Foo", 
    "my_field": "hello" 
  } 
}

@ramonhagenaars
Copy link
Owner

Or should it? In your example, the Bar class has hinted its foo attribute to be of type Foo. So technically, jsons only needs to know the "outer" class (Bar) and can derive foo to be Foo from the type hints in class Bar.

There are some pros and cons for using the nested/not-nested approach. These are what I could think of:

Nested Not nested
- Less clean output (more meta) + Cleaner output (less meta)
+ Classes to load into don't always require type hints anymore - Classes to load into always require type hints

Can you think of more pros/cons? And what do you think is preffered?

@casparjespersen
Copy link
Author

Consider this example:

@dataclass
class BaseFoo():
    my_field: str

@dataclass
class FooTypeA(BaseFoo):
    field_a: int

@dataclass
class FooTypeB(BaseFoo):
    field_b: int

@dataclass
class Bar():
    foo: BaseFoo

Here you can only deserialized Bar fully if you have the nested class structure.

@ramonhagenaars
Copy link
Owner

I see. We'll need the nested approach to make it work with inheritance.
That seals it then. :-)

@cypreess
Copy link

cypreess commented Mar 6, 2019

Just a few loose comments to this discussion:

  • adding some kind of meta key looks the cleanest to me (it creates a separate namespace of things that might be later added to the format). One immediate thing I would think of is
    {"%meta":{"protocol": "1.0"}} - this way jsons could support some backward incompatible format changes.
  • I really don't like anything that could clutter the object namespace (__meta__ is a legal attribute name for a class). Therefore I would use something that is allowed as a key in json at the same time no attribute/method can be called this way. And - or % character at the beginning of the name is a great example of this.

@ramonhagenaars
Copy link
Owner

Excellent additions, cypreess!

To summarize what we've come up so far and what I'm considering, assume having the following classes:

@dataclass
class Foo():
    my_field: str

@dataclass
class Bar():
    foo: Foo

You can then announce those classes to ensure that jsons knows those classes:

jsons.announce_class(Foo)
jsons.announce_class(Bar)

Jsons dumps the class name using its fully qualified name, so you can choose to omit announcing the class and "hope" that jsons can import the correct class using that meta information in the dump. It raises an error upon failure with a hint to use announce_class.

You can specify that you want to have meta information in the dumps. I'm thinking of naming that parameter "verbose". So:

jsons.dump(Bar, verbose=True)

There are now three options I am considering for the output. The first looks somthing like this:

{ 
  "-meta": {
    "cls": "my_package.my_module.Bar"
  }, 
  "foo": { 
    "-meta": {
      "cls": "my_package.my_module.Foo"
    },
    "my_field": "hello" 
  } 
}

The second option also has nested meta info, but only the "root" object holds the common meta information:

{ 
  "-meta": {
    "cls": "my_package.my_module.Bar",
    "protocol": "1.0",
    "serialization_time": "2019-03-07T08:49:00Z"
  }, 
  "foo": { 
    "-cls": "my_package.my_module.Foo",
    "my_field": "hello" 
  } 
}

The third option I could think of looks something like this:

{ 
  "-meta": {
    "classes": {
      "/": "my_package.my_module.Bar",
      "/foo": "my_package.my_module.Foo"
    },
    "protocol": "1.0",
    "serialization_time": "2019-03-07T08:49:00Z"
  }, 
  "foo": { 
    "my_field": "hello" 
  } 
}

The third option stores its meta information in the root object only.

@casparjespersen
Copy link
Author

I prefer options 2 and 3 - with option 3 a slight favorite. It is much cleaner in a deep nested structure, and the only downside is that you will not be able to extract a child-part of the object and deserialize it, since you would throw away the metadata. I think this is a fair tradeoff.

@ramonhagenaars ramonhagenaars added the in progress This issue is worked on label Mar 10, 2019
@ramonhagenaars
Copy link
Owner

This issue has been solved in 0.8.0. Check out the docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new feature that awaits implementation
Projects
None yet
Development

No branches or pull requests

3 participants