
discussion: support polymorphism - major change #65

Open
nros opened this issue Nov 6, 2017 · 16 comments
nros commented Nov 6, 2017

Dear maintainers,

would you mind supporting polymorphism when serializing?

Problem

At the moment the schema exactly defines the expected type of the property to serialize. If a subclass adds more properties to serialize, these are simply ignored. Deserializing does not create the original instance type but the schema-defined type.

Similarly, arrays are serialized based on the first item only:

schema = getDefaultModelSchema(thing[0])

This behavior rules out polymorphism with subclasses and does not work well with the idea of dependency injection.

See this simple example:

const serializr = require("serializr");


class Test {}
class Test2 extends Test {}
class ToDo {}

serializr.createModelSchema(Test, {
    id: serializr.primitive(),
    name: serializr.primitive()
});

serializr.createModelSchema(Test2, {
    id: serializr.primitive(),
    name: serializr.primitive(),
    name2: serializr.primitive()
});

serializr.createModelSchema(ToDo, {
    t: serializr.primitive(),
    polymorphData: serializr.object(Test),
    listOfTests: serializr.list(serializr.object(Test)),
    q: serializr.primitive()
});

const testToDo = new ToDo();
testToDo.t = "grab coffee";
testToDo.q = 5;

testToDo.listOfTests = [];
testToDo.listOfTests.push(((id) => {
    const t = new Test();
    t.id = id;
    t.name = "B hello universe";
    return t;
})("ab"));

testToDo.listOfTests.push(((id) => {
    const t = new Test2();
    t.id = id;
    t.name = "C hello family";
    t.name2 = "sub class of Test";
    return t;
})("ac"));

testToDo.polymorphData = ((id) => {   
    const t = new Test2();
    t.id = id;
    t.name = "D polymorphic hello";
    t.name2 = "another sub class of Test";
    return t;
})("iu"); 

console.log(testToDo,"\n\n");
console.log(serializr.serialize(serializr.getDefaultModelSchema(ToDo), testToDo));

The output is:

ToDo {
  t: 'grab coffee',
  q: 5,
  listOfTests: 
   [ Test { id: 'ab', name: 'B hello universe' },
     Test2 { id: 'ac', name: 'C hello family', name2: 'sub class of Test' } ],
  polymorphData: 
   Test2 {
     id: 'iu',
     name: 'D polymorphic hello',
     name2: 'another sub class of Test' } }


{ t: 'grab coffee',
  polymorphData: { id: 'iu', name: 'D polymorphic hello' },
  listOfTests: 
   [ { id: 'ab', name: 'B hello universe' },
     { id: 'ac', name: 'C hello family' } ],
  q: 5 }

With the serialized version, all information specific to instances of Test2 is gone.

Possible solution

The schema for serializing is already stored with the instance constructor.

return thing.constructor.serializeInfo

Hence it would be possible to dynamically detect the schema of the instance to serialize. Additionally, some sort of "class name" could be added to the serialized data, which would help to re-create the proper instance upon deserialization.

So the serialized info could look like:

{
  __className__: 'ToDo',
  t: 'grab coffee',
  polymorphData: {
    __className__: 'Test2',
    id: 'iu',
    name: 'D polymorphic hello'
  },
  listOfTests: [
    { __className__: 'Test', id: 'ab', name: 'B hello universe' },
    { __className__: 'Test2', id: 'ac', name: 'C hello family', name2: 'sub class of Test' }
  ],
  q: 5
}

The class name is a symbolic "class name identifier" (string), set either by the defined schema or by a decorator. This avoids problems with packers that scramble class names. When all schemas are read before deserialization, an internal map of class name identifiers to schemas can be built to find the schema for each identifier. Each schema already contains a factory function that can create a new instance for that schema.

serializr.createModelSchema(Test, {
    __className__: "Test",
    id: serializr.primitive(),
    name: serializr.primitive()
});

serializr.createModelSchema(Test2, {
    __className__: "Test2",
    id: serializr.primitive(),
    name: serializr.primitive(),
    name2: serializr.primitive()
});
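The internal map of class name identifiers to constructors described above could be sketched in plain JavaScript. This is only an illustration of the proposal, not serializr API; the names registerClass, resolveClass and the __className__ key are hypothetical:

```javascript
// Hypothetical global registry mapping symbolic class identifiers to
// constructors. Identifiers are explicit strings, so minifiers that
// rename classes do not break deserialization.
const classRegistry = new Map();

function registerClass(className, clazz) {
    if (classRegistry.has(className)) {
        throw new Error("duplicate class identifier: " + className);
    }
    classRegistry.set(className, clazz);
}

// During deserialization, the identifier stored in the JSON is used to
// find the constructor (and, via it, the model schema) again.
function resolveClass(json) {
    const clazz = classRegistry.get(json.__className__);
    if (!clazz) {
        throw new Error("unknown class identifier: " + json.__className__);
    }
    return clazz;
}

class Test {}
class Test2 extends Test {}
registerClass("Test", Test);
registerClass("Test2", Test2);
```

With such a registry in place, `resolveClass({ __className__: "Test2", ... })` would hand back the `Test2` constructor, whose attached schema and factory could then drive deserialization.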

Do you mind?

Would you mind going that route and accept PRs implementing this?

Maybe someone is willing to help?
I cannot promise to, but will try.


alexggordon commented Nov 8, 2017

I am definitely open to better supporting polymorphism, but I don't think the way to do it is by adding information to the serialized data. That would move serializr away from being a generic serialization library without a whole lot of added benefit, and it wouldn't necessarily solve the problem at hand.

The issue you mention is that when serializing an array, serializr selects the schema of the first item. The suggested route for an array of many things is to use a custom type. A custom type is what I use for polymorphism; during serialization I just do

custom(
     list_to_serialize => list_to_serialize.map(clazz => serialize(clazz)),
     to_to_deserialize => to_to_deserialize.map(obj => deserialize(getClazz(obj), obj))
)

This is effectively the currently suggested workaround for your issue with the first item in the array providing the default schema. In your example, you would replace the

    listOfTests: serializr.list(serializr.object(Test)),

line with the custom serializer to fix this issue.

Back to the issue though: while I don't agree with adding metadata to the serialized version (whatever your data storage medium is, it shouldn't store information specific to a frontend library, as proper separation of concerns suggests), I do agree that we need some sort of polymorphism control, given the frequent requests for the feature.

I've been doing some research as to how other serializers manage this situation, and I think I might be able to suggest an alternative, to start, in the form of a typeOf function.

It would work like:

/**
 * @param {ModelSchema} modelSchema model schema of the sub type
 * @param {ModelSchema} parentModelSchema model schema of the parent type
 * @param {string} type a string type description. Must be unique among sibling types.
 * @param {function} keyFunc a function that fetches the type from the obj. The return value must match the string given as type.
 * @returns {PropSchema}
 */
function typeOf(modelSchema, parentModelSchema, type, keyFunc = obj => obj.type) {
...
}

Usage would look like:

class Farm {
    @serializable(list(object(Animal))) animals = [];
}

class Animal {
    @serializable name;
    @serializable age;
    @serializable type;
}

@typeOf(getDefaultModelSchema(Animal), 'dog', obj => obj.type)
class Dog extends Animal {

    get ageInDogYears() {
        return this.age * 7
    }
}

or in plain JS

createModelSchema(Animal, {
    name: primitive(),
    age: primitive(),
    type: primitive(),
});

typeOf(Dog, getDefaultModelSchema(Animal), 'dog')

The upside of this implementation is that lists wouldn't require any dev work beyond getting the default model schema for each individual item. As for the implementation: when the typeOf function is called on an object, we can just store the "types" in a Map on the serializeInfo object we set on the constructor.
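The "store the types in a Map on serializeInfo" idea could be sketched like this. It is a plain-JS illustration only: real serializr keeps a ModelSchema on constructor.serializeInfo, and the subTypes map, keyFunc slot and pickSubType helper shown here are assumptions of this proposal:

```javascript
// Hypothetical sketch: the parent's serializeInfo carries a map from a
// type string to the sub type's constructor, filled in by typeOf().
function typeOf(subClazz, parentClazz, type, keyFunc = obj => obj.type) {
    const info = parentClazz.serializeInfo || (parentClazz.serializeInfo = {});
    const subTypes = info.subTypes || (info.subTypes = new Map());
    subTypes.set(type, subClazz);
    info.keyFunc = keyFunc;
    return subClazz;
}

// On deserialization, the key function picks the sub type from the JSON;
// unknown type strings fall back to the parent type.
function pickSubType(parentClazz, json) {
    const info = parentClazz.serializeInfo;
    return info.subTypes.get(info.keyFunc(json)) || parentClazz;
}

class Animal {}
class Dog extends Animal {}
typeOf(Dog, Animal, "dog");
```

A JSON object like `{ type: "dog", ... }` would then resolve to the `Dog` constructor without the list schema having to know about `Dog` directly.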

Do you have any thoughts on this @nros? @mweststrate ?


alexggordon commented Nov 8, 2017

Actually, I think my version is already too specific. I wonder if we'd be better off with a union type like the one from mobx-state-tree. Something like

const dispatcher = obj => {
   return obj.type === 'dog' ? Dog : Cat
}

const Animals = unionType([Cat, Dog], dispatcher)

class Farm {
    @serializable(list(object(Animals))) animals = [];
}


nros commented Nov 9, 2017

union type seems insufficient

The union type has the disadvantage that you have to define exactly which subclasses can be used, in advance. Think of all those Minecraft mods that add new items/blocks. So this solution seems to fall short.

"class identification information" needed in serialized data

IMHO it comes down to this: additional information about the runtime type has to be added to the serialized data. Some sort of symbolic "identifier", not the class name itself (because of packers, etc.). It is used to find the proper schema upon deserialization.

global lookup table needed

At the moment serializr does not need global data; all schema information is attached to the constructor function of classes. As soon as you have to maintain a lookup table to dynamically determine the schema during deserialization, you end up with a "global" lookup table. This might lead to name collisions where various classes use the same class identifier.

in combination with a dependency injection system

My working solution - based on the custom() serializer - depends on a dependency injection system (e.g. InversifyJS). Since the DI container must maintain unique identifiers, its global symbol table is used to find the proper class when deserialization takes place.

solution as an extra type "anytype"

To maintain the current simplicity of serializr, I would suggest leaving the class lookup to external providers. An extra type serializer like anytype() could be implemented.

class Farm {
    @serializable(list(anytype(classLookupFunction))) animals = [];
}

anytype could perform the following:

  • dynamically detect the type/schema
  • in case of primitive types, use the primitive() serializer internally
  • for object types, store some symbolic type information with the serialized data
  • handle JavaScript standard classes (e.g. Date, Object) directly
  • when deserializing, use the type information to find the proper class via the lookup function
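The decision logic of such an anytype() serializer could be sketched roughly as below. Everything here is hypothetical (anytypeSerialize, anytypeDeserialize, the __type__ key); real code would use symbolic identifiers rather than constructor.name, which minifiers may rename:

```javascript
// Sketch of the serialize side: primitives pass through (the primitive()
// case), Dates are handled directly, and other objects get a symbolic
// type tag so the lookup function can find the class again later.
function anytypeSerialize(value, classNameOf) {
    if (value === null || typeof value !== "object") {
        return value;
    }
    if (value instanceof Date) {
        return { __type__: "Date", value: value.toISOString() };
    }
    return Object.assign({ __type__: classNameOf(value) }, value);
}

// Sketch of the deserialize side: the type tag selects the class via the
// user-supplied lookup function; remaining properties are copied over.
function anytypeDeserialize(json, classLookupFunction) {
    if (json === null || typeof json !== "object") {
        return json;
    }
    if (json.__type__ === "Date") {
        return new Date(json.value);
    }
    const Clazz = classLookupFunction(json.__type__);
    const instance = new Clazz();
    for (const key of Object.keys(json)) {
        if (key !== "__type__") instance[key] = json[key];
    }
    return instance;
}
```

The point is that the lookup function stays under the user's control, so serializr itself would not need a global class registry.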


nros commented Nov 9, 2017

Then we need the "class identifier" with the class.

@classID("Animal")
class Animal {
    @serializable name;
    @serializable age;
    @serializable type;
}
@classID("Dog")
class Dog extends Animal {

    get ageInDogYears() {
        return this.age * 7
    }
}
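Without decorator syntax (which still needs a transpiler), such a @classID decorator could be sketched as a plain function that fills the lookup dictionary. All names here (classID, classesById, __classID__) are hypothetical, not serializr API:

```javascript
// Hypothetical classID "decorator": records the class under a symbolic
// identifier and tags the class so the serializer can find the id again.
const classesById = {};

function classID(id) {
    return function (clazz) {
        classesById[id] = clazz;
        clazz.__classID__ = id;
        return clazz;
    };
}

// Plain-JS equivalent of: @classID("Dog") class Dog extends Animal {...}
class Animal {}
const Dog = classID("Dog")(class Dog extends Animal {
    get ageInDogYears() {
        return this.age * 7;
    }
});
```

Because the identifier is an explicit string rather than the class name, packers that scramble class names would not break the mapping.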

alexggordon commented:

I really don't think storing any serialization info in the serialized data is a good idea. Imagine any of these scenarios:

  1. The class name (class ID, your global variable) changes, or is no longer descriptive, but you can't change it because that breaks serialization info already stored in the database.
  2. Data stored from serializr is passed to a database storage library (e.g. GraphQL). We now have to add __className__ as a required param in the GraphQL query, in addition to making sure we expect it on the backend, as we can't pass unspecified params. Really, anything that ingests JSON in another context and then stores it would likely destroy this metadata unless it is told to accept it.
  3. The __className__ metadata isn't on a JSON object that's passed into the serializr polymorphic type, and there's no intelligent way for serializr to 'deduce' the type.

See all the issues that adding metadata to the serialized data carries? That's why it's so important to have a dispatcher that the user controls. Then, if there are problems serializing/deserializing, the onus is on the user to fix them.

With regard to your objection that the union type requires the user to define all the subclasses ahead of time: there's not really any other way to do it. I'm assuming you're not implying metaprogramming here, but when deserializing, it's the user's job to specify how to deserialize the data.

Can you explain how the union type wouldn't solve your array issue mentioned above? Or maybe explain a different use case it doesn't solve?


nros commented Nov 9, 2017

Hm. I totally agree with you that it would be nice to have a solution that does not need to store meta-information with the serialized data. I see your points.

Nonetheless, even the union type has to decide which of the types to select when deserializing. Hence there must be a distinct feature available to distinguish between the types.

Generally I like the idea of basing the type decision on already existing attributes. This should work well as long as entities can be distinguished by their data/attributes, which should be no problem with carefully crafted models.

But instead of having a fixed list of possible types, I would prefer a "type decision function", like the lookup function used with reference(). This function would receive the JSON and return the class to use - or an instance of that class (a "pimped" factory).
Maybe we could support both: if the parameters to unionType() are classes, use these; in case of a function, use it as a type decision function.

where union type might fall short

A fixed list of used types would make the system very inflexible. Let me try to explain why. Of course I may be wrong or may not fully understand your concept.

plugin system

Not knowing all types upfront can happen with a plugin system. Think of a module-based system where a vendor (A) provides the base classes + UI + business logic (classical MVC) for an application. An example system might manage block-chains (another buzz-word). Each chain element has a content type. Other developers (B) base their product on it and may add more types as needed. The content types make sense only to B. And since there are thousands of B's, A is reluctant to add every type a B implements to the union list.

mods and enhancements by other developers/vendors

The same applies to Minecraft. Each mod is able to add more block types with special behaviour, but the mods are unable to change the code of the base Minecraft system. Since there are thousands of individual developers and small teams creating mods of mixed quality, Mojang would be unable to maintain a list of all added block types; there is no such registry. So there must be a different mechanism to store all blocks, including blocks of unknown type.

reluctance to touch or read working code

In my experience, not many developers are willing to read others' code, let alone change it; even fewer want to understand "foreign" code.
A fixed union type would require more documentation, more understanding, and more help from the original developer (= effort) to learn how to properly announce new subclasses/plugins to the original code. This might not seem like a big problem in open-source communities, but I try to earn my living with my work - usually :) So time and effort mean money, and the pressure from project managers to finish in time is always high.


alexggordon commented Nov 9, 2017

I think your criticisms of the Union type largely support the need for something as loosely coupled as the union type is.

First, you can't serialize/deserialize something in serializr without first creating a schema for it. That means that regardless of what it is, the object has to be passed to serializr first, so setting the expectation that the user pass all the types to a union is not unreasonable. In addition, if we wanted, we could probably get away with not even passing in the types, and just passing in a function.

Second, as with Minecraft, the developers have to specify all the blocks they want to add. It's the same with serializr, except instead of blocks there are types. Minecraft can't guess what blocks the developers are going to add, just as serializr can't guess what types the developer wants to have.

Maybe you could get away without passing the types in, though. Something like

const dispatcher = obj => {
   return obj.type === 'dog' ? Dog : Cat
}

class Farm {
    @serializable(list(dynamicObject(dispatcher))) animals = []
}

After thinking about it, I don't really see a benefit to passing in a list of items, short of it acting as a "strict" whitelist as opposed to a blacklist.

What about something like that?
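The "strict whitelist" variant could be sketched like this in plain JS. It only illustrates the idea under discussion; makeDispatcher is a hypothetical helper, not serializr API:

```javascript
// Hypothetical sketch: a dispatcher resolves the class from the JSON;
// an optional whitelist turns unexpected results into a hard error.
function makeDispatcher(dispatch, whitelist) {
    return function (json) {
        const clazz = dispatch(json);
        if (whitelist && !whitelist.includes(clazz)) {
            throw new Error("type not allowed: " + (clazz && clazz.name));
        }
        return clazz;
    };
}

class Animal {}
class Dog extends Animal {}
class Cat extends Animal {}

const dispatcher = makeDispatcher(
    obj => (obj.type === "dog" ? Dog : Cat),
    [Cat, Dog] // strict: only these two types may appear in the data
);
```

Omitting the second argument would give the loose behaviour, where any class the dispatch function returns is accepted.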


nros commented Nov 10, 2017

Yep, I like that. This seems a good, reasonable way.

It is simple, does not force serializr to build a global registry of all classes, and yet is flexible enough for users to adapt it to their needs.


nros commented Nov 10, 2017

There is another subject that is bothering me: references. (Do you want a separate discussion?)

When deserializing an entity that has an identifier property, I would like to try to find it elsewhere in the JSON - much like reference() does. But contrary to reference(), it would not fail; it is optional. If the entity was previously encountered, use a reference; if not, create a new entity.

{
  t: 'grab coffee',
  polymorphData: {
    id: 'iu',
    name: 'D polymorphic hello'
  },
  listOfTests: [
    {id: 'iu', name: 'D polymorphic hello' },
    {id: 'ac', name: 'C hello family', name2: 'sub class of Test' }
  ],
  q: 5
}

You see that entity "iu" is serialized completely in both places. But when deserialized, the list item listOfTests[0] should be a reference to polymorphData. Do you know what I mean?

Of course you might argue that there is a problem in case the IDs match but the properties do not! I would ignore the subsequent JSON data when the ID (and the class type) match. It is impossible to decide which data set is the correct one, so just use the first set of properties.

The problem is that reference() cannot be used for this, because it fails if it cannot find the entity elsewhere. dynamicObject(), however, should not fail but silently create the entity from the available data.
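The find-or-create behaviour described here could be sketched like this in plain JS (makeContext and findOrCreate are hypothetical names for this illustration, not serializr API):

```javascript
// Hypothetical sketch: keep a per-deserialization map of id -> instance.
// If an id was seen before, return the existing instance and ignore the
// duplicate JSON properties; otherwise create a fresh instance.
function makeContext() {
    return { byId: new Map() };
}

function findOrCreate(context, Clazz, json) {
    const existing = context.byId.get(json.id);
    if (existing) {
        return existing; // reference semantics: the first data set wins
    }
    const instance = Object.assign(new Clazz(), json);
    context.byId.set(json.id, instance);
    return instance;
}

class Test {}
const ctx = makeContext();
const a = findOrCreate(ctx, Test, { id: "iu", name: "D polymorphic hello" });
const b = findOrCreate(ctx, Test, { id: "iu", name: "other data" });
// a and b are the same object; the second JSON block was ignored.
```

In serializr terms, the context would correspond to the deserialization context, so listOfTests[0] and polymorphData would end up as the same instance.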

What do you think?


scisci commented Dec 5, 2017

Hi, I was able to get polymorphism kind of working with the suggested custom type:

custom(
     list_to_serialize => list_to_serialize.map(clazz => serialize(clazz)),
     to_to_deserialize => to_to_deserialize.map(obj => deserialize(getClazz(obj), obj))
)

But if I have a reference to an item, it fails to resolve the reference. How could I specify the base class when resolving the reference? Encode something into the identifier?


scisci commented Dec 5, 2017

Actually I just realized the above does not work because the custom function doesn't have access to the parent deserialize context.

It should really use deserializeObjectWithSchema but that isn't exposed by the library.


nros commented Dec 5, 2017

This is the reason for my PR; see #67.


scisci commented Dec 5, 2017

I was able to get my issue working by exposing deserializeObjectWithSchema so deserializing a list of polymorphic objects looks like:

serializr.custom(
  listToSerialize => listToSerialize.map(item => serializr.serialize(item)),
  (listToDeserialize, context) =>
    listToDeserialize.map(obj =>
      serializr.deserializeObjectWithSchema(
        context,
        serializr.getDefaultModelSchema(getClazz(obj.type)),
        obj
      )
    )
)

Kind of ugly, but it works. It seems references automagically work if I just specify the base class:

listOfReferences: serializr.list(serializr.reference(BaseClass))

spinorkit commented:

I assume getClazz() looks up the class from obj.type?
Do you have to maintain that mapping manually?
I would like to use a (class) decorator to maintain that dictionary.

@alexggordon I am not worried about e.g. the class name changing, because it is a given that if you persist the user's data you have to support backwards compatibility: you can never replace or remove existing classes, only add new ones and, if needed, explicitly handle converting old ones to the new ones that replace them.


scisci commented Dec 6, 2017 via email


engelchrisi commented Apr 21, 2018

I enhanced the serializing code with some decorators to achieve persistence with class tags for polymorphic items. Please have a look at this:

@PolyMorph
export class Snake extends Animal {
    ...
}

@serializable(polymorphArray(Animal)) private data: Array<Animal>;

Polymorphism.zip
