Simon Chan edited this page Mar 22, 2016 · 2 revisions

As of 0.6.0, MessagePack for CLI supports polymorphism for objects and collection items.

There are two kinds of polymorphism: known subtypes based polymorphism and runtime type based polymorphism.

Use Cases

Typical use cases for polymorphism include:

  1. Serializing polymorphic collections (issue #58), that is, collections whose items are of different types.
  2. Serializing a 'rich' domain model whose objects carry both their own data and logic (issue #47), so that you can deserialize the objects and invoke their virtual methods.

Choosing a Kind of Polymorphism

Known Subtypes Based Polymorphism

  • Pros
    • Easy to interoperate. Known subtypes based polymorphism uses a simple format, so a counterpart implementation on another system is easy to write.
    • Naturally secure. The possible instance types are restricted by custom attributes, so there is little chance of malicious code injection unless you also load untrusted assemblies.
  • Cons
    • You must keep the known subtype list(s) up to date.
    • All types must be known at compilation time.

Runtime Type Based Polymorphism

  • Pros
    • Easy to use and maintain. You just need to apply custom attributes to the members.
    • You don't have to know possible subtypes at compilation time.
  • Cons
    • It uses a format based on native .NET type identifiers, so interoperability is hard to maintain: other systems must interpret the type information and translate it into their own type systems.
    • You cannot control the possible subtypes, which might hurt the stability of your application.
    • If serialized data comes from an external source, it may contain malicious type information. Attackers can specify types whose default constructors cause significant side effects, such as file or registry manipulation.

Usage

You can specify that a member (field or property) is polymorphic by marking it with custom attributes, as follows:

// Known subtypes based polymorphism: each type code string is bound to one concrete subtype.
[MessagePackKnownType( "0", typeof( FileInfo ) )]
[MessagePackKnownType( "1", typeof( DirectoryInfo ) )]
public FileSystemInfo Info { get; set; }

// Runtime type based polymorphism: the .NET type information is stored with the value.
[MessagePackRuntimeType]
public object Data { get; set; }

As you might expect, you cannot mix known subtype attributes and runtime type attributes for the same target of a member.

You can also apply polymorphism to collections themselves, to collection items, to dictionary keys and values, and to tuple items.

The following tables show the valid combinations and meanings of the attributes:

Attributes for Known Subtype Based Polymorphism

| Attribute | Target | Note |
|---|---|---|
| MessagePackKnownTypeAttribute | Non-collection objects or collections themselves | |
| MessagePackKnownCollectionItemTypeAttribute | Collection items or dictionary values | For example, items of a List<object> typed property value. |
| MessagePackKnownDictionaryKeyTypeAttribute | Dictionary keys | For example, keys of a Dictionary<object, object> typed property value. |
| MessagePackKnownTupleItemTypeAttribute | An item of a tuple | Each attribute specifies one item (the Nth attribute for the ItemN property). |

Attributes for Runtime Type Based Polymorphism

| Attribute | Target | Note |
|---|---|---|
| MessagePackRuntimeTypeAttribute | Non-collection objects or collections themselves | |
| MessagePackRuntimeCollectionItemTypeAttribute | Collection items or dictionary values | For example, items of a List<object> typed property value. |
| MessagePackRuntimeDictionaryKeyTypeAttribute | Dictionary keys | For example, keys of a Dictionary<object, object> typed property value. |
| MessagePackRuntimeTupleItemTypeAttribute | An item of a tuple | Each attribute specifies one item (the Nth attribute for the ItemN property). |

As the tables show, for collection typed members (that is, members whose type implements IEnumerable but not IDictionary, and is not sealed) and dictionary typed members, you can declare both that the collection itself is polymorphic and that its items or keys/values are polymorphic. You can also declare tuple items polymorphic. Note that System.Tuple types are sealed, so a Tuple typed member itself cannot be declared polymorphic.

Finally, there are default behaviors for collection and System.Object typed members:

  • A System.Object typed member that is not marked with any polymorphism attribute is treated as a boxed MessagePackObject.
  • For an abstract collection typed member, the concrete type of the deserialized value is determined by the SerializationContext.DefaultCollectionTypes registration. The defaults are List<T> and Dictionary<TKey, TValue>.

Polymorphism Internals

This section describes the type information format, for those developing an interoperable implementation.

Basic Design

  • An object's type information is serialized together with its value.
  • The type information and the value are serialized within a single array.
  • The type information consists of their data.
  • The type information itself will be encoded in an array.

Known Subtype Based Polymorphism Type Information Format

It is encoded as a simple 2-element array:

[<StringTypeCode>, <Data>]

In the figure above, "StringTypeCode" is the type code string specified in the custom attribute. It is encoded in the MessagePack str (raw) format and should be as compact as possible. "Data" is the serialized object value, in either array or map form.
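To make the envelope concrete, here is a minimal Python sketch; Python is used only as a neutral illustration of the wire format and is not part of MessagePack for CLI. It assumes a hand-written encoder that covers only fixarray, fixmap, fixstr and positive fixint, and a hypothetical registry KNOWN_TYPES with stand-in classes FileEntry and DirectoryEntry mirroring the FileInfo/DirectoryInfo example. Data is written in map form here; the text allows either an array or a map.

```python
from dataclasses import dataclass


def pack(obj) -> bytes:
    """Encode a tiny subset of MessagePack (enough for this example only)."""
    if isinstance(obj, bool):
        raise TypeError("bool is not covered by this sketch")
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                                   # positive fixint
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("only fixstr is covered by this sketch")
        return bytes([0xA0 | len(raw)]) + raw                 # fixstr
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)  # fixarray
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body                # fixmap
    raise TypeError(f"unsupported value: {obj!r}")


@dataclass
class FileEntry:  # hypothetical stand-in for FileInfo
    name: str


@dataclass
class DirectoryEntry:  # hypothetical stand-in for DirectoryInfo
    name: str


# Hypothetical registry mirroring the MessagePackKnownType attributes.
KNOWN_TYPES = {"0": FileEntry, "1": DirectoryEntry}
TYPE_CODES = {cls: code for code, cls in KNOWN_TYPES.items()}


def to_envelope(value) -> list:
    """Wrap a value as [<StringTypeCode>, <Data>], with Data in map form."""
    return [TYPE_CODES[type(value)], {"name": value.name}]


def from_envelope(envelope: list):
    """Choose the concrete class from the type code; unknown codes are rejected."""
    code, data = envelope
    cls = KNOWN_TYPES.get(code)
    if cls is None:
        raise ValueError(f"unknown type code: {code!r}")
    return cls(**data)


envelope = to_envelope(DirectoryEntry("tmp"))
print(envelope)              # ['1', {'name': 'tmp'}]
print(pack(envelope).hex())  # 92a13181a46e616d65a3746d70
```

Because the receiving side only instantiates classes present in its registry, an unknown code is rejected rather than resolved to an arbitrary type, which is what the "Naturally secure" point above relies on.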

Runtime Type Based Polymorphism Type Information Format

It is also encoded as a 2-element array:

[<EncodedNETType>, <Data>]

In the figure above, "Data" is the serialized object value, in either array or map form. "EncodedNETType" is structured data formatted as a 6-element array, and it is equivalent to the assembly-qualified name of the .NET type. The following table shows the contents of the structured type information and how they map to the assembly-qualified type name:

| Index | Type | Content |
|---|---|---|
| 0 | integer | Format ID. Only 1 is valid. Discussed later. |
| 1 | str (raw) | Compressed type full name. Discussed later. |
| 2 | str (raw) | Simple name of the assembly. |
| 3 | array | Version of the assembly, as a 4-element integer array. |
| 4 | str (raw) | Culture name of the assembly; nil for a neutral assembly. |
| 5 | bin (raw) | Public key token of the assembly; nil if it is null. |
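The mapping in the table can be checked by rebuilding the assembly-qualified name from the structured fields. The Python sketch below uses a hypothetical function name and assumes the type full name at index 1 has already been expanded (the compression is described below). Rendering a nil culture as "neutral", a nil token as "null" and a token as lowercase hex follows the usual .NET display form and is an assumption of this sketch.

```python
from typing import Optional, Sequence


def assembly_qualified_name(type_full_name: str, simple_name: str,
                            version: Sequence[int], culture: Optional[str],
                            public_key_token: Optional[bytes]) -> str:
    """Rebuild '<type>, <assembly>, Version=..., Culture=..., PublicKeyToken=...'."""
    if len(version) != 4:                                   # index 3
        raise ValueError("version must have exactly 4 elements")
    version_text = ".".join(str(part) for part in version)
    culture_text = "neutral" if culture is None else culture          # index 4
    token_text = "null" if public_key_token is None else public_key_token.hex()  # index 5
    return (f"{type_full_name}, {simple_name}, Version={version_text}, "
            f"Culture={culture_text}, PublicKeyToken={token_text}")


name = assembly_qualified_name("TheCompany.TheProduct.TheComponent.TheLayer.TheType",
                               "TheCompany.TheProduct.TheComponent",
                               [1, 2, 3, 4], None, None)
print(len(name.encode("utf-8")))  # 142
```

Applied to the example fields used in this section, it yields the same 142-byte name that the compact binary form is compared against.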

Format ID 1 means that the type information uses the "Compressed Format", which compresses the type name. A type's namespace often begins with the simple name of its declaring assembly, so space can be saved by omitting this duplicated prefix: the format drops that prefix and keeps only the '.' that followed it. For example, for the type "TheCompany.TheProduct.TheComponent.TheLayer.TheType, TheCompany.TheProduct.TheComponent, Version=1.2.3.4, Culture=neutral, PublicKeyToken=null", the logical type information is:

[1, ".TheLayer.TheType", "TheCompany.TheProduct.TheComponent", [1, 2, 3, 4], nil, nil]
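The compression rule can be sketched as a pair of hypothetical Python helpers. This assumes compression happens only when the full name starts with the assembly simple name followed by '.', and that the receiving side treats a leading '.' as the marker of a compressed name (in practice, .NET full type names do not start with '.').

```python
def compress_type_name(type_full_name: str, assembly_simple_name: str) -> str:
    """Drop the assembly-name prefix, keeping the '.' that followed it."""
    prefix = assembly_simple_name + "."
    if type_full_name.startswith(prefix):
        return "." + type_full_name[len(prefix):]
    return type_full_name  # no shared prefix: stored unchanged


def expand_type_name(stored_name: str, assembly_simple_name: str) -> str:
    """Restore the full type name; a leading '.' marks a compressed name."""
    if stored_name.startswith("."):
        return assembly_simple_name + stored_name
    return stored_name


print(compress_type_name("TheCompany.TheProduct.TheComponent.TheLayer.TheType",
                         "TheCompany.TheProduct.TheComponent"))  # .TheLayer.TheType
```

Keeping the '.' instead of dropping it entirely is what lets the decoder tell a compressed name apart from an uncompressed one, such as a type whose namespace does not start with the assembly name.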

The physical format is as follows:

0x96 0x01 0xB12E5468654C617965722E54686554797065 0xD922546865436F6D70616E792E54686550726F647563742E546865436F6D706F6E656E74 0x94 0x01 0x02 0x03 0x04 0xC0 0xC0

That is 63 bytes of binary data instead of the 142-byte UTF-8 encoded assembly-qualified name.
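To check the byte layout above, the following Python sketch encodes the logical array with a hand-written encoder that covers only the MessagePack formats needed here (fixarray, positive fixint, uint 8/16/32, fixstr, str 8, bin 8, nil). It assumes integers and strings are written in their smallest form, as in the example; it is not taken from MessagePack for CLI, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def _uint(value: int) -> bytes:
    """Smallest unsigned MessagePack integer encoding (up to uint 32)."""
    if value < 0:
        raise ValueError("negative integers are not covered by this sketch")
    if value <= 0x7F:
        return bytes([value])                          # positive fixint
    if value <= 0xFF:
        return bytes([0xCC, value])                    # uint 8
    if value <= 0xFFFF:
        return b"\xcd" + value.to_bytes(2, "big")      # uint 16
    if value <= 0xFFFFFFFF:
        return b"\xce" + value.to_bytes(4, "big")      # uint 32
    raise ValueError("uint 64 is not covered by this sketch")


def _str(text: str) -> bytes:
    raw = text.encode("utf-8")
    if len(raw) <= 31:
        return bytes([0xA0 | len(raw)]) + raw          # fixstr
    if len(raw) <= 0xFF:
        return bytes([0xD9, len(raw)]) + raw           # str 8
    raise ValueError("str 16/32 are not covered by this sketch")


def _bin(data: bytes) -> bytes:
    if len(data) <= 0xFF:
        return bytes([0xC4, len(data)]) + data         # bin 8
    raise ValueError("bin 16/32 are not covered by this sketch")


def _nil_or(value, encode) -> bytes:
    return b"\xc0" if value is None else encode(value)  # nil when absent


def encode_type_info(compressed_name: str, simple_name: str,
                     version: Sequence[int], culture: Optional[str],
                     public_key_token: Optional[bytes]) -> bytes:
    """Encode [1, name, assembly, [v1, v2, v3, v4], culture, token]."""
    if len(version) != 4:
        raise ValueError("version must have exactly 4 elements")
    out = bytearray([0x96])                            # fixarray of 6
    out += _uint(1)                                    # Format ID 1
    out += _str(compressed_name)
    out += _str(simple_name)
    out += bytes([0x94])                               # fixarray of 4
    for part in version:
        out += _uint(part)
    out += _nil_or(culture, _str)
    out += _nil_or(public_key_token, _bin)
    return bytes(out)


encoded = encode_type_info(".TheLayer.TheType", "TheCompany.TheProduct.TheComponent",
                           [1, 2, 3, 4], None, None)
print(len(encoded))  # 63
```

The 17-byte compressed name fits in a fixstr (0xB1), while the 34-byte assembly name exceeds the 31-byte fixstr limit and needs str 8 (0xD9 0x22); a version component such as 30319 would fall back to uint 16.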