Skip to content
This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Serializing/Deserializing type Any: bugfix and simplification #1344

Closed
wants to merge 1 commit into from

Conversation

kmalik22
Copy link

Summary:
Current serialization and de-serialization of type Any in PyText has a bug. This diff:

  • Fixes the bug, by just simplifying handling of Any.
  • Adds a unit-test that exposes the bug.

====================

PyText Config Serialization for Any

PyText config serializes type Any by prefixing the real type of the object before the actual object.
This type prepending is important for PyText specific types (like Model). However, for a generic type like Any, this is unnecessary, and instead causes a bug.

Eg: if a config looks like this:
class AConfig(ConfigBase):
a_param: Any

a_config: AConfig

And at run-time, the input is:

a_config: {
    a_param: [1, 2, 3]
}

Pytext will serialize this config by converting a_param into a dict, and pre-pending the type of the input:
a_config: {
a_param: {'list': [1,2,3]}
}

====================

Bug

When de-serializing a config, we handle this type prepending for Any something like this:
def _any_from_json(json_obj):
if(_is_dict(json_obj)):
# aha, this must be type pre-pending!
assert(len(json_obj) == 1
# return the first value
return list(json_obj.values())[0]

This creates a bug if something of type Any is actually a Dict. E.g.:

class AConfig(ConfigBase):
    param_2: Dict[str, Any]
a_config: AConfig

Assume at run-time, the input is:
a_config: {
"param_2" : {
"nested_param1": 10,
"nested_param2": 2,
}
}

Parsing this config generates an error:

{F235368935}

====================

Fix and simplification

This bug is fixed by simpler serialization of Any: type doesn't need to prepended at all

====================

Unit Test

Added a unit test that exposes the bug above by taking a json input, and checks that deserialized(serialized(input)) == input

====================

Who does this affect

AFAIK, only Federated Learning trainers use type Any (for information hiding). This doesn't affect anyone other than people working on Federated Learning.

Reviewed By: jessemin, psuzhanhy

Differential Revision: D21244134

Summary:
Current serialization and de-serialization of type `Any` in PyText has a bug. This diff:

- Fixes the bug, by just simplifying handling of `Any`.
- Adds a unit-test that exposes the bug.

====================
## PyText Config Serialization for `Any`

PyText config serializes type `Any` by prefixing the real type of the object before the actual object.
This type prepending is important for PyText specific types (like `Model`). However, for a generic type like `Any`, this is unnecessary, and instead causes a bug.

Eg: if a config looks like this:
    class AConfig(ConfigBase):
        a_param: Any

    a_config: AConfig

And at run-time, the input is:

    a_config: {
        a_param: [1, 2, 3]
    }

Pytext will serialize this config by converting `a_param` into a `dict`, and pre-pending the type of the input:
    a_config: {
        a_param: {'list': [1,2,3]}
    }

====================
## Bug

When de-serializing a config, we handle this type prepending for `Any` something like this:
    def _any_from_json(json_obj):
        if(_is_dict(json_obj)):
            # aha, this must be type pre-pending!
            assert(len(json_obj) == 1
            # return the first value
            return list(json_obj.values())[0]

This creates a bug if something of type `Any` is actually a `Dict`. E.g.:

    class AConfig(ConfigBase):
        param_2: Dict[str, Any]
    a_config: AConfig

Assume at run-time, the input is:
    a_config: {
        "param_2" : {
            "nested_param1": 10,
            "nested_param2": 2,
        }
    }

Parsing this config generates an error:

{F235368935}

====================
## Fix and simplification

This bug is fixed by simpler serialization of `Any`: type doesn't need to prepended at all

====================
## Unit Test

Added a unit test that exposes the bug above by taking a json input, and checks that `deserialized(serialized(input)) == input`

====================
## Who does this affect

AFAIK, only Federated Learning trainers use type `Any` (for information hiding). This doesn't affect anyone other than people working on Federated Learning.

Reviewed By: jessemin, psuzhanhy

Differential Revision: D21244134

fbshipit-source-id: 0cce1e0d85fe53af158a8aa3d79065b3fb7f1512
@facebook-github-bot facebook-github-bot added CLA Signed Do not delete this pull request or issue due to inactivity. fb-exported labels Apr 29, 2020
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D21244134

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 51de260.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CLA Signed Do not delete this pull request or issue due to inactivity. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants