In [1]:
import asdf
import os
import numpy as np

from dataclasses import dataclass

# How to write an ASDF Extension

As discussed in tutorial 8, ASDF can serialize objects beyond those it supports
intrinsically, such as the `astropy` objects supported by the extension library
`asdf-astropy`. However, that tutorial did not go beyond noting that ASDF can be
extended to support such objects.

Here we discuss how to write an ASDF `Extension` such that the object(s) described
by that extension can be seamlessly serialized to or deserialized from an ASDF file,
provided that the `Extension` is installed or made available to ASDF.

## An Example Object

Let's create a relatively simple Python object that we would like ASDF to handle
seamlessly. For our purposes, let's consider a geometric ellipse described by its
- semi-major axis
- semi-minor axis

In [2]:
@dataclass
class Ellipse:
    """An ellipse defined by semi-major and semi-minor axes.

    Note: Using a dataclass to define the object so that we get `==` for free.
    """

    semi_major: float
    semi_minor: float

Note that ASDF will handle objects contained inside the objects you wish to serialize,
provided that each of those objects is either handled intrinsically by ASDF or covered
by an extension that is available to ASDF. For example, if we wanted to specify the
axes of the ellipse using `astropy` `Quantity` objects (to attach units), ASDF would
handle this transparently so long as `asdf-astropy` is installed.

## Writing a Basic Extension

An extension requires two components to function properly:
1. A `tag` for the object, so that ASDF can identify and validate the object when
deserializing it from an ASDF file.
2. A `Converter` for the object, so ASDF knows how to serialize and deserialize the
object to and from an ASDF file.

The `tag` is defined through the schemas and related resources for ASDF to use, while
the `Converter` is a Python object which provides the code the ASDF library executes
to handle the serialization or deserialization process.

### Creating a `tag`

Recall that ASDF supports the use of schemas for validating the correctness of the
information stored within its files. Often one wishes to create a schema for a specific
object so that the schema describing that object can be reused in other schemas.
A `tag` is a reference to a specific schema or set of schemas that a particular
value in an ASDF file tree needs to satisfy. Typically, a given `tag` refers to a
particular object which is represented in ASDF by the sub-tree located at that value.
Thus the `tag` serves two purposes:
1. Identifying the schema used to validate a sub-tree of the ASDF tree.
2. Identifying the object a particular sub-tree describes.

This means that in order to create a `tag` for a given Python object, we need to
create resource `yaml` files that do two things:
1. Contain the schema(s) used by that `tag`.
2. Create an entry for that `tag` in ASDF.

#### Creating a Schema for ASDF

An overview of how schemas work was given in tutorial 3. Here we will discuss how
to tell ASDF about schemas without having to specify one when opening an ASDF file.

For our example object we can define the schema text as:

In [3]:
ellipse_uri = "asdf://example.com/example-project/schemas/ellipse-1.0.0"

ellipse_schema_content = f"""
%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: {ellipse_uri}

type: object
properties:
  semi_major:
    type: number
  semi_minor:
    type: number
required: [semi_major, semi_minor]
...
"""

This can then be dynamically added to ASDF using the `add_resource_mapping` method, which adds a
map (`dict`) between a `uri` (uniform resource identifier) string and the content of the resource.

Note that we highly recommend, as a best practice, that the `id` of any resource and its
`uri` string always be the same. This limits confusion about how a given schema is looked
up: JSON schema (the base language/library used for ASDF schemas) uses the `id` field to
reference resources among one another, while ASDF uses the `uri` as the key to find those
resources on disk. Using different values is possible, but highly discouraged.
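
One way to guard against a mismatch is to compare the two values before registering a
resource. The following is a minimal sketch using only the standard library. The helpers
`declared_id` and `check_resource_mapping` are hypothetical (they are not part of the
`asdf` API), and the line-based scan assumes a top-level `id:` line like the one in the
schema above rather than parsing the YAML properly.

```python
def declared_id(resource_text):
    """Return the value of the first top-level `id:` line, or None if absent."""
    for line in resource_text.splitlines():
        # Top-level keys start in column 0; nested keys are indented.
        if line.startswith("id:"):
            return line[len("id:"):].strip().strip("'\"")
    return None


def check_resource_mapping(mapping):
    """Raise ValueError if a key differs from the `id` declared in its content."""
    for uri, content in mapping.items():
        found = declared_id(content)
        if found != uri:
            raise ValueError(f"resource registered under {uri!r} declares id {found!r}")


# Same shape as the ellipse schema above, shortened for the example.
example_uri = "asdf://example.com/example-project/schemas/ellipse-1.0.0"
example_content = f"""
%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: {example_uri}
type: object
...
"""

# Passes silently because the key and the declared `id` agree.
check_resource_mapping({example_uri: example_content})
```

A check like this could run in a project's test suite, so that a renamed schema file or a
copy-pasted `id` is caught before release.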

In [4]:
asdf.get_config().add_resource_mapping({ellipse_uri: ellipse_schema_content})

Later we will go over how to add resources automatically via Python entry points.

Let's now load the schema and check that it is valid:

In [5]:
schema = asdf.schema.load_schema(ellipse_uri)
asdf.schema.check_schema(schema)

Note that if you are developing a schema, `asdf.schema.check_schema` will also work
directly on any `yaml` file which is loaded through the `pyyaml` interface.

Next, let's check that the schema validates a correct tree and fails to validate an incorrect one:

In [6]:
# Valid tree

valid_ellipse_tree = {"semi_major": 1.0, "semi_minor": 2.0}
asdf.schema.validate(valid_ellipse_tree, schema=schema)

In [7]:
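# Invalid tree

invalid_ellipse_tree = {"semi_major": 3.0}
# Uncommenting the next line raises a validation error because the required
# `semi_minor` property is missing.
# asdf.schema.validate(invalid_ellipse_tree, schema=schema)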

#### Creating the `tag` Itself

ASDF uses a special resource to specify the `tag`s for a given ASDF extension.
This resource is called a `manifest`, and it lists each `tag` as a
pair of `uri`s:
- `tag_uri`, the `uri` which will be used for the `tag`.
- `schema_uri`, the ASDF `uri` used to reference the specific schema involved.

This allows a given `schema` to be reused for multiple `tag`s, for example for
objects which contain the same serializable data but provide different Python
functionality that needs to be distinguished.
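
As an illustration of that reuse, the sketch below writes the `tags` section of a manifest
as plain Python data and builds a tag-to-schema lookup from it. The `orbit-1.0.0` tag is
hypothetical and exists only for this example, and the lookup is not how `asdf` stores
manifests internally.

```python
ellipse_schema = "asdf://example.com/example-project/schemas/ellipse-1.0.0"

# The `tags` list of a manifest, written as plain Python data.
# The second tag is hypothetical: it reuses the ellipse schema for a different Python type.
manifest_tags = [
    {
        "tag_uri": "asdf://example.com/example-project/tags/ellipse-1.0.0",
        "schema_uri": ellipse_schema,
    },
    {
        "tag_uri": "asdf://example.com/example-project/tags/orbit-1.0.0",
        "schema_uri": ellipse_schema,
    },
]

# Map each tag to the schema that validates it.
schema_for_tag = {entry["tag_uri"]: entry["schema_uri"] for entry in manifest_tags}

# Two distinct tags share one schema.
assert len(schema_for_tag) == 2
assert set(schema_for_tag.values()) == {ellipse_schema}
```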

The following is an example for creating/adding a manifest for an extension which
has the resources for the `Ellipse` object:

In [8]:
ellipse_manifest_uri = "asdf://example.com/example-project/manifests/shapes-1.0.0"
ellipse_extension_uri = "asdf://example.com/example-project/extensions/shapes-1.0.0"
ellipse_tag = "asdf://example.com/example-project/tags/ellipse-1.0.0"

ellipse_manifest_content = f"""
%YAML 1.1
---
id: {ellipse_manifest_uri}
extension_uri: {ellipse_extension_uri}

title: Example Shape extension 1.0.0
description: Tags for example shape objects.

tags:
  - tag_uri: {ellipse_tag}
    schema_uri: {ellipse_uri}
...
"""

asdf.get_config().add_resource_mapping({ellipse_manifest_uri: ellipse_manifest_content})

Note that the `extension_uri` field defines the `uri` that the whole `Extension` (resource(s)
combined with `Converter`(s)) uses within ASDF. The `extension_uri` will be referenced later
by the `Extension` object so that the extension code will be available when the correct
resources are available and vice-versa.

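We can check the `manifest` as well, by validating its parsed content against the
manifest schema that ships with ASDF:

In [9]:
# check
import yaml

# Schema that describes the layout of extension manifests.
manifest_schema = asdf.schema.load_schema(
    "asdf://asdf-format.org/core/schemas/extension_manifest-1.0.0"
)
# Validate the parsed manifest (a dict), not the raw YAML string.
manifest = yaml.safe_load(ellipse_manifest_content)
asdf.schema.validate(manifest, schema=manifest_schema)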

### Create a `Converter`

All converters should be constructed as subclasses of the abstract type `asdf.extension.Converter`,
which requires that you define two methods:

1. `to_yaml_tree`: which converts a Python object into an ASDF tree.
2. `from_yaml_tree`: which converts an ASDF tree into a Python object.

Note that these methods can take into account the type or tag of the object being converted.

Moreover, your converter also needs to define the following two attributes:

1. `tags`: A list of tags that this converter will use when reading ASDF.
2. `types`: A list of Python (object) types that this converter will use when writing ASDF.

Note that these two lists do not need to correspond to each other element by element. Also, for
the converter to actually be used by ASDF, at least one of its `tags` needs to be registered as a
resource with ASDF (usually via the entry point).
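
To see why the two lists are independent, it can help to picture two separate lookups: one
keyed by tag (used when reading) and one keyed by type (used when writing). The following is
a toy sketch of that idea, not the actual implementation inside `asdf`. `ConverterIndex`,
`ShapeConverter`, the three placeholder classes and the tag strings are all hypothetical.

```python
class Point:
    pass


class Segment:
    pass


class LegacySegment:
    pass


class ShapeConverter:
    # Two tags but three types: the lists have different lengths and no pairing.
    tags = ["example:tags/point-1.0.0", "example:tags/segment-1.0.0"]
    types = [Point, Segment, LegacySegment]


class ConverterIndex:
    """Toy registry: tag -> converter for reading, type -> converter for writing."""

    def __init__(self):
        self.by_tag = {}
        self.by_type = {}

    def register(self, converter):
        for tag in converter.tags:
            self.by_tag[tag] = converter
        for typ in converter.types:
            self.by_type[typ] = converter


index = ConverterIndex()
converter = ShapeConverter()
index.register(converter)

# Reading looks up by tag, writing looks up by type; both reach the same converter.
assert index.by_tag["example:tags/segment-1.0.0"] is converter
assert index.by_type[LegacySegment] is converter
```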

An example converter for `Ellipse`:

In [10]:
class EllipseConverter(asdf.extension.Converter):
    tags = [ellipse_tag]
    types = [Ellipse]

    def to_yaml_tree(self, obj, tag, ctx):
        return {
            "semi_major": obj.semi_major,
            "semi_minor": obj.semi_minor,
        }

    def from_yaml_tree(self, node, tag, ctx):
        return Ellipse(semi_major=node["semi_major"], semi_minor=node["semi_minor"])

Note that, to keep entry-point loading fast, one will normally defer the `import` of the object's class
until `from_yaml_tree` is actually called.
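
A minimal sketch of that pattern follows. It is written as a plain class so that it runs
without `asdf` installed; a real converter would subclass `asdf.extension.Converter`. The
tag URI is hypothetical, and `fractions.Fraction` from the standard library stands in for
an object whose package is expensive to import. Naming the type by its fully qualified
class name as a string keeps the import out of the module body.

```python
class FractionConverter:
    # Hypothetical tag, used only for this illustration.
    tags = ["asdf://example.com/example-project/tags/fraction-1.0.0"]
    # A string instead of the class itself, so nothing is imported at module load.
    types = ["fractions.Fraction"]

    def to_yaml_tree(self, obj, tag, ctx):
        # No import needed here: we only read attributes of an existing object.
        return {"numerator": obj.numerator, "denominator": obj.denominator}

    def from_yaml_tree(self, node, tag, ctx):
        # Deferred import: executed the first time an object is actually rebuilt.
        from fractions import Fraction

        return Fraction(node["numerator"], node["denominator"])


converter = FractionConverter()
value = converter.from_yaml_tree(
    {"numerator": 3, "denominator": 4}, converter.tags[0], None
)
tree = converter.to_yaml_tree(value, converter.tags[0], None)
```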

### Create the Full Extension

Now let's dynamically create an extension for ASDF to support the `Ellipse` object using
the `EllipseConverter` we just created and the `ellipse_tag` we created earlier.

This can be accomplished using the `asdf.extension.ManifestExtension.from_uri` method, which
in our case requires two arguments:
1. The `manifest_uri`, the `uri` the `manifest` was added under.
2. The `converters`, a list of instances of `Converter` classes.

Note that one can also pass a list of `Compressor` instances (ASDF objects that handle custom
binary block compression).

An instance of the extension object can then be dynamically added to ASDF using the `add_extension`
method.

In [11]:
ellipse_extension = asdf.extension.ManifestExtension.from_uri(ellipse_manifest_uri, converters=[EllipseConverter()])
asdf.get_config().add_extension(ellipse_extension)

Now let's test that we can round-trip an `Ellipse` object through our new extension.

In [12]:
ellipse = Ellipse(1.0, 2.0)

with asdf.AsdfFile() as af:
    af["ellipse"] = ellipse
    af.write_to("ellipse.asdf")

with open("ellipse.asdf") as f:
    print(f.read())

with asdf.open("ellipse.asdf") as af:
    print(af["ellipse"])
    assert af["ellipse"] == ellipse

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 2.12.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 2.12.0}
  - !core/extension_metadata-1.0.0 {extension_class: asdf.extension._manifest.ManifestExtension,
    extension_uri: 'asdf://example.com/example-project/extensions/shapes-1.0.0'}
ellipse: !<asdf://example.com/example-project/tags/ellipse-1.0.0> {semi_major: 1.0,
  semi_minor: 2.0}
...

Ellipse(semi_major=1.0, semi_minor=2.0)


As you can see, we have successfully created a full extension of ASDF to support the `Ellipse` object.

## Extending Our Example Object

Suppose that we want to extend our object so that it represents an ellipse rotated in
the plane (still centered at the origin); that is, we add a position angle:

In [16]:
@dataclass
class RotatedEllipse(Ellipse):
    position_angle: float

### Extend the Schema

JSON schema does not support the concept of inheritance, which makes "extending"
an existing schema somewhat awkward. What we do instead is create a schema which
adds attributes to the existing schema via the `allOf` operation. In this case,
we can define a schema for `RotatedEllipse` by adding a `position_angle` property:

In [17]:
rotated_ellipse_uri = "asdf://example.com/example-project/schemas/rotated_ellipse-1.0.0"

rotated_ellipse_schema_content = f"""
%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: {rotated_ellipse_uri}

allOf:
  - $ref: {ellipse_uri}
  - properties:
      position_angle:
        type: number
    required: [position_angle]
...
"""

asdf.get_config().add_resource_mapping({rotated_ellipse_uri: rotated_ellipse_schema_content})

# check
schema = asdf.schema.load_schema(rotated_ellipse_uri)
asdf.schema.check_schema(schema)

test_rotated_ellipse_object = {"semi_major": 1.0, "semi_minor": 2.0, "position_angle": 3.0}
asdf.schema.validate(test_rotated_ellipse_object, schema=schema)
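
To make the effect of `allOf` concrete, here is a toy validator that only understands
`required`, `properties` with `type: number`, and `allOf`. It is a sketch for intuition,
not the JSON schema implementation that `asdf` uses, and the `$ref` is replaced by inlining
the ellipse schema as a Python dict. The function name `check` is hypothetical.

```python
def check(instance, schema):
    """Return a list of error messages; an empty list means the instance is valid."""
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property {key!r}")
    for key, sub in schema.get("properties", {}).items():
        if key in instance and sub.get("type") == "number":
            value = instance[key]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{key!r} is not a number")
    # `allOf`: the instance must satisfy every listed sub-schema.
    for sub in schema.get("allOf", []):
        errors.extend(check(instance, sub))
    return errors


ellipse = {
    "properties": {"semi_major": {"type": "number"}, "semi_minor": {"type": "number"}},
    "required": ["semi_major", "semi_minor"],
}
rotated_ellipse = {
    "allOf": [
        ellipse,  # stands in for `$ref: <ellipse schema uri>`
        {
            "properties": {"position_angle": {"type": "number"}},
            "required": ["position_angle"],
        },
    ]
}

valid = {"semi_major": 1.0, "semi_minor": 2.0, "position_angle": 3.0}
missing_angle = {"semi_major": 1.0, "semi_minor": 2.0}

assert check(valid, rotated_ellipse) == []
assert check(missing_angle, rotated_ellipse) == ["missing required property 'position_angle'"]
```

The second tree is a valid `Ellipse` but fails as a `RotatedEllipse`, because both
sub-schemas must hold at once.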

### Create an Updated Manifest

Let's assume that we already have the `shapes-1.0.0` manifest (already released and
in use). Following our suggested versioning system, we should create a new manifest
which includes a new `rotated_ellipse-1.0.0` tag:

In [18]:
rotated_ellipse_manifest_uri = "asdf://example.com/example-project/manifests/shapes-1.1.0"
rotated_ellipse_extension_uri = "asdf://example.com/example-project/extensions/shapes-1.1.0"
rotated_ellipse_tag = "asdf://example.com/example-project/tags/rotated_ellipse-1.0.0"

rotated_ellipse_manifest_content = f"""
%YAML 1.1
---
id: {rotated_ellipse_manifest_uri}
extension_uri: {rotated_ellipse_extension_uri}

title: Example Shape extension 1.1.0
description: Tags for example shape objects.

tags:
  - tag_uri: {ellipse_tag}
    schema_uri: {ellipse_uri}

  - tag_uri: {rotated_ellipse_tag}
    schema_uri: {rotated_ellipse_uri}
...
"""

asdf.get_config().add_resource_mapping(
    {rotated_ellipse_manifest_uri: rotated_ellipse_manifest_content}
)


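# check
import yaml

# Schema that describes the layout of extension manifests.
manifest_schema = asdf.schema.load_schema(
    "asdf://asdf-format.org/core/schemas/extension_manifest-1.0.0"
)
# Validate the parsed manifest (a dict), not the raw YAML string.
manifest = yaml.safe_load(rotated_ellipse_manifest_content)
asdf.schema.validate(manifest, schema=manifest_schema)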

### Create an Updated `Converter`

The "simplest" approach to creating a `Converter` for `RotatedEllipse` would be to
create a new converter as we did above for `Ellipse`; however, we can also take advantage
of the fact that multiple `tags` and `types` can be listed. Note that when multiple tags
are handled by the same `Converter`, we also need to implement a `select_tag` method:

In [19]:
class UpdatedEllipseConverter(asdf.extension.Converter):
    tags = [ellipse_tag, rotated_ellipse_tag]
    types = [Ellipse, RotatedEllipse]

    def select_tag(self, obj, tags, ctx):
        if isinstance(obj, RotatedEllipse):
            return rotated_ellipse_tag
        elif isinstance(obj, Ellipse):
            return ellipse_tag
        else:
            raise ValueError(f"Unknown object {type(obj)}")

    def to_yaml_tree(self, obj, tag, ctx):
        tree = {
            "semi_major": obj.semi_major,
            "semi_minor": obj.semi_minor,
        }

        if tag == rotated_ellipse_tag:
            tree["position_angle"] = obj.position_angle

        return tree

    def from_yaml_tree(self, node, tag, ctx):
        if tag == ellipse_tag:
            return Ellipse(**node)
        elif tag == rotated_ellipse_tag:
            return RotatedEllipse(**node)
        else:
            raise ValueError(f"Unknown tag {tag}")

### Creating an Updated `Extension`

We can now use this converter to create a new "updated" extension:

In [20]:
rotated_ellipse_extension = asdf.extension.ManifestExtension.from_uri(rotated_ellipse_manifest_uri, converters=[UpdatedEllipseConverter()])
asdf.get_config().add_extension(rotated_ellipse_extension)

### Checking the New Extension

Let's check this new extension by writing both an `Ellipse` and a `RotatedEllipse` object
to ASDF:

In [21]:
ellipse = Ellipse(1.0, 2.0)
rotated_ellipse = RotatedEllipse(1.0, 2.0, 3.0)

with asdf.AsdfFile() as af:
    af["ellipse"] = ellipse
    af["rotated_ellipse"] = rotated_ellipse
    af.write_to("rotated_ellipse.asdf")

# Check
with open("rotated_ellipse.asdf") as f:
    print(f.read())

with asdf.open("rotated_ellipse.asdf") as af:
    print(af["ellipse"])
    assert af["ellipse"] == ellipse

    print(af["rotated_ellipse"])
    assert af["rotated_ellipse"] == rotated_ellipse

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 2.12.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 2.12.0}
  - !core/extension_metadata-1.0.0 {extension_class: asdf.extension._manifest.ManifestExtension,
    extension_uri: 'asdf://example.com/example-project/extensions/shapes-1.1.0'}
ellipse: !<asdf://example.com/example-project/tags/ellipse-1.0.0> {semi_major: 1.0,
  semi_minor: 2.0}
rotated_ellipse: !<asdf://example.com/example-project/tags/rotated_ellipse-1.0.0> {
  position_angle: 3.0, semi_major: 1.0, semi_minor: 2.0}
...

Ellipse(semi_major=1.0, semi_minor=2.0)
RotatedEllipse(semi_major=1.0, semi_minor=2.0, position_angle=3.0)
