In [1]:
import asdl
from types import ModuleType
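def _asdl_parse(asdl_str):
    # parse an ASDL description string into an asdl.Module AST
    # (the parameter is not called `str`, to avoid shadowing the builtin)
    parser = asdl.ASDLParser()
    module = parser.parse(asdl_str)
    return module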

ASDL itself has a grammar that looks like:

```
module        ::= "module" Id "{" [definitions] "}"
definitions   ::= { TypeId "=" type }
type          ::= product | sum
product       ::= fields ["attributes" fields]
fields        ::= "(" { field, "," } field ")"
field         ::= TypeId ["?" | "*"] [Id]
sum           ::= constructor { "|" constructor } ["attributes" fields]
constructor   ::= ConstructorId [fields]
```

At the top level is the module, with an `Id` and a list of definitions, each of which defines a `type`, which is either a `product` (nameless typed-tuple) or a `sum`.

The ASDL module from the Python compiler has a way to `check` for well-formedness of the parsed ASDL description.  However, this check (a) is not used for Python itself, and (b) enforces some constraints we want to slightly violate.  Instead, we'll verify the output of `_asdl_parse` ourselves.

# An Example Grammar

Consider the following simple language of polynomials as an example.  We would like to be able to represent expressions like `x*x + 32`.

In [2]:
Poly = _asdl_parse("""
module Poly
{
  expr = Var(string name)
       | Const(float val)
       | Add(expr lhs, expr rhs)
       | Mul(expr lhs, expr rhs)
       attributes (srcinfo? loc)

  srcinfo  = (string input, int offset)
}
""")

In [3]:
type(Poly), Poly.__dict__

(asdl.Module,
 {'name': 'Poly',
  'dfns': [Type(expr, Sum([Constructor(Var, [Field(string, name)]), Constructor(Const, [Field(float, val)]), Constructor(Add, [Field(expr, lhs), Field(expr, rhs)]), Constructor(Mul, [Field(expr, lhs), Field(expr, rhs)])], [Field(srcinfo, loc, opt=True)])),
   Type(srcinfo, Product([Field(string, input), Field(int, offset)]))],
  'types': {'expr': Sum([Constructor(Var, [Field(string, name)]), Constructor(Const, [Field(float, val)]), Constructor(Add, [Field(expr, lhs), Field(expr, rhs)]), Constructor(Mul, [Field(expr, lhs), Field(expr, rhs)])], [Field(srcinfo, loc, opt=True)]),
   'srcinfo': Product([Field(string, input), Field(int, offset)])}})

The ASDL parser converts the above description string into an ASDL-AST, not into anything operationally useful in Python code.  We need to somehow extract useful information from this AST and construct Python tools/code from it.  There are two entries, `'dfns'` and `'types'`, that we can get useful bits from.  Of these, the `'types'` entry is slightly more processed: it maps each type name directly to its `Sum` or `Product` node.
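To make the shape of the `'types'` entry concrete, here is a minimal sketch of walking it.  So that it runs without the `asdl` module, it uses hypothetical stand-in classes (`Field`, `Constructor`, `Sum`, `Product` defined as namedtuples below).  They are not the real `asdl` classes and model only the attributes this notebook relies on (`types`, `attributes`, `fields`, `name`, `type`, `seq`, `opt`).  The helper `summarize` is likewise made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for the asdl node classes (NOT the real ones);
# only the attributes used elsewhere in this notebook are modeled.
Field       = namedtuple("Field", ["type", "name", "seq", "opt"])
Constructor = namedtuple("Constructor", ["name", "fields"])
Sum         = namedtuple("Sum", ["types", "attributes"])
Product     = namedtuple("Product", ["fields"])

def field(typ, name, seq=False, opt=False):
    return Field(typ, name, seq, opt)

# Hand-built equivalent of the 'types' entry shown above for Poly.
types = {
    "expr": Sum(
        types=[
            Constructor("Var",   [field("string", "name")]),
            Constructor("Const", [field("float", "val")]),
            Constructor("Add",   [field("expr", "lhs"), field("expr", "rhs")]),
            Constructor("Mul",   [field("expr", "lhs"), field("expr", "rhs")]),
        ],
        attributes=[field("srcinfo", "loc", opt=True)],
    ),
    "srcinfo": Product([field("string", "input"), field("int", "offset")]),
}

def summarize(types):
    """Map each constructor (or product) name to its list of field names."""
    out = {}
    for nm, t in types.items():
        if isinstance(t, Sum):
            # attributes are appended to every constructor of the sum
            for c in t.types:
                out[c.name] = [f.name for f in c.fields + t.attributes]
        else:
            out[nm] = [f.name for f in t.fields]
    return out

summary = summarize(types)
print(summary["Add"])      # ['lhs', 'rhs', 'loc']
print(summary["srcinfo"])  # ['input', 'offset']
```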

-----

Now consider our example expression `x*x + 32`.  Presumably we want some kind of Python object structure that looks something like...

```
Add {
  lhs = Mul {
          lhs = Var { name = "x" },
          rhs = Var { name = "x" }
        },
  rhs = Const { val = 32 }
}
```
(where the optional `srcinfo` annotations have been suppressed)

This suggests that we might want to somehow provide some classes out of which this object-tree can actually be constructed.

In [4]:
class expr():
    def __init__(self):
        assert False, "do not instantiate expr directly"

class Var(expr):
    def __init__(self, name):
        assert isinstance(name, str), "expected string as name"
        self.name = name
    def __repr__(self):
        return f'Var(name={self.name!r})'

class Const(expr):
    def __init__(self, val):
        # the grammar declares `float val`; ints are also accepted in this sketch
        assert isinstance(val, (int, float)), "expected number as val"
        self.val = val
    def __repr__(self):
        return f'Const(val={self.val!r})'

class Add(expr):
    def __init__(self, lhs, rhs):
        assert isinstance(lhs, expr), "expected expr as lhs"
        assert isinstance(rhs, expr), "expected expr as rhs"
        self.lhs = lhs
        self.rhs = rhs
    def __repr__(self):
        return f'Add(lhs={self.lhs!r},rhs={self.rhs!r})'

class Mul(expr):
    def __init__(self, lhs, rhs):
        assert isinstance(lhs, expr), "expected expr as lhs"
        assert isinstance(rhs, expr), "expected expr as rhs"
        self.lhs = lhs
        self.rhs = rhs
    def __repr__(self):
        return f'Mul(lhs={self.lhs!r},rhs={self.rhs!r})'

In [5]:
x    = Var("x")
xx   = Mul(x,x)
xx32 = Add(xx,Const(32))
xx32

Add(lhs=Mul(lhs=Var(name='x'),rhs=Var(name='x')),rhs=Const(val=32))

Ultimately, we might want a better error checking mechanism than a bunch of asserts, but the basic idea is well sketched out.  We want to realize the ASDL grammar into the Python type system, including smart use of sub-classing, and at least equipped with some minimal type-checking.  And we should be able to get at least a rudimentary string representation for inspection and serialization.

What we've just done is sketch out the concrete output we want for the specific instance of the `Poly` ASDL.  However, what we are now going to try to write should work for _any_ ASDL, not just `Poly`.

# Constructing the Constructors

To begin, let's extract all of the types from the module and construct a corresponding (super-)class for each one.  In doing so, we will need to distinguish between `Sum`s and `Product`s.  The `Sum`s are non-instantiable super-classes, but the `Product`s will end up being constructors themselves.

In [6]:
def _build_superclasses(asdl_mod):
    scs = {}
    def create_invalid_init(nm):
        def invalid_init(self):
            assert False, f"{nm} should never be instantiated"
        return invalid_init
    
    for nm,v in asdl_mod.types.items():
        if isinstance(v,asdl.Sum):
            scs[nm] = type(nm,(),{"__init__" : create_invalid_init(nm)})
        elif isinstance(v,asdl.Product):
            scs[nm] = type(nm,(),{})
    return scs

In [7]:
Poly_SCs = _build_superclasses(Poly)
Poly_SCs

{'expr': __main__.expr, 'srcinfo': __main__.srcinfo}
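The key tool in `_build_superclasses` is the three-argument form of `type(name, bases, namespace)`, which creates a class at runtime.  The following is a small standalone sketch of the same idea.  The helper name `make_abstract` is made up for illustration, and it raises `TypeError` instead of using an assert, which is a substitution on my part rather than what the cell above does.

```python
def make_abstract(name):
    """Create a class named `name` that refuses direct instantiation."""
    def invalid_init(self, *args, **kwargs):
        raise TypeError(f"{name} should never be instantiated")
    # type(name, bases, namespace) builds the class object dynamically
    return type(name, (), {"__init__": invalid_init})

expr = make_abstract("expr")

# A subclass that supplies its own __init__ is instantiable again.
def var_init(self, name):
    self.name = name

Var = type("Var", (expr,), {"__init__": var_init})

v = Var("x")
print(isinstance(v, expr))   # True

try:
    expr()
except TypeError as e:
    print(e)                 # expr should never be instantiated
```

Note that the class name is captured by a closure (`make_abstract` here, `create_invalid_init` above), so each generated `__init__` reports the right type name.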

The typing of fields in various sum-constructors and products of the ASDL either refers to these superclasses (a.k.a. _types_) or to some externalized _builtin_ types.  We know how to check whether an object is of one of these newly created types (just use `isinstance`) but we don't necessarily know what it means to be a `string` or an `int`.  On the other hand, we do know what it should mean for some basic types built into Python.

All together, this suggests that we need an extensible mechanism for appealing to built-in (or externalized) object type checking.

In [8]:
_builtin_checks = {
    'string'  : lambda x: type(x) is str,
    'int'     : lambda x: type(x) is int,
    'object'  : lambda x: x is not None,
    'float'   : lambda x: type(x) is float,
    'bool'    : lambda x: type(x) is bool,
}

def _build_checks(asdl_mod, scs, ext_checks):
    checks = _builtin_checks.copy()
    def make_check(sc):
        return lambda x: isinstance(x,sc)
    
    for nm in ext_checks:
        checks[nm] = ext_checks[nm]
    for nm in scs:
        assert nm not in checks, f"Name conflict for type '{nm}'"
        sc = scs[nm]
        checks[nm] = make_check(sc)
    return checks

In [9]:
Poly_checks = _build_checks(Poly, Poly_SCs, {})
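Two details of the check table are easy to miss, so here is a small standalone sketch of them.  First, the builtin checks compare `type(x)` exactly rather than using `isinstance`, which matters because `bool` is a subclass of `int` in Python.  Second, external checks are simply merged over the builtin ones.  The `'name'` check below is a hypothetical example of an external type, not something defined elsewhere in this notebook.

```python
builtin_checks = {
    "int":   lambda x: type(x) is int,
    "float": lambda x: type(x) is float,
    "bool":  lambda x: type(x) is bool,
}

# Hypothetical external check: a "name" must be a valid identifier string.
ext_checks = {"name": lambda x: type(x) is str and x.isidentifier()}

# External checks are layered on top of (and may override) the builtins.
checks = {**builtin_checks, **ext_checks}

print(checks["int"](3))        # True
print(checks["int"](True))     # False: type(True) is bool, not int
print(isinstance(True, int))   # True: isinstance would let bools through
print(checks["name"]("x1"))    # True
print(checks["name"]("1x"))    # False
```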

Notice the staging here.  By constructing stub classes for the types first, we were then able to construct all of the `checks` functions that we'll need to test whether or not a Python object satisfies the `Poly` grammar, or whichever other grammar we specified in ASDL.  However, we haven't built the actual constructors yet.

This particular sequencing of the construction is necessary, and a common pattern in compiler design when working with mutually recursive objects (which ASDL grammars are).  If you fail to identify a safe stage 1 that breaks apart the recursion in stage 2, you'll find yourself (and your execution) tying yourself in knots.
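The two-stage pattern can be sketched in isolation with a pair of mutually recursive types.  The `tree` and `forest` types below are invented purely for illustration.  Stage 1 creates empty stub classes and their checks.  Stage 2 installs constructors, each of which is free to refer to any check from stage 1.

```python
# Stage 1: create an empty stub class for every type name, plus a check.
type_names = ["tree", "forest"]
stubs  = {nm: type(nm, (), {}) for nm in type_names}
# `sc=sc` binds each class at definition time (avoids late-binding closures)
checks = {nm: (lambda x, sc=sc: isinstance(x, sc)) for nm, sc in stubs.items()}

# Stage 2: fill in constructors; each may use checks for the *other* type.
def tree_init(self, label, children):
    assert checks["forest"](children), "expected forest as children"
    self.label, self.children = label, children

def forest_init(self, trees):
    assert all(checks["tree"](t) for t in trees), "expected trees"
    self.trees = trees

stubs["tree"].__init__   = tree_init
stubs["forest"].__init__ = forest_init

Tree, Forest = stubs["tree"], stubs["forest"]
leaf = Tree("leaf", Forest([]))
root = Tree("root", Forest([leaf]))
print(len(root.children.trees))  # 1
```

Without stage 1, `tree_init` would need the `forest` class to exist before it could be written, and vice versa.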

In [10]:
def _build_classes(asdl_mod, ext_checks={}):
    SC   = _build_superclasses(asdl_mod)
    CHK  = _build_checks(asdl_mod, SC, ext_checks)
    
    mod  = ModuleType(asdl_mod.name)
    
    Err  = type(asdl_mod.name+"Err",(Exception,),{})
    
    def basic_check(i,name,typ,indent="    "):
        typname = typ
        if typ in SC:
            typname = asdl_mod.name + "." + typ
        return (f"{indent}if not CHK['{typ}']({name}):\n"
                f"{indent}    raise Err("
                f"'expected arg {i} \"{name}\" "
                f"to be type \"{typname}\"')")
    def opt_check(i,name,typ,indent="    "):
        subidnt = indent + '    '
        return (f"{indent}if {name} is not None:\n"
                f"{basic_check(i,name,typ,subidnt)}")
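    def seq_check(i,name,typ,indent="    "):
        subidnt = indent + '        '
        # a sequence field must be a list; previously a non-list
        # argument slipped through without any check at all
        return (f"{indent}if type({name}) is list:\n"
                f"{indent}    for j,e in enumerate({name}):\n"
                f"{basic_check(i,name+'[j]',typ,subidnt)}\n"
                f"{indent}else:\n"
                f"{indent}    raise Err("
                f"'expected arg {i} \"{name}\" to be a list')")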
    
    def create_initfn(C_name, fields):
        argstr   = ', '.join([ f.name for f in fields ])
        checks   = '\n'.join([
            seq_check(i,f.name,f.type) if f.seq else
            opt_check(i,f.name,f.type) if f.opt else
            basic_check(i,f.name,f.type)
            for i,f in enumerate(fields)
        ])
        assign   = '\n    '.join([
            f"self.{f.name} = {f.name}"
            for f in fields
        ])
        if len(fields) == 0:
            checks = "    pass"
            assign = "pass"
        
        exec_out = { 'Err': Err, 'CHK': CHK }
        exec_str = (f"def {C_name}_init(self,{argstr}):"
                    f"\n{checks}"
                    f"\n    {assign}")
        # un-comment this line to see what's
        # really going on
        #print(exec_str)
        exec(exec_str, exec_out)
        return exec_out[C_name + '_init']
    
    def create_reprfn(C_name, fields):
        prints   = ','.join([
            f"{f.name}={{self.{f.name}}}"
            for f in fields
        ])
        exec_out = { 'Err': Err }
        exec_str = (f"def {C_name}_repr(self):"
                    f"\n    return f\"{C_name}({prints})\"")
        # un-comment this line to see what's
        # really going on
        #print(exec_str)
        exec(exec_str, exec_out)
        return exec_out[C_name + '_repr']
        
    def create_prod(nm,t):
        C          = SC[nm]
        fields     = t.fields
        C.__init__ = create_initfn(nm,fields)
        C.__repr__ = create_reprfn(nm,fields)
        return C
    
    def create_sum_constructor(tname,cname,T,fields):
        C          = type(cname,(T,),{
            '__init__' : create_initfn(cname,fields),
            '__repr__' : create_reprfn(cname,fields),
        })
        return C
    
    def create_sum(typ_name,t):
        T          = SC[typ_name]
        afields    = t.attributes
        for c in t.types:
            C      = create_sum_constructor(
                        typ_name, c.name, T,
                        c.fields + afields )
            assert (not hasattr(mod,c.name)), (
                f"name '{c.name}' conflict in module '{mod}'")
            setattr(T,c.name,C)
            setattr(mod,c.name,C)
        return T
    
    for nm,t in asdl_mod.types.items():
        if isinstance(t,asdl.Product):
            setattr(mod,nm,create_prod(nm,t))
        elif isinstance(t,asdl.Sum):
            setattr(mod,nm,create_sum(nm,t))
        else: assert False, "unexpected kind of asdl type"
            
    return mod

The above function includes a lot of complicated meta-programming of the classes we're looking to generate.  The result is a module object with entries for all of the constructors.
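The heart of that meta-programming is building the source text of a function and handing it to `exec` together with a namespace holding the names the generated code needs.  Below is a stripped-down sketch of just that step, assuming only required (non-optional, non-sequence) fields.  The names `make_init`, `SketchErr`, and the small `CHK` table are made up for this illustration and are not part of `_build_classes`.

```python
class SketchErr(Exception):
    pass

CHK = {"string": lambda x: type(x) is str,
       "float":  lambda x: type(x) is float}

def make_init(cname, fields):
    """Generate an __init__ from source text.

    `fields` is a list of (name, type_name) pairs.
    Returns the function and the generated source.
    """
    args  = ", ".join(name for name, _ in fields)
    lines = [f"def {cname}_init(self, {args}):"]
    for i, (name, typ) in enumerate(fields):
        lines.append(f"    if not CHK[{typ!r}]({name}):")
        lines.append(f"        raise Err('expected arg {i} "
                     f"\"{name}\" to be type \"{typ}\"')")
        lines.append(f"    self.{name} = {name}")
    if not fields:
        lines.append("    pass")
    src = "\n".join(lines)
    # the generated code can only see what we put in this namespace
    namespace = {"CHK": CHK, "Err": SketchErr}
    exec(src, namespace)
    return namespace[f"{cname}_init"], src

init, src = make_init("Const", [("val", "float")])
print(src)

Const = type("Const", (), {"__init__": init})
print(Const(32.0).val)   # 32.0
try:
    Const(32)
except SketchErr as e:
    print(e)             # expected arg 0 "val" to be type "float"
```

Printing `src` plays the same role as un-commenting the `print(exec_str)` lines in the cell above.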

In [11]:
P = _build_classes(Poly)
P, P.__dict__

(<module 'Poly'>,
 {'__name__': 'Poly',
  '__doc__': None,
  '__package__': None,
  '__loader__': None,
  '__spec__': None,
  'Var': __main__.Var,
  'Const': __main__.Const,
  'Add': __main__.Add,
  'Mul': __main__.Mul,
  'expr': __main__.expr,
  'srcinfo': __main__.srcinfo})

Now, we ought to be able to construct the `x*x + 32` expression we had in mind to start with.

In [12]:
x    = P.Var("x",None)
xx   = P.Mul(x,x,None)
xx32 = P.Add(xx,P.Const(32.,None),None)
xx32

Add(lhs=Mul(lhs=Var(name=x,loc=None),rhs=Var(name=x,loc=None),loc=None),rhs=Const(val=32.0,loc=None),loc=None)

We can package up what we've done into a single function that will take an ASDL grammar as a string and return the corresponding Python module containing all the desired types and constructors.

I'll call it `ADT` for Algebraic Data Type, mainly because that doesn't conflict with `ASDL`.

In [1]:
def ADT(asdl_str, ext_checks={}):
    asdl_ast = _asdl_parse(asdl_str)
    mod      = _build_classes(asdl_ast,ext_checks)
    # cache values in case we might want them
    mod._ext_checks = ext_checks
    mod._ast = asdl_ast
    return mod