RTValue, MDM, IR #529
edwardpeters started this conversation in General
@AttilaMihaly Thinking about our discussion today, I don't think there's anything foundational to
I imagine adding another AST to the Morphir ecosystem alongside |
In the morphir-scala world, we have three very closely linked concepts: IR.Value, RTValue, and MDM.Data.
In this document, we refer to “Value” as any instance of any of these three - IR.Value, RTValue, or MDM.Data.
These three types are very similar: You can represent
List(
Tuple(Int(1), String(“Red”)),
Tuple(Int(2), String(“Green”))
)
in any of the above, and the trees will look quite similar. If you run these through the evaluator path, you would start with Elm code like [(1, “Red”), (2, “Green”)], which would be compiled to IR, which the evaluator would evaluate to RTValues and then convert to MDM; each structure along the way would look almost identical, while existing in a completely different type tree.
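As a rough illustration of how similar the three trees are, here is a minimal sketch. Every type and function name in it (IRInt, RTList, DTuple, evaluate, toData and so on) is a hypothetical, heavily simplified stand-in and not the real morphir-scala definition; it only covers the literal, list and tuple overlap discussed above.

```scala
object ThreeModels {
  // Hypothetical IR-like tree (the real IR also has variables, references, etc.).
  sealed trait IRValue
  final case class IRInt(value: Int) extends IRValue
  final case class IRString(value: String) extends IRValue
  final case class IRTuple(elements: List[IRValue]) extends IRValue
  final case class IRList(elements: List[IRValue]) extends IRValue

  // Hypothetical runtime tree produced by the evaluator.
  sealed trait RTValue
  final case class RTInt(value: Int) extends RTValue
  final case class RTString(value: String) extends RTValue
  final case class RTTuple(elements: List[RTValue]) extends RTValue
  final case class RTList(elements: List[RTValue]) extends RTValue

  // Hypothetical MDM-like data tree.
  sealed trait Data
  final case class DInt(value: Int) extends Data
  final case class DString(value: String) extends Data
  final case class DTuple(elements: List[Data]) extends Data
  final case class DList(elements: List[Data]) extends Data

  // For this overlapping subset, "evaluation" is a node-by-node copy.
  def evaluate(ir: IRValue): RTValue = ir match {
    case IRInt(v)     => RTInt(v)
    case IRString(v)  => RTString(v)
    case IRTuple(es)  => RTTuple(es.map(evaluate))
    case IRList(es)   => RTList(es.map(evaluate))
  }

  // Likewise, converting a finished runtime value to data is structural.
  def toData(rt: RTValue): Data = rt match {
    case RTInt(v)     => DInt(v)
    case RTString(v)  => DString(v)
    case RTTuple(es)  => DTuple(es.map(toData))
    case RTList(es)   => DList(es.map(toData))
  }

  // [(1, "Red"), (2, "Green")] as an IR-like tree.
  val sampleIR: IRValue = IRList(List(
    IRTuple(List(IRInt(1), IRString("Red"))),
    IRTuple(List(IRInt(2), IRString("Green")))
  ))
}
```

The three trees have the same shape and differ only in which type hierarchy they live in, which is the point the paragraph above makes.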
However, the three formats have significant differences:
What Each Represents (And how):
Dimensions
Representation of Complex Types
Values of a number of types including LocalDates, Maps and user-defined types may be represented differently between the models
Self Sufficiency
Trees in each model may make sense “on their own”, or they may include references that depend on some surrounding context. For instance, if true then 1 else 2 is entirely self-sufficient; if true then 1 else IntFunctions.foo() requires a global context that includes foo; if true then 1 else x requires a local context in which x is defined.
A consequence of this is that the different representations are differently “mobile”. An MDM value may be moved to any context (including other runtimes) and maintain its meaning. RTValues may move around within a runtime and remain valid, but may lose or change their meaning if sent to a different runtime with different global bindings. IR subtrees are only guaranteed to be well-defined at the location where they appear in the IR; moving them may cause variable or reference nodes within them to lose meaning.
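One way to make “self-sufficiency” concrete is to compute which local variables and global references a tree still needs from its surroundings. The sketch below assumes a tiny hypothetical expression type (Expr, Needs and needs are illustrative names, not morphir-scala APIs).

```scala
object SelfSufficiency {
  // Hypothetical miniature expression tree.
  sealed trait Expr
  final case class BoolLit(value: Boolean) extends Expr
  final case class IntLit(value: Int) extends Expr
  final case class Variable(name: String) extends Expr   // needs a local binding
  final case class Reference(fqn: String) extends Expr   // needs a global definition
  final case class IfThenElse(cond: Expr, thenBranch: Expr, elseBranch: Expr) extends Expr
  final case class Lambda(param: String, body: Expr) extends Expr

  // What a tree still requires from its surrounding context.
  final case class Needs(locals: Set[String], globals: Set[String]) {
    def ++(other: Needs): Needs = Needs(locals ++ other.locals, globals ++ other.globals)
    def isSelfSufficient: Boolean = locals.isEmpty && globals.isEmpty
  }

  val nothing: Needs = Needs(Set.empty, Set.empty)

  // `bound` tracks variables introduced by enclosing lambdas.
  def needs(expr: Expr, bound: Set[String] = Set.empty): Needs = expr match {
    case BoolLit(_) | IntLit(_) => nothing
    case Variable(name) =>
      if (bound.contains(name)) nothing else Needs(Set(name), Set.empty)
    case Reference(fqn) => Needs(Set.empty, Set(fqn))
    case IfThenElse(c, t, e) => needs(c, bound) ++ needs(t, bound) ++ needs(e, bound)
    case Lambda(param, body) => needs(body, bound + param)
  }
}
```

Under this sketch, a tree is safe to move anywhere only when both sets are empty; a non-empty globals set corresponds to the “valid within one runtime” case, and a non-empty locals set to the “valid only at this location” case.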
Unique/Canonical Representation
Depending on the model, a single value may or may not have different ways it can be represented (for some definition of single value).
Type Information
Trees in each model may or may not include type information on each element. In some cases, this type information may be derivable from the tree itself - if true then “Red” else “Green” is always a String, for instance - but in others the explicit type information may be necessary to know the type, such as for empty lists or Maybe.Nothing.
Adding type information to RTValues may be a priority
Type Level Self Sufficiency
Similar to the values, this type information may or may not depend upon surrounding context.
This may complicate attaching type information to RTValues, as RTValues can “move” in a way which IR cannot, and could leave the scope in which their type information is valid
Type Level Unique/Canonical Representation
Similar to values, types may be represented in different ways. This complicates type checking, as asserting that two types are equivalent is not a simple structural equality check.
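To show why structural equality is not enough, here is a minimal sketch of alias expansion before comparison. The Type and Alias definitions and the alias table are hypothetical simplifications (the MyList and IntList aliases are the ones used as an example later in this post), and the sketch assumes aliases are not recursive.

```scala
object TypeEquivalence {
  // Hypothetical miniature type tree.
  sealed trait Type
  final case class Ref(name: String, args: List[Type]) extends Type // e.g. List Int
  final case class Var(name: String) extends Type                   // e.g. a

  // type alias <name> <params> = <body>
  final case class Alias(params: List[String], body: Type)

  val aliases: Map[String, Alias] = Map(
    "MyList"  -> Alias(List("a"), Ref("List", List(Var("a")))),  // type alias MyList a = List a
    "IntList" -> Alias(Nil, Ref("List", List(Ref("Int", Nil))))  // type alias IntList = List Int
  )

  // Replace type variables with the types bound to them.
  def substitute(tpe: Type, bindings: Map[String, Type]): Type = tpe match {
    case Var(name)       => bindings.getOrElse(name, tpe)
    case Ref(name, args) => Ref(name, args.map(substitute(_, bindings)))
  }

  // Expand aliases everywhere, applying generic parameters along the way.
  def dealias(tpe: Type): Type = tpe match {
    case Var(_) => tpe
    case Ref(name, args) =>
      val cleanArgs = args.map(dealias)
      aliases.get(name) match {
        case Some(Alias(params, body)) =>
          dealias(substitute(body, params.zip(cleanArgs).toMap))
        case None => Ref(name, cleanArgs)
      }
  }

  // Two types are treated as equivalent when their expanded forms match.
  def equivalent(a: Type, b: Type): Boolean = dealias(a) == dealias(b)
}
```

With this sketch, MyList Int, IntList and List Int are three structurally different trees that all expand to the same canonical tree.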
Remaining Work To Do
A value may be “done” - just data - or it may have remaining computation.
How Each is Used
Possible Changes & Challenges
Switch How Evaluator Input Works
Current Model
The current model of calling a function in a Morphir IR file with a Scala value is to create a synthetic IR node representing the application of IR representing that Scala value to a reference to that function, and evaluate this in a context which has the global namespace from the IR file. Step by step:
This model can be tricky for some types, such as maps, local dates and user-defined types: in these cases, the IR “Representation” of the input becomes a sequence of apply nodes to functions that would produce the value. For analogy, if you want to give the evaluator a cake, you deconstruct the cake into flour, eggs, milk and sugar, and pass those to the evaluator along with a recipe for cake.
This poses a potential performance concern when chaining multiple evaluator runs together, with each fed the output of the former.
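The “cake as recipe” shape can be sketched as follows. This is an illustration only: the IR node types, the "Dict.fromList" reference string and the mapToIR and callEntryPoint helpers are hypothetical names chosen for the example, not the actual morphir-scala encoding.

```scala
object CakeAsRecipe {
  // Hypothetical miniature IR.
  sealed trait IR
  final case class IntLit(value: Int) extends IR
  final case class StringLit(value: String) extends IR
  final case class TupleIR(elements: List[IR]) extends IR
  final case class ListIR(elements: List[IR]) extends IR
  final case class Reference(fqn: String) extends IR
  final case class Apply(function: IR, argument: IR) extends IR

  // A Scala Map has no literal form here, so it becomes the IR that would
  // rebuild it: an application of a dictionary constructor to a list of pairs.
  def mapToIR(m: Map[Int, String]): IR =
    Apply(
      Reference("Dict.fromList"), // assumed constructor name, for illustration
      ListIR(m.toList.sortBy(_._1).map { case (k, v) =>
        TupleIR(List(IntLit(k), StringLit(v)))
      })
    )

  // The synthetic node handed to the evaluator: entry point applied to input.
  def callEntryPoint(entryPoint: String, input: IR): IR =
    Apply(Reference(entryPoint), input)
}
```

In a chained run, the output of one evaluation would have to be turned back into such a tree before the next evaluation could consume it, which is where the performance concern above comes from.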
Possible Alternative
An alternative model has occasionally been considered, which would instead convert the input to an RTValue, look up the body of the entry point function, and evaluate that body in a context including a store pre-populated with the input (bound to the argument name specified in the entry point). In this model:
If this model were adopted, it could have two benefits:
We could eliminate the tricky process of representing values as the IR that would produce those values (Passing the evaluator a cake just means passing it the cake)
If multiple evaluator runs are chained together, no conversion would be necessary between the output of one and the input of the next
However, it is possible that risks would be introduced. RTValues can include things like references to functions in the current IR file, which may be invalid if you blindly pass them along to another evaluator instance using a different IR file.
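A minimal sketch of the alternative, assuming a toy evaluator: the input arrives as a runtime value and is bound in the store under the entry point's argument name. The Definition, eval and run names and the tiny IR are hypothetical, and the sketch deliberately ignores the cross-runtime reference risk described above.

```scala
object StoreBinding {
  // Hypothetical runtime values.
  sealed trait RTValue
  final case class RTInt(value: Int) extends RTValue
  final case class RTString(value: String) extends RTValue

  // Hypothetical miniature IR.
  sealed trait IR
  final case class IntLit(value: Int) extends IR
  final case class Variable(name: String) extends IR
  final case class Add(left: IR, right: IR) extends IR

  // An entry point: one argument name plus a body.
  final case class Definition(argName: String, body: IR)

  def eval(ir: IR, store: Map[String, RTValue]): RTValue = ir match {
    case IntLit(v)      => RTInt(v)
    case Variable(name) => store.getOrElse(name, sys.error(s"unbound variable: $name"))
    case Add(l, r) =>
      (eval(l, store), eval(r, store)) match {
        case (RTInt(a), RTInt(b)) => RTInt(a + b)
        case (a, b)               => sys.error(s"cannot add $a and $b")
      }
  }

  // No IR is synthesised for the input: it is placed directly in the store.
  def run(entry: Definition, input: RTValue): RTValue =
    eval(entry.body, Map(entry.argName -> input))

  // addOne x = x + 1
  val addOne: Definition = Definition("x", Add(Variable("x"), IntLit(1)))
}
```

Chaining then needs no conversion step: run(addOne, run(addOne, RTInt(1))) feeds the first output straight into the second run.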
Eliminating RTValue
Given the complexity and overhead of maintaining three models, there have been several suggestions that RTValue be eliminated and replaced with one of the other two.
Replace with MDM
Replacing RTValue with MDM would mean expanding MDM with function representations, which could have IR within them, which could refer to functions specific to a given runtime.
Replace with IR
Replacing RTValue with IR might be more reasonable, but has some non-trivial difficulties to explore:
Notably, this model is used in the morphir-elm interpreter - however, that interpreter is unable to handle certain cases, such as partial function application (the code examples below work in native Elm but not in morphir-elm develop:)
Attach Type Information to RTValues
There’s been a desire to have type information ride along with RTValues, both for additional safety and to provide context when those values are passed downstream.
In particular, some functions (List.sum and List.product) have executions which require the type information to produce the correct result (in the case of empty lists).
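The empty-list case can be sketched like this, assuming hypothetical runtime values and a hypothetical NumType tag that travels alongside the list. Without the tag, sum has no way to choose between an integer zero and a floating-point zero.

```scala
object TypedSum {
  // Hypothetical element-type tag carried with the value.
  sealed trait NumType
  case object IntType extends NumType
  case object FloatType extends NumType

  // Hypothetical runtime values.
  sealed trait RTValue
  final case class RTInt(value: Int) extends RTValue
  final case class RTFloat(value: Double) extends RTValue
  final case class RTList(elements: List[RTValue]) extends RTValue

  def sum(list: RTList, elementType: NumType): RTValue = {
    // The starting value is the only place the type tag is needed,
    // and for an empty list it is also the final result.
    val zero: RTValue = elementType match {
      case IntType   => RTInt(0)
      case FloatType => RTFloat(0.0)
    }
    list.elements.foldLeft(zero) {
      case (RTInt(a), RTInt(b))     => RTInt(a + b)
      case (RTFloat(a), RTFloat(b)) => RTFloat(a + b)
      case (a, b)                   => sys.error(s"cannot add $a and $b")
    }
  }
}
```

For non-empty lists the element values themselves would be enough to pick the result type; only the empty list depends on the attached type.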
This was initially thought to be a simple change, as the type information is attached to the IR node from which the RTValue is evaluated, so we should be able to just put that type information on the RTValue and have typed RTValues. On further thought this has some problems:
IR types, like IR, may be dependent on their local scope for meaning. A node’s IR may be of type Type.Variable(“a”); that’s fine locally, but if we then build a tuple of values of Type.Variable(“a”) and return them from the generic function, we could have an RTValue like (1 : Type.Variable(“a”), 2 : Type.Variable(“a”)) : (Int, Int), which could then be passed to some scope in which a refers to a different type. I’m not sure specifically what bugs this would produce, but I think we’d rather not learn the hard way.
IR types may be represented as aliases, with generic parameters applied along the way. For instance, if you have type alias MyList a = List a and type alias IntList = List Int, then three different type trees - MyList Int, IntList and List Int - all refer to the same type. This is not insurmountable - it is possible to de-alias either on the fly or as a pre-evaluation transformation - but it will need to be done for the type information to be useful. For example, in our motivating List.sum example, if your type is SomeTypeAlias SomeOtherTypeAlias, you need to be able to de-alias to tell what type of 0 to produce.
Performance hits may be possible. At least some work will need to be done at runtime to manage types, but this will likely be low; the greater concern is if we need to repeatedly explore type trees during execution to maintain sanity or correctness. For instance, if at each evaluation step we have to re-explore the tree to fix each level (to avoid our (1 : a, 2 : a) : (Int, Int) example above), this could represent a severalfold increase in the total calculations performed.
It turns out that our motivating example may remain an issue anyway - List.sum may be carried out at a code position where the type is generic (number), and testing seems to indicate that such values may even be returned to the top level.
Should RTValue’s Type be more limited than IR’s type?
One possible consideration is that RTValues should have a type representation that is less powerful but more self-sufficient than that of IR. Several of the above problems could be eliminated (on the RTValue level) if RTValue types never contained aliases or type variables, similar to MDM.
This does not easily solve the problem, because we still need a sane and consistent way to generate these “concrete” type trees from the potentially richer type trees of Morphir, but it would at least give us a strong static guarantee that we never had such knots on the RTValue level.
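A sketch of what such a restricted representation could look like: a separate ConcreteType hierarchy that has no case for variables or aliases, plus a conversion that fails explicitly when it cannot resolve something. All names here are hypothetical, only a few types are covered, and aliases are assumed to have been expanded beforehand.

```scala
object ConcreteTypes {
  // Hypothetical IR-level types: may contain variables.
  sealed trait IRType
  final case class Ref(name: String, args: List[IRType]) extends IRType
  final case class Var(name: String) extends IRType

  // Hypothetical runtime-level types: variables are unrepresentable.
  sealed trait ConcreteType
  case object CInt extends ConcreteType
  case object CString extends ConcreteType
  final case class CList(element: ConcreteType) extends ConcreteType

  // `bindings` holds what is known about type variables at this point.
  def concretize(
      tpe: IRType,
      bindings: Map[String, ConcreteType]
  ): Either[String, ConcreteType] = tpe match {
    case Var(name)             => bindings.get(name).toRight(s"unresolved type variable: $name")
    case Ref("Int", Nil)       => Right(CInt)
    case Ref("String", Nil)    => Right(CString)
    case Ref("List", List(el)) => concretize(el, bindings).map(CList(_))
    case Ref(name, _)          => Left(s"unsupported or unexpanded type: $name")
  }
}
```

The static guarantee comes from ConcreteType itself; the hard part noted above, deciding what bindings to supply, is left entirely to the caller in this sketch.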
Do we need pre-evaluation transformation passes to support this?
Another possible option for managing these concerns is to do pre-evaluation transformations of the IR:
Create Unified AST Tree
The driving distinctions between the three models now come down to the partially overlapping sets of values each can express, more than the representation of those values. A significant chunk of each - all lists, tuples, records and primitives, along with their associated types - has the same core meaning and structure in each interpretation.
It could be possible to simplify things and reduce boilerplate by creating a single sealed trait hierarchy in which different nodes extend different subsets of these three. It should be possible to express such a type hierarchy (in Scala and in Rust - possibly not Elm) in such a way that some forms, such as Tuple and List, will belong to given types only if their child types do as well.
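One possible encoding in Scala, offered as a sketch rather than a proposal: leaf nodes extend marker traits for the models they belong to, and container membership is derived through implicit evidence that exists only when it exists for the element type. All names (Node, DataNode, IsData, ListOf, sendToOtherRuntime) are hypothetical, and only the data-model evidence is shown.

```scala
object UnifiedTree {
  // Marker traits: which of the three models a node belongs to.
  sealed trait Node
  sealed trait IRNode extends Node
  sealed trait RTNode extends Node
  sealed trait DataNode extends Node

  // Leaves extend whichever subset applies.
  final case class IntLit(value: Int) extends IRNode with RTNode with DataNode
  final case class Variable(name: String) extends IRNode        // IR only
  final case class NativeFunction(name: String) extends RTNode  // runtime only

  // A container is parameterised by the kind of node it holds.
  final case class ListOf[A <: Node](elements: List[A]) extends Node

  // Evidence that A is pure data.
  sealed trait IsData[A]
  object IsData {
    // Any leaf marked as DataNode is data.
    implicit def leafIsData[A <: DataNode]: IsData[A] = new IsData[A] {}
    // A list is data only if its element type is data.
    implicit def listIsData[A <: Node](implicit ev: IsData[A]): IsData[ListOf[A]] =
      new IsData[ListOf[A]] {}
  }

  // Only compiles for trees that are data all the way down.
  def sendToOtherRuntime[A <: Node](node: A)(implicit ev: IsData[A]): A = node
}
```

Under these assumptions, sendToOtherRuntime(ListOf(List(IntLit(1)))) compiles, while sendToOtherRuntime(ListOf(List(Variable("x")))) should be rejected at compile time because no IsData evidence can be found for Variable.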