| 1 | +Intermediate Type Language |
| 2 | +========================== |
| 3 | + |
| 4 | +Intermediate Type Language (ITL) is an attempt to provide a common |
| 5 | +type system for serialization schemes in a machine-friendly format. |
| 6 | + |
| 7 | +# Problem |
| 8 | +A common practice for applications that transmit or store data is to |
| 9 | +define the structure of the data in a neutral representation and then |
| 10 | +generate types that provide a native representation of the data in a |
| 11 | +particular programming language and code that can serialize and |
| 12 | +deserialize those types. |
| 13 | + |
| 14 | +There are three common problems with this approach. First, the |
| 15 | +neutral representation used to describe the data is often too heavy |
| 16 | +for automated translation. For example, IDL has a number of |
| 17 | +user-friendly features that are not machine friendly. Second, the |
| 18 | +type system is often coupled to the serialization scheme. |
| 19 | +Translating between serialization schemes leads to an N^2 problem of |
| 20 | +matching types in different systems. Third, serialization often |
| 21 | +assumes that the source or target is a native object in a programming |
| 22 | +language. That is, the language-neutral type has a corresponding |
| 23 | +concrete type in a programming language and the goal is to serialize |
| 24 | +to and from a value of the concrete type. This prevents potential |
| 25 | +optimization where the data is left in a serialized form and |
| 26 | +selectively deserialized as needed. |
| 27 | + |
| 28 | +# Use Case |
| 29 | + |
| 30 | +The primary use case for ITL is the development of translators that |
| 31 | +convert from one serialization scheme to another. The user provides a |
| 32 | +description of the incoming/outgoing data using ITL. The translator |
| 33 | +uses the ITL description of the data to perform the appropriate |
| 34 | +translation. The translator may interpret the ITL at run-time or the |
| 35 | +translator may be generated from ITL. |
| 36 | + |
| 37 | +# Design Goals |
| 38 | + |
| 39 | +1. General - ITL should support common types found in existing |
| 40 | + infrastructure such as IDL, FAST, Avro, Google Protocol Buffers, |
| 41 | + Thrift, etc. |
| 42 | +2. Machine-friendly - ITL should be easy to generate, easy to parse, |
| 43 | + and easy to use once parsed. |
| 44 | +3. Extensible - ITL should provide a means of annotating types with |
| 45 | + their intended use and external encoding-specific details, e.g., delta |
| 46 | + compression. |
| 47 | + |
| 48 | +ITL is descriptive and not prescriptive. The types that can be |
| 49 | +described with ITL may be a subset or superset of the types that can |
| 50 | +be described in another language. If a tool cannot describe a type in |
| 51 | +ITL, then ITL should not be used (and the user should be informed). |
| 52 | +If a serializer or deserializer is given a type that cannot be |
| 53 | +represented in that serialization scheme, then an appropriate failure |
| 54 | +mode should be adopted. |
| 55 | + |
| 56 | +# Scalar Types |
| 57 | + |
| 58 | +- **int** - Represents an integral number. An int has an optional |
| 59 | + number of bits needed to represent values of this type and an |
| 60 | + optional flag indicating if the values are unsigned. If the number |
| 61 | + of bits is not present, then the values of this type may have |
| 62 | + arbitrary magnitude. If the unsigned flag is not present, it is assumed to be false. |
| 63 | +- **float** - Represents a floating-point number. A float has an |
| 64 | + optional model that describes the values represented by this |
| 65 | + type. |
| 66 | +- **fixed** - Represents a fixed-point number. A fixed has a base, |
| 67 | + the total number of digits, and a scale that indicates the number of |
| 68 | + digits after the decimal point. |
| 69 | +- **string** - Represents a text sequence. |
| 70 | + |
| 71 | +Integers, fixed-point numbers, and strings have an optional set of |
| 72 | +name-value pairs and an optional flag indicating if values of this |
| 73 | +type are constrained to the specified set of values. A value in a |
| 74 | +name-value pair is stored as a string for use as a union |
| 75 | +discriminator. If an integer, fixed-point, or string is used as a |
| 76 | +discriminator, then the set of name-value pairs must be one-to-one. |
| 77 | + |
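As a rough sketch, the scalar types above might be written as follows, using the JSON spelling given in the Grammar section. The bit width, digit counts, float model choice, and the BUY/SELL names are illustrative assumptions, not part of ITL.

```json
{
  "types": [
    { "kind": "int", "bits": 32 },
    { "kind": "int" },
    { "kind": "float", "model": "binary64" },
    { "kind": "fixed", "base": 10, "digits": 9, "scale": 2 },
    {
      "kind": "string",
      "values": { "BUY": "B", "SELL": "S" },
      "constrained": true
    }
  ]
}
```

The first entry is a signed 32-bit integer because the unsigned flag defaults to false. The second omits "bits" and so may have arbitrary magnitude. The last is a string constrained to the two named values, with each value stored as a string.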
| 78 | +# Compound Types |
| 79 | + |
| 80 | +- **sequence** - Represents a homogeneous sequence of values of a given |
| 81 | + type. A sequence has one of the following: |
| 82 | + 1. No size or capacity, indicating a dynamic size. |
| 83 | + 2. An integer size, indicating a fixed size. |
| 84 | + 3. An array of sizes, indicating the size of each dimension. |
| 85 | + 4. A capacity, indicating a dynamic size with a limit on the number of values in the sequence. |
| 86 | + When both are given, size is preferred to capacity. If the elements |
| 87 | + have a fixed size, then size and capacity can be used to |
| 88 | + pre-allocate buffers. |
| 89 | +- **record** - A record represents a potentially heterogeneous sequence of |
| 90 | + named values. A record is defined by a list of fields. Each field |
| 91 | + has a name, a type, and an optional flag indicating if the field is |
| 92 | + optional. The name of each field must be unique. |
| 93 | +- **union** - A union represents a value from a finite set of types. |
| 94 | + A union has a discriminator type (int, fixed, string) that is used |
| 95 | + to determine the actual type and a non-empty set of fields. A union |
| 96 | + field has a name, a type, and a set of labels of the discriminator |
| 97 | + type. A label must correspond to a named value of the discriminator |
| 98 | + type. The name of each field must be unique. The label sets of |
| 99 | + different union fields must be pairwise disjoint. An empty set |
| 100 | + of labels means that this field is the default. |
| 101 | +- **alias** - An alias for another type. An alias has a name and type. |
| 102 | + |
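The compound types can be sketched in the same JSON form. This example assumes that a union label refers to the name (not the stored value) of a discriminator entry, which the text above does not state explicitly. All field names and sizes are made up for illustration.

```json
{
  "types": [
    {
      "kind": "sequence",
      "type": { "kind": "float", "model": "binary32" },
      "size": [3, 3]
    },
    {
      "kind": "sequence",
      "type": { "kind": "string" },
      "capacity": 16
    },
    {
      "kind": "record",
      "fields": [
        { "name": "id", "type": { "kind": "int", "bits": 64, "unsigned": true } },
        { "name": "nickname", "type": { "kind": "string" }, "optional": true }
      ]
    },
    {
      "kind": "union",
      "discriminator": {
        "kind": "int",
        "bits": 8,
        "unsigned": true,
        "values": { "CIRCLE": "0", "SQUARE": "1", "OTHER": "2" },
        "constrained": true
      },
      "fields": [
        { "name": "radius", "type": { "kind": "float", "model": "binary64" }, "labels": ["CIRCLE"] },
        { "name": "side", "type": { "kind": "float", "model": "binary64" }, "labels": ["SQUARE"] },
        { "name": "description", "type": { "kind": "string" }, "labels": [] }
      ]
    }
  ]
}
```

The first sequence is a fixed 3x3 array and the second is a dynamic sequence limited to 16 values. In the union, the label sets are pairwise disjoint, the discriminator values are one-to-one, and the field with the empty label set acts as the default.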
| 103 | +# Float Models |
| 104 | + |
| 105 | +A float model refers to a specification for floating-point numbers. |
| 106 | +When a model is specified for a floating-point type, it means that any |
| 107 | +value of the corresponding type *may* be represented by an |
| 108 | +implementation of the model. An implementation is not constrained by |
| 109 | +the model in its approach to encoding the number. However, |
| 110 | +implementations and users must be prepared to handle lossy conversions |
| 111 | +and respond appropriately. |
| 112 | + |
| 113 | +- "binary16" - IEEE 754 of same name |
| 114 | +- "binary32" - IEEE 754 of same name |
| 115 | +- "binary64" - IEEE 754 of same name |
| 116 | +- "binary128" - IEEE 754 of same name |
| 117 | +- "decimal32" - IEEE 754 of same name |
| 118 | +- "decimal64" - IEEE 754 of same name |
| 119 | +- "decimal128" - IEEE 754 of same name |
| 120 | + |
| 121 | +# Annotations |
| 122 | + |
| 123 | +Annotations serve two purposes. First, they provide a way to capture |
| 124 | +semantics about encoded data that govern its use. To illustrate, |
| 125 | +consider the problem of serializing a set. A serialized |
| 126 | +set looks like a sequence. However, when deserializing, the |
| 127 | +translator should attempt to restore set semantics by using an |
| 128 | +appropriate data type. In this case, the sequence should |
| 129 | +be annotated as a set. Second, annotations provide a way to record |
| 130 | +details related to a particular encoding. For example, FAST delta |
| 131 | +compression assumes a known starting value and then sends updates to |
| 132 | +that value. In this case, the field containing the value should be |
| 133 | +annotated with delta compression so that translators will know (and |
| 134 | +can take advantage of) this fact. |
| 135 | + |
| 136 | +Annotations are a set of key/value pairs where each key corresponds to a |
| 137 | +system. For the set example, the key may be "semantic" and the value |
| 138 | +may be { "preferredDataType" : "set" }. For the delta compression |
| 139 | +example, the key may be "FAST" and the value may be { "compression" : |
| 140 | +"delta" }. Value nesting is allowed to facilitate the creation of |
| 141 | +ontologies for different systems. |
| 142 | + |
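The two examples above might look like this in JSON, assuming that annotations are carried in the optional "note" field that the Grammar section allows on every JSON object. The record and the field name "price" are hypothetical.

```json
{
  "types": [
    {
      "kind": "sequence",
      "type": { "kind": "string" },
      "note": { "semantic": { "preferredDataType": "set" } }
    },
    {
      "kind": "record",
      "fields": [
        {
          "name": "price",
          "type": { "kind": "fixed", "base": 10, "digits": 9, "scale": 2 },
          "note": { "FAST": { "compression": "delta" } }
        }
      ]
    }
  ]
}
```

A translator that does not recognize a key such as "FAST" could presumably ignore it, since each key is scoped to one system.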
| 143 | +# Implementation |
| 144 | + |
| 145 | +ITL is written using JSON to achieve machine friendliness. ITL |
| 146 | +presents a self-contained representation of types. There is no |
| 147 | +facility for importing types from external resources. There is no |
| 148 | +direct support for inheritance. |
| 149 | + |
| 150 | +# Grammar |
| 151 | + |
| 152 | +The grammar is presented as a JSON/BNF hybrid. Non-terminals are |
| 153 | +capitalized (Root) and terminals are lower-case (integer). |
| 154 | +Terminals refer to JSON values with the same name. The terminal |
| 155 | +"value" represents any JSON value. The construct ( ... )? represents |
| 156 | +an optional group. |
| 157 | + |
| 158 | +``` |
| 159 | +Root: |
| 160 | + { "types" : [ TypeDef ] } |
| 161 | + |
| 162 | +TypeDef: |
| 163 | + { "kind" : "int" (, "bits" : integer)? (, "unsigned" : boolean)? (, "values" : Values)? (, "constrained" : boolean)? } |
| 164 | +| { "kind" : "float" (, "model" : FloatModel)? } |
| 165 | +| { "kind" : "fixed", "base" : integer, "digits" : integer, "scale" : integer (, "values" : Values)? (, "constrained" : boolean)? } |
| 166 | +| { "kind" : "string" (, "values" : Values)? (, "constrained" : boolean)? } |
| 167 | +| { "kind" : "sequence", "type" : Type (, "size" : ( integer | [ integer ] ) )? (, "capacity" : integer )? } |
| 168 | +| { "kind" : "record", "fields" : [ Field ] } |
| 169 | +| { "kind" : "union", "discriminator" : Type, "fields" : [ UnionField ] } |
| 170 | +| { "kind" : "alias", "name" : string, "type" : Type } |
| 171 | + |
| 172 | +Type: |
| 173 | + string |
| 174 | +| TypeDef |
| 175 | + |
| 176 | +Field: |
| 177 | + { "name" : string, "type" : Type (, "optional" : boolean)? } |
| 178 | + |
| 179 | +UnionField: |
| 180 | + { "name" : string, "type" : Type, "labels" : [ string ] } |
| 181 | + |
| 182 | +FloatModel: "binary16" | "binary32" | "binary64" | "binary128" | "decimal32" | "decimal64" | "decimal128" |
| 183 | + |
| 184 | +Values: JSON object where all field values are strings |
| 185 | +``` |
| 186 | + |
| 187 | +Every JSON Object ({ ... }) has an optional note field ("note" : { |
| 188 | +... }) for annotating the field, type, etc. |
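
A small complete document that follows the grammar might look like the sketch below. It assumes that a Type given as a string refers to the name of an alias defined elsewhere in "types", which the grammar does not spell out. The names "Price" and "Quote" and the contents of "note" are illustrative only.

```json
{
  "types": [
    {
      "kind": "alias",
      "name": "Price",
      "type": { "kind": "fixed", "base": 10, "digits": 9, "scale": 2 }
    },
    {
      "kind": "alias",
      "name": "Quote",
      "type": {
        "kind": "record",
        "fields": [
          { "name": "symbol", "type": { "kind": "string" } },
          { "name": "bid", "type": "Price" },
          { "name": "ask", "type": "Price", "optional": true }
        ]
      },
      "note": { "semantic": { "description": "top-of-book quote" } }
    }
  ]
}
```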