Configuration Language Design
=============================

In this section we will cover some conventions for HCL-based configuration
languages that can help make them feel consistent with other HCL-based
languages, and make the best use of HCL's building blocks.

HCL's native and JSON syntaxes both define a mapping from input bytes to a
higher-level information model. In designing a configuration language based on
HCL, your building blocks are the components in that information model:
blocks, arguments, and expressions.

Each calling application of HCL, then, effectively defines its own language.
Just as Atom and RSS are higher-level languages built on XML, HashiCorp
Terraform has a higher-level language built on HCL, while HashiCorp Nomad has
its own distinct language that is *also* built on HCL.

From an end-user perspective, these are distinct languages but have a common
underlying texture. Users of both are therefore likely to bring some
expectations from one to the other, and so this section is an attempt to
codify some of these shared expectations to reduce user surprise.

These are subjective guidelines, however, so applications may choose to
ignore them entirely or ignore them in certain specialized cases. An
application providing a configuration language for a pre-existing system, for
example, may choose to eschew the identifier naming conventions in this section
in order to exactly match the existing names in that underlying system.

Language Keywords and Identifiers
---------------------------------

Much of the work in defining an HCL-based language is in selecting good names
for arguments, block types, variables, and functions.

The standard for naming in HCL is to use all-lowercase identifiers with
underscores separating words, like ``service`` or ``io_mode``. HCL identifiers
do allow uppercase letters and dashes, but this is primarily for natural
interfacing with external systems that may have other identifier conventions,
so these should generally be avoided for the identifiers native to your
own language.

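For illustration, the fragment below shows identifiers that follow this convention in a hypothetical language. The ``service`` block type and its ``listen_addr`` argument are assumptions made for this example, not part of any real application's schema.

.. code-block:: hcl

  # Hypothetical configuration following the lowercase, underscore-separated
  # naming convention for block types and argument names.
  io_mode = "async"

  service "http" "web_proxy" {
    listen_addr = "127.0.0.1:8080"
  }

  # Spellings such as "IoMode" or "io-mode" are also valid HCL identifiers,
  # but are best reserved for names that mirror an external system.
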
The distinction between "keywords" and other identifiers is really just a
convention. In your own language documentation, you may use the word "keyword"
to refer to names that are presented as an intrinsic part of your language,
such as important top-level block type names.

Block type names are usually singular, since each block defines a single
object. Use a plural block type name only if the block serves purely as a
namespacing container for a number of other objects. A block with a plural
type name will generally contain only nested blocks, and no arguments of its
own.

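As a sketch, a hypothetical language might use a plural ``processes`` block purely as a container for singular ``process`` blocks. Both block type names here are invented for illustration.

.. code-block:: hcl

  # "processes" only groups other objects, so it has a plural name and
  # contains nested blocks but no arguments of its own.
  processes {
    # Each "process" block defines exactly one object, so its name is singular.
    process "main" {
      command = ["/usr/local/bin/awesome-app", "server"]
    }

    process "mgmt" {
      command = ["/usr/local/bin/awesome-app", "mgmt"]
    }
  }
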
Argument names are also singular unless they expect a collection value, in
which case they should be plural. For example, ``name = "foo"`` but
``subnet_ids = ["abc", "123"]``.

Function names will generally *not* use underscores and will instead just run
words together, as is common in the C standard library. This is because
several of the standard library functions offered in ``cty`` (covered in a
later section) have names that follow C library function names like
``substr``. This is not a strong rule, and applications that use longer
names may choose to use underscores for them to improve readability.

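The sketch below contrasts the two naming styles in expressions. ``substr`` and ``jsonencode`` come from the standard function list later in this guide. ``parse_duration_seconds`` is a hypothetical application-specific function, invented here only to show a longer, underscore-separated name.

.. code-block:: hcl

  # Short names in the style of the C standard library run words together.
  short_name = substr("web_proxy", 0, 3)
  settings   = jsonencode({ io_mode = "async" })

  # A hypothetical application-specific function with a longer name might
  # use underscores for readability instead.
  timeout_secs = parse_duration_seconds("1m30s")
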
Blocks vs. Object Values
------------------------

HCL blocks and argument values of object type have quite a similar appearance
in the native syntax, and are identical in JSON syntax:

.. code-block:: hcl

  # block
  block {
    foo = bar
  }

  # argument with object constructor expression
  argument = {
    foo = bar
  }

In spite of this superficial similarity, there are some important differences
between these two forms.

The most significant difference is that a child block can contain nested blocks
of its own, while an object constructor expression can define only attributes
of the object it is creating.

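To make the difference concrete, the fragment below adds a second level of nesting to each form. The names ``block``, ``argument``, ``nested``, ``foo``, ``bar``, ``baz`` and ``qux`` are placeholders in the same spirit as the preceding example.

.. code-block:: hcl

  # A block can contain nested blocks of its own...
  block {
    foo = bar

    nested {
      baz = qux
    }
  }

  # ...whereas an object constructor can only define attributes, so any
  # nested structure must itself be an attribute whose value is another
  # object constructor.
  argument = {
    foo = bar
    nested = {
      baz = qux
    }
  }
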
The user-facing model for blocks is that they generally form the more "rigid"
structure of the language itself, while argument values can be more free-form.
An application will generally define in its schema and documentation all of
the arguments that are valid for a particular block type, while arguments
accepting object constructors are more appropriate for situations where the
attribute names themselves are freely selected by the user, such as when the
expression will be converted by the application to a map type.

As a less contrived example, consider the ``resource`` block type in Terraform
and its use with a particular resource type ``aws_instance``:

.. code-block:: hcl

  resource "aws_instance" "example" {
    ami           = "ami-abc123"
    instance_type = "t2.micro"

    tags = {
      Name = "example instance"
    }

    ebs_block_device {
      device_name = "hda1"
      volume_size = 8
      volume_type = "standard"
    }
  }

The top-level block type ``resource`` is fundamental to Terraform itself and
so an obvious candidate for block syntax: it maps directly onto an object in
Terraform's own domain model.

Within this block we see a mixture of arguments and nested blocks, all defined
as part of the schema of the ``aws_instance`` resource type. The ``tags``
map here is specified as an argument because its keys are free-form, chosen
by the user and mapped directly onto a map in the underlying system.
``ebs_block_device`` is specified as a nested block, because it is a separate
domain object within the remote system and has a rigid schema of its own.

As a special case, block syntax may sometimes be used with free-form keys if
those keys each serve as a separate declaration of some first-class object
in the language. For example, Terraform has a top-level block type ``locals``
which behaves in this way:

.. code-block:: hcl

  locals {
    instance_type = "t2.micro"
    instance_id   = aws_instance.example.id
  }

Although the argument names in this block are arbitrarily selected by the
user, each one defines a distinct top-level object. In other words, this
approach is used to create a more ergonomic syntax for defining these simple
single-expression objects, as a pragmatic alternative to more verbose and
redundant declarations using blocks:

.. code-block:: hcl

  local "instance_type" {
    value = "t2.micro"
  }

  local "instance_id" {
    value = aws_instance.example.id
  }

The distinction between domain objects, language constructs and user data will
always be subjective, so the final decision is up to you as the language
designer.

Standard Functions
------------------

HCL itself does not define a common set of functions available in all HCL-based
languages; the built-in language operators give a baseline of functionality
that is always available, but applications are free to define functions as they
see fit.

With that said, there are a number of generally-useful functions that don't
belong to the domain of any one application: string manipulation, sequence
manipulation, date formatting, JSON serialization and parsing, etc.

Given the general need such functions serve, it's helpful if a similar set of
functions is available with compatible behavior across multiple HCL-based
languages, assuming the language is for an application where function calls
make sense at all.

The Go implementation of HCL is built on an underlying type and function system
:go:pkg:`cty`, whose usage was introduced in :ref:`go-expression-funcs`. That
library also has a package of "standard library" functions which we encourage
applications to offer with consistent names and compatible behavior, either by
using the standard implementations directly or offering compatible
implementations under the same name.

The "standard" functions that new configuration formats should consider
offering are:

* ``abs(number)`` - returns the absolute (non-negative) value of the given number.
* ``coalesce(vals...)`` - returns the value of the first argument that isn't null. Useful only in formats where null values may appear.
* ``compact(vals...)`` - returns a new tuple with the non-null values given as arguments, preserving order.
* ``concat(seqs...)`` - builds a tuple value by concatenating together all of the given sequence (list or tuple) arguments.
* ``format(fmt, args...)`` - performs simple string formatting similar to the C library function ``printf``.
* ``hasindex(coll, idx)`` - returns true if the given collection has the given index. ``coll`` may be of list, tuple, map, or object type.
* ``int(number)`` - returns the integer component of the given number, rounding towards zero.
* ``jsondecode(str)`` - interprets the given string as JSON format and returns the corresponding decoded value.
* ``jsonencode(val)`` - encodes the given value as a JSON string.
* ``length(coll)`` - returns the length of the given collection.
* ``lower(str)`` - converts the letters in the given string to lowercase, using Unicode case folding rules.
* ``max(numbers...)`` - returns the highest of the given number values.
* ``min(numbers...)`` - returns the lowest of the given number values.
* ``sethas(set, val)`` - returns true only if the given set has the given value as an element.
* ``setintersection(sets...)`` - returns the intersection of the given sets.
* ``setsubtract(set1, set2)`` - returns a set with the elements from ``set1`` that are not also in ``set2``.
* ``setsymdiff(sets...)`` - returns the symmetric difference of the given sets.
* ``setunion(sets...)`` - returns the union of the given sets.
* ``strlen(str)`` - returns the length of the given string in Unicode grapheme clusters.
* ``substr(str, offset, length)`` - returns a substring from the given string by splitting it between Unicode grapheme clusters.
* ``timeadd(time, duration)`` - takes a timestamp in RFC3339 format and a possibly-negative duration given as a string like ``"1h"`` (for "one hour") and returns a new RFC3339 timestamp after adding the duration to the given timestamp.
* ``upper(str)`` - converts the letters in the given string to uppercase, using Unicode case folding rules.

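As a rough sketch, arguments in a hypothetical language that offers these functions under their standard names might use them as shown below. The argument names on the left are invented for illustration. The results in the comments assume behavior compatible with the descriptions in the list above.

.. code-block:: hcl

  name_upper    = upper("web_proxy")                     # "WEB_PROXY"
  fallback      = coalesce(null, "default")              # "default"
  all_ids       = concat(["a", "b"], ["c"])              # ["a", "b", "c"]
  highest       = max(3, 7, 5)                           # 7
  prefix        = substr("web_proxy", 0, 3)              # "web"
  message       = format("listening on %s", "127.0.0.1:8080")
  one_hour_late = timeadd("2019-01-01T00:00:00Z", "1h")  # "2019-01-01T01:00:00Z"
  settings      = jsondecode("{\"io_mode\":\"async\"}")  # object with io_mode = "async"
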
Not all of these functions will make sense in all applications. For example, an
application that doesn't use set types at all would have no reason to provide
the set-manipulation functions here.

Some languages will not provide functions at all, since they are primarily for
assigning values to arguments and thus neither need nor want any custom
computations of those values.

Block Results as Expression Variables
-------------------------------------

In some applications, top-level blocks also serve as declarations of variables
(or of attributes of object variables) available during expression evaluation,
as discussed in :ref:`go-interdep-blocks`.

In this case, it's most intuitive for the variables map in the evaluation
context to contain a value named after each valid top-level block
type and for these values to be object-typed or map-typed and reflect the
structure implied by block type labels.

For example, an application may have a top-level ``service`` block type
used like this:

.. code-block:: hcl

  service "http" "web_proxy" {
    listen_addr = "127.0.0.1:8080"

    process "main" {
      command = ["/usr/local/bin/awesome-app", "server"]
    }

    process "mgmt" {
      command = ["/usr/local/bin/awesome-app", "mgmt"]
    }
  }

If the result of decoding this block were available for use in expressions
elsewhere in configuration, the above convention would call for it to be
available to expressions as an object at ``service.http.web_proxy``.

If it is the contents of the block itself that are offered to evaluation (or
a superset object *derived* from the block contents), then the block arguments
can map directly to object attributes, but it is up to the application to
decide which value type is most appropriate for each block type, since this
depends on how multiple blocks of the same type relate to one another, or if
multiple blocks of that type are even allowed.

In the above example, an application would probably expose the ``listen_addr``
argument value as ``service.http.web_proxy.listen_addr``, and may choose to
expose the ``process`` blocks as a map of objects using the labels as keys,
which would allow an expression like
``service.http.web_proxy.process["main"].command``.

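Assume an application that exposes the ``process`` blocks as a map keyed by their labels. Other arguments in the same configuration could then refer to the ``service`` block as sketched below. The argument names on the left-hand side are hypothetical.

.. code-block:: hcl

  # Hypothetical arguments elsewhere in the configuration that refer to the
  # decoded "service" block from the earlier example.
  proxy_addr   = service.http.web_proxy.listen_addr
  main_command = service.http.web_proxy.process["main"].command

  # A "for" expression can still collect a value from every process, but
  # iterating a map follows the sorted order of its keys, not the order in
  # which the blocks were declared.
  all_commands = [for p in service.http.web_proxy.process : p.command]
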
If multiple blocks of a given type do not have a significant order relative to
one another, as seems to be the case with these ``process`` blocks,
representation as a map is often the most intuitive. If the ordering of the
blocks *is* significant then a list may be more appropriate, allowing the use
of HCL's "splat operators" for convenient access to child arguments. However,
there is no one-size-fits-all solution here and language designers must
instead consider the likely usage patterns of each value and select the
value representation that best accommodates those patterns.

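Suppose an application instead exposed the ``process`` blocks from the ``service`` example as a list in declaration order. This is a hypothetical alternative to the map representation. Indexing and the splat operator would then give convenient access to the same argument across all of them:

.. code-block:: hcl

  # Hypothetical list representation of the "process" blocks, in the order
  # they appear inside the "service" block.
  first_command = service.http.web_proxy.process[0].command

  # The splat operator collects "command" from every element, in order.
  all_commands = service.http.web_proxy.process[*].command
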
Some applications may choose to offer variables with slightly different names
than the top-level blocks in order to allow for more concise references, such
as abbreviating ``service`` to ``svc`` in the above examples. This should be
done with care since it may make the relationship between the two less obvious,
but it may be a good tradeoff for frequently-accessed names whose full form
would otherwise hurt the readability of the expressions they are embedded in.
Familiarity permits brevity.

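For example, an application might choose the abbreviation ``svc`` for the ``service`` block results. This is a hypothetical choice. It would shorten references like this:

.. code-block:: hcl

  # With the full block type name as the variable:
  #   proxy_addr = service.http.web_proxy.listen_addr
  # With a hypothetical abbreviated variable name "svc":
  proxy_addr = svc.http.web_proxy.listen_addr
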
Many applications will not make block results available for use in other
expressions at all, in which case they are free to select whichever variable
names make sense for what is being exposed. For example, a format may make
environment variable values available for use in expressions, and may do so
either as top-level variables (if no other variables are needed) or as an
object named ``env``, which can be used as in ``env.HOME``.

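A sketch of the two options for a hypothetical format that exposes environment variables follows. Which form is available depends entirely on the variables the application places in its evaluation context.

.. code-block:: hcl

  # Environment variables exposed as attributes of an object named "env":
  home_dir = env.HOME

  # Alternatively, if the format needs no other variables, each environment
  # variable might be exposed directly as a top-level variable:
  #   home_dir = HOME
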