Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -156,3 +156,8 @@ compiler in order to make it compile.
3. Modify your calculator application from lesson 7, problem 2, to be written
in a literate style. Consider what text might be appropriate to turn the
resulting markdown file into documentation for your calculator.

## Next lesson

Once you have completed the above exercises, you can continue to
[Lesson 1.9: Unparsing and the format and color attributes](../09_unparsing/README.md).
292 changes: 292 additions & 0 deletions k-distribution/k-tutorial/1_basic/09_unparsing/README.md
Original file line number Diff line number Diff line change
@@ -1 +1,293 @@
# Lesson 1.9: Unparsing and the format and color attributes

The purpose of this lesson is to teach the user about how terms are
pretty-printed in K, and how the user can make adjustments to the default
settings for how to print specific terms.

## Parsing, Execution, and Unparsing

When you use `krun` to interpret a program, the tool passes through three major
phases. In the first, parsing, the program itself is parsed using either `kast`
or an ahead-of-time parser generated via Bison, and the resulting AST becomes
the input to the interpreter. In the second phase, execution, K evaluates
functions and (as we will discuss in depth later) performs rewrite steps to
iteratively transform the program state. The third and final phase is called
**unparsing**, because it consists of taking the final state of the application
after the program has been interpreted, and converting it from an AST back into
text that (in theory, anyway) could be parsed back into the same AST that was
the output of the execution phase.

In practice, unparsing is not always precisely reversible. It turns out
(although we are not going to cover exactly why this is here), that
constructing a sound algorithm that takes a grammar and an AST and emits text
that could be parsed via that grammar to the original AST is an
**NP-hard problem**. As a result, in the interests of avoiding exponential time
algorithms when users rarely care about unparsing being completely sound, we
take certain shortcuts that provide a linear-time algorithm that /approximates/
a sound solution to the problem while sacrificing the notion that the result
can be parsed into the exact original term in all cases.

This is a lot of theoretical explanation, but at root, the unparsing process
is fairly simple: it takes a K term that is the output of execution and pretty
prints it according to the syntax defined by the user in their K definition.
This is useful because the original AST is not terribly user-readable, and it
is difficult to visualize the entire term or decipher information about the
final state of the program at a quick glance. Of course, in rare cases, the
pretty-printed configuration loses information of relevance, which is why K
allows you to obtain the original AST on request.

As an example of all of this, consider the following K definition
(`lesson-09-a.k`):

```k
module LESSON-09-A
imports BOOL

syntax Exp ::= "(" Exp ")" [bracket]
| Bool
> "!" Exp
> left:
Exp "&&" Exp
| Exp "^" Exp
| Exp "||" Exp

syntax Exp ::= id(Exp) [function]
rule id(E) => E
endmodule
```

This is similar to the grammar we defined in `LESSON-06-C`, with the difference
that the Boolean expressions are now constructors of sort `Exp` and we define a
trivial function over expressions that returns its argument unchanged.

We can now parse a simple program in this definition and use it to unparse some
Boolean expressions. For example (`exp.bool`):

```
id(true&&false&&!true^(false||true))
```

Here is a program that is not particularly legible at first glance, because all
extraneous whitespace has been removed. However, if we run `krun exp.bool`, we
see that the result of the unparser will pretty-print this expression rather
nicely:

```
<k>
true && false && ! true ^ ( false || true ) ~> .
</k>
```

Notably, not only does K insert whitespace where appropriate, it is also smart
enough to insert parentheses where necessary in order to ensure the correct
parse. For example, without those parentheses, the expression above would parse
equivalent to the following one:

```
(((true && false) && ! true) ^ false) || true
```

Indeed, you can confirm this by passing that exact expression to the `id`
function and evaluating it, then looking at the result of the unparser:

```
<k>
true && false && ! true ^ false || true ~> .
</k>
```

Here, because the meaning of the AST is the same both with and without
parentheses, K does not insert any parentheses when unparsing.

### Exercise

Modify the grammar of `LESSON-09-A` above so that the binary operators are
right associative. Try unparsing `exp.bool` again, and note how the result is
different. Explain the reason for the difference.

## Custom unparsing of terms

You may have noticed that right now, the unparsing of terms is not terribly
imaginative. All it is doing is taking each child of the term, inserting it
into the non-terminal positions of the production, then printing the production
with a space between each terminal or non-terminal. It is easy to see why this
might not be desirable in some cases. Consider the following K definition
(`lesson-09-b.k`):

```k
module LESSON-09-B
imports BOOL

syntax Stmt ::= "{" Stmt "}" | "{" "}"
> right:
Stmt Stmt
| "if" "(" Bool ")" Stmt
| "if" "(" Bool ")" Stmt "else" Stmt [avoid]
endmodule
```

This is a statement grammar, simplified to the point of meaninglessness, but
still useful as an objecte lesson in unparsing. Consider the following program
in this grammar (`if.stmt`):

```
if (true) {
if (true) {}
if (false) {}
if (true) {
if (false) {} else {}
} else {
if (false) {}
}
}
```

This is how that term would be unparsed if it appeared in the output of krun:

```
if ( true ) { if ( true ) { } if ( false ) { } if ( true ) { if ( false ) { } else { } } else { if ( false ) { } } }
```

This is clearly much less legible than we started with! What are we to do?
Well, K provides an attribute, `format`, that can be applied to any production,
which controls how that production gets unparsed. You've seen how it gets
unparsed by default, but via this attribute, the developer has complete control
over how the term is printed. Of course, the user can trivially create ways to
print terms that would not parse back into the same term. Sometimes this is
even desirable. But in most cases, what you are interested in is controlling
the line breaking, indentation, and spacing of the production.

Here is an example of how you might choose to apply the `format` attribute
to improve how the above term is unparsed (`lesson-09-c.k`):

```k
module LESSON-09-C
imports BOOL

syntax Stmt ::= "{" Stmt "}" [format(%1%i%n%2%d%n%3)] | "{" "}" [format(%1%2)]
> right:
Stmt Stmt [format(%1%n%2)]
| "if" "(" Bool ")" Stmt [format(%1 %2%3%4 %5)]
| "if" "(" Bool ")" Stmt "else" Stmt [avoid, format(%1 %2%3%4 %5 %6 %7)]
endmodule
```

If we compile this new definition and unparse the same term, this is the
result we get:

```
if (true) {
if (true) {}
if (false) {}
if (true) {
if (false) {} else {}
} else {
if (false) {}
}
}
```

This is the exact same text we started with! By adding the `format` attributes,
we were able to indent the body of code blocks, adjust the spacing of if
statements, and put each statement on a new line.

How exactly was this achieved? Well, each time the unparser reaches a term,
it looks at the `format` attribute of that term. That `format` attribute is a
mix of characters and **format codes**. Format codes begin with the `%`
character. Each character in the `format` attribute other than a format code is
appended verbatim to the output, and each format code is handled according to
its meaning, transformed (possibly recursively) into a string of text, and
spliced into the output at the position the format code appears in the format
string.

Provided for reference is a table with a complete list of all valid format
codes, followed by their meaning:

| Format Code | Meaning |
| ----------- | ------------------------------------------------------------- |
| n | Insert '\n' followed by the current indentation level |
| i | Increase the current indentation level by 1 |
| d | Decrease the current indentation level by 1 |
| c | Move to the next color in the list of colors for this
production (see next section) |
| r | Reset color to the default foreground color for the terminal
(see next section) |
| an integer | Print a terminal or non-terminal from the production. The
integer is treated as a 1-based index into the terminals and
non-terminals of the production.

If the offset refers to a terminal, move to the next color in
the list of colors for this production, print the value of
that terminal, then reset the color to the default foreground
color for the terminal.

If the offset refers to a regular expression terminal, it is
an error.

If the offset refers to a non-terminal, unparse the
corresponding child of the current term (starting with the
current indentation level) and print the resulting text, then
set the current color and indentation level to the color and
indentation level following unparsing that term. |
| other char | Print that character verbatim |

### Exercise

Change the format attributes for `LESSON-09-C` so that `if.stmt` will unparse
as follows:

```
if (true)
{
if (true)
{
}
if (false)
{
}
if (true)
{
if (false)
{
}
else
{
}
}
else
{
if (false)
{
}
}
}
```

## Output coloring

When the output of unparsing is displayed on a terminal supporting colors, K
is capable of coloring the output, similar to what is possible with a syntax
highlighter. This is achieved via the `color` and `colors` attributes.

Essentially, both the `color` and `colors` attributes are used to construct a
list of colors associated with each production, and then the format attribute
is used to control how those colors are used to unparse the term. At its most
basic level, you can set the `color` attribute to color all the terminals in
the production a certain color, or you can use the `colors` attribute to
specify a comma separated list of colors for each non-terminal in the
production. At a more advanced level, the `%c` and `%r` format codes control
how the formatter interacts with the list of colors specified by the `colors`
attribute. You can essentially think of the `color` attribute as a way of
specifying that you want all the colors in the list to be the same color.

## Exercises

1. Use the color attribute on `LESSON-09-C` to color the keywords `true` and
`false` one color, the keywords `if` and `else` another color, and the operators
`(`, `)`, `{`, and `}` a third color.

2. Use the `format`, `color`, and `colors` attributes to tell the unparser to
style the expression grammar from lesson 8, problem 3 according to your own
personal preferences for syntax highlighting and code formatting. You can
view the result of the unparser on a function term without evaluating that
function by means of the command `kast --output pretty <file>`.
3 changes: 3 additions & 0 deletions k-distribution/k-tutorial/1_basic/commands.sh
Original file line number Diff line number Diff line change
Expand Up @@ -24,3 +24,6 @@ kompile 07_side_conditions/README.md --md-selector "k & ! exclude" --main-module
kompile 07_side_conditions/README.md --md-selector "k & ! exclude" --main-module LESSON-07-C -d build/07c
kompile 07_side_conditions/README.md --md-selector "k & ! exclude" --main-module LESSON-07-D -d build/07d
kompile 08_literate_programming/README.md --md-selector "k & ! exclude" --main-module LESSON-08 -d build/08
kompile 09_unparsing/README.md --md-selector "k & ! exclude" --main-module LESSON-09-A -d build/09a
kompile 09_unparsing/README.md --md-selector "k & ! exclude" --main-module LESSON-09-B -d build/09b
kompile 09_unparsing/README.md --md-selector "k & ! exclude" --main-module LESSON-09-C -d build/09c
2 changes: 1 addition & 1 deletion pending-documentation.md
Original file line number Diff line number Diff line change
Expand Up @@ -2452,7 +2452,7 @@ by one element, and terminals and `%r` reset the current color to the default
foreground color of the terminal afterwards.

There are two ways you can construct a list of colors associated with a
production:P
production:

* The `color` attribute creates the entire list all with the same color, as
specified by the value of the attribute. When combined with the default format
Expand Down