A Tutorial Introduction to Duzzt

Malte Isberner edited this page Jan 17, 2014 · 2 revisions

Note: The example used in this tutorial was inspired by the German Wikipedia article on Fluent interfaces.

Goal

In this example, we want to create a DateAdder class, which allows us to add certain amounts of days, hours, minutes, or seconds to Date objects. These add expressions should be in a readable form, such as

Date date = new Date();
date = new DateAdder()
    .add(5).days()
    .add(2).hours()
    .add(4).seconds()
    .to(date);

We want to impose the following restrictions:

  • It should not be possible to repeatedly increment the same field (such as add(3).days().add(2).days()).
  • Incrementing days must take place before incrementing hours, hours before minutes, minutes before seconds. Hence, add(2).hours().add(5).days() should be illegal.

Implementing the Logic

For now, we will forget about the above syntax restrictions, and merely focus on the logic behind our DateAdder class.

In Duzzt, this logic is encapsulated in a class called the "DSL implementation". Methods of this class will be invoked from within the DSL classes generated by Duzzt. In general, it is a good idea to hide the implementation class by declaring it as package private. That way, only the part of the API with syntax restrictions enabled is exposed. For our example, we will call the implementation class DateAdderImpl.

final class DateAdderImpl {
    // logic goes here
}

The realization of the logic is quite straightforward: in each call of add(), we store the provided amount in a temporary variable called currentAmount. As the next method, either days(), hours(), minutes(), or seconds() will be invoked. This tells us which Calendar field should be incremented, and we store the field identifier together with the currentAmount in a list. When to() is finally called, we iterate over this list and increment each stored field by its respective amount.

// Simple data holder
private static final class FieldIncrement {
    private final int field; // refers to fields defined in Calendar
    private final int amount; // the increment amount
    public FieldIncrement(int field, int amount) {
        this.field = field;
        this.amount = amount;
    }
}

// List of field increments, no need to store more than 4 of them
private final List<FieldIncrement> fieldIncrements = new ArrayList<>(4);
// the last amount that was specified in a call to add()
private int currentAmount;

// Helper method
private void addAs(int field) {
    this.fieldIncrements.add(new FieldIncrement(field, this.currentAmount));
}

// Methods corresponding to DSL actions
public void add(int amount) {
    this.currentAmount = amount;
}
public void days() {
    addAs(Calendar.DATE);
}
public void hours() {
    addAs(Calendar.HOUR);
}
public void minutes() {
    addAs(Calendar.MINUTE);
}
public void seconds() {
    addAs(Calendar.SECOND);
}
public Date to(Date otherDate) {
    Calendar calendar = Calendar.getInstance();
    calendar.setTime(otherDate);
    for(FieldIncrement fi : fieldIncrements) {
        calendar.add(fi.field, fi.amount);
    }
    return calendar.getTime();
}

That wasn't too much code, was it? Of course, the code could further be shortened by omitting the FieldIncrement class and just storing the amount in 4 int fields daysIncrement, hoursIncrement and so on. However, our approach can easily be extended to months, milliseconds etc.
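To illustrate the extensibility claim, the following self-contained sketch restates the logic with its imports and adds two further field actions. The class name DateAdderSketch and the methods months() and millis() are assumptions made for this illustration; they are not part of the tutorial's DateAdderImpl. For brevity, the sketch stores each increment as a two-element int array instead of a FieldIncrement object.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;

// Hypothetical variant of DateAdderImpl, used only for illustration.
final class DateAdderSketch {
    // Each entry is {Calendar field, amount}
    private final List<int[]> fieldIncrements = new ArrayList<>();
    // The last amount that was specified in a call to add()
    private int currentAmount;

    private void addAs(int field) {
        fieldIncrements.add(new int[] { field, currentAmount });
    }

    public void add(int amount) {
        currentAmount = amount;
    }

    // Supporting another Calendar field takes a single one-line method
    public void months() { addAs(Calendar.MONTH); }       // assumed name
    public void days()   { addAs(Calendar.DATE); }
    public void millis() { addAs(Calendar.MILLISECOND); } // assumed name

    public Date to(Date otherDate) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(otherDate);
        for (int[] fi : fieldIncrements) {
            calendar.add(fi[0], fi[1]);
        }
        return calendar.getTime();
    }
}
```

Exposing such additional actions through the DSL would presumably also require extending the syntax expression introduced in the next section.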

Specifying the DSL

In Duzzt, the syntax of an embedded DSL is specified through a regular expression. While Duzzt regular expressions offer some additional operators that are not commonly used, what we need for our example is relatively straightforward.

We want to enforce that every add() is followed by a method specifying the respective field, such as days(), hours() etc. Also, we want to impose an ordering constraint on the sequence: days() must not appear after hours(), minutes(), or seconds(), and so on. Furthermore, we want to specify that incrementing each field is optional. Finally, every sequence of method invocations should be concluded by a call to to().

A Duzzt regular expression adhering to these rules can be specified as follows:

(add days)? (add hours)? (add minutes)? (add seconds)? to

Concatenation of regular expressions is achieved by simply separating them with whitespace. The ? operator specifies that the preceding regular expression must occur zero or one times.
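For readers more familiar with conventional regular expressions, the expression above can be modeled with java.util.regex by encoding each action as a single letter. This is only an analogy for the set of accepted sequences, not a description of how Duzzt processes its syntax; the class name SyntaxModel and the letter encoding are assumptions made for this sketch.

```java
import java.util.regex.Pattern;

// Illustrative model only. Letter encoding (an assumption of this sketch):
// a = add, d = days, h = hours, m = minutes, s = seconds, t = to
final class SyntaxModel {
    // Models: (add days)? (add hours)? (add minutes)? (add seconds)? to
    private static final Pattern SYNTAX = Pattern.compile("(ad)?(ah)?(am)?(as)?t");

    // Returns true if the encoded action sequence is a legal sentence
    static boolean accepts(String actions) {
        return SYNTAX.matcher(actions).matches();
    }
}
```

In this model, accepts("adahast") (add days, add hours, add seconds, to) holds, while "ahadt" (hours before days) and "adadt" (days twice) are rejected. Note that "t" alone is accepted, since every increment is optional.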

The generation of the embedded DSL classes is triggered by the central annotation in Duzzt: @GenerateEmbeddedDSL. Through this annotation, many aspects of the DSL generation can be controlled. For this example, however, we need only the two mandatory options: name and syntax.

name is the name of the generated main DSL class, i.e., DateAdder in our example. Note that the name must not be qualified. By default, the generated classes are put in the same package as the implementation class. This can be overridden using the packageName option, but this is neither required nor possible, because our implementation class is package private. syntax specifies the syntax of the DSL as a regular expression as shown above.

In our example, the DSL specification is performed as follows:

@GenerateEmbeddedDSL(
    name = "DateAdder",
    syntax = "(add days)? (add hours)? (add minutes)? (add seconds)? to")
final class DateAdderImpl {
    // ...
}

That's it! Your first Duzzt-generated embedded DSL is ready. Start the compilation/annotation processing process. If you're using Maven, this is as easy as running mvn compile. You should see a message reporting successful compilation. Afterwards

Further Tweaks

Our embedded DSL already looks quite nice, but there's still room for improvement. For example, it is possible to call to() without any prior calls to add(), which looks somewhat odd: date = new DateAdder().to(date). Also, the repeated occurrence of add() is ugly. Something like the following would read more naturally:

date = new DateAdder()
    .add(5).days()
    .and(2).hours()
    .and(4).seconds()
    .to(date);

Of course, it shouldn't be possible to use and() right after the start, or to use add() after another add(). Luckily, the syntax that Duzzt offers allows us to accomplish both tasks rather easily.

Prohibiting Empty Sequences -- The << >> Operator

We want to slightly adapt our initial syntax (add days)? (add hours)? (add minutes)? (add seconds)? to so that to() can only be called if add() (for any field) has been called beforehand. In a normal regular expression, this would be quite hard to specify. Luckily, Duzzt provides an operator designed for exactly this task: the << >> operator (or nonempty operator).

All we need to do is to enclose the sequence we want to force to be non-empty within double angle brackets. The new syntax therefore is

<<(add days)? (add hours)? (add minutes)? (add seconds)?>> to

... et voilà! After re-generating the source code, the invocation sequence new DateAdder().to(date) will no longer compile.
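One way to picture why that call stops compiling: in a fluent interface, each position in the sequence can be represented by its own type, and each type declares only the methods that are legal at that position. The hand-written sketch below does this for a reduced syntax, <<(add days)? (add hours)?>> to. All type names are invented for this illustration, and the code that Duzzt actually generates may look quite different.

```java
import java.util.Calendar;
import java.util.Date;

// Hand-written illustration; every name in this block is hypothetical.
final class MiniDateAdder {
    private int pending;    // amount given to the last add()
    private int dayAmount;  // collected day increment
    private int hourAmount; // collected hour increment

    // Start position: only add() exists here, so to() cannot be called yet
    public AfterFirstAdd add(int amount) {
        pending = amount;
        return new AfterFirstAdd();
    }

    // After the first add(): a field must be chosen
    public final class AfterFirstAdd {
        public AfterDays days() { dayAmount = pending; return new AfterDays(); }
        public AfterHours hours() { hourAmount = pending; return new AfterHours(); }
    }

    // After days(): either add an hour amount or finish
    public final class AfterDays {
        public AfterSecondAdd add(int amount) { pending = amount; return new AfterSecondAdd(); }
        public Date to(Date date) { return apply(date); }
    }

    // After add() following days(): only hours() remains legal
    public final class AfterSecondAdd {
        public AfterHours hours() { hourAmount = pending; return new AfterHours(); }
    }

    // After hours(): the sequence can only be concluded
    public final class AfterHours {
        public Date to(Date date) { return apply(date); }
    }

    private Date apply(Date date) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(date);
        calendar.add(Calendar.DATE, dayAmount);
        calendar.add(Calendar.HOUR_OF_DAY, hourAmount);
        return calendar.getTime();
    }
}
```

With these types, new MiniDateAdder().add(5).days().add(2).hours().to(date) compiles, whereas new MiniDateAdder().to(date) and new MiniDateAdder().add(2).hours().add(5).days() are rejected by the compiler, because the types reached at those positions do not declare the requested methods.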

Matching Sequence Positions

Duzzt provides special operators for matching sequence positions: ^ matches the start of the sequence, whereas / matches a position somewhere in the middle of the sequence (i.e., not the starting position). Note that unlike, e.g., the . operator, which matches any action, the position matching operators do not themselves match any action.

If we want to provide the amount using add() at the start of the sequence and using and() in the middle of the sequence, we first have to declare the DSL action and() with the same semantics as add(), as follows:

public void and(int amount) {
    add(amount);
}

Then, we use the sequence position matching operators to replace each occurrence of add in the syntax with (^add|/and). As a result, add() is only accepted at the start of the sequence (matched by ^), whereas and() is only accepted somewhere in the middle of the sequence (matched by /).

Named Subexpressions

Substituting every occurrence of add with (^add|/and) looks rather clumsy. A more elegant approach is to introduce a named subexpression. Named subexpressions serve as shorthand notations for other regular expressions, eliminating the need to repeat common subexpressions.

Named subexpressions are defined using the where value of the @GenerateEmbeddedDSL annotation. In our case, we want to define a named subexpression with a definition of ^add|/and (note that the parentheses are no longer needed), and because it essentially does the same as add() previously did, we will also name it add. The name of a subexpression must be a legal identifier. However, it is not required to differ from the names of existing DSL actions, because subexpressions are referenced by enclosing their name in angle brackets (< >).

Using named subexpressions, we can rewrite the embedded DSL syntax specification as follows:

@GenerateEmbeddedDSL(
    name="DateAdder",
    syntax="<<(<add> days)? (<add> hours)? (<add> minutes)? (<add> seconds)?>> to",
    where={
        @SubExpr(name="add", definedAs="^add|/and")
    })

Of course, it is also possible to specify more than one named subexpression. If only one named subexpression is specified, the curly braces in the value of where can be omitted.
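To summarize which call sequences the final syntax is meant to admit, the following standalone checker encodes the rules from this tutorial by hand: at least one increment, fields in the order days, hours, minutes, seconds without repetition, add() for the first amount and and() for every later one, and a concluding to(). It is a model written for this illustration (the class FinalSyntaxModel and its accepts method are hypothetical) and neither uses nor reflects Duzzt's internals.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written model of the final DSL syntax; all names are hypothetical.
final class FinalSyntaxModel {
    // Field actions in the only order in which they may appear
    private static final List<String> FIELDS =
            Arrays.asList("days", "hours", "minutes", "seconds");

    static boolean accepts(String... actions) {
        int n = actions.length;
        // to() must come last, preceded by one or more (amount, field) pairs
        if (n < 3 || n % 2 == 0 || !"to".equals(actions[n - 1])) {
            return false;
        }
        int lastField = -1;
        for (int i = 0; i < n - 1; i += 2) {
            // ^add: only at the very start; /and: only at later positions
            String expectedAmount = (i == 0) ? "add" : "and";
            if (!expectedAmount.equals(actions[i])) {
                return false;
            }
            int field = FIELDS.indexOf(actions[i + 1]);
            // Rejects unknown fields, repetitions, and out-of-order fields
            if (field <= lastField) {
                return false;
            }
            lastField = field;
        }
        return true;
    }
}
```

For example, accepts("add", "days", "and", "hours", "and", "seconds", "to") holds, while accepts("to"), accepts("and", "days", "to"), and accepts("add", "days", "add", "hours", "to") do not.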