In [1]:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LambdaCase #-}
import GHC.Generics
import Grisette

# UnionM and Custom Data Types

## Introduction

In the [previous tutorial](./1_symbolic_type.ipynb), we discussed the basic usage of Grisette with symbolic types.
Grisette is a powerful Haskell library that enables solver-aided programming, allowing you to construct complex programs and data abstractions that seamlessly integrate with the Grisette system.

In this tutorial, you will learn how to:
- Utilize the `UnionM` monadic container for representing choices and path conditions
- Understand path merging and its importance for efficient symbolic execution
- Work with user-defined data types in Grisette
- Build a program synthesizer using CounterExample-Guided Inductive Synthesis (CEGIS)

By the end of this tutorial, you will have a solid understanding of how to leverage Grisette's features to build sophisticated solver-aided applications, such as program synthesizers and verifiers.

Prerequisites:

- Understanding of monads and type classes in Haskell

## UnionM Monadic Container

Grisette not only supports working with primitive types, but also supports the integration of user-defined data types.
The key to this integration is a monadic container called `UnionM`. A `UnionM` is essentially an if-then-else tree that wraps values with their path conditions.
When a path condition evaluates to true, the value of the `UnionM` is the corresponding value associated with that path condition.
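To make the "if-then-else tree" picture concrete, here is a minimal pure-Haskell sketch. It is a hypothetical toy model, not Grisette's actual `UnionM` representation: the names `Union`, `Single`, `If`, `paths`, and `select` are invented for illustration, conditions are plain strings, and a concrete assignment stands in for a solver model.

```haskell
-- A hypothetical toy model of an if-then-else tree (not Grisette's UnionM).
-- Conditions are named boolean variables.
data Union a = Single a | If String (Union a) (Union a)
  deriving (Show, Eq)

-- Enumerate every leaf with its path condition: the list of
-- (condition name, required truth value) pairs on the way down.
paths :: Union a -> [([(String, Bool)], a)]
paths (Single v) = [([], v)]
paths (If c t e) =
  [((c, True) : pc, v) | (pc, v) <- paths t]
    ++ [((c, False) : pc, v) | (pc, v) <- paths e]

-- Pick the leaf whose path condition holds under a concrete assignment.
-- Assumption of this sketch: unassigned conditions are treated as False.
select :: [(String, Bool)] -> Union a -> a
select _ (Single v) = v
select env (If c t e)
  | lookup c env == Just True = select env t
  | otherwise = select env e

-- Mirrors the shape of `unionList` from this tutorial, with concrete Ints.
unionList :: Union [Int]
unionList = If "cond" (Single [1]) (Single [2, 3])
```

For example, `select [("cond", True)] unionList` yields `[1]`, while `paths unionList` pairs the leaf `[2,3]` with the path condition `[("cond", False)]`.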

The following code demonstrates how to wrap values in a `UnionM`.
As `UnionM` is monadic, you can use `return` to wrap a single value into the container.
In most scenarios, however, we prefer the `mrgReturn` variant when working with Grisette. This is crucial to Grisette's efficiency, and we explain the reason in the Path Merging section below.

In [2]:
mrgReturn (1, "a") :: UnionM (SymInteger, SymBool)
mrgReturn (Left 10) :: UnionM (Either Integer SymBool)

{(1,a)}

{Left 10}

You can introduce path conditions into the `UnionM` container with the `mrgIf` function.

In [3]:
unionList :: UnionM [SymInteger]
unionList = mrgIf "cond" (mrgReturn ["a1"]) (mrgReturn ["a2", "a3"])
unionList

unionEither :: UnionM (Either Integer SymBool)
unionEither = mrgIf "cond" (mrgReturn $ Left 10) (mrgReturn $ Right "b")
unionEither

{If cond [a1] [a2,a3]}

{If cond (Left 10) (Right b)}

In these unions, the path condition determines which branch to take. For example, in the `unionList`, if `cond` is true, we will pick the branch with a single element; otherwise, we will pick the branch with two elements.

In [4]:
evaluateSym False (buildModel ("cond" ::= True, "a1" ::= (1 :: Integer))) unionList
evaluateSym False (buildModel ("cond" ::= False, "a2" ::= (1 :: Integer))) unionList

{[1]}

{[1,a3]}

`UnionM` is a monadic container, so we can use `do`-notation or monadic combinators to manipulate it. The following code prepends 1 to every list in `unionList`. The `Monad` instance of `UnionM` "splits" execution across the branches and keeps the correct path condition on each result.

In [5]:
do l <- unionList
   mrgReturn $ 1 : l

{If cond [1,a1] [1,a2,a3]}
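The splitting behavior of bind can be sketched with the same kind of toy tree. This is a hypothetical model that performs no merging at all (so it corresponds to the unmerged behavior discussed in the next section); `Union`, `Single`, `If`, and `prepended` are illustrative names, not Grisette's API, and concrete `Int`s replace the symbolic integers.

```haskell
-- Hypothetical toy if-then-else tree; no merging is performed.
data Union a = Single a | If String (Union a) (Union a)
  deriving (Show, Eq)

instance Functor Union where
  fmap f (Single v) = Single (f v)
  fmap f (If c t e) = If c (fmap f t) (fmap f e)

instance Applicative Union where
  pure = Single
  uf <*> uv = uf >>= \f -> fmap f uv

-- Bind runs the continuation once per leaf and rebuilds the same
-- if-then-else structure, so every result keeps its path condition.
instance Monad Union where
  Single v >>= k = k v
  If c t e >>= k = If c (t >>= k) (e >>= k)

unionList :: Union [Int]
unionList = If "cond" (Single [10]) (Single [20, 30])

-- Analogue of the `do` block above: prepend 1 on every path.
prepended :: Union [Int]
prepended = do
  l <- unionList
  return (1 : l)
```

Here `prepended` evaluates to `If "cond" (Single [1,10]) (Single [1,20,30])`, the same shape as the `{If cond [1,a1] [1,a2,a3]}` output.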

## Path Merging

The approach in the previous section works well, but in real-world programs you may encounter code like this:

```haskell
union = do
  a <- union1
  b <- union2
  c <- union3
  return $ f a b c
```

You can then extract values from `union` in further computations. This works, but it is inefficient. Consider the `do` block above: if `union1`, `union2`, and `union3` each have 2 branches, the final function `f` is executed 8 times (2 * 2 * 2), and every later use of `union` has to process all 8 paths again. With more binds, the number of paths grows exponentially, which does not scale to real-world programs. This is the well-known path explosion problem in symbolic execution.
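The multiplication of paths can be checked with a toy tree monad. This is a hypothetical model of a non-merging symbolic executor, not Grisette; `choice`, `leaves`, `threeBinds`, `nBinds`, and the condition names are illustrative. `leaves` counts the paths, which is also how many times the final continuation runs.

```haskell
-- Hypothetical toy tree (no merging), as in a plain symbolic executor.
data Union a = Single a | If String (Union a) (Union a)
  deriving (Show, Eq)

instance Functor Union where
  fmap f (Single v) = Single (f v)
  fmap f (If c t e) = If c (fmap f t) (fmap f e)

instance Applicative Union where
  pure = Single
  uf <*> uv = uf >>= \f -> fmap f uv

instance Monad Union where
  Single v >>= k = k v
  If c t e >>= k = If c (t >>= k) (e >>= k)

-- Number of paths, i.e. how many times a continuation would run.
leaves :: Union a -> Int
leaves (Single _) = 1
leaves (If _ t e) = leaves t + leaves e

-- A two-way choice guarded by its own condition variable.
choice :: String -> Int -> Int -> Union Int
choice c x y = If c (Single x) (Single y)

-- The shape of the `do` block above: three 2-way unions, then `f`.
threeBinds :: Union Int
threeBinds = do
  a <- choice "c1" 0 1
  b <- choice "c2" 0 1
  c <- choice "c3" 0 1
  return (a + b + c)

-- Binding n independent 2-way unions yields 2^n paths.
nBinds :: Int -> Union Int
nBinds n = sum <$> mapM (\i -> choice ("c" ++ show i) 0 1) [1 .. n]
```

`leaves threeBinds` is 8, and `leaves (nBinds 10)` is 1024, even though `a + b + c` only takes four distinct values. That redundancy is what merging removes.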

To tackle this problem, Grisette merges execution paths. For example, in the following code, where we take the sum of the values in `unionList`, instead of generating two different branches, each containing a single symbolic integer, Grisette merges them into a single symbolic integer using `ite`. 

In [6]:
do l <- unionList
   mrgReturn $ sum l

{(ite cond a1 (+ a2 a3))}

Note that `mrgReturn` is used here; it is the key to merging in Grisette. If we use the vanilla `return` or `fmap` instead, the result keeps two separate branches and is not merged. The angle brackets around the output indicate that the result has not been merged.

In [7]:
sum <$> unionList

<If cond a1 (+ a2 a3)>
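The difference between the merged `{(ite cond a1 (+ a2 a3))}` and the unmerged `<If cond a1 (+ a2 a3)>` can be sketched in plain Haskell. This is a hypothetical toy in which symbolic integers are a small `Term` syntax tree; `mapMerged` only approximates what `mrgReturn` achieves for this one type and is not Grisette's implementation.

```haskell
-- Hypothetical toy symbolic integer terms (not Grisette's SymInteger).
data Term = Lit Integer | Var String | Add Term Term | Ite String Term Term
  deriving (Show, Eq)

-- Hypothetical toy if-then-else tree over values.
data Union a = Single a | If String (Union a) (Union a)
  deriving (Show, Eq)

-- Symbolic sum of a list of terms, shaped like the tutorial's output.
sumTerms :: [Term] -> Term
sumTerms [] = Lit 0
sumTerms ts = foldr1 Add ts

-- Unmerged mapping (the `fmap` / `return` behavior): keeps every branch.
mapPlain :: (a -> b) -> Union a -> Union b
mapPlain f (Single v) = Single (f v)
mapPlain f (If c t e) = If c (mapPlain f t) (mapPlain f e)

-- Merging mapping (the `mrgReturn` behavior, for terms only): when both
-- sides reduce to single terms, fold them into one `Ite` term.
mapMerged :: (a -> Term) -> Union a -> Union Term
mapMerged f (Single v) = Single (f v)
mapMerged f (If c t e) = case (mapMerged f t, mapMerged f e) of
  (Single x, Single y) -> Single (Ite c x y)
  (t', e') -> If c t' e'

unionList :: Union [Term]
unionList = If "cond" (Single [Var "a1"]) (Single [Var "a2", Var "a3"])
```

`mapMerged sumTerms unionList` collapses to a single leaf holding `Ite "cond" (Var "a1") (Add (Var "a2") (Var "a3"))`, whereas `mapPlain sumTerms unionList` still has two leaves that later computations must each process.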

Let's examine the type signature of `mrgReturn` and compare it with the vanilla `return` function. `mrgReturn` places additional constraints on both the monad and the value.

In [8]:
:t return
:t mrgReturn

For the value, it needs to be `Mergeable`. `Mergeable` defines the merging strategy for the value type, which is beyond the scope of this tutorial.

For the monad, it needs to be `MonadTryMerge`. For `UnionM` (or `UnionM` transformed with some monad transformers, which we will discuss in later tutorials), the `MonadTryMerge` instance caches the merging strategy into the monad, and the bind function uses this cached strategy to merge the results. For other monads, such as `Either e`, the `MonadTryMerge` instance does nothing.

If you don't fully understand all the details here, that's okay. The key takeaway is to always use `mrgReturn` instead of `return`, unless you know what you are doing and want to manually control where values are merged. We also provide `mrg*` (or sometimes `sym*`) variants of many combinators from GHC's `base` library, including those for `Monad`, `Applicative`, `Functor`, `Foldable`, `Traversable`, and lists. You may want to check out the documentation and use them.

We may also print the results to see whether there is a cached merging strategy. The curly braces indicate that there is a merging strategy, while the angle brackets indicate that there isn't one.

In [9]:
return 1 :: UnionM Int
mrgReturn 1 :: UnionM Int

<1>

{1}

## Deriving Mergeable Instance

Different types have different merging strategies. For example, for the type `[SymInteger]`, the Grisette system will try to merge lists with the same lengths, while for the type `Either Integer SymBool`, left values will be merged if they are exactly the same, while right values will always be merged.

(It is okay to use `return` here because `mrgIf` handles the merging.)

In [10]:
mrgIf "cond" (return ["a","b"]) (return ["c","d"]) :: UnionM [SymInteger]
mrgIf "cond" (return $ Left 1) (return $ Left 2) :: UnionM (Either Integer SymBool)
mrgIf "cond" (return $ Left 1) (return $ Left 1) :: UnionM (Either Integer SymBool)
mrgIf "cond" (return $ Left 1) (return $ Right "x") :: UnionM (Either Integer SymBool)
mrgIf "cond" (return $ Right "x") (return $ Right "y") :: UnionM (Either Integer SymBool)

{[(ite cond a c),(ite cond b d)]}

{If cond (Left 1) (Left 2)}

{Left 1}

{If cond (Left 1) (Right x)}

{Right (ite cond x y)}
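A toy version of per-type merging strategies may help build intuition. The class `ToyMergeable` and its `tryMerge` method are hypothetical and much simpler than Grisette's `Mergeable`; the instances below merely reproduce the behavior shown in the outputs above for lists of symbolic terms and for `Either Integer b`.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical toy symbolic terms.
data Term = Var String | Ite String Term Term
  deriving (Show, Eq)

-- A toy stand-in for a merging strategy (not Grisette's Mergeable API):
-- try to fold `if c then x else y` into one value; Nothing keeps both paths.
class ToyMergeable a where
  tryMerge :: String -> a -> a -> Maybe a

-- Symbolic terms always merge, via an ite term (identical terms collapse).
instance ToyMergeable Term where
  tryMerge c x y
    | x == y = Just x
    | otherwise = Just (Ite c x y)

-- Lists merge element-wise, but only when their lengths agree.
instance ToyMergeable a => ToyMergeable [a] where
  tryMerge c xs ys
    | length xs == length ys = sequence (zipWith (tryMerge c) xs ys)
    | otherwise = Nothing

-- Concrete Left payloads merge only when equal; Right payloads merge
-- with their own strategy; Left and Right never merge.
instance ToyMergeable b => ToyMergeable (Either Integer b) where
  tryMerge _ (Left x) (Left y)
    | x == y = Just (Left x)
  tryMerge c (Right x) (Right y) = Right <$> tryMerge c x y
  tryMerge _ _ _ = Nothing
```

For instance, merging `Right (Var "x")` with `Right (Var "y")` under `"cond"` gives `Just (Right (Ite "cond" (Var "x") (Var "y")))`, matching `{Right (ite cond x y)}`, while `Left 1` and `Left 2` stay apart.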

As we mentioned earlier, merging is controlled by the `Mergeable` type class. This means that we can define `Mergeable` instances for our custom types to make them compatible with Grisette.

Defining a merging strategy is beyond the scope of this tutorial, but we have provided a default instance for non-GADT data types, which you can access using `DerivingVia` with a `Generic` instance. Automatic deriving instances for GADTs will require `TemplateHaskell`, and we may provide this in a future release.

In [11]:
data A = X Int SymBool | Y SymInteger
  deriving (Show, Generic)
  deriving (Mergeable) via (Default A)

mrgIf "cond" (return $ X 1 "a") (mrgIf "cond2" (return $ X 2 "b") (return $ X 1 "c")) :: UnionM A

{If (|| cond (! cond2)) (X 1 (ite cond a c)) (X 2 b)}
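If `DerivingVia` is unfamiliar, the mechanism can be seen in isolation with a toy class. `Describe` and `ViaShow` below are hypothetical; Grisette's `Default` plays the role of `ViaShow`, but derives its behavior from `Generic` rather than `Show`. The point is only that a newtype can carry a reusable instance that GHC coerces onto your type.

```haskell
{-# LANGUAGE DerivingVia #-}

-- A hypothetical class, standing in for classes like Mergeable.
class Describe a where
  describe :: a -> String

-- A newtype that carries a reusable default implementation, analogous in
-- spirit to Grisette's `Default` wrapper (which is Generic-based; this toy
-- uses Show instead to stay short).
newtype ViaShow a = ViaShow a

instance Show a => Describe (ViaShow a) where
  describe (ViaShow x) = "<" ++ show x ++ ">"

-- DerivingVia reuses the ViaShow instance for A; the two types have the
-- same runtime representation, so GHC can coerce between them.
data A = X Int Bool | Y Integer
  deriving (Show)
  deriving (Describe) via (ViaShow A)
```

With this, `describe (X 1 True)` returns `"<X 1 True>"` without a hand-written `Describe A` instance.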

Note that a similar mechanism is provided for most of the type classes to make the types fully compatible with Grisette.

In [12]:
deriving via (Default A) instance (EvaluateSym A)
evaluateSym False (buildModel ("a" ::= True)) $ X 1 "a"

X 1 true

## A Rewriting Rule Synthesizer

Now let's extend the expression equivalence verifier to a simple synthesizer. A synthesizer is a tool that searches a program space, represented by a *program sketch*, to find a program that fulfills a given specification.

A *program sketch* is a program with *holes*, where a *hole* is a symbolic constant that could be instantiated to choose from the alternative programs in the program space. For example, consider the following program sketch for our problem:

```haskell
If hole1 (Add x hole2) (Mul x hole3)
```

This program sketch represents a program space that includes, but is not limited to, the following programs:

- `x + 1` (with `hole1` set to true and `hole2` set to 1)
- `x + 2` (with `hole1` set to true and `hole2` set to 2)
- `x * 3` (with `hole1` set to false and `hole3` set to 3)
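The program space of this sketch can be modeled directly in plain Haskell by treating the holes as ordinary fields. `Holes`, `instantiate`, and `render` are hypothetical names for illustration; in Grisette the holes would be symbolic constants whose values the solver chooses, not values we pick ourselves.

```haskell
-- Hypothetical encoding of the holes in `If hole1 (Add x hole2) (Mul x hole3)`.
data Holes = Holes { hole1 :: Bool, hole2 :: Integer, hole3 :: Integer }
  deriving (Show, Eq)

-- Filling the holes selects one concrete program from the space.
instantiate :: Holes -> Integer -> Integer
instantiate h x
  | hole1 h = x + hole2 h
  | otherwise = x * hole3 h

-- Human-readable form of the selected program.
render :: Holes -> String
render h
  | hole1 h = "x + " ++ show (hole2 h)
  | otherwise = "x * " ++ show (hole3 h)
```

For example, `render (Holes True 1 0)` is `"x + 1"` and `instantiate (Holes False 0 3) 5` is `15`; the unused hole in each case has no effect.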

To synthesize a rewriting rule, our goal is to find an alternative expression that implements the same functionality as the original expression. This can be formulated as the following formula:

$p_1 = \exists~\mathrm{hole}\in \mathrm{consts}(e_\mathrm{sketch})\setminus \mathrm{consts}(e_\mathrm{orig}). \forall~\mathrm{var}\in \mathrm{consts}(e_\mathrm{orig}). \mathrm{eval}(e_\mathrm{orig}) = \mathrm{eval}(e_\mathrm{sketch})$

Note that none of the symbolic constants already present in the original expression are treated as holes: the synthesized expression must have the same semantics as the original expression regardless of the values assigned to these variables.
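On a small finite domain, the formula $p_1$ can be checked by brute-force enumeration, which makes the roles of the two quantifiers explicit: the outer search ranges over hole assignments, and the inner check ranges over the original expression's variables. This is a sketch under assumptions (the hole and input ranges are arbitrary, and enumeration replaces the solver), reusing the example sketch above with $e_\mathrm{orig} = 2 * x$. `Holes`, `holeSpace`, `inputSpace`, and `solutions` are hypothetical names.

```haskell
-- Hypothetical holes for the sketch `If hole1 (Add x hole2) (Mul x hole3)`.
data Holes = Holes { hole1 :: Bool, hole2 :: Integer, hole3 :: Integer }
  deriving (Show, Eq)

sketch :: Holes -> Integer -> Integer
sketch h x = if hole1 h then x + hole2 h else x * hole3 h

-- Original expression e_orig = 2 * x; x is its only symbolic constant.
original :: Integer -> Integer
original x = 2 * x

-- Assumption: both quantifiers range over small finite domains, so the
-- formula can be checked by enumeration instead of by an SMT solver.
holeSpace :: [Holes]
holeSpace = [Holes b h2 h3 | b <- [True, False], h2 <- [-3 .. 3], h3 <- [-3 .. 3]]

inputSpace :: [Integer]
inputSpace = [-10 .. 10]

-- exists holes . forall x . eval(e_orig) == eval(e_sketch)
solutions :: [Holes]
solutions = [h | h <- holeSpace, all (\x -> sketch h x == original x) inputSpace]
```

Every solution has `hole1 = False` and `hole3 = 2`, i.e. the program `x * 2`; `hole2` is unconstrained because its branch is never taken, so there are 7 solutions in this range.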

To represent a program sketch, we need to extend our expression type. We modify the operands of `Add`, `Mul` and `Eq` to `UnionM Expr` to represent a choice among multiple programs. In this tutorial, we will not use GADTs, as we want to avoid manually writing instances. Instead, we will derive the necessary instances for the type.

In [13]:
data Expr
  = I SymInteger
  | B SymBool
  | Add (UnionM Expr) (UnionM Expr)
  | Mul (UnionM Expr) (UnionM Expr)
  | Eq (UnionM Expr) (UnionM Expr)
  deriving (Show, Eq, Generic)
  deriving (Mergeable, ExtractSymbolics, EvaluateSym) via (Default Expr)

Additionally, we introduce a sum type `Value` to represent all possible evaluation results. This type will be used to demonstrate how to program with `UnionM`. The evaluation result can be a symbolic integer, a symbolic boolean, or a special `BadValue` if an ill-typed expression is evaluated.

In [14]:
data Value
  = IValue SymInteger
  | BValue SymBool
  | BadValue
  deriving (Show, Eq, Generic)
  deriving (Mergeable, SEq) via (Default Value)

Next, let's write the interpreter. Our interpreter now needs to perform dynamic type checking and generate `BadValue` if the inputs are of invalid types.

The type of our `eval` function is now `Expr -> UnionM Value`, which means that we interpret an expression, and the result is a choice among values. To evaluate a `UnionM Expr`, we can use the `onUnion` or `.#` combinator provided by Grisette to lift the `eval` function.

In [15]:
binOp :: UnionM Expr -> UnionM Expr -> ((Value, Value) -> Value) -> UnionM Value
binOp l r f = do
  el <- onUnion eval l
  er <- eval .# r
  mrgReturn $ f (el, er)

eval :: Expr -> UnionM Value
eval (I i) = mrgReturn $ IValue i
eval (B b) = mrgReturn $ BValue b
eval (Add l r) = binOp l r $ \case
  (IValue il, IValue ir) -> IValue $ il + ir
  _ -> BadValue
eval (Mul l r) = binOp l r $ \case
  (IValue il, IValue ir) -> IValue $ il * ir
  _ -> BadValue
eval (Eq l r) = binOp l r $ \case
  (IValue il, IValue ir) -> BValue $ il .== ir
  (BValue il, BValue ir) -> BValue $ il .== ir
  _ -> BadValue
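To see the dynamic type checking on its own, here is a concrete, solver-free analogue of the interpreter: the same `Expr`/`Value` shapes, but with plain `Integer`/`Bool` instead of symbolic types and no `UnionM`. It is an illustrative sketch, not part of Grisette.

```haskell
{-# LANGUAGE LambdaCase #-}

-- A concrete analogue of the interpreter above: no UnionM and no
-- symbolic values, just the dynamic type checking logic.
data Expr = I Integer | B Bool | Add Expr Expr | Mul Expr Expr | Eq Expr Expr
  deriving (Show, Eq)

data Value = IValue Integer | BValue Bool | BadValue
  deriving (Show, Eq)

-- Evaluate both operands, then let `f` check their types and combine them.
binOp :: Expr -> Expr -> ((Value, Value) -> Value) -> Value
binOp l r f = f (eval l, eval r)

eval :: Expr -> Value
eval (I i) = IValue i
eval (B b) = BValue b
eval (Add l r) = binOp l r $ \case
  (IValue il, IValue ir) -> IValue (il + ir)
  _ -> BadValue
eval (Mul l r) = binOp l r $ \case
  (IValue il, IValue ir) -> IValue (il * ir)
  _ -> BadValue
eval (Eq l r) = binOp l r $ \case
  (IValue il, IValue ir) -> BValue (il == ir)
  (BValue bl, BValue br) -> BValue (bl == br)
  _ -> BadValue
```

Ill-typed subterms propagate: `eval (Add (Add (I 1) (B True)) (I 2))` is `BadValue` because the inner addition already failed.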

Now we can define some sketches and evaluate them. Our `eval` function is capable of simultaneously evaluating the entire program space!

In [16]:
sketch :: Expr
sketch = Add (mrgReturn $ I "a") (mrgReturn $ I "b")
sketch
eval sketch

sketch :: UnionM Expr
sketch = do
  let a = mrgReturn $ I "a"
  let b = mrgReturn $ I "b"
  mrgIf "c" (mrgReturn $ Add a b) (mrgReturn $ Mul a b)
sketch
eval .# sketch

Add {I a} {I b}

{IValue (+ a b)}

{If c (Add {I a} {I b}) (Mul {I a} {I b})}

{IValue (ite c (+ a b) (* a b))}

Finally, let's write the synthesizer. Recall that in our formulation, we have an exists-forall formula. This cannot be directly transformed into an existential formula, so we use the CounterExample-Guided Inductive Synthesis (CEGIS) algorithm to handle it by making multiple solver calls, each of which is an existential formula.

The semantics of `cegisForAll` is to solve the following formula:

$\exists P.(\exists I. \mathrm{pre}(P, I))\wedge(\forall I.\mathrm{pre}(P, I)\Rightarrow\mathrm{post}(P, I))$

You can view $P$ as the space of the program and $I$ as the inputs to the program. In our synthesizer, $P$ represents all the holes, and $I$ represents all the variables that exist in the original expression.

The `cegisForAll` function takes three arguments. The first is the configuration of the solver. The second argument controls which symbolic constants are in $I$. With the `ExtractSymbolics` instance, we extract all the symbolic constants in the original expression and use them as the set $I$. The third argument specifies the preconditions and postconditions. Here, our precondition is simply true, so we can omit it and use the convenient function `cegisPostCond`.
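The CEGIS loop itself can be sketched without a solver. In the hypothetical `cegis` function below, brute-force search over finite lists replaces both solver calls: the synthesis step finds a candidate consistent with all counterexamples collected so far, and the verification step searches for an input on which that candidate violates the specification. This illustrates the algorithm only; it is not how `cegisForAll` is implemented, and `Choice`, `run`, and `spec` are made up for the example.

```haskell
-- A minimal CEGIS loop over finite candidate and input spaces. Brute-force
-- search stands in for both solver queries.
cegis :: [p] -> [i] -> (p -> i -> Bool) -> Maybe (p, [i])
cegis candidates inputs correct = go []
  where
    go cexs =
      -- Synthesis step: a candidate that is correct on all counterexamples.
      case [p | p <- candidates, all (correct p) cexs] of
        [] -> Nothing
        (p : _) ->
          -- Verification step: search for an input on which p is wrong.
          case [i | i <- inputs, not (correct p i)] of
            [] -> Just (p, cexs)
            (i : _) -> go (i : cexs)

-- Hypothetical example: choose between x + x and x * x to match 2 * x.
data Choice = UseAdd | UseMul
  deriving (Show, Eq)

run :: Choice -> Integer -> Integer
run UseAdd x = x + x
run UseMul x = x * x

spec :: Choice -> Integer -> Bool
spec ch x = run ch x == 2 * x
```

With candidates `[UseMul, UseAdd]` and inputs `[-5 .. 5]`, the first candidate `x * x` fails at `-5`, which is recorded as a counterexample; `x + x` is consistent with it and passes verification, so the call returns `Just (UseAdd, [-5])`. Each counterexample rules out the current candidate, so on finite spaces the loop always terminates.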

In [17]:
synthesisRewriteTarget :: Expr -> UnionM Expr -> IO ()
synthesisRewriteTarget expr sketch = do
  let lhs = eval expr
  let rhs = eval .# sketch
  r <- cegisForAll (precise z3) expr $ cegisPostCond $ lhs .== rhs
  case r of
    (_, CEGISSuccess model) -> do
      putStrLn "Successfully synthesized RHS:"
      print $ evaluateSym False model sketch
    (cex, failure) -> do
      putStrLn $ "Synthesis failed with error: " ++ show failure
      putStrLn $ "Counter example list: " ++ show cex

We can now synthesize an expression. The following example tries to determine whether we can rewrite $2 * x$ as $x + x$ or $x * x$.

In [18]:
x :: UnionM Expr
x = mrgReturn $ I "x"

lhs :: Expr
lhs = Mul (mrgReturn $ I 2) x

sketch :: UnionM Expr
sketch =
  mrgIf "c"
    (mrgReturn $ Add x x)
    (mrgReturn $ Mul x x)
synthesisRewriteTarget lhs sketch

Successfully synthesized RHS:
{Add {I x} {I x}}

The next example uses a larger sketch. We want to see whether $(a * b) + (b * c)$ can be rewritten.

The sketch we are using is:

```
(?{a,b,c} ?{+,*} ?{a,b,c}) ?{+,*} ?{a,b,c}
```

The question mark indicates that we are selecting among the choices, which is why the operator that performs this selection is called `choose` (`chooseUnion` is used when the choices are themselves `UnionM` values).

In [19]:
a, b, c :: UnionM Expr
a = mrgReturn $ I "a"
b = mrgReturn $ I "b"
c = mrgReturn $ I "c"
lhs :: Expr
lhs = Add (mrgReturn $ Mul a b) (mrgReturn $ Mul b c)

sketch :: UnionM Expr
sketch = do
  let lhs1 = chooseUnion [a, b, c] "lhs1"
  let rhs1 = chooseUnion [a, b, c] "rhs1"
  let rhs = chooseUnion [a, b, c] "rhs"
  let lhs = choose [Add lhs1 rhs1, Mul lhs1 rhs1] "lhs"
  choose [Add lhs rhs, Mul lhs rhs] "sketch"

synthesisRewriteTarget lhs sketch

Successfully synthesized RHS:
{Mul {Add {I a} {I c}} {I b}}
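Because this sketch space is tiny (3 * 2 * 3 * 2 * 3 = 108 candidates), the result can be cross-checked by exhaustive enumeration. The block below is a hypothetical pure-Haskell check, not Grisette code; `Var`, `Op`, `Cand`, and the input grid are assumptions made for illustration. It finds exactly two candidates, (a + c) * b and its commuted twin (c + a) * b; the solver happened to report the first, but which equivalent answer a solver returns is not guaranteed.

```haskell
-- Hypothetical finite encoding of the sketch
--   (?{a,b,c} ?{+,*} ?{a,b,c}) ?{+,*} ?{a,b,c}
data Var = A | B | C deriving (Show, Eq, Enum, Bounded)
data Op = Plus | Times deriving (Show, Eq, Enum, Bounded)

-- Cand l1 o1 r1 o2 r stands for (l1 `o1` r1) `o2` r.
data Cand = Cand Var Op Var Op Var deriving (Show, Eq)

candidates :: [Cand]
candidates =
  [Cand l1 o1 r1 o2 r | l1 <- vars, o1 <- ops, r1 <- vars, o2 <- ops, r <- vars]
  where
    vars = [minBound .. maxBound]
    ops = [minBound .. maxBound]

apply :: Op -> Integer -> Integer -> Integer
apply Plus = (+)
apply Times = (*)

evalCand :: (Integer, Integer, Integer) -> Cand -> Integer
evalCand (a, b, c) (Cand l1 o1 r1 o2 r) =
  apply o2 (apply o1 (look l1) (look r1)) (look r)
  where
    look A = a
    look B = b
    look C = c

-- Specification: (a * b) + (b * c).
spec :: (Integer, Integer, Integer) -> Integer
spec (a, b, c) = a * b + b * c

-- Assumption: agreement on a 7x7x7 grid stands in for a solver's
-- "for all inputs" check. Candidate minus spec is a polynomial of degree
-- at most 3 in each variable, so it cannot vanish on the whole grid
-- unless it is identically zero.
inputs :: [(Integer, Integer, Integer)]
inputs = [(a, b, c) | a <- [-3 .. 3], b <- [-3 .. 3], c <- [-3 .. 3]]

matches :: [Cand]
matches = [k | k <- candidates, all (\env -> evalCand env k == spec env) inputs]
```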

## Conclusion

In this tutorial, we explored the core construct of Grisette, the `UnionM` monadic container. We learned how to work with `UnionM` to represent choices and introduce path conditions, and how Grisette merges execution paths to improve efficiency and avoid the path explosion problem.

We also discovered how to derive instances of the `Mergeable` type class for our custom data types, enabling seamless integration with Grisette's features. By deriving instances using `Generic` and `DerivingVia`, we can quickly make our types compatible with Grisette without manually writing the instances.

Furthermore, we extended our expression equivalence verifier to build a simple rewriting rule synthesizer. We introduced the concept of program sketches and holes, and demonstrated how to use CounterExample-Guided Inductive Synthesis (CEGIS) provided by Grisette to search for a program that satisfies a given specification.

However, there are still some areas where the code can be improved. For example, we had to introduce the `BadValue` constructor to handle ill-typed expressions, which adds boilerplate to our code. Additionally, constructing sketches can be verbose and difficult to reuse.

In future tutorials, we will explore more advanced features of Grisette that can help us address these issues and simplify our code. We will learn about constructs that can reduce boilerplate and make our code more concise and reusable.

By mastering the concepts introduced in this tutorial and leveraging the power of Grisette, you will be well-equipped to tackle complex problems involving symbolic execution, program synthesis, and verification.