# AVICENNA: A Semantic Debugging Tool
AVICENNA is a novel approach designed to automatically determine the causes and conditions of program failures. This notebook provides an overview and demonstration of its capabilities.

AVICENNA leverages both generative and predictive models to satisfy constraints over grammar elements and to detect relations between input elements. It uses the ISLa specification language to express complex failure circumstances as predicates over input elements. AVICENNA learns input properties that are common across failing inputs and employs a feedback loop that refines the current diagnoses through systematic experimentation. The result is a set of crisp and precise diagnoses that closely match those determined by human experts.

## How AVICENNA works

To illustrate _AVICENNA_'s capabilities, we start with a quick motivating example. First, let us introduce our program under test: the calculator.

This program acts as a typical calculator, capable of evaluating not just arithmetic expressions but also trigonometric functions, such as sine, cosine, and tangent. Furthermore, it also supports the calculation of the square root of a given number.

In [1]:
import math

def calculator(inp: str) -> float:
    """
        A simple calculator function that can evaluate arithmetic expressions
        and perform basic trigonometric functions and square root calculations.
    """
    return eval(
        str(inp), {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    )

**Side Note:** In the `calculator`, we use Python's `eval` function, which takes a string and evaluates it as a Python expression. We provide a dictionary as the second argument to eval, mapping names to corresponding mathematical functions. This enables us to use the function names directly within the input string.
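To make the role of the second argument concrete, here is a minimal sketch (separate from the notebook's `calculator`; the variable name `namespace` is chosen here for illustration). It shows that only the names placed in the dictionary resolve inside the evaluated string, while other `math` functions such as `log` are unknown:

```python
import math

# Only the names listed here are visible to the evaluated expression
# (plus Python's builtins, which eval adds automatically).
namespace = {"sqrt": math.sqrt}

# "sqrt" resolves through the dictionary.
result = eval("sqrt(16)", namespace)
print(result)

# "log" is not in the dictionary, so the lookup fails with a NameError.
try:
    eval("log(10)", namespace)
    unknown_name_rejected = False
except NameError as err:
    unknown_name_rejected = True
    print("NameError:", err)
```

Note that Python's builtins remain reachable inside `eval`, so this dictionary is a convenience for name lookup, not a security sandbox.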

In [2]:
# Evaluating the cosine of approximately 6π (6 * 3.141)
print(calculator('cos(6*3.141)'))

0.999993677717667


In [3]:
# Calculating the square root of 36
print(calculator('sqrt(6*6)'))

6.0


Each of these calls to the calculator will evaluate the provided string as a mathematical expression, and print the result.

Now, to find new defects, we need to introduce an oracle that tells us whether a triggered error is one we expect or a new, unknown defect. The `OracleResult` is an enum with two possible values, `NO_BUG` and `BUG`. `NO_BUG` denotes a passing test case and `BUG` a failing one.

We import the `OracleResult` enumerated type from the `avicenna` library. This is used in the oracle function to indicate the outcome of executing the 'calculator' function with a given input.

In [4]:
from avicenna.oracle import OracleResult

Next, we define a function called **oracle**, which acts as an intermediary that handles and classifies exceptions raised by the calculator function for a given input.

In [5]:
# Make sure you use the OracleResult from the avicenna library
from avicenna.oracle import OracleResult

def oracle(inp: str):
    """
    This function serves as an oracle or intermediary that catches and handles exceptions 
    generated by the 'calculator' function.
    It aims to determine whether an input triggers a bug in the 'calculator' function.

    Args:
        inp (str): The input string to be passed to the 'calculator' function.

    Returns:
        OracleResult: An enumerated type 'OracleResult' indicating the outcome of the function execution.
            - OracleResult.NO_BUG: Returned if the calculator function executes without any exception
            - OracleResult.BUG: Returned if the calculator function raises a ValueError exception, indicating a potential bug.
    """
    try:
        calculator(inp)
    except ValueError:
        return OracleResult.BUG
    return OracleResult.NO_BUG

This **oracle** function is used in the context of debugging to classify the behavior of the program under test (in our case the _calculator_) on various inputs. When the `calculator` function runs without raising an exception, the **oracle** returns `OracleResult.NO_BUG`. When the `calculator` function raises a `ValueError`, the **oracle** interprets this as a potential bug in the `calculator` and returns `OracleResult.BUG`. Any other exception type is not caught and propagates to the caller.
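The classification can be tried out without installing anything. The following sketch replaces `avicenna.oracle.OracleResult` with a small stand-in enum (an assumption made only to keep the example self-contained; the real class lives in the `avicenna` library) and shows that a `ValueError` maps to `BUG`, while a different exception such as `ZeroDivisionError` is not caught:

```python
import math
from enum import Enum


class OracleResult(Enum):
    """Stand-in for avicenna.oracle.OracleResult (assumption for this sketch)."""
    NO_BUG = "NO_BUG"
    BUG = "BUG"


def calculator(inp: str) -> float:
    return eval(
        str(inp), {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    )


def oracle(inp: str) -> OracleResult:
    try:
        calculator(inp)
    except ValueError:
        # Only ValueError is classified as a failure.
        return OracleResult.BUG
    return OracleResult.NO_BUG


print(oracle("sqrt(4)"))
print(oracle("sqrt(-3)"))

# A different exception type is not handled by the oracle and reaches the caller.
try:
    oracle("1/0")
    other_exception_propagated = False
except ZeroDivisionError:
    other_exception_propagated = True
print("ZeroDivisionError propagated:", other_exception_propagated)
```

If other exception types should also count as passing or failing runs, the `except` clause is the place to decide that.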

We can see this in action by testing a few initial inputs:


In [6]:
initial_inputs = ['sqrt(1)', 'cos(912)', 'tan(4)', 'sqrt(-3)']

for inp in initial_inputs:
    print(inp.ljust(30), oracle(inp))

sqrt(1)                        NO_BUG
cos(912)                       NO_BUG
tan(4)                         NO_BUG
sqrt(-3)                       BUG


We see that `sqrt(-3)` results in the failure of our calculator program. We can now use *Avicenna* to learn the root causes of the program's failure.

First, we need to define the input format of the calculator with a grammar:

In [7]:
import string

grammar = {
    "<start>": ["<arith_expr>"],
    "<arith_expr>": ["<function>(<number>)"],
    "<function>": ["sqrt", "sin", "cos", "tan"],
    "<number>": ["<maybe_minus><onenine><maybe_digits><maybe_frac>"],
    "<maybe_minus>": ["", "-"],
    "<onenine>": [str(num) for num in range(1, 10)],
    "<digit>": list(string.digits),
    "<maybe_digits>": ["", "<digits>"],
    "<digits>": ["<digit>", "<digit><digits>"],
    "<maybe_frac>": ["", ".<digits>"],
}

The grammar provides a structured way to generate valid input strings for our calculator program. It defines patterns and rules that dictate how different elements can be combined to form syntactically correct mathematical expressions. Here's a breakdown of the key components of the grammar:

- `<start>`: The entry point for generating an expression. It signifies where the creation of an arithmetic expression begins.

- `<arith_expr>`: Represents a general arithmetic expression. For simplicity in this example, it's defined to consist of a function applied to a number, like `sin(3)` or `sqrt(9)`.

- `<function>`: Enumerates the mathematical functions our calculator can handle, including square root and trigonometric operations like sine, cosine, and tangent.

- `<number>`: Describes valid numbers for our calculator. This includes:
  - An optional minus sign (denoted by `<maybe_minus>`, which can be an empty string or a minus sign).
  - A leading digit from 1 to 9 (given by `<onenine>`), so a number never starts with 0.
  - An optional sequence of further digits (represented by `<maybe_digits>` and `<digits>`).
  - An optional fractional part (expressed by `<maybe_frac>`).

This grammar acts as a blueprint, guiding the systematic generation of test cases for our calculator. By defining the rules and structures of valid inputs, it ensures that the generated expressions are meaningful and relevant for our debugging exercise.
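To get a feel for the inputs this grammar describes, it can be expanded at random with a few lines of plain Python. The sketch below is a minimal random expander, not the generator that Avicenna uses internally; `expand`, `NONTERMINAL`, and `max_depth` are names invented for this illustration:

```python
import random
import re
import string

GRAMMAR = {
    "<start>": ["<arith_expr>"],
    "<arith_expr>": ["<function>(<number>)"],
    "<function>": ["sqrt", "sin", "cos", "tan"],
    "<number>": ["<maybe_minus><onenine><maybe_digits><maybe_frac>"],
    "<maybe_minus>": ["", "-"],
    "<onenine>": [str(num) for num in range(1, 10)],
    "<digit>": list(string.digits),
    "<maybe_digits>": ["", "<digits>"],
    "<digits>": ["<digit>", "<digit><digits>"],
    "<maybe_frac>": ["", ".<digits>"],
}

# Matches nonterminal symbols such as <number> inside an expansion.
NONTERMINAL = re.compile(r"<[^<>\s]+>")


def expand(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 8) -> str:
    """Randomly expand `symbol` into a string of terminals."""
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, keep only the alternatives with the fewest
        # nonterminals so that the recursive <digits> rule terminates.
        fewest = min(len(NONTERMINAL.findall(alt)) for alt in alternatives)
        alternatives = [alt for alt in alternatives
                        if len(NONTERMINAL.findall(alt)) == fewest]
    chosen = rng.choice(alternatives)
    # Replace every nonterminal in the chosen alternative by its own expansion.
    return NONTERMINAL.sub(
        lambda match: expand(match.group(0), rng, depth + 1, max_depth), chosen
    )


rng = random.Random(0)
samples = [expand("<start>", rng) for _ in range(5)]
for sample in samples:
    print(sample)
```

Every generated string has the shape `function(number)`, which is exactly the input space in which Avicenna searches for failure conditions.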

With the oracle, the grammar, and a failure-inducing input, we can use *AVICENNA* to automatically infer properties over inputs, validate hypotheses, and generate additional test cases, producing precise and expressive diagnoses for the failure conditions.

In [8]:
from avicenna.avicenna import Avicenna

avicenna = Avicenna(
    grammar,
    oracle,
    initial_inputs,
    log=False
)
# You can turn on logging if you want to see the individual steps that Avicenna performs!

In [9]:
from typing import List, Tuple
from isla.language import Formula

diagnoses: List[Tuple[Formula, float, float]] = avicenna.explain()
# Avicenna returns a list of learned ISLa formulas with their corresponding precision and recall

In the code above, we created an instance of the `Avicenna` class and started the debugging process by invoking the `explain` method.
Avicenna uses its feedback loop to systematically probe and test the calculator program and identifies the root cause of the bug by analyzing the program's behavior on the generated inputs.
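Each diagnosis is returned together with a precision and a recall value. How Avicenna computes these internally is not shown in this notebook; the sketch below only illustrates the standard definitions of the two measures for a diagnosis that predicts failures. The function name `precision_recall` and the example labels are made up for illustration:

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Standard precision and recall for failure predictions.

    predicted[i] is True if the diagnosis claims input i fails;
    actual[i] is True if the oracle reported BUG for input i.
    """
    true_pos = sum(p and a for p, a in zip(predicted, actual))
    false_pos = sum(p and not a for p, a in zip(predicted, actual))
    false_neg = sum(not p and a for p, a in zip(predicted, actual))
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical labels for four inputs: one correct prediction,
# one false alarm, and one missed failure.
predicted = [True, True, False, False]
actual = [True, False, True, False]
print(precision_recall(predicted, actual))
```

A precision of 100% means every input matching the diagnosis fails; a recall of 100% means every failing input matches the diagnosis.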

The cell below prints the diagnoses as ISLa constraints, a symbolic representation of the root cause of the failure that Avicenna detected in the calculator program:


In [10]:
from isla.language import ISLaUnparser

print("Avicenna determined the following constraints to describe the failure circumstances:\n")

for diagnosis in diagnoses:
    print(ISLaUnparser(diagnosis[0]).unparse())
    print(f"Avicenna calculated a precision: {diagnosis[1]*100:.2f}% and recall {diagnosis[2]*100:.2f}%", end="\n\n")

Avicenna determined the following constraints to describe the failure circumstances:

(exists <maybe_minus> elem in start:
   (= elem "-") and
exists <function> elem_0 in start:
  (= elem_0 "sqrt"))
Avicenna calculated a precision: 100.00% and recall 100.00%

(forall <number> elem in start:
   (<= (str.to.int elem) (str.to.int "-1")) and
exists <function> elem_0 in start:
  (= elem_0 "sqrt"))
Avicenna calculated a precision: 100.00% and recall 100.00%

(forall <number> container in start:
   exists <maybe_minus> elem in container:
     (= elem "-") and
exists <function> elem_0 in start:
  (= elem_0 "sqrt"))
Avicenna calculated a precision: 100.00% and recall 100.00%

(exists <function> elem in start:
   (= elem "sqrt") and
forall <number> container in start:
  exists <maybe_minus> elem_0 in container:
    (= (str.len elem_0) (str.to.int "1")))
Avicenna calculated a precision: 100.00% and recall 100.00%



Taking the second diagnosis as an example, this output, expressed in first-order logic, says:

- For all numbers (elements of type `<number>` in the grammar), the integer representation of the number is less than or equal to -1 (`<= (str.to.int elem) (str.to.int "-1")`), and
- There exists a function (an element of type `<function>` in the grammar) that is equal to "sqrt" (`= elem_0 "sqrt"`).

If both conditions hold for an input, a bug is likely to occur.

In plain English, the output is indicating that the failure in our Calculator program occurs when trying to take the square root (`sqrt`) of a negative number (a number less than or equal to -1). 

This is consistent with our expectations, since the square root of a negative number is not defined in the realm of real numbers. Consequently, Python's `math.sqrt()` function, which we've used in our Calculator program, throws a `ValueError` when given a negative number as input.
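The same condition can be written by hand as an ordinary Python predicate and compared against the real behavior of `math.sqrt`. This is only a plain-Python reading of the diagnosis, not something Avicenna or ISLa produces; the regular expression `INPUT` and the helper names are assumptions made for this sketch:

```python
import math
import re

# Mirrors the grammar: function(optional minus, leading digit 1-9, more digits, optional fraction)
INPUT = re.compile(
    r"(?P<function>sqrt|sin|cos|tan)\((?P<minus>-?)[1-9][0-9]*(?:\.[0-9]+)?\)"
)


def predicted_failure(inp: str) -> bool:
    """Hand-written reading of the diagnosis: sqrt applied to a negative number."""
    match = INPUT.fullmatch(inp)
    if match is None:
        raise ValueError(f"{inp!r} is not covered by the grammar")
    return match["function"] == "sqrt" and match["minus"] == "-"


def actually_fails(inp: str) -> bool:
    """Run the expression and report whether it raises a ValueError."""
    try:
        eval(inp, {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan})
    except ValueError:
        return True
    return False


candidates = ["sqrt(-3)", "sqrt(3)", "cos(-3)", "tan(-1.5)", "sqrt(-1.5)", "sin(42)"]
for inp in candidates:
    print(inp.ljust(15), predicted_failure(inp), actually_fails(inp))
```

On these inputs the hand-written predicate and the observed behavior agree, which is what a precision and recall of 100% express.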

With this information, we can address the issue in our Calculator program to prevent crashes when dealing with such inputs. We might decide to handle such errors gracefully or implement support for complex numbers, depending on the requirements of our program.
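As one possible way to act on the diagnosis, the calculator could fall back to complex arithmetic for negative arguments. The sketch below is an illustration of that option, not a fix proposed by the Avicenna authors; the helper `safe_sqrt` is a name introduced here:

```python
import cmath
import math
from typing import Union


def safe_sqrt(x: float) -> Union[float, complex]:
    """Square root that returns a complex result for negative inputs."""
    if x < 0:
        # cmath.sqrt handles negative numbers instead of raising ValueError.
        return cmath.sqrt(x)
    return math.sqrt(x)


def calculator(inp: str) -> Union[float, complex]:
    return eval(
        str(inp), {"sqrt": safe_sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    )


print(calculator("sqrt(9)"))
print(calculator("sqrt(-4)"))
```

Whether a complex result is acceptable depends on the callers of the calculator; returning an error message instead would be an equally valid way to handle the case gracefully.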

Remember, these results are generated based on the information provided to Avicenna, such as the grammar and the oracle function, as well as the results of Avicenna's systematic testing of the Calculator program. So the more accurate and comprehensive these inputs are, the more helpful Avicenna's outputs will be.


<div class="alert alert-info">
[Info]: It's important to recognize that <b>all the diagnoses generated are equivalent</b>. Despite their syntactical variations, they accurately depict the same failure-inducing behavior.
</div>

## Generating More Inputs from the Diagnoses

Now that we have obtained the ISLa formulas that describe the failure circumstances, we can use them to generate more inputs that trigger exactly the same behavior. To do so, we use the `ISLaSolver`.

The function `ISLaSolver.solve()` attempts to compute a solution to the given ISLa formula and returns that solution, if any. It can be called repeatedly to obtain more solutions until one of two exception types is raised: a `StopIteration` indicates that no more solutions can be found; a `TimeoutError` is raised if a timeout occurred. After that, an exception will be raised on every further call.

<div class="alert alert-info">
[Info]: For more information about the <a href="https://github.com/rindPHI/isla">ISLa specification language</a> and the <b>ISLaSolver</b>, have a look at the extensive <a href="https://isla.readthedocs.io/en/latest/index.html">documentation</a>.
</div>
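Because a solver keeps raising once it is exhausted, it is convenient to wrap the repeated `solve()` calls in a small helper that stops at the first `StopIteration` or `TimeoutError` and drops duplicate solutions. The sketch below is independent of ISLa: `collect_solutions` is a hypothetical helper, and `FakeSolver` is a stand-in that only imitates the raise-when-exhausted behavior described above.

```python
from typing import Callable, Iterable, List


def collect_solutions(solve: Callable[[], object], limit: int) -> List[str]:
    """Call `solve` up to `limit` times and return the distinct results as strings."""
    seen: List[str] = []
    for _ in range(limit):
        try:
            candidate = str(solve())
        except (StopIteration, TimeoutError):
            # No more solutions, or the solver timed out: stop asking.
            break
        if candidate not in seen:
            seen.append(candidate)
    return seen


class FakeSolver:
    """Stand-in for a solver (assumption): yields fixed results, then raises StopIteration."""

    def __init__(self, results: Iterable[str]) -> None:
        self._results = iter(results)

    def solve(self) -> str:
        return next(self._results)


fake = FakeSolver(["sqrt(-2)", "sqrt(-1)", "sqrt(-2)"])
print(collect_solutions(fake.solve, limit=20))
```

With a real solver, the bound method `solver.solve` would be passed in place of `fake.solve`.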

In [11]:
from isla.solver import ISLaSolver

for diagnosis in diagnoses:
    solver = ISLaSolver(grammar,
                        diagnosis[0],
                        enable_optimized_z3_queries=False)

    for _ in range(20):
        try:
            inp = solver.solve()
        except (StopIteration, TimeoutError):
            # The solver is exhausted or timed out; further calls would only raise again.
            break
        print(str(inp).ljust(30), oracle(inp))

sqrt(-2)                       BUG
sqrt(-851.4)                   BUG
sqrt(-39.7)                    BUG
sqrt(-42.0)                    BUG
sqrt(-76.3)                    BUG
sqrt(-683.2)                   BUG
sqrt(-1923)                    BUG
sqrt(-57)                      BUG
sqrt(-9)                       BUG
sqrt(-1)                       BUG
sqrt(-6.2)                     BUG
sqrt(-7.57)                    BUG
sqrt(-8)                       BUG
sqrt(-1.921)                   BUG
sqrt(-8.85961)                 BUG
sqrt(-1)                       BUG
sqrt(-29.6)                    BUG
sqrt(-7.3)                     BUG
sqrt(-859.61)                  BUG
sqrt(-8)                       BUG
sqrt(-4)                       BUG
sqrt(-8.8)                     BUG
sqrt(-61654927)                BUG
sqrt(-3309)                    BUG
sqrt(-7)                       BUG
sqrt(-59780)                   BUG
sqrt(-9.9)                     BUG
sqrt(-47)                      BUG
sqrt(-1)                       BUG

## Summary

In this notebook, we introduced *AVICENNA*, a powerful semantic debugging tool designed to automatically determine the causes and conditions of program failures. Through the example of a simple calculator program, we showcased the following:

1. **Setting Up an Oracle**: We defined an intermediary function, termed an 'oracle', which classifies each run of our program as passing (`NO_BUG`) or failing (`BUG`), depending on whether it raises a `ValueError`.
2. **Grammar Definition**: A structured blueprint for generating valid inputs to the calculator was established.
3. **Automated Debugging with AVICENNA**: Using the provided grammar, initial test cases, and the oracle, AVICENNA systematically probed our calculator program and identified potential root causes for observed failures.
4. **Interpreting Results**: We decoded AVICENNA's output, learning that the failure in our calculator program is triggered when computing the square root of a negative number.