# EvoGFuzz: An Evolutionary Approach to Grammar-Based Fuzzing

**EvoGFuzz** stands for *evolutionary grammar-based fuzzing*. This approach leverages evolutionary optimization techniques to systematically explore the space of a program's potential inputs, with a particular emphasis on identifying inputs that could lead to exceptional behavior. With a user-defined objective, EvoGFuzz can adapt and refine the input generation strategy over time, making it a powerful tool for uncovering software defects and vulnerabilities.

Efficient detection of defects and vulnerabilities hinges on the ability to automatically generate program inputs that are both valid and diverse. One common strategy is to use grammars, which provide structured and syntactically correct inputs. This approach leads to the concept of grammar-based fuzzing, where fuzzing strategies are guided by the rules defined within the grammar.

A further enhancement to this concept is probabilistic grammar-based fuzzing, where competing grammar rules are associated with probabilities that guide their application. By carefully assigning and optimizing these probabilities, we gain considerable control over the nature of the generated inputs. This enables us to direct the fuzzing process towards specific areas of interest—for example, those functions that are deemed critical, have a higher propensity for failures, or have undergone recent modifications. 
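To make the idea of probability-annotated rules concrete, here is a minimal sketch using only the standard library. The toy grammar `PROB_GRAMMAR`, its probabilities, and the helper `generate` are hypothetical and chosen purely for illustration; they are not part of EvoGFuzz's API.

```python
import random
import re

# Hypothetical toy grammar: each nonterminal maps to (expansion, probability) pairs.
PROB_GRAMMAR = {
    "<start>": [("<function>(<term>)", 1.0)],
    "<function>": [("sqrt", 0.7), ("cos", 0.1), ("sin", 0.1), ("tan", 0.1)],
    "<term>": [("-<digit>", 0.5), ("<digit>", 0.5)],
    "<digit>": [(str(d), 1 / 9) for d in range(1, 10)],
}

NONTERMINAL = re.compile(r"<[^<>]+>")

def generate(symbol="<start>", rng=random):
    """Expand `symbol` by picking one of its rules according to the rule probabilities."""
    expansions, weights = zip(*PROB_GRAMMAR[symbol])
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    # Recursively expand every nonterminal that appears in the chosen rule.
    return NONTERMINAL.sub(lambda match: generate(match.group(0), rng), chosen)

rng = random.Random(0)
print([generate(rng=rng) for _ in range(5)])
```

Raising the probability of `sqrt` makes square-root inputs dominate the generated sample, which is the kind of steering that probabilistic grammar-based fuzzing relies on.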

In essence, EvoGFuzz combines evolutionary optimization with probabilistic grammar-based fuzzing to reveal hidden defects and vulnerabilities in a targeted and efficient manner.

## Fuzzing a Program

Our program under investigation is `The Calculator`. This program acts as a typical calculator, capable of evaluating not just arithmetic expressions but also trigonometric functions, such as sine, cosine, and tangent. Furthermore, it also supports the calculation of the square root of a given number.

In [1]:
import math

def calculator(inp: str) -> float:
    """
        A simple calculator function that can evaluate arithmetic expressions 
        and perform basic trigonometric functions and square root calculations.
    """
    return eval(
        str(inp), {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    )

**Side Note:** In the `calculator`, we use Python's `eval` function, which takes a string and evaluates it as a Python expression. We provide a dictionary as the second argument to eval, mapping names to corresponding mathematical functions. This enables us to use the function names directly within the input string. 
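The effect of the second argument can be sketched as follows. The snippet repeats the `calculator` definition so that it runs on its own; the input strings are illustrative. Note that Python adds the built-in functions to the supplied dictionary automatically, so names such as `abs` still resolve, while names that are neither built-in nor in the dictionary raise a `NameError`.

```python
import math

def calculator(inp: str) -> float:
    namespace = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    return eval(str(inp), namespace)

# Names from the dictionary can be used directly inside the expression.
print(calculator("sqrt(16) + sin(0)"))

# Built-ins remain available because eval inserts __builtins__ into the dictionary.
print(calculator("abs(-3)"))

# A name that is neither mapped nor built-in is rejected.
try:
    calculator("log(10)")
except NameError as exc:
    print(f"NameError: {exc}")
```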

In [2]:
# Evaluating the cosine of approximately 6π (6 * 3.141)
print(calculator('cos(6*3.141)'))

0.999993677717667


In [3]:
# Calculating the square root of 36
print(calculator('sqrt(6*6)'))

6.0


Each of these calls to the calculator will evaluate the provided string as a mathematical expression, and print the result.

Now, to find new defects, we need to introduce an oracle that tells us whether a triggered error is one we expect or a new/unknown defect. For our purposes, `OracleResult` is an enum whose values `PASSING` and `FAILING` denote a passing and a failing test case, respectively.

We import the `OracleResult` enumerated type from the `debugging_framework` library. It is used in the oracle function to indicate the outcome of executing the `calculator` function with a given input.

In [4]:
from debugging_framework.input.oracle import OracleResult

The following function, **oracle**, acts as an intermediary that handles and classifies exceptions raised by the `calculator` function for a given input.

In [5]:
# Make sure you use the OracleResult from the debugging_framework library
from debugging_framework.input.oracle import OracleResult

def oracle(inp: str):
    """
    This function serves as an oracle or intermediary that catches and handles exceptions 
    generated by the 'calculator' function. The oracle function is used in the context of fuzz testing.
    It aims to determine whether an input triggers a bug in the 'calculator' function.

    Args:
        inp (str): The input string to be passed to the 'calculator' function.

    Returns:
        OracleResult: An enumerated type 'OracleResult' indicating the outcome of the function execution.
            - OracleResult.PASSING: Returned if the calculator function executes without raising a ValueError.
            - OracleResult.FAILING: Returned if the calculator function raises a ValueError, indicating a potential bug.
    """
    try:
        calculator(inp)
    except ValueError:
        return OracleResult.FAILING
    
    return OracleResult.PASSING

This **oracle** function is used in the context of fuzzing to determine the impact of various inputs on the program under test (in our case the _calculator_). When the `calculator` function behaves as expected (i.e., no `ValueError` is raised), the **oracle** function returns `OracleResult.PASSING`. When the `calculator` function raises a `ValueError`, the **oracle** interprets this as a potential bug in the `calculator` and returns `OracleResult.FAILING`. Any other exception is not caught and propagates to the caller.

We can see this in action by testing a few initial inputs:

In [6]:
initial_inputs = ['sqrt(1)', 'cos(912)', 'tan(4)']

for inp in initial_inputs:
    print(inp.ljust(20), oracle(inp))

sqrt(1)              PASSING
cos(912)             PASSING
tan(4)               PASSING
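All three seed inputs pass. To see the failing branch as well, a hand-written input such as `sqrt(-1)` can be classified in the same way. The sketch below is self-contained: it uses a stand-in `OracleResult` enum (an assumption, so that the snippet runs without the `debugging_framework` package) and repeats the `calculator` and `oracle` definitions.

```python
import math
from enum import Enum

class OracleResult(Enum):
    # Stand-in for debugging_framework's OracleResult; only the two members used here.
    PASSING = "PASSING"
    FAILING = "FAILING"

def calculator(inp: str) -> float:
    return eval(
        str(inp), {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    )

def oracle(inp: str) -> OracleResult:
    try:
        calculator(inp)
    except ValueError:
        return OracleResult.FAILING
    return OracleResult.PASSING

for inp in ["sqrt(1)", "sqrt(-1)"]:
    print(inp.ljust(20), oracle(inp).name)

# Exceptions other than ValueError are not classified; they propagate to the caller.
try:
    oracle("1/0")
except ZeroDivisionError:
    print("ZeroDivisionError propagates out of the oracle")
```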


The following code represents a simple context-free grammar for our calculator function. This grammar covers a subset of the valid inputs to the calculator: a single call to the square root or a trigonometric function, applied to a positive or negative integer or decimal number built from the digits 1 to 9:

In [7]:
from debugging_framework.types import Grammar
from debugging_framework.fuzzingbook.grammar import is_valid_grammar

CALCGRAMMAR: Grammar = {
    "<start>":
        ["<function>(<term>)"],

    "<function>":
        ["sqrt", "tan", "cos", "sin"],
    
    "<term>": ["-<value>", "<value>"], 
    
    "<value>":
        ["<integer>.<integer>",
         "<integer>"],

    "<integer>":
        ["<digit><integer>", "<digit>"],

    "<digit>":
        ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
}
    
assert is_valid_grammar(CALCGRAMMAR)

The defined grammar CALCGRAMMAR provides a structured blueprint for creating various inputs for our fuzz testing. Each rule in this grammar reflects a possible valid input that our calculator function can handle. By fuzzing based on this grammar, we can systematically explore the space of valid inputs to the calculator function.
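Because this particular grammar has no nested structure, the language it describes can also be written as a regular expression. The following sketch uses that to check whether a string is derivable from `CALCGRAMMAR`; the names `CALC_INPUT` and `derivable` are hypothetical helpers, not part of any library used in this notebook.

```python
import re

# <function>(<term>) with an optional minus sign, digits 1-9, and an optional fraction.
CALC_INPUT = re.compile(r"(sqrt|tan|cos|sin)\(-?[1-9]+(\.[1-9]+)?\)")

def derivable(text: str) -> bool:
    """Return True if `text` can be derived from CALCGRAMMAR as defined above."""
    return CALC_INPUT.fullmatch(text) is not None

for candidate in ["sqrt(-7674.817)", "cos(912)", "cos(10)", "cos(6*3.141)"]:
    print(candidate.ljust(20), derivable(candidate))
```

The last two candidates are valid calculator inputs but lie outside the grammar: `<digit>` does not include `0`, and the grammar has no rule for arithmetic operators.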

### Leveraging EvoGFuzz to Unearth New Defects

We apply our `EvoGFuzz` class to carry out fuzz testing using evolutionary grammar-based fuzzing. This is aimed at uncovering potential defects in our 'calculator' function.

To initialize our EvoGFuzz instance, we pass a grammar (in our case, `CALCGRAMMAR`), an oracle function, an initial set of inputs, and the number of iterations to be performed in the fuzzing process. A custom fitness function can be supplied as well; it is omitted in this first run.

In [8]:
from evogfuzz.evogfuzz_class import EvoGFuzz

epp = EvoGFuzz(
    grammar=CALCGRAMMAR,
    oracle=oracle,
    inputs=initial_inputs,
    iterations=10
)

Upon creating the `EvoGFuzz` instance, we can execute the fuzzing process. The `.fuzz()` method runs the fuzzing iterations, evolving the inputs based on our fitness function, and returns a collection of inputs that lead to exceptions in the 'calculator' function.

In [9]:
found_exception_inputs = epp.fuzz()
print(f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")

EvoGFuzz found 407 bug-triggering inputs!


Lastly, we can examine the inputs that resulted in exceptions. This output can provide valuable insight into potential weaknesses in the 'calculator' function that need to be addressed.

In [10]:
# print only the first 20 bug-triggering inputs
for inp in list(found_exception_inputs)[:20]:
    print(str(inp))

sqrt(-7674.817)
sqrt(-7787.283244)
sqrt(-18499111.171)
sqrt(-85.9)
sqrt(-5619)
sqrt(-5.4436)
sqrt(-3)
sqrt(-49.2775)
sqrt(-277.83)
sqrt(-8)
sqrt(-4.77)
sqrt(-75459.545929)
sqrt(-4.579)
sqrt(-2.676777)
sqrt(-441)
sqrt(-928)
sqrt(-61.98)
sqrt(-84548.7)
sqrt(-95759.9981)
sqrt(-5784.8)


This process illustrates the power of evolutionary grammar-based fuzzing in identifying new defects within our system. By applying evolutionary algorithms to our fuzzing strategy, we can guide the search towards more defect-prone regions of the input space.
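The general shape of such a feedback loop can be sketched in a few lines. This is not EvoGFuzz's implementation; it is a simplified illustration under stated assumptions: only the choice of function is weighted, the helpers `sample_input`, `fails`, and `evolve` are hypothetical, and "fitness" is reduced to whether an input fails.

```python
import math
import random
from collections import Counter

FUNCTIONS = ["sqrt", "cos", "sin", "tan"]

def sample_input(weights, rng):
    """Draw one input such as 'sqrt(-3)' using the current function weights."""
    function = rng.choices(FUNCTIONS, weights=[weights[f] for f in FUNCTIONS], k=1)[0]
    sign = rng.choice(["", "-"])
    return f"{function}({sign}{rng.randint(1, 9)})"

def fails(inp):
    """Stand-in oracle: True if evaluating the input raises a ValueError."""
    try:
        eval(inp, {"sqrt": math.sqrt, "cos": math.cos, "sin": math.sin, "tan": math.tan})
    except ValueError:
        return True
    return False

def evolve(iterations=5, population=100, seed=0):
    rng = random.Random(seed)
    weights = {f: 1 for f in FUNCTIONS}  # start with a uniform distribution
    for _ in range(iterations):
        inputs = [sample_input(weights, rng) for _ in range(population)]
        fittest = [inp for inp in inputs if fails(inp)]
        if fittest:
            # Re-estimate the weights from the fittest inputs (+1 keeps every rule reachable).
            counts = Counter(inp.split("(")[0] for inp in fittest)
            weights = {f: counts[f] + 1 for f in FUNCTIONS}
    return weights

print(evolve())
```

After a few iterations the weight of `sqrt` dominates, because only square roots of negative numbers fail in this toy setting.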

#### Analyzing and Sorting All Generated Inputs by Fitness

After the fuzzing process, you may want to examine all the generated inputs. These can be accessed using the `get_all_inputs()` method. Additionally, we can sort these inputs based on their fitness scores to gain insights into which inputs performed best according to our fitness function.

In [11]:
all_generated_inputs = epp.get_all_inputs()
all_generated_inputs_sorted = sorted(all_generated_inputs, key=lambda inp: inp.fitness, reverse=True)

Now, let's print out these sorted inputs along with their respective fitness scores. Inputs with higher fitness scores will be displayed first, as these are the ones our evolutionary process deemed more likely to uncover potential defects.

In [12]:
# show the 20 inputs with the highest fitness scores
for inp in all_generated_inputs_sorted[:20]:
    print(f"{str(inp).ljust(40)} fitness: {inp.fitness}")

sqrt(-7787.283244)                       fitness: 1
sqrt(-7674.817)                          fitness: 1
sqrt(-18499111.171)                      fitness: 1
sqrt(-85.9)                              fitness: 1
sqrt(-5619)                              fitness: 1
sqrt(-5.4436)                            fitness: 1
sqrt(-3)                                 fitness: 1
sqrt(-8)                                 fitness: 1
sqrt(-49.2775)                           fitness: 1
sqrt(-2.676777)                          fitness: 1
sqrt(-4.77)                              fitness: 1
sqrt(-277.83)                            fitness: 1
sqrt(-75459.545929)                      fitness: 1
sqrt(-4.579)                             fitness: 1
sqrt(-441)                               fitness: 1
sqrt(-928)                               fitness: 1
sqrt(-61.98)                             fitness: 1
sqrt(-84548.7)                           fitness: 1
sqrt(-95759.9981)                        fitness: 1
sqrt(-5784.8

This output provides an overview of the evolved inputs and their effectiveness in revealing potential defects, as gauged by our fitness function. It is a valuable resource for understanding the behavior of our program under various inputs and the effectiveness of our evolutionary grammar-based fuzzing approach.

In [13]:
from evogfuzz.input import Input

def fitness_function_naive(test_input: Input) -> int:
    score_structure = len(str(test_input))
    if test_input.oracle == OracleResult.FAILING:
        score_feedback = 100
    else:
        score_feedback = 0
    return score_feedback + score_structure
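The arithmetic of `fitness_function_naive` can be checked by hand: the score is the length of the input string, plus 100 if the oracle classified the input as failing. The sketch below mirrors that computation on plain strings; `naive_score` is a hypothetical helper that takes the failing flag as a parameter instead of reading it from an `Input` object.

```python
def naive_score(text: str, failing: bool) -> int:
    score_structure = len(text)             # longer inputs score higher
    score_feedback = 100 if failing else 0  # failing inputs get a fixed bonus
    return score_feedback + score_structure

print(naive_score("cos(912)", failing=False))  # 8
print(naive_score("sqrt(-3)", failing=True))   # 108
```

A failing input therefore always scores 100 points more than a passing input of the same length, and among failing inputs the longer ones rank higher.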

In [14]:
epp = EvoGFuzz(
    grammar=CALCGRAMMAR,
    oracle=oracle,
    inputs=initial_inputs,
    fitness_function=fitness_function_naive,
    iterations=10
)

found_exception_inputs = epp.fuzz()

print(f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")

EvoGFuzz found 370 bug-triggering inputs!


In [15]:
all_generated_inputs = epp.get_all_inputs()
all_generated_inputs_sorted = sorted(all_generated_inputs, key=lambda inp: inp.fitness, reverse=True)

In [16]:
# show the 20 inputs with the highest fitness scores
for inp in all_generated_inputs_sorted[:20]:
    print(f"{str(inp).ljust(40)} fitness: {inp.fitness}")

sqrt(-575.93759571669933132312911655553574916764591) fitness: 152
sqrt(-572195155538.3945343754919331479365535353499) fitness: 151
sqrt(-997113336573179945214945252937.4537966) fitness: 145
sqrt(-541475441782313811314939951144.235524) fitness: 144
sqrt(-3494694292775413917517936613.4135759) fitness: 143
sqrt(-63333569712545178769739563435797.4) fitness: 141
sqrt(-554.33461165854946554379255563975) fitness: 140
sqrt(-5665412949113488.337394817317)     fitness: 136
sqrt(-3129539873637463773.34259539)      fitness: 135
sqrt(-56759.33843396673497437549)        fitness: 133
sqrt(-751945491579733.759655927)         fitness: 132
sqrt(-39949227769947755919.696)          fitness: 131
sqrt(-4771214465517349.435587)           fitness: 130
sqrt(-418711245.3185353444435)           fitness: 130
sqrt(-511351555974719.6937633)           fitness: 130
sqrt(-22133317962631595257.79)           fitness: 130
sqrt(-559375228397.9556577113)           fitness: 130
sqrt(-9.442119114944911194429)           fitne

In [17]:
from evogfuzz.input import Input

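# NOTE: despite its name, this function is currently identical to fitness_function_naive;
# it scores the input length, not the number of grammar expansions.
def fitness_function_expansions(test_input: Input) -> int: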
    score_structure = len(str(test_input))
    if test_input.oracle == OracleResult.FAILING:
        score_feedback = 100
    else:
        score_feedback = 0
    return score_feedback + score_structure

In [18]:
epp = EvoGFuzz(
    grammar=CALCGRAMMAR,
    oracle=oracle,
    inputs=initial_inputs,
    fitness_function=fitness_function_expansions,
    iterations=10
)

found_exception_inputs = epp.fuzz()

print(f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")

EvoGFuzz found 226 bug-triggering inputs!


In [19]:
all_generated_inputs = epp.get_all_inputs()
all_generated_inputs_sorted = sorted(all_generated_inputs, key=lambda inp: inp.fitness, reverse=True)

In [91]:
def count_expansions(children):
    counter = 0
    if children == []:
        return -1
    
    for child in children:
        node, next_child = child
        print(node)
        print(next_child)

        counter += 1
        
        print(counter)
        counter += count_expansions(next_child)
        print("\n")
        print(node)
        print(counter)
        print("\n")
    
    return counter
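A more compact way to count expansions, without the debugging output, is to count every node that has at least one child. The sketch below assumes a derivation tree represented as nested `(symbol, children)` tuples, which is a stand-in for the `DerivationTree` objects used here; the example tree for `cos(7.26)` is written out by hand.

```python
def count_expansions(tree) -> int:
    """Count the nodes of a derivation tree that have at least one child."""
    symbol, children = tree
    if not children:
        return 0  # terminal symbol: nothing was expanded
    return 1 + sum(count_expansions(child) for child in children)

# Hand-written derivation tree for the input 'cos(7.26)'.
tree = ("<start>", [
    ("<function>", [("cos", [])]),
    ("(", []),
    ("<term>", [("<value>", [
        ("<integer>", [("<digit>", [("7", [])])]),
        (".", []),
        ("<integer>", [
            ("<digit>", [("2", [])]),
            ("<integer>", [("<digit>", [("6", [])])]),
        ]),
    ])]),
    (")", []),
])

print(count_expansions(tree))  # 10
```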

In [92]:
# count the expansions of the two highest-ranked inputs
for inp in all_generated_inputs_sorted[:2]:
    node, children = inp.tree # derivation tree; node -> children; example: <start> -> <function>(<term>)... 
    #print(node, children)
    counter_exp = count_expansions([inp.tree])
    print(counter_exp)

node, children = all_generated_inputs_sorted[0].tree
#print(str(all_generated_inputs_sorted[0]) + "\n")
#print(node)
#print("\n")
#print(children)
#print("\n")
for child in children:
    node, child1 =child
#    print(node)
#    print(child1)
#    print("\n")
    
    #print(f"{str(inp).ljust(40)} fitness: {inp.fitness}")

<start>
[DerivationTree('<function>', (DerivationTree('cos', (), id=99648),), id=99649), DerivationTree('(', (), id=99647), DerivationTree('<term>', (DerivationTree('<value>', (DerivationTree('<integer>', (DerivationTree('<digit>', (DerivationTree('7', (), id=99642),), id=99643),), id=99644), DerivationTree('.', (), id=99641), DerivationTree('<integer>', (DerivationTree('<digit>', (DerivationTree('2', (), id=99638),), id=99639), DerivationTree('<integer>', (DerivationTree('<digit>', (DerivationTree('6', (), id=99635),), id=99636),), id=99637)), id=99640)), id=99645),), id=99646), DerivationTree(')', (), id=99634)]
1
<function>
[DerivationTree('cos', (), id=99648)]
1
cos
[]
1


cos
-1




<function>
1


(
[]
2


(
0


<term>
[DerivationTree('<value>', (DerivationTree('<integer>', (DerivationTree('<digit>', (DerivationTree('7', (), id=99642),), id=99643),), id=99644), DerivationTree('.', (), id=99641), DerivationTree('<integer>', (DerivationTree('<digit>', (DerivationTree('2', (), id=996

### Incorporating Custom Fitness Functions

The fitness function plays a crucial role in guiding the evolution process of our fuzzing inputs. A well-crafted fitness function can effectively direct the search towards the most promising regions of the input space.

To create your own fitness function, define a function that takes an `Input` instance and returns a float value. The return value represents the 'fitness' of the given input, with higher values indicating better fitness. Here is a simple template:

```python
from evogfuzz.input import Input

def fitness_function_XYZ(inp: Input) -> float:
    # Implement your fitness function here.
    return 0.0
```

For instance, suppose we're interested in inputs that invoke the cosine function in our calculator. We could define a fitness function `fitness_function_cos` that assigns a high fitness value to inputs containing 'cos'. (**Note that this might not be the best fitness function to find new exceptions.**)

In [21]:
from evogfuzz.input import Input

def fitness_function_cos(inp: Input) -> float:
    if 'cos' in str(inp):
        return 1.0
    else:
        return 0.0

Once your fitness function is defined, you can incorporate it into the `EvoGFuzz` instance by passing it as the `fitness_function` argument. 

In [22]:
epp = EvoGFuzz(
    grammar=CALCGRAMMAR,
    oracle=oracle,
    inputs=initial_inputs,
    fitness_function=fitness_function_cos,
    iterations=10
)

found_exception_inputs = epp.fuzz()

print(f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")

for inp in found_exception_inputs:
    print(str(inp))

EvoGFuzz found 14 bug-triggering inputs!
sqrt(-7.66)
sqrt(-1.12)
sqrt(-766.2)
sqrt(-6.1667)
sqrt(-7.2857)
sqrt(-6.2)
sqrt(-7272.11)
sqrt(-6)
sqrt(-22.472)
sqrt(-211272222.266)
sqrt(-2)
sqrt(-67.5)
sqrt(-66.72)
sqrt(-7)


This way, the evolutionary grammar-based fuzzing process is now guided by your custom fitness function, focusing more on the areas you deem critical.

#### Evaluating Inputs Based on Custom Fitness Function

When utilizing a custom fitness function, such as `fitness_function_cos` in our case, we expect inputs containing 'cos' to achieve the highest fitness scores. This is because our fitness function assigns a score of 1.0 to any input that includes 'cos'.

To confirm this behavior, we retrieve all inputs generated during the fuzzing process using the `get_all_inputs()` method and sort these inputs based on their fitness scores.

In [23]:
all_generated_inputs = epp.get_all_inputs()
all_generated_inputs_sorted = sorted(all_generated_inputs, key=lambda inp: inp.fitness, reverse=True)

Let's display these sorted inputs along with their fitness scores. The inputs that contain 'cos' should appear first, demonstrating their high fitness value.

In [24]:
# show the 20 inputs with the highest fitness scores
for inp in all_generated_inputs_sorted[:20]:
    print(f"{str(inp).ljust(40)} fitness: {inp.fitness}")

cos(7.26)                                fitness: 1.0
cos(19)                                  fitness: 1.0
cos(-2762)                               fitness: 1.0
cos(7.6)                                 fitness: 1.0
cos(76397.53)                            fitness: 1.0
cos(7126.3)                              fitness: 1.0
cos(757.72)                              fitness: 1.0
cos(162821.87)                           fitness: 1.0
cos(42)                                  fitness: 1.0
cos(5.2)                                 fitness: 1.0
cos(4)                                   fitness: 1.0
cos(1.412)                               fitness: 1.0
cos(42.1)                                fitness: 1.0
cos(224.24)                              fitness: 1.0
cos(-6)                                  fitness: 1.0
cos(7.71)                                fitness: 1.0
cos(942.41)                              fitness: 1.0
cos(-9.2)                                fitness: 1.0
cos(7.1)                    

The resulting output confirms that the custom fitness function works as intended: inputs containing 'cos' rank first. It shows how we can guide the evolutionary grammar-based fuzzing process towards specific regions of the input space, thereby facilitating targeted exploration and bug discovery.