# Effective Software Testing - Fuzzing Assignment
Both the description and the solution of this assignment live in this interactive Jupyter notebook. Your deliverable should contain
- a `fuzzing_assignment.ipynb`, the edited version of this file containing your solution.
- an `assignment_utils.py`, which we provide and which holds helper classes for the assignment. Generally speaking, you should not edit this file, but see below for details.

The assignment *can* be solved only by filling in the cells marked with the comment "# Your code here", **but** many solutions exist, and some of them may involve editing other cells or even `assignment_utils.py`. The "# Your code here" comments are **optional**; they just indicate an exemplary solution that we will make available to you after the deadline. Feel free to edit any other cell or file, or even to install new packages (in which case you must list them in a comment at the beginning). Finally, any comments, documentation, or other remarks in natural language should be placed inside this notebook, in a markdown cell.

For any question about the assignment, feel free to reach out to `konstantinos.kitsios@uzh.ch`.

Read below for more, and good luck!

# Installation
Python 3.9 is required for this assignment.
To install the required packages, create a new Python virtual environment and install the required libraries with `pip` by running
```
pip install -r requirements.txt
```

If you encounter issues with the python version or the installation of requirements.txt, please reach out to `konstantinos.kitsios@uzh.ch`.

# Setup
The first part of the notebook sets up the environment by implementing classes/functions we talked about during the lecture.

In [24]:
from urllib.parse import urlparse
import random
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu
import pickle
import hashlib
from typing import Tuple, List, Callable, Set, Any, Dict

In [4]:
from assignment_utils import Fuzzer, Runner, FunctionRunner, FunctionCoverageRunner, Coverage, population_coverage, Location

Remember the `urlparse()` function from the lecture? We will use it again. You might also remember that it never crashes, even with invalid URLs, so we will use the corresponding `urlparse_that_crashes()`, as we did in the lecture.

In [5]:
url = "https://uzh.ch"
urlparse(url)

ParseResult(scheme='https', netloc='uzh.ch', path='', params='', query='', fragment='')

In [6]:
def urlparse_that_crashes(url: str) -> bool:
    supported_schemes = ["http", "https"]
    result = urlparse(url)
    if result.scheme not in supported_schemes:
        raise ValueError("Scheme must be one of " + repr(supported_schemes))
    if result.netloc == '':
        raise ValueError("Host must be non-empty")

    return True
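
To see both failure modes in action, the sketch below wraps the function in a small helper. The helper name `classify_url` is an illustrative assumption and not part of the assignment; the function definition is repeated only so that the snippet runs on its own.

```python
from urllib.parse import urlparse

def urlparse_that_crashes(url: str) -> bool:
    # Same checks as in the cell above, repeated for self-containment
    supported_schemes = ["http", "https"]
    result = urlparse(url)
    if result.scheme not in supported_schemes:
        raise ValueError("Scheme must be one of " + repr(supported_schemes))
    if result.netloc == '':
        raise ValueError("Host must be non-empty")
    return True

def classify_url(url: str) -> str:
    """Hypothetical helper: report whether the input passes or which check fails."""
    try:
        urlparse_that_crashes(url)
        return "PASS"
    except ValueError as err:
        return f"FAIL ({err})"

for candidate in ["https://uzh.ch", "ftp://uzh.ch", "https://", "uzh.ch"]:
    print(f"{candidate!r}: {classify_url(candidate)}")
```

An unsupported scheme is rejected first; a supported scheme with an empty host fails the second check.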

Now, we will copy the code of the second live coding session from the lecture. The code below implements the mutation part of a mutation-based blackbox fuzzer.

In [7]:
def mutate(s, num_of_mutations = 1):
    mutators = [delete_random, insert_random, flip_random]

    for i in range(num_of_mutations):
        mutator_ind = random.randint(0, 2)
        selected_mutator = mutators[mutator_ind]
        s = selected_mutator(s)

    return s

In [8]:
def delete_random(s):
    pos_to_delete = random.randint(0, len(s) - 1)
    return s[:pos_to_delete-1] + s[pos_to_delete:]
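
A quick deterministic check of the slice expression in `delete_random()` can be useful, because a negative index changes its meaning at position 0. The sketch below fixes the position instead of drawing it at random. The names `delete_at` and `delete_at_fixed` are illustrative, and the second variant is one possible correction rather than the lecture's code.

```python
def delete_at(s: str, pos: int) -> str:
    """Slice expression used by delete_random() above, with the position made explicit."""
    return s[:pos - 1] + s[pos:]

def delete_at_fixed(s: str, pos: int) -> str:
    """One possible correction: drop exactly the character at index pos."""
    return s[:pos] + s[pos + 1:]

for pos in range(3):
    print(pos, repr(delete_at("abc", pos)), repr(delete_at_fixed("abc", pos)))
```

With `pos = 0` the original expression evaluates `s[:-1] + s`, so the string grows instead of shrinking; for other positions it removes the character at `pos - 1`, which means the last character is never deleted.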

In [9]:
def insert_random(s):
    pos_to_insert = random.randint(0, len(s)-1)
    char_to_insert = chr(random.randint(32, 64))
    return s[:pos_to_insert-1] + char_to_insert + s[pos_to_insert-1:]

In [10]:
def flip_random(s):
    pos_to_flip  = random.randint(0, len(s)-1)
    byte_to_flip = ord(s[pos_to_flip]) # char --> byte
    bit_to_flip  = 1 << random.randint(0, 6) # single-bit mask within the 7 low bits, e.g. 0010000
    flipped_char = chr(byte_to_flip ^ bit_to_flip)

    return s[:pos_to_flip] + flipped_char + s[pos_to_flip+1:]
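
To make the XOR step concrete, the sketch below flips a chosen bit of a single character. `flip_bit` is an illustrative helper, not part of the lecture code; it isolates the `ord` / XOR / `chr` sequence used in `flip_random()`.

```python
def flip_bit(ch: str, bit: int) -> str:
    """Flip one bit (0 = least significant) in the code point of a character."""
    return chr(ord(ch) ^ (1 << bit))

print(repr(flip_bit("a", 5)))  # 97 ^ 32 = 65
print(repr(flip_bit("/", 6)))  # 47 ^ 64 = 111
print(repr(flip_bit(" ", 5)))  # 32 ^ 32 = 0, a non-printable character
```

Flipping the same bit twice restores the original character, and some flips leave the printable range, which is part of why mutated URLs can look unusual.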

To put the function `mutate()` to use, we will implement the mutation-based blackbox fuzzer in the `MutationBlackboxFuzzer` class as follows. The base class `Fuzzer` is just a placeholder that you can find in `assignment_utils.py`.

In [11]:
class MutationBlackboxFuzzer(Fuzzer):
    """Base class for mutational fuzzing"""

    def __init__(self, seed: List[str],
                 min_mutations: int = 2,
                 max_mutations: int = 10) -> None:
        """Constructor.
        `seed` - a list of (input) strings to mutate.
        `min_mutations` - the minimum number of mutations to apply.
        `max_mutations` - the maximum number of mutations to apply.
        """
        self.seed = seed
        self.min_mutations = min_mutations
        self.max_mutations = max_mutations
        self.reset()

    def reset(self) -> None:
        """Set population to initial seed.
        To be overloaded in subclasses."""
        self.population = self.seed
        self.seed_index = 0


    def mutate(self, inp: str) -> str:
        return mutate(inp)

    def create_candidate(self) -> str:
        """Create a new candidate by mutating a population member"""
        candidate = random.choice(self.population)
        trials = random.randint(self.min_mutations, self.max_mutations)
        for i in range(trials):
            candidate = self.mutate(candidate)
        return candidate

    def fuzz(self) -> str:
        # If we did not return all the seeds, return the next seed. Only after we run 
        # out of seeds, we return a mutated version of the seed
        if self.seed_index < len(self.seed):
            # Still seeding
            self.inp = self.seed[self.seed_index]
            self.seed_index += 1
        else:
            # Mutating
            self.inp = self.create_candidate()
        return self.inp
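
The schedule implemented by `fuzz()` (every seed once, then only mutants) can be isolated in a few lines. The sketch below is a simplified stand-alone model; `seed_then_mutate` and its parameters are hypothetical names, and `str.upper` stands in for a real mutator so that the output is easy to read.

```python
import random
from typing import Callable, Iterator, List

def seed_then_mutate(seeds: List[str],
                     mutate_fn: Callable[[str], str],
                     trials: int,
                     rng: random.Random) -> Iterator[str]:
    """Yield each seed once, then mutated copies of randomly chosen seeds."""
    for i in range(trials):
        if i < len(seeds):
            yield seeds[i]                      # still seeding
        else:
            yield mutate_fn(rng.choice(seeds))  # mutating

inputs = list(seed_then_mutate(["a", "b"], str.upper, trials=5, rng=random.Random(0)))
print(inputs)
```

The first `len(seeds)` inputs are the unmodified seeds, so a run with ten seeds spends its first ten trials on them before any mutation happens.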

In [12]:
seed_input = "http://www.google.com/search?q=fuzzing"
mutation_fuzzer = MutationBlackboxFuzzer(seed=[seed_input])

Remember the `Runner` class we used in the lecture to measure the coverage achieved by a specific input. Instead of calling
```python
result = urlparse_that_crashes("https://uzh.ch")
```

you can instead do
```python
http_runner = FunctionCoverageRunner(urlparse_that_crashes) # define the function here
result, isCrash = http_runner.run("https://uzh.ch") # define the input here
```
which also gives you access to the coverage achieved:
```python
list(http_runner.coverage())
```
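
How `FunctionCoverageRunner` collects coverage is defined in `assignment_utils.py` and not shown here. As a rough mental model only, line coverage in Python can be gathered with `sys.settrace`. The helper `trace_lines` and the toy function `classify` below are assumptions for illustration, not the assignment's implementation.

```python
import sys
from typing import Any, Callable, Set, Tuple

def trace_lines(fn: Callable[..., Any], *args: Any) -> Set[Tuple[str, int]]:
    """Run fn(*args) and return the (function name, line number) pairs executed."""
    covered: Set[Tuple[str, int]] = set()

    def tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_name, frame.f_lineno))
        return tracer  # keep tracing inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        fn(*args)
    except Exception:
        pass  # a crash still leaves partial coverage behind
    finally:
        sys.settrace(previous)
    return covered

def classify(n: int) -> str:
    if n < 0:
        return "negative"
    return "non-negative"

print(len(trace_lines(classify, 1)), len(trace_lines(classify, -1)))
```

Two inputs that take different branches produce different sets of locations, which is the signal a coverage-guided fuzzer relies on.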

In [13]:
http_runner = FunctionCoverageRunner(urlparse_that_crashes) # Only initialize it once per function!
http_runner.run("https://foo.bar/")

len(list(http_runner.coverage()))

51

Putting it all together, we can now run our fuzzer with a single seed for 1000 trials and measure the number of covered lines like this:

In [14]:
seed_input = "https://uzh.ch"
mutation_fuzzer = MutationBlackboxFuzzer(seed=[seed_input])

mutation_fuzzer.runs(http_runner, trials=1000)
print(mutation_fuzzer.population)

print("%d lines covered" % len(http_runner.coverage()))

['https://uzh.ch']
43 lines covered


# Task 01: Investigating the Effect of Starting Seeds
In the above run, our fuzzer covers 43 lines using only one starting seed ("https://uzh.ch"). Your first task is to investigate the effect of the starting seed on fuzzer performance. Recall that fuzzer performance is usually measured by the coverage the fuzzer achieves (higher coverage => higher probability of exposing bugs => better performance).

Compare the coverage of the `MutationBlackboxFuzzer` with one seed vs ten seeds. The choice of the ten seeds is up to you: you can either handpick ten random URLs, or write a small script that randomly navigates the web and returns ten URLs for more diversity.

Comparing the performance of two fuzzers (or the same fuzzer with different configurations in our case) is a nuanced topic due to the inherent randomness of fuzzing. For example, if we run the `MutationBlackboxFuzzer` with one seed and cover 40 lines, and then run it with ten seeds and cover 42 lines, can we claim that the more starting seeds the better? Probably not.
To make claims based on empirical data, we have to run statistical tests. In fuzzing, each fuzzer configuration is typically run 30 times with 10,000 trials per run (i.e., you run the above cell 30 times with `trials=10000`), yielding an array of 30 coverage values for each fuzzer. The two arrays are then compared using the Mann-Whitney U test with the null hypothesis that the values of the first array (i.e., the coverage of the first fuzzer) are not smaller than those of the second array. If the p-value of the test is <0.05, we reject the null hypothesis and claim that the first fuzzer achieves less coverage than the second one (at significance level 0.05, i.e., confidence level 1-0.05 = 95%).
You can run the Mann-Whitney U Test in python with the following code:
```python
from scipy.stats import mannwhitneyu
u_stat, p_value = mannwhitneyu(c1, c2, alternative='less')
if p_value < 0.05:
    print("Values of c1 are smaller than those in c2 with confidence level 95%")
```
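
To build intuition for what the test measures, the U statistic of the first sample can be computed by hand: it counts the pairs in which a value from the first sample exceeds a value from the second, with ties counting one half. The sketch below uses made-up toy numbers, not measured coverage, and `u_statistic` is an illustrative helper. Recent SciPy versions appear to report this quantity for the first sample, but treat that as an assumption to check against your installed version.

```python
from typing import Sequence

def u_statistic(first: Sequence[float], second: Sequence[float]) -> float:
    """Count pairs (a, b) with a > b; ties contribute 0.5 each."""
    u = 0.0
    for a in first:
        for b in second:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Toy values for illustration only
c1 = [39, 42, 42, 43]
c2 = [42, 50, 51, 52]
print(u_statistic(c1, c2), "out of", len(c1) * len(c2), "pairs")
```

A value of U close to 0 means that almost every value of `c1` lies below almost every value of `c2`, which is the situation in which `alternative='less'` yields a small p-value.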

In [15]:
single_seed = ["https://uzh.ch"]
ten_seeds = [
    "http://www.google.com",
    "https://www.youtube.com",
    "http://www.wikipedia.org",
    "https://www.github.com",
    "http://www.stackoverflow.com",
    "https://www.interdiscount.ch",
    "http://www.digitec.ch",
    "https://www.autoscout24.ch",
    "http://www.sbb.ch",
    "https://www.rhb.ch"
]

http_runner = FunctionCoverageRunner(urlparse_that_crashes)

num_runs = 30
trials_per_run = 10000

coverage_single_seed = []
coverage_ten_seeds = []

print(f"Running {num_runs} runs with {trials_per_run} trials each:")

print("Single-seed fuzzer:")
for i in range(num_runs):
    http_runner = FunctionCoverageRunner(urlparse_that_crashes)
    mutation_fuzzer_single = MutationBlackboxFuzzer(seed=single_seed)
    mutation_fuzzer_single.runs(http_runner, trials=trials_per_run)
    coverage_single_seed.append(len(http_runner.coverage()))
    print(f"{i+1}/{num_runs} Coverage: {coverage_single_seed[-1]}")

print("\nTen-seed fuzzer:")
for i in range(num_runs):
    http_runner = FunctionCoverageRunner(urlparse_that_crashes)
    mutation_fuzzer_ten = MutationBlackboxFuzzer(seed=ten_seeds)
    mutation_fuzzer_ten.runs(http_runner, trials=trials_per_run)
    coverage_ten_seeds.append(len(http_runner.coverage()))
    print(f"{i+1}/{num_runs} Coverage: {coverage_ten_seeds[-1]}")


print("\nMann-Whitney U Test:")
u_stat, p_value = mannwhitneyu(coverage_single_seed, coverage_ten_seeds, alternative='less')

print(f"u_stat: {u_stat}")
print(f"p_value: {p_value}")

if p_value < 0.05:
    print("Based on the test, the coverage achieved with the ten-seed fuzzer is greater than the coverage achieved with the single-seed fuzzer.")
else:
    print("Based on the test, the coverage achieved with the ten-seed fuzzer is not greater than the coverage achieved with the single-seed fuzzer.")

print(f"\nSingle-seed fuzzer average coverage: {sum(coverage_single_seed) / num_runs}")
print(f"Ten-seed fuzzer average coverage: {sum(coverage_ten_seeds) / num_runs}")

Running 30 runs with 10000 trials each:
Single-seed fuzzer:
1/30 Coverage: 39
2/30 Coverage: 43
3/30 Coverage: 52
4/30 Coverage: 42
5/30 Coverage: 42
6/30 Coverage: 39
7/30 Coverage: 43
8/30 Coverage: 51
9/30 Coverage: 42
10/30 Coverage: 39
11/30 Coverage: 51
12/30 Coverage: 42
13/30 Coverage: 43
14/30 Coverage: 42
15/30 Coverage: 51
16/30 Coverage: 43
17/30 Coverage: 52
18/30 Coverage: 42
19/30 Coverage: 43
20/30 Coverage: 42
21/30 Coverage: 43
22/30 Coverage: 46
23/30 Coverage: 42
24/30 Coverage: 50
25/30 Coverage: 42
26/30 Coverage: 39
27/30 Coverage: 51
28/30 Coverage: 39
29/30 Coverage: 42
30/30 Coverage: 43

Ten-seed fuzzer:
1/30 Coverage: 39
2/30 Coverage: 42
3/30 Coverage: 42
4/30 Coverage: 39
5/30 Coverage: 42
6/30 Coverage: 40
7/30 Coverage: 42
8/30 Coverage: 42
9/30 Coverage: 50
10/30 Coverage: 42
11/30 Coverage: 42
12/30 Coverage: 43
13/30 Coverage: 53
14/30 Coverage: 51
15/30 Coverage: 47
16/30 Coverage: 51
17/30 Coverage: 51
18/30 Coverage: 51
19/30 Coverage: 42
20/30 Cov

# Task 02: Investigating the Effect of More Sophisticated Mutations
The mutation function we implemented above (and in the lecture) is:
```python
def mutate(s, num_of_mutations = 1):
    mutators = [delete_random, insert_random, flip_random]

    for i in range(num_of_mutations):
        mutator_ind = random.randint(0, 2)
        selected_mutator = mutators[mutator_ind]
        s = selected_mutator(s)

    return s
```

Your task is to implement a more sophisticated mutation strategy by:
1. Extending the `mutate(s, num_of_mutations = 1)` function above to support three more mutators
    - Addition of small integers to a random byte
    - Replacing a random byte with a completely random byte value
    - Block duplication (insert a random block of characters of `s` in a random position of `s`)
2. Extending the `mutate(self, s)` method of `MutationBlackboxFuzzer` so that with probability `p` it calls the `mutate(s, num_of_mutations = 1)` function above and with probability `1-p` mutates the input using **splicing**. Splicing (a.k.a. crossover) refers to selecting another seed from the existing population and merging a random block of the new seed with the current seed at a random position.
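
One way to read the splicing description is sketched below: take a random non-empty block of the second seed and insert it at a random position of the current seed. This is an interpretation under stated assumptions, not a reference solution; the function name `splice` and the explicit `rng` parameter are illustrative choices.

```python
import random

def splice(current: str, other: str, rng: random.Random) -> str:
    """Insert a random non-empty block of `other` at a random position of `current`."""
    if not other:
        return current  # nothing to splice in
    start = rng.randint(0, len(other) - 1)
    end = rng.randint(start + 1, len(other))  # exclusive end, so the block is non-empty
    block = other[start:end]
    pos = rng.randint(0, len(current))        # len(current) means "append at the end"
    return current[:pos] + block + current[pos:]

rng = random.Random(1)
print(splice("https://uzh.ch", "http://www.google.com", rng))
```

Passing a `random.Random` instance instead of using the module-level functions makes individual runs reproducible, which helps when debugging a mutator.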

Compare the performance of your new fuzzer with the previous fuzzer over 30 runs of 5000 trials each. Use a single starting seed `s=["https://www.swissinfo.ch"]`. What can you claim about the effect of more complex mutations?

Note: For more information about the exact mutations used in the most popular fuzzer, AFL, you can have a look at this blog by the AFL creator: https://lcamtuf.blogspot.com/2014/08/binary-fuzzing-strategies-what-works.html

In [18]:
# Extended mutate(s, num_of_mutations = 1) function
def delete_random(s):
    if not s:
        return s
    if len(s) == 1:
        return ""
    pos_to_delete = random.randint(0, len(s) - 1)
    return s[:pos_to_delete] + s[pos_to_delete+1:]

def insert_random(s):
    char_to_insert = chr(random.randint(32, 126))
    if not s:
        return char_to_insert
    pos_to_insert = random.randint(0, len(s))
    return s[:pos_to_insert] + char_to_insert + s[pos_to_insert:]

def flip_random(s):
    if not s:
        return s
    pos_to_flip  = random.randint(0, len(s)-1)
    byte_to_flip = ord(s[pos_to_flip]) # char --> byte
    bit_to_flip  = 1 << random.randint(0, 6) # single-bit mask within the 7 low bits, e.g. 0010000
    flipped_char = chr(byte_to_flip ^ bit_to_flip)

    return s[:pos_to_flip] + flipped_char + s[pos_to_flip+1:]

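def add_small_int(s):
    """Add a small non-zero integer to the code of a random character."""
    if not s:
        return s
    pos_to_add = random.randint(0, len(s) - 1)
    byte_to_add = ord(s[pos_to_add])

    # Draw from -10..10, then shift away from zero so the character always
    # changes: the effective delta lies in [-11, -1] or [1, 11]
    add_val = random.randint(-10, 10)
    if add_val >= 0:
        add_val += 1
    else:
        add_val -= 1

    # Wrap around so the result stays within the byte range 0..255
    new_byte = (byte_to_add + add_val) % 256
    new_char = chr(new_byte)

    return s[:pos_to_add] + new_char + s[pos_to_add+1:]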

def replace_random_byte(s):
    if not s: # Handle empty string input
        return s
    pos_to_replace = random.randint(0, len(s) - 1)

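    # Random printable ASCII character (codes 32..126); note that this is
    # narrower than a fully random byte value (0..255)
    random_byte = random.randint(32, 126)
    replaced_char = chr(random_byte)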

    return s[:pos_to_replace] + replaced_char + s[pos_to_replace+1:]


def block_duplicate(s):
    # A string of size at least 2 is needed
    if len(s) < 2:
        return s

    # Select a random start and end position for the block
    block_start = random.randint(0, len(s) - 1)
    block_end = random.randint(block_start, len(s) - 1)
    block_to_duplicate = s[block_start : block_end + 1]

    # Select a random position to insert the duplicated block
    insert_pos = random.randint(0, len(s))

    return s[:insert_pos] + block_to_duplicate + s[insert_pos:]

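# Modified mutate function.
# Note: this rebinds the module-level name `mutate` (and the mutators above), so
# MutationBlackboxFuzzer.mutate(), which calls mutate(inp), also uses these six
# mutators once this cell has run.
def mutate(s, num_of_mutations = 1):
    mutators = [delete_random, insert_random, flip_random, add_small_int, replace_random_byte, block_duplicate]

    for i in range(num_of_mutations):
        # Pick one of the mutators uniformly at random
        selected_mutator = random.choice(mutators)
        s = selected_mutator(s)

    return s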


# Extended mutate(self, s) method in MutationBlackboxFuzzer
class ExtendedMutationBlackboxFuzzer(MutationBlackboxFuzzer):
    """Extended mutation-based fuzzer with splicing"""

    def __init__(self, seed: List[str], min_mutations: int = 2, max_mutations: int = 10, splicing_prob: float = 0.5) -> None:
        """Constructor.
        `seed` - a list of (input) strings to mutate.
        `min_mutations` - the minimum number of mutations to apply.
        `max_mutations` - the maximum number of mutations to apply.
        `splicing_prob` - the probability of performing splicing instead of pure mutation.
        """
        super().__init__(seed, min_mutations, max_mutations)
        self.splicing_prob = splicing_prob

        # Ensure the initial seeds are added to the population for splicing
        self.population = list(self.seed)


    def mutate(self, inp: str) -> str:
        return mutate(inp)

    def create_candidate(self) -> str:
        """Create a new candidate using either mutation or splicing."""
        candidate = random.choice(self.population)

        if random.random() < self.splicing_prob and len(self.population) > 1:
            # Splicing: look for a different population member, giving up after
            # a bounded number of attempts (all members may be equal strings)
            max_attempts = 100
            for _ in range(max_attempts):
                other = random.choice(self.population)
                if other != candidate:
                    splice_point1 = random.randint(0, len(candidate))
                    splice_point2 = random.randint(0, len(other))
                    # Join a prefix of the first seed with a suffix of the second
                    candidate = candidate[:splice_point1] + other[splice_point2:]
                    break

        # Both branches (and the fallback when no partner is found) then apply
        # the usual random number of mutations
        trials = random.randint(self.min_mutations, self.max_mutations)
        for i in range(trials):
            candidate = self.mutate(candidate)
        return candidate

    def run(self, runner: FunctionCoverageRunner) -> Any:
        """Run function(inp) while tracking coverage and update population."""
        result, outcome = super().run(runner)
        self.population.append(self.inp)


        return result

    def fuzz(self) -> str:
        if self.seed_index < len(self.seed):
            self.inp = self.seed[self.seed_index]
            self.seed_index += 1
        else:
            self.inp = self.create_candidate()

        return self.inp


# Analysis

single_seed_task2 = ["https://www.swissinfo.ch"]

num_runs_task2 = 30
trials_per_run_task2 = 5000

coverage_original_fuzzer = []
coverage_extended_fuzzer = []

print(f"Running {num_runs_task2} runs with {trials_per_run_task2} trials each:")

print("Original MutationBlackboxFuzzer:")
for i in range(num_runs_task2):
    http_runner_orig = FunctionCoverageRunner(urlparse_that_crashes)
    original_fuzzer = MutationBlackboxFuzzer(seed=single_seed_task2)
    original_fuzzer.runs(http_runner_orig, trials=trials_per_run_task2)
    coverage_original_fuzzer.append(len(http_runner_orig.coverage()))
    print(f"{i+1}/{num_runs_task2} Coverage: {coverage_original_fuzzer[-1]}")

print("\nExtendedMutationBlackboxFuzzer:")
splicing_probability = 0.5
for i in range(num_runs_task2):
    http_runner_extended = FunctionCoverageRunner(urlparse_that_crashes)
    extended_fuzzer = ExtendedMutationBlackboxFuzzer(seed=single_seed_task2, splicing_prob=splicing_probability)
    extended_fuzzer.runs(http_runner_extended, trials=trials_per_run_task2)
    coverage_extended_fuzzer.append(len(http_runner_extended.coverage()))
    print(f"{i+1}/{num_runs_task2} Coverage: {coverage_extended_fuzzer[-1]}")


print("\nMann-Whitney U Test:")
u_stat_task2, p_value_task2 = mannwhitneyu(coverage_original_fuzzer, coverage_extended_fuzzer, alternative='less')

print(f"u_stat: {u_stat_task2}")
print(f"p_value: {p_value_task2}")

if p_value_task2 < 0.05:
    print("Based on the test, the coverage achieved with the original fuzzer is smaller than the coverage achieved with the ExtendedMutationBlackboxFuzzer fuzzer.")
else:
    print("Based on the test, the coverage achieved with the original fuzzer is not smaller than the coverage achieved with the ExtendedMutationBlackboxFuzzer fuzzer.")

print(f"\nOriginal fuzzer average coverage: {sum(coverage_original_fuzzer) / num_runs_task2}")
print(f"ExtendedMutationBlackboxFuzzer average coverage: {sum(coverage_extended_fuzzer) / num_runs_task2}")


Running 30 runs with 5000 trials each:
Original MutationBlackboxFuzzer:
1/30 Coverage: 39
2/30 Coverage: 43
3/30 Coverage: 51
4/30 Coverage: 42
5/30 Coverage: 43
6/30 Coverage: 43
7/30 Coverage: 43
8/30 Coverage: 43
9/30 Coverage: 51
10/30 Coverage: 42
11/30 Coverage: 52
12/30 Coverage: 51
13/30 Coverage: 43
14/30 Coverage: 43
15/30 Coverage: 42
16/30 Coverage: 50
17/30 Coverage: 42
18/30 Coverage: 50
19/30 Coverage: 42
20/30 Coverage: 50
21/30 Coverage: 42
22/30 Coverage: 50
23/30 Coverage: 42
24/30 Coverage: 42
25/30 Coverage: 39
26/30 Coverage: 43
27/30 Coverage: 43
28/30 Coverage: 43
29/30 Coverage: 43
30/30 Coverage: 50

ExtendedMutationBlackboxFuzzer:
1/30 Coverage: 39
2/30 Coverage: 40
3/30 Coverage: 39
4/30 Coverage: 43
5/30 Coverage: 52
6/30 Coverage: 42
7/30 Coverage: 41
8/30 Coverage: 39
9/30 Coverage: 43
10/30 Coverage: 46
11/30 Coverage: 40
12/30 Coverage: 42
13/30 Coverage: 42
14/30 Coverage: 42
15/30 Coverage: 39
16/30 Coverage: 43
17/30 Coverage: 40
18/30 Coverage: 42
1

# Task 03: Compare With Mutation-Based Greybox Fuzzer
In the lecture, to improve on the blackbox fuzzer, we implemented the greybox fuzzer below, which dynamically adds a mutated input to the seed queue if that input covers a new path.
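
The core of that feedback loop is a small bookkeeping rule, sketched here outside any class. `keep_if_new` and the `CoveredLine` alias are hypothetical names, and the coverage sets are hand-written toy values rather than real measurements.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

CoveredLine = Tuple[str, int]  # assumed shape: (function name, line number)

def keep_if_new(population: List[str],
                coverages_seen: Set[FrozenSet[CoveredLine]],
                inp: str,
                coverage: Iterable[CoveredLine],
                passed: bool) -> bool:
    """Add inp to the population if it passed and its coverage set is new."""
    cov = frozenset(coverage)
    if passed and cov not in coverages_seen:
        coverages_seen.add(cov)
        population.append(inp)
        return True
    return False

population: List[str] = []
seen: Set[FrozenSet[CoveredLine]] = set()
print(keep_if_new(population, seen, "good", [("f", 1)], passed=True))            # new path
print(keep_if_new(population, seen, "gooe", [("f", 1)], passed=True))            # same path
print(keep_if_new(population, seen, "bood", [("f", 1), ("f", 2)], passed=True))  # new path
print(population)
```

Only inputs that reach a previously unseen coverage set survive, so the population grows toward inputs that exercise different parts of the program.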

Your task is to compare the performance of the (already implemented) greybox fuzzer with that of the blackbox fuzzer over 30 runs of 5000 trials each.
Use the same single starting seed as the previous experiment.

What can you claim about the performance of the greybox fuzzer?

In [19]:
class MutationGreyboxFuzzer(MutationBlackboxFuzzer):
    """Fuzz with mutated inputs based on coverage"""

    def reset(self) -> None:
        super().reset()
        self.coverages_seen: Set[frozenset] = set()
        # Now empty; we fill this with seed in the first fuzz runs
        self.population = []

    def run(self, runner: FunctionCoverageRunner) -> Any:
        """Run function(inp) while tracking coverage.
           If we reach new coverage,
           add inp to population and its coverage to coverages_seen
        """
        result, outcome = super().run(runner)
        new_coverage = frozenset(runner.coverage())
        if outcome == Runner.PASS and new_coverage not in self.coverages_seen:
            # We have new coverage
            self.population.append(self.inp)
            self.coverages_seen.add(new_coverage)

        return result

In [20]:
single_seed_task3 = ["https://www.swissinfo.ch"]

num_runs_task3 = 30
trials_per_run_task3 = 5000

coverage_blackbox_fuzzer_task3 = []
coverage_greybox_fuzzer_task3 = []

print(f"Running {num_runs_task3} runs with {trials_per_run_task3} trials each for Task 03...")

print("MutationBlackboxFuzzer:")
for i in range(num_runs_task3):
    http_runner_blackbox = FunctionCoverageRunner(urlparse_that_crashes)
    blackbox_fuzzer = MutationBlackboxFuzzer(seed=single_seed_task3)
    blackbox_fuzzer.runs(http_runner_blackbox, trials=trials_per_run_task3)
    coverage_blackbox_fuzzer_task3.append(len(http_runner_blackbox.coverage()))
    print(f"{i+1}/{num_runs_task3} Coverage: {coverage_blackbox_fuzzer_task3[-1]}")

print("\nMutationGreyboxFuzzer:")
for i in range(num_runs_task3):
    http_runner_greybox = FunctionCoverageRunner(urlparse_that_crashes)
    greybox_fuzzer = MutationGreyboxFuzzer(seed=single_seed_task3)
    greybox_fuzzer.runs(http_runner_greybox, trials=trials_per_run_task3)
    coverage_greybox_fuzzer_task3.append(len(http_runner_greybox.coverage()))
    print(f"{i+1}/{num_runs_task3} Coverage: {coverage_greybox_fuzzer_task3[-1]}")


print("\nMann-Whitney U Test:")
u_stat_task3, p_value_task3 = mannwhitneyu(coverage_blackbox_fuzzer_task3, coverage_greybox_fuzzer_task3, alternative='less')

print(f"u_stat: {u_stat_task3}")
print(f"p_value: {p_value_task3}")

if p_value_task3 < 0.05:
    print("Based on the test, the coverage achieved with the MutationBlackboxFuzzer is statistically smaller than the coverage achieved with the MutationGreyboxFuzzer.")
else:
    print("Based on the test, there is no statistically significant evidence to claim that the coverage achieved with the MutationBlackboxFuzzer is smaller than the coverage achieved with the MutationGreyboxFuzzer.")

# Optional: Print average coverage
print(f"\nMutationBlackboxFuzzer average coverage: {sum(coverage_blackbox_fuzzer_task3) / num_runs_task3}")
print(f"MutationGreyboxFuzzer average coverage: {sum(coverage_greybox_fuzzer_task3) / num_runs_task3}")

Running 30 runs with 5000 trials each for Task 03...
MutationBlackboxFuzzer:
1/30 Coverage: 58
2/30 Coverage: 42
3/30 Coverage: 39
4/30 Coverage: 42
5/30 Coverage: 39
6/30 Coverage: 39
7/30 Coverage: 44
8/30 Coverage: 50
9/30 Coverage: 43
10/30 Coverage: 39
11/30 Coverage: 50
12/30 Coverage: 39
13/30 Coverage: 42
14/30 Coverage: 42
15/30 Coverage: 50
16/30 Coverage: 39
17/30 Coverage: 50
18/30 Coverage: 39
19/30 Coverage: 51
20/30 Coverage: 50
21/30 Coverage: 52
22/30 Coverage: 43
23/30 Coverage: 42
24/30 Coverage: 50
25/30 Coverage: 42
26/30 Coverage: 42
27/30 Coverage: 51
28/30 Coverage: 39
29/30 Coverage: 35
30/30 Coverage: 50

MutationGreyboxFuzzer:
1/30 Coverage: 43
2/30 Coverage: 51
3/30 Coverage: 43
4/30 Coverage: 47
5/30 Coverage: 49
6/30 Coverage: 46
7/30 Coverage: 39
8/30 Coverage: 43
9/30 Coverage: 59
10/30 Coverage: 61
11/30 Coverage: 56
12/30 Coverage: 64
13/30 Coverage: 63
14/30 Coverage: 59
15/30 Coverage: 47
16/30 Coverage: 53
17/30 Coverage: 42
18/30 Coverage: 56
19/30

# Task 04: Boosted Greybox Fuzzer
On page 30 of the lecture materials, we discussed boosting the performance of greybox fuzzers by prioritizing seeds that exercise rarer paths. Your task is to implement this in a class named `BoostedMutationGreyboxFuzzer` and compare its performance against the `MutationGreyboxFuzzer`.

Specifically, you must calculate the coverage achieved for each input you feed into the program-under-test, and keep track of how frequently a specific coverage is triggered (hint: you may find the function `getPathID()` below useful for this). 

Then, when selecting the next seed to mutate, you must select each seed with probability inversely proportional to the frequency of its coverage. As a result, seeds that exercise rarer paths will be selected more frequently.
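
Inverse-frequency selection can be sketched with `random.choices`, which accepts relative weights. The dictionaries below hold made-up toy counts, and `selection_weights` is an illustrative helper; the sketch assumes every seed already has a recorded path with a frequency of at least 1.

```python
import random
from typing import Dict, List

def selection_weights(population: List[str],
                      input_to_path: Dict[str, str],
                      path_frequencies: Dict[str, int]) -> List[float]:
    """Weight each seed by 1 / (how often its path has been observed)."""
    return [1.0 / path_frequencies[input_to_path[inp]] for inp in population]

# Toy bookkeeping for illustration only
population = ["good", "bood", "baod"]
input_to_path = {"good": "p0", "bood": "p1", "baod": "p2"}
path_frequencies = {"p0": 90, "p1": 9, "p2": 1}

weights = selection_weights(population, input_to_path, path_frequencies)
rng = random.Random(0)
picks = rng.choices(population, weights=weights, k=1000)
print({inp: picks.count(inp) for inp in population})
```

`random.choices` normalizes the weights internally, so they do not have to sum to 1. The seed on the rarest path (`"baod"`) dominates the selection.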

Compare the performance of your new `BoostedMutationGreyboxFuzzer` with the base `MutationGreyboxFuzzer` on two target programs:
1. On the `urlparse_that_crashes()` program using 30 runs of 5,000 trials each and the same single starting seed as above.
2. On the `crashme()` program found below, using 30 runs of 10,000 trials each and a single starting seed `s=["good"]`.

In [21]:
def getPathID(coverage: Any) -> str:
    """Returns a unique hash for the covered statements"""
    pickled = pickle.dumps(sorted(coverage))
    return hashlib.md5(pickled).hexdigest()
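
A short check shows the property that makes `getPathID()` usable as a dictionary key: the ID depends on which locations were covered, not on the order in which they are listed. The location tuples below are toy values with an assumed `(function name, line number)` shape, and the function is repeated so that the snippet runs on its own.

```python
import hashlib
import pickle
from typing import Any

def getPathID(coverage: Any) -> str:
    """Returns a unique hash for the covered statements"""
    pickled = pickle.dumps(sorted(coverage))
    return hashlib.md5(pickled).hexdigest()

# Toy locations for illustration only
a = ("crashme", 2)
b = ("crashme", 3)
c = ("crashme", 4)

same_1 = getPathID({a, b})
same_2 = getPathID([b, a])      # different order and container, same locations
different = getPathID({a, b, c})
print(same_1 == same_2, same_1 == different)
```

Because the IDs are plain strings, a `Dict[str, int]` is enough to count how often each path has been exercised.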

In [22]:
def crashme(s: str) -> None:
    if len(s) > 0 and s[0] == 'b':
        if len(s) > 1 and s[1] == 'a':
            if len(s) > 2 and s[2] == 'd':
                if len(s) > 3 and s[3] == '!':
                    raise Exception()
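
The nesting in `crashme()` is what makes it a useful target for coverage feedback: each correct leading character unlocks exactly one more branch. The sketch below measures how far an input gets; `matched_prefix_length` is an illustrative helper, and `crashme()` is repeated so that the snippet runs on its own.

```python
def crashme(s: str) -> None:
    if len(s) > 0 and s[0] == 'b':
        if len(s) > 1 and s[1] == 'a':
            if len(s) > 2 and s[2] == 'd':
                if len(s) > 3 and s[3] == '!':
                    raise Exception()

def matched_prefix_length(s: str, target: str = "bad!") -> int:
    """Number of leading characters of s that agree with target."""
    n = 0
    for actual, expected in zip(s, target):
        if actual != expected:
            break
        n += 1
    return n

for inp in ["good", "bood", "baod", "bad", "bad!"]:
    print(inp, matched_prefix_length(inp))
```

A fuzzer that keeps `"bood"` in its population needs only one further lucky mutation to reach the next branch, whereas a fuzzer that always restarts from `"good"` has to get several characters right within a single candidate.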

In [31]:
class BoostedMutationGreyboxFuzzer(MutationBlackboxFuzzer):
    """Fuzzer that weights seed selection by how rarely each seed's coverage path occurs"""

    def reset(self) -> None:
        super().reset()
        self.coverages_seen: Set[frozenset] = set()
        # Start the population from the initial seeds
        self.population = list(self.seed)
        self.path_frequencies: Dict[str, int] = {}
        self.input_to_pathid: Dict[str, str] = {}

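    def run(self, runner: FunctionCoverageRunner) -> Any:
        """Run function(inp) while tracking coverage.
           Count how often each path is hit and, as in MutationGreyboxFuzzer,
           add inp to the population when it reaches new coverage.
        """
        result, outcome = super().run(runner)
        current_inp = self.inp

        if outcome == Runner.PASS:
            current_coverage = frozenset(runner.coverage())
            path_id = getPathID(current_coverage)
            self.path_frequencies[path_id] = self.path_frequencies.get(path_id, 0) + 1
            self.input_to_pathid[current_inp] = path_id

            if current_coverage not in self.coverages_seen:
                # New path: keep the input as a seed (the initial seeds are
                # already in the population, so avoid duplicating them)
                self.coverages_seen.add(current_coverage)
                if current_inp not in self.population:
                    self.population.append(current_inp)

        return result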

    def create_candidate(self) -> str:
        if not self.population:
            print("Warning: Population is empty in create_candidate, falling back to default.")
            return self.seed[0] if self.seed else ""

        weights = []
        candidates = []
        for seed_inp in self.population:
            path_id = self.input_to_pathid.get(seed_inp)

            if path_id is None or path_id not in self.path_frequencies:
                frequency = 0
            else:
                frequency = self.path_frequencies[path_id]

            weight = 1.0 / (frequency + 1)
            weights.append(weight)
            candidates.append(seed_inp)

        selected_seed = random.choices(candidates, weights=weights, k=1)[0]
        candidate = selected_seed
        trials = random.randint(self.min_mutations, self.max_mutations)
        for i in range(trials):
            candidate = self.mutate(candidate)

        return candidate


In [35]:
print("--- Task 04: Scenario 1 (urlparse_that_crashes) ---")

single_seed_task4_scen1 = ["https://www.swissinfo.ch"]

num_runs_task4_scen1 = 30
trials_per_run_task4_scen1 = 5000

coverage_greybox_scen1 = []
coverage_boosted_greybox_scen1 = []

print(f"Running {num_runs_task4_scen1} runs with {trials_per_run_task4_scen1} trials each...")

print("MutationGreyboxFuzzer on urlparse_that_crashes:")
for i in range(num_runs_task4_scen1):
    runner_greybox_scen1 = FunctionCoverageRunner(urlparse_that_crashes)
    greybox_fuzzer_scen1 = MutationGreyboxFuzzer(seed=single_seed_task4_scen1)
    greybox_fuzzer_scen1.runs(runner_greybox_scen1, trials=trials_per_run_task4_scen1)
    coverage_greybox_scen1.append(len(runner_greybox_scen1.coverage()))
    print(f"{i+1}/{num_runs_task4_scen1} Coverage: {coverage_greybox_scen1[-1]}")

print("\nBoostedMutationGreyboxFuzzer on urlparse_that_crashes:")
for i in range(num_runs_task4_scen1):
    runner_boosted_scen1 = FunctionCoverageRunner(urlparse_that_crashes)
    boosted_greybox_fuzzer_scen1 = BoostedMutationGreyboxFuzzer(seed=single_seed_task4_scen1)
    boosted_greybox_fuzzer_scen1.runs(runner_boosted_scen1, trials=trials_per_run_task4_scen1)
    coverage_boosted_greybox_scen1.append(len(runner_boosted_scen1.coverage()))
    print(f"{i+1}/{num_runs_task4_scen1} Coverage: {coverage_boosted_greybox_scen1[-1]}")

print("\nMann-Whitney U Test (urlparse_that_crashes):")
u_stat_task4_scen1, p_value_task4_scen1 = mannwhitneyu(coverage_greybox_scen1, coverage_boosted_greybox_scen1, alternative='less')

print(f"u_stat (Scenario 1): {u_stat_task4_scen1}")
print(f"p_value (Scenario 1): {p_value_task4_scen1}")

if p_value_task4_scen1 < 0.05:
    print("The coverage achieved with the base MutationGreyboxFuzzer is statistically smaller than with the BoostedMutationGreyboxFuzzer on urlparse_that_crashes.")
else:
    print("No statistically significant evidence that coverage with the base MutationGreyboxFuzzer is smaller than with the BoostedMutationGreyboxFuzzer on urlparse_that_crashes.")

print(f"\nMutationGreyboxFuzzer (Scenario 1) average coverage: {sum(coverage_greybox_scen1) / num_runs_task4_scen1}")
print(f"BoostedMutationGreyboxFuzzer (Scenario 1) average coverage: {sum(coverage_boosted_greybox_scen1) / num_runs_task4_scen1}")


print("\n--- Task 04: Scenario 2 (crashme) ---")

single_seed_task4_scen2 = ["good"]

num_runs_task4_scen2 = 30
trials_per_run_task4_scen2 = 10000

coverage_greybox_scen2 = []
coverage_boosted_greybox_scen2 = []

print(f"Running {num_runs_task4_scen2} runs with {trials_per_run_task4_scen2} trials each...")

print("MutationGreyboxFuzzer on crashme:")
for i in range(num_runs_task4_scen2):
    runner_greybox_scen2 = FunctionCoverageRunner(crashme)
    greybox_fuzzer_scen2 = MutationGreyboxFuzzer(seed=single_seed_task4_scen2)
    greybox_fuzzer_scen2.runs(runner_greybox_scen2, trials=trials_per_run_task4_scen2)
    coverage_greybox_scen2.append(len(runner_greybox_scen2.coverage()))
    print(f"{i+1}/{num_runs_task4_scen2} Coverage: {coverage_greybox_scen2[-1]}")

print("\nBoostedMutationGreyboxFuzzer on crashme:")
for i in range(num_runs_task4_scen2):
    runner_boosted_scen2 = FunctionCoverageRunner(crashme)
    boosted_greybox_fuzzer_scen2 = BoostedMutationGreyboxFuzzer(seed=single_seed_task4_scen2)
    boosted_greybox_fuzzer_scen2.runs(runner_boosted_scen2, trials=trials_per_run_task4_scen2)
    coverage_boosted_greybox_scen2.append(len(runner_boosted_scen2.coverage()))
    print(f"{i+1}/{num_runs_task4_scen2} Coverage: {coverage_boosted_greybox_scen2[-1]}")

print("\nMann-Whitney U Test (crashme):")
u_stat_task4_scen2, p_value_task4_scen2 = mannwhitneyu(coverage_greybox_scen2, coverage_boosted_greybox_scen2, alternative='less')

print(f"u_stat (Scenario 2): {u_stat_task4_scen2}")
print(f"p_value (Scenario 2): {p_value_task4_scen2}")

if p_value_task4_scen2 < 0.05:
    print("The coverage achieved with the base MutationGreyboxFuzzer is statistically smaller than with the BoostedMutationGreyboxFuzzer on crashme.")
else:
    print("No statistically significant evidence that coverage with the base MutationGreyboxFuzzer is smaller than with the BoostedMutationGreyboxFuzzer on crashme.")

print(f"\nMutationGreyboxFuzzer (Scenario 2) average coverage: {sum(coverage_greybox_scen2) / num_runs_task4_scen2}")
print(f"BoostedMutationGreyboxFuzzer (Scenario 2) average coverage: {sum(coverage_boosted_greybox_scen2) / num_runs_task4_scen2}")

--- Task 04: Scenario 1 (urlparse_that_crashes) ---
Running 30 runs with 5000 trials each...
MutationGreyboxFuzzer on urlparse_that_crashes:
1/30 Coverage: 64
2/30 Coverage: 52
3/30 Coverage: 44
4/30 Coverage: 42
5/30 Coverage: 61
6/30 Coverage: 52
7/30 Coverage: 58
8/30 Coverage: 63
9/30 Coverage: 63
10/30 Coverage: 51
11/30 Coverage: 52
12/30 Coverage: 63
13/30 Coverage: 53
14/30 Coverage: 56
15/30 Coverage: 51
16/30 Coverage: 55
17/30 Coverage: 43
18/30 Coverage: 60
19/30 Coverage: 62
20/30 Coverage: 52
21/30 Coverage: 63
22/30 Coverage: 56
23/30 Coverage: 44
24/30 Coverage: 53
25/30 Coverage: 59
26/30 Coverage: 54
27/30 Coverage: 56
28/30 Coverage: 60
29/30 Coverage: 43
30/30 Coverage: 56

BoostedMutationGreyboxFuzzer on urlparse_that_crashes:
1/30 Coverage: 42
2/30 Coverage: 43
3/30 Coverage: 39
4/30 Coverage: 51
5/30 Coverage: 43
6/30 Coverage: 51
7/30 Coverage: 51
8/30 Coverage: 34
9/30 Coverage: 42
10/30 Coverage: 39
11/30 Coverage: 50
12/30 Coverage: 42
13/30 Coverage: 43
14/3