# Grammar Miner Evaluation

## Evaluation Setup

The evaluation of both techniques, grammar mining from string inclusion checks and grammar mining from dynamic taints, is based on two questions.

### Accuracy

Our first question concerns the accuracy of the mined grammars: do the grammars mined by both techniques produce only strings that the target program accepts, or do they also produce strings that it rejects? To answer this question, we use the grammars produced by the two techniques as *producers*. That is, we begin with the start symbol and repeatedly expand nonterminals according to the grammar rules; each resulting string is then fed to the target program, which either accepts or rejects it. Accuracy is the percentage of produced strings that the target program accepts.
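The producer-based accuracy check described above can be sketched without the fuzzingbook classes. This is a minimal illustration, not the evaluator used later in this notebook: `TOY_GRAMMAR`, `produce`, `accepts`, and `accuracy` are hypothetical names, and the acceptance criterion (a known scheme and a non-empty netloc) is assumed to mirror the one used for `urlparse` below.

```python
import random
import re
from urllib.parse import urlparse

# Hypothetical toy grammar in the same dict format as the grammars in this notebook.
TOY_GRAMMAR = {
    "<start>": ["<scheme>://<host><path>"],
    "<scheme>": ["http", "https"],
    "<host>": ["host1", "host2"],
    "<path>": ["", "/folder"],
}

NONTERMINAL = re.compile(r"<[^<> ]+>")


def produce(grammar, rng, start="<start>"):
    """Expand the leftmost nonterminal until only terminals remain."""
    s = start
    while True:
        m = NONTERMINAL.search(s)
        if m is None:
            return s
        expansion = rng.choice(grammar[m.group(0)])
        s = s[:m.start()] + expansion + s[m.end():]


def accepts(url):
    """Assumed acceptance criterion: known scheme and non-empty netloc."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https", "ftp", "ftps"} and bool(parsed.netloc)


def accuracy(grammar, n=100, seed=0):
    """Percentage of n produced strings that the target accepts."""
    rng = random.Random(seed)
    accepted = sum(accepts(produce(grammar, rng)) for _ in range(n))
    return 100 * accepted / n


print(accuracy(TOY_GRAMMAR))
```

Every string this toy grammar can produce has a known scheme and a host, so all of them pass the check. A grammar that can also derive strings without `://` would score lower.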

### Completeness

The next question concerns the completeness of the mined grammars: to what extent do the mined grammars cover the strings that the test subject accepts? To measure this, we use a reference grammar as a producer to create arbitrary strings, which are then parsed with the mined grammars.
A completeness of 100% indicates that the mined grammar accepts every input produced by the reference grammar.
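As a rough sketch of this measurement, the snippet below counts how many reference samples a mined grammar can derive. It swaps in a small backtracking recognizer in place of the Earley parser used later in this notebook. `MINED`, `SAMPLES`, `derives`, and `completeness` are hypothetical names chosen for illustration, and the recognizer assumes the grammar has no left recursion.

```python
import re

TOKEN = re.compile(r"(<[^<> ]+>)")


def tokens(expansion):
    """Split an expansion into terminal and nonterminal symbols."""
    return [t for t in TOKEN.split(expansion) if t]


def derives(grammar, symbols, text):
    """Return True if the symbol sequence can derive exactly `text`.

    Plain backtracking; it would not terminate on left-recursive rules.
    """
    if not symbols:
        return text == ""
    head, rest = symbols[0], symbols[1:]
    if head in grammar:
        # Nonterminal: try every alternative in turn.
        return any(derives(grammar, tokens(alt) + rest, text)
                   for alt in grammar[head])
    # Terminal: it must match a prefix of the remaining text.
    return text.startswith(head) and derives(grammar, rest, text[len(head):])


def completeness(mined_grammar, samples):
    """Percentage of reference samples that the mined grammar parses."""
    parsed = sum(derives(mined_grammar, ["<start>"], s) for s in samples)
    return 100 * parsed / len(samples)


# Hypothetical mined grammar and hand-written reference samples.
MINED = {
    "<start>": ["<scheme>://<host><port>"],
    "<scheme>": ["http", "https"],
    "<host>": ["host1", "host2"],
    "<port>": ["", ":10"],
}
SAMPLES = ["http://host1", "https://host2:10", "http://host2:20", "ftp://host1"]

print(completeness(MINED, SAMPLES))  # 50.0
```

Only the first two samples are derivable here: the third uses a port and the fourth a scheme that this hypothetical mined grammar never recorded.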

## Evaluation Subject

The evaluation subject in this presentation is:
1. Python URL parser [urllib/parse.py](https://github.com/python/cpython/blob/3.7/Lib/urllib/parse.py)

In [6]:
import fuzzingbook_utils

## Reference Grammar

In [7]:
REFERENCE_URL_GRAMMAR = {
    "<start>" : [
        "<url>"
    ],
    "<url>" : [
      "<scheme>://<authority><path><query>"  
    ],
    "<scheme>": [
        "http",
        "https",
        "ftp",
        "ftps"      
    ],
    "<authority>": [
        "<host>",
        "<host>:<port>",
        "<userinfo>@<host>",
        "<userinfo>@<host>:<port>"
    ],
    "<user>": [
        "user1",
        "user2",
        "user3",
        "user4",
        "user5"
    ],
    "<pass>": [
        "pass1",
        "pass2",
        "pass3",
        "pass4",
        "pass5"
    ],
    "<host>": [
        "host1",
        "host2",
        "host3",
        "host4",
        "host5"
    ],
    "<port>": [
        "<nat>"
    ],
    "<nat>": [
        "10",
        "20",
        "30",
        "40",
        "50"
    ],
    "<userinfo>": [
        "<user>:<pass>"
    ],
    "<path>": [
        "",
        "/",
        "/<id>",
        "/<id><path>"
    ],
    "<id>": [
        "folder"
    ],
    "<query>": [
        "",
        "?<params>"
    ],
    "<params>": [
        "<param>",
        "<param>&<params>"
    ],
    "<param>": [
        "<key>=<value>"
    ],
    "<key>": [
        "key1",
        "key2",
        "key3",
        "key4"
    ],
    "<value>": [
        "value1",
        "value2",
        "value3",
        "value4"
    ]
}

## Grammar Miner Experiment

In [8]:
from GrammarCoverageFuzzer import GrammarCoverageFuzzer

In [9]:
class GrammarMinerExperiment:
    def __init__(self, reference_grammar, target, **kwargs):
        self.options(kwargs)
        self.grammar = reference_grammar
        self.target_program = target
        self.max_no_samples = 1000
        
        self.gcf = GrammarCoverageFuzzer(self.grammar)
        self.generated_samples = self.generate_sample_inputs() 

In [10]:
class GrammarMinerExperiment(GrammarMinerExperiment):
    def options(self, kwargs):
        self.files = kwargs.get('files', [])
        self.methods = kwargs.get('methods', [])

In [11]:
class GrammarMinerExperiment(GrammarMinerExperiment):
    def generate_sample_inputs(self):
        return [self.gcf.fuzz() for _ in range(self.max_no_samples)]

In [12]:
from GrammarMiner import recover_grammar, recover_grammar_with_taints

In [13]:
class GrammarMinerExperiment(GrammarMinerExperiment):
    def mine_with_taint(self):
        grammar_from_taint = recover_grammar_with_taints(
            self.target_program,
            self.generated_samples,
            files=self.files,
            methods=self.methods)

        return grammar_from_taint

In [14]:
class GrammarMinerExperiment(GrammarMinerExperiment):
    def mine_with_substr_search(self):
        grammar_from_substr = recover_grammar(self.target_program,
                                              self.generated_samples,
                                              files=self.files,
                                              methods=self.methods)
        return grammar_from_substr

In [15]:
from urllib.parse import urlparse

In [16]:
ex = GrammarMinerExperiment(REFERENCE_URL_GRAMMAR,
                            urlparse,
                            methods=['urlparse'],
                            files=['urllib/parse.py'])

grammar_from_taint = ex.mine_with_taint()
grammar_from_str = ex.mine_with_substr_search()

## Grammar Miner Evaluator

In [17]:
class GrammarMinerEvaluator:
    def __init__(self, target, r_grammar):
        self.max_no_inputs = 1000
        self.target_program = target
        self.grammar = r_grammar

        self.gcf = self.create_gcf_instance(self.grammar)
        self.samples = [self.gcf.fuzz() for i in range(self.max_no_inputs)]

In [18]:
class GrammarMinerEvaluator(GrammarMinerEvaluator):
    def create_gcf_instance(self, grammar):
        return GrammarCoverageFuzzer(grammar)

In [19]:
class GrammarMinerEvaluator(GrammarMinerEvaluator):
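    def url_parser_accuracy_test(self, mined_grammar):
        """Return the percentage of strings produced from `mined_grammar`
        that `urlparse` splits into a known scheme and a non-empty netloc."""
        gcf = self.create_gcf_instance(mined_grammar)
        schemes = ['ftps', 'http', 'https', 'ftp']
        rejected = []

        for _ in range(self.max_no_inputs):
            s = gcf.fuzz()
            parsed_url = urlparse(s)
            # urlparse does not reject most malformed strings outright, so a
            # URL counts as rejected when its scheme is unknown or its netloc
            # is empty.
            if parsed_url.scheme not in schemes or not parsed_url.netloc:
                rejected.append(s)
        return (((self.max_no_inputs - len(rejected)) / self.max_no_inputs) *
                100)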

In [20]:
from Parser import EarleyParser

In [21]:
class GrammarMinerEvaluator(GrammarMinerEvaluator):
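    def completeness_test(self, mined_grammar):
        """Return the percentage of reference samples that `mined_grammar`
        can parse."""
        rejected = []
        parser = EarleyParser(mined_grammar)
        for url in self.samples:
            try:
                # Ask for at least one parse tree; a failed parse is
                # reported as a SyntaxError.
                tree, *_ = parser.parse(url)
            except SyntaxError:
                rejected.append(url)
        return (((self.max_no_inputs - len(rejected)) / self.max_no_inputs) *
                100)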

In [22]:
results = {}
ev = GrammarMinerEvaluator(urlparse, REFERENCE_URL_GRAMMAR)

In [23]:
taint_accuracy = ev.url_parser_accuracy_test(grammar_from_taint)
substr_search_accuracy = ev.url_parser_accuracy_test(grammar_from_str)

In [24]:
taint_completeness = ev.completeness_test(grammar_from_taint)
substr_search_completeness = ev.completeness_test(grammar_from_str)

In [25]:
results['Grammar_For_String_Inclusion'] = (substr_search_accuracy, substr_search_completeness)
results['Grammar_From_Taint'] = (taint_accuracy, taint_completeness)

## MicroJson

In [29]:
import json

In [30]:
js = ['{"name": "John"}']
recover_grammar(json.loads, js, files=['__init__.py', 'decoder.py'])

{'<start>': ['<raw_decode@343:s>'],
 '<raw_decode@343:s>': ['{"name": "<decode@337:obj.name>"}'],
 '<decode@337:obj.name>': ['John']}

## Results

In [21]:
from IPython.display import HTML, display

In [22]:
def show_table(keys, a, c, title):
    keys = [k for k in keys if k in a and k in c and a[k] and c[k]]
    tbl = ['<tr>%s</tr>' % ''.join(["<th>%s</th>" % k for k in ['<b>%s</b>' % title, 'Accuracy (%)', 'Completeness (%)']])]
    for k in keys:
        h_c = "<td>%s</td>" % k
        a_c = "<td>%s</td>" % a.get(k, ('', 0))[0]
        m_c = "<td>%s</td>" % c.get(k, ('', 0))[1]
        tbl.append('<tr>%s</tr>' % ''.join([h_c, a_c, m_c]))
    return display(HTML('<table>%s</table>' % '\n'.join(tbl)))

In [23]:
show_table(results.keys(), results, results, 'Grammar Mining Techniques')

| Grammar Mining Techniques | Accuracy (%) | Completeness (%) |
|---|---|---|
| Grammar_For_String_Inclusion | 100.0 | 60.8 |
| Grammar_From_Taint | 100.0 | 60.8 |


In the experiment above, the grammars mined by both techniques prove to be accurate: all of the 1000 URLs produced from each grammar were valid URLs accepted by the test subject. In terms of completeness, however, the grammars from both techniques could parse only about 60% (608 out of 1000) of the samples generated from the reference grammar. This is because the mined grammars are too specific: they can only parse strings similar to those they can produce themselves.
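The combination of perfect accuracy and limited completeness can be illustrated with two small finite grammars whose languages are enumerated in full. This is only a sketch: `REFERENCE` and `OVERFITTED` are hypothetical grammars, not the grammars mined above, and `language` assumes a non-recursive grammar so that the enumeration terminates.

```python
import itertools
import re

TOKEN = re.compile(r"(<[^<> ]+>)")


def language(grammar, symbol="<start>"):
    """Enumerate every string of a finite (non-recursive) grammar."""
    if symbol not in grammar:
        return {symbol}  # a terminal derives only itself
    strings = set()
    for alt in grammar[symbol]:
        parts = [language(grammar, t) for t in TOKEN.split(alt) if t]
        for combo in itertools.product(*parts):
            strings.add("".join(combo))
    return strings


# Hypothetical reference grammar with four port alternatives.
REFERENCE = {
    "<start>": ["http://<host><port>"],
    "<host>": ["host1", "host2"],
    "<port>": ["", ":10", ":20", ":30"],
}
# Hypothetical over-specific grammar that recorded only two of them.
OVERFITTED = {
    "<start>": ["http://<host><port>"],
    "<host>": ["host1", "host2"],
    "<port>": ["", ":10"],
}

ref = language(REFERENCE)
mined = language(OVERFITTED)

accuracy = 100 * len(mined & ref) / len(mined)    # mined strings that are valid
completeness = 100 * len(mined & ref) / len(ref)  # valid strings that are covered

print(accuracy, completeness)  # 100.0 50.0
```

Everything the over-specific grammar produces is valid, so its accuracy is 100%, yet it covers only half of the reference language. This is the same pattern as in the table above.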