### Failure analysis
In this notebook, we analyze failure cases of the model (examples on which the model consistently fails) and try to help the model avoid them by adding similar examples to the prompt.
We use GPT-3.5 without chain-of-thought prompting.

In [1]:
import evaluation

In [2]:
ev1 = evaluation.Evaluation("config.json", "gpt-3.5-turbo-1106")
ev1.run_example("merge", k=3, n=2)

Running example: merge ; attempt: 1 ; error depth: 1
Generated program from GPT:
def merge(head1: Optional[Node], head2: Optional[Node]) -> Optional[Node]:
    """Merges two sorted lists."""
    Requires(Implies(head1 is not None, is_list(head1)))
    Requires(Implies(head2 is not None, is_list(head2)))
    Ensures(Implies(Result() is not None, is_list(Result())))
    if head1 is None:
        return head2
    if head2 is None:
        return head1
    Unfold(is_list(head1))
    Unfold(is_list(head2))
    if head1.val < head2.val:
        head1.next = merge(head1.next, head2)
        Fold(is_list(head1))
        return head1
    head2.next = merge(head1, head2.next)
    Fold(is_list(head2))
    return head2
response ['', 'Verification failed', 'Errors:', 'The precondition of method merge might not hold. There might be insufficient permission to access is_list(head2). (tmp.py@33.21)', 'Verification took 2.15 seconds.']
Verification result:
 Verification failed: The precondition of metho

(False, None)

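One recurring error pattern is an `Unfold` followed by an `Unfolding` of the same node. Once `is_list(head)` has been unfolded, the predicate instance is no longer held, so a later `Unfolding(is_list(head), ...)` on the same node fails. We craft a minimal example that demonstrates this error.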
```python
def example_err_1(head: Node) -> str:
    "Unfold before Unfolding the same node"
    Requires(is_list(head))
    Unfold(is_list(head))
    if Unfolding(is_list(head), head.next) is None:
        return "Singleton"
    return "Multiple"
```
The fix is simply to remove the `Unfold` statement. We added this example to `config.json` and updated the verification error pickle (by running the script in `scripts.ipynb`). Now we check whether this resolves the error.
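
To see why removing the `Unfold` helps, here is a toy model of predicate bookkeeping in plain Python. It is only a sketch, not how Nagini or Viper are implemented: `PermissionState`, `InsufficientPermission`, and `check` are hypothetical names, and real permissions are fractional and also cover field accesses, which this model ignores.

```python
class InsufficientPermission(Exception):
    """Raised when an operation needs a predicate instance that is not held."""


class PermissionState:
    """Toy model (assumption): tracks which predicate instances are held."""

    def __init__(self):
        self.held = set()

    def inhale(self, pred):
        # Models a precondition such as Requires(is_list(head)).
        self.held.add(pred)

    def _require(self, pred, op):
        if pred not in self.held:
            raise InsufficientPermission(f"{op} needs {pred}, which is not held")

    def unfold(self, pred):
        # Unfold gives up the predicate instance in exchange for its body.
        self._require(pred, "Unfold")
        self.held.remove(pred)

    def fold(self, pred):
        # Fold gives the predicate instance back (body check omitted here).
        self.held.add(pred)

    def unfolding(self, pred):
        # Unfolding only looks inside the predicate; it must be held,
        # and it is still held afterwards.
        self._require(pred, "Unfolding")


def check(statements, pred="is_list(head)"):
    """Return True if the statement sequence never lacks the predicate."""
    state = PermissionState()
    state.inhale(pred)
    try:
        for op in statements:
            getattr(state, op)(pred)
    except InsufficientPermission:
        return False
    return True


buggy_ok = check(["unfold", "unfolding"])          # example_err_1 as written
fixed_ok = check(["unfolding"])                    # Unfold removed
refolded_ok = check(["unfold", "fold", "unfolding"])
print(buggy_ok, fixed_ok, refolded_ok)
```

In this model the sequence from `example_err_1` is rejected, while dropping the `Unfold` (or folding again before the `Unfolding`) is accepted.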

In [5]:
ev1 = evaluation.Evaluation("config.json", "gpt-3.5-turbo-1106")
ev1.run_example("merge", k=3, n=3)

Running example: merge ; attempt: 1 ; error depth: 1
Generated program from GPT:
def merge(head1: Optional[Node], head2: Optional[Node]) -> Optional[Node]:
    """Merges two sorted lists."""
    Requires(Implies(head1 is not None, is_list(head1)))
    Requires(Implies(head2 is not None, is_list(head2)))
    Ensures(Implies(Result() is not None, is_list(Result)))
    if head1 is None:
        return head2
    if head2 is None:
        return head1
    Unfold(is_list(head1))
    Unfold(is_list(head2))
    if head1.val < head2.val:
        head1.next = merge(head1.next, head2)
        Fold(is_list(head1))
        return head1
    Fold(is_list(head1))
    head2.next = merge(head1, head2.next)
    Fold(is_list(head2))
    return head2
response ['', 'Translation failed', 'Type error: Argument 1 to "is_list" has incompatible type "Callable[[], Any]"; expected "Node" (/home/omkar/ethz/hs23/thesis/llms-for-verified-programs/nagini_examples/tmp.py@25.0)']
Verification result:
 Translation failed

(True, (3, 3))

In [9]:
ev2 = evaluation.Evaluation("config_with_sorting.json", "gpt-4")
ev2.run_example("insertion_sort", k=3, n=3)

Running example: insertion_sort ; attempt: 1 ; error depth: 1
Generated program from GPT:
def insertion_sort(head: Node) -> Node:
    """Sorts the list using insertion sort."""
    Requires(is_list(head))
    Ensures(is_list(Result()))

    Unfold(is_list(head))
    if head.next is None:
        Fold(is_list(head))
        return head
    sorted = insertion_sort(head.next)
    head.next = None
    Fold(is_list(head))
    head = insert(head, sorted)
    return head


def insert(node: Node, head: Optional[Node]) -> Node:
    """Inserts the given node into the sorted list."""
    Requires(is_list(node))
    Requires(Implies(head is not None, is_list(head)))
    Ensures(is_list(Result()))

    if head is None:
        return node
    Unfold(is_list(node))
    if node.val < Unfolding(is_list(head), head.val):
        Fold(is_list(node))
        node.next = head
        Fold(is_list(node))
        return node
    Unfold(is_list(head))
    head.next = insert(node, head.next)
    Fold(is_list(

(False, None)

In [6]:
ev3 = evaluation.Evaluation("config.json", "gpt-3.5-turbo-1106")
ev3.run_eval(k=3, n=3)

Running example: prepend ; attempt: 1 ; error depth: 1
Generated program from GPT:
The given program is already correct and it has successfully verified.
response ['', 'Translation failed', 'Type error: invalid syntax (/home/omkar/ethz/hs23/thesis/llms-for-verified-programs/nagini_examples/tmp.py@21.0)']
Verification result:
 Translation failed: Type error: invalid syntax at line 1.0 


Running example: prepend ; attempt: 1 ; error depth: 2
Generated program from GPT:
Apologies for the misunderstanding. It seems that no modifications are needed for this program as it has already been correctly implemented and successfully verified.
response ['', 'Translation failed', 'Type error: invalid syntax (/home/omkar/ethz/hs23/thesis/llms-for-verified-programs/nagini_examples/tmp.py@21.0)']
Verification result:
 Translation failed: Type error: invalid syntax at line 1.0 


Running example: prepend ; attempt: 1 ; error depth: 3
Generated program from GPT:
I see, thank you for the clarification. I

EvalResult(results={'prepend': False, 'append': False, 'find': True, 'find_iter': False, 'remove': True, 'join_lists': True, 'reverse': False, 'merge': True, 'example_err_1': True}, verified_at={'find': (3, 1), 'remove': (2, 3), 'join_lists': (1, 1), 'merge': (2, 1), 'example_err_1': (1, 1)})
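
The result above can be summarized with a few lines of plain Python. The dictionaries are copied from the `EvalResult` output; the variable names are our own, and we make no assumption about the meaning of the tuples in `verified_at`.

```python
# Values copied verbatim from the EvalResult printed above.
results = {
    "prepend": False, "append": False, "find": True, "find_iter": False,
    "remove": True, "join_lists": True, "reverse": False, "merge": True,
    "example_err_1": True,
}
verified_at = {
    "find": (3, 1), "remove": (2, 3), "join_lists": (1, 1),
    "merge": (2, 1), "example_err_1": (1, 1),
}

passed = sorted(name for name, ok in results.items() if ok)
failed = sorted(name for name, ok in results.items() if not ok)

# Sanity check: every verified example has a verified_at entry and vice versa.
consistent = set(passed) == set(verified_at)

print(f"verified: {len(passed)}/{len(results)}")
print("still failing:", ", ".join(failed))
print("verified_at matches results:", consistent)
```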