# Dafny Demo: LLM-Assisted Verification

**Dafny** is a verification-aware programming language. You write code with annotations (preconditions, postconditions, loop invariants), and the Dafny verifier statically proves that the code satisfies them, or reports where the proof fails.

In this notebook, we'll:
1. Show what verified Dafny code looks like
2. Give an LLM unannotated code and ask it to add annotations
3. Run the verification loop until it passes

## Why Dafny?

Dafny sits in a sweet spot for LLM-assisted verification:
- Syntax is familiar (C-like)
- Annotations are inline with code
- Error messages are specific and actionable
- The [dafny-annotator](https://dafny.org/blog/2025/06/21/dafny-annotator/) tool showed LLMs can do this well

In [None]:
import sys
sys.path.insert(0, '..')

from src.llm_client import LLMClient
from src.verifiers import DafnyVerifier
from rich import print as rprint
from rich.syntax import Syntax
from rich.panel import Panel

client = LLMClient()
verifier = DafnyVerifier()

print("✅ Client and verifier ready")

## Example 1: Binary Search

Binary search is a classic verification challenge. We need to prove:
1. We never access out-of-bounds indices
2. If we return an index, the element at that index equals the key
3. If we return -1, the key isn't in the array
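As a rough illustration, the three properties above can be written as runtime assertions in Python. This is only a sketch and is not the contents of `binary_search.dfy`: the names `binary_search` and `check_properties` are hypothetical, and the input is assumed to be sorted in ascending order. Assertions like these check one input at a time, whereas Dafny proves the same properties for all inputs before the program runs.

```python
from typing import Sequence


def binary_search(a: Sequence[int], key: int) -> int:
    """Return an index of key in the sorted sequence a, or -1 if absent."""
    lo, hi = 0, len(a)
    while lo < hi:
        # Runtime analogue of a loop invariant: the window stays in bounds,
        # so a[mid] below is always a valid access (property 1).
        assert 0 <= lo <= hi <= len(a)
        mid = lo + (hi - lo) // 2
        if a[mid] < key:
            lo = mid + 1
        elif a[mid] > key:
            hi = mid
        else:
            return mid
    return -1


def check_properties(a: Sequence[int], key: int) -> int:
    """Check properties 2 and 3 for a single input and return the result."""
    idx = binary_search(a, key)
    if idx >= 0:
        # Property 2: a returned index points at the key.
        assert a[idx] == key
    else:
        # Property 3: -1 means the key is absent.
        assert idx == -1 and key not in a
    return idx
```

In Dafny, the assertion inside the loop corresponds to an `invariant` clause and the checks in `check_properties` correspond to `ensures` clauses.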

### The Target: Fully Annotated Code

In [None]:
with open('../examples/dafny/binary_search.dfy') as f:
    verified_code = f.read()

rprint(Panel(Syntax(verified_code, "csharp", theme="monokai"), title="✅ Verified Binary Search"))

In [None]:
# Verify it actually passes
result = verifier.verify(verified_code)
print(f"Verification: {'✅ PASSED' if result.success else '❌ FAILED'}")
if not result.success:
    print(result.error)

### The Challenge: Skeleton Code

Now let's see the same code *without* annotations. This is what we'll ask the LLM to annotate.

In [None]:
with open('../examples/dafny/binary_search_skeleton.dfy') as f:
    skeleton_code = f.read()

rprint(Panel(Syntax(skeleton_code, "csharp", theme="monokai"), title="⚠️ Unannotated Skeleton"))

In [None]:
# This should fail verification
result = verifier.verify(skeleton_code)
print(f"Verification: {'✅ PASSED' if result.success else '❌ FAILED (expected)'}")
print(f"\nErrors:\n{result.error}")

## The Verification Loop

Now we run the core loop:
1. Ask LLM to add annotations
2. Try to verify
3. If it fails, show the error to the LLM
4. Repeat until success or max attempts

In [None]:
def run_verification_loop(skeleton: str, max_attempts: int = 5):
    """Run the LLM + verifier loop.

    Returns (code, attempts): the last candidate and the number of attempts used.
    The return value alone does not distinguish a failure from a success on the
    final attempt; re-run verifier.verify(code) if you need to tell them apart.
    """
    code = skeleton
    result = None  # Verifier result from the previous attempt, if any

    for attempt in range(1, max_attempts + 1):
        print(f"\n{'='*60}")
        print(f"Attempt {attempt}/{max_attempts}")
        print('='*60)

        # Ask LLM to generate/fix annotations
        if attempt == 1:
            print("\n📤 Asking LLM to add annotations...")
            response = client.generate_dafny_annotations(code)
        else:
            # Feed the previous candidate back together with the verifier's error
            print("\n📤 Asking LLM to fix based on error...")
            response = client.generate_dafny_annotations(code, error=result.error)

        code = client._extract_code(response.content)
        print(f"   Tokens used: {response.input_tokens + response.output_tokens}")

        # Show the generated code
        rprint(Panel(Syntax(code, "csharp", theme="monokai", line_numbers=True),
                     title=f"LLM Output (Attempt {attempt})"))

        # Verify
        print("\n🔍 Running Dafny verifier...")
        result = verifier.verify(code)

        if result.success:
            print("\n✅ VERIFICATION PASSED!")
            return code, attempt
        else:
            print("\n❌ Verification failed:")
            print(result.error)

    print(f"\n⚠️ Max attempts ({max_attempts}) reached without success")
    return code, max_attempts

In [None]:
# Run the loop!
final_code, attempts = run_verification_loop(skeleton_code)

## Example 2: Find Maximum

Let's try another classic: finding the maximum element in an array.

In [None]:
with open('../examples/dafny/max_array_skeleton.dfy') as f:
    max_skeleton = f.read()

rprint(Panel(Syntax(max_skeleton, "csharp", theme="monokai"), title="FindMax Skeleton"))

In [None]:
final_code, attempts = run_verification_loop(max_skeleton)

## Key Observations

### What Makes This Work

1. **Precise error messages**: Dafny tells us exactly which assertion failed and where
2. **Structured task**: Adding annotations is well-defined; the LLM doesn't need to reinvent the algorithm
3. **Objective success criteria**: Either it verifies or it doesn't, with no ambiguity
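To make the first point concrete, here is a small sketch that turns verifier output into structured records before handing it back to the LLM. It assumes each error line has the form `<file>(<line>,<col>): Error: <message>`; check this against the output of your Dafny version before relying on it. The names `DafnyError` and `parse_dafny_errors` are hypothetical and not part of `src.verifiers`.

```python
import re
from typing import List, NamedTuple


class DafnyError(NamedTuple):
    file: str
    line: int
    column: int
    message: str


# Assumed line format: "<file>(<line>,<col>): Error: <message>"
_ERROR_RE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+),(?P<col>\d+)\): Error: (?P<msg>.*)$"
)


def parse_dafny_errors(output: str) -> List[DafnyError]:
    """Extract (file, line, column, message) records from verifier output."""
    errors = []
    for raw in output.splitlines():
        match = _ERROR_RE.match(raw.strip())
        if match:
            errors.append(
                DafnyError(
                    file=match["file"],
                    line=int(match["line"]),
                    column=int(match["col"]),
                    message=match["msg"],
                )
            )
    return errors


# Illustrative text only, not captured from a real verifier run.
sample_output = (
    "binary_search.dfy(12,4): Error: a postcondition could not be proved "
    "on this return path\n"
    "binary_search.dfy(7,2): Related location: this is the postcondition "
    "that could not be proved\n"
)
parsed = parse_dafny_errors(sample_output)
```

Structured records like these make it possible to quote the offending source line next to the message in the follow-up prompt, instead of pasting the raw output.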

### Failure Modes

Sometimes the LLM:
- Adds overly complex invariants that are correct but hard to prove
- Gets stuck retrying slight variations of the same failing annotation
- Modifies the implementation instead of just adding annotations
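The last failure mode can be caught with a cheap guard before running the verifier. The sketch below is a line-based heuristic, not a Dafny parser: it drops lines that begin with a common specification keyword and compares what remains against the skeleton. It assumes each annotation sits on its own line, so it will misjudge multi-line clauses, annotations that share a line with code, and ghost code. The helper names are hypothetical.

```python
from typing import List

# Clause keywords treated as annotations (a heuristic list, not exhaustive).
ANNOTATION_KEYWORDS = (
    "requires", "ensures", "invariant", "decreases", "modifies", "reads", "assert",
)


def strip_annotations(code: str) -> List[str]:
    """Return the non-annotation lines of code with whitespace normalized."""
    kept = []
    for line in code.splitlines():
        text = " ".join(line.split())
        if not text or text.startswith("//"):
            continue  # Ignore blank lines and line comments
        first_word = text.split(" ", 1)[0]
        if first_word in ANNOTATION_KEYWORDS:
            continue  # Drop lines that start with a specification keyword
        kept.append(text)
    return kept


def implementation_unchanged(skeleton: str, annotated: str) -> bool:
    """True if the two programs match once annotation lines are removed."""
    return strip_annotations(skeleton) == strip_annotations(annotated)


skeleton_example = """method Count(a: array<int>) returns (i: int)
{
  i := 0;
  while i < a.Length
  {
    i := i + 1;
  }
}"""

annotated_example = """method Count(a: array<int>) returns (i: int)
  ensures i == a.Length
{
  i := 0;
  while i < a.Length
    invariant 0 <= i <= a.Length
  {
    i := i + 1;
  }
}"""

tampered_example = annotated_example.replace("i := i + 1;", "i := i + 2;")
```

A check like `implementation_unchanged(skeleton, code)` could run right after the LLM's code is extracted, with the candidate rejected and the LLM re-prompted when it returns `False`.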

### FM-ALPACA Findings

The [FM-ALPACA paper](https://arxiv.org/abs/2501.16207) found:
- LLMs perform best on "proof generation" tasks (what we're doing here)
- Fine-tuned models significantly outperform base models
- Providing context from the same file improves results

## Next: Lean4

Continue to [03-lean4-demo.ipynb](03-lean4-demo.ipynb) to see the same problem tackled with a theorem prover.