# R2E Usage

This notebook is a quickstart guide to R2E. It shows how to use the R2E [CLI](CLI.md) to:
1. set up repositories and extract functions from them
2. build and install repositories
3. generate and execute **Equivalence Tests** for the functions

Finally, it walks through an example use case of R2E: building code generation benchmarks for LLMs and agents.

#### 1. Setup and Extract

First, choose a unique experiment id (e.g., `quickstart`) that you can reuse for the entire workflow. Then set up the repositories and extract functions from them:

In [None]:
! r2e setup -r https://github.com/google-research/python-graphs
! r2e extract -e quickstart --overwrite_extracted

#### 2. Build and Install

**Docker Mode:** By default, all repos in `REPOS_DIR` are installed in a Docker image for sandboxed execution. The generated Dockerfile is written to `REPOS_DIR`.

**Local Mode:** Pass `--local` to get a list of suggested steps ***that you need to run manually*** to install the repos.

In [None]:
! r2e build -e quickstart --local

#### 3. Generate and Execute Tests

R2E provides a single command that runs up to `k` generate-execute rounds with feedback. The loop continues until at least a `min_valid` fraction of the functions reach `min_cov` branch coverage. Defaults: `k=3`, `min_valid=0.8`, and `min_cov=0.8` (i.e., 80% of functions at 80% branch coverage).

Generated tests are executed in the Docker container. Use `--local` to execute locally.
You can also run `r2e generate` and `r2e execute` separately (see [CLI](CLI.md)).

In [None]:
! r2e genexec -e quickstart --local --save_chat
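
The stopping rule described above can be sketched in a few lines. This is an illustration only, not R2E's implementation: `genexec_loop` and `run_round` are hypothetical names, `run_round` is assumed to return one branch-coverage value in [0, 1] per function, and `min_valid` and `min_cov` are treated as fractions to match the 0.8 defaults.

```python
from typing import Callable, List


def genexec_loop(
    run_round: Callable[[int], List[float]],  # hypothetical: round index -> coverage per function
    k: int = 3,
    min_valid: float = 0.8,
    min_cov: float = 0.8,
) -> int:
    """Run up to k rounds and return the number of rounds actually executed."""
    for round_idx in range(k):
        coverages = run_round(round_idx)
        # Fraction of functions whose branch coverage meets the threshold.
        valid = sum(c >= min_cov for c in coverages) / max(len(coverages), 1)
        if valid >= min_valid:
            return round_idx + 1  # stop early: enough functions are covered
    return k


# Toy data: coverage improves from one round to the next.
history = [
    [0.5, 0.9, 0.2, 0.8, 0.4],   # round 1: 2 of 5 functions reach 0.8
    [0.8, 0.9, 0.85, 0.8, 0.7],  # round 2: 4 of 5 functions reach 0.8
    [0.9, 0.9, 0.9, 0.9, 0.9],   # round 3: never reached in this run
]
rounds_used = genexec_loop(lambda i: history[i])
print(rounds_used)
```

With these toy numbers the loop stops after the second round, because four of the five functions have reached the coverage threshold.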

Let's take a look at the results:

In [None]:
! r2e show -e quickstart --summary

Let's take a look at one of the functions extracted from the repositories:

In [None]:
! r2e show -e quickstart -f cyclomatic_complexity --show-all

## Use Case: Coding Benchmarks

In [1]:
# Imported explicitly because later cells call os.getenv and json.dumps
import json
import os

from r2e.paths import *
from r2e.utils.data import *
from r2e.execution.helpers import run_fut_with_port, check_equiv
from r2e.execution.service import ServiceManager

Select the function you want to evaluate:

In [56]:
fname = "make_node_from_ast_node"
futs = load_functions_under_test(EXECUTION_DIR / "quickstart_out.json")
fut = next(f for f in futs if f.name == fname)
# print(fut.context.context)  # uncomment to see the context
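
Note that `next(...)` without a default raises a bare `StopIteration` when no function matches `fname`. A minimal sketch of a lookup with a clearer error follows. `Fut` is a hypothetical stand-in used only to make the snippet runnable, and `find_by_name` is not part of R2E; the only assumption about the real objects is that they expose a `.name` attribute.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class Fut:  # hypothetical stand-in for a function under test
    name: str


def find_by_name(items: Iterable[Any], name: str) -> Any:
    """Return the first item whose .name equals `name`, or raise a descriptive KeyError."""
    items = list(items)
    match = next((item for item in items if item.name == name), None)
    if match is None:
        available = ", ".join(sorted(item.name for item in items))
        raise KeyError(f"No function named {name!r}; available: {available}")
    return match


futs_demo = [Fut("cyclomatic_complexity"), Fut("make_node_from_ast_node")]
found = find_by_name(futs_demo, "make_node_from_ast_node")
print(found.name)
```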

Create a codegen problem by removing the body of the function:

In [58]:
def remove_body(context, function_name):
    """Replace the body of `function_name` in `context` with `pass`, keeping its docstring."""
    import re

    def replacement(match):
        signature = match.group(1)
        docstring = match.group(2) or ""
        return f"{signature}{docstring}pass\n"

    # Group 1: signature; group 2: optional docstring; group 3: everything up to the next `@`
    pattern = rf'(def\s+{re.escape(function_name)}\s*\([^)]*\):)(\s*"""[\s\S]*?"""\s*)?([^@]+)'
    return re.sub(pattern, replacement, context, flags=re.DOTALL)


prob = remove_body(fut.context.context, fname)
# print(prob) # uncomment to see the problem
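
To see what `remove_body` produces, here is a toy run on a made-up context string. The `area` function and its context are invented for illustration; real contexts come from `fut.context.context`. The function is repeated so that the snippet runs on its own.

```python
import re


def remove_body(context, function_name):
    # Same logic as the cell above, repeated for a standalone run.
    def replacement(match):
        signature = match.group(1)
        docstring = match.group(2) or ""
        return f"{signature}{docstring}pass\n"

    pattern = rf'(def\s+{re.escape(function_name)}\s*\([^)]*\):)(\s*"""[\s\S]*?"""\s*)?([^@]+)'
    return re.sub(pattern, replacement, context, flags=re.DOTALL)


toy_context = '''import math

def area(r):
    """Return the area of a circle."""
    return math.pi * r * r
'''

toy_problem = remove_body(toy_context, "area")
print(toy_problem)
```

The signature and docstring survive and the body becomes `pass`. Two properties of the pattern are worth keeping in mind: the last group `[^@]+` extends to the next `@` (typically a decorator) or the end of the context, so undecorated code that follows the target function is replaced too; and the signature group expects `):` directly, so a signature with a return annotation such as `-> int` is left unchanged.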

Evaluate an LLM's ability to complete the function body (i.e., solve the problem):

In [79]:
from openai import OpenAI
from r2e.generators.testgen.utils import extract_codeblock

client = OpenAI(api_key=os.getenv("OPENAI_KEY"))

PROMPT = (
    "You are given a function signature, docstring, and some context.\n\n{prob}\n"
    "Complete the body of {fname} and return the entire function as\n```python\n# YOUR CODE HERE\n```"
)

def get_completion(prompt):
    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are a Python programming expert."},
            {"role": "user", "content": prompt},
        ],
        model="gpt-4o",
        temperature=0.2,
    )
    return extract_codeblock(response.choices[0].message.content)

code = get_completion(PROMPT.format(prob=prob, fname=fname))
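
The prompt asks the model to wrap its answer in a fenced `python` block, which `extract_codeblock` then pulls out of the reply. The sketch below shows one way such an extraction could work. It is a hypothetical stand-in named `extract_first_codeblock`, and R2E's own `extract_codeblock` may behave differently (for example, when a reply contains several blocks).

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the snippet easy to embed

# Optional "python" tag after the opening fence, then everything up to the closing fence.
_BLOCK_RE = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_first_codeblock(text: str) -> str:
    """Return the first fenced block's contents, or the stripped text if there is no fence."""
    match = _BLOCK_RE.search(text)
    return match.group(1).strip() if match else text.strip()


reply = "Sure:\n" + FENCE + "python\ndef f():\n    return 1\n" + FENCE + "\nDone."
code_demo = extract_first_codeblock(reply)
print(code_demo)
```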


Run the equivalence tests to check whether the generated code is correct:

In [None]:
valid, tb, res = check_equiv(code, fut, port=3006, local=True, reuse_port=True)

print("Validity:", valid)
print("Traceback:\n", tb)
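
Conceptually, an equivalence test runs the original function and the candidate on the same inputs and compares the results. The sketch below shows that idea in isolation. It is not how `check_equiv` is implemented, and `outputs_match`, `reference_abs`, and `buggy_abs` are hypothetical names introduced for the example.

```python
import traceback
from typing import Any, Callable, Iterable, Tuple


def outputs_match(
    reference: Callable[..., Any],
    candidate: Callable[..., Any],
    inputs: Iterable[tuple],
) -> Tuple[bool, str]:
    """Return (valid, feedback); feedback is empty when all inputs agree."""
    for args in inputs:
        expected = reference(*args)
        try:
            actual = candidate(*args)
        except Exception:
            # A crash in the candidate counts as a failure; keep the traceback as feedback.
            return False, traceback.format_exc()
        if actual != expected:
            return False, f"args={args!r}: expected {expected!r}, got {actual!r}"
    return True, ""


def reference_abs(x):
    return x if x >= 0 else -x


def buggy_abs(x):
    return x  # wrong for negative inputs


ok, _ = outputs_match(reference_abs, abs, [(3,), (-4,), (0,)])
bad, message = outputs_match(reference_abs, buggy_abs, [(3,), (-4,)])
print(ok, bad, message)
```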

Evaluate **Agents** in the same way, feeding the execution errors back into the prompt between attempts:

In [81]:
def Agent(prob, fut, fname, max_attempts=5):
    prompt = PROMPT.format(prob=prob, fname=fname)

    for attempt in range(max_attempts):
        code = get_completion(prompt)
        print(f"Attempt {attempt + 1}:\n{code}")
        valid, tb, res = check_equiv(code, fut, port=3006, local=True, reuse_port=True)

        if valid:
            print(f"✅ Solution found on attempt {attempt + 1}\n")
            return json.dumps(res, indent=2)

        print(f"❌ Attempt {attempt + 1} failed. Refining based on feedback.\n")
        prompt += f"\n\nThe previous attempt failed.\nHere's your previous attempt: {code}\n\nHere's the error:\n{tb}\nPlease fix the code and try again."

    return f"❌ Failed to find a valid solution after {max_attempts} attempts."

In [None]:
res = Agent(prob, fut, fname)
print(f"Result:\n{res}")
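
The same refine-on-feedback loop can be exercised without an API key or a running test service by injecting stubs. In this sketch, `solve_with_feedback`, `fake_llm`, and `fake_check` are hypothetical stand-ins for `Agent`, `get_completion`, and `check_equiv`; the real functions have different signatures.

```python
from typing import Callable, Optional, Tuple


def solve_with_feedback(
    prompt: str,
    complete: Callable[[str], str],
    check: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 5,
) -> Tuple[Optional[str], int]:
    """Return (code, attempts_used); code is None if every attempt failed."""
    for attempt in range(1, max_attempts + 1):
        code = complete(prompt)
        valid, feedback = check(code)
        if valid:
            return code, attempt
        # Append the failed attempt and its error so the next call can use them.
        prompt += f"\n\nPrevious attempt:\n{code}\n\nError:\n{feedback}\nPlease fix the code."
    return None, max_attempts


def fake_llm(prompt: str) -> str:
    # Stub: produces the fix only once the prompt contains error feedback.
    return "return a + b" if "Error:" in prompt else "return a - b"


def fake_check(code: str) -> Tuple[bool, str]:
    # Stub: accepts only the correct body.
    if code == "return a + b":
        return True, ""
    return False, "AssertionError: add(1, 2) != 3"


solution, attempts = solve_with_feedback("Complete add(a, b).", fake_llm, fake_check)
print(solution, attempts)
```

Passing the model call and the checker as arguments keeps the loop easy to test and makes it simple to swap in a different model or checker.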