# Experimenting with Code-Editing Prompts for GPT-4

This notebook demonstrates how to experiment with prompts for a code-completion (multi-line edit) model using OpenAI's GPT-4 (or similar). We’ll:
1. Create a system prompt that instructs the model how to generate or respond with code diffs.
2. Provide a user prompt that contains the XML tags we decided upon.
3. Inspect and tweak the outputs.

## Setup
Install `openai` if you haven't already:
```
pip install openai
```
Or ensure your environment is configured to run OpenAI API calls.

In [1]:
from openai import OpenAI
import os

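# Read the key from the OPENAI_API_KEY environment variable rather than hardcoding it.
# (Passing an empty string would override the environment variable and fail authentication.)
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))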

## 1. Define Your Prompts

We’ll define:
- A **system prompt** that sets the role and instructions (you can adapt this depending on your approach).
- A **user prompt** that follows the XML structure we discussed.

You can toggle these prompts to generate synthetic data *or* to test the actual code-completion suggestions at inference time.

### Example: Generating Synthetic Data
Below is a system prompt that instructs the model to generate code context, user edit history (as diffs), and a suggested diff.


In [2]:
system_prompt_synthetic = """\
Generate synthetic coding scenarios and propose plausible editing suggestions by producing XML that includes code context, recent diffs, and a new suggested diff for a continuous editing pattern.
"""

In [3]:
user_prompt_synthetic = """\
You are an AI code-editing assistant. Your job is to generate realistic, synthetic code-editing scenarios in an XML-based format that captures:

- The context of the code near a cursor.
- The history of recent diffs made to the code.
- A new suggested diff that follows logically from the history.

Focus on accuracy, consistency, and plausibility in your XML, ensuring the diffs reflect a believable code evolution process.

Output Structure
Your XML output must consist of two main elements: <Request> and <Response>.

<Request>

- <ContextCursor>: Include a short, relevant section of code surrounding the editing cursor.
- <HistoryDiffs>: Provide one or more <Diff> elements, each showing minimal changes made in the recent edits, using unified diff style (lines removed with -, lines added with +).

<Response>

- <SuggestedDiff>: Propose a new unified diff in the same style. This diff must:
  - Show both the original lines (-) and the new lines (+).
  - Logically extend the editing pattern demonstrated in <HistoryDiffs>.
  - Be minimal but plausible (i.e., it should look like a realistic continuation of the code edits).

Requirements and Constraints

- Code Language: Choose a popular programming language (e.g., Java, Python, JavaScript, C#).
- Plausibility: Ensure the code snippets, diffs, and variable/method names look realistic and consistent.
- Cursor Placement: Mark or imply the cursor position in <ContextCursor> in a way that suggests where the changes are occurring.
- Diff Format:
  - Use Aider's unified diff style.
  - Each diff hunk should start with @@ lines (e.g., @@ - line_of_code / @@ + line_of_code) or a small chunk of context showing removed lines (-) and added lines (+).
  - Keep diffs minimal. Only show lines that have changed, been removed, or been added.
- Logical Flow: The <SuggestedDiff> should naturally arise from the changes shown in the <HistoryDiffs> (e.g., continuing to rename a variable, refactoring a method call, or fixing a bug).
- Accurate Syntax: Maintain valid syntax for the chosen language and accurate diff syntax (start lines to be removed with - and lines to be added with +).

Detailed Example
Below is a sample structure; your actual scenarios can be more extensive, but follow the same pattern:

<Request>
  <ContextCursor>
    public void calculateDiscountedTotal() {
      double total = 0;
      for (int i = 0; i < cartItems.length; i++) {
        total += cartItems[i].applyDiscount();
        // [Cursor Position]
      }
    }
  </ContextCursor>

  <HistoryDiffs>
    <Diff>
      @@ - total += cartItems[i].getPrice();
      @@ + total += cartItems[i].applyDiscount();
    </Diff>
    <Diff>
      @@ - int total = 0;
      @@ + double total = 0;
    </Diff>
  </HistoryDiffs>
</Request>

<Response>
  <SuggestedDiff>
    @@ - total += cartItems[i].applyDiscount();
    @@ + total += applyTax(cartItems[i].applyDiscount());
  </SuggestedDiff>
</Response>

Explanation of the Example

- <ContextCursor>: Shows a Java snippet with a loop summing up item prices. The cursor suggests we’re about to change or add code in this loop body.
- <HistoryDiffs>: Two diffs indicate that the user first switched from cartItems[i].getPrice() to cartItems[i].applyDiscount(), and then changed int total = 0; to double total = 0;, reflecting a pattern of adjusting code for more complex pricing.
- <SuggestedDiff>: Continues that logical pattern by suggesting a further modification to the discount logic, such as adding applyTax().

Prompt Instructions

1. Generate a <Request>:
   - Write a short code snippet that needs editing in <ContextCursor>.
   - Provide one or more <Diff> elements within <HistoryDiffs> reflecting recent plausible changes.
2. Generate a <Response>:
   - Within <SuggestedDiff>, propose a single new unified diff.
   - Show how the code should change next by including both the lines to remove (-) and the lines to add (+).
   - The new diff should reasonably build on the changes in <HistoryDiffs>.
3. Output:
   - Your final XML must be well-formed and valid.
   - Adhere to the same style used in the example: minimal but clear.

Key Goals:

- Produce synthetic code that appears realistic.
- Reflect an editing “story” where each diff incrementally transforms the code.
- Maintain consistent variable and method names, data types, and function calls.
- Use a clear, unified diff format to show exactly which lines changed.

"""

## 2. Calling GPT-4 (or GPT-3.5+) with These Prompts
Below, we make a chat completion call to OpenAI (the helper defaults to `gpt-4o`; pass `model=` to use another model). The model should return a chunk of XML that you can parse or store.

In [4]:
def call_openai_api(system_prompt, user_prompt, model="gpt-4o", temperature=0.7):
    """Send one system/user message pair and return the reply text, or None on failure."""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            temperature=temperature,
        )
        # message.content is optional in the SDK, so guard before stripping.
        content = response.choices[0].message.content
        return content.strip() if content else None
    except Exception as e:
        print("Error calling OpenAI API:", e)
        return None

# Example usage:
synthetic_example = call_openai_api(system_prompt_synthetic, user_prompt_synthetic)
print(synthetic_example)

```xml
<Request>
  <ContextCursor>
    public class ShoppingCart {
      private List<Item> items;
      
      public ShoppingCart() {
        items = new ArrayList<>();
      }

      public void addItem(Item item) {
        items.add(item);
      }

      public double calculateTotal() {
        double total = 0;
        for (Item item : items) {
          total += item.getPrice();
          // [Cursor Position]
        }
        return total;
      }
    }
  </ContextCursor>

  <HistoryDiffs>
    <Diff>
      @@ - private List<Item> items;
      @@ + private List<Item> items = new ArrayList<>();
    </Diff>
    <Diff>
      @@ - total += item.getPrice();
      @@ + total += item.getDiscountedPrice();
    </Diff>
  </HistoryDiffs>
</Request>

<Response>
  <SuggestedDiff>
    @@ - total += item.getDiscountedPrice();
    @@ + total += applyTax(item.getDiscountedPrice());
  </SuggestedDiff>
</Response>
```

Explanation:

- **<ContextCursor>**: Displays a Java class `ShoppingCart` with 

### Inspect the Output
You'll see an XML structure with `<Request>` and `<Response>`.
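
The reply is not necessarily strict XML: in the sample above it arrives wrapped in a markdown fence, has two top-level elements, and contains raw `<` characters inside the code (`List<Item>`). As a minimal sketch, assuming the tag names appear exactly as requested in the prompt, the sections can be pulled out with regular expressions instead of an XML parser. The helper names `extract_tag` and `parse_scenario` are hypothetical and not part of any library.

```python
import re
import textwrap

def extract_tag(text, tag):
    """Return the dedented inner text of every <tag>...</tag> block in text."""
    pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    return [textwrap.dedent(body).strip() for body in pattern.findall(text)]

def parse_scenario(raw):
    """Split a raw model reply into context, history diffs, and suggestion."""
    context = extract_tag(raw, "ContextCursor")
    history = extract_tag(raw, "HistoryDiffs")
    suggested = extract_tag(raw, "SuggestedDiff")
    return {
        "context": context[0] if context else None,
        "history_diffs": extract_tag(history[0], "Diff") if history else [],
        "suggested_diff": suggested[0] if suggested else None,
    }

# Shortened sample in the same shape as the reply shown above.
sample = """
<Request>
  <ContextCursor>
    private List<Item> items;
  </ContextCursor>
  <HistoryDiffs>
    <Diff>
      @@ - total += item.getPrice();
      @@ + total += item.getDiscountedPrice();
    </Diff>
  </HistoryDiffs>
</Request>
<Response>
  <SuggestedDiff>
    @@ - total += item.getDiscountedPrice();
    @@ + total += applyTax(item.getDiscountedPrice());
  </SuggestedDiff>
</Response>
"""

scenario = parse_scenario(sample)
print(scenario["suggested_diff"])
```

Because the patterns only look for the named tags, any text around them (a markdown fence, a trailing explanation) is ignored. A reply that omits a tag yields `None` or an empty list rather than an exception, which makes malformed samples easy to filter out.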

## 3. Testing Inference-Time Code Completion
Suppose you want to test an actual code-completion scenario. We can create a different prompt that includes some partial context and a real or dummy `<HistoryDiffs>` block. Then we ask GPT-4 for a `<SuggestedDiff>`.


In [None]:
# Here’s a minimal example of how you might query GPT-4 for an inference-time suggestion:

system_prompt_inference = """\
<SystemPrompt>
You are a code-completion assistant. You see code context before and after a cursor, 
and a list of minimal diffs representing recent edits. Provide a <SuggestedDiff> in XML format, 
showing how you would continue or extend these edits.
</SystemPrompt>
"""

user_prompt_inference = """\
<Request>
  <ContextBefore>
# No code lines before
  </ContextBefore>

  <ContextCursor>
one="one"
two="TWO"
three="THREE"
  </ContextCursor>

  <ContextAfter>
# No code lines after
  </ContextAfter>

  <HistoryDiffs>
    <Diff>
- one="ONE"
+ one="one"
    </Diff>
  </HistoryDiffs>
</Request>
"""
inference_response = call_openai_api(system_prompt_inference, user_prompt_inference)
print(inference_response)
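
For repeated experiments it may be easier to assemble the `<Request>` block from variables than to edit the string by hand. The sketch below is one way to do that, assuming the same tag layout as the prompt above; `build_request` and its parameters are hypothetical names introduced for illustration.

```python
def build_request(context_cursor, diffs, context_before="", context_after=""):
    """Assemble a <Request> block in the layout used by the inference prompt."""

    def section(tag, body, indent="  "):
        # Tags are indented; the body is inserted as-is, matching the prompt above.
        body = body.strip("\n")
        return f"{indent}<{tag}>\n{body}\n{indent}</{tag}>"

    history = "\n".join(section("Diff", diff, indent="    ") for diff in diffs)
    parts = [
        section("ContextBefore", context_before or "# No code lines before"),
        section("ContextCursor", context_cursor),
        section("ContextAfter", context_after or "# No code lines after"),
        section("HistoryDiffs", history),
    ]
    return "<Request>\n" + "\n\n".join(parts) + "\n</Request>\n"

request = build_request(
    context_cursor='one="one"\ntwo="TWO"\nthree="THREE"',
    diffs=['- one="ONE"\n+ one="one"'],
)
print(request)
# The result can be passed as the user prompt:
# call_openai_api(system_prompt_inference, request)
```

Keeping the placeholder comments for empty context mirrors the hand-written prompt, so results from both versions stay comparable.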

## 4. Next Steps
- You can tweak the system prompt to clarify style, length, or detail in the `<SuggestedDiff>`.
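- You can parse the returned XML using standard Python libraries (like `xml.etree.ElementTree`), provided you first strip any markdown fence, wrap `<Request>` and `<Response>` in a single root element, and escape raw `<` or `&` characters inside the code (for example `List<Item>`).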
- If you’re generating training data, you can store each `<Request>` and `<Response>` pair in a dataset.
- If you’re doing real-time inference, you would feed the code editor’s context and the diffs of user changes into the `<Request>` block, then patch the file with `<SuggestedDiff>` lines.
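
The patching step in the last bullet could look roughly like the following. This is a sketch under stated assumptions, not a full diff applier: it assumes the diff lists removed lines before added lines, uses either the `@@ - old` / `@@ + new` style or plain `- old` / `+ new` lines, and that the removed lines appear contiguously in the file. `parse_diff` and `apply_suggested_diff` are hypothetical helper names.

```python
import re

# Matches "@@ - old", "@@ + new", "- old", and "+ new", ignoring indentation.
DIFF_LINE = re.compile(r"^\s*(?:@@\s*)?([-+])\s?(.*)$")

def parse_diff(diff_text):
    """Collect removed and added lines (whitespace-stripped) from a diff."""
    removed, added = [], []
    for line in diff_text.splitlines():
        match = DIFF_LINE.match(line)
        if match:
            target = removed if match.group(1) == "-" else added
            target.append(match.group(2).strip())
    return removed, added

def apply_suggested_diff(source, diff_text):
    """Replace the first run of lines matching the removed lines."""
    removed, added = parse_diff(diff_text)
    if not removed:
        raise ValueError("diff has no removed lines to anchor on")
    lines = source.splitlines()
    stripped = [line.strip() for line in lines]
    for i in range(len(lines) - len(removed) + 1):
        if stripped[i:i + len(removed)] == removed:
            # Reuse the indentation of the first replaced line.
            indent = lines[i][: len(lines[i]) - len(lines[i].lstrip())]
            lines[i:i + len(removed)] = [indent + new for new in added]
            return "\n".join(lines)
    raise ValueError("removed lines not found in source")

source = (
    "double total = 0;\n"
    "for (Item item : items) {\n"
    "  total += item.getDiscountedPrice();\n"
    "}"
)
diff = (
    "@@ - total += item.getDiscountedPrice();\n"
    "@@ + total += applyTax(item.getDiscountedPrice());"
)
patched = apply_suggested_diff(source, diff)
print(patched)
```

Matching on stripped lines tolerates the indentation the model adds inside the XML, at the cost of possibly matching the wrong occurrence when the same line appears more than once. Raising on a failed match, rather than silently returning the original text, makes stale or hallucinated suggestions visible.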

This notebook is just a starting point. Adapt it to your pipeline as needed!