## Knownski Challenge – Starter Walkthrough

This notebook demonstrates a baseline approach to tackling the Knownski challenge. The goal is to build an AI agent capable of:

1. **Understanding a GitHub issue and context** from a structured dataset.
2. **Generating a code patch** that addresses the described issue.
3. **Applying the patch** to the relevant repository and commit.
4. **Running targeted tests** to confirm that failing tests now pass and previously passing tests remain green.

### Workflow Overview

1. **Load and Inspect Dataset**
   The dataset contains GitHub issues along with associated metadata:

   * `repo` – Repository owner/name.
   * `base_commit` – Commit SHA to apply the patch on.
   * `problem_statement` – Description of the bug or feature request.
   * `patch` – Ground-truth patch for training.
   * `FAIL_TO_PASS` – Tests that fail before the fix and should pass after it.
   * `PASS_TO_PASS` – Tests that should remain passing after the fix.

2. **Model Preparation**
   A code generation model (e.g., `unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit`) is fine-tuned on issue–patch pairs using supervised fine-tuning (SFT).

   * LoRA adapters are used to make fine-tuning efficient.
   * When the dataset is very small, training runs on the full set with no held-out validation split.

3. **Patch Generation**
   The fine-tuned model is prompted with:

   ```
   ### GitHub Issue:
   {problem_statement}

   ### Patch:
   ```

   The generated output after `### Patch:` is used as the candidate fix.

4. **Patch Application & Test Execution**

   * Clone the target repository into a fresh temporary directory.
   * Check out the specified `base_commit`.
   * Apply the generated patch.
   * Install dependencies, then run the `FAIL_TO_PASS` tests (should now pass) and the `PASS_TO_PASS` tests (should still pass).

5. **Iterating and Improving**
   While this notebook shows a simple end-to-end baseline, improvements may include:

   * Better prompt engineering for patch generation.
   * Incorporating retrieval of repository context before generation.
   * Using reinforcement learning from test results to refine the model.
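
The pass criterion in step 4 can be sketched as a small check. This is a minimal illustration, assuming test outcomes have already been collected into a dict mapping pytest node IDs to `True`/`False`; the function name `is_resolved` and the example node IDs are hypothetical and not part of the notebook.

```python
from typing import Dict, Iterable


def is_resolved(
    outcomes: Dict[str, bool],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """Return True only if every listed test passed after the patch.

    A test missing from `outcomes` counts as a failure, so a test that
    was never collected cannot make an instance look resolved.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(outcomes.get(test_id, False) for test_id in required)


# Hypothetical outcomes for illustration only.
outcomes = {
    "tests/test_a.py::test_bug": True,
    "tests/test_a.py::test_existing": True,
}
print(is_resolved(outcomes, ["tests/test_a.py::test_bug"], ["tests/test_a.py::test_existing"]))
```

Treating missing tests as failures is a deliberately strict choice: a patch that breaks test collection should not be scored as a success.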




In [None]:
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from types import SimpleNamespace

import pandas as pd
import torch
from unsloth import FastLanguageModel  # imported before transformers/trl, as Unsloth recommends
from datasets import Dataset, DatasetDict
from peft import LoraConfig
from transformers import TrainingArguments, pipeline
from trl import SFTTrainer

import wandb
wandb.run = None

# Passed and failed test content for a GitHub issue

In [None]:
df = pd.read_parquet("/home/selu/Downloads/Interview_projects/knownski_challenge/data/data.parquet")
inst = df.query("instance_id=='pylint-dev__astroid-2496'").iloc[0]
instr = inst.problem_statement
patch = inst.patch
print(instr, patch, inst.FAIL_TO_PASS, inst.PASS_TO_PASS)
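
Depending on how the parquet file was written, `FAIL_TO_PASS` and `PASS_TO_PASS` may come back as NumPy arrays, Python lists, or JSON-encoded strings. The helper below is a sketch that normalises all three to a plain list of strings. `as_test_list` is a hypothetical name, and the JSON-string case is an assumption about the data, not something this notebook confirms.

```python
import json
from typing import Any, List


def as_test_list(value: Any) -> List[str]:
    """Coerce a test-list cell into a plain list of pytest node IDs."""
    if value is None:
        return []
    if isinstance(value, str):
        # Assumed case: the list was serialised as a JSON string.
        value = json.loads(value)
    # Works for lists, tuples, and NumPy arrays alike.
    return [str(item) for item in value]


print(as_test_list('["tests/test_a.py::test_x"]'))
print(as_test_list(("tests/test_a.py::test_x", "tests/test_b.py::test_y")))
```

A string that is not valid JSON raises `json.JSONDecodeError` here, which is arguably preferable to silently splitting it into characters the way `list("...")` would.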


# Fine-tune coding model with training data

In [None]:
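# Use the same template as the inference prompt in generate_patch() so the
# fine-tuned model sees an identical prefix at training and generation time.
examples = [
    {"text": "### GitHub Issue:\n" + row["problem_statement"] + "\n\n### Patch:\n" + row["patch"]}
    for _, row in df.iterrows()
]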

ds_all = Dataset.from_list(examples)
ds = DatasetDict({"train": ds_all})  # no split

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
    # bfloat16 needs hardware support, not just a CUDA device
    dtype=torch.bfloat16
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    else torch.float16,
)

model.gradient_checkpointing_enable()
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"]
)



# SFTTrainer tokenizes the "text" column itself (and packs sequences when
# packing=True), so the raw text dataset is passed in directly.
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()

args = TrainingArguments(
    output_dir="/home/selu/Downloads/Interview_projects/knownski_challenge/patch_agent",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    bf16=use_bf16,       # bf16 only where the GPU supports it
    fp16=not use_bf16,
    logging_steps=20,
    save_steps=500,
    report_to="none"
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds["train"],
    eval_dataset=None,
    dataset_text_field="text",
    max_seq_length=1024,
    packing=True,
    args=args
)

trainer.train()
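
The batch settings above imply an effective batch size of 2 × 4 = 8 sequences per optimizer step on a single device. The sketch below works out roughly how many optimizer steps a run takes. The dataset size of 200 is a made-up number for illustration, `optimizer_steps` is a hypothetical helper, and with `packing=True` the number of training sequences can differ from the number of issues.

```python
import math


def optimizer_steps(num_sequences: int, per_device_batch: int, grad_accum: int,
                    epochs: int, num_devices: int = 1) -> int:
    """Approximate optimizer steps, counting a final partial batch as one step."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_sequences / effective_batch)
    return steps_per_epoch * epochs


# Hypothetical dataset size; substitute len(ds["train"]) in the notebook.
total = optimizer_steps(num_sequences=200, per_device_batch=2, grad_accum=4, epochs=3)
print(total)  # 75
```

When the total is below `save_steps=500`, the periodic checkpoint never triggers mid-run, so on a small dataset it is worth saving the adapter explicitly after training or lowering `save_steps`.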

# Test runner & Patch applier

In [None]:
def run(cmd_list, cwd=None):
    """Run a command as a list so special characters in test names don't break.

    Raises subprocess.CalledProcessError on a non-zero exit code, which is
    easier to handle in a notebook than calling sys.exit().
    """
    print("> " + " ".join(cmd_list))
    subprocess.run(cmd_list, cwd=cwd, check=True)

def simplify_tests(test_list):
    """Return just the test file paths from full pytest node IDs."""
    return sorted({t.split("::")[0] for t in test_list})

def apply_instance(inst):
    repo_url = f"https://github.com/{inst.repo}.git"
    base = inst.base_commit
    patch_text = inst.patch

    # Work inside a throwaway directory and pass cwd= to every command instead
    # of calling os.chdir(), which would leave the notebook in a deleted
    # directory once the temporary directory is cleaned up.
    with tempfile.TemporaryDirectory() as tmp:
        repo_dir = os.path.join(tmp, "repo")

        run(["git", "clone", repo_url, repo_dir])
        run(["git", "checkout", base], cwd=repo_dir)

        # write the patch (ensure it ends with a newline)
        patch_path = os.path.join(repo_dir, "fix.patch")
        with open(patch_path, "w") as f:
            f.write(patch_text if patch_text.endswith("\n") else patch_text + "\n")

        # dry-run with the same flags as the real apply, so both agree
        apply_cmd = ["git", "apply", "--recount", "--ignore-whitespace"]
        dry_run = subprocess.run(apply_cmd + ["--check", "fix.patch"], cwd=repo_dir)
        if dry_run.returncode != 0:
            raise RuntimeError("Patch is corrupt or its context does not match the checkout.")

        run(apply_cmd + ["fix.patch"], cwd=repo_dir)

        # install pytest and the project's dependencies
        run([sys.executable, "-m", "pip", "install", "pytest"], cwd=repo_dir)
        if os.path.exists(os.path.join(repo_dir, "requirements.txt")):
            run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], cwd=repo_dir)
        else:
            run([sys.executable, "-m", "pip", "install", "."], cwd=repo_dir)

        # run tests by file
        fail_files = simplify_tests(inst.FAIL_TO_PASS)
        pass_files = simplify_tests(inst.PASS_TO_PASS)

        print("\nRunning FAIL_TO_PASS tests (should now pass):")
        run([sys.executable, "-m", "pytest"] + fail_files, cwd=repo_dir)

        print("\nRunning PASS_TO_PASS tests (should still pass):")
        run([sys.executable, "-m", "pytest"] + pass_files, cwd=repo_dir)

        print("\nAll tests passed: patch applied cleanly.")
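
Running whole test files reports success or failure for the file as a whole, while the dataset lists individual node IDs. One way to recover per-test outcomes is to run pytest with `-rA` and read its short summary lines. The parser below is a sketch under that assumption; `parse_pytest_summary` is a hypothetical helper, and the sample text is hand-written in the shape of pytest's summary, not a real log.

```python
import re
from typing import Dict

# Matches short-summary lines such as "PASSED tests/test_a.py::test_x"
# or "FAILED tests/test_a.py::test_y - AssertionError".
_SUMMARY_LINE = re.compile(r"^(PASSED|FAILED|ERROR)\s+(\S+)")


def parse_pytest_summary(output: str) -> Dict[str, bool]:
    """Map each reported node ID to True (passed) or False (failed/errored)."""
    outcomes: Dict[str, bool] = {}
    for line in output.splitlines():
        match = _SUMMARY_LINE.match(line)
        if match:
            status, node_id = match.groups()
            outcomes[node_id] = status == "PASSED"
    return outcomes


# Hand-written sample in the shape of `pytest -rA` output, not a real log.
sample = """\
PASSED tests/test_a.py::test_existing
FAILED tests/test_a.py::test_bug - AssertionError
"""
print(parse_pytest_summary(sample))
```

Node IDs of parametrized tests can contain spaces, which this pattern would truncate; a more robust version might read a JUnit XML report instead.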


# Applying the patch

In [None]:
# Setup generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

def generate_patch(issue_text: str) -> str:
    prompt = f"### GitHub Issue:\n{issue_text}\n\n### Patch:\n"
    result = pipe(prompt, max_new_tokens=300, do_sample=True, top_p=0.9, temperature=0.8,
                  pad_token_id=tokenizer.eos_token_id,
                  return_full_text=False)  # return only the continuation after the prompt
    return result[0]["generated_text"].strip()

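target = df.query("instance_id == 'pylint-dev__astroid-2496'").iloc[0]

# This run sanity-checks the harness with the ground-truth patch.
# To evaluate the model instead, pass patch=generate_patch(target.problem_statement).
row = SimpleNamespace(
    repo=target.repo,
    base_commit=target.base_commit,
    patch=target.patch,
    FAIL_TO_PASS=list(target.FAIL_TO_PASS),
    PASS_TO_PASS=list(target.PASS_TO_PASS)
)

apply_instance(row)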
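
Model output often wraps the diff in a Markdown fence or adds commentary around it, which can make `git apply` reject an otherwise usable patch. The sketch below pulls out the diff portion before it is written to `fix.patch`. `extract_diff` is a hypothetical helper, and the heuristic (start at the first `diff --git` or `--- ` line, stop at a closing fence) is an assumption, not part of the original pipeline.

```python
FENCE = "`" * 3  # Markdown code fence marker


def extract_diff(generated: str) -> str:
    """Return the unified-diff portion of model output, or "" if none is found."""
    kept = []
    in_diff = False
    for line in generated.splitlines():
        if not in_diff and (line.startswith("diff --git ") or line.startswith("--- ")):
            in_diff = True
        if in_diff:
            if line.startswith(FENCE):
                break  # a closing Markdown fence ends the diff
            kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


raw = (
    "Here is the fix:\n" + FENCE + "diff\n"
    "--- a/x.py\n+++ b/x.py\n@@ -1 +1 @@\n-a = 1\n+a = 2\n"
    + FENCE + "\nDone."
)
print(extract_diff(raw))
```

An empty return value is a cheap signal that generation produced no patch at all, so the clone and test steps can be skipped for that sample.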


## Positioning This as a Foundation for the 7‑Month Knownski Challenge

The Knownski challenge runs over a 7‑month period, providing ample time to experiment, refine, and scale the solution.

This notebook represents a **foundation** — a minimal, functional pipeline that can:

* Load Knownski issue–patch data.
* Fine-tune a code generation model on these examples.
* Generate candidate patches.
* Apply patches to the correct repository and commit.
* Validate correctness via targeted tests.

If I were starting the 7‑month journey, my **next immediate steps** from this foundation would be:

1. **Expand the Training Data**

   * Extract more issue–patch pairs from additional open-source repositories.
   * Include variations and related bug contexts to improve generalization.

2. **Use a More Powerful Base Model**

   * Upgrade from the current 3B model to a larger, code-specialized model (e.g., 7B, 13B, or beyond), while still supporting LoRA fine-tuning for efficiency.

3. **Advanced Fine-Tuning**

   * Explore multi-turn fine-tuning with retrieval of relevant file context.
   * Introduce reinforcement learning using test results as feedback (e.g., PPO).
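
One simple starting point for retrieving file context is lexical overlap: rank repository files by how many identifier-like tokens they share with the issue text. The sketch below is a deliberately naive baseline, not a recommendation over embedding-based retrieval. `rank_files`, `tokens`, and the in-memory `files` dict are hypothetical stand-ins for walking a checked-out repository.

```python
import re
from typing import Dict, List, Set, Tuple

_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{2,}")


def tokens(text: str) -> Set[str]:
    """Lower-cased identifier-like tokens of three or more characters."""
    return {t.lower() for t in _TOKEN.findall(text)}


def rank_files(issue: str, files: Dict[str, str], top_k: int = 3) -> List[Tuple[str, int]]:
    """Return up to top_k (path, shared_token_count) pairs, best match first."""
    issue_tokens = tokens(issue)
    scored = [(path, len(issue_tokens & tokens(src))) for path, src in files.items()]
    scored = [item for item in scored if item[1] > 0]
    # Sort by score descending, then by path for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Toy inputs for illustration only.
files = {
    "pkg/parser.py": "def parse_config(path): return load(path)",
    "pkg/utils.py": "def slugify(value): return value.lower()",
}
print(rank_files("parse_config crashes on empty path", files))
```

The top-ranked file contents could then be prepended to the prompt, subject to the model's `max_seq_length` budget.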

This notebook is **not** the final solution — it is the **launchpad**. It shows that the pipeline works end-to-end, and from here, the challenge becomes one of **scaling, improving accuracy, and automating context retrieval**.
