## Knownski Challenge – Starter Walkthrough

This notebook demonstrates a baseline approach to tackling the Knownski challenge. The goal is to build an AI agent capable of:

1. **Understanding a GitHub issue and context** from a structured dataset.
2. **Generating a code patch** that addresses the described issue.
3. **Applying the patch** to the relevant repository and commit.
4. **Running targeted tests** to confirm that failing tests now pass and previously passing tests remain green.

### Workflow Overview

1. **Load and Inspect Dataset**
   The dataset contains GitHub issues along with associated metadata:

   * `repo` – Repository owner/name.
   * `base_commit` – Commit SHA the patch is applied to.
   * `problem_statement` – Description of the bug or feature request.
   * `patch` – Ground-truth patch, used as the training target.
   * `FAIL_TO_PASS` – Tests that fail before the fix and are expected to pass after it.
   * `PASS_TO_PASS` – Tests that pass before the fix and should remain passing after it.

2. **Model Preparation**
   A code generation model (e.g., `unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit`) is fine-tuned on issue–patch pairs using supervised fine-tuning (SFT).

   * LoRA adapters are used to make fine-tuning efficient.
   * Because the dataset is very small (six examples in this run), training uses the full set with no validation split.

3. **Patch Generation**
   The fine-tuned model is prompted with:

   ```
   ### GitHub Issue:
   {problem_statement}

   ### Patch:
   ```

   The generated output after `### Patch:` is used as the candidate fix.

4. **Patch Application & Test Execution**

   * Clone the target repository into a temporary working directory (the clone is skipped if a `repo` folder already exists there).
   * Checkout the specified `base_commit`.
   * Apply the generated patch.
   * Install dependencies, then run the test files containing the `FAIL_TO_PASS` tests (should now pass) and the `PASS_TO_PASS` tests (should still pass).

5. **Iterating and Improving**
   While this notebook shows a simple end-to-end baseline, improvements may include:

   * Better prompt engineering for patch generation.
   * Incorporating retrieval of repository context before generation.
   * Using reinforcement learning from test results to refine the model.
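
The pass/fail criterion from step 4 can be written down as a small check. The sketch below assumes test outcomes have already been collected into a mapping from test ID to a boolean; `is_resolved`, its argument names, and the test IDs in the usage lines are illustrative and not part of the challenge tooling.

```python
from typing import Iterable, Mapping


def is_resolved(
    outcomes: Mapping[str, bool],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """Return True only if every listed test passed after the patch.

    `outcomes` maps a test ID to True (passed) or False (failed).
    A test missing from `outcomes` is treated as a failure.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(outcomes.get(test_id, False) for test_id in required)


# Hypothetical test IDs, for illustration only.
outcomes = {"tests/test_a.py::test_fix": True, "tests/test_a.py::test_old": True}
print(is_resolved(outcomes, ["tests/test_a.py::test_fix"], ["tests/test_a.py::test_old"]))  # True
```

Treating a missing test as a failure is a deliberate choice here: a patch that breaks test collection should not count as a fix.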




In [1]:
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from types import SimpleNamespace

import pandas as pd
import torch
from unsloth import FastLanguageModel  # keep this ahead of transformers/trl/peft
from datasets import Dataset, DatasetDict
from peft import LoraConfig
from transformers import TrainingArguments, pipeline
from trl import SFTTrainer

import wandb
wandb.run = None



🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


# Inspect one GitHub issue: problem statement, patch, and test lists

In [2]:
df = pd.read_parquet("/home/selu/Downloads/Interview_projects/knownski_challenge/data/data.parquet")
inst = df.query("instance_id=='pylint-dev__astroid-2496'").iloc[0]
instr = inst.problem_statement
patch = inst.patch
print(instr, patch, inst.FAIL_TO_PASS, inst.PASS_TO_PASS)


TypeError: unsupported format string passed to NoneType.__format__
Regression in #2459

### Steps to reproduce
a.py:
```py
class A:
    def __init__(self):
        self._magnitude = None

    def name(self) -> str | None:
        if self._magnitude:
            return f"M {self._magnitude:.1f}"
```
```
pylint a.py
```
### Current behavior
```
  File "/Users/jwalls/release/lib/python3.12/site-packages/astroid/nodes/node_classes.py", line 4778, in _infer_from_values
    yield from nodes[0]._infer(context, **kwargs)
  File "/Users/jwalls/release/lib/python3.12/site-packages/astroid/nodes/node_classes.py", line 4695, in _infer
    formatted = format(value.value, format_spec.value)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unsupported format string passed to NoneType.__format__
```
 diff --git a/ChangeLog b/ChangeLog
index 4560e5d2b7..c08b1cbf2c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -13,6 +13,9 @@ What's New in astroid 3.3.1?
 Release date: TBA
 
+* Fix a crash i
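
Depending on how the parquet file was written, `FAIL_TO_PASS` and `PASS_TO_PASS` may load as a list-like array or as a JSON-encoded string. The helper below is a sketch that assumes one of those two encodings; `as_test_list` is a hypothetical name, and the encoding of this particular dataset should be checked before relying on it.

```python
import json
from typing import Any, List


def as_test_list(value: Any) -> List[str]:
    """Normalize a test-list field to a plain list of test ID strings."""
    if value is None:
        return []
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return []
        # A JSON array such as '["tests/test_a.py::test_x"]'.
        if text.startswith("["):
            return [str(item) for item in json.loads(text)]
        # Otherwise treat the string as a single test ID.
        return [text]
    # Lists, tuples, and array-like objects are all iterable.
    return [str(item) for item in value]
```

Calling `as_test_list(inst.FAIL_TO_PASS)` before passing the field on keeps the rest of the pipeline independent of the storage format.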

# Fine-tune coding model with training data

In [7]:
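# Use the same template as generate_patch() at inference time, so the model
# is prompted with exactly the format it was trained on.
examples = [
    {"text": "### GitHub Issue:\n" + row["problem_statement"] + "\n\n### Patch:\n" + row["patch"]}
    for _, row in df.iterrows()
]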

ds_all = Dataset.from_list(examples)
ds = DatasetDict({"train": ds_all})  # no split

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit",  # ASCII hyphens only
    max_seq_length=1024,
    load_in_4bit=True,
    # bfloat16 needs hardware support, not just a CUDA device
    dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16
)

model.gradient_checkpointing_enable()
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"]
)



def tokenize_fn(ex):
    tok = tokenizer(ex["text"], truncation=True, max_length=1024)
    tok["labels"] = tok.input_ids.copy()
    return tok

tok = ds.map(tokenize_fn, batched=False)
args = TrainingArguments(
    output_dir="/home/selu/Downloads/Interview_projects/knownski_challenge/patch_agent",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
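    bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),
    fp16=torch.cuda.is_available() and not torch.cuda.is_bf16_supported(),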
    logging_steps=20,
    save_steps=500,
    report_to="none"
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tok["train"],
    eval_dataset=None,
    dataset_text_field="text",
    max_seq_length=1024,
    packing=True,
    args=args
)

trainer.train()

==((====))==  Unsloth 2025.7.11: Fast Qwen2 patching. Transformers: 4.54.1.
   \\   /|    NVIDIA GeForce RTX 3070 Laptop GPU. Num GPUs = 1. Max memory: 7.664 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.1+cu126. CUDA: 8.6. CUDA Toolkit: 12.6. Triton: 3.3.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.31.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Map: 100%|██████████| 6/6 [00:00<00:00, 200.72 examples/s]
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 6 | Num Epochs = 3 | Total steps = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 1,843,200 of 3,087,781,888 (0.06% trained)


Unsloth: Will smartly offload gradients to save VRAM!




TrainOutput(global_step=3, training_loss=1.3888416290283203, metrics={'train_runtime': 22.8295, 'train_samples_per_second': 0.788, 'train_steps_per_second': 0.131, 'total_flos': 304872542208000.0, 'train_loss': 1.3888416290283203})
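
Two figures in the training log can be reproduced by hand. The sketch below assumes the Qwen2.5-3B dimensions given in the comments (taken from the published model configuration, not from this notebook) and a simplified step-count rule that may not match the trainer's rounding in every version; `lora_param_count` and `optimizer_steps` are illustrative helpers.

```python
import math
from typing import Sequence, Tuple


def lora_param_count(rank: int, shapes: Sequence[Tuple[int, int]], num_layers: int) -> int:
    """Each adapted (in, out) linear layer adds rank * (in + out) LoRA parameters."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers


def optimizer_steps(num_examples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Simplified rule: at least one optimizer update per epoch."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


# Assumed dimensions: hidden size 2048, 36 layers, 2 key/value heads of size 128.
HIDDEN = 2048
KV_DIM = 2 * 128
shapes = [(HIDDEN, HIDDEN), (HIDDEN, KV_DIM)]  # q_proj, v_proj

print(lora_param_count(rank=8, shapes=shapes, num_layers=36))  # 1843200
print(optimizer_steps(num_examples=6, batch_size=2, grad_accum=4, epochs=3))  # 3
```

Both values agree with the log: 1,843,200 trainable parameters and 3 total steps. The effective batch size of 8 exceeds the 6 available examples, so each epoch is a single update, and with `logging_steps=20` no intermediate loss is logged during the run.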

# Test runner & Patch applier

In [11]:
def run(cmd_list, cwd=None):
    """Run a command given as a list, so special characters in test names are not shell-interpreted."""
    print("> " + " ".join(cmd_list))
    result = subprocess.run(cmd_list, cwd=cwd)
    if result.returncode != 0:
        # Raise instead of calling sys.exit(), which would stop the notebook kernel.
        raise RuntimeError(f"Command failed with exit code {result.returncode}")

def simplify_tests(test_list):
    """Return just the test file paths from full pytest node IDs."""
    return sorted({t.split("::")[0] for t in test_list})

def apply_instance(inst):
    repo_url = f"https://github.com/{inst.repo}.git"
    base = inst.base_commit
    patch_text = inst.patch

    with tempfile.TemporaryDirectory() as tmp:
        # Pass cwd= to every command instead of calling os.chdir(): the temp
        # directory is deleted on exit, which would leave the notebook in a
        # working directory that no longer exists.
        repo_dir = os.path.join(tmp, "repo")

        # Only clone if the repo folder doesn't already exist. In a fresh temp
        # directory this always clones; the check only matters if `tmp` is
        # replaced by a persistent cache directory.
        if not os.path.exists(repo_dir):
            run(["git", "clone", repo_url, "repo"], cwd=tmp)
        else:
            print("> Repo already cloned, skipping clone step.")

        run(["git", "checkout", base], cwd=repo_dir)

        # write the patch (ensure it ends with a newline)
        with open(os.path.join(repo_dir, "fix.patch"), "w") as f:
            f.write(patch_text if patch_text.endswith("\n") else patch_text + "\n")

        # dry-run with the same flags as the real apply, so both agree
        apply_flags = ["--recount", "--ignore-whitespace"]
        dry_run = subprocess.run(
            ["git", "apply", "--check", *apply_flags, "fix.patch"], cwd=repo_dir
        )
        if dry_run.returncode != 0:
            raise RuntimeError("Patch is corrupt or its context does not match the checkout.")

        run(["git", "apply", *apply_flags, "fix.patch"], cwd=repo_dir)

        # install requirements and pytest
        run([sys.executable, "-m", "pip", "install", "pytest"], cwd=repo_dir)
        if os.path.exists(os.path.join(repo_dir, "requirements.txt")):
            run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], cwd=repo_dir)
        else:
            run([sys.executable, "-m", "pip", "install", "."], cwd=repo_dir)

        # run tests by file
        fail_files = simplify_tests(inst.FAIL_TO_PASS)
        pass_files = simplify_tests(inst.PASS_TO_PASS)

        print("\n Running FAIL_TO_PASS tests (should now pass):")
        run([sys.executable, "-m", "pytest"] + fail_files, cwd=repo_dir)

        print("\n Running PASS_TO_PASS tests (should still pass):")
        run([sys.executable, "-m", "pytest"] + pass_files, cwd=repo_dir)

        print("\n All tests passed; patch applied cleanly.")
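
`simplify_tests` trades precision for robustness: running whole files also executes tests that are not listed in the instance. A sketch of the alternative, which passes the node IDs to pytest directly, is shown below. `build_pytest_cmd` is a hypothetical helper, and the test names in the usage line are made up.

```python
import sys
from typing import List, Sequence


def build_pytest_cmd(test_ids: Sequence[str], python: str = sys.executable) -> List[str]:
    """Build a pytest command that runs exactly the given node IDs."""
    # dict.fromkeys drops duplicates while keeping the original order.
    unique_ids = list(dict.fromkeys(test_ids))
    # Each node ID is its own argv entry, so brackets in parametrized IDs
    # (e.g. test_x[case-1]) are never interpreted by a shell.
    return [python, "-m", "pytest", "-q", *unique_ids]


cmd = build_pytest_cmd(
    ["tests/test_a.py::test_x[case-1]", "tests/test_b.py::TestB::test_y"],
    python="python",
)
print(cmd)
```

The resulting list can be handed to `run(...)` unchanged. Whether node IDs from the dataset match the IDs pytest collects at `base_commit` is an assumption worth verifying per repository.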


# Applying the patch

In [12]:
# Setup generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

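def generate_patch(issue_text: str) -> str:
    prompt = f"### GitHub Issue:\n{issue_text}\n\n### Patch:\n"
    # return_full_text=False returns only the completion, so the patch does not
    # have to be split off the prompt (splitting breaks if the issue text
    # itself contains "### Patch:").
    result = pipe(prompt, max_new_tokens=300, do_sample=True, top_p=0.9, temperature=0.8,
                  pad_token_id=tokenizer.eos_token_id, return_full_text=False)
    return result[0]["generated_text"].strip()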

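target = df.query("instance_id == 'pylint-dev__astroid-2496'").iloc[0]

# Sanity check of the harness: this run applies the ground-truth patch.
# To evaluate the model instead, pass patch=generate_patch(target.problem_statement).
row = SimpleNamespace(
    repo=target.repo,
    base_commit=target.base_commit,
    patch=target.patch,
    FAIL_TO_PASS=list(target.FAIL_TO_PASS),
    PASS_TO_PASS=list(target.PASS_TO_PASS)
)

apply_instance(row)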


Device set to use cuda:0
Cloning into 'repo'...


> git clone https://github.com/pylint-dev/astroid.git repo
> git checkout 8d3cdbbe6685fd8cf211816bec56c90f38f1859e
> git apply --recount --ignore-whitespace fix.patch
> /home/selu/anaconda3/envs/sf/bin/python -m pip install pytest


Note: switching to '8d3cdbbe6685fd8cf211816bec56c90f38f1859e'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 8d3cdbbe [pre-commit.ci] pre-commit autoupdate (#2495)


> /home/selu/anaconda3/envs/sf/bin/python -m pip install .
Processing /tmp/tmp4x3y8crj/repo
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: astroid
  Building wheel for astroid (pyproject.toml): started
  Building wheel for astroid (pyproject.toml): finished with status 'done'
  Created wheel for astroid: filename=astroid-3.4.0.dev0-py3-none-any.whl size=274333 sha256=8a67bd73547ceb86dc33f117e73f09624cc436b70a9a6952fcb40ffa60d0d4b7
  Stored in directory: /tmp/pip-ephem-wheel-cache-57z9d16w/wheels/b9/24/27/beb3993563e148f01726084481c5512c3a7f855649bc63b857
Successfully built astroid
Installing collected packages: astroid
  Attempting uninstall: astroid
    Found existing
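
Before a model-generated patch is handed to `apply_instance`, the raw output of `generate_patch` usually needs cleaning, since a model may wrap the diff in a Markdown fence or keep writing after it. The function below is a minimal sketch under those two assumptions; `extract_diff` is a hypothetical helper, not part of the notebook's pipeline.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable


def extract_diff(generated: str) -> str:
    """Pull a unified diff out of raw model output.

    Returns an empty string when no 'diff --git' header is found.
    """
    # Prefer the contents of the first fenced block, if there is one.
    pattern = re.escape(FENCE) + r"(?:diff|patch)?\n(.*?)" + re.escape(FENCE)
    fenced = re.search(pattern, generated, flags=re.DOTALL)
    text = fenced.group(1) if fenced else generated

    start = text.find("diff --git ")
    if start == -1:
        return ""
    text = text[start:]

    # Cut at the next prompt-style header. Real diff lines start with
    # ' ', '+', '-', or '@', so a line starting with '### ' is not diff content.
    stop = re.search(r"^### ", text, flags=re.MULTILINE)
    if stop:
        text = text[: stop.start()]
    return text.rstrip("\n") + "\n"
```

An empty return value is a cheap signal to skip `git apply` entirely and count the attempt as failed.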

## Positioning This as a Foundation for the 7-Month Knownski Challenge

The Knownski challenge runs over a 7-month period, providing ample time to experiment, refine, and scale the solution.

This notebook is a **foundation**, a minimal but functional pipeline that can:

* Load Knownski issue–patch data.
* Fine-tune a code generation model on these examples.
* Generate and apply patches to the correct repository and commit.

If I were starting the 7-month journey, my **next immediate steps** from this foundation would be:

1. **Expand the Training Data**

   * Extract more issue–patch pairs from additional open-source repositories.
   * Include variations and related bug contexts to improve generalization.

2. **Use a More Powerful Base Model**

   * Upgrade from the current 3B model to a larger, code-specialized model (e.g., 7B, 13B, or beyond), keeping LoRA fine-tuning so training stays affordable.

3. **Advanced Fine-Tuning**

   * Explore multi-turn fine-tuning with retrieval of relevant file context.
   * Introduce reinforcement learning using test results as feedback (e.g., PPO).
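
As a first step toward retrieving file context, tracebacks in the issue text often name the files involved. The sketch below assumes CPython-style traceback frames and that the package name is known; `traceback_locations` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches CPython traceback frames such as: File "/path/to/module.py", line 42
_FRAME = re.compile(r'File "([^"]+)", line (\d+)')


def traceback_locations(issue_text: str, package: str) -> List[Tuple[str, int]]:
    """Return (package-relative path, line) pairs for frames inside `package`."""
    locations: List[Tuple[str, int]] = []
    marker = f"/{package}/"
    for path, line in _FRAME.findall(issue_text):
        idx = path.rfind(marker)
        if idx == -1:
            continue  # frame is outside the package, e.g. the standard library
        location = (path[idx + 1:], int(line))
        if location not in locations:
            locations.append(location)
    return locations


issue = '''
  File "/Users/jwalls/release/lib/python3.12/site-packages/astroid/nodes/node_classes.py", line 4778, in _infer_from_values
  File "/Users/jwalls/release/lib/python3.12/site-packages/astroid/nodes/node_classes.py", line 4695, in _infer
'''
print(traceback_locations(issue, "astroid"))
# [('astroid/nodes/node_classes.py', 4778), ('astroid/nodes/node_classes.py', 4695)]
```

The returned paths could be used to read the surrounding source at `base_commit` and prepend it to the prompt. Line numbers from an installed release may not match the checked-out commit, so they are best treated as hints.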

This notebook is **not** the final solution; it is a **launchpad**. It shows that the pipeline runs end-to-end, and from here the challenge becomes one of **scaling, improving accuracy, and automating context retrieval**.
