# Fast Apply
## Introduction

Frontier models such as GPT-4o struggle with large edits: they get lazy, make mistakes, and respond with high latency.

This is a weakness visible in coding agents. Accurately editing hundreds of lines can take multiple model calls, at times trapping the agent in an infinite loop. Even small, isolated edits are plagued with bugs.

Worst of all, existing models are slow at large edits, breaking the programmer out of flow. We've trained a specialized model on an important version of the full-file code edit task called fast apply.

The concept of fast apply was first introduced by Cursor:
- https://web.archive.org/web/20240823050616/https://www.cursor.com/blog/instant-apply

There are a few articles and repositories about it, but none goes into much detail:
- https://fireworks.ai/blog/cursor
- https://github.com/llllvvuu/instant_apply
- https://github.com/paritoshk/anysphere_test?tab=readme-ov-file

Luckily, someone made an open-source version, which we'll explore:
- https://github.com/kortix-ai/fast-apply
- https://www.reddit.com/r/LocalLLaMA/comments/1ga25gj/introducing_fast_apply_replicate_cursors_instant/
- https://github.com/kortix-ai/fast-apply/tree/main/tests_evaluate/example

## Installation

In [None]:
%pip install -qU transformers torch colorama
import sys
sys.path.append('./lib')

Note: you may need to restart the kernel to use updated packages.


## Model on HuggingFace
The model we're using is <https://huggingface.co/Kortix/FastApply-7B-v1.0>

### Setting up the model
Warning: this model requires a lot of memory, so it is not well suited to running in GitHub Codespaces.
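
To put a rough number on "a lot of memory", here is a back-of-the-envelope sketch. It assumes about 7 billion parameters (inferred only from the "7B" in the model name) and counts the weights alone, ignoring activations and the generation cache, so real usage will be higher. The function name is made up for this illustration.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the model weights, in GiB."""
    return n_params * bytes_per_param / 2**30


# Assumption: ~7e9 parameters, guessed from the "7B" in the model name.
N_PARAMS = 7e9

for label, nbytes in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{label}: ~{weight_memory_gib(N_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions, full-precision weights alone need well over 20 GiB, which is more than a small cloud development container typically offers.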

In [63]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Kortix/FastApply-7B-v1.0"

# Download (on first run) and load the weights and the matching tokenizer.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Prefer Apple's Metal backend (MPS) when available; otherwise fall back to the CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

model = model.to(device)

Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00,  2.37s/it]


Using device: mps


## Set up the existing code and the refactor suggestion

In [64]:
existing_code = """
// These functions help print the answers
function my_function() {
    helloWorld();
}

// This function prints "Hello, World!"
function helloWorld() {
    console.log("Hello, World!");
}
"""

In [65]:
refactor_suggestion = """
function myFunction() {
    helloWorld("Hello, World!");
}

function helloWorld(message) {
    console.log(message);
}"""

## The instruct prompt
- Note the `<|im_start|>` and `<|im_end|>` tags, which mark the start and end of each chat turn

In [66]:
prompt_template = """<|im_start|>system
You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated.<|im_end|>
<|im_start|>user
Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.

<code>{existing_code}</code>

<update>{refactor_suggestion}</update>

Provide the complete updated code.<|im_end|>
<|im_start|>assistant
"""

prompt = prompt_template.format(
    existing_code=existing_code,
    refactor_suggestion=refactor_suggestion,
).strip()
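
The turn structure in the template above can be sketched as a small helper. This is only an illustration of the layout visible in the prompt: `render_chatml` is a hypothetical function, not part of the FastApply repository or of `transformers`. In practice, `tokenizer.apply_chat_template` may produce the same layout, depending on the chat template shipped with the model.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the tag layout used above.

    Hypothetical helper for illustration; it mirrors the hand-written template.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as: <|im_start|>ROLE\nCONTENT<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # An open assistant turn tells the model it is its turn to answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = render_chatml(
    [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Merge all changes."},
    ]
)
print(example)
```

Keeping the system and user text as separate messages makes it easier to swap the instructions without touching the tags.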

## Generate the result

In [67]:
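input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
# max_length caps the prompt and the generated tokens combined.
output = model.generate(input_ids, max_length=8192)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][input_ids.shape[1]:])
print("Full response:")
print(response)
print(20 * "=")
# Keep only the text between the <updated-code> tags.
updated_code = response.split("<updated-code>")[1].split("</updated-code>")[0]
print(updated_code)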

Full response:

<updated-code>// These functions help print the answers
function myFunction() {
    helloWorld("Hello, World!");
}

// This function prints "Hello, World!"
function helloWorld(message) {
    console.log(message);
}</updated-code><|im_end|>
====================
// These functions help print the answers
function myFunction() {
    helloWorld("Hello, World!");
}

// This function prints "Hello, World!"
function helloWorld(message) {
    console.log(message);
}
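
The chained `split` calls raise an `IndexError` when the model omits the opening tag. A slightly more defensive extraction could look like the sketch below; `extract_updated_code` is a hypothetical helper, and returning `None` for a missing tag pair is one possible design choice among several.

```python
import re

# Non-greedy match across newlines between the two tags.
_UPDATED_CODE_RE = re.compile(r"<updated-code>(.*?)</updated-code>", re.DOTALL)


def extract_updated_code(response: str):
    """Return the text inside the first <updated-code> pair, or None if absent."""
    match = _UPDATED_CODE_RE.search(response)
    return match.group(1) if match else None


sample = "\n<updated-code>function a() {\n}</updated-code><|im_end|>"
print(extract_updated_code(sample))
print(extract_updated_code("the model answered without tags"))
```

The caller can then decide whether a `None` result should trigger a retry or fall back to the original code.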


## A diff helper

In [68]:
from diff_helper import diff_code
print(diff_code(existing_code, updated_code))

[47m[30m[41m
[47m// These functions help print the answers
function my[42mF[47m[41m_f[47munction() {
    helloWorld([42m"Hello, World!"[47m);
}

// This function prints "Hello, World!"
function helloWorld([42mmessage[47m) {
    console.log([42mmessage[47m[41m"Hello, World!"[47m);
}[41m
[47m[0m


As you can see, there is more to building an AI coding IDE than just asking GPT models!