# Code Completion Parser

SGLang supports fill-in-the-middle (FIM) code completion for code models through its completion interface. FIM completion takes the code before a gap (the prefix) and the code after it (the suffix) and predicts the snippet that belongs in between. This can boost productivity and reduce errors when editing existing code.
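To make the prefix/suffix idea concrete, here is a minimal sketch in plain Python with no SGLang calls. It splits a source buffer at a cursor position into the prefix and suffix that a FIM request sends, then puts a generated middle back in place. The helper names `split_at_cursor` and `reassemble` are hypothetical and only illustrate the data flow.

```python
def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split a source buffer at a character offset into (prefix, suffix)."""
    if not 0 <= cursor <= len(source):
        raise ValueError(f"cursor {cursor} is outside 0..{len(source)}")
    return source[:cursor], source[cursor:]


def reassemble(prefix: str, middle: str, suffix: str) -> str:
    """Put the generated middle back between the prefix and the suffix."""
    return prefix + middle + suffix


source = "function sum(a: number, b: number): number {}"
cursor = source.index("{") + 1  # cursor sits just after the opening brace
prefix, suffix = split_at_cursor(source, cursor)
print(repr(prefix))  # 'function sum(a: number, b: number): number {'
print(repr(suffix))  # '}'

# Pretend the model generated this middle section.
print(reassemble(prefix, "\n  return a + b;\n", suffix))
```

The prefix and suffix produced here are exactly the `prompt` and `suffix` strings used in the requests later in this notebook.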

## Supported Models

Currently, SGLang supports FIM completion for the following code models:
- DeepSeek Coder
- Qwen Coder
- StarCoder

## Usage

### Launching the Server

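Launch the server with the `--completion-template` option set to the template that matches the model. The example below serves `deepseek-ai/deepseek-coder-1.3b-base` with the `deepseek_coder` template.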

```python
import requests
from openai import OpenAI
from sglang.test.test_utils import is_in_ci

if is_in_ci():
    from patch import launch_server_cmd
else:
    from sglang.utils import launch_server_cmd

from sglang.utils import wait_for_server, print_highlight, terminate_process


server_process, port = launch_server_cmd(
    "python -m sglang.launch_server --model-path deepseek-ai/deepseek-coder-1.3b-base --completion-template deepseek_coder --port 30020 --host 0.0.0.0"
)

wait_for_server(f"http://localhost:{port}")
```

### OpenAI Compatible API

The completion template currently has three options: `deepseek_coder`, `qwen_coder`, and `star_coder`.

- `--completion-template`: the completion template to use. It should match the model family being served.
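Each completion template wraps the prefix and suffix in model-specific FIM marker tokens. The sketch below shows that idea with a hypothetical `FIM_TEMPLATES` registry and `build_fim_prompt` helper; it is not SGLang's implementation. The marker strings follow the FIM formats published for DeepSeek Coder, Qwen2.5-Coder, and StarCoder, and the `star_coder` key is an assumption that mirrors the StarCoder entry in the supported-model list. Check SGLang's `code_completion_parser` module and the model's tokenizer before relying on the exact strings.

```python
from typing import NamedTuple


class FimTokens(NamedTuple):
    """Hypothetical container: the three markers placed around prefix and suffix."""
    before_prefix: str
    before_suffix: str
    after_suffix: str  # the model generates the missing middle right after this


# Hypothetical registry keyed by template names; token strings follow the
# FIM formats published for each model family.
FIM_TEMPLATES = {
    "deepseek_coder": FimTokens("<｜fim▁begin｜>", "<｜fim▁hole｜>", "<｜fim▁end｜>"),
    "qwen_coder": FimTokens("<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"),
    "star_coder": FimTokens("<fim_prefix>", "<fim_suffix>", "<fim_middle>"),
}


def build_fim_prompt(prefix: str, suffix: str, template_name: str) -> str:
    """Wrap prefix and suffix in the FIM markers of the chosen template."""
    if template_name not in FIM_TEMPLATES:
        raise ValueError(
            f"unknown template {template_name!r}; expected one of {sorted(FIM_TEMPLATES)}"
        )
    tokens = FIM_TEMPLATES[template_name]
    return f"{tokens.before_prefix}{prefix}{tokens.before_suffix}{suffix}{tokens.after_suffix}"


print(build_fim_prompt("function sum(a: number, b: number): number {", "}", "qwen_coder"))
# <|fim_prefix|>function sum(a: number, b: number): number {<|fim_suffix|>}<|fim_middle|>
```

Because the markers differ between families, a template that does not match the served model produces a prompt the model was not trained on, which is why the template should match the model.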

```python
# Initialize an OpenAI-compatible client pointed at the SGLang server
client = OpenAI(api_key="None", base_url=f"http://localhost:{port}/v1")
model_name = client.models.list().data[0].id

# Code before the gap (prefix) and code after it (suffix)
prompt = "function sum(a: number, b: number): number {"
suffix = "}"
```

#### Request
Streaming and non-streaming requests take the same parameters, so we use a non-streaming request as the example.

```python
response_non_stream = client.completions.create(
    model=model_name,
    prompt=prompt,
    suffix=suffix,
    temperature=0.3,
    top_p=0.95,
    stream=False,  # Non-streaming
)

print_highlight("==== [FIM] ====")
print_highlight(response_non_stream.choices[0].text)
```
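With `stream=True`, the client yields the completion as a series of chunks, each carrying a text delta in `chunk.choices[0].text`. The hypothetical helper `collect_stream_text` below joins those deltas into the same string a non-streaming request returns. To keep the sketch runnable without a server, it is demonstrated on stand-in `SimpleNamespace` chunks shaped like streamed completion objects.

```python
from collections.abc import Iterable
from types import SimpleNamespace


def collect_stream_text(chunks: Iterable) -> str:
    """Concatenate the text deltas of a streamed completions response."""
    parts = []
    for chunk in chunks:
        if chunk.choices:  # skip chunks that carry no choice at all
            parts.append(chunk.choices[0].text or "")
    return "".join(parts)


# Offline demo with stand-in chunks shaped like streamed completion objects.
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(text="\n  return ")]),
    SimpleNamespace(choices=[SimpleNamespace(text="a + b;")]),
    SimpleNamespace(choices=[SimpleNamespace(text="\n")]),
]
print(repr(collect_stream_text(fake_stream)))  # '\n  return a + b;\n'
```

Against the running server, pass the iterator returned by `client.completions.create(..., stream=True)` in place of `fake_stream`, keeping the other arguments from the non-streaming request.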

### Offline Engine API

```python
import sglang as sgl
from sglang.srt.code_completion_parser import generate_completion_prompt
from sglang.utils import print_highlight, terminate_process

# Stop the HTTP server first so the offline engine has the GPU to itself.
terminate_process(server_process)

llm = sgl.Engine(model_path="deepseek-ai/deepseek-coder-1.3b-base")

# Wrap the prefix and suffix in the deepseek_coder FIM template.
fim_prompt = generate_completion_prompt(prompt, suffix, "deepseek_coder")

sampling_params = {
    "max_new_tokens": 50,
    "skip_special_tokens": False,
    "temperature": 0.3,
    "top_p": 0.95,
}
result = llm.generate(prompt=fim_prompt, sampling_params=sampling_params)

fim_code = result["text"]  # A single prompt returns a single result dict
print_highlight(fim_code)
```

Example output:

```text
  return a + b;
```

```python
llm.shutdown()
```