# Chat Completions for Local LLM

There may be times when you want to use Guidance with a local LLM through a chat-completion API rather than the Guidance DSL. For this, we provide an **experimental** API on the `Engine` class. The `Model` class uses `Engine` internally as a uniform interface to LLMs loaded by both Transformers and LlamaCpp.

We start by loading our model:

In [1]:
import guidance
from huggingface_hub import hf_hub_download

gguf = hf_hub_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
)

# Define the model we will use.
# The GGUF file downloaded above is only needed for the LlamaCpp option:
# lm = guidance.models.LlamaCpp(gguf, n_gpu_layers=-1)
lm = guidance.models.Transformers("microsoft/Phi-3-mini-4k-instruct")



Now, set up a simple conversation as one would for a remote endpoint such as the OpenAI API:

In [2]:
messages = [
    {
        "role" : "user",
        "content": "Tell me about yourself"
    }
]

Define the Lark grammar which we wish to apply:

In [3]:
grammar = """%llguidance {}

start: "My name is " name " and my motto is " motto
name[capture="name", temperature=1.0]: NAME
motto[capture="motto", temperature=0.7]: MOTTO
NAME: "Phi-3, the Magnificent"
    | "Phi-3, the Terrible"
    | "Phi-3, the Great"
    | "Phi-3, the Conqueror"
MOTTO: "Look on my works ye mighty, and despair"
    | "Apres moi, le deluge"
    | "Alea iacta est"
    | "Apres moi, le table"
"""

Grab the `engine` object from the `Model`. This only works for local models: every `Model` contains an interpreter, but only the interpreters for local models contain an engine.

In [4]:
engine = lm._interpreter.engine

Call the experimental `chat_completion` method, supplying the conversation, the grammar, and optionally some tools. These are all rendered with the model's chat template before inference. The call returns the completion text and a dictionary of captures:

In [5]:
completion, captures = engine.chat_completion(messages, grammar, tools=None)
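To give a feel for what rendering a conversation with a chat template means, here is a toy renderer in plain Python. It is only a sketch: the function name `render_phi3_style` is hypothetical, and the marker layout is an assumption modelled on the Phi-3 prompt format. The real template ships with the model's tokenizer (Transformers tokenizers expose it through `apply_chat_template`), and `chat_completion` takes care of this step for you.

```python
def render_phi3_style(messages, add_generation_prompt=True):
    """Flatten chat messages into one prompt string (illustrative only).

    Assumed layout: each turn becomes "<|role|>\\ncontent<|end|>\\n", and an
    open assistant turn is appended so the model knows it should answer.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}<|end|>\n")
    if add_generation_prompt:
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Tell me about yourself"}]
prompt = render_phi3_style(messages)
print(prompt)
```

The grammar then constrains only what the model generates after the final assistant marker.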



See the results:

In [6]:
print(f"{completion=}")

completion='My name is Phi-3, the Magnificent and my motto is Apres moi, le deluge'


In [7]:
for k, v in captures.items():
    print(f"{k} : {v}")

name : Phi-3, the Magnificent
motto : Apres moi, le deluge
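Since the captures are named sub-spans of the completion, one simple sanity check is to rebuild the sentence from them and compare it with the completion text. The sketch below uses the values printed above as plain strings (an assumption about the capture type) and a hypothetical helper, `rebuild`, so it runs without loading a model:

```python
# Values copied from the notebook output above.
completion = "My name is Phi-3, the Magnificent and my motto is Apres moi, le deluge"
captures = {
    "name": "Phi-3, the Magnificent",
    "motto": "Apres moi, le deluge",
}


def rebuild(captures):
    """Reassemble the sentence described by the grammar's `start` rule."""
    return f"My name is {captures['name']} and my motto is {captures['motto']}"


# The captures should account for every variable part of the completion.
is_consistent = rebuild(captures) == completion
print(is_consistent)
```

In your own code, the same comparison can be run on the values returned by `chat_completion`.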


There is also a streaming version of this API. Again, this API is **experimental** and subject to change.