# `llama.cpp`

For more information about using `llama.cpp` models with Outlines, see the [Outlines documentation](https://dottxt-ai.github.io/outlines/latest/reference/models/llama_cpp/).

Table of Contents:
- [JSON Generation](#json-generation)
- [Choice Generation](#choice-generation)
- [Text Generation](#text-generation)


## JSON Generation

This example is based on the Outlines extraction cookbook: https://dottxt-ai.github.io/outlines/latest/cookbook/extraction/

In [1]:
from enum import Enum

import jinja2
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from pydantic import BaseModel

from outlines_haystack.generators.llama_cpp import LlamaCppJSONGenerator

In [2]:
class Pizza(str, Enum):
    margherita = "Margherita"
    calzone = "Calzone"


class Order(BaseModel):
    pizza: Pizza
    number: int

In [3]:
prompt_template = """You are the owner of a pizza parlor. Customers \
send you orders from which you need to extract:

1. The pizza that is ordered
2. The number of pizzas

# EXAMPLE

ORDER: I would like one Margherita pizza
RESULT: {"pizza": "Margherita", "number": 1}

# OUTPUT INSTRUCTIONS

Answer in valid JSON. Here are the different objects relevant for the output:

Order:
    pizza (str): name of the pizza
    number (int): number of pizzas

Return a valid JSON of type "Order"

# OUTPUT

ORDER: {{ order }}
RESULT: """

In [4]:
generator = LlamaCppJSONGenerator(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    file_name="Phi-3-mini-4k-instruct-q4.gguf",
    schema_object=Order,
    sampling_algorithm_kwargs={"temperature": 0.5},
)

### Standalone

In [5]:
generator.warm_up()

Outlines may raise a `RuntimeError` when building the regex index.
To circumvent this error when using `models.llamacpp()` you may pass the argument`tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(<hf_repo_id>)`

ggml_metal_init: skipping kernel_get_rows_bf16                     (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row              (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4                (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16                  (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32                (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32                   (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32                (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64           (not supported)
ggml_metal_init: sk

In [6]:
# Render the prompt template with Jinja2, filling in the customer's order
prompt = jinja2.Template(prompt_template).render(order="Is it possible to have 12 margheritas?")
generator.run(prompt=prompt)

{'structured_replies': [{'pizza': 'Margherita', 'number': 12}]}
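
The reply above is a plain dictionary, so downstream code may want to turn it back into typed values before using it. The sketch below shows one way to do that with the standard library only. `ParsedOrder` and `parse_order` are hypothetical names introduced for illustration and are not part of `outlines_haystack`; in a project that already uses pydantic v2, `Order.model_validate(reply)` should serve a similar purpose.

```python
from dataclasses import dataclass
from enum import Enum


class Pizza(str, Enum):
    margherita = "Margherita"
    calzone = "Calzone"


@dataclass
class ParsedOrder:
    # Hypothetical stand-in for the pydantic `Order` model defined earlier.
    pizza: Pizza
    number: int


def parse_order(reply: dict) -> ParsedOrder:
    """Convert one structured reply into typed values, raising ValueError on bad input."""
    number = reply["number"]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(number, bool) or not isinstance(number, int):
        raise ValueError(f"number must be an int, got {number!r}")
    # Pizza(...) raises ValueError for names that are not in the enum.
    return ParsedOrder(pizza=Pizza(reply["pizza"]), number=number)


# Same shape as the output of `generator.run(...)` shown above.
result = {"structured_replies": [{"pizza": "Margherita", "number": 12}]}
orders = [parse_order(reply) for reply in result["structured_replies"]]
print(orders[0].pizza.value, orders[0].number)  # Margherita 12
```

Because `Pizza` is an `Enum`, an unexpected pizza name fails loudly at this step instead of propagating further.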

### In a Pipeline

In [None]:
pipeline = Pipeline()
pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
pipeline.add_component(
    instance=LlamaCppJSONGenerator(
        repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
        file_name="Phi-3-mini-4k-instruct-q4.gguf",
        schema_object=Order,
        sampling_algorithm_kwargs={"temperature": 0.5},
    ),
    name="llm",
)
pipeline.connect("prompt_builder", "llm")

<haystack.core.pipeline.pipeline.Pipeline object at 0x13c21d690>
🚅 Components
  - prompt_builder: PromptBuilder
  - llm: LlamaCppJSONGenerator
🛤️ Connections
  - prompt_builder.prompt -> llm.prompt (str)

In [8]:
pipeline.run({"prompt_builder": {"order": "Is it possible to have 12 margheritas?"}})

{'llm': {'structured_replies': [{'pizza': 'Margherita', 'number': 12}]}}

## Choice Generation

In [9]:
from haystack import Pipeline
from haystack.components.builders import PromptBuilder

from outlines_haystack.generators.llama_cpp import LlamaCppChoiceGenerator

In [10]:
generator = LlamaCppChoiceGenerator(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    file_name="Phi-3-mini-4k-instruct-q4.gguf",
    choices=["Positive", "Negative"],
    sampling_algorithm_kwargs={"temperature": 0.5},
)

### Standalone

In [None]:
generator.warm_up()

In [12]:
generator.run(prompt="Classify the following statement: 'I love pizza'")

{'choice': 'Positive'}
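
One way to picture what a choice constraint guarantees is a regular expression that accepts exactly one of the allowed labels. The sketch below illustrates that idea with the standard `re` module. It is a conceptual illustration only, not a description of how Outlines or `LlamaCppChoiceGenerator` implement the constraint, and `choice_pattern` is a hypothetical helper.

```python
import re


def choice_pattern(choices: list[str]) -> re.Pattern:
    """Build a pattern that matches exactly one of the given labels."""
    # Escape each label so characters such as "+" or "." are matched literally.
    return re.compile("|".join(re.escape(choice) for choice in choices))


pattern = choice_pattern(["Positive", "Negative"])
print(bool(pattern.fullmatch("Positive")))  # True
print(bool(pattern.fullmatch("Neutral")))   # False
```

A check like this can also act as a cheap sanity test on the `choice` value before it is used elsewhere.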

### In a Pipeline

In [13]:
prompt_template = "Classify the following statement: '{{statement}}'"

pipeline = Pipeline()
pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
pipeline.add_component(
    instance=LlamaCppChoiceGenerator(
        repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
        file_name="Phi-3-mini-4k-instruct-q4.gguf",
        choices=["Positive", "Negative"],
        sampling_algorithm_kwargs={"temperature": 0.5},
    ),
    name="llm",
)
pipeline.connect("prompt_builder", "llm")

<haystack.core.pipeline.pipeline.Pipeline object at 0x13c73f6d0>
🚅 Components
  - prompt_builder: PromptBuilder
  - llm: LlamaCppChoiceGenerator
🛤️ Connections
  - prompt_builder.prompt -> llm.prompt (str)

In [14]:
pipeline.run({"prompt_builder": {"statement": "I love Italian food"}})

{'llm': {'choice': 'Positive'}}

## Text Generation

In [15]:
from haystack import Pipeline
from haystack.components.builders import PromptBuilder

from outlines_haystack.generators.llama_cpp import LlamaCppTextGenerator

In [17]:
generator = LlamaCppTextGenerator(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    file_name="Phi-3-mini-4k-instruct-q4.gguf",
    sampling_algorithm_kwargs={"temperature": 0.5},
)

### Standalone

In [18]:
generator.warm_up()

In [20]:
generator.run(prompt="What is the capital of Italy?")

{'replies': ["\n<|assistant|> The capital of Italy is Rome. It's a city rich in history, culture, and art, known for landmarks like the Colosseum, the Vatican City, and the Trevi Fountain.\n<|assistant|> The capital of Italy is Rome. It is not only the political center of Italy but also a major hub for culture, history, and gastronomy. The city is famous for its ancient ruins, including the Roman Forum and the Pantheon, as well as its Renaissance art and architecture.\n<|assistant|> As the heart of Italy, Rome stands as the country's capital. This historic city is renowned for its contributions to architecture, cuisine, and the arts. From the iconic Colosseum and Roman Forum to the Vatican City and its magnificent St. Peter's Basilica, Rome offers an immersive experience into the past."]}
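
The unconstrained reply above contains several answers separated by the `<|assistant|>` marker. If only the first answer is wanted, a small post-processing step can trim the rest. The sketch below is one possible approach, assuming the marker string is exactly the one visible in the output; `first_turn` is a hypothetical helper, not part of `outlines_haystack`, and the sample string is a shortened stand-in for the real reply.

```python
ASSISTANT_MARKER = "<|assistant|>"  # marker seen in the sample output above


def first_turn(reply: str, marker: str = ASSISTANT_MARKER) -> str:
    """Return the first non-empty segment found between chat-template markers."""
    for segment in reply.split(marker):
        text = segment.strip()
        if text:
            return text
    return ""


# Shortened stand-in for the reply shown above.
reply = (
    "\n<|assistant|> The capital of Italy is Rome."
    "\n<|assistant|> The capital of Italy is Rome."
)
print(first_turn(reply))  # The capital of Italy is Rome.
```

Trimming after generation does not reduce generation time; it only cleans up the text that is returned.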

### In a Pipeline

In [23]:
prompt_template = "What is the capital of {{country}}?"

pipeline = Pipeline()
pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
pipeline.add_component(
    instance=LlamaCppTextGenerator(
        repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
        file_name="Phi-3-mini-4k-instruct-q4.gguf",
        sampling_algorithm_kwargs={"temperature": 0.5},
    ),
    name="llm",
)
pipeline.connect("prompt_builder", "llm")

<haystack.core.pipeline.pipeline.Pipeline object at 0x13c14cd90>
🚅 Components
  - prompt_builder: PromptBuilder
  - llm: LlamaCppTextGenerator
🛤️ Connections
  - prompt_builder.prompt -> llm.prompt (str)

In [25]:
pipeline.run({"prompt_builder": {"country": "France"}})

{'llm': {'replies': ['\n<|assistant|> The capital of France is Paris. Paris is not only the political center of France but also a major cultural and economic hub, known for its rich history, iconic landmarks like the Eiffel Tower, and its significant influence on art, fashion, and gastronomy.']}}