# LLaMA2-Chat Guidance

This notebook shows how to control the <a href="https://ai.meta.com/llama/">LLaMA 2</a> chat model with the guidance library. It uses a <a href="https://huggingface.co/meta-llama/Llama-2-7b-chat-hf">Transformers version of the model</a>, so review the special license terms on the Hugging Face model page before downloading.

In [None]:
import torch
import guidance
from transformers import BitsAndBytesConfig

# create a quantization config
qconfig = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

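# Load the LLaMA 2 chat model.
# The weights are gated: create a Hugging Face access token and register it
# locally (e.g. with `huggingface-cli login`) so use_auth_token=True can find it.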
llm = guidance.llms.transformers.LLaMAChat(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=qconfig,
    low_cpu_mem_usage=True,
    use_auth_token=True,
    device_map="auto",
    torch_dtype=torch.float16,
)  # use cuda for GPU if you have >27GB of VRAM
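
To put the quantization settings above in context, here is a rough back-of-the-envelope sketch of how much memory the weights alone take at different precisions. The parameter count of 7e9 is an assumption read off the "7b" in the model name, and the helper `weight_memory_gib` is hypothetical, not part of guidance or Transformers. The estimate ignores activations, the KV cache, and the extra metadata that 4-bit quantization stores, so real usage will be higher.

```python
# Rough weight-memory estimate for a model at different precisions.
# Assumption: ~7e9 parameters (taken from the "7b" in the model name).
# Ignores activations, the KV cache, and quantization metadata.

def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Return the approximate size of the weights in GiB."""
    return n_params * bits_per_param / 8 / 2**30

N_PARAMS = 7e9  # assumed parameter count, not read from the checkpoint

estimates = {
    "float32": weight_memory_gib(N_PARAMS, 32),
    "float16": weight_memory_gib(N_PARAMS, 16),
    "4-bit": weight_memory_gib(N_PARAMS, 4),
}

for name, gib in estimates.items():
    print(f"{name:>8}: {gib:5.1f} GiB")
```

Under these assumptions the 4-bit weights need about a quarter of the float16 footprint, which is why the quantized model can fit on much smaller GPUs.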

In [None]:
# The prompt asks for a single short way to cook pasta;
# n=5 in the gen call below samples five such completions.
# These chat models require a temperature > 0

PROMPT = """
{{#system~}}
You are a helpful, kind and truthful AI assistant. 
{{~/system}}

{{#user~}}
I want to {{goal}}.
Can you please generate one option for how to accomplish this?
Please make the option very short, at most one line.
{{~/user}}

{{#assistant~}}
{{gen 'options' temperature=0.3 max_tokens=900 n=5}}
{{~/assistant}}
"""

program = guidance(PROMPT, llm=llm)
executed_program = program(goal="cook pasta")

executed_program["options"]
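
Sampled completions often contain stray whitespace, trailing chatter, or near-identical answers. The sketch below shows one possible way to tidy them up, assuming that `executed_program["options"]` is a list of strings when `n` is greater than 1. The helper `clean_options` is hypothetical, and the sample strings are made-up placeholders rather than real model output.

```python
from typing import Iterable, List


def clean_options(raw: Iterable[str]) -> List[str]:
    """Keep the first non-empty line of each completion and drop duplicates."""
    seen = set()
    cleaned = []
    for text in raw:
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        if not lines:
            continue  # skip empty completions
        first = lines[0]
        key = first.lower()
        if key not in seen:  # case-insensitive de-duplication, order preserved
            seen.add(key)
            cleaned.append(first)
    return cleaned


# Made-up placeholder strings, NOT real model output.
sample = [
    "  Boil it in salted water.\n\nHope this helps!",
    "boil it in salted water.",
    "",
    "Bake it with sauce and cheese.",
]
print(clean_options(sample))
```

In the notebook this would be called as `clean_options(executed_program["options"])`. Keeping only the first line matches the prompt's request for an option of at most one line. That is a design choice for this sketch, not something the library enforces.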
