You must run this notebook on a GPU. A T4 is sufficient. It's free on [Google
Colab](https://stackoverflow.com/questions/62596466/how-can-i-run-notebooks-of-a-github-project-in-google-colab/67344477#67344477).

This notebook runs a tiny demo of an [AWQ-quantized Mistral
7B](https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-AWQ) on a classification task
using CAPPr. TODO: add sampling, for comparison.

I'm gonna install `cappr` from source because I sometimes use this notebook to
statistically gut-check code changes.

In [None]:
!pip install autoawq \
"cappr[hf] @ git+https://github.com/kddubey/cappr.git"

I'll need to run the model on individual prompt-completion pairs to stay under the
15 GB GPU RAM limit.

In [3]:
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from cappr.huggingface.classify_no_cache import predict_proba

In [4]:
model_name_or_path = "TheBloke/Mistral-7B-OpenOrca-AWQ"

In [None]:
# Load model
model = AutoAWQForCausalLM.from_quantized(
    model_name_or_path,
    fuse_layers=True,
    trust_remote_code=False,
    safetensors=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)

In [6]:
# you must do this for CAPPr to work, sorry
model.device = "cuda"

In [7]:
# Warm up the model with one throwaway forward pass
_ = model(**tokenizer(["warm up"], return_tensors="pt").to(model.device))

In [9]:
mistral_chat_template = """
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
""".strip("\n")

In [10]:
# Define a classification task
feedback_types = (
    "The product is too expensive",
    "The product uses low quality materials",
    "The product is difficult to use",
    "The product is great",
)


# Write a prompt
def prompt_func(product_review: str) -> str:
    system_message = "You are an expert at summarizing product reviews."
    prompt = f"This is a product review: {product_review}\nWrite a short summary."
    return mistral_chat_template.format(system_message=system_message, prompt=prompt)


# Supply the texts you wanna classify
product_reviews = [
    "I can't figure out how to integrate it into my setup.",
    "Yeah it's pricey, but it's definitely worth it.",
]
prompts = [prompt_func(product_review) for product_review in product_reviews]
completions = feedback_types

In [11]:
print(prompts[0])

<|im_start|>system
You are an expert at summarizing product reviews.<|im_end|>
<|im_start|>user
This is a product review: I can't figure out how to integrate it into my setup.
Write a short summary.<|im_end|>
<|im_start|>assistant


In [14]:
pred_probs = predict_proba(
    prompts,
    completions,
    model_and_tokenizer=(model, tokenizer),
    batch_size_completions=1,
)
pred_probs

array([[0.02130296, 0.04446848, 0.92126036, 0.0129682 ],
       [0.52496135, 0.1060892 , 0.27892497, 0.09002448]])

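The first prediction is correct. For the first product review, the 3rd class ("The
product is difficult to use") has the highest probability by far, at about 0.92.

The second prediction is wrong, but it's an understandable mistake. For the second
product review, the 1st class ("The product is too expensive") has the highest
probability, at about 0.52, presumably because the review does call the product pricey.
The correct class is the 4th ("The product is great"), which gets only about 0.09.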
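
To read the predicted classes off programmatically, one option is to take the argmax of
each row. This is a minimal sketch: it copies the probabilities printed above into a
NumPy array so that it runs without a GPU, and `pred_class_idxs` and `pred_classes` are
names introduced here for illustration.

```python
import numpy as np

feedback_types = (
    "The product is too expensive",
    "The product uses low quality materials",
    "The product is difficult to use",
    "The product is great",
)

# Probabilities copied from the predict_proba output above:
# one row per product review, one column per feedback type
pred_probs = np.array(
    [
        [0.02130296, 0.04446848, 0.92126036, 0.0129682],
        [0.52496135, 0.1060892, 0.27892497, 0.09002448],
    ]
)

# Each row is a distribution over the classes, so it should sum to 1
assert np.allclose(pred_probs.sum(axis=1), 1.0)

# Index of the most probable class for each review
pred_class_idxs = pred_probs.argmax(axis=1)
pred_classes = [feedback_types[idx] for idx in pred_class_idxs]

for pred_class, prob in zip(pred_classes, pred_probs.max(axis=1)):
    print(f"{pred_class} ({prob:.2f})")
```

In the notebook itself, the same two lines (`argmax` and the list comprehension) can be
applied directly to the `pred_probs` returned by `predict_proba`.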