When training or fine-tuning an LLM, model builders use specific **prompt formats**. These resemble a special markup language that lets prompt curators mark things like chat conversation turns, end-of-statement markers, and tool calls.

If you download an open model and prompt it without the expected prompt format, you may get poor performance. This notebook demonstrates that, and then shows the same request with the prompt format applied. AI platforms usually provide a way to apply the appropriate prompt format automatically via a "chat" endpoint. You do not have to use that endpoint only for chat; it is simply a convenient way to get the prompt formatting right.

# Dependencies and imports

In [1]:
! pip install predictionguard

Collecting predictionguard
  Downloading predictionguard-2.7.0-py2.py3-none-any.whl.metadata (872 bytes)
Downloading predictionguard-2.7.0-py2.py3-none-any.whl (21 kB)
Installing collected packages: predictionguard
Successfully installed predictionguard-2.7.0


In [2]:
import os

from predictionguard import PredictionGuard
from getpass import getpass

In [3]:
pg_access_token = getpass('Enter your Prediction Guard access api key: ')
os.environ['PREDICTIONGUARD_API_KEY'] = pg_access_token

Enter your Prediction Guard access api key: ··········


In [4]:
client = PredictionGuard()

# With and without prompt formatting

In [11]:
result = client.completions.create(
    model="Hermes-3-Llama-3.1-8B",
    prompt="Tell me a joke"
)

print(result['choices'][0]['text'])

 – I can’t remember any. I’m like a bank. I don’t keep deposits. Tell me a joke.
I hate you – I was only kidding. I could never really hate you. I’m not capable of that emotion. Well, maybe in traffic but that doesn’t count.
You said you’d never forget me. But you didn’t. Like I’m nothing. You just went ahead and forgot me.
Where’s my damn coffee? I’m going to kill him. Wait,


In [13]:
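# ChatML-style turns. The system text is an example message (replace as needed).
# The final assistant header is left open (no <|im_end|>) so the model writes the reply.
prompt = """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Tell me a joke<|im_end|>
<|im_start|>assistant
"""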

result = client.completions.create(
    model="Hermes-3-Llama-3.1-8B",
    prompt=prompt
)

print(result['choices'][0]['text'])


Sure, here's a joke for you:

Why don't scientists trust atoms? 
Because they make up everything!

This joke plays on the dual meaning of "make up". In science, atoms truly do "make up" or constitute everything that exists. However, the phrase "make up" also means to fabricate or lie about something. So the joke humorously suggests that atoms are untrustworthy because they "lie" by making up everything. The juxtaposition of the two meanings is
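
Writing the markup by hand is error-prone, so a ChatML-style prompt like the one above can also be assembled from a list of messages. The following is a minimal sketch: `format_chatml` and its `add_generation_prompt` flag are hypothetical names introduced here for illustration, not part of the `predictionguard` package, and the system message is an assumed example.

```python
# Sketch: build a ChatML-style prompt from a list of chat messages.
# `format_chatml` is a hypothetical helper, not a predictionguard function.

def format_chatml(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}, ...] as a ChatML-style string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>{role} ... <|im_end|> markers.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model generates the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    # Assumed example system message.
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a joke"},
]
prompt = format_chatml(messages)
print(prompt)
```

The resulting string can be passed as `prompt=` to `client.completions.create`, in the same way as the hand-written prompt.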


# Automatic prompt formatting with a chat endpoint

In [15]:
result = client.chat.completions.create(
    model="Hermes-3-Llama-3.1-8B",
    messages=[{"role": "user", "content": "Tell me a joke"}]
)

print(result['choices'][0]['message']['content'])

Sure, here's a classic one for you:

Why don't scientists trust atoms?

Because they make up everything!
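
Since the chat endpoint is not limited to conversations, it can also carry a one-shot task. The sketch below wraps the same `client.chat.completions.create` call in a small function. `run_task` is a hypothetical helper, and sending the instruction in a `system` message is an assumption about how you might structure the request rather than something shown in the cells above.

```python
# Sketch: use the chat endpoint for a single, non-chat task.
# `run_task` is a hypothetical helper; using a "system" message for the
# instruction is an assumption, not taken from the notebook cells above.

def run_task(client, instruction, text, model="Hermes-3-Llama-3.1-8B"):
    """Send an instruction plus input text as one chat exchange."""
    result = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": text},
        ],
    )
    # Same response shape as in the chat cell above.
    return result['choices'][0]['message']['content']
```

For example, `run_task(client, "Classify the sentiment of the text as positive or negative.", "I loved this movie!")` would send one formatted request and return the model's reply as a string, with the platform handling the prompt markup.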
