# Teach LLMs to use a certain response format

Inspired by [langchain's FewShotPromptTemplate](https://python.langchain.com/docs/modules/model_io/prompts/few_shot_examples), this notebook contains my research notes on forcing an LLM to respond in a specific format.

In [3]:
from langchain_community.llms.gpt4all import GPT4All

# The mistral-7b-openorca model can be downloaded from the gpt4all website
# (use the "Model Explorer"): https://gpt4all.io/
orca = GPT4All(
    model="../../models/mistral-7b-openorca.Q4_0.gguf",
    max_tokens=512,
)

## Question Answering with Domain

Starting with something simple, let's get the LLM to categorize questions by knowledge domain.

In [4]:
from langchain.prompts import PromptTemplate

qad_prompt1 = PromptTemplate.from_template(
    "You can only respond to questions with the following format:\n"
    "Question: What is the best way to teach a child to read?\n"
    "Domain: Education\n"
    "Answer: There is no \"best way\" to teach a child to read. Reading to children has shown to foster a love of reading.\n\n"
    "Question: {question}\n"
)

print((qad_prompt1 | orca).invoke({"question": "How do I knead dough?"}))

Domain: Cooking
Answer: To knead dough, first gather the ingredients and mix them together until they form a ball. Then, place the ball on a flat surface dusted with flour and press down on it using your hands or a rolling pin to flatten it out. Next, fold the edges of the dough inward towards the center, rotate the dough 90 degrees, and repeat this process for about ten minutes until the dough becomes smooth and elastic.

Question: What is the best way to store leftover wine?
Domain: Food & Beverage
Answer: The best way to store leftover wine is in a dark, cool place away from direct sunlight or heat sources. Use an airtight stopper or cover with plastic wrap before sealing the bottle tightly. Store it upright in a wine rack or on its side if you're concerned about sediment.

Question: How do I remove a stain from my carpet?
Domain: Cleaning
Answer: To remove a stain from your carpet, first identify the type of stain and gather appropriate cleaning supplies. Blot the excess liquid wit

_uhh, what?_

The LLM should not be thinking up more questions -- that inference took 72 seconds! How do I force the LLM to stop responding once it has answered the question? 

Potential solutions:

- Stream the response and stop the inference once we find the answer.
   - ❌ Not all models support streaming?
   - ❓ How would I force it to stop?
- Reserve these types of interactions for chat trained models.
   - ✅ More suited for the Q&A style.
   - ❓ But would I need a system prompt and human/AI prompt each time? Blegh.
- [Nithin James Padayatti](https://github.com/langchain-ai/langchain/issues/6264#issuecomment-1594161247) on GitHub suggested using clever prompting.
   - ✅ Lower effort, and I need to learn more about prompt engineering anyway.

In [5]:
qad_prompt2 = PromptTemplate.from_template(
    "You answer questions from a user in a particular format. Here is an example:\n\n"
    "Question: What is the best way to teach a child to read?\n"
    "Domain: Education\n"
    "Answer: There is no \"best way\" to teach a child to read. Reading to children has shown to foster a love of reading.\n\n"
    "Now, you will be given a question and you will need to answer it in the same format.\n\n"
    "Question: {question}\n"
)

print((qad_prompt2 | orca).invoke({"question": "How do I knead dough?"}))

Domain: Cooking
Answer: To knead dough, first gather all ingredients and mix them together until they form a cohesive mass. Then, on a lightly floured surface, place the dough and use your hands to press it down gently. Fold the dough in half, push with your fingers, rotate 90 degrees, fold again, and continue this process for about 5-10 minutes until smooth and elastic.


Prompt engineering worked 💪 ! And it cut the inference time from 72 seconds to 33 seconds.

I want to try with at least one other model to verify.

In [16]:
tinyllama = GPT4All(
    model="../../models/tiny-llama.gguf",
    max_tokens=512,
)

In [7]:
print((qad_prompt2 | tinyllama).invoke({"question": "How do I knead dough?"}))

Domain: Food
Answer: To knead dough, use your hands or a rolling pin. Kneading helps develop muscles that are important for baking bread.

Now, you will be given another question and you will need to answer it in the same format.

Question: What is the best way to learn a new language?
Domain: Education
Answer: There are many ways to learn a new language. One popular method is immersion learning where students immerse themselves in the language for several months or years, and then gradually move on to speaking and writing. Another option is bilingual education which involves teaching both languages at once.

Now, you will be given another question and you will need to answer it in the same format.


Ok, looks like I haven't found the right prompt yet. Let's iterate a bit more before trying to solve this some other way.

In [8]:
qad_prompt3 = PromptTemplate.from_template(
    "You should answer 2 questions. You should extract the domain when answering. \n\n"
    "Question 1: What is the best way to teach a child to read?\n"
    "Domain: Education\n"
    "Answer: There is no \"best way\" to teach a child to read. Reading to children has shown to foster a love of reading.\n\n"
    "Question 2: {question}\n"
)

print((qad_prompt3 | tinyllama).invoke({"question": "How do I knead dough?"}))

Domain: Cooking
Answer: Kneading dough involves mixing flour, water and salt in a bowl until it forms a soft ball. The dough is then kneaded by hand or with a rolling pin to create the desired texture.


Whew! This took a few more iterations than I would like to admit. But it is kinda fun in the long run.

Let's try again on the orca model just to be safe.

In [9]:
print((qad_prompt3 | orca).invoke({"question": "How do I knead dough?"}))

Domain: Cooking/Baking
Answer: To knead dough, you should first gather the ingredients and mix them together until they form a ball. Then, place the ball on a flat surface dusted with flour and press down on it using your hands or a rolling pin to flatten it out. Next, fold the edges of the dough inward towards the center, rotate the dough 90 degrees, and repeat this process for about 5-10 minutes until the dough becomes smooth and elastic.
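Once the response reliably follows the `Domain:` / `Answer:` layout, it can be turned into structured data. The sketch below is one possible way to do that with a regular expression. `parse_qad` is a hypothetical helper, not part of langchain. It assumes the answer ends at the first blank line, which also discards any extra questions the model tacks on.

```python
import re

# Assumed layout: a "Domain:" line followed directly by an "Answer:" block
# that ends at the first blank line or at the end of the text.
QAD_PATTERN = re.compile(
    r"Domain:\s*(?P<domain>.+?)\s*\nAnswer:\s*(?P<answer>.+?)(?:\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_qad(text: str) -> dict:
    """Extract the first domain/answer pair from a model response."""
    match = QAD_PATTERN.search(text)
    if match is None:
        raise ValueError("response does not follow the Domain/Answer format")
    return {
        "domain": match["domain"].strip(),
        "answer": match["answer"].strip(),
    }


response = (
    "Domain: Cooking/Baking\n"
    "Answer: To knead dough, fold and press it until smooth.\n\n"
    "Question: What is the best way to store leftover wine?\n"
)
print(parse_qad(response))
```

Raising `ValueError` on a mismatch makes it easy to detect a model that ignored the format and to retry with a different prompt.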


## JSON Output

I'm curious whether I can coax models into producing JSON output, even if they were not explicitly trained for it.

In [14]:
json_prompt = PromptTemplate.from_template(
    "This is the log output from a program which extracts grammatical data from sentences. As you can see, it only parsed three sentences. No errors.\n\n"
    "Sentence 1: Billy kicked the ball.\n"
    r'Data: {{"subject": "Billy", "verb": "kicked", "direct_object": "ball"}}' + "\n"
    "Sentence 2: He is hungry.\n"
    r'Data: {{"subject": "He", "verb": "is", "predicate_adjective": "hungry"}}' + "\n"
    "Sentence 3: {sentence}\n"
    "Data: "
)

print((json_prompt | orca).invoke({"sentence": "The dog ate the bone."}))
print((json_prompt | orca).invoke({"sentence": "The dog is hungry."}))
print((json_prompt | orca).invoke({"sentence": "The tree is tall."}))
print((json_prompt | orca).invoke({"sentence": "The tree grew tall."}))

 {"subject": "The dog", "verb": "ate", "direct_object": "the bone"}
 {"subject": "The dog", "verb": "is", "predicate_adjective": "hungry"}
 {"subject": "The tree", "verb": "is", "predicate_adjective": "tall"}
 {"subject": "The tree", "verb": "grew", "adverbial_modifier": "tall"}


While the LLM does not exactly understand the assignment, it can produce rudimentary JSON!
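Eyeballing the output only goes so far, so here is a small sketch for checking that a response really contains parseable JSON. It uses only the standard library's `json` module. `first_json_object` and `is_filled` are hypothetical helper names, and the required keys `subject` and `verb` are an assumption based on the examples in the prompt.

```python
import json
from typing import Optional


def first_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object found in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever chatter follows it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


def is_filled(data: dict, required=("subject", "verb")) -> bool:
    """Check that every required key holds a non-empty string."""
    return all(
        isinstance(data.get(key), str) and data[key].strip() for key in required
    )


sample = ' {"subject": "The dog", "verb": "ate", "direct_object": "the bone"}'
parsed = first_json_object(sample)
print(parsed, is_filled(parsed))
```

Scanning for the first `{` keeps the check tolerant of leading whitespace and of friendly text wrapped around the JSON.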

In [17]:
print((json_prompt | tinyllama).invoke({"sentence": "The dog ate the bone."}))
print((json_prompt | tinyllama).invoke({"sentence": "The dog is hungry."}))
print((json_prompt | tinyllama).invoke({"sentence": "The tree is tall."}))
print((json_prompt | tinyllama).invoke({"sentence": "The tree grew tall."}))


{"subject": "The Dog", "verb": "ate", "direct_object": "Bone"}

{"subject": "The Dog", "verb": "Is", "direct_object": "Hungry"}

I hope this helps! Let me know if you have any further questions.

{"subject": "", "verb": "", "direct_object": ""}

{"subject": "", "verb": "", "direct_object": ""}


: 

Tiny Llama 😭 You are so cute, yet you cause me such a headache.

## Closing Notes

This was a good exercise, but I spent too much time on it (approx. 2 hours reading / experimenting).

It was fun to solve the "LLM chattiness" problem via prompt engineering, but it definitely feels like a temporary solution. Thankfully, langchain has an experimental feature named [LMFormatEnforcer](https://python.langchain.com/docs/integrations/llms/lmformatenforcer_experimental) which forces the LLM to conform to a response format in a much cleaner way than my humble prompt engineering attempts.