# Building a Q/A bot with LLMs

We'll build a Q/A bot with an open-source alternative to ChatGPT. There are many options to choose from (see the alternatives section below); we'll use Mistral 7B Instruct (`mistral:instruct` in Ollama) because it can run on a Mac, it's open source, and no data has to leave our computer. Running locally simplifies the implementation and means we don't need to pay for API tokens from OpenAI. Mistral 7B is also reported by its authors to outperform Llama 2 on the benchmarks they evaluated.

## Alternatives to ChatGPT
Several open models are worth considering. Where the architecture is supported, they can be configured for serving with [Ollama](https://ollama.ai/library) or [llama.cpp](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file):

1. Instruction-tuned T5: https://huggingface.co/google/flan-t5-large
1. Few-shot learner: https://huggingface.co/EleutherAI/gpt-j-6b
1. Instruction-tuned model reported to outperform Llama 2: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
1. One-shot learner: https://huggingface.co/adept/persimmon-8b-chat

## Example for Mistral 7B with Hugging Face

This example runs Mistral 7B Instruct directly through the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "mps"  # Apple Silicon GPU backend; use "cuda" or "cpu" on other hardware

# Download (on first use) and load the model weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

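# Render the conversation with the model's chat template and tokenize it
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

# Move the inputs and the model to the same device
model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens, then decode prompt plus completion back to text
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])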
```

This works, but it loads the model inside the same process that uses it. Next, we'll decouple hosting the model from querying it.

## Getting a hosted version of Mistral

1. Download and install Ollama from https://ollama.ai/
1. With Ollama running, pull the Mistral instruct model: `curl http://localhost:11434/api/pull -d '{ "name": "mistral:instruct" }'`
1. Install the `requests` library for the Python client: `conda install requests`.
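
Before sending prompts, it can help to confirm that the pull in the steps above succeeded. The following is a minimal sketch, assuming Ollama's `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field. The helper names `fetch_tags` and `model_is_available` are illustrative, and the sketch uses only the standard library.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default local address


def fetch_tags(url: str = OLLAMA_TAGS_URL) -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def model_is_available(tags: dict, name: str) -> bool:
    """Return True if `name` (e.g. "mistral:instruct") appears in the tags response."""
    return any(m.get("name") == name for m in tags.get("models", []))
```

With the server running, `model_is_available(fetch_tags(), "mistral:instruct")` should be true once the pull has finished.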

With the model pulled, we can query it over HTTP through Ollama's `/api/generate` endpoint:

```python
# Equivalent request with curl:
#
# curl http://localhost:11434/api/generate -d '{
#   "model": "mistral:instruct",
#   "prompt": "[INST] why is the sky blue? [/INST]",
#   "raw": true,
#   "stream": false
# }'
#
# Optional request parameters:
# format: the format to return a response in. Currently the only accepted value is json.
# options: additional model parameters listed in the Modelfile documentation, such as temperature.
# system: system message to use (overrides what is defined in the Modelfile).
# template: the full prompt or prompt template (overrides what is defined in the Modelfile).
# context: the context value returned from a previous request to /generate; it can be used to keep a short conversational memory.
# stream: if false, the response is returned as a single response object rather than a stream of objects.
# raw: if true, no formatting is applied to the prompt. Use it when the request already contains a fully templated prompt.

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "mistral:instruct",  # the tag pulled in the setup steps
    "prompt": "[INST] why is the sky blue? [/INST]",
    "raw": True,      # the prompt already contains the [INST] template
    "stream": False,  # return a single JSON object
}
r = requests.post(OLLAMA_URL, json=payload)
r.raise_for_status()  # fail loudly if the server or model is unavailable
print(r.json()["response"])
```

Example output:

The color of the sky appears blue due to a phenomenon called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it interacts with different gases and particles present in the air. Blue light has a shorter wavelength and gets scattered more easily than other colors due to its smaller size. Consequently, when we look up at the sky, we primarily see the blue light that has been scattered in all directions.
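
With `raw` set to true, the prompt must already follow the model's instruction template. The single-turn request does this by hand with `[INST] ... [/INST]`. For multi-turn conversations, a small helper can assemble the same markers. This is a sketch under assumptions: it reuses the `[INST]` layout from the request, assumes `</s>` is the end-of-sequence token that closes a completed assistant turn, and omits a begin-of-sequence token to match the request. `build_mistral_prompt` is a hypothetical helper, and the exact whitespace should be checked against the tokenizer's own chat template.

```python
from typing import Dict, List

EOS = "</s>"  # assumed end-of-sequence token for Mistral instruct models


def build_mistral_prompt(messages: List[Dict[str, str]]) -> str:
    """Render alternating user/assistant messages as one raw instruct prompt."""
    parts = []
    for i, message in enumerate(messages):
        expected_role = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        else:
            # Completed assistant turns are closed with the end-of-sequence token
            parts.append(f" {message['content']}{EOS}")
    return "".join(parts)


prompt = build_mistral_prompt([
    {"role": "user", "content": "why is the sky blue?"},
    {"role": "assistant", "content": "Because of Rayleigh scattering."},
    {"role": "user", "content": "Why are sunsets red?"},
])
print(prompt)
```

The resulting string can be sent as the `prompt` field of a request with `"raw": true`.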


To turn the model into a Q/A bot, we constrain it with a system prompt:

```python
SYSTEM_PROMPT = '''
You are a helpful Q/A bot that can only reference material from a knowledge base.
All context will be retrieved from a knowledge base.
For any questions not "from the knowledge base", say that you cannot answer.
'''
```
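
One way to wire this system prompt into a request is to pass it through the `system` parameter of `/api/generate` and place the retrieved passages in the prompt. The following is a sketch under assumptions: `build_qa_payload` and `ask` are hypothetical helpers, the passage layout is an arbitrary choice, and retrieval from the knowledge base is out of scope here, so it is represented by a plain list of strings. The sketch uses the standard library; `requests.post(url, json=payload)` would work equally well.

```python
import json
import urllib.request
from typing import Dict, List

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_qa_payload(system_prompt: str, question: str, passages: List[str],
                     model: str = "mistral:instruct") -> Dict[str, object]:
    """Build a non-streaming /api/generate request that grounds the question in passages."""
    context = "\n".join(f"- {p}" for p in passages)
    prompt = f"Knowledge base:\n{context}\n\nQuestion: {question}"
    return {
        "model": model,
        "system": system_prompt,  # overrides the system message in the Modelfile
        "prompt": prompt,
        "stream": False,          # one JSON object instead of a stream
    }


def ask(payload: Dict[str, object], url: str = OLLAMA_URL) -> str:
    """POST the payload to a running Ollama server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)["response"]
```

With Ollama running, `ask(build_qa_payload(SYSTEM_PROMPT, question, passages))` returns the model's answer as a string. Because `raw` is not set, Ollama applies the model's own prompt template, so no `[INST]` markers are needed here.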