# Building a Q/A bot with LLMs

We'll build a Q/A bot with an open source alternative to ChatGPT. There are many options to choose from (see alternatives), and we'll use Mixtral7b:intsruct, because it can run on a Mac, it's open source, and we don't need to send data outside our computer. This simplifies the implementation and means we don't need to pay for API tokens from openAI. Mixtral7b's performance beats LLAMA2, which is great.

## Alternatives to chatGPT
These models can be configured for serving with [ollama](https://ollama.ai/library) and [llamacpp](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file):

1. Instruction tuned T5: https://huggingface.co/google/flan-t5-large
1. Few shot learner: https://huggingface.co/EleutherAI/gpt-j-6b
1. Instruction tuned model that outperforms LLAMA2: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
1. One shot learner: https://huggingface.co/adept/persimmon-8b-chat

## Example for Mixtral7b with Hugging face

This is an example for using Mixtral via Huggingface. It's ok to skip the example but I'm keeping it fore reference.

The reason it's ok to skip the example is that we'll use a hosted version of the model instead of directly calling the model with Huggingface APIs. The calls to the model will make use of OLLAMA's RESTful API. For more info on using OLLAMA's RESTful API, see the following section.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "mps" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

```

This is great, but we'll decouple hosting the model from using the model.

## Using OLLAMA's API


OLLAMA's restful API works by making HTTP posts with the information requested. For example:

'''
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "[INST] why is the sky blue? [/INST]",
  "raw": true,
  "stream": false
}'
'''

See the next section for how to make sure you are running OLLAMA and that MISTRAL is hosted on your local environment.

## Getting a hosted version of mixtral

1. Download and install OLLAMA from: https://ollama.ai/
1. Install mixtral `curl http://localhost:11434/api/pull -d '{ "name": "mistral:instruct" }'`
1. Run `conda install requests`.

Once you are ready to call Mistral on your local environment, you are ready to try the following code:

In [34]:
import json
import requests

def generateCompletion(prompt:str,
                       model:str='mistral:instruct',
                       server_url:str="http://localhost:11434/api/generate")->str:
  payload = {
    'model': model,
    'prompt': "[INST]%s[/INST]" % prompt,
    'raw': True,
    'stream': False
  }
  r = requests.post(server_url, data=json.dumps(payload))
  return json.loads(r.text)['response']

print(generateCompletion('why is the sky blue?'))

 The color of the sky appears blue due to a process called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it interacts with molecules and particles in the air, such as nitrogen and oxygen. These particles scatter the shorter wavelengths of light, primarily blue, more than other colors because they are smaller in size than the wavelengths of light. Consequently, the sky appears blue during the day. However, at sunrise or sunset, when the sunlight has to pass through more of the atmosphere, the scattering of longer wavelengths, such as red and orange, becomes more pronounced, resulting in the beautiful colors we observe in the sky during those times.


## Consulting the knowledge base

The code beneath has functions that consult the knowledge base and prepare knowledge base contents for calling the language model. The knowledge base contents will be provided as context when asking questions to the language model.

In [39]:
import psycopg2
from sentence_transformers import SentenceTransformer
from numpy import array as nparray
tokenizer = SentenceTransformer(
  'sentence-transformers/multi-qa-mpnet-base-cos-v1'
)

def makeEmbedding(chunks:list[str],
    tokenizer:SentenceTransformer=tokenizer)->list[nparray]:
  '''
  returns a list of vectors. Vectors are numpy arrays.
  chunks is a list of strings with text
  model is 'sentence-transformers/multi-qa-mpnet-base-cos-v1' and warpped in SentenceTransformer
  '''
  return tokenizer.encode(
    [chunk for chunk in chunks]
    , batch_size=32
    , device='mps' # send work to Metal shaders in M1 macs
    , show_progress_bar=True
  )

def findMatches(query:str,
                host:str="127.0.0.1",
                database:str="semantic_search",
                user:str="postgres",
                password:str="123456")->list[str]:
  '''
  returns matches from the knowledge base.
  query is a question to be answered by finding mathces in the knowledge base.
  '''
  q_encoded = makeEmbedding([query])[0].tolist()
  CMD = "select text_chunk from items \
    order by embedding <=> '%s' limit 5"
  results = None
  with psycopg2.connect(
    host=host,
    database=database,
    user=user,
    password=password
  ) as connection:
    with connection.cursor() as cursor:
      cursor.execute(CMD % q_encoded)
      results = cursor.fetchall()
  return [result[0] for result in results]

def asArticles(statements:list[str])->str:
  articles = {"article_%s"%i:statements[i] for i in range(len(statements))}
  return json.dumps(articles)

results = asArticles(findMatches("what are the virtues of a hero?"))
print(results)

Batches: 100%|██████████| 1/1 [00:00<00:00, 34.39it/s]

{"article_0": " which of all the virtues is the proper\nvirtue for this present use", "article_1": " suffice\nit to say, Herodes succeeded in defending himself to the satisfaction of\nthe emperor", "article_2": " as whether meekness, fortitude, truth,\nfaith, sincerity, contentation, or any of the rest", "article_3": " His care to preserve\nhis friends", "article_4": "Fronto replied, thanking the prince for his advice, and promising that\nhe will confine himself to the facts of the case. But he points out that\nthe charges brought against Herodes were such, that they can hardly be\nmade agreeable; amongst them being spoliation, violence, and murder"}





## Querying the language model with the knowledge base

The code below queries the knowledge base and uses prompt engineering to ask the language model to answer questions:

In [40]:
SYSTEM_PROMPT = '''
You are a helpful Q/A bot that can only reference material from a knowledge base.
All context will be retrieved from articles in a knowledge base. This is the knowledge base:
<knowledge_base>%s</knowledge_base>
For any questions not "from the knowledge base", say that you cannot answer.
<question>%s</question>
'''

def answer(question:str)->str:
  knowledge_base = asArticles(findMatches(question))
  prompt = SYSTEM_PROMPT % (findMatches(question),question)
  return generateCompletion(prompt)

print(answer('What is virtue?').strip())

Batches: 100%|██████████| 1/1 [00:00<00:00, 10.33it/s]
Batches: 100%|██████████| 1/1 [00:00<00:00, 45.56it/s]


According to the knowledge base, virtue is a quality or disposition that is considered good and desirable. It consists not in passion but in action (XIV). The true good or evil of a reasonable and charitable man does not consist in passion but in operation and action (XIV). However, the specific virtue for the present use is mentioned as something different and more excellent and divine than the motions of the elements (XVI), but it is not explicitly defined as such in the provided passages. The knowledge base also mentions that the virtue of suffering in itself is not discussed in the _Meditations_.
