# A RAG-enhanced chatbot 

In this example we will build a chatbot backend that provides tech support for the GNU make utility<sup>1</sup>.

---
<p><small>1. The reason for choosing the GNU make utility as the subject is two-fold: 1) the normal (and frequent) use of "make" is as a verb, which has nothing to do with the noun "make" and forces the LLM to make(!) a non-trivial distinction, and 2) the GNU Make Manual is freely available for download for educational purposes like this. Unfortunately, it is also a case where RAG is somewhat unnecessary due to the abundance of make questions on the internet. Feel free to replace it with a subject of your own choice.</small></p>

---

## Prerequisites

The very first step is to make sure all requirements (in terms of Python modules) are satisfied. My suggestion is to open a terminal in Jupyter (File -> New... -> Terminal) and run the command there instead of in this notebook:

`jovyan@jupyter-user:~$ pip install sentence-transformers qdrant-client ollama`
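
If you want to confirm from within the notebook that the installation worked, a small check along the following lines may help. This is only a sketch: the helper name `missing_modules` is made up for this illustration, and the import names (`sentence_transformers`, `qdrant_client`, `ollama`) are the ones used by the import statements later in this notebook.

```python
import importlib.util

# Import names of the packages installed above (note the underscores)
REQUIRED = ['sentence_transformers', 'qdrant_client', 'ollama']

def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
print('missing:', missing if missing else 'none')
```

An empty result only tells you that the modules can be located, not that the services they talk to are reachable.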

## A Basic Chatbot

First up is a simple chatbot class that keeps a record of the N latest query/answer pairs generated. Take a minute to appreciate the linguistic complexity of the instructions to the LLM in `sysmsg()`.

In [None]:
import ollama

class ChatClient:
    """A basic chat client that keeps a record of the N latest query/answer pairs generated."""

    N = 3

    def __init__(self, ollama_host, ollama_model):
        self.client = ollama.Client(host=ollama_host)
        self.model = ollama_model
        self.msg_hist = []

    def _post(self, query_msg):
        # keep at most the N most recent query+answer pairs in the history
        self.msg_hist = self.msg_hist[-2*self.N:]
        # add the current query
        self.msg_hist.append(query_msg)
        # prepend the generic instructions
        msg_list = [self.sysmsg()] + self.msg_hist
        # print(msg_list)

        response = self.client.chat(
            model=self.model,
            messages=msg_list,
            stream=False,
        )
        reply = response['message']
        self.msg_hist.append(reply)
        return reply['content']


    def sysmsg(self):
        # a template for the instructions to the system
        return {
            'role': 'system',
            'content': '''
            You are an AI assistant providing tech support on the GNU Make program.
            In this context GNU Make, Make, and 'make' always refer to the GNU Make program,
            and so does the noun make.
            Provide short, concise answers prefixed by >>> .
            If you cannot answer the question, just say so.
            '''.strip(),
        }

    def post(self, query):
        # posting a query using a simple template, receiving (and printing) the answer
        msg = {
            'role': 'user',
            'content': f'{query}',
        }
        answer = self._post(msg)
        print(answer)
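
The history handling in `_post()` can be tried out in isolation, without a running Ollama server. The sketch below uses a made-up helper, `trim_history`, and fabricated placeholder messages purely to show what the slice `[-2*N:]` keeps; it is not part of the `ChatClient` class.

```python
N = 3  # same window size as ChatClient.N

def trim_history(msg_hist, n=N):
    """Keep at most the n most recent query/answer pairs (2*n messages)."""
    return msg_hist[-2 * n:]

# Build a placeholder history of five completed query/answer pairs
history = []
for i in range(1, 6):
    history.append({'role': 'user', 'content': f'question {i}'})
    history.append({'role': 'assistant', 'content': f'answer {i}'})

trimmed = trim_history(history)
print(len(trimmed))            # 6 messages, i.e. 3 pairs
print(trimmed[0]['content'])   # question 3
```

Because the trimming happens before the new query is appended, the model sees up to N earlier pairs plus the current question, preceded by the system message.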

After instantiating a chatbot ...

In [None]:
OLLAMA_HOST = 'http://10.129.20.4:9090'
OLLAMA_MODEL = 'llama3:70b'

client = ChatClient(OLLAMA_HOST, OLLAMA_MODEL)

... we can pose it a series of questions. Note that the questions carry less and less explicit context, with the last one only implicitly referring to the core question (compiling LaTeX into PDF). See the screenshot below in case the conversation goes haywire...
<img src="img/chatbot1.png" alt="Chatbot" style="width: 400px;"/>

In [None]:
client.post("What can I use make for?")

In [None]:
client.post("Can you give a simple example of how to compile a single-file C project?")

In [None]:
client.post("What would a similar example of compiling latex into pdf look like?")

In [None]:
client.post("but I don't have pdflatex on my system")

## Preparing content for RAG

This has been explained in the previous tutorials and is therefore presented without much commentary.

### Chunking

Rather than using single sentences as chunks, we'll bundle a number of consecutive sentences into each chunk, respecting the (approximate) maximum chunk size (`CHUNKSIZE=1200`).

In [None]:
#
# Split the input data into chunks
#
import re

class Manual:
    
    CHUNKSIZE=1200
    
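    def __init__(self, filepath):
        with open(filepath, 'r') as fd:
            text = fd.read()
        # Split at a newline that follows '.', '?' or '!', at a newline that
        # precedes a (section) number such as "3.1", or at a blank line
        self.sentences = re.split(r"(?:(?<=\.|\?|!)\n)|(?:\n(?=\s*\d{1,}(?:\.\d)*))|(?:\n\n)", text)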

    def _chunk_sentences(self, idx):
        chunk = ""
        while len(chunk) < self.CHUNKSIZE and idx < len(self.sentences):
            chunk = chunk + "\n" + self.sentences[idx]
            idx += 1
        return chunk, idx

    def chunk_text(self):
        metadata = []
        chunks = []
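        # Start a new chunk every third sentence; since a chunk usually spans
        # more than three sentences, consecutive chunks overlap
        for first in range(0, len(self.sentences), 3):
            chunk, last = self._chunk_sentences(first)
            chunks.append(chunk)
            # 'last' is the index one past the final sentence in the chunk
            metadata.append({"first": first, "last": last})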
        
        return chunks, metadata
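
To get a feeling for what the splitting pattern and the chunking loop do, here is a toy run on a few made-up lines instead of the real manual. The function `toy_chunks` is a stand-alone copy of the logic in `Manual` written for this illustration, and the small `chunksize` and `stride` values are chosen only to make the effect visible.

```python
import re

# Same splitting pattern as in Manual.__init__
SPLIT_RE = r"(?:(?<=\.|\?|!)\n)|(?:\n(?=\s*\d{1,}(?:\.\d)*))|(?:\n\n)"

toy_text = (
    "Make is a tool.\n"
    "It reads a makefile\n"
    "and runs the rules in it.\n"
    "\n"
    "2.1 What a Rule Looks Like\n"
    "A rule has a target."
)

toy_sentences = re.split(SPLIT_RE, toy_text)

def toy_chunks(sentences, chunksize, stride):
    """Start a chunk every `stride` sentences and keep appending sentences
    until the chunk is at least `chunksize` characters long."""
    chunks, metadata = [], []
    for first in range(0, len(sentences), stride):
        chunk, idx = "", first
        while len(chunk) < chunksize and idx < len(sentences):
            chunk = chunk + "\n" + sentences[idx]
            idx += 1
        chunks.append(chunk)
        metadata.append({"first": first, "last": idx})
    return chunks, metadata

demo_chunks, demo_meta = toy_chunks(toy_sentences, chunksize=40, stride=1)
for meta, chunk in zip(demo_meta, demo_chunks):
    print(meta, repr(chunk))
```

Two things are worth noticing in the output: the pattern can produce empty "sentences" (here where a blank line precedes a numbered heading), and a sentence can end up in more than one chunk whenever the stride is smaller than the number of sentences a chunk spans.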

### Embedding

This is straightforward, except that embeddings are cached to save time on repeated runs. 

In [None]:
from sentence_transformers import SentenceTransformer

class Embedder:

    model_name = 'sentence-transformers/all-mpnet-base-v2'
    
    def __init__(self):
        # Get the model
        self.model = SentenceTransformer(self.model_name)
        self._cache = []
        self._hash = None

    def embed(self, chunks, force_compute = False):
        # Vectorize, i.e. create embeddings
        # This can take a couple of minutes, 
        # so use cached embeddings unless something changed
        _hash = hash((tuple(chunks), self.model_name))
        compute = len(self._cache) == 0 or self._hash != _hash or force_compute
        if compute:
            self._hash = _hash
            self._cache = self.model.encode(chunks, show_progress_bar=True)
        return self._cache
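
The caching idea can be checked without downloading a model. In the sketch below, `CountingModel` is a made-up stand-in for `SentenceTransformer` that merely counts how often `encode()` is called, and `CachedEmbedder` repeats the hash comparison from `Embedder.embed()` (using `None` rather than an empty list as the "nothing cached yet" marker).

```python
class CountingModel:
    """Stand-in for SentenceTransformer that counts calls to encode()."""

    def __init__(self):
        self.calls = 0

    def encode(self, chunks):
        self.calls += 1
        # Fake one-dimensional "embeddings": the length of each chunk
        return [[float(len(chunk))] for chunk in chunks]


class CachedEmbedder:
    model_name = 'toy-model'

    def __init__(self, model):
        self.model = model
        self._cache = None
        self._hash = None

    def embed(self, chunks, force_compute=False):
        # Recompute only if nothing is cached, the input changed, or on request
        key = hash((tuple(chunks), self.model_name))
        if self._cache is None or self._hash != key or force_compute:
            self._hash = key
            self._cache = self.model.encode(chunks)
        return self._cache


toy_model = CountingModel()
demo = CachedEmbedder(toy_model)
demo.embed(['a', 'bb'])            # computed
demo.embed(['a', 'bb'])            # served from the cache
demo.embed(['a', 'bb', 'ccc'])     # input changed, computed again
print(toy_model.calls)
```

Note that the cache lives in the object, so it only survives as long as the same `Embedder` instance (and notebook kernel) is kept alive.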

### Vector store

In [None]:
from qdrant_client import QdrantClient
from qdrant_client import models
from qdrant_client.models import Distance, VectorParams

class VectorStore:

    # Provide a name for the _collection_ making up your corner of the database 
    # use e.g. <signum>_gmm
    collection_name = "eperspe_gmm" 
    
    def __init__(self, host, port, embedding_model):
        # Create a client connecting to the service
        self.db = QdrantClient(host=host, port=port)
        self.embedding_model = embedding_model

    def _clear(self):
        # Remove the collection (for this toy example) if it already exists
        if self.db.collection_exists(collection_name=self.collection_name):
            self.db.delete_collection(collection_name=self.collection_name)

        # Create a named collection and set vector dimension and metric (EUCLID => L2)
        self.db.create_collection(
            collection_name = self.collection_name,
            vectors_config = VectorParams(
                size=self.embedding_model.get_sentence_embedding_dimension(), 
                distance=Distance.EUCLID
            ),
        )

    # Upload our embeddings, one variant of many (destroys old data)
    def upload(self, embeddings, metadata = None):
        self._clear()
        # Use explicit ids 0..n-1 so that each id doubles as an index into the chunk list
        # (if ids are not provided, Qdrant Client will generate random UUIDs for each entry)
        n = len(embeddings)
        self.db.upload_collection(
            collection_name = self.collection_name,
            ids = range(n),
            payload = metadata,
            vectors = embeddings,
        )

    def query(self, question, n = 1):
        query_embedding = self.embedding_model.encode(question)
        # Return the n closest matches
        search_results = self.db.search(
            collection_name = self.collection_name,
            search_params = models.SearchParams(hnsw_ef=10, exact=False),
            query_vector = query_embedding,
            limit = n,
        )

        return [(result.id, result.payload, result.score) for result in search_results]
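
Conceptually, `query()` asks for the stored vectors with the smallest Euclidean (L2) distance to the query embedding. The brute-force sketch below shows that idea on three hand-made two-dimensional vectors; the function `nearest` is invented for this illustration, and Qdrant itself uses an approximate index (HNSW) rather than an exhaustive scan like this one.

```python
import math

def nearest(query_vec, vectors, n=1):
    """Return (id, distance) for the n vectors closest to query_vec (L2)."""
    scored = [(idx, math.dist(query_vec, vec)) for idx, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:n]

toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
hits = nearest([0.9, 1.2], toy_vectors, n=2)
for idx, dist in hits:
    print(idx, round(dist, 3))
```

As with the ids uploaded by `VectorStore.upload()`, the position of a vector in the list serves as its id, which is what makes it possible to look up the matching chunk afterwards. With a distance metric, a smaller score means a closer match, which is the opposite of a similarity score such as cosine.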

With the building blocks in place, we can chunk the manual, embed the chunks, and query the vector store.

In [None]:
gmm = Manual('gmm/gnu-make-manual.txt')
chunks, _ = gmm.chunk_text()

In [None]:
len(chunks)

In [None]:
embedder = Embedder()
embeddings = embedder.embed(chunks)

In [None]:
embeddings = embedder.embed(chunks)  # second call with unchanged chunks: served from the cache

In [None]:
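host = "10.129.20.4"
port = 6333
store = VectorStore(host=host, port=port, embedding_model=embedder.model)
# Upload the embeddings (this recreates the collection) before querying it
store.upload(embeddings)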

In [None]:
res = store.query("How can I use make to compile a program?", 3)
res

In [None]:
for idx, _, _ in res:
    print(f"{idx}:\n{chunks[idx]}\n\n")
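
One possible way to hand the retrieved chunks to the chatbot is to place them in the user message ahead of the question. The following is a minimal sketch under that assumption only: the function `build_rag_message`, the wording of the instruction, and the placeholder excerpts are all made up for this illustration and are not taken from the classes above.

```python
def build_rag_message(query, contexts):
    """Combine retrieved text chunks and the user's question into one message."""
    # Number the excerpts so the model (and the reader) can tell them apart
    context_block = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(contexts, start=1)
    )
    content = (
        "Use the following excerpts from the GNU Make Manual if they are relevant.\n\n"
        f"{context_block}\n\n"
        f"Question: {query}"
    )
    return {'role': 'user', 'content': content}

# Placeholder excerpts standing in for chunks returned by the vector store
example_msg = build_rag_message(
    "How can I use make to compile a program?",
    ["first retrieved excerpt", "second retrieved excerpt"],
)
print(example_msg['content'])
```

A message built this way has the same shape as the one created in `ChatClient.post()`, so it could be passed on in the same manner.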