In [1]:
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [1]:
import os
from dotenv import load_dotenv
load_dotenv(override=True)

True

# Multi-turn knowledge-base-driven chatbot on Neopilot with Milvus and Llama2
In this demo, we will set up a knowledge-base (KB) driven multi-turn chatbot. We will use Milvus as our vector database. LangChain supplies the embedding wrapper and the prompt template, retrieval goes through pymilvus, and text generation goes through a Llama2 service endpoint.

This demo is a follow-up to the Milvus demo and assumes that a WikiHow collection is already set up and ready to go.

Our text generation model will be Llama2, running on GPU-powered nodes on the Neopilot platform, and we will use a small BERT model for text similarity. While this illustrates more than one way to work with LLMs on Neopilot, it is hardly the only viable paradigm. For example, we could set up a service for both models, which would let us use larger, and thus more powerful, models for similarity. We could also use explicit topic modeling tuned on our knowledge base, instead of Milvus's distance, to determine document fit.

## 1. Set up the knowledge base
We start by setting up the connection and loading the WikiHow collection.

### 1a Create Milvus connection
We could also use LangChain's vectorstores interface to Milvus, so long as the store is compatible with what LangChain expects (see the Milvus demo for more details).

In [52]:
from pymilvus import connections
connection = connections.connect(
  alias="default",
  host=os.environ['MILVUS_HOST'],
  port=os.environ['MILVUS_PORT']
)
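The connection cell reads MILVUS_HOST and MILVUS_PORT straight from the environment, so a missing variable surfaces as a bare KeyError. A minimal sketch of a pre-flight check follows. Note that `require_env` is a hypothetical helper written for this example, not part of pymilvus or python-dotenv.

```python
import os


def require_env(*names):
    """Return the requested environment variables as a dict.

    Hypothetical helper for this notebook: raises one error that lists
    every missing (or empty) variable instead of failing on the first lookup.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}
```

For instance, `cfg = require_env('MILVUS_HOST', 'MILVUS_PORT')` could run before `connections.connect`, with `cfg['MILVUS_HOST']` and `cfg['MILVUS_PORT']` passed as the host and port.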

In [53]:
from pymilvus import Collection, utility

### 1b Load the collection

In [54]:
whcollection = Collection("WikiHow")
whcollection.load()

## 2. Load embeddings
We use HuggingFaceEmbeddings with the MiniLM BERT model.

In [5]:
import langchain
from langchain.embeddings import HuggingFaceEmbeddings

In [6]:
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')



Let's quickly test a query to make sure everything is working fine.

In [7]:
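def find(what):
    """Search the WikiHow collection for the closest match to the query."""
    return whcollection.search(
        [embeddings.embed_query(what)],  # search() takes a list of query vectors
        anns_field="vector",
        param={
            'metric_type': 'L2',
            'offset': 0,
            'params': {'nprobe': 1},
        },
        limit=1,
        output_fields=['text', 'title'],
    )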

In [88]:
find("how to eat fruit?")[0][0].entity.get('title')

'How to Eat Fruits for Nonfruit Eaters (Eating fruits is important and presently your body is in balance chemically. However soft drinks and using a lot of sugar is not a substitute for fruits. Fruits are healthy for you in many ways, try to include them in your diet.)'
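Conceptually, the search in `find` ranks stored vectors by their L2 distance to the query embedding and returns the closest one. The sketch below shows that idea as an exact brute-force scan in plain Python. It is only an illustration: Milvus would normally consult an index rather than loop over every vector, and the titles and 2-D vectors here are made up to stand in for real embeddings.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, documents, limit=1):
    """Return the `limit` closest (distance, title) pairs, smallest distance first."""
    scored = [(l2_distance(query, vec), title) for title, vec in documents.items()]
    return sorted(scored)[:limit]


# Toy data: made-up titles and 2-D vectors standing in for real embeddings.
toy_docs = {
    "How to Eat Fruit": [1.0, 0.0],
    "How to Climb a Tree": [0.0, 1.0],
}
best = nearest([0.9, 0.1], toy_docs)
print(best[0][1])  # title of the closest toy document
```

A smaller distance means a closer match, which is why the generation code later accepts a document only when its distance falls below a threshold.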

## 3. Set up LLM agents
Now that the data store is ready, we set up the data retrieval and text generation parts of the system to connect to a Neopilot-served Llama2 LLM service.

### 3.1 Connect to LLM endpoint and set up prompt template

In [9]:
import text_generation
from langchain import PromptTemplate

In [10]:
SVC_EP=os.environ['LLM_ENDPOINT']
client = text_generation.Client(SVC_EP)

Let's see what the LLM can do without any knowledge...

In [17]:
for tok in client.generate_stream(prompt="How do I run faster?", max_new_tokens=512, repetition_penalty=1.2):
    if not tok.token.special:
        print(tok.token.text, end='')



Answer: To improve your running speed, you should focus on a combination of cardiovascular fitness and muscular strength. Here are some tips to help you achieve this goal:

1. **Cardiovascular Fitness**: This is the key to endurance sports like running. It's about getting your heart and lungs working more efficiently so that they can pump oxygen-rich blood around your body quicker. You can build up your cardio by doing activities such as jogging or cycling regularly. Aim for at least 30 minutes per day.

2. **Muscular Strength**: Running requires power in your legs to propel yourself forward. Regularly incorporating exercises like squats, lunges, and leg press into your workout routine will help increase your lower body strength.

3. **Proper Nutrition**: Eat foods rich in protein (like chicken, fish, eggs) which helps repair and grow muscles. Carbohydrates provide energy during exercise, while healthy fats support hormone production. Stay hydrated throughout the day with water.

4. 

### 3.2 Set up KB-oriented text generation
For this demo, we proceed as follows:
- Perform a search against the collection with each query
- If a high-quality match is found, provide the LLM with this added context for a potential context switch
- Provide the turn history
- Trim the input to match the service's limits
- Generate the response
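The first two steps can be sketched as a small pure function. This is an illustrative sketch, assuming that a smaller L2 distance means a better match. `select_context` is a hypothetical name, and its default threshold (0.75) and character cap (1024) simply mirror the values used in this notebook's `generate` function; a real threshold would need tuning against the collection.

```python
def select_context(match_title, match_text, match_dist,
                   topic=None, context=None,
                   max_dist=0.75, max_chars=1024):
    """Decide whether a retrieved document should replace the current context.

    Switch only when the match is close enough (smaller distance is better)
    and belongs to a different document than the current topic.
    Returns the (topic, context) pair to use for this turn.
    """
    if match_title != topic and match_dist < max_dist:
        return match_title, match_text[:max_chars]
    return topic, context
```

Keeping the decision separate from the LLM call makes it easy to check the switching behaviour on its own, for example that a distant match leaves the previous topic and context untouched.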

Here we do manual prompt engineering beyond the template below. We could also have used LangChain's conversation utilities (for example, ConversationChain with a memory class), or any other similar option.

In [39]:
assistant_string = "ASSISTANT:\n"
user_string = "USER:\n"
document_string="DOCUMENT:\n"

In [73]:
prompt_template = PromptTemplate.from_template(
    f"""\
{{turns}}
{document_string}{{context}}
{user_string}What does the document say about {{prompt}}
Give me a summary. If the information is not there let me know.

{assistant_string}
"""
)
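To see exactly what text reaches the model, the same layout can be rendered with plain `str.format`. The sketch below is an illustration with made-up turn and document values and does not call LangChain; it assumes that `PromptTemplate.format` substitutes the three placeholders in the same way.

```python
assistant_string = "ASSISTANT:\n"
user_string = "USER:\n"
document_string = "DOCUMENT:\n"

# Same layout as the notebook's template, built as a plain string so it can be
# inspected without LangChain. Doubled braces survive the f-string as {name}.
template = (
    f"{{turns}}\n"
    f"{document_string}{{context}}\n"
    f"{user_string}What does the document say about {{prompt}}\n"
    "Give me a summary. If the information is not there let me know.\n"
    "\n"
    f"{assistant_string}\n"
)

# Made-up values, only to show where each piece lands in the final prompt.
rendered = template.format(
    turns="USER:\nhow can I safely climb a tree?\nASSISTANT:\nChoose a healthy tree.",
    context="(retrieved document text)",
    prompt="climbing in bad weather?",
)
print(rendered)
```

Printing the rendered prompt shows the order the model sees: earlier turns first, then the document, then the new question, ending on the assistant label so the model continues from there.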

Now the function to tie it all together:

In [90]:
def generate(what, turns, context, topic):
    # Look up the closest knowledge-base entry for this query.
    hit = find(what)[0][0]
    match_title = hit.entity.get('title')
    match_text = hit.entity.get('text')
    match_dist = hit.distance

    # Switch context only for a close match (small L2 distance) on a new topic.
    if match_title != topic and match_dist < 0.75:
        context = match_text[:1024]  # cap the document excerpt
        topic = match_title
    preface = "No information available" if context is None else context
    history = "\n".join(turns)[-2048:]  # keep only the most recent history
    return {
        'stream': client.generate_stream(
            prompt=prompt_template.format(prompt=what,
                                          turns=history,
                                          context=preface),
            max_new_tokens=512,
            repetition_penalty=1.2),
        'topic': topic,
        'context': context,
    }
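`generate` trims the history with a character slice, which can cut the oldest kept turn in half and drop its USER: or ASSISTANT: label. One possible alternative, sketched below under the assumption that each list entry is one complete turn, drops whole turns from the oldest end until the joined text fits the budget. `trim_turns` is a hypothetical helper, and its limit counts characters, not tokens.

```python
def trim_turns(turns, max_chars=2048, sep="\n"):
    """Keep the most recent whole turns whose joined length fits max_chars."""
    kept = []
    total = 0
    for turn in reversed(turns):
        # Account for the separator between this turn and the ones already kept.
        cost = len(turn) + (len(sep) if kept else 0)
        if total + cost > max_chars:
            break
        kept.append(turn)
        total += cost
    return sep.join(reversed(kept))
```

In this sketch a single turn that is longer than the whole budget is dropped entirely, so a production version might fall back to truncating that one turn instead.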

## 4. Demo driver
We set up a simple demo driver for the chatbot to make sure everything is working well. The session can be terminated by entering an empty query.

In [91]:
ipt = ""
resp = ""

turns = []

topic = None
context = None

while True:
    ipt = input(">").strip()
    if len(ipt) == 0:
        break  # an empty query ends the session

    resp = ''

    result = generate(ipt, turns, context, topic)
    prev = None
    print(f"\nQuery: {ipt}\n")
    print("Response:\n")
    for tok in result['stream']:
        if not tok.token.special:
            # Skip a newline that directly follows another newline.
            if prev == '\n' and tok.token.text == '\n':
                continue
            print(tok.token.text, end='')
            prev = tok.token.text
            resp += tok.token.text
    print()

    # Record the turn and carry the topic and context into the next query.
    resp = resp.strip()
    turns.append(user_string + ipt)
    turns.append(assistant_string + resp)
    topic = result['topic']
    context = result['context']


Query: how can I safely climb a tree?

Response:

To safely climb a tree, follow these steps:
1. Choose a healthy tree with sturdy footholds.
2. Consider using equipment like a climbing harness and ropes if you plan on regular climbing.
3. Use a Prusik cord or 'foot assist' for additional support if needed.
4. Be aware of potential hazards due to weather conditions, especially avoiding climbing during thunderstorms. Always prioritize safety over reaching the top.

Query: How can I climb a tree dangerously?

Response:

This document discusses safe methods for climbing a tree while emphasizing the importance of considering various factors that might influence the safety of the activity. It advises against attempting to climb a tree recklessly or under unsafe conditions. Here are some key points from the text regarding this topic:
- Never attempt to climb a tree during a thunderstorm, or when lightning is present nearby. This increases the risk of being struck by lightning.
- Avoid climb
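
The driver loop suppresses a newline token when the previously printed token was also a newline, which keeps streamed responses compact. The same filter can be written as a small generator, sketched here over plain strings rather than the client's token objects. `collapse_newlines` is a hypothetical name used only for this example.

```python
def collapse_newlines(tokens):
    """Yield tokens, skipping a newline token that directly follows another one."""
    prev = None
    for text in tokens:
        if prev == "\n" and text == "\n":
            continue
        prev = text
        yield text
```

Like the loop in the driver, this only matches tokens that are exactly one newline character; a single token containing two newlines would pass through unchanged.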

## 5. Cleanup
Don't forget to release any resources we are done using!

In [50]:
whcollection.release()
connections.disconnect('default')