# ReadTheDocs Retrieval Augmented Generation (RAG) using Zilliz Free Tier

In this notebook, we use the Milvus documentation pages to build a chatbot about our product.  The chatbot follows the RAG steps: it retrieves chunks of data using semantic vector search, then feeds the question plus the retrieved context as a prompt to an LLM to generate an answer.

Many RAG demos use OpenAI for the Embedding Model and ChatGPT for the Generative AI model.  **In this notebook, we follow that pattern with OpenAI's `text-embedding-3-small` for embeddings and `gpt-3.5-turbo` for generation, and use Milvus on the Zilliz Cloud free tier as the vector database.**

Retrieval saves money because the search calls against our own data in Milvus are free on this tier, across retrieval, evaluation, and development iterations.  The paid calls go to OpenAI: the embedding requests and the final chat generation step.

<div>
<img src="../../pics/rag_image.png" width="80%"/>
</div>

Let's get started!

In [1]:
# For colab install these libraries in this order:
# !pip install pymilvus langchain
# !pip install python-dotenv unstructured openai

In [2]:
# Import common libraries.
import sys, os, time, pprint
import numpy as np

# Import custom functions for splitting and search.
sys.path.append("..")  # Adds higher directory to python modules path.
import milvus_utilities as _utils

## Download Milvus documentation to a local directory.

The data we’ll use is our own product documentation web pages.  ReadTheDocs is an open-source free software documentation hosting platform, where documentation is written with the Sphinx document generator.

The code block below downloads the web pages into a local directory called `rtdocs`.  

I've already uploaded the `rtdocs` data folder to GitHub, so you should see it if you cloned my repo.

In [3]:
# # Uncomment to download readthedocs pages locally.

# DOCS_PAGE="https://pymilvus.readthedocs.io/en/latest/"
# !echo $DOCS_PAGE

# # Specify encoding to handle non-unicode characters in documentation.
# !wget -r -A.html -P rtdocs --header="Accept-Charset: UTF-8" $DOCS_PAGE

## Start up a Zilliz free tier cluster.

Code in this notebook uses fully-managed Milvus on the [Zilliz Cloud free tier](https://cloud.zilliz.com/login).  
  1. Choose the default "Starter" option when you provision > Create collection > Give it a name > Create cluster and collection.  
  2. On the Cluster main page, copy your `API Key` and store it locally in a .env variable.  See the note below on how to do that.
  3. Also on the Cluster main page, copy the `Public Endpoint URI`.

💡 Note: To keep your tokens private, best practice is to use an **env variable**.  See [how to save api key in env variable](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety). <br>

👉🏼 In Jupyter, you need a .env file (in same dir as notebooks) containing lines like this:
- ZILLIZ_API_KEY=f370c...
- OPENAI_API_KEY=sk-H...
- VARIABLE_NAME=value...
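
As a rough illustration of what those lines mean, the sketch below parses `KEY=value` text into a dictionary using only the standard library. The helper name `parse_env_lines` and the sample values are hypothetical; in the notebook, `load_dotenv()` from python-dotenv does the real work and handles more cases (quoting, `export` prefixes, and so on).

```python
def parse_env_lines(text):
    """Parse simple KEY=value lines into a dict (hypothetical helper, not python-dotenv)."""
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Not a KEY=value line.
        env[key.strip()] = value.strip()
    return env

# Placeholder values for illustration only.
sample = "# secrets\nZILLIZ_API_KEY=abc123\nOPENAI_API_KEY=sk-demo=x\n"
env = parse_env_lines(sample)
print(sorted(env))
```

Keeping the keys in a file like this, and out of the notebook itself, means the notebook can be shared without exposing them.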


In [4]:
# STEP 1. CONNECT TO MILVUS

# !pip install pymilvus #python sdk for milvus
from pymilvus import connections, utility
from dotenv import load_dotenv
load_dotenv()
TOKEN = os.getenv("ZILLIZ_API_KEY")

# Connect to Zilliz cloud using endpoint URI and API key TOKEN.
# TODO change this.
CLUSTER_ENDPOINT="https://in03-xxxx.api.gcp-us-west1.zillizcloud.com:443"
connections.connect(
  alias='default',
  #  Public endpoint obtained from Zilliz Cloud
  uri=CLUSTER_ENDPOINT,
  # API key or a colon-separated cluster username and password
  token=TOKEN,
)

# Check that the server is ready and print its version.
print(f"Type of server: {utility.get_server_version()}")

Type of server: Zilliz Cloud Vector Database(Compatible with Milvus 2.3)


In [5]:
# STEP 2. EMBEDDING MODEL.

import openai, pprint
from openai import OpenAI

# OpenAI embedding model name, `text-embedding-3-large` or `text-embedding-3-small`.
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIM = 512 # 512 or 1536 possible for 3-small; overwritten below with the model's default size

# See how to save api key in env variable.
# https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety
openai_client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

# Get the model parameters and save for later.
response = openai_client.embeddings.create(
    input="Your text string goes here",
    model=EMBEDDING_MODEL
)
res_embedding = response.data[0].embedding
# print(f'{res_embedding[:20]} ...')
EMBEDDING_DIM = len(res_embedding)

# Inspect model parameters.
print(f"model_name: {EMBEDDING_MODEL}")
print(f"EMBEDDING_DIM: {EMBEDDING_DIM}")

model_name: text-embedding-3-small
EMBEDDING_DIM: 1536


In [6]:
# NOW TRY REDUCED DIMENSIONS!
# https://openai.com/blog/new-embedding-models-and-api-updates
# OpenAI embedding model name, `text-embedding-3-large` or `text-embedding-3-small`.
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIM = 512 # 512 or 1536 possible for 3-small

# Get the model parameters and save for later.
response = openai_client.embeddings.create(
    input="Your text string goes here",
    model=EMBEDDING_MODEL,
    dimensions=EMBEDDING_DIM
)
res_embedding = response.data[0].embedding
# print(f'{res_embedding[:20]} ...')
EMBEDDING_DIM = len(res_embedding)

# Inspect model parameters.
print(f"model_name: {EMBEDDING_MODEL}")
print(f"EMBEDDING_DIM: {EMBEDDING_DIM}")

model_name: text-embedding-3-small
EMBEDDING_DIM: 512
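
The OpenAI announcement linked above describes the `dimensions` argument as shortening the embedding. A minimal sketch of the same idea done by hand, assuming you already have a full-length vector: keep the first `dim` values, then re-scale to unit length so cosine and inner-product comparisons still behave. `shorten_embedding` is a hypothetical helper and the sample vector is made up; the notebook itself relies on the API's own `dimensions` option.

```python
import math

def shorten_embedding(vector, dim):
    """Keep the first `dim` values and L2-normalize (hypothetical helper)."""
    truncated = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        return truncated  # Avoid division by zero for an all-zero vector.
    return [x / norm for x in truncated]

full = [0.5, 0.5, 0.5, 0.5]          # made-up unit-length vector
short = shorten_embedding(full, 2)   # keep 2 of 4 dimensions
print(len(short), math.sqrt(sum(x * x for x in short)))
```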


In [7]:
# STEP 3. CREATE A NO-SCHEMA MILVUS COLLECTION AND DEFINE THE DATABASE INDEX.

from pymilvus import MilvusClient

# Set the Milvus collection name.
COLLECTION_NAME = "MilvusDocs_text_embedding_3_small"

# Add custom HNSW search index to the collection.
# M = max number graph connections per layer. Large M = denser graph.
# Choice of M: 4~64, larger M for larger data and larger embedding lengths.
M = 16
# efConstruction = num_candidate_nearest_neighbors per layer.
# Valid range: int, 8~512. Rule of thumb: efConstruction = M * 2.
efConstruction = M * 2
# Define the HNSW index parameters.
INDEX_PARAMS = {
    "M": M,
    "efConstruction": efConstruction}
index_params = {
    "index_type": "HNSW", 
    "metric_type": "COSINE", 
    "params": INDEX_PARAMS
    }

# The no-schema Milvus client uses a flexible JSON key:value format.
# https://milvus.io/docs/using_milvusclient.md
mc = MilvusClient(
    uri=CLUSTER_ENDPOINT,
    # API key or a colon-separated cluster username and password
    token=TOKEN)

# Check if collection already exists, if so drop it.
has = utility.has_collection(COLLECTION_NAME)
if has:
    drop_result = utility.drop_collection(COLLECTION_NAME)
    print(f"Successfully dropped collection: `{COLLECTION_NAME}`")

# Create the collection.
mc.create_collection(COLLECTION_NAME, 
                     EMBEDDING_DIM,
                     consistency_level="Eventually", 
                     auto_id=True,  
                     overwrite=True,
                     # skip setting params below, if using AUTOINDEX
                     params=index_params
                    )

print(f"Successfully created collection: `{COLLECTION_NAME}`")
print(mc.describe_collection(COLLECTION_NAME))

Successfully created collection: `MilvusDocs_text_embedding_3_small`
{'collection_name': 'MilvusDocs_text_embedding_3_small', 'auto_id': True, 'num_shards': 1, 'description': '', 'fields': [{'field_id': 100, 'name': 'id', 'description': '', 'type': 5, 'params': {}, 'element_type': 0, 'auto_id': True, 'is_primary': True}, {'field_id': 101, 'name': 'vector', 'description': '', 'type': 101, 'params': {'dim': 512}, 'element_type': 0}], 'aliases': [], 'collection_id': 447376431941003159, 'consistency_level': 3, 'properties': {}, 'num_partitions': 1, 'enable_dynamic_field': True}
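
The comments in the cell above give ranges for the two build-time parameters (`M` in 4~64, `efConstruction` in 8~512) and a rule of thumb of `efConstruction = M * 2`. The small sketch below applies those rules and produces an index dictionary with the same keys used above. `make_hnsw_index_params` is a hypothetical helper, not part of pymilvus.

```python
def make_hnsw_index_params(m=16, metric_type="COSINE"):
    """Build an HNSW index dict using the notebook's rule of thumb (hypothetical helper)."""
    if not 4 <= m <= 64:
        raise ValueError(f"M must be in 4~64, got {m}")
    # Rule of thumb from the notebook: efConstruction = M * 2, kept within 8~512.
    ef_construction = min(max(m * 2, 8), 512)
    return {
        "index_type": "HNSW",
        "metric_type": metric_type,
        "params": {"M": m, "efConstruction": ef_construction},
    }

params = make_hnsw_index_params(16)
print(params["params"])
```

Validating the values up front makes a typo such as `M = 160` fail immediately, before any index is built.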


In [8]:
# STEP 4. PREPARE DATA: CHUNK AND EMBED

# Read docs into LangChain
#!pip install langchain 
from langchain.document_loaders import DirectoryLoader

# Load HTML files from a local directory
path = "../RAG/rtdocs/pymilvus.readthedocs.io/en/latest/"
loader = DirectoryLoader(path, glob='*.html')
docs = loader.load()

num_documents = len(docs)
print(f"loaded {num_documents} documents")

loaded 8 documents


In [9]:
from langchain.text_splitter import HTMLHeaderTextSplitter, RecursiveCharacterTextSplitter
from bs4 import BeautifulSoup

# Define the headers to split on for the HTMLHeaderTextSplitter
headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
]
# Create an instance of the HTMLHeaderTextSplitter
html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

# Specify chunk size and overlap.
chunk_size = 511 
chunk_overlap = np.round(chunk_size * 0.10, 0)
print(f"chunk_size: {chunk_size}, chunk_overlap: {chunk_overlap}")

# Create an instance of the RecursiveCharacterTextSplitter
child_splitter = RecursiveCharacterTextSplitter(
    chunk_size = chunk_size,
    chunk_overlap = chunk_overlap,
    length_function = len,
)

# Split the HTML text using the HTMLHeaderTextSplitter.
start_time = time.time()
html_header_splits = []
for doc in docs:
    soup = BeautifulSoup(doc.page_content, 'html.parser')
    splits = html_splitter.split_text(str(soup))
    for split in splits:
        # Add the source URL and header values to the metadata
        metadata = {}
        new_text = split.page_content
        for header_name, metadata_header_name in headers_to_split_on:
            # Handle exception if h1 does not exist.
            try:
                header_value = new_text.split("¶ ")[0].strip()[:100]
                metadata[header_name] = header_value
            except:
                break
            # Handle exception if h2 does not exist.
            try:
                new_text = new_text.split("¶ ")[1].strip()[:50]
            except:
                break
        split.metadata = {
            **metadata,
            "source": doc.metadata["source"]
        }
        # Add the header to the text
        split.page_content = split.page_content
    html_header_splits.extend(splits)

# Split the documents further into smaller, recursive chunks.
chunks = child_splitter.split_documents(html_header_splits)

end_time = time.time()
print(f"chunking time: {end_time - start_time}")
print(f"docs: {len(docs)}, split into: {len(html_header_splits)}")
print(f"split into chunks: {len(chunks)}, type: list of {type(chunks[0])}") 

# Inspect a chunk.
print()
print("Looking at a sample chunk...")
print(chunks[0].page_content[:100])
print(chunks[0].metadata)

chunk_size: 511, chunk_overlap: 51.0
chunking time: 0.01361989974975586
docs: 8, split into: 8
split into chunks: 161, type: list of <class 'langchain_core.documents.base.Document'>

Looking at a sample chunk...
pymilvus latest Table of Contents Installation Installing via pip Installing in a virtual environmen
{'h1': 'pymilvus latest Table of Contents Installation Installing via pip Installing in a virtual environmen', 'h2': 'Installing via pip', 'source': '../RAG/rtdocs/pymilvus.readthedocs.io/en/latest/install.html'}
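
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and spaces); it only shows how a 511-character window with roughly 10% overlap slides over text. `sliding_chunks` is a hypothetical helper and the sample text is made up.

```python
def sliding_chunks(text, chunk_size=511, chunk_overlap=51):
    """Split text into fixed-size character windows that overlap (hypothetical helper)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

text = "".join(str(i % 10) for i in range(1200))  # made-up 1200-character text
chunks = sliding_chunks(text)
print([len(c) for c in chunks])
```

The overlap means the last 51 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still seen whole in at least one chunk.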


In [10]:
# Clean up the metadata urls
for doc in chunks:
    new_url = doc.metadata["source"]
    new_url = new_url.replace("../RAG/rtdocs", "https:/")
    doc.metadata.update({"source": new_url})

print(chunks[0].page_content[:100])
print(chunks[0].metadata)

pymilvus latest Table of Contents Installation Installing via pip Installing in a virtual environmen
{'h1': 'pymilvus latest Table of Contents Installation Installing via pip Installing in a virtual environmen', 'h2': 'Installing via pip', 'source': 'https://pymilvus.readthedocs.io/en/latest/install.html'}


In [11]:
# STEP 5. INSERT CHUNKS AND EMBEDDINGS IN ZILLIZ.

# Convert chunks to a list of dictionaries.
chunk_list = []
for chunk in chunks:

    response = openai_client.embeddings.create(
        input=chunk.page_content,
        model=EMBEDDING_MODEL,
        dimensions=EMBEDDING_DIM
    )
    # OpenAI embeddings already normalized, use raw embedding values.
    embeddings = response.data[0].embedding
    
    # Only use h1, h2. Truncate the metadata in case too long.
    # h2 may be missing when a page has no second-level header.
    h2 = chunk.metadata.get('h2', '')[:50]
    # Assemble embedding vector, original text chunk, metadata.
    chunk_dict = {
        'vector': embeddings,
        'chunk': chunk.page_content,
        'source': chunk.metadata['source'],
        'h1': chunk.metadata['h1'][:50],
        'h2': h2,
    }
    chunk_list.append(chunk_dict)

# Insert data into the Milvus collection.
print("Start inserting entities")
start_time = time.time()
insert_result = mc.insert(
    COLLECTION_NAME,
    data=chunk_list,
    progress_bar=True)
end_time = time.time()
print(f"Milvus Client insert time for {len(chunk_list)} vectors: {end_time - start_time} seconds")

# After the final entity is inserted, call flush to seal any growing segments left in memory.
mc.flush(COLLECTION_NAME)


Start inserting entities


100%|██████████| 1/1 [00:01<00:00,  1.77s/it]


Milvus Client insert time for 161 vectors: 1.803873062133789 seconds
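
The loop above makes one embedding request per chunk. The embeddings endpoint also accepts a list of strings as `input`, so one option for larger corpora is to group chunks into batches first. The sketch below shows only the grouping step with the standard library; `batched_texts` and the batch size of 64 are assumptions for illustration, and the commented lines indicate where the API call would go.

```python
def batched_texts(texts, batch_size=64):
    """Yield successive lists of at most `batch_size` texts (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

texts = [f"chunk {i}" for i in range(161)]  # stand-in for chunk.page_content values
batches = list(batched_texts(texts))
print([len(b) for b in batches])

# Each batch could then be sent in a single request, for example:
# response = openai_client.embeddings.create(
#     input=batch, model=EMBEDDING_MODEL, dimensions=EMBEDDING_DIM)
# vectors = [item.embedding for item in response.data]
```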


In [12]:
# # TODO: Uncomment to inspect a chunk row.
# pprint.pprint(chunk_list[0])

In [13]:
# Define a sample question about your data.
QUESTION1 = "What do the parameters for HNSW mean?"
QUESTION2 = "What are good default values for HNSW parameters with 25K vectors dim 1024?"
QUESTION3 = "What is the default AUTOINDEX distance metric in Milvus Client?"
QUERY = [QUESTION1, QUESTION2, QUESTION3]

# Inspect the character length of one sample question (QUESTION2).
QUERY_LENGTH = len(QUESTION2)
print(f"query length: {QUERY_LENGTH}")

query length: 75


In [14]:
# SELECT A PARTICULAR QUESTION TO ASK.

SAMPLE_QUESTION = QUESTION1

In [15]:
# RETRIEVAL USING MILVUS API.

# # Not needed with Milvus Client API.
# mc.load()

# Embed the question using the same encoder.
response = openai_client.embeddings.create(
    input=SAMPLE_QUESTION,
    model=EMBEDDING_MODEL,
    dimensions=EMBEDDING_DIM
)
query_embeddings = response.data[0].embedding
TOP_K = 2

# Return top k results with HNSW index.
SEARCH_PARAMS = dict({
    # Re-use index param for num_candidate_nearest_neighbors.
    "ef": INDEX_PARAMS['efConstruction']
    })

# Define output fields to return.
OUTPUT_FIELDS = ["h1", "h2", "source", "chunk"]

# Run semantic vector search using your query and the vector database.
start_time = time.time()
results = mc.search(
    COLLECTION_NAME,
    data=[query_embeddings], 
    search_params=SEARCH_PARAMS,
    output_fields=OUTPUT_FIELDS, 
    # Milvus can utilize metadata in boolean expressions to filter search.
    # filter="",
    limit=TOP_K,
    consistency_level="Eventually"
    )

elapsed_time = time.time() - start_time
print(f"Milvus Client search time for {len(chunk_list)} vectors: {elapsed_time} seconds")

# Inspect search result.
print(f"type: {type(results[0])}, count: {len(results[0])}")

Milvus Client search time for 161 vectors: 0.059777021408081055 seconds
type: <class 'list'>, count: 2
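
Conceptually, the search ranks stored vectors by cosine similarity to the query vector; HNSW approximates that ranking without comparing against every vector. Below is a brute-force sketch of the exact version, using made-up 3-dimensional vectors. `cosine_top_k` is a hypothetical helper, not a Milvus API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_top_k(query, vectors, k=2):
    """Return (index, score) pairs for the k most similar vectors (hypothetical helper)."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
top = cosine_top_k([1.0, 0.1, 0.0], stored, k=2)
print(top)
```

With the `COSINE` metric, a higher score means a closer match, which is why the results are sorted in descending order.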


In [16]:
# Assemble the retrieved contexts and their metadata.
METADATA_FIELDS = [f for f in OUTPUT_FIELDS if f != 'chunk']
formatted_results, context, context_metadata = _utils.client_assemble_retrieved_context(
    results, metadata_fields=METADATA_FIELDS, num_shot_answers=3)
print(f"Length context: {len(context[0])}, Number of contexts: {len(context)}")

# Loop through each context and its metadata and print them.
for i in range(len(context)):
    print(f"Retrieved result #{i+1}")
    print(f"distance = {formatted_results[i][0]}")
    print(f"Context: {context[i][:150]}")
    print(f"Metadata: {context_metadata[i]}")
    print()

Length context: 509, Number of contexts: 2
Retrieved result #1
distance = 0.5799391269683838
Context: count. HNSW¶ HNSW (Hierarchical Navigable Small World Graph) is a graph-based indexing algorithm. It builds a multi-layer navigation structure for an 
Metadata: {'h1': 'pymilvus latest Table of Contents Installation Tut', 'h2': 'Milvus support to create index to accelerate vecto', 'source': 'https://pymilvus.readthedocs.io/en/latest/param.html'}

Retrieved result #2
distance = 0.5569350719451904
Context: HNSW client create_index collection_name IndexType HNSW "M" 16 # int. 4~64 "efConstruction" 40 # int. 8~512 search parameters: ef: Take the effect in 
Metadata: {'h1': 'pymilvus latest Table of Contents Installation Tut', 'h2': 'Milvus support to create index to accelerate vecto', 'source': 'https://pymilvus.readthedocs.io/en/latest/param.html'}
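
`client_assemble_retrieved_context` lives in the repo's `milvus_utilities` module and is not shown here. Below is a rough sketch of what such a helper might do, assuming each hit is a dictionary with `distance` and `entity` keys. The function name `assemble_context`, that result shape, and the sample data are assumptions for illustration rather than the actual implementation.

```python
def assemble_context(results, metadata_fields, text_field="chunk"):
    """Split search hits into scores, context strings, and metadata dicts (hypothetical)."""
    distances, contexts, metadata = [], [], []
    for hit in results[0]:  # results[0] holds the hits for the first query
        entity = hit["entity"]
        distances.append(hit["distance"])
        contexts.append(entity[text_field])
        # Missing metadata fields default to an empty string.
        metadata.append({f: entity.get(f, "") for f in metadata_fields})
    return distances, contexts, metadata

# Made-up hits in the assumed shape.
fake_results = [[
    {"distance": 0.58, "entity": {"chunk": "HNSW is a graph-based index.",
                                  "h1": "Index", "source": "https://example.com/a"}},
    {"distance": 0.55, "entity": {"chunk": "M and efConstruction are build parameters.",
                                  "h1": "Index", "source": "https://example.com/b"}},
]]
distances, contexts, metadata = assemble_context(fake_results, ["h1", "h2", "source"])
print(len(contexts), metadata[0])
```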



In [17]:
import openai, pprint
from openai import OpenAI

# Define the generation llm model to use.
# https://openai.com/blog/new-embedding-models-and-api-updates
# Customers using the pinned gpt-3.5-turbo model alias will be automatically upgraded to gpt-3.5-turbo-0125 two weeks after this model launches.
LLM_NAME = "gpt-3.5-turbo"
TEMPERATURE = 0.1
RANDOM_SEED = 415

# Join all the contexts together, separated by a space.
contexts_combined = ' '.join(context)

SYSTEM_PROMPT = f"""Use the Context below to answer the user's question. Be clear, factual, complete, concise.
If the answer is not in the Context, say "I don't know". 
Otherwise answer with fewer than 4 sentences and cite the grounding sources.
Context: {contexts_combined}
Answer: The answer to the question.
Grounding sources: {context_metadata[0]['source']}
"""

# Generate response using the OpenAI API.
response = openai_client.chat.completions.create(
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT,},
        {"role": "user", "content": f"question: {SAMPLE_QUESTION}",}
    ],
    model=LLM_NAME,
    temperature=TEMPERATURE,
    seed=RANDOM_SEED,
)

# Print the question and answer along with grounding sources and citations.
print(f"Question: {SAMPLE_QUESTION}")

# Print all choices in the response
for i, choice in enumerate(response.choices, 1):
    # Print the answer
    pprint.pprint(f"Answer: {choice.message.content}")
    print("\n")

# Question1: What do the parameters for HNSW mean?
# Answer: Perfect!
# Best answer:  M: maximum degree of nodes in a layer of the graph. 
# efConstruction: number of nearest neighbors to consider when connecting nodes in the graph.
# ef: number of nearest neighbors to consider when searching for similar vectors. 

Question: What do the parameters for HNSW mean?
('Answer: The parameters for HNSW in Milvus refer to the configuration options '
 'for the Hierarchical Navigable Small World index. These parameters include '
 '"M" for the number of bi-directional links created for each element during '
 'construction, "efConstruction" for the size of the dynamic list for the '
 'nearest neighbors search, and "ef" for the size of the dynamic list for the '
 'nearest neighbors search during query. These parameters help optimize the '
 'performance of the index for efficient similarity search operations. \n'
 'Source: https://pymilvus.readthedocs.io/en/latest/param.html')
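
The prompt is just a string that combines instructions, retrieved context, and sources. Below is a small sketch of a reusable builder, written so that every retrieved source is listed rather than only the first. `build_system_prompt` and the sample inputs are hypothetical, and the wording mirrors the prompt used above.

```python
def build_system_prompt(contexts, sources):
    """Combine retrieved contexts and their sources into one system prompt (hypothetical)."""
    contexts_combined = " ".join(contexts)
    # Deduplicate sources while preserving their retrieval order.
    unique_sources = list(dict.fromkeys(sources))
    return (
        "Use the Context below to answer the user's question. "
        "Be clear, factual, complete, concise.\n"
        'If the answer is not in the Context, say "I don\'t know".\n'
        "Otherwise answer with fewer than 4 sentences and cite the grounding sources.\n"
        f"Context: {contexts_combined}\n"
        f"Grounding sources: {', '.join(unique_sources)}\n"
    )

prompt = build_system_prompt(
    ["HNSW is a graph-based index.", "M sets the maximum connections per layer."],
    ["https://example.com/param.html", "https://example.com/param.html"],
)
print(prompt)
```

Printing the assembled prompt before sending it is a cheap way to confirm the retrieved text actually made it into the request.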




In [18]:
# Drop collection
utility.drop_collection(COLLECTION_NAME)

In [19]:
# Props to Sebastian Raschka for this handy watermark.
# !pip install watermark

%load_ext watermark
%watermark -a 'Christy Bergman' -v -p pymilvus,langchain,openai --conda

Author: Christy Bergman

Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.21.0

pymilvus : 2.3.6
langchain: 0.1.5
openai   : 1.11.1

conda environment: py311

