# watsonx.ai with Milvus and LangChain

This guide demonstrates how to build a question-answering application driven by a watsonx.ai LLM, using Milvus and LangChain.

## Set up the environment
Before you use the sample code in this notebook, you must perform the following setup tasks:

Create a Watson [Machine Learning (WML)](https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/) service instance. A free plan is offered, and information about how to create the instance can be found [here](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-service-instance.html?context=analytics).

Make sure that a Milvus server is running and reachable from the notebook. The code reads the server host from the `REMOTE_SERVER` environment variable (defaulting to `localhost`) and connects on port 19530.

### Install and import dependencies

```python
# python-dotenv provides load_dotenv(), which the cells below use
!python -m pip install pymilvus "langchain>=0.0.353" openai tiktoken bs4 sentence_transformers "ibm-watson-machine-learning>=1.0.327" humanize pandas rouge_score ibm_watsonx_ai chromadb "pydantic>=1.4.0,<2" python-dotenv
```

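```python
import os
from dotenv import load_dotenv

# Load settings such as REMOTE_SERVER, API_KEY, and PROJECT_ID from a .env file, if present
load_dotenv()

COLLECTION_NAME = 'doc_qa_db'
DIMENSION = 768  # output size of the default HuggingFaceEmbeddings model (all-mpnet-base-v2)
MILVUS_PORT = "19530"
REMOTE_SERVER = os.environ.get("REMOTE_SERVER", "localhost")
```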

```python
from pymilvus import connections
connections.connect(host=REMOTE_SERVER, port=MILVUS_PORT)
```

If the collection already exists, drop it.

```python
from pymilvus import utility
if utility.has_collection(COLLECTION_NAME):
    utility.drop_collection(COLLECTION_NAME)
```

```python
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema.runnable import RunnablePassthrough
from langchain.prompts import PromptTemplate

# Load a set of Milvus documentation pages as LangChain Document objects
loader = WebBaseLoader([
    'https://milvus.io/docs/overview.md',
    'https://milvus.io/docs/release_notes.md',
    'https://milvus.io/docs/architecture_overview.md',
    'https://milvus.io/docs/four_layers.md',
    'https://milvus.io/docs/main_components.md',
    'https://milvus.io/docs/data_processing.md',
    'https://milvus.io/docs/bitset.md',
    'https://milvus.io/docs/boolean.md',
    'https://milvus.io/docs/consistency.md',
    'https://milvus.io/docs/coordinator_ha.md',
    'https://milvus.io/docs/replica.md',
    'https://milvus.io/docs/knowhere.md',
    'https://milvus.io/docs/schema.md',
    'https://milvus.io/docs/dynamic_schema.md',
    'https://milvus.io/docs/json_data_type.md',
    'https://milvus.io/docs/metric.md',
    'https://milvus.io/docs/partition_key.md',
    'https://milvus.io/docs/multi_tenancy.md',
    'https://milvus.io/docs/timestamp.md',
    'https://milvus.io/docs/users_and_roles.md',
    'https://milvus.io/docs/index.md',
    'https://milvus.io/docs/disk_index.md',
    'https://milvus.io/docs/scalar_index.md',
    'https://milvus.io/docs/performance_faq.md',
    'https://milvus.io/docs/product_faq.md',
    'https://milvus.io/docs/operational_faq.md',
    'https://milvus.io/docs/troubleshooting.md',
])
docs = loader.load()
```

```python
# Split the documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=0)
texts = text_splitter.split_documents(docs)
```
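`chunk_size` caps the length of each chunk in characters, and `chunk_overlap` sets how many characters consecutive chunks share. `RecursiveCharacterTextSplitter` additionally tries to break on paragraph breaks, newlines, and spaces before cutting inside a word, so the following is only a simplified sketch of the size/overlap arithmetic; `split_fixed` is a hypothetical helper, not a LangChain API.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Hypothetical fixed-window splitter illustrating chunk_size and chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "Milvus is a vector database. " * 100  # 2,900 characters
chunks = split_fixed(sample, chunk_size=1024, chunk_overlap=0)
print(len(chunks), [len(c) for c in chunks])  # → 3 [1024, 1024, 852]
```

With `chunk_overlap=0`, as in the cell above, the chunks simply tile the text; a positive overlap repeats some context at each boundary at the cost of storing more vectors.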

After preparing the documents, the next step is to convert them into vector embeddings and save them in the vector store.

```python
# Create an embedding function.
# Retrieval quality in Milvus may differ depending on the embedding model used.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores.milvus import Milvus

# The default model (sentence-transformers/all-mpnet-base-v2) produces 768-dimensional
# vectors, matching DIMENSION above. "all-MiniLM-L6-v2" is a smaller alternative (384 dimensions).
embeddings = HuggingFaceEmbeddings()
```

```python
# Optional: build a separate in-memory Chroma index over the same chunks (requires chromadb)
from langchain.vectorstores import Chroma
docsearch = Chroma.from_documents(texts, embeddings)
```

```python
connection_args = {"host": REMOTE_SERVER, "port": MILVUS_PORT}

# from_documents is a class method: it embeds every chunk and inserts the vectors into
# a new collection. drop_old=True removes any existing collection with the same name.
vector_store = Milvus.from_documents(
    texts,
    embedding=embeddings,
    collection_name=COLLECTION_NAME,
    connection_args=connection_args,
    drop_old=True,
)
```
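Conceptually, `from_documents` embeds each chunk and inserts one row per chunk that holds the vector, the chunk text, and its metadata (such as the source URL). The plain-Python sketch below mimics only that idea: `ToyVectorStore` and `toy_embed` (a letter-count vector standing in for the HuggingFace model) are hypothetical and do not reflect how Milvus stores or indexes data internally.

```python
from dataclasses import dataclass, field


def toy_embed(text: str, dim: int = 26) -> list[float]:
    """Hypothetical stand-in for an embedding model: counts of the letters a-z."""
    vec = [0.0] * dim
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[(ord(ch) - ord("a")) % dim] += 1.0
    return vec


@dataclass
class Row:
    id: int
    vector: list[float]
    text: str
    metadata: dict


@dataclass
class ToyVectorStore:
    rows: list[Row] = field(default_factory=list)

    def add_texts(self, texts: list[str], metadatas: list[dict]) -> list[int]:
        # Embed every chunk and insert one row per chunk
        ids = []
        for text, meta in zip(texts, metadatas):
            row = Row(id=len(self.rows), vector=toy_embed(text), text=text, metadata=meta)
            self.rows.append(row)
            ids.append(row.id)
        return ids


store = ToyVectorStore()
ids = store.add_texts(
    ["Milvus overview chunk", "Knowhere chunk"],
    [{"source": "overview.md"}, {"source": "knowhere.md"}],
)
print(ids, len(store.rows[0].vector))  # → [0, 1] 26
```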


To reconnect to the stored collection later (for example, in a new session), create a `Milvus` instance that points at the same collection:

```python
vector_db = Milvus(
    embeddings,
    connection_args={"host": REMOTE_SERVER, "port": MILVUS_PORT},
    collection_name=COLLECTION_NAME,
)
```

```python
query = "What are the main components of Milvus?"
docs = vector_db.similarity_search(query)
```

```python
docs[0].page_content
```

```text
'Knowhere in the Milvus architecture.'
```

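You can run the same text-to-text similarity search directly on `vector_store`, the store returned by `from_documents`. It returns the chunks most similar to the query text; by default, `similarity_search` returns four results (controlled by the `k` parameter).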

```python
query = "What are the main components of Milvus?"
docs = vector_store.similarity_search(query)
print(len(docs))
```

```text
4
```
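"Most similar" means the stored vectors closest to the query vector. Milvus uses an index to find approximate nearest neighbors efficiently; the exact brute-force version below, using L2 (Euclidean) distance on hypothetical toy vectors, only illustrates what `k` and "closest" mean. `top_k_l2` is a hypothetical helper, not a Milvus or LangChain API.

```python
import heapq
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_l2(query: list[float], vectors: list[list[float]], k: int = 4) -> list[tuple[int, float]]:
    """Exact nearest-neighbor search: return (index, distance) pairs, closest first."""
    scored = ((i, l2_distance(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [0.5, 0.5], [10.0, 10.0]]
print(top_k_l2([0.0, 0.0], vectors))  # indices 0, 4, 1, 2 (closest first)
```

Brute force costs one distance computation per stored vector for every query, which is why vector databases build indexes for large collections.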


```python
docs[0].page_content
```

```text
'Knowhere in the Milvus architecture.'
```

```python
# Retrieve only the documents relevant to the query through the retriever interface
query = "What are the main components of Milvus?"
docs_search = vector_store.as_retriever().get_relevant_documents(query)
```

## watsonx API connection
This cell defines the credentials required to work with the watsonx API for foundation model inferencing.
Action: Provide the IBM Cloud user API key. For details, see [documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).

### Defining the project ID
The API requires a project ID that provides the context for the call. If the notebook runs inside a watsonx.ai project, the ID is taken from that project through the `PROJECT_ID` environment variable; otherwise, provide it (for example, in a `.env` file) or enter it when prompted.

Hint: You can find the project ID as follows. Open the Prompt Lab in watsonx.ai. At the very top of the UI, you will see `Projects / <project name> /`. Click the `<project name>` link, then copy the project ID from the project's Manage tab (Project -> Manage -> General -> Details).


```python
import os
from getpass import getpass
from dotenv import load_dotenv

load_dotenv()

# os.environ.get returns None instead of raising KeyError, so fall back to a prompt explicitly
API_KEY = os.environ.get("API_KEY") or getpass("Please enter your WML API key (hit enter): ")
project_id = os.environ.get("PROJECT_ID") or input("Please enter your project_id (hit enter): ")
```
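Interactive prompts block when the notebook runs unattended. One alternative, sketched below under that assumption, is to validate all required settings up front and fail with a single clear message; `require_env` is a hypothetical helper, and the variable names match the ones read in the cell above.

```python
import os
from collections.abc import Mapping


def require_env(names: list[str], env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: return the requested settings or raise listing every missing one."""
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {name: env[name] for name in names}


# A plain dict stands in for os.environ so the example runs anywhere
settings = require_env(
    ["API_KEY", "PROJECT_ID"],
    env={"API_KEY": "dummy-key", "PROJECT_ID": "dummy-project"},
)
print(settings["PROJECT_ID"])
```

Empty values are treated as missing, so a blank `PROJECT_ID=` line in a `.env` file is caught as well.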

```python
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": API_KEY,
}
```

## Foundation Models on watsonx
### Defining the model
Specify the `model_id` of the foundation model to use for inferencing:

```python
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes
model_id = ModelTypes.GRANITE_13B_CHAT_V2
```

### Defining the model parameters
Provide a set of generation parameters that influence the result:

```python
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    # Pass the enum's string value: the parameters are serialized to JSON for the
    # request, and the DecodingMethods enum member itself is not JSON serializable.
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY.value,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"],
}
```
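A `TypeError: Object of type DecodingMethods is not JSON serializable` occurs when the enum member itself is placed in `parameters`: the request parameters are serialized to JSON, and Python's `json` module cannot encode a plain `Enum` member. The stand-alone sketch below reproduces this with a hypothetical stand-in enum (its values are assumed to mirror the SDK's) and shows that the member's `.value` serializes cleanly.

```python
import json
from enum import Enum


class DecodingMethodsStandIn(Enum):
    """Hypothetical stand-in for the SDK's DecodingMethods enum; values are assumed."""
    GREEDY = "greedy"
    SAMPLE = "sample"


params_with_member = {"decoding_method": DecodingMethodsStandIn.GREEDY, "max_new_tokens": 100}
params_with_value = {"decoding_method": DecodingMethodsStandIn.GREEDY.value, "max_new_tokens": 100}

try:
    json.dumps(params_with_member)
except TypeError as err:
    # Plain Enum members have no default JSON encoding
    print("member:", err)

print("value:", json.dumps(params_with_value))  # → value: {"decoding_method": "greedy", "max_new_tokens": 100}
```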

### LangChain wrapper for the watsonx model
Initialize LangChain's `WatsonxLLM` class with the parameters defined above and the `ibm/granite-13b-chat-v2` model:

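```python
# WatsonxLLM lives in langchain_community; importing it from langchain.llms is deprecated
from langchain_community.llms import WatsonxLLM

watsonx_granite = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters,
)
```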

## Generate a retrieval-augmented response to a question
Build a `RetrievalQA` question-answering chain to automate the retrieval-augmented generation (RAG) task.


```python
retriever = vector_store.as_retriever()
```

```python
retriever
```

```text
VectorStoreRetriever(tags=['Milvus', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.milvus.Milvus object at 0x7f374814ffd0>)
```

```python
from langchain.chains import RetrievalQA

# "stuff" inserts all retrieved chunks into a single prompt.
# Use the Milvus-backed retriever so that answers are grounded in the Milvus collection.
qa = RetrievalQA.from_chain_type(
    llm=watsonx_granite,
    chain_type="stuff",
    retriever=vector_store.as_retriever(),
)
```
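With `chain_type="stuff"`, the chain inserts ("stuffs") all retrieved chunks into one prompt and makes a single LLM call. The sketch below reproduces that prompt-assembly step with plain strings; `build_stuff_prompt` and its template wording are hypothetical and are not the exact prompt that LangChain's `RetrievalQA` uses.

```python
def build_stuff_prompt(question: str, chunks: list[str], separator: str = "\n\n") -> str:
    """Hypothetical sketch of 'stuff' prompting: every chunk goes into one prompt."""
    context = separator.join(chunks)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


stuffed_prompt = build_stuff_prompt(
    "What are the main components of Milvus?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(stuffed_prompt)
```

Because everything goes into one prompt, the retrieved text plus the question must fit in the model's context window; other `RetrievalQA` chain types, such as `map_reduce`, spread the work across several LLM calls instead.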


## Ask your question
Because the chain inserts the retrieved chunks into the prompt, the LLM can use them as a reference when preparing its answer. Ask a question about the Milvus documentation loaded earlier:

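```python
query = "What are the main components of Milvus?"
# Chain.run is deprecated in recent LangChain releases; invoke returns a dict
result = qa.invoke({"query": query})
print(result["result"])
```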

```python
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
rag_prompt = PromptTemplate.from_template(template)
```
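`PromptTemplate.from_template` treats the template as a Python format string (its default `f-string` template format) and infers the input variables from the `{...}` placeholders, here `context` and `question`. The standard-library sketch below shows that inference and the final substitution; `input_variables` is a hypothetical helper, and `demo_template` is a shortened illustrative copy of the template above.

```python
from string import Formatter


def input_variables(template: str) -> list[str]:
    """Hypothetical helper: collect the {placeholder} names in a format string."""
    return sorted({field for _, field, _, _ in Formatter().parse(template) if field})


demo_template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""

print(input_variables(demo_template))  # → ['context', 'question']
print(demo_template.format(context="Milvus chunk text.", question="Explain IVF_FLAT in Milvus."))
```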

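```python
def format_docs(docs):
    # Join the retrieved chunks into one context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | watsonx_granite
)
```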
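In this LangChain Expression Language (LCEL) pipeline, the leading dict runs each of its entries on the same input: the retriever fetches chunks for the question while `RunnablePassthrough` forwards the question unchanged. The resulting dict fills the prompt, and the filled prompt is sent to the LLM. The plain-Python sketch below mimics that data flow with a hypothetical `Step` class and stub functions in place of the retriever and model; it is not how LangChain implements runnables.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal stand-in for a runnable that supports `|` composition."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # Chain two steps: the output of self becomes the input of other
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(branches: dict[str, Callable[[Any], Any]]) -> Step:
    # Run every branch on the same input and collect the results in a dict
    return Step(lambda value: {key: fn(value) for key, fn in branches.items()})


def fake_retriever(question: str) -> str:
    # Stub: a real retriever would return the most similar chunks
    return f"chunk about {question}"


def passthrough(value: Any) -> Any:
    # Like RunnablePassthrough: forward the input unchanged
    return value


prompt_step = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
llm_step = Step(lambda text: f"[answer based on {len(text)} prompt characters]")  # stub LLM

chain = parallel({"context": fake_retriever, "question": passthrough}) | prompt_step | llm_step
print(chain.invoke("Explain IVF_FLAT in Milvus."))
```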

```python
print(rag_chain.invoke("Explain IVF_FLAT in Milvus."))
```