# watsonx.ai with Chroma and LangChain

This guide demonstrates how to build a question-answering application driven by a watsonx.ai LLM, using Chroma as the vector store and LangChain to connect retrieval and generation.

## Set up the environment
Before you use the sample code in this notebook, you must perform the following setup task:

Create a [Watson Machine Learning (WML)](https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/) service instance. A free plan is offered, and instructions for creating the instance can be found [here](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-service-instance.html?context=analytics).

### Install and import dependencies

In [1]:
from IPython.display import clear_output
!pip install "langchain==0.0.345"
!pip install wget
!pip install sentence-transformers
!pip install "chromadb==0.3.22"
!pip install "ibm-watson-machine-learning>=1.0.335"
!pip install "pydantic>=1.4.0,<2"
!pip install bs4
!pip install ipywidgets
# python-dotenv provides load_dotenv(), used in the credentials cell
!pip install python-dotenv
# Uncomment to hide the pip logs once installation succeeds
#clear_output()
from langchain.vectorstores import Chroma

Collecting langchain==0.0.345
  Using cached langchain-0.0.345-py3-none-any.whl (2.0 MB)
Collecting pydantic<3,>=1
  Using cached pydantic-2.6.1-py3-none-any.whl (394 kB)
Collecting PyYAML>=5.3
  Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
Collecting requests<3,>=2
  Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting langsmith<0.1.0,>=0.0.63
  Using cached langsmith-0.0.92-py3-none-any.whl (56 kB)
Collecting langchain-core<0.1,>=0.0.9
  Using cached langchain_core-0.0.13-py3-none-any.whl (188 kB)
Collecting jsonpatch<2.0,>=1.33
  Using cached jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting anyio<4.0
  Using cached anyio-3.7.1-py3-none-any.whl (80 kB)
Collecting aiohttp<4.0.0,>=3.8.3
  Using cached aiohttp-3.9.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting numpy<2,>=1
  Using cached numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Collecting async-tim

In [2]:
# Loader, splitter, and chain-building helpers used in the cells below
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema.runnable import RunnablePassthrough
from langchain.prompts import PromptTemplate

# Load the Milvus documentation pages that serve as the knowledge base
loader = WebBaseLoader([
    'https://milvus.io/docs/overview.md',
    'https://milvus.io/docs/release_notes.md',
    'https://milvus.io/docs/architecture_overview.md',
    'https://milvus.io/docs/four_layers.md',
    'https://milvus.io/docs/main_components.md',
    'https://milvus.io/docs/data_processing.md',
    'https://milvus.io/docs/bitset.md',
    'https://milvus.io/docs/boolean.md',
    'https://milvus.io/docs/consistency.md',
    'https://milvus.io/docs/coordinator_ha.md',
    'https://milvus.io/docs/replica.md',
    'https://milvus.io/docs/knowhere.md',
    'https://milvus.io/docs/schema.md',
    'https://milvus.io/docs/dynamic_schema.md',
    'https://milvus.io/docs/json_data_type.md',
    'https://milvus.io/docs/metric.md',
    'https://milvus.io/docs/partition_key.md',
    'https://milvus.io/docs/multi_tenancy.md',
    'https://milvus.io/docs/timestamp.md',
    'https://milvus.io/docs/users_and_roles.md',
    'https://milvus.io/docs/index.md',
    'https://milvus.io/docs/disk_index.md',
    'https://milvus.io/docs/scalar_index.md',
    'https://milvus.io/docs/performance_faq.md',
    'https://milvus.io/docs/product_faq.md',
    'https://milvus.io/docs/operational_faq.md',
    'https://milvus.io/docs/troubleshooting.md',
])
docs = loader.load()

In [3]:
# Split the documents into chunks of at most 1024 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=0)
texts = text_splitter.split_documents(docs)
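
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width splitter. It is a simplification for illustration only: `split_fixed` is a hypothetical helper, and it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` tries to cut on separators such as paragraph and line breaks.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
print(split_fixed(sample, chunk_size=4))                   # no shared characters
print(split_fixed(sample, chunk_size=4, chunk_overlap=2))  # neighbours share 2 characters
```

With `chunk_overlap=0`, as in this notebook, each character lands in exactly one chunk. A non-zero overlap repeats some text across neighbouring chunks, which can help when an answer straddles a chunk boundary, at the cost of a larger index.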

After preparing the documents, the next step is to convert them into vector embeddings and save them in the vector store.

In [4]:
# Create an embedding function.
# Retrieval quality may differ depending on the embedding model used.
from langchain.embeddings import HuggingFaceEmbeddings

# Alternative: langchain.embeddings.SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
embeddings = HuggingFaceEmbeddings()

In [5]:
from langchain.vectorstores import Chroma
docsearch = Chroma.from_documents(texts, embeddings)

Using embedded DuckDB without persistence: data will be transient
No embedding_function provided, using default embedding function: SentenceTransformerEmbeddingFunction


To run a similarity search, pass a query string to `similarity_search`. It returns the stored chunks whose embeddings are closest to the embedding of the query.

In [8]:
query = "What are the main components of Milvus?"
docs = docsearch.similarity_search(query)
print(len(docs))

4


In [9]:
docs[0].page_content

'Knowhere in the Milvus architecture.'
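
The ranking step behind a similarity search can be sketched in plain Python. This is an illustration under assumptions, not Chroma's implementation: `cosine_similarity` and `top_k` are hypothetical helpers, the two-dimensional vectors are made up, and cosine similarity is assumed as the metric (the metric Chroma actually uses depends on how the collection is configured).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Made-up 2-D "embeddings"; real embedding models produce hundreds of dimensions.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k([1.0, 0.1], toy_docs, k=2))
```

The default of `k=4` in the sketch mirrors the four results returned by `similarity_search` above. A vector store replaces the exhaustive scan with an index so that the search stays fast on large collections.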

## watsonx API connection
The cells below define the credentials required to call the watsonx API for foundation model inferencing.
Action: Provide the IBM Cloud user API key. For details, see the [documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).

### Defining the project ID
The API requires a project ID, which provides the context for the call. The code below looks for it in the `PROJECT_ID` environment variable, which `load_dotenv()` can populate from a `.env` file. If it is not set there, provide the project ID yourself.

Hint: You can find the project_id as follows. Open the prompt lab in watsonx.ai. At the very top of the UI, there will be Projects / <project name> /. Click on the <project name> link. Then get the project_id from Project's Manage tab (Project -> Manage -> General -> Details).


In [11]:
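from dotenv import load_dotenv
import os

# Load variables from a local .env file, if one exists
load_dotenv()

# os.environ.get returns None for a missing variable (it never raises KeyError),
# so fall back to interactive input explicitly.
API_KEY = os.environ.get("API_KEY") or input("Please enter your WML API key (hit enter): ")
project_id = os.environ.get("PROJECT_ID") or input("Please enter your project_id (hit enter): ")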

In [12]:
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": API_KEY  
}

## Foundation Models on watsonx
### Defining model
You need to specify the `model_id` of the model that will be used for inferencing:



In [13]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes
model_id = ModelTypes.GRANITE_13B_CHAT_V2

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [15]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods
parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
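
To show how these parameters interact, here is a toy sketch of greedy decoding with a token limit and stop sequences. Everything in it is hypothetical: `greedy_decode`, `toy_model`, and `SCRIPT` are illustrative stand-ins, not part of the watsonx API, and the sketch keeps the stop sequence in the returned text, which may not match what the service does.

```python
def greedy_decode(next_token_scores, max_new_tokens, stop_sequences=()):
    """Toy greedy decoder.

    next_token_scores is a hypothetical callable that takes the text generated
    so far and returns a dict mapping candidate tokens to scores.
    """
    text = ""
    for _ in range(max_new_tokens):          # hard cap, like MAX_NEW_TOKENS
        scores = next_token_scores(text)
        token = max(scores, key=scores.get)  # greedy: always take the top-scoring token
        text += token
        if any(text.endswith(stop) for stop in stop_sequences):
            break                            # a stop sequence ends generation early
    return text


# A scripted stand-in for a language model.
SCRIPT = ["Milvus", " is", " open", " source", "<|endoftext|>", " ignored"]


def toy_model(text_so_far):
    # Count how many scripted tokens have already been emitted.
    emitted = 0
    while emitted < len(SCRIPT) and "".join(SCRIPT[:emitted]) != text_so_far:
        emitted += 1
    preferred = SCRIPT[emitted] if emitted < len(SCRIPT) else "."
    return {preferred: 0.9, " other": 0.1}


print(greedy_decode(toy_model, max_new_tokens=100, stop_sequences=["<|endoftext|>"]))
print(greedy_decode(toy_model, max_new_tokens=2))
```

Because greedy decoding always picks the highest-scoring token, the same prompt yields the same output on every run. Generation ends at whichever comes first: a stop sequence or the token limit, which is why a low `MAX_NEW_TOKENS` can cut an answer off mid-sentence.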

### LangChain CustomLLM wrapper for watsonx model
Initialize the `WatsonxLLM` class from LangChain with the parameters defined above and the `ibm/granite-13b-chat-v2` model.

In [16]:
from langchain.llms import WatsonxLLM
watsonx_granite = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters
)

## Generate a retrieval-augmented response to a question
Build a `RetrievalQA` (question-answering) chain to automate the RAG task.


In [17]:
# Expose the Chroma vector store as a retriever
retriever = docsearch.as_retriever()

In [18]:
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain.vectorstores.chroma.Chroma object at 0x7f4fb468e5f0>)

In [19]:
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=watsonx_granite,
                                  chain_type="stuff", 
                                  retriever=retriever)
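
The `"stuff"` chain type places all retrieved chunks into a single prompt and makes one LLM call. The sketch below shows that flow with plain functions. It is an illustration only: `stuff_prompt` and `answer` are hypothetical helpers, the prompt wording is an assumption rather than the template `RetrievalQA` uses internally, and `retrieve` and `llm` stand in for the retriever and the model.

```python
def stuff_prompt(question, passages):
    """Concatenate every retrieved passage into one prompt."""
    context = "\n\n".join(passages)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


def answer(question, retrieve, llm):
    """Retrieve passages, stuff them into one prompt, and call the model once."""
    passages = retrieve(question)
    return llm(stuff_prompt(question, passages))


# Stub retriever and model so the sketch runs without any service.
def fake_retrieve(question):
    return ["Milvus has a storage engine.", "Milvus has a metadata engine."]


def fake_llm(prompt):
    return f"(model saw {len(prompt)} characters of prompt)"


print(answer("What are the main components of Milvus?", fake_retrieve, fake_llm))
```

Since every chunk goes into one prompt, this approach only works while the combined text fits in the model's context window.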


## Ask your question
With the documents indexed and the chain built, you can now ask questions. The chain inserts the retrieved chunks into the prompt so that the LLM can use them as a reference when preparing its answer.
The question below is about the Milvus documentation loaded earlier.

In [20]:
query = "What are the main components of Milvus?"
qa.run(query)

"\nMilvus standalone includes three components:\n\n\nMilvus: The core functional component.\n\nMetadata engine: Accesses and stores metadata of Milvus' internal components, including proxies, index nodes, and more.\n\nStorage engine: Responsible for data persistence for Milvus.\n\nMilvus cluster includes eight microservice components and three third-party dependencies. All microservices can be deployed on Kubernetes, independently from each other.\n\nMicroservice components"

In [21]:
template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
rag_prompt = PromptTemplate.from_template(template)

In [22]:
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | watsonx_granite
)
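
In the chain above, the retriever's output is a list of document objects, which is inserted into `{context}` as is. A small formatting step that joins only the text of each chunk can make the prompt cleaner. The sketch below is one possible helper: `format_docs` is a hypothetical name, and `FakeDocument` is a stand-in for LangChain's `Document` so that the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document, which exposes page_content."""
    page_content: str


def format_docs(docs):
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


print(format_docs([FakeDocument("first chunk"), FakeDocument("second chunk")]))
```

Assuming the installed LangChain version coerces plain functions into runnables when they are piped, the helper could be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`.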

In [23]:
print(rag_chain.invoke("Explain IVF_FLAT in Milvus."))



 IVF_FLAT is a type of index in Milvus that divides vector space into list clusters. At the default list value of 16,384, Milvus compares the distances between the target vector and the centroids of all 16,384 clusters to return probe nearest clusters. It then compares the distances between the target vector and the vectors in the selected clusters to get the nearest vectors. This method is different from FLAT, which directly compares the distances between the target vector and every
