# Retrieval Augmented Generation (RAG) with WatsonX, Langchain, and Chroma

Based on the IBM Cloud notebook [Use WatsonX, Chroma, and LangChain to answer questions (RAG)](https://dataplatform.cloud.ibm.com/exchange/public/entry/view/d3a5f957-a93b-46cd-82c1-c8d37d4f62c6?context=wx)

## In this notebook
This notebook contains instructions for performing Retrieval Augmented Generation (RAG) in watsonx.ai. RAG is an architectural pattern that improves language model responses by retrieving factual information from a knowledge base and adding that information to the model query.

RAG use cases include:
- Customer service: Answering questions about a product or service using facts from the product documentation.
- Domain knowledge: Exploring a specialized domain (e.g., finance) using facts from papers or articles in the knowledge base.
- News chat: Chatting about current events by calling up relevant recent news articles.

In its simplest form, RAG requires 3 steps:

- Initial setup:
  - Index knowledge-base passages for efficient retrieval. In this recipe, we take embeddings of the passages using WatsonX, and store them in a vector database.
- Upon each user query:
  - Retrieve relevant passages from the database. In this recipe, we use an embedding of the query to retrieve semantically similar passages.
  - Generate a response by feeding the retrieved passages into a large language model, along with the user query.
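
The three steps above can be sketched in plain Python without any external service. This is only an illustration under stated assumptions: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the passages are invented, and the final prompt would be sent to a language model rather than printed.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1 (initial setup): index the knowledge-base passages.
passages = [
    "The warranty covers battery defects for two years.",
    "Shipping usually takes three to five business days.",
    "Returns are accepted within thirty days of purchase.",
]
index = [(p, embed(p)) for p in passages]

def retrieve(query, k=1):
    """Step 2: return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

def build_prompt(query, retrieved):
    """Step 3: combine retrieved passages and the query into one model input."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

query = "How long does shipping take?"
top = retrieve(query)
prompt = build_prompt(query, top)
print(prompt)
```

The rest of this notebook replaces the toy pieces with a watsonx.ai embedding model, a Chroma vector store, and a watsonx.ai foundation model.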

## Setting up the environment

Before you use the sample code in this notebook, you must perform the following setup task:

- Create a [Watson Machine Learning (WML) Service](https://cloud.ibm.com/catalog/services/watson-machine-learning) instance using the free plan. For instructions on creating the instance, see the [WML plans documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp).

### Install and import the dependencies

This notebook was tested with Python 3.10.15.

In [1]:
!pip install "langchain-core" -U | tail -n 1
!pip install "langchain" | tail -n 1
!pip install "ibm-watsonx-ai" | tail -n 1
!pip install "langchain_ibm" | tail -n 1
!pip install "wget" | tail -n 1
!pip install "sentence-transformers" | tail -n 1
!pip install "chromadb" | tail -n 1
!pip install "pydantic" | tail -n 1
!pip install "sqlalchemy" | tail -n 1
!pip install "langchain-community" | tail -n 1



### Set up the WatsonX API connection credentials
Provide the IBM Cloud user API key in an environment variable called `WATSONX_APIKEY`, or enter it at the prompt.

If you don't have an API key, you can [create one here](https://cloud.ibm.com/iam/apikeys). For more details, see the [documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).

NOTE: A string that starts with `cpd-apikey` or `APIKey` is _not_ the key value, but the name or id of the key.

In [2]:
import os

try:
    apikey = os.environ["WATSONX_APIKEY"]
except KeyError:
    from getpass import getpass
    apikey = getpass("Please enter your apikey (hit enter): ")

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": apikey,
}

Please enter your apikey (hit enter):  ········


### Provide the project id

Provide the WatsonX project ID in an environment variable called `WATSONX_PROJECT_ID`, or enter it at the prompt.

You can find the `Project ID` for your project like this:
* Visit the [Projects page in WatsonX](https://dataplatform.cloud.ibm.com/projects/?context=wx).
* From the list of projects, click on your project's name to get to the individual Project page.
* From there, navigate to the Manage tab and to the General view, where you will see the `Project ID` in the Details section.

In [3]:
try:
    project_id = os.environ["WATSONX_PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

Please enter your project_id (hit enter):  hello


## Building the Database

The most common approach in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.

In this basic example, we take the text of a State of the Union speech, split it into chunks, embed the chunks using a watsonx.ai embedding model, load them into [Chroma](https://www.trychroma.com/), and then query the resulting vector store.

### Download the document

Here we use President Biden's State of the Union address from March 1, 2022.

In [5]:
import wget

filename = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

if not os.path.isfile(filename):
    wget.download(url, out=filename)

### Split the document into chunks

Split the document into text segments that can fit into the model's context window.

In [6]:
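# Loaders and vector stores live in the langchain-community package installed above
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma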

loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
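
To make the chunking step more concrete, here is a simplified sketch of separator-based splitting with no overlap, matching the `chunk_overlap=0` setting used above. It is not LangChain's implementation: the function name `split_text` is hypothetical, and the sketch assumes the text is divided on blank lines and that pieces are merged greedily until the next piece would exceed `chunk_size` characters.

```python
def split_text(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most
    chunk_size characters. A single piece longer than chunk_size is kept
    as its own oversized chunk."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

sample = "aaaa\n\nbbbb\n\ncccccccc\n\ndd"
chunks = split_text(sample, chunk_size=10)
print(chunks)  # ['aaaa\n\nbbbb', 'cccccccc', 'dd']
```

Smaller chunks give more precise retrieval hits but carry less surrounding context, so `chunk_size` is usually tuned to the documents and the model at hand.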

### Create an embedding function

Note that you can supply a custom embedding function to Chroma. Chroma's retrieval performance may differ depending on the embedding model used. In the following example we use the watsonx.ai embedding service. The available embedding models can be listed with `get_embedding_model_specs`.

In [7]:
from ibm_watsonx_ai.foundation_models.utils import get_embedding_model_specs

get_embedding_model_specs(credentials.get('url'))

{'total_count': 7,
 'limit': 100,
 'first': {'href': 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2023-09-30&filters=function_embedding'},
 'resources': [{'model_id': 'cross-encoder/ms-marco-minilm-l-12-v2',
   'label': 'ms-marco-minilm-l-12-v2',
   'provider': 'cross-encoder',
   'source': 'cross-encoder',
   'functions': [{'id': 'embedding'}],
   'short_description': 'Used for Information Retrieval: Encode and sort a query will all possible passages.',
   'long_description': 'The model can be used for Information Retrieval: Given a query, encode the query will all possible passages (e.g. retrieved with ElasticSearch). Then sort the passages in a decreasing order.',
   'input_tier': 'class_c1',
   'output_tier': 'class_c1',
   'number_params': '33.4m',
   'limits': {'lite': {'call_time': '5m0s'},
    'v2-professional': {'call_time': '10m0s'},
    'v2-standard': {'call_time': '10m0s'}},
   'lifecycle': [{'id': 'available', 'start_date': '2024-09-17'}]},
  {'m

In [8]:

from langchain_ibm import WatsonxEmbeddings
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes

embeddings = WatsonxEmbeddings(
    model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=project_id
    )
docsearch = Chroma.from_documents(texts, embeddings)

#### Compatibility of watsonx.ai Embeddings with LangChain

LangChain retrievers use `embed_documents` and `embed_query` under the hood to generate embedding vectors for the uploaded documents and the user query, respectively.
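
As an illustration of that two-method interface, the sketch below defines a hypothetical `ToyEmbeddings` class that exposes `embed_documents` and `embed_query`. The hashing scheme is an assumption made purely for the example and has nothing to do with how the watsonx.ai models compute embeddings. The point is only the shape of the inputs and outputs: a list of texts maps to a list of vectors, and a single query maps to a single vector.

```python
import hashlib

class ToyEmbeddings:
    """Hypothetical stand-in exposing the two methods named above."""

    def __init__(self, dim=8):
        self.dim = dim

    def _embed(self, text):
        # Hash each token into one of `dim` buckets and count occurrences.
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % self.dim] += 1.0
        return vec

    def embed_documents(self, texts):
        """Embed a list of document chunks: one vector per chunk."""
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        """Embed a single user query: one vector."""
        return self._embed(text)

toy = ToyEmbeddings()
doc_vectors = toy.embed_documents(["supreme court nominee", "energy prices"])
query_vector = toy.embed_query("supreme court nominee")
print(len(doc_vectors), len(query_vector))  # 2 8
```

Because documents and queries are embedded by the same model, their vectors live in the same space and can be compared directly.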

In [9]:
help(WatsonxEmbeddings)

Help on class WatsonxEmbeddings in module langchain_ibm.embeddings:

class WatsonxEmbeddings(pydantic.main.BaseModel, langchain_core.embeddings.embeddings.Embeddings)
 |  WatsonxEmbeddings(*, model_id: str, project_id: Optional[str] = None, space_id: Optional[str] = None, url: pydantic.types.SecretStr = None, apikey: Optional[pydantic.types.SecretStr] = None, token: Optional[pydantic.types.SecretStr] = None, password: Optional[pydantic.types.SecretStr] = None, username: Optional[pydantic.types.SecretStr] = None, instance_id: Optional[pydantic.types.SecretStr] = None, version: Optional[pydantic.types.SecretStr] = None, params: Optional[dict] = None, verify: Union[str, bool, NoneType] = None, watsonx_embed: ibm_watsonx_ai.foundation_models.embeddings.embeddings.Embeddings = None, watsonx_client: Optional[ibm_watsonx_ai.client.APIClient] = None) -> None
 |  
 |  IBM watsonx.ai embedding models.
 |  
 |  Method resolution order:
 |      WatsonxEmbeddings
 |      pydantic.main.BaseModel
 | 

## Configuring the Model

IBM watsonx foundation models are among the [LLMs supported by LangChain](https://python.langchain.com/docs/integrations/llms/ibm_watsonx/).

### Select the model
Specify the `model_id` of the model that will be used for inferencing.

In [10]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.GRANITE_13B_CHAT_V2
model_id

<ModelTypes.GRANITE_13B_CHAT_V2: 'ibm/granite-13b-chat-v2'>

### Define the model parameters
Provide the parameters that will be used to configure the model.

In [11]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
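
For intuition about what these parameters control, the toy decoder below always picks the highest-scoring next token (greedy decoding) and stops either at a stop sequence or after a maximum number of new tokens. Everything here is an assumption made for illustration: the function `greedy_decode` and the hand-written score table are hypothetical, a minimum token count is not modeled, keeping the stop token in the output is a choice of this sketch, and none of it reflects how watsonx.ai implements decoding.

```python
def greedy_decode(next_token_scores, start_token, max_new_tokens, stop_sequences):
    """Repeatedly pick the highest-scoring next token until a stop
    sequence is produced or max_new_tokens tokens have been generated."""
    generated = []
    current = start_token
    for _ in range(max_new_tokens):
        scores = next_token_scores.get(current)
        if not scores:
            break  # no known continuation in the toy table
        current = max(scores, key=scores.get)  # greedy choice
        generated.append(current)
        if current in stop_sequences:
            break
    return generated

# Hypothetical next-token scores, keyed by the previous token.
scores = {
    "<start>": {"The": 0.6, "A": 0.4},
    "The": {"answer": 0.7, "reply": 0.3},
    "answer": {"is": 0.9, "was": 0.1},
    "is": {"42": 0.5, "unknown": 0.3, "<|endoftext|>": 0.2},
    "42": {"<|endoftext|>": 0.8, "!": 0.2},
}

full = greedy_decode(scores, "<start>", 100, ["<|endoftext|>"])
short = greedy_decode(scores, "<start>", 2, ["<|endoftext|>"])
print(full)   # ['The', 'answer', 'is', '42', '<|endoftext|>']
print(short)  # ['The', 'answer']
```

Greedy decoding is deterministic, which suits question answering over retrieved facts, where reproducible output tends to matter more than variety.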

### Initialize the WatsonX model
Using the `WatsonxLLM` wrapper from the `langchain_ibm` package, initialize the model with the parameters and the `model_id` defined above.

In [12]:
from langchain_ibm import WatsonxLLM

watsonx_granite = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters
)

## Answering Queries

### Automate the RAG pipeline

Build a question-answering chain with the model and the document retriever.

In [13]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=watsonx_granite, chain_type="stuff", retriever=docsearch.as_retriever())
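
The `"stuff"` chain type places all retrieved passages into a single prompt together with the question. The sketch below shows that idea in plain Python. The template text and the function name `stuff_prompt` are hypothetical and are not the prompt LangChain actually uses; the passages are invented.

```python
# Hypothetical template; the wording of LangChain's own prompt may differ.
TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def stuff_prompt(question, docs, template=TEMPLATE):
    """Join every retrieved passage into one context block and fill the template."""
    context = "\n\n".join(docs)
    return template.format(context=context, question=question)

retrieved = [
    "Passage one about the nominee.",
    "Passage two about the court.",
]
prompt = stuff_prompt("Who was nominated?", retrieved)
print(prompt)
```

Since every retrieved passage goes into one prompt, the combined length of the passages and the question has to fit within the model's context window, which is one reason the document was split into chunks earlier.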

### Generate a retrieval-augmented response to a question

Use the question-answering chain to process the query. 

NOTE: This may take a little while if you are on the 'Lite' Watson Machine Learning plan.

In [14]:
query = "What did the president say about Ketanji Brown Jackson"
qa.invoke(query)

{'query': 'What did the president say about Ketanji Brown Jackson',
 'result': ' The president said, "One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence." This statement was made in reference to Ketanji Brown Jackson, who was nominated by the president to serve on the United States Supreme Court.'}