![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx Granite Model Series, Chroma, and LangChain to answer questions (RAG)

#### Disclaimers

- Use only Projects and Spaces that are available in watsonx context.

## Notebook content
This notebook contains the steps and code to demonstrate support for Retrieval Augmented Generation in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.10.

### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding retrieved passage into a large language model (for every user query)
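
The three steps above can be sketched without any external service. The following is a minimal illustration only: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the passages are made up for the example, and the final step only builds the prompt that would be sent to a large language model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 1: index the knowledge base passages (once). Passages are illustrative.
passages = [
    "The Senate confirmed the nominee to the Supreme Court.",
    "NATO members increased their defense budgets.",
    "The new bridge lowers commuting times across the river.",
]
index = [(p, embed(p)) for p in passages]


def retrieve(query, k=1):
    """Step 2: return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]


def build_prompt(query, retrieved):
    """Step 3 (input side): feed the retrieved passages to the model as context."""
    context = "\n".join(retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"


question = "Which court did the Senate confirm the nominee to?"
top_passages = retrieve(question)
prompt = build_prompt(question, top_passages)
print(prompt)
```

The rest of this notebook replaces each toy piece with a real component: watsonx.ai embeddings, a Chroma vector store, and a Granite model.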

## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Document data loading](#data)
- [Build up knowledge base](#build_base)
- [Foundation Models on watsonx](#models)
- [Generate a retrieval-augmented response to a question](#predict)
- [Summary and next steps](#summary)


<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://cloud.ibm.com/catalog/services/watson-machine-learning" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp" target="_blank" rel="noopener no referrer">here</a>).


### Install and import the dependencies

In [1]:
!pip install "langchain==0.3.7" | tail -n 1
!pip install "ibm-watsonx-ai==1.1.23" | tail -n 1
!pip install "langchain_ibm==0.3.3" | tail -n 1
!pip install "wget==3.2" | tail -n 1
!pip install "sentence-transformers==3.3.0" | tail -n 1
!pip install "chromadb==0.5.18" | tail -n 1
!pip install "pydantic==2.9.2" | tail -n 1
!pip install "sqlalchemy==2.0.35" | tail -n 1
!pip install "faiss-cpu==1.9.0" | tail -n 1
!pip install "flashrank==0.2.9" | tail -n 1

!pip install "langchain-community==0.3.7" | tail -n 1
!pip install "pymupdf==1.24.13" | tail -n 1



In [2]:
import os
import getpass

### watsonx API connection
This cell defines the credentials required to work with the watsonx API for Foundation Model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see the <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank" rel="noopener no referrer">documentation</a>.

**Link to create an API key directly:** https://cloud.ibm.com/iam/apikeys

**Make sure that you are under the correct cloud instance.**


### Defining the project id
The API requires a project id that provides the context for the call. We will obtain the id from the project in which this notebook runs. Otherwise, please provide the project id.

**Hint**: You can find the `project_id` as follows. Open the prompt lab in watsonx.ai. At the very top of the UI, there will be `Projects / <project name> /`. Click on the `<project name>` link. Then get the `project_id` from Project's Manage tab (Project -> Manage -> General -> Details).


In [3]:
# My environment
os.environ["PROJECT_ID"] = "4f6edd25-9060-4a3f-a906-b3915545a5a9"
# Never hardcode a real API key in a notebook; leave this empty to be prompted in the next cell
api_key = ""

In [4]:
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your api key (hit enter): ") if api_key == "" else api_key,
}
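
As an alternative to typing the key into a cell, the key can be read from an environment variable and only prompted for when the variable is missing. The sketch below assumes the variable is called `WATSONX_APIKEY`; that name and the helper `resolve_api_key` are illustrative choices, not something this notebook defines.

```python
import getpass
import os


def resolve_api_key(env=os.environ, prompt_fn=getpass.getpass):
    """Return the API key from the environment, or prompt for it if it is not set.

    WATSONX_APIKEY is an assumed variable name; use whatever your setup exports.
    """
    key = env.get("WATSONX_APIKEY", "").strip()
    if key:
        return key
    return prompt_fn("Please enter your api key (hit enter): ")


# Usage with injected values, so the example runs without prompting:
from_env = resolve_api_key(env={"WATSONX_APIKEY": "example-key"}, prompt_fn=lambda msg: "typed-key")
from_prompt = resolve_api_key(env={}, prompt_fn=lambda msg: "typed-key")
print(from_env, from_prompt)
```

Keeping the key out of the notebook source also keeps it out of saved cell outputs and version control.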

<a id="data"></a>
## Document data loading

Download the text file that contains the State of the Union speech.

In [5]:
print(credentials["url"])  # print only the URL; never echo the API key

https://us-south.ml.cloud.ibm.com


In [6]:
import wget

filename = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

if not os.path.isfile(filename):
    wget.download(url, out=filename)

<a id="build_base"></a>
## Build up knowledge base

The most common approach in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.

In this basic example, we take the State of the Union speech content (filename), split it into chunks, embed it using an open-source embedding model, load it into <a href="https://www.trychroma.com/" target="_blank" rel="noopener no referrer">Chroma</a>, and then query it.

In [7]:
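# In LangChain 0.3 the loaders and vector stores live in langchain_community,
# and the splitters in langchain_text_splitters
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import Chroma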

loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
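
To make the effect of `chunk_size` more concrete, here is a simplified sketch of separator-based chunking. It is not LangChain's implementation: the function `split_into_chunks` is hypothetical, it ignores overlap, and it keeps a single piece longer than `chunk_size` whole instead of cutting it.

```python
def split_into_chunks(text, chunk_size=1000, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate  # the piece still fits into the current chunk
        else:
            chunks.append(current)  # close the current chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "A" * 40 + "\n\n" + "B" * 40 + "\n\n" + "C" * 40
chunks = split_into_chunks(sample, chunk_size=90)
print([len(c) for c in chunks])
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit, so the chunk size is worth tuning for your own documents.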

### Create an embedding function

Note that you can pass a custom embedding function to Chroma. Chroma's performance may differ depending on the embedding model used. In the following example, we use the watsonx.ai Embeddings service. You can list the available embedding models with `get_embedding_model_specs`.

In [8]:
from ibm_watsonx_ai.foundation_models.utils import get_embedding_model_specs

get_embedding_model_specs(credentials.get('url'))

{'total_count': 7,
 'limit': 100,
 'first': {'href': 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2023-09-30&filters=function_embedding'},
 'resources': [{'model_id': 'ibm/slate-125m-english-rtrvr',
   'label': 'slate-125m-english-rtrvr',
   'provider': 'IBM',
   'source': 'IBM',
   'functions': [{'id': 'autoai_rag'},
    {'id': 'embedding'},
    {'id': 'rerank'},
    {'id': 'similarity'}],
   'short_description': 'An embedding model. It has 125 million parameters and an embedding dimension of 768.',
   'long_description': "This model follows the standard 'sentence transformers' approach, relying on bi-encoders. It generates embeddings for various inputs such as queries, passages, or documents. The training objective is to maximize cosine similarity between two text pieces: text A (query text) and text B (passage text). This process yields sentence embeddings q and p, allowing for comparison through cosine similarity.",
   'input_tier': 'class_c1',
   'output

In [9]:
from langchain_ibm import WatsonxEmbeddings
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes

embeddings = WatsonxEmbeddings(
    model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=project_id
    )
docsearch = Chroma.from_documents(texts, embeddings)

#### Compatibility of watsonx.ai Embeddings with LangChain

LangChain retrievers use `embed_documents` and `embed_query` under the hood to generate embedding vectors for the uploaded documents and the user query, respectively.
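
The shape of these two methods can be illustrated with a toy class. `ToyHashEmbeddings` below is hypothetical and does not call watsonx.ai or LangChain; it only shows that `embed_documents` maps a list of texts to a list of vectors and `embed_query` maps one text to one vector in the same space.

```python
import hashlib
from typing import List


class ToyHashEmbeddings:
    """Hypothetical embedding class exposing the two methods named above."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        # Hash each token into one of `dim` buckets and count the hits.
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[digest[0] % self.dim] += 1.0
        return vec

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """One vector per document, used when the knowledge base is indexed."""
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        """One vector for the user query, used at retrieval time."""
        return self._embed(text)


toy = ToyHashEmbeddings(dim=8)
doc_vectors = toy.embed_documents(["first passage", "second passage"])
query_vector = toy.embed_query("first passage")
print(len(doc_vectors), len(query_vector))
```

Because both methods share the same vector space, a query that matches a document word for word gets the same vector as that document.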

<a id="models"></a>
## Foundation Models on `watsonx.ai`

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener no referrer">list of LLMs supported by LangChain</a>. This example shows how to communicate with the <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener no referrer">Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener no referrer">LangChain</a>.

### Defining model
You need to specify the `model_id` that will be used for inferencing:

In [16]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.GRANITE_13B_CHAT_V2

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [17]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
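
To show how these parameters interact, here is a toy sketch of greedy decoding. The lookup table `TOY_MODEL` is an invented stand-in for a real model, and stop sequences are simplified to single tokens; it is not how watsonx.ai implements generation.

```python
def greedy_generate(next_token_probs, prompt_tokens, max_new_tokens=100,
                    min_new_tokens=1, stop_sequences=("<|endoftext|>",)):
    """Pick the most probable next token at every step until a stop condition is met."""
    generated = []
    for _ in range(max_new_tokens):  # MAX_NEW_TOKENS caps the output length
        probs = next_token_probs(list(prompt_tokens) + generated)
        token = max(probs, key=probs.get)  # greedy: always take the top token
        generated.append(token)
        # A stop sequence only ends generation once MIN_NEW_TOKENS is reached
        if len(generated) >= min_new_tokens and token in stop_sequences:
            break
    return generated


# Invented next-token distributions, keyed by the last token seen
TOY_MODEL = {
    "The": {"president": 0.7, "senate": 0.3},
    "president": {"spoke": 0.6, "<|endoftext|>": 0.4},
    "spoke": {"<|endoftext|>": 0.9, "again": 0.1},
    "<|endoftext|>": {"<|endoftext|>": 1.0},
}


def toy_next_token_probs(tokens):
    return TOY_MODEL[tokens[-1]]


full = greedy_generate(toy_next_token_probs, ["The"])
capped = greedy_generate(toy_next_token_probs, ["The"], max_new_tokens=2)
print(full)
print(capped)
```

Greedy decoding is deterministic, which makes the answers in this notebook repeatable across runs.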

### LangChain CustomLLM wrapper for watsonx model
Initialize the `WatsonxLLM` class from LangChain with the defined parameters and `ibm/granite-13b-chat-v2`.

In [18]:
from langchain_ibm import WatsonxLLM

watsonx_granite = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters
)



<a id="predict"></a>
## Generate a retrieval-augmented response to a question

Build the `RetrievalQA` (question answering chain) to automate the RAG task.

In [19]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=watsonx_granite,
    chain_type="stuff",
    retriever=docsearch.as_retriever(),
    return_source_documents=True,
)
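
The `"stuff"` chain type places all retrieved passages into a single prompt. The sketch below illustrates the idea only; the function `stuff_documents` and the prompt wording are assumptions made for this example and are not taken from the LangChain source.

```python
def stuff_documents(question, documents, separator="\n\n"):
    """Join every retrieved passage into one context block and wrap it in a prompt."""
    context = separator.join(documents)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


stuffed_prompt = stuff_documents(
    "What did the president say?",
    ["passage A from the speech", "passage B from the speech"],
)
print(stuffed_prompt)
```

Because every passage ends up in one prompt, the number of retrieved passages times the chunk size has to fit in the model's context window.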

In [20]:
RetrievalQA

langchain.chains.retrieval_qa.base.RetrievalQA


In [21]:
query = "What did the president say about Ketanji Brown Jackson"
output = qa.invoke(query)
answer = output["result"]
context = [item.page_content for item in output["source_documents"]]

In [22]:
print(query)

What did the president say about Ketanji Brown Jackson


In [23]:
print(answer)

 The president said, "One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence." This statement was made in reference to Ketanji Brown Jackson, who was nominated by the president to serve on the United States Supreme Court.


In [24]:
print(context)

['Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', 'And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \n\nAs I said last year,

**Exercise:** Write your own query and print how many pieces of context are used.

In [25]:
query = ""
num_context = 0
print(num_context)

0
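
One possible way to approach the exercise is to count the entries under `source_documents` in the chain output. The sketch below uses a hand-made `fake_output` dictionary with the same two keys used earlier (`result` and `source_documents`) so that it runs without calling the model; the helper `count_context` is a hypothetical name.

```python
from types import SimpleNamespace


def count_context(output):
    """Return how many source documents the chain attached to its answer."""
    return len(output.get("source_documents", []))


# Hand-made stand-in for the dictionary returned by qa.invoke(query)
fake_output = {
    "result": "an example answer",
    "source_documents": [
        SimpleNamespace(page_content="passage one"),
        SimpleNamespace(page_content="passage two"),
    ],
}

num_context = count_context(fake_output)
print(num_context)
```

With the real chain, pass the result of `qa.invoke(query)` to the helper instead of `fake_output`.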


## Metrics

Use an LLM to calculate answer relevance

In [26]:
def answer_relevance(model, query, rag_response):

    template: str = f"""[INST] Your task is to ascertain the relevance of the provided answer by comparing it against the given query. You are to answer with either YES or NO:
        - Answer YES if the answer accurately aligns with or is relevant to any aspect of the query, irrespective of other unrelated content.
        - Answer NO if the answer does not align with the query or introduces information not relevant to the query, indicating a potential hallucination or factual inaccuracy.
        Avoid providing any additional explanations with your YES or NO response.

        Information: {rag_response}
        Query: {query}
        Answer:[/INST]"""

    # print(template)
    # print("-------")
    evaluate_response = model.generate([template])
    # print(evaluate_response)
    
    return 'YES' in evaluate_response.generations[0][0].text.upper()   

In [27]:
answer_relevance(watsonx_granite, query, answer)

True

In [28]:
# Note that the model is able to tell when the answer has nothing to do with the query
answer_relevance(watsonx_granite, query, "nonsensical answer")

False

In [29]:
# In this run the model also returns False when the query is unrelated to the answer
answer_relevance(watsonx_granite, "random query", answer)

False
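
The check `'YES' in ...` is a substring test, so any judge response that happens to contain those letters counts as a yes. A stricter parser is sketched below; `parse_yes_no` is a hypothetical helper, and returning `None` for unclear responses is a design choice made for this example.

```python
import re


def parse_yes_no(text):
    """Return True for YES, False for NO, and None when neither appears as a whole word.

    The first whole-word YES or NO in the response decides the verdict.
    """
    match = re.search(r"\b(YES|NO)\b", text.upper())
    if match is None:
        return None
    return match.group(1) == "YES"


print(parse_yes_no(" Yes."))
print(parse_yes_no("NO, the answer is unrelated."))
print(parse_yes_no("The eyes have it"))
```

Treating unclear responses as a separate case makes it possible to count how often the judge model fails to follow the YES or NO instruction.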

## Exercises

1. Create similar functions for both faithfulness and context relevance

2. Create a list of questions for this dataset or for your own dataset

3. Create a function to compute average evaluation metrics for all the questions. These will be your RAG triad metrics.

4. Use this function to evaluate different RAG techniques (HyDE, fusion, query transformations, parent-child, etc.)
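
As a starting point for exercise 3, the averaging step can be sketched independently of any model. Everything below is an assumption made for illustration: `evaluate_questions`, the stub answer function, and the stub metrics are hypothetical, and in practice the metrics would call LLM-based judges.

```python
from statistics import mean


def evaluate_questions(questions, answer_fn, metric_fns):
    """Average each boolean metric over all questions.

    answer_fn(question) returns (answer, context_passages).
    Each metric in metric_fns takes (question, answer, context) and returns a bool.
    """
    scores = {name: [] for name in metric_fns}
    for question in questions:
        answer, context = answer_fn(question)
        for name, metric in metric_fns.items():
            scores[name].append(1.0 if metric(question, answer, context) else 0.0)
    return {name: mean(values) if values else 0.0 for name, values in scores.items()}


# Stubs so the example runs without a model or a vector store
def stub_answer_fn(question):
    return f"Answer to: {question}", ["some retrieved passage"]


stub_metrics = {
    "answer_relevance": lambda q, a, ctx: q in a,
    "context_relevance": lambda q, a, ctx: "Senate" in q,
}

questions = ["What did the Senate pass", "Who joined NATO"]
averages = evaluate_questions(questions, stub_answer_fn, stub_metrics)
print(averages)
```

Swapping the stubs for the RAG chain and for judge functions gives one average per metric, which can then be compared across RAG techniques.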

In [30]:
def my_answer_relevance1(model, query, rag_response):

    template: str = f"""[INST] Your task is to determine if an answer to a given question is relevant. You are to answer with either YES or NO:
        - Answer YES if the answer accurately aligns with the question
        - Answer NO if the answer does not align with the question or introduces information not relevant to the question.
        Avoid providing any additional explanations with the YES or NO response.

        Information: {rag_response}
        Query: {query}
        Answer:[/INST]"""

    # print(template)
    # print("-------")
    evaluate_response = model.generate([template])
    # print(evaluate_response)
    
    return 'YES' in evaluate_response.generations[0][0].text.upper()   

In [32]:
my_answer_relevance1(watsonx_granite, query, "nonsensical answer")

False

In [33]:
my_answer_relevance1(watsonx_granite, query, answer)

True

In [34]:
my_answer_relevance1(watsonx_granite, "random query", answer)

False

In [52]:
# New questions for the data set
new_questions = [
    "What is the president asking the Senate to do tonight",
    "Who gave their support to Ketanji Jackson",
    "Who gave their support to Ketanji Brown Jackson",
    "Who are the members of NATO mentioned in the text",
]


In [53]:

for q in new_questions:
    output = qa.invoke(q)
    answer = output["result"]
    print("Question: " + q + " Answer: " + answer)
    print("Relevance: " + str(my_answer_relevance1(watsonx_granite, q, answer)))

Question: What is the president asking the Senate to do tonight Answer:  The president is asking the Senate to pass the Freedom to Vote Act, the John Lewis Voting Rights Act, and the Disclose Act.

Explanation: The president's speech includes a call for the Senate to pass several pieces of legislation. Specifically, he mentions the Freedom to Vote Act, the John Lewis Voting Rights Act, and the Disclose Act. These are the bills that the president is asking the Senate to pass tonight.
Relevance: True
Question: Who gave their support to Ketanji Jackson Answer:  President Joe Biden
Relevance: False
Question: Who gave their support to Ketanji Brown Jackson Answer:  A broad range of support - from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.
Relevance: True
Question: Who are the members of NATO mentioned in the text Answer:  The members of NATO mentioned in the text are Poland, Romania, Latvia, Lithuania, and Estonia.
Relevance: True


In [51]:
print(query)

What is the president asking the Senate to do tonight


In [40]:
print(answer)

 The president is asking the Senate to pass the Freedom to Vote Act, the John Lewis Voting Rights Act, and the Disclose Act.

Explanation: The president's speech includes a call for the Senate to pass several pieces of legislation. Specifically, he mentions the Freedom to Vote Act, the John Lewis Voting Rights Act, and the Disclose Act. These are the bills that the president is asking the Senate to pass tonight.


In [41]:
print(context)

['Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', 'And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \n\nAs I said last year,

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!

You learned how to answer questions with RAG using watsonx and LangChain.
 
Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts. 

Copyright © 2023, 2024 IBM. This notebook and its source code are released under the terms of the MIT License.