# Building a basic domain-specific question-and-answer chat application (RAG)

## Notebook content
This notebook contains the steps and code to demonstrate Retrieval Augmented Generation (RAG) by building a chat application that answers questions specific to a domain. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding the retrieved passage(s) into a large language model (for every user query)
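The three steps can be sketched end to end in plain Python. This is a toy illustration, not the notebook's implementation: it assumes a bag-of-words term-count vector as the "embedding", uses made-up sample passages, and stops at building the prompt instead of calling a large language model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-count vector (stand-in for a real model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Step 1 (once): index the knowledge base. Made-up sample passages.
passages = [
    "Add a bank account under Profile and validate it before nominating it for refund.",
    "A legal heir must upload the PAN of the deceased and a legal heir certificate.",
    "Placeholder passage about e-verification of returns.",
]
index = [(p, embed(p)) for p in passages]


def retrieve(query: str, k: int = 1) -> list[str]:
    # Step 2 (per query): rank passages by similarity to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    # Step 3 (per query): put the retrieved passages into the prompt for the LLM.
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}\nAnswer:"


question = "how to select a bank for refund"
prompt = build_prompt(question, retrieve(question))
print(prompt)
```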

## Contents

This notebook contains the following parts:

- [Introduction to RAG](#intro)
- [Setup](#setup)
- [Document data loading](#data)
- [Build up knowledge base](#build_base)
- [Foundation Models on watsonx](#models)
- [Generate a retrieval-augmented response to a question](#predict)
- [References](#references)


<a id="intro"></a>
## Introduction to Retrieval Augmented Generation (RAG)

RAG implementation flow:
![image](https://dataplatform.cloud.ibm.com/docs/api/content/wsj/analyze-data/images/fm-rag-embed.svg?context=wx&locale=en)

<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://cloud.ibm.com/catalog/services/watson-machine-learning" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp" target="_blank" rel="noopener no referrer">here</a>).


### Install and import the dependencies

In [None]:
!pip install "langchain==0.1.10" | tail -n 1
!pip install "ibm-watsonx-ai>=0.2.6" | tail -n 1
!pip install -U langchain_ibm | tail -n 1
!pip install wget | tail -n 1
!pip install sentence-transformers | tail -n 1
!pip install "chromadb==0.3.26" | tail -n 1
!pip install "pydantic==1.10.0" | tail -n 1
!pip install "sqlalchemy==2.0.1" | tail -n 1
!pip install "pypdf" | tail -n 1


In [1]:
import os, getpass

### watsonx API connection
This cell defines the credentials required to work with the watsonx API for foundation model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank" rel="noopener no referrer">documentation</a>.

In [18]:
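credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    # Prompt for the API key so it is not stored in the notebook.
    "apikey": getpass.getpass("Please enter your WML API key (hit enter): ")
}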

### Defining the project id
The API requires a project ID that provides the context for the call. The cell below reads the ID from the `PROJECT_ID` environment variable of the project in which this notebook runs. If that variable is not set, it prompts you to enter the project ID.

**Hint**: You can find the `project_id` as follows. Open the Prompt Lab in watsonx.ai. At the very top of the UI, there is `Projects / <project name> /`. Click the `<project name>` link. Then get the `project_id` from the project's Manage tab (Project -> Manage -> General -> Details).


In [32]:
try:
    # Available when the notebook runs inside a watsonx.ai project.
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

<a id="data"></a>
## Document data loading

Download the income tax FAQ PDF (top 10 issues of taxpayers) from the Indian income tax e-filing portal.

In [None]:
import requests
import os

filename = 'itr-faq.pdf'
url = 'https://www.incometax.gov.in/iec/foportal/sites/default/files/2024-06/Top%2010%20issues%20of%20taxpayers%20updated.pdf'

if not os.path.isfile(filename):
    response = requests.get(url)
    if response.status_code == 200:
        with open(filename, 'wb') as f:
            f.write(response.content)
        print(f"Downloaded {filename}")
    else:
        print(f"Failed to download file. Status code: {response.status_code}")


In [33]:
ls -ltr itr-faq*

-rw-r--r--@ 1 krishna  staff  231539 Jul 21 14:21 itr-faq.pdf


<a id="build_base"></a>
## Build up knowledge base

The most common approach in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.
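Cosine similarity is a common way to compare dense vectors. The sketch below assumes made-up 4-dimensional vectors in place of real embeddings, scores each passage against a query vector, and returns the indices of the two closest passages. It only illustrates the idea: vector stores may use other distances (Chroma's default is L2) and approximate nearest-neighbour indexes.

```python
import numpy as np


def cosine_scores(query_vec: np.ndarray, doc_matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of doc_matrix."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return d @ q


# Hypothetical 4-dimensional vectors standing in for real passage embeddings.
doc_vectors = np.array([
    [0.9, 0.1, 0.0, 0.2],  # passage 0
    [0.1, 0.8, 0.3, 0.0],  # passage 1
    [0.0, 0.2, 0.9, 0.1],  # passage 2
])
query_vector = np.array([0.8, 0.2, 0.1, 0.1])

scores = cosine_scores(query_vector, doc_vectors)
top_k = np.argsort(scores)[::-1][:2]  # indices of the 2 most similar passages
print(scores.round(3), top_k)
```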

In this basic example, we take the income tax return (ITR) FAQ content from the PDF file, split it into chunks, embed each chunk with a watsonx.ai embedding model, load the vectors into <a href="https://www.trychroma.com/" target="_blank" rel="noopener no referrer">Chroma</a>, and then query it.

In [22]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Load the PDF: one LangChain Document per page.
loader = PyPDFLoader(filename)
documents = loader.load()
# Split the pages into chunks of about 1000 characters, with no overlap.
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

The `texts` list now holds the PDF content split into chunks (LangChain `Document` objects) that can be ingested by Chroma.
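The idea of `chunk_size` and `chunk_overlap` can be shown with a simplified fixed-size chunker. This is not LangChain's algorithm: `CharacterTextSplitter` first splits on a separator (a blank line by default) and then merges pieces up to `chunk_size`, so its chunks can be shorter, or occasionally longer, than the limit. The function name below is hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Cut text into fixed-size character windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
```

A non-zero overlap repeats some text across neighbouring chunks, which helps when an answer straddles a chunk boundary, at the cost of storing and embedding more text.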

### Create an embedding function

Note that you can supply a custom embedding function for Chroma to use. Retrieval quality depends on the embedding model. The following example uses the watsonx.ai embedding service. You can list the available embedding models with `get_embedding_model_specs`:

In [23]:
from ibm_watsonx_ai.foundation_models.utils import get_embedding_model_specs

get_embedding_model_specs(credentials.get('url'))

{'total_count': 5,
 'limit': 100,
 'first': {'href': 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2023-09-30&filters=function_embedding'},
 'resources': [{'model_id': 'baai/bge-large-en-v1',
   'label': 'bge-large-en-v1',
   'provider': 'baai',
   'source': 'baai',
   'functions': [{'id': 'embedding'}],
   'short_description': 'An embedding model with version 1.5. It has 335 million parameters and an embedding dimension of 1024.',
   'long_description': 'This model has multi-functionality like dense retrieval, sparse retrieval, multi-vector, Multi-Linguality, and Multi-Granularity(8192 tokens)',
   'input_tier': 'class_c1',
   'output_tier': 'class_c1',
   'number_params': '335m',
   'limits': {'consumer-service-default': {'call_time': '10m0s'},
    'consumer-user-default': {'call_time': '10m0s'},
    'devops': {'call_time': '2m0s',
     'max_output_tokens': 2,
     'max_input_tokens': 20},
    'lite': {'call_time': '5m0s'},
    'v2-professional': {'call_time
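The response above is long. A small helper can pull out just the model IDs. This is a sketch that assumes the dictionary shape printed above (a `resources` list whose entries carry `model_id`); the helper name and the second sample entry are hypothetical.

```python
def list_model_ids(specs: dict) -> list[str]:
    """Return the model_id of every entry under 'resources' in a model-specs response."""
    return [r["model_id"] for r in specs.get("resources", []) if "model_id" in r]


# Trimmed-down response with the same shape as the output above.
sample_specs = {
    "total_count": 2,
    "resources": [
        {"model_id": "baai/bge-large-en-v1", "functions": [{"id": "embedding"}]},
        {"model_id": "hypothetical/embedding-model", "functions": [{"id": "embedding"}]},
    ],
}
print(list_model_ids(sample_specs))
```

In the notebook you would pass the live response instead, for example `list_model_ids(get_embedding_model_specs(credentials.get('url')))`.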

In [24]:
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
EmbedTextParamsMetaNames().show()

---------------------  ----  --------
META_PROP NAME         TYPE  REQUIRED
TRUNCATE_INPUT_TOKENS  int   N
RETURN_OPTIONS         dict  N
---------------------  ----  --------


In [25]:
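from langchain_ibm import WatsonxEmbeddings
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames as EmbedParams


embeddings = WatsonxEmbeddings(
    model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=project_id,
    # Use the metaname constant so the parameter key matches what the API expects;
    # inputs longer than 500 tokens are truncated.
    params={EmbedParams.TRUNCATE_INPUT_TOKENS: 500}
)

# Embed every chunk and store the vectors in a persistent Chroma collection.
docsearch = Chroma.from_documents(texts, embeddings, collection_name="itr-docs", persist_directory="/tmp/vector_data")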

#### Compatibility of watsonx.ai embeddings with LangChain

LangChain vector stores call `embed_documents` to generate embedding vectors for the uploaded documents and `embed_query` to generate the vector for a user query. `WatsonxEmbeddings` implements both methods, so it can be passed directly to `Chroma`.
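The two-method contract is small enough to show with a toy class. This sketch is plain Python so it runs without LangChain; the class name and its hashing scheme are hypothetical and produce no meaningful semantics. To plug a custom model into `Chroma.from_documents`, you would instead subclass `langchain_core.embeddings.Embeddings`, the abstract base class that declares these same two methods.

```python
import hashlib
import math


class ToyHashEmbeddings:
    """Hypothetical embeddings object exposing embed_documents and embed_query."""

    def __init__(self, dim: int = 16):
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # Hash each token into one of `dim` buckets (the "hashing trick").
            bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]  # unit-length vector

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Called when chunks are added to the vector store.
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        # Called for the user query at search time.
        return self._embed(text)


emb = ToyHashEmbeddings()
doc_vecs = emb.embed_documents(["bank refund", "legal heir"])
query_vec = emb.embed_query("bank refund")
print(len(doc_vecs), len(query_vec))  # 2 16
```

Documents and queries must go through the same model; otherwise their vectors are not comparable.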

<a id="models"></a>
## Foundation Models on `watsonx.ai`

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener no referrer">LLMs supported by LangChain</a>. This example shows how to communicate with the <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener no referrer">Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener no referrer">LangChain</a>.

### Defining model
You need to specify `model_id` that will be used for inferencing:

In [26]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.GRANITE_13B_CHAT_V2

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [27]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,  # longer answers are cut off at this limit
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
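To see how these parameters interact, here is a toy greedy decoding loop. It is a simplified model of the semantics, not how watsonx.ai generates text: `toy_model` is a hypothetical scorer that follows a fixed script, and each stop sequence is treated as a single token, whereas real stop sequences are matched against the generated text.

```python
from typing import Callable


def greedy_decode(
    next_token_scores: Callable[[list[str]], dict[str, float]],
    min_new_tokens: int,
    max_new_tokens: int,
    stop_sequences: list[str],
) -> str:
    """Always pick the highest-scoring next token (greedy decoding)."""
    generated: list[str] = []
    while len(generated) < max_new_tokens:  # hard cap on output length
        scores = next_token_scores(generated)
        if len(generated) < min_new_tokens:
            # Until min_new_tokens is reached, a stop sequence cannot end generation.
            scores = {t: s for t, s in scores.items() if t not in stop_sequences}
        token = max(scores, key=scores.get)
        if token in stop_sequences:
            break
        generated.append(token)
    return "".join(generated)


def toy_model(generated: list[str]) -> dict[str, float]:
    # Hypothetical scorer: prefers the next token of a fixed script.
    script = ["Add", " a", " bank", " account", ".", "<|endoftext|>"]
    step = len(generated)
    best = script[step] if step < len(script) else "<|endoftext|>"
    return {best: 1.0, " filler": 0.5}


print(greedy_decode(toy_model, 1, 100, ["<|endoftext|>"]))  # Add a bank account.
print(greedy_decode(toy_model, 1, 3, ["<|endoftext|>"]))    # Add a bank
```

The second call shows why a low `MAX_NEW_TOKENS` can cut an answer off mid-sentence.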

### LangChain CustomLLM wrapper for watsonx model
Initialize the `WatsonxLLM` class from `langchain_ibm` with the parameters defined above and the `ibm/granite-13b-chat-v2` model.

In [28]:
from langchain_ibm import WatsonxLLM

watsonx_granite = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters
)

<a id="predict"></a>
## Generate a retrieval-augmented response to a question

Build the `RetrievalQA` (question answering chain) to automate the RAG task.

In [29]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=watsonx_granite, chain_type="stuff", retriever=docsearch.as_retriever())
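With `chain_type="stuff"`, all retrieved passages are placed ("stuffed") into a single prompt together with the question. The sketch below shows that idea; the prompt wording is illustrative, since LangChain ships its own default template, and the function name is hypothetical. To control how many passages are retrieved, `as_retriever` accepts `search_kwargs`, for example `docsearch.as_retriever(search_kwargs={"k": 4})`.

```python
def stuff_prompt(question: str, passages: list[str]) -> str:
    """Concatenate every retrieved passage into one prompt ("stuff" strategy)."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


print(stuff_prompt("how to select a bank for refund", ["passage one", "passage two"]))
```

Stuffing is simple and needs a single model call, but the combined passages must fit in the model's context window; that limit is why chunk size and the number of retrieved passages matter.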

### Select questions

Ask questions about the content of the loaded FAQ document.

In [30]:
query = "how to select a bank for refund"
print(f"Question:\n  {query}")
answer = qa.invoke(query)["result"]
print(f"Answer:\n {answer}")

Question:
  how to select a bank for refund
Answer:
 
To select a bank for refund, you need to add a bank account in your profile by following these steps:

1. Go to Profile > My Bank Account > Add Bank Account.
2. Provide correct bank details and validate the bank account.

The request will be sent to the respective bank or NPCI for validation. Once validation is successful, you can nominate the bank account for refund.

Note: While filing ITR, if you have a bank account


In [15]:
query = "What are the documentes required to register for legal heir"
print(f"Question:\n  {query}")
answer = qa.invoke(query)["result"]
print(f"Answer:\n {answer}")

Question:
  What are the documentes required to register for legal heir
Answer:
  The documents required to register for legal heir include a copy of PAN of the deceased and copy of legal heir proof as per the norms. The legal heir proof can be any of the following:

− The legal heir certificate issued by a court of law.
− The legal heir certificate issued by the local revenue authorities.
− The surviving family members certificate issued by the local revenue authorities.


In [21]:
query = "Hello"
print(f"Question:\n  {query}")
answer = qa.invoke(query)["result"]
print(f"Answer:\n {answer}")

Question:
  Hello
Answer:
  I'm sorry, I don't have the necessary knowledge to respond to that inquiry.
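To run a list of questions in one go, for example a small hand-written test set, a helper can loop over them and collect the `result` field. This is a sketch: `ask_all` and `EchoChain` are hypothetical names, and the stub only imitates the `invoke` call and the `result` key used in the cells above. In the notebook you would pass `qa` instead of the stub.

```python
from typing import Protocol


class SupportsInvoke(Protocol):
    def invoke(self, query: str) -> dict: ...


def ask_all(chain: SupportsInvoke, questions: list[str]) -> dict[str, str]:
    """Run each question through the chain and collect the stripped 'result' text."""
    return {q: chain.invoke(q)["result"].strip() for q in questions}


class EchoChain:
    """Hypothetical stand-in for `qa`, used only to exercise ask_all."""

    def invoke(self, query: str) -> dict:
        return {"query": query, "result": f" stub answer to: {query}"}


answers = ask_all(EchoChain(), ["how to select a bank for refund", "Hello"])
for question, answer in answers.items():
    print(f"Question: {question}\nAnswer: {answer}\n")
```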


---

<a id="references"></a>
## References
- [Using vectorized text with retrieval-augmented generation tasks](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-embedding-rag.html?context=wx&audience=wdp)

- [Sample notebook](https://dataplatform.cloud.ibm.com/exchange/public/entry/view/d3a5f957-a93b-46cd-82c1-c8d37d4f62c6?context=wx&audience=wdp)
