# Source attribution detection for RAG-based natural language question responses using watsonx

### Edit the two values below with your API key and Project ID

Refer to the [lab guide](https://github.com/ericmartens/rag-explain/blob/main/RAG_Explanations_Lab_Guide.docx) for instructions on obtaining the correct values for API\_KEY and PROJECT\_ID. Paste each value into the cell below, between the quotation marks.

In [None]:
API_KEY = "___PASTE_YOUR_API_KEY_HERE_BETWEEN_THE_QUOTES___"
PROJECT_ID = "___PASTE_YOUR_PROJECT_ID_HERE_BETWEEN_THE_QUOTES___"

## Notebook content
This notebook contains the steps and code to demonstrate Retrieval Augmented Generation (RAG) in watsonx.ai and to identify source attribution for the generated responses. It introduces commands for data retrieval, knowledge base building and querying, and model testing. Some familiarity with Python is helpful. This notebook uses Python 3.11 and is based on [this notebook](https://github.com/IBM/watson-openscale-samples/blob/main/WatsonX.Governance/Cloud/GenAI/samples/source-attribution-using-protodash-for-rag-usecase%20.ipynb) by Sowmaya Kollipara.

### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding the retrieved passage(s) into a large language model (for every user query)
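The three steps can be sketched in plain Python. This is a toy illustration, not how the notebook implements them: the word-count `embed` function stands in for a real embedding model, the passages are made up, and `generate` only assembles the prompt that would be sent to an LLM instead of calling watsonx.ai. All names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Step 1 (once): index the knowledge base passages (made-up examples)
passages = [
    "The cat sat on the mat.",
    "Chroma is a vector database for embeddings.",
    "Greedy decoding picks the most likely next token.",
]
index = [(p, embed(p)) for p in passages]


# Step 2 (per query): retrieve the k passages most similar to the query
def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]


# Step 3 (per query): feed the retrieved passages to an LLM.
# Placeholder: only the prompt is built here; a real pipeline would send it to the model.
def generate(query, context):
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"


question = "What is Chroma?"
print(generate(question, retrieve(question)))
```

Indexing happens once, while retrieval and generation run for every query, which is why production systems precompute and store the passage vectors in a vector database such as Chroma.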

#### Source Attribution Detection
Source attribution detection identifies the part(s) of the retrieved context that most likely contributed to the foundation model's response.

#### The flow of this notebook is as follows:
1. Build a knowledge base.
2. Retrieve relevant context from the vector database for each question the user wants answered.
3. Construct the prompt from each question and its relevant context.
4. Generate the retrieval-augmented response to each question using the foundation models hosted on watsonx.ai.
5. Initialize the Watson OpenScale (WOS) client and supply the configuration needed to identify source attribution.
6. Identify the source attribution for the RAG-based responses.

### Install and import the dependencies

In [None]:
!pip install "langchain==0.0.345" | tail -n 1
!pip install wget | tail -n 1
!pip install sentence-transformers | tail -n 1
!pip install "chromadb==0.3.26" | tail -n 1
!pip install "ibm-watson-machine-learning>=1.0.335" | tail -n 1
!pip install "pydantic==1.10.0" | tail -n 1
!pip install --upgrade ibm-metrics-plugin  --no-cache | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1
!pip install --upgrade pyspark==3.3.1 | tail -n 1
!pip install -U "torch==2.0.0"

## Foundation Models on `watsonx.ai`

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener noreferrer">list of LLMs supported by LangChain</a>. This example shows how to communicate with the <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener noreferrer">Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener noreferrer">LangChain</a>.

In [None]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.GRANITE_13B_CHAT_V2

### Defining the model parameters
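These parameters control how the model generates text: greedy decoding (always choose the most likely next token), at least 1 and at most 100 new tokens, and `<|endoftext|>` as a stop sequence that ends generation.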

In [None]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
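To make the parameters concrete, here is a toy greedy decoder showing how `MIN_NEW_TOKENS`, `MAX_NEW_TOKENS`, and `STOP_SEQUENCES` interact. It is a conceptual sketch, not the watsonx.ai implementation: `toy_model` is a hypothetical scripted scorer, and stop sequences are simplified to single tokens that this sketch drops from the output.

```python
STOP = "<|endoftext|>"


def greedy_decode(next_token_scores, min_new_tokens=1, max_new_tokens=100, stop_sequences=(STOP,)):
    """Toy greedy decoder mirroring MIN_NEW_TOKENS, MAX_NEW_TOKENS and STOP_SEQUENCES."""
    generated = []
    for _ in range(max_new_tokens):            # hard cap on the number of new tokens
        scores = next_token_scores(generated)  # dict: candidate token -> score
        token = max(scores, key=scores.get)    # greedy: take the top-scoring token
        if token in stop_sequences and len(generated) < min_new_tokens:
            # Too early to stop: take the best non-stop token instead
            allowed = {t: s for t, s in scores.items() if t not in stop_sequences}
            token = max(allowed, key=allowed.get)
        if token in stop_sequences:
            break                              # stop sequence reached
        generated.append(token)
    return generated


# Hypothetical scripted "model": prefers the next scripted token, then the stop token
script = ["Hello", ",", "world"]


def toy_model(generated):
    scores = {STOP: 0.5, "!": 0.1}
    if len(generated) < len(script):
        scores[script[len(generated)]] = 1.0
    return scores


print(greedy_decode(toy_model))                    # ['Hello', ',', 'world']
print(greedy_decode(toy_model, max_new_tokens=2))  # ['Hello', ',']
print(greedy_decode(toy_model, min_new_tokens=5))  # ['Hello', ',', 'world', '!', '!']
```

Because greedy decoding is deterministic, the same prompt yields the same response on every run, which keeps the responses passed to source attribution reproducible.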

### LangChain CustomLLM wrapper for watsonx model
Initialize LangChain's `WatsonxLLM` class with the parameters defined above and the `ibm/granite-13b-chat-v2` model.

In [None]:
from langchain.llms import WatsonxLLM

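watsonx_granite = WatsonxLLM(
    model_id=model_id.value,                  # "ibm/granite-13b-chat-v2"
    url="https://us-south.ml.cloud.ibm.com",  # watsonx.ai endpoint (Dallas region)
    apikey=API_KEY,
    project_id=PROJECT_ID,                    # the variable set in the first cell
    params=parameters
)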

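## Build a retrieval-augmented question answering chain
Build the `RetrievalQA` (question answering) chain to automate the RAG task. This cell uses `docsearch`, the Chroma vector store created in the "Build up knowledge base" section, so run that section before this cell.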

In [None]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=watsonx_granite, chain_type="stuff", retriever=docsearch.as_retriever())
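`chain_type="stuff"` means every retrieved passage is concatenated ("stuffed") into a single prompt. The sketch below shows the idea. The template text is an assumption approximating LangChain's default question-answering prompt; inspect the prompt in your installed version (for example through `qa.combine_documents_chain.llm_chain.prompt`) rather than relying on this copy.

```python
# Assumed approximation of LangChain's default "stuff" QA prompt; verify against your version.
QA_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def stuff_prompt(question, passages, separator="\n\n"):
    """Concatenate ('stuff') every retrieved passage, unchanged and in order, into one prompt."""
    return QA_TEMPLATE.format(context=separator.join(passages), question=question)


print(stuff_prompt("What is ARPA-H?", ["First retrieved passage.", "Second retrieved passage."]))
```

The "stuff" strategy is the simplest one, but because all passages go into one prompt, many or long passages can exceed the model's context window.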

### Set up the Watson OpenScale client

In [None]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator, BearerTokenAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *


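# Authenticate to Watson OpenScale with the same IBM Cloud API key
authenticator = IAMAuthenticator(apikey=API_KEY)
client = APIClient(authenticator=authenticator)
client.version  # display the client library version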

### Update the configuration needed for source attribution

In [None]:
from ibm_metrics_plugin.common.utils.constants import ExplainabilityMetricType
from ibm_metrics_plugin.metrics.explainability.entity.explain_config import ExplainConfig
from ibm_metrics_plugin.common.utils.constants import InputDataType, ProblemType

# Note: `embeddings` is the HuggingFaceEmbeddings object created in the
# "Create an embedding function" section; run that cell before this one.
config_json = {
    "configuration": {
        "input_data_type": InputDataType.TEXT.value,
        "problem_type": ProblemType.QA.value,
        "feature_columns": ["context"],
        "prediction": "generated_text",  # column holding the foundation model's response
        "context_column": "context",     # column holding the retrieved context passages
        "explainability": {
            "metrics_configuration": {
                ExplainabilityMetricType.PROTODASH.value: {
                    # Supply an embedding function; otherwise a TF-IDF vectorizer is used
                    "embedding_fn": embeddings.embed_documents
                }
            }
        }
    }
}

<a id="data"></a>
## Document data loading

Download the text of the State of the Union address.

In [None]:
import wget
import os

filename = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

if not os.path.isfile(filename):
    wget.download(url, out=filename)

<a id="build_base"></a>
## Build up knowledge base

The most common approach in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.

In this basic example, we take the State of the Union speech content (`filename`), split it into chunks, embed it using an open-source embedding model, load it into <a href="https://www.trychroma.com/" target="_blank" rel="noopener noreferrer">Chroma</a>, and then query it.
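Semantic similarity between dense vectors is commonly measured with cosine similarity. The sketch below ranks passages by cosine similarity to a query vector using NumPy. The three-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions), and Chroma's own distance metric is configurable, so this shows the idea rather than Chroma's exact scoring.

```python
import numpy as np


def top_k_cosine(query_vec, passage_vecs, k=2):
    """Return (indices, similarities) of the k passages most cosine-similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    P = np.asarray(passage_vecs, dtype=float)  # one row per passage
    sims = (P @ q) / (np.linalg.norm(P, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]              # highest similarity first
    return order.tolist(), sims[order].tolist()


# Made-up 3-dimensional "embeddings" for three passages
passage_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
idx, scores = top_k_cosine([1.0, 0.1, 0.0], passage_vecs, k=2)
print(idx, scores)
```

Cosine similarity ignores vector length and compares only direction, so two texts about the same topic score highly even if one is much longer.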

In [None]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

The `texts` list now holds self-contained passages (chunks of up to roughly 1,000 characters) that can be ingested by Chroma.
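`CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)` splits the text on a separator and packs the pieces into chunks of about 1,000 characters. The function below is a simplified sketch of that idea, not LangChain's implementation: it assumes the `"\n\n"` paragraph separator (LangChain's default for this splitter), ignores overlap, and lets a single piece longer than `chunk_size` become its own oversized chunk. The name `split_text` is hypothetical.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Split on `separator`, then greedily pack pieces into chunks of at most chunk_size characters."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate       # still fits (or a single oversized piece starts a chunk)
        else:
            chunks.append(current)    # current chunk is full; start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(split_text("aaaa\n\nbbbb\n\ncccc", chunk_size=10))  # ['aaaa\n\nbbbb', 'cccc']
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which also makes the prototypes returned by source attribution easier to interpret.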

### Create an embedding function

You can supply a custom embedding function to Chroma. Retrieval quality depends on the embedding model used; this example uses LangChain's `HuggingFaceEmbeddings`, which runs a sentence-transformers model locally.
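A custom embedding model only needs to turn text into fixed-length vectors. The sketch below is a hypothetical `HashingEmbeddings` class exposing the same two method names as LangChain's `Embeddings` interface: `embed_documents` (list of texts to list of vectors) and `embed_query` (one text to one vector). `embed_documents` has the same shape as the function that `config_json` passes as `embedding_fn`. The class hashes words into buckets, so it captures word overlap rather than meaning and would retrieve far worse than a sentence-transformers model. Passing it to `Chroma.from_documents` would most likely require subclassing LangChain's `Embeddings` base class; that step is an assumption and is not shown.

```python
import hashlib
import math
import re


class HashingEmbeddings:
    """Toy embedding model (hypothetical): hashes words into a fixed-size, L2-normalized vector."""

    def __init__(self, dim=64):
        self.dim = dim

    def _embed(self, text):
        vec = [0.0] * self.dim
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            # Stable hash (Python's built-in hash() is salted per process)
            bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts):
        """Embed a list of passages: list[str] -> list[list[float]]."""
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        """Embed a single query: str -> list[float]."""
        return self._embed(text)


emb = HashingEmbeddings(dim=16)
vectors = emb.embed_documents(["hello world", "hello"])
print(len(vectors), len(vectors[0]))
```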

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)

In [None]:
query0 = "What did the president say about Ketanji Brown Jackson?"
query1 = "What is ARPA-H?"
query2 = "How much does it cost to make a vial of insulin?"
query3 = "What is the investment of Ford and GM to build electric vehicles?"
query4 = "What is the proposed tax rate for corporations?"
query5 = "What did the president say about the Bipartisan Infrastructure Act?"
query6 = "What is Intel going to build?"
query7 = "How many new manufacturing jobs were created last year?"
query8 = "What did the president say about the cancer death rate?"
query9 = "What are the dangers faced by troops in Iraq and Afghanistan?"
query10 = "How many electric vehicle charging stations are built?"

questions_list = [query0, query1, query2, query3, query4, query5, query6, query7, query8, query9, query10]

### Select questions

Select a subset of the questions defined above. By default the notebook uses the first two (`questions_list[0:2]`); widen the slice to include more.

In [None]:
# Select questions from the list above (here, the first two)
questions = questions_list[0:2]
for question in questions:
    print(question)


### Generate a retrieval-augmented response to a question

In [None]:
responses = []
contexts = []
retriever = docsearch.as_retriever()
for query in questions:
    # Retrieve the relevant context for each question from the vector database
    docs = retriever.get_relevant_documents(query)

    # Keep only the passage text of each retrieved document
    context = [doc.page_content for doc in docs]
    contexts.append(context)

    # Run the RAG chain to get the response; qa.run() performs its own retrieval
    # with the same default retriever settings, so it sees the same passages
    response = qa.run(query)
    responses.append(response)

In [None]:
# Print the context retrieved for the first question
print(f"Question: {questions[0]}\nContext: {contexts[0]}")

In [None]:
# Print each question with its response
for query, response in zip(questions, responses):
    print(f"{query}\n{response}\n")

<a id="sourceattribution"></a>
### Source attribution detection for RAG-based LLM responses

Source attribution for RAG-based responses is computed with the Protodash explainer. It needs two pieces of information:
1. The response data whose source attribution is to be identified. This is treated as the input data.
2. The context retrieved by RAG for each question. This is treated as the reference data.

From these, Protodash identifies prototypes: the reference passages, each with a non-negative weight, that best represent the input. The highest-weighted passages are the parts of the context most likely to have contributed to the response.
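The sketch below illustrates the idea with NumPy on a simplified special case. It is not the `ibm-metrics-plugin` implementation, whose kernel, embeddings, and stopping rules are not shown in this notebook. With a linear kernel and a single target (one response embedding), Protodash's objective reduces to non-negative least squares: find weights `w >= 0` so that the weighted sum of passage embeddings reconstructs the response embedding. Its greedy step then adds the passage whose embedding has the largest inner product with the part of the response not yet explained. The function names are hypothetical, and the weights are fitted with plain projected gradient descent.

```python
import numpy as np


def nonneg_least_squares(A, b, iters=500):
    """Solve min_w ||b - w @ A||^2 subject to w >= 0 by projected gradient descent."""
    G = A @ A.T                                       # Gram matrix of the selected passages
    step = 1.0 / max(np.linalg.eigvalsh(G).max(), 1e-12)
    Ab = A @ b
    w = np.zeros(len(A))
    for _ in range(iters):
        w = np.maximum(0.0, w - step * (G @ w - Ab))  # gradient step, then clip to w >= 0
    return w


def select_prototypes(response_vec, context_vecs, max_prototypes=3):
    """Greedily pick passages that reconstruct the response embedding.

    Returns (passage_index, weight) pairs, weights normalized to sum to 1, largest first.
    """
    x = np.asarray(response_vec, dtype=float)
    P = np.asarray(context_vecs, dtype=float)         # one row per context passage
    selected, weights = [], np.zeros(0)
    for _ in range(min(max_prototypes, len(P))):
        residual = x - weights @ P[selected] if selected else x
        gains = P @ residual                          # inner product with the unexplained part
        gains[selected] = -np.inf                     # never pick the same passage twice
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break                                     # no remaining passage improves the fit
        selected.append(best)
        weights = nonneg_least_squares(P[selected], x)
    total = weights.sum()
    if total == 0:
        return []
    pairs = [(i, w) for i, w in zip(selected, (weights / total).tolist()) if w > 0]
    return sorted(pairs, key=lambda p: -p[1])


# Response embedding built as 0.6 * passage 0 + 0.3 * passage 1 + 0.1 * passage 2
print(select_prototypes([0.6, 0.3, 0.1, 0.0], np.eye(4)))
```

With orthonormal passage vectors, the recovered weights match the mixing proportions (about 0.6, 0.3, and 0.1), the same kind of weight pattern that the results below report.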

### Construct a dataframe of responses and contexts for source attribution
- `generated_text`: the response from the foundation model
- `context`: the relevant context retrieved from the vector database (Chroma in this example) for each question. There is one row per selected question; with the default `questions_list[0:2]`, that is two questions.

In [None]:
import pandas as pd
data = pd.DataFrame({"generated_text": responses, "context": contexts})
data.head()

### Run protodash explainer to identify source attribution for the RAG based responses

In [None]:
import warnings

warnings.filterwarnings("ignore")
results = client.ai_metrics.compute_metrics(configuration=config_json, data_frame=data)

In [None]:
metrics = results.get("metrics_result")
results = metrics.get("explainability").get("protodash")

In [None]:
import json
for idx, entry in enumerate(results):
    print(f"====idx:{idx}: Question:{questions[idx]} Response:{data['generated_text'][idx]}====")
    print(json.dumps(entry,indent=4))

### Explanation
Source attribution can be read from the weights (the attribution, or contribution, factor) and the prototypes (the context passages that contributed to the foundation model's response). For example, a single weight of 1.0 indicates that one paragraph of the context accounts for the response, while weights of 0.6, 0.3, and 0.1 indicate that three paragraphs contributed to it. The prototype values are the paragraphs supplied as part of the relevant context.
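When reading the results, it can help to rank the prototypes by weight and ignore negligible contributors. The helper below is a hypothetical sketch. It takes the weights and prototype passages of one result entry as two parallel lists, because the exact key names in the printed JSON may depend on the `ibm-metrics-plugin` version and are not assumed here. The `min_weight` cutoff of 0.05 is an arbitrary illustrative choice.

```python
def summarize_attribution(weights, prototypes, min_weight=0.05):
    """Rank prototype passages by normalized weight, dropping those below min_weight.

    `weights` and `prototypes` are parallel lists extracted from one Protodash result entry.
    """
    if len(weights) != len(prototypes):
        raise ValueError("weights and prototypes must have the same length")
    total = sum(weights)
    if total <= 0:
        return []
    ranked = sorted(zip(weights, prototypes), key=lambda wp: wp[0], reverse=True)
    return [(w / total, p) for w, p in ranked if w / total >= min_weight]


# Three contributing paragraphs, listed in arbitrary order
for share, passage in summarize_attribution([0.1, 0.6, 0.3], ["para C", "para A", "para B"]):
    print(f"{share:.2f}  {passage}")
```

A result with a single surviving passage points to one source, while several passages with comparable weights suggest the model blended information from multiple parts of the context.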