![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx Granite Model Series, Chroma, and LangChain to answer questions (RAG)

#### Disclaimers

- Use only Projects and Spaces that are available in the watsonx context.

## Notebook content
This notebook contains the steps and code to demonstrate support of Retrieval Augmented Generation in watsonx.ai. It introduces commands for data retrieval, knowledge base building and querying, and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.10.

### About Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is a versatile pattern that can unlock a number of use cases requiring factual recall of information, such as querying a knowledge base in natural language.

In its simplest form, RAG requires 3 steps:

- Index knowledge base passages (once)
- Retrieve relevant passage(s) from knowledge base (for every user query)
- Generate a response by feeding retrieved passage into a large language model (for every user query)
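
The three steps above can be sketched in a few lines of plain Python. This is only a toy illustration: word overlap stands in for embedding similarity, and `echo_llm` is a hypothetical placeholder for a real language model. None of these helper names come from watsonx or LangChain.

```python
import re

def tokenize(text):
    # Lowercased set of alphanumeric words; a crude stand-in for an embedding.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Step 1: index the knowledge base passages (once).
passages = [
    "Chroma is a vector store used to index passages.",
    "Granite is a family of foundation models on watsonx.",
    "RAG retrieves passages before generating an answer.",
]
index = [(p, tokenize(p)) for p in passages]

# Step 2: retrieve the most relevant passage(s) for each query.
def retrieve(query, k=1):
    q = tokenize(query)
    ranked = sorted(index, key=lambda item: len(q & item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

# Step 3: feed the retrieved passages and the question to a model.
def generate(query, context, llm):
    prompt = "\n".join(context) + "\nQuestion: " + query
    return llm(prompt)

def echo_llm(prompt):
    # Hypothetical placeholder "model": returns the first context line.
    return prompt.splitlines()[0]

question = "Which vector store indexes passages?"
answer = generate(question, retrieve(question), echo_llm)
```

The rest of this notebook replaces each toy piece with a real component: an embedding model, Chroma, and a Granite model.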

## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Document data loading](#data)
- [Build up knowledge base](#build_base)
- [Foundation Models on watsonx](#models)
- [Generate a retrieval-augmented response to a question](#predict)
- [Summary and next steps](#summary)


<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://cloud.ibm.com/catalog/services/watson-machine-learning" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp" target="_blank" rel="noopener no referrer">here</a>).


### Install and import the dependencies

In [1]:
!pip install "langchain==0.1.10" | tail -n 1
!pip install "ibm-watsonx-ai>=0.2.6" | tail -n 1
!pip install -U langchain_ibm | tail -n 1
!pip install wget | tail -n 1
!pip install sentence-transformers | tail -n 1
!pip install "chromadb" --upgrade | tail -n 1
!pip install "sqlalchemy==2.0.1" | tail -n 1
!pip install "lastmile-eval[ui]"

Successfully installed PyYAML-6.0.1 SQLAlchemy-2.0.30 aiohttp-3.9.5 aiosignal-1.3.1 annotated-types-0.7.0 attrs-23.2.0 certifi-2024.2.2 charset-normalizer-3.3.2 dataclasses-json-0.6.6 frozenlist-1.4.1 idna-3.7 jsonpatch-1.33 jsonpointer-2.4 langchain-0.1.10 langchain-community-0.0.38 langchain-core-0.1.52 langchain-text-splitters-0.0.2 langsmith-0.1.60 marshmallow-3.21.2 multidict-6.0.5 mypy-extensions-1.0.0 numpy-1.26.4 orjson-3.10.3 packaging-23.2 pydantic-2.7.1 pydantic-core-2.18.2 requests-2.32.2 tenacity-8.3.0 typing-inspect-0.9.0 urllib3-2.2.1 yarl-1.9.4
Successfully installed ibm-cos-sdk-2.13.4 ibm-cos-sdk-core-2.13.4 ibm-cos-sdk-s3transfer-2.13.4 ibm-watsonx-ai-1.0.4 jmespath-1.0.1 lomond-0.3.3 pandas-2.1.4 pytz-2024.1 tabulate-0.9.0 tzdata-2024.1 urllib3-2.1.0
Successfully installed langchain_ibm-0.1.7
Successfully installed wget-3.2
Successfully installed MarkupSafe-2.1.5 Pillow-10.3.0 filelock-3.14.0 fsspec-2024.5.0 huggingface-hub-0.23.1 jinja2-3.1.4 joblib-1.4.2 mpmath-1.3

In [1]:
import os, getpass

### watsonx API connection
This cell defines the credentials required to work with watsonx API for Foundation
Model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank" rel="noopener no referrer">documentation</a>.

You can either set these explicitly (uncomment the lines below), or save them in a `.env` file within this project directory.

In [2]:
import os

try:
    # First try this in case we're running on Google Colab
    from google.colab import userdata
    os.environ['WATSONX_APIKEY'] =  userdata.get('WATSONX_APIKEY')
    os.environ['PROJECT_ID'] =  userdata.get('PROJECT_ID')
    os.environ['SPACE_ID'] =  userdata.get('SPACE_ID')
except ModuleNotFoundError:
    import dotenv
    dotenv.load_dotenv()


# os.environ['WATSONX_APIKEY'] =  <WATSONX_APIKEY>
# os.environ['PROJECT_ID'] = <PROJECT_ID>
# os.environ['SPACE_ID'] = <SPACE_ID>

True
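
For reference, a `.env` file is a plain-text list of `KEY=VALUE` lines. The sketch below shows roughly what loading such a file involves. It is a simplified illustration, not the `python-dotenv` implementation, and `parse_env_lines` and the placeholder values are hypothetical.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings

# Example .env content with placeholder values (not real credentials).
example = [
    "# watsonx credentials",
    "WATSONX_APIKEY=your-api-key",
    'PROJECT_ID="your-project-id"',
    "",
]
settings = parse_env_lines(example)
```

In the notebook itself, `dotenv.load_dotenv()` does this work and places the values in `os.environ`.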

In [28]:
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
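    # Use the key from the environment if it is set; otherwise prompt for it without echoing.
    "apikey": os.getenv("WATSONX_APIKEY") or getpass.getpass("Please enter your watsonx API key (hit enter): ")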
}

### Defining the project id
The API requires a project id that provides the context for the call. The code in this section reads the id from the `PROJECT_ID` environment variable. If the variable is not set, you are prompted to enter the id.

**Hint**: You can find the `project_id` as follows. Open the prompt lab in watsonx.ai. At the very top of the UI, there will be `Projects / <project name> /`. Click on the `<project name>` link. Then get the `project_id` from Project's Manage tab (Project -> Manage -> General -> Details).


In [29]:
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

try:
    space_id = os.environ["SPACE_ID"]
except KeyError:
    space_id = input("Please enter your space id if you have one (hit enter): ") or None

<a id="data"></a>
## Document data loading

Download the text file containing the State of the Union speech.

In [5]:
import wget

filename = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

if not os.path.isfile(filename):
    wget.download(url, out=filename)

# LastMile Auto Instrumentation Setup

An instance of `LangChainInstrumentor` is created with a project name. The `instrument()` method is called to instrument the code for tracing and monitoring.

In [6]:
from lastmile_eval.rag.debugger.tracing.auto_instrumentation import LangChainInstrumentor

# Create an instance of LangChainInstrumentor and instrument the code
instrumentor = LangChainInstrumentor(project_name="ibm x lastmile")
instrumentor.instrument()

  from .autonotebook import tqdm as notebook_tqdm
2024-05-22 10:29:17,122 - PyTorch version 2.3.0 available.


<a id="build_base"></a>
## Build up knowledge base

The most common approach in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.
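
Semantic similarity between dense vectors is commonly measured with cosine similarity. The following sketch computes it by hand on made-up three-dimensional vectors. Real embedding vectors have hundreds of dimensions, and the distance function a vector store actually uses depends on its configuration.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors for illustration only.
query_vec = [1.0, 0.0, 1.0]
passage_a = [2.0, 0.0, 2.0]  # points in the same direction as the query
passage_b = [0.0, 3.0, 0.0]  # orthogonal to the query

score_a = cosine_similarity(query_vec, passage_a)
score_b = cosine_similarity(query_vec, passage_b)
```

Here `passage_a` scores close to 1 and `passage_b` scores 0, so a retriever would rank `passage_a` first.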

In this basic example, we take the State of the Union speech content (filename), split it into chunks, embed it using an open-source embedding model, load it into <a href="https://www.trychroma.com/" target="_blank" rel="noopener no referrer">Chroma</a>, and then query it.

In [7]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

loader = TextLoader(filename)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

After splitting, `texts` contains self-contained passages that can be ingested by Chroma.
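
To make the chunking step concrete, here is a simplified sketch of separator-based splitting: the text is cut at blank lines, and neighbouring pieces are merged greedily while they fit within `chunk_size` characters. It is an approximation for illustration, not the `CharacterTextSplitter` implementation. `split_into_chunks` is a hypothetical helper and it ignores chunk overlap.

```python
def split_into_chunks(text, chunk_size=1000, separator="\n\n"):
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Keep growing the current chunk (an oversized single piece is kept whole).
            current = candidate
        else:
            # The next piece does not fit: close the chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = split_into_chunks(sample, chunk_size=40)
```

With a limit of 40 characters, the first two paragraphs fit in one chunk and the third starts a new one.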

### Create an embedding function

Note that you can feed a custom embedding function to be used by Chroma. Chroma's retrieval performance may differ depending on the embedding model used. The following example uses the watsonx.ai Embedding service. You can check the available embedding models using `get_embedding_model_specs`.

In [8]:
from ibm_watsonx_ai.foundation_models.utils import get_embedding_model_specs

get_embedding_model_specs(credentials.get('url'))

{'total_count': 5,
 'limit': 100,
 'first': {'href': 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2023-09-30&filters=function_embedding'},
 'resources': [{'model_id': 'baai/bge-large-en-v1',
   'label': 'bge-large-en-v1',
   'provider': 'baai',
   'source': 'baai',
   'functions': [{'id': 'embedding'}],
   'short_description': 'An embedding model with version 1.5. It has 335 million parameters and an embedding dimension of 1024.',
   'long_description': 'This model has multi-functionality like dense retrieval, sparse retrieval, multi-vector, Multi-Linguality, and Multi-Granularity(8192 tokens)',
   'tier': 'class_c1',
   'number_params': '335m',
   'limits': {'lite': {'call_time': '5m0s'},
    'v2-professional': {'call_time': '10m0s'},
    'v2-standard': {'call_time': '10m0s'}},
   'lifecycle': [{'id': 'available', 'start_date': '2024-05-16'}]},
  {'model_id': 'ibm/slate-125m-english-rtrvr',
   'label': 'slate-125m-english-rtrvr',
   'provider': 'IBM',
   'so

In [9]:
from langchain_ibm import WatsonxEmbeddings
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes

embeddings = WatsonxEmbeddings(
    model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=project_id,
    space_id=space_id
    )

docsearch = Chroma.from_documents(texts, embeddings)

2024-05-22 10:29:31,532 - Client successfully initialized
2024-05-22 10:29:32,276 - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
2024-05-22 10:29:32,326 - Starting component System
2024-05-22 10:29:32,326 - Starting component Posthog
2024-05-22 10:29:32,327 - Starting component OpenTelemetryClient
2024-05-22 10:29:32,327 - Starting component SqliteDB
2024-05-22 10:29:32,330 - Starting component QuotaEnforcer
2024-05-22 10:29:32,330 - Starting component LocalSegmentManager
2024-05-22 10:29:32,331 - Starting component SegmentAPI
2024-05-22 10:29:32,859 - Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-10'
2024-05-22 10:29:32,860 - Response(POST https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-10): {"model_id":"ibm/slate-30m-english-rtrvr","created_at":"2024-05-22T14:29:32.828Z","results":[{"embedding":[0.029716752,0.06614497,0.0606624

#### Compatibility of watsonx.ai Embeddings with LangChain

LangChain retrievers use `embed_documents` and `embed_query` under the hood to generate embedding vectors for the uploaded documents and the user query, respectively.
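
As a rough illustration of that two-method contract, the sketch below defines a hypothetical `ToyEmbeddings` class with the same method names. It does not subclass any LangChain base class, and its hash-based vectors carry no semantic meaning. It only shows the expected input and output shapes.

```python
import hashlib

class ToyEmbeddings:
    """Hypothetical stand-in exposing embed_documents and embed_query."""

    def __init__(self, dim=8):
        self.dim = dim

    def _embed(self, text):
        # Deterministic pseudo-vector derived from a hash (not a real embedding).
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [b / 255.0 for b in digest[: self.dim]]

    def embed_documents(self, texts):
        # One vector per document.
        return [self._embed(t) for t in texts]

    def embed_query(self, text):
        # A single vector for the query.
        return self._embed(text)

toy = ToyEmbeddings()
doc_vectors = toy.embed_documents(["first passage", "second passage"])
query_vector = toy.embed_query("first passage")
```

The key point is that documents and queries are mapped into the same vector space, so their vectors can be compared directly.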

In [18]:
help(WatsonxEmbeddings)

Help on class WatsonxEmbeddings in module langchain_ibm.embeddings:

class WatsonxEmbeddings(pydantic.v1.main.BaseModel, langchain_core.embeddings.embeddings.Embeddings)
 |  WatsonxEmbeddings(*, model_id: str = '', project_id: str = '', space_id: str = '', url: Optional[pydantic.v1.types.SecretStr] = None, apikey: Optional[pydantic.v1.types.SecretStr] = None, token: Optional[pydantic.v1.types.SecretStr] = None, password: Optional[pydantic.v1.types.SecretStr] = None, username: Optional[pydantic.v1.types.SecretStr] = None, instance_id: Optional[pydantic.v1.types.SecretStr] = None, version: Optional[pydantic.v1.types.SecretStr] = None, params: Optional[dict] = None, verify: Union[str, bool, NoneType] = None, watsonx_embed: ibm_watsonx_ai.foundation_models.embeddings.embeddings.Embeddings = None, watsonx_client: ibm_watsonx_ai.client.APIClient = None) -> None
 |  
 |  Method resolution order:
 |      WatsonxEmbeddings
 |      pydantic.v1.main.BaseModel
 |      pydantic.v1.utils.Representat

<a id="models"></a>
## Foundation Models on `watsonx.ai`

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener no referrer">list of LLMs supported by LangChain</a>. This example shows how to communicate with the <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener no referrer">Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener no referrer">LangChain</a>.

### Defining model
You need to specify the `model_id` that will be used for inferencing:

In [10]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.GRANITE_13B_CHAT_V2

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [11]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
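
To illustrate what these parameters control, the following sketch runs greedy decoding over a made-up lookup table of next-token probabilities. At each step it picks the most probable token, and it stops at a stop sequence or after `max_new_tokens`. The `toy_model` table and `greedy_decode` helper are hypothetical. A real model scores tokens from the full context, and whether the service returns the stop sequence in its output is not shown here.

```python
def greedy_decode(next_token_probs, prompt, max_new_tokens, stop_sequences):
    tokens = []
    last = prompt
    for _ in range(max_new_tokens):
        candidates = next_token_probs.get(last)
        if not candidates:
            break
        # Greedy choice: always take the single most probable next token.
        last = max(candidates, key=candidates.get)
        tokens.append(last)
        text = "".join(tokens)
        # Stop as soon as the generated text ends with a stop sequence.
        if any(text.endswith(stop) for stop in stop_sequences):
            break
    return "".join(tokens)

# Made-up next-token probabilities keyed by the previous token.
toy_model = {
    "Q:": {" yes": 0.6, " no": 0.4},
    " yes": {"<|endoftext|>": 0.9, " indeed": 0.1},
    "<|endoftext|>": {" ignored": 1.0},
}

full = greedy_decode(toy_model, "Q:", 100, ["<|endoftext|>"])
capped = greedy_decode(toy_model, "Q:", 1, ["<|endoftext|>"])
```

In this sketch the stop sequence ends generation before `" ignored"` is produced, and a limit of one new token cuts the output after `" yes"`.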

### LangChain CustomLLM wrapper for watsonx model
Initialize the `WatsonxLLM` class from LangChain with the parameters defined above and the `ibm/granite-13b-chat-v2` model.

In [12]:
from langchain_ibm import WatsonxLLM

watsonx_granite = WatsonxLLM(
    model_id=model_id.value,
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters
)

2024-05-22 10:29:59,098 - Client successfully initialized
2024-05-22 10:30:00,008 - Successfully finished Get available foundation models for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2024-05-10&project_id=7bf87b36-0eab-4e99-96d4-174a1cdf1788&limit=200'
2024-05-22 10:30:00,009 - Response(GET https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2024-05-10&project_id=7bf87b36-0eab-4e99-96d4-174a1cdf1788&limit=200): {"total_count":25,"limit":200,"first":{"href":"https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2024-05-10&project_id=7bf87b36-0eab-4e99-96d4-174a1cdf1788&limit=200"},"resources":[{"model_id":"baai/bge-large-en-v1","label":"bge-large-en-v1","provider":"baai","source":"baai","functions":[{"id":"embedding"}],"short_description":"An embedding model with version 1.5. It has 335 million parameters and an embedding dimension of 1024.","long_description":"This model has multi-functionality like dense retrieval

<a id="predict"></a>
## Generate a retrieval-augmented response to a question

Build the `RetrievalQA` (question answering chain) to automate the RAG task.

In [13]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(llm=watsonx_granite, chain_type="stuff", retriever=docsearch.as_retriever())
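
The `"stuff"` chain type places all retrieved passages into a single prompt together with the question. The sketch below shows the general idea. The template wording and the `build_stuff_prompt` helper are illustrative assumptions, not LangChain's exact default prompt.

```python
def build_stuff_prompt(question, passages):
    # Concatenate every retrieved passage into one context block.
    context = "\n\n".join(passages)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )

retrieved = [
    "Passage one about the topic.",
    "Passage two with more detail.",
]
prompt = build_stuff_prompt("What is the topic?", retrieved)
```

Because every passage goes into one prompt, this approach works only while the combined text fits within the model's context window.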

### Sample QA chain

Ask a question about the speech. The chain retrieves the relevant passages and generates an answer.

In [25]:
query = "What is the goal of the Cancer Moonshot initiative mentioned in the speech?"
qa.invoke(query)

2024-05-22 16:23:59,609 - Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-10'
2024-05-22 16:23:59,610 - Response(POST https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-10): {"model_id":"ibm/slate-30m-english-rtrvr","created_at":"2024-05-22T20:23:59.628Z","results":[{"embedding":[-0.04719374,-0.005378661,0.06922088,0.0014521753,0.0649358,-0.0389726,0.02482337,0.0035557966,0.069491,-0.0024815958,0.031852197,-0.043076318,0.061914798,0.0038633677,-0.017031213,0.051376354,0.020512566,0.012809123,0.024800276,-0.040999033,0.0005625458,-0.023966713,-0.0037695738,0.026985712,-0.06836927,-0.06085426,0.015830545,-0.07461701,0.07033399,-0.047211286,0.0492157,0.01214432,0.043521743,0.029385792,0.07688585,-0.026944343,0.079846084,0.012887746,0.0656877,-0.044924635,0.0047650947,-0.017185673,0.025516946,0.016835531,0.0650998,0.026923515,0.026585622,0.06978408,0.0062697753,0.025013687,0.031134915,0.07067306,-0.022516

{'query': 'What is the goal of the Cancer Moonshot initiative mentioned in the speech?',
 'result': ' The goal of the Cancer Moonshot initiative mentioned in the speech is to cut the cancer death rate by at least 50% over the next 25 years and turn more cancers from death sentences into treatable diseases.'}

# Evaluations

This section demonstrates how to evaluate the outputs of the question-answering model using the `lastmile_eval` package.


In [23]:
import pandas as pd
from lastmile_eval.rag.debugger.api.evaluation import (
    run_and_evaluate_outputs,
    create_input_set,
    run_and_evaluate_outputs_with_input_set
)
from lastmile_eval.text import calculate_qa_score, calculate_custom_llm_metric_example_semantic_similarity, calculate_relevance_score

# Define inputs and ground truth for the evaluation
inputs = ["What is the goal of the Cancer Moonshot initiative mentioned in the speech?", # 1
          "What did President Zelenskyy say in his speech to the European Parliament?", # 2
          'What statement did the President make regarding the sanctions on Russia?',
          "What did the President say about the future exploration of Mars?"]  # 4 (not in data)

ground_truths = ["The Cancer Moonshot initiative aims to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers into treatable diseases, and provide more support for patients and families.", # 1
                 'In his speech to the European Parliament, President Zelenskyy said, "Light will win over darkness."', # 2
                 'We are cutting off Russia’s largest banks from the international financial system.  Preventing Russia’s central bank from defending the Russian Ruble making Putin’s $630 Billion “war fund” worthless.', #3
                 "The President did not address the future exploration of Mars in the speech."] # 4

# Create an input set for evaluation
test_set_id = create_input_set(inputs=inputs, ground_truth=ground_truths, input_set_name="State of the Union Test Cases").ids[0]

# Define evaluators
def wrap_semantic_similarity(outputs, ground_truths, _inputs):
    return calculate_custom_llm_metric_example_semantic_similarity(outputs, ground_truths)

def wrap_relevance(outputs, ground_truths, _inputs):
    return calculate_relevance_score(outputs, ground_truths)

evaluators = {
    "QA Score": calculate_qa_score,
    "Semantic Similarity": wrap_semantic_similarity,
    "Relevance": wrap_relevance
}
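
For intuition about what an evaluator with this three-argument shape might compute, here is a sketch of a token-overlap F1 score between each output and its ground truth. It is not one of the `lastmile_eval` metrics. `token_f1_evaluator` is a hypothetical name, and the return type that `lastmile_eval` expects from a custom evaluator is an assumption not verified here.

```python
import re
from collections import Counter

def token_f1(prediction, reference):
    pred = re.findall(r"\w+", prediction.lower())
    ref = re.findall(r"\w+", reference.lower())
    # Count the tokens shared by both texts (respecting multiplicity).
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def token_f1_evaluator(outputs, ground_truths, _inputs):
    # Same argument order as the wrappers above; returns one score per test case.
    return [token_f1(o, g) for o, g in zip(outputs, ground_truths)]

scores = token_f1_evaluator(
    ["Light will win", "No mention of Mars"],
    ["Light will win over darkness", "Sanctions were imposed"],
    ["q1", "q2"],
)
```

Lexical overlap is a crude signal. The LLM-based metrics used in this notebook can credit answers that are correct but phrased differently.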

In [26]:
# Run and evaluate outputs using the created input set

from lastmile_eval.rag.debugger.api.evaluation import (
    run_and_evaluate_outputs_with_input_set
)
# Wrap the qa.invoke function to return the final value as the output
def run_qa_and_return_final_value(query):
    return qa.invoke(query)["result"]

run_and_evaluate_outputs_with_input_set( 
    evaluators,
    dataset_level_evaluators={}, # No dataset level evaluators
    evaluation_set_name="State of the Union Evals",
    rag_query_fn=run_qa_and_return_final_value,
    input_set_id=test_set_id,
    create_test_set_name="State of the Union Test Run",
    project_id="clwhwwua200bqpe9nu5ybtlt7"
)


2024-05-22 16:30:27,759 - Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-10'
2024-05-22 16:30:27,760 - Response(POST https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2024-05-10): {"model_id":"ibm/slate-30m-english-rtrvr","created_at":"2024-05-22T20:30:27.789Z","results":[{"embedding":[-0.04719374,-0.005378661,0.06922088,0.0014521753,0.0649358,-0.0389726,0.02482337,0.0035557966,0.069491,-0.0024815958,0.031852197,-0.043076318,0.061914798,0.0038633677,-0.017031213,0.051376354,0.020512566,0.012809123,0.024800276,-0.040999033,0.0005625458,-0.023966713,-0.0037695738,0.026985712,-0.06836927,-0.06085426,0.015830545,-0.07461701,0.07033399,-0.047211286,0.0492157,0.01214432,0.043521743,0.029385792,0.07688585,-0.026944343,0.079846084,0.012887746,0.0656877,-0.044924635,0.0047650947,-0.017185673,0.025516946,0.016835531,0.0650998,0.026923515,0.026585622,0.06978408,0.0062697753,0.025013687,0.031134915,0.07067306,-0.022516

CreateEvaluationsResult(success=True, message='{"id":"clwia65ju00xhpblys8fdbff5","createdAt":"2024-05-22T20:32:35.226Z","updatedAt":"2024-05-22T20:32:35.226Z","name":"State of the Union Evals","paramSet":null,"testSetId":"clwia3n6900eiquh0wk13q6ct","creatorId":"cldfcu2780008qsueqgiqvenw","projectId":"clwhwwua200bqpe9nu5ybtlt7","organizationId":null,"visibility":"MEMBER","metadata":null,"active":true}', df_metrics_trace=                   testSetId                 testCaseId           metricName  \
0  clwia3n6900eiquh0wk13q6ct  clwia3n6e00ejquh0i3y9nqa4             QA Score   
1  clwia3n6900eiquh0wk13q6ct  clwia3n6e00ekquh0i42occud             QA Score   
2  clwia3n6900eiquh0wk13q6ct  clwia3n6e00elquh0m5hopbmp             QA Score   
3  clwia3n6900eiquh0wk13q6ct  clwia3n6e00emquh09zo0va5a             QA Score   
0  clwia3n6900eiquh0wk13q6ct  clwia3n6e00ejquh0i3y9nqa4  Semantic Similarity   
1  clwia3n6900eiquh0wk13q6ct  clwia3n6e00ekquh0i42occud  Semantic Similarity   
2  clwia3n6900eiq

# Launch the RAG Debugger UI

<img width="1915" alt="Screenshot 2024-05-21 at 11 30 01 PM" src="https://github.com/lastmile-ai/aiconfig/assets/141073967/2f2a72c9-fb07-402b-bc6f-fdc46f529edd"> 

<br><br>

<img width="1917" alt="Screenshot 2024-05-21 at 11 30 14 PM" src="https://github.com/lastmile-ai/aiconfig/assets/141073967/4a98b9b3-b96e-430e-9156-79560cbea1be">

In [None]:
# Launch the rag-debugger
!rag-debug launch

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!

You learned how to answer questions with RAG using watsonx and LangChain.
 
Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener no referrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts. 

Copyright © 2023, 2024 IBM. This notebook and its source code are released under the terms of the MIT License.