### Install and import dependencies

In [None]:
%pip install langchain | tail -n 1
%pip install -U langchain-community
%pip install elasticsearch | tail -n 1
%pip install langchain_elasticsearch | tail -n 1
%pip install sentence_transformers | tail -n 1
%pip install humanize | tail -n 1
%pip install pandas | tail -n 1
%pip install rouge_score | tail -n 1
%pip install nltk | tail -n 1
%pip install wget | tail -n 1
%pip install ibm_watsonx_ai | tail -n 1
%pip install "pydantic==1.10.0" | tail -n 1
%pip install "ibm-watson-machine-learning>=1.0.327" | tail -n 1

In [3]:
import os, getpass
import pandas as pd
import humanize
import random
from typing import Optional, Any, Iterable, List

### watsonx API connection
This cell defines the credentials required to work with the watsonx API for foundation model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank" rel="noopener no referrer">documentation</a>.

In [20]:
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your WML api key (hit enter): ")
}

### Defining the project id
The API requires a project ID that provides the context for the call. We will obtain the ID from the `PROJECT_ID` environment variable of the project in which this notebook runs. If the variable is not set, you are prompted to provide the project ID.

**Hint**: You can find the `project_id` as follows. Open the Prompt Lab in watsonx.ai. At the very top of the UI, there will be `Projects / <project name> /`. Click on the `<project name>` link. Then get the `project_id` from the project's Manage tab (Project -> Manage -> General -> Details).


In [21]:
try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")
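
The same "environment variable first, prompt second" pattern is used again later for the Elasticsearch settings. As a minimal sketch, it could be factored into a helper. The name `get_setting` and its `prompt` and `env` parameters are hypothetical and not part of the notebook; unlike the cell above, this version also treats an empty variable as missing.

```python
import os
from typing import Callable, Mapping


def get_setting(
    name: str,
    prompt: Callable[[str], str] = input,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return env[name] if it is set and non-empty, otherwise ask the user."""
    value = env.get(name, "").strip()
    if value:
        return value
    # Fall back to an interactive prompt (hypothetical wording).
    return prompt(f"Please enter your {name} (hit enter): ").strip()
```

For example, `project_id = get_setting("PROJECT_ID")` would reproduce the cell above, and passing `prompt=getpass.getpass` would keep a secret from being echoed.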

<a id="data"></a>
## Data (test) loading

Download the test and train question datasets. The test dataset is used to calculate metric scores for the selected model, prompts, and parameters.

In [6]:
import wget

questions_test_filename = 'questions_test.csv'
questions_train_filename = 'questions_train.csv'
questions_test_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/questions_test.csv'
questions_train_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/questions_train.csv'


if not os.path.isfile(questions_test_filename): 
    wget.download(questions_test_url, out=questions_test_filename)


if not os.path.isfile(questions_train_filename): 
    wget.download(questions_train_url, out=questions_train_filename)
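
The download-if-missing check appears several times in this notebook. A minimal sketch of a reusable helper follows; it swaps in the standard library's `urllib.request.urlretrieve` in place of `wget`, and the name `download_if_missing` is hypothetical.

```python
import os
import urllib.request


def download_if_missing(url: str, filename: str) -> bool:
    """Download url to filename unless the file already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.isfile(filename):
        return False
    urllib.request.urlretrieve(url, filename)
    return True
```

With this helper, each dataset would need a single call such as `download_if_missing(questions_test_url, questions_test_filename)`.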

In [7]:
filename_test = './questions_test.csv'
filename_train =  './questions_train.csv'

test_data = pd.read_csv(filename_test)
train_data = pd.read_csv(filename_train)

Inspect data sample

In [None]:
train_data.head()

### Build up knowledge base

The current state-of-the-art in RAG is to create dense vector representations of the knowledge base in order to calculate the semantic similarity to a given user query.

We can generate dense vector representations using embedding models. In this notebook, we use <a href="https://www.sbert.net/" target="_blank" rel="noopener no referrer">Sentence Transformers</a> <a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2" target="_blank" rel="noopener no referrer">all-MiniLM-L6-v2</a> to embed both the knowledge base passages and user queries. `all-MiniLM-L6-v2` is a performant open-source model that is small enough to run locally.

A vector database is optimized for dense vector indexing and retrieval. This notebook uses <a href="https://python.langchain.com/docs/integrations/vectorstores/elasticsearch#basic-example" target="_blank" rel="noopener no referrer">Elasticsearch</a>, a distributed, RESTful search and analytics engine capable of performing both vector and lexical search. It is built on top of the Apache Lucene library and offers good speed and performance with the `all-MiniLM-L6-v2` embedding model.

The dataset we are using is already split into self-contained passages that can be ingested by Elasticsearch. 

The size of each passage is limited by the embedding model's context window (which is 256 tokens for `all-MiniLM-L6-v2`).
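
To make the idea of semantic similarity concrete, here is a minimal sketch that ranks passages by cosine similarity to a query. The three-dimensional vectors and the names `toy_passages`, `toy_query`, and `top_k` are made up for illustration; real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and in this notebook the search itself is performed by Elasticsearch, not by code like this.

```python
import math
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    # For unit-length vectors, cosine similarity equals the dot product.
    return dot(normalize(a), normalize(b))


def top_k(query: List[float], passages: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k passage ids with the highest cosine similarity to the query."""
    scored = [(pid, cosine(query, vec)) for pid, vec in passages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy embeddings, not produced by any real model.
toy_passages = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.7, 0.7, 0.0],
}
toy_query = [1.0, 0.0, 0.0]

ranking = top_k(toy_query, toy_passages, k=2)
print(ranking)
```

Here `p1` ranks first because it points in almost the same direction as the query, while `p2` is orthogonal to it and is left out of the top two.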

### Load knowledge base documents

Load the set of documents that will be used to build the knowledge base.

In [9]:
knowledge_base_dir = "./knowledge_base"

In [10]:
my_path = os.path.join(os.getcwd(), "knowledge_base")
# Create the directory if needed; no error if it already exists.
os.makedirs(my_path, exist_ok=True)

In [11]:
documents_filename = 'knowledge_base/psgs.tsv'
documents_url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/RAG/psgs.tsv'


if not os.path.isfile(documents_filename): 
    wget.download(documents_url, out=documents_filename)

In [12]:
documents = pd.read_csv(f"{knowledge_base_dir}/psgs.tsv", sep='\t', header=0)
documents['indextext'] = documents['title'].astype(str) + "\n" + documents['text']
documents = documents[:1000]

### Create an embedding function

Note that you can feed a custom embedding function to be used by Elasticsearch. The performance of Elasticsearch may differ depending on the embedding model used.

In [13]:
# The embedding integrations live in the langchain-community package installed above.
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_core.embeddings import Embeddings

emb_func = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

<a id="models"></a>
## Foundation Models on watsonx

### Defining model
You need to specify the `model_id` that will be used for inferencing:

In [14]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

model_id = ModelTypes.FLAN_UL2

### Defining the model parameters
We need to provide a set of model parameters that will influence the result:

In [15]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 50
}
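
Greedy decoding picks the single most probable next token at every step. The following toy sketch illustrates how that interacts with minimum and maximum new-token limits. The `toy_model` lookup table, the `<eos>` marker, and the way the minimum is enforced (by ignoring end-of-sequence until enough tokens exist) are assumptions for illustration, not a description of how watsonx.ai implements these parameters.

```python
from typing import Callable, Dict, List

EOS = "<eos>"  # hypothetical end-of-sequence marker


def greedy_decode(
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    min_new_tokens: int,
    max_new_tokens: int,
) -> List[str]:
    """Repeatedly append the most probable next token until EOS or the limit."""
    generated: List[str] = []
    while len(generated) < max_new_tokens:
        probs = dict(next_token_probs(prompt + generated))
        if len(generated) < min_new_tokens:
            probs.pop(EOS, None)  # stopping is not allowed yet
        if not probs:
            break
        token = max(probs, key=probs.get)  # greedy choice
        if token == EOS:
            break
        generated.append(token)
    return generated


def toy_model(tokens: List[str]) -> Dict[str, float]:
    """Made-up next-token distribution keyed on the last token only."""
    table = {
        "capital": {"of": 0.9, EOS: 0.1},
        "of": {"romania": 0.6, "france": 0.4},
        "romania": {EOS: 0.7, "is": 0.3},
    }
    return table.get(tokens[-1], {EOS: 1.0})


print(greedy_decode(toy_model, ["capital"], min_new_tokens=1, max_new_tokens=50))
```

Because the choice is deterministic, repeated runs with the same prompt give the same output, which is convenient when comparing prompts or models.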

### Initialize the `Model` class.

In [22]:
from ibm_watson_machine_learning.foundation_models import Model

# Note: despite the variable name, this wraps the model selected above via `model_id`.
watsonx_granite = Model(
    model_id=model_id.value,
    credentials=credentials,
    project_id=project_id,
    params=parameters
).to_langchain()

<a id="elastic_conn"></a>

We'll use the Cloud ID to identify our deployment because we are using an Elastic Cloud deployment. To find the Cloud ID, go to https://cloud.elastic.co/deployments and select your deployment.
To find the password for the `elastic` user, select your deployment on the same page, then select the `Security` settings in the left-hand menu.
Click the `Reset password` button and copy the generated password.


The following cell retrieves the Elasticsearch Cloud ID and password for the `elastic` user from the environment if available and prompts you otherwise.

In [32]:
# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
try:
    es_cloud_id = os.environ["ELASTIC_CLOUD_ID"]
except KeyError:
    es_cloud_id = input("Please enter your Elasticsearch Cloud ID (hit enter): ")


try:
    es_password = os.environ["ELASTIC_PASSWORD"]
except KeyError:
    # Use getpass so the password is not echoed into the notebook output.
    es_password = getpass.getpass("Please enter your Elasticsearch deployment password (hit enter): ")

<a id="elasticsearchstore"></a>
## Set up ElasticsearchStore connector from LangChain


We first create a regular Elasticsearch Python client connection. Then we pass it into LangChain's `ElasticsearchStore` wrapper together with the Sentence Transformers embedding function created earlier.

For more information about the <a href="https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html" target="_blank" rel="noopener no referrer">ElasticsearchStore</a> connector, consult the LangChain documentation.

In [35]:
from langchain_elasticsearch import ElasticsearchStore
from elasticsearch import Elasticsearch

# Create the client instance
es_connection = Elasticsearch(
    cloud_id=es_cloud_id,
    basic_auth=("elastic", es_password)
)

# Successful response!
es_connection.info()


knowledge_base = ElasticsearchStore(es_connection=es_connection,
                                    index_name="test_index",
                                    embedding=emb_func,
                                    strategy=ElasticsearchStore.ApproxRetrievalStrategy(),
                                    distance_strategy="DOT_PRODUCT")


<a id="elasticsearchstore_index"></a>
### Embed and index documents with Elasticsearch

**Note:** This step could take several minutes if you don't have pre-built indices.

In [36]:
if es_connection.indices.exists(index="test_index"):
    es_connection.indices.delete(index="test_index")
_ = knowledge_base.add_texts(texts=documents.indextext.tolist(),
                             metadatas=[{'title': title, 'id': doc_id}
                                for (title, doc_id) in
                                zip(documents.title, documents.id)],  # filter on these!
                             index_name="test_index",
                             ids=[str(i) for i in documents.id]  # unique for each doc
                            )
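
The call above relies on `texts`, `metadatas`, and `ids` being aligned position by position, with every id unique. A minimal sketch of a sanity check that could run before indexing is shown below; the helper name `check_index_inputs` is hypothetical and not part of LangChain or the notebook.

```python
from collections import Counter
from typing import Dict, List, Sequence


def check_index_inputs(
    texts: Sequence[str],
    metadatas: Sequence[Dict],
    ids: Sequence[str],
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems: List[str] = []
    if not (len(texts) == len(metadatas) == len(ids)):
        problems.append(
            f"length mismatch: {len(texts)} texts, "
            f"{len(metadatas)} metadatas, {len(ids)} ids"
        )
    # Documents sharing an id would overwrite each other in the index.
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    if duplicates:
        problems.append(f"duplicate ids: {duplicates}")
    return problems
```

Running it on the three arguments before `add_texts` and stopping on a non-empty result would surface a silent overwrite before any embedding time is spent.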

Let's look at what the LangChain wrapper has created in Elasticsearch. First, we display the newly created index (in Elasticsearch, the counterpart of a database table is called an "index"). Note the field `vector` of type `dense_vector` with `dot_product` similarity.

In [None]:
dict(es_connection.indices.get(index="test_index"))

Verify the number of documents loaded into the Elasticsearch index.

In [None]:
doc_count = es_connection.count(index='test_index')["count"]
doc_count

Let's retrieve a random document as a sample. Note the embedding in the `vector` field, which was generated with the `all-MiniLM-L6-v2` embedding model.

In [None]:
# Pick an id that was actually indexed; a random integer may not match any document id.
dict(es_connection.get(index="test_index", id=str(random.choice(documents.id.tolist()))))

Display the total size and indexing time of the new index in Elasticsearch.

In [None]:
index_stats = es_connection.indices.stats(index="test_index").get('_all').get('primaries')
print("Index size:    " + humanize.naturalsize(index_stats.get('store').get('size_in_bytes')))
print("Indexing time: " + humanize.precisedelta(index_stats.get('indexing').get('index_time_in_millis')/1000, minimum_unit='minutes'))


<a id="predict"></a>
## Generate a retrieval-augmented response to a question

`RetrievalQA` is a LangChain chain for question answering over retrieved documents.

**Hint:** To use the Chain interface from LangChain with watsonx.ai models, you must call the `model.to_langchain()` method.

It returns a `WatsonxLLM` wrapper compatible with the LangChain CustomLLM specification.

### Select questions

These are the questions we will use to test the RAG flow, together with their expected answers:

In [41]:
questions_and_answers = {
            'names of founding fathers of the united states?': "Thomas Jefferson::James Madison::John Jay::George Washington::John Adams::Benjamin Franklin::Alexander Hamilton",
            'who played in the super bowl in 2013?': 'Baltimore Ravens::San Francisco 49ers',
            'when did bucharest become the capital of romania?': '1862'
}

### Retrieve relevant context

Build a `RetrievalQA` chain that fetches the passages most similar to each question and passes them to the model.

In [42]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=watsonx_granite,
    chain_type="stuff",
    verbose=True,
    retriever=knowledge_base.as_retriever(),
    return_source_documents=True,
)
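
The `"stuff"` chain type places all retrieved passages into a single prompt that is sent to the model. The sketch below illustrates that idea only. The template wording and the names `PROMPT_TEMPLATE` and `build_stuffed_prompt` are hypothetical and are not LangChain's actual prompt or API.

```python
from typing import List

# Hypothetical template; LangChain's default prompt is worded differently.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuffed_prompt(question: str, passages: List[str], separator: str = "\n\n") -> str:
    """Join all retrieved passages into one context block and fill the template."""
    context = separator.join(passages)
    return PROMPT_TEMPLATE.format(context=context, question=question)


example_prompt = build_stuffed_prompt(
    "when did bucharest become the capital of romania?",
    ["Passage one (placeholder text).", "Passage two (placeholder text)."],
)
print(example_prompt)
```

Since every retrieved passage ends up in the prompt, the number and length of passages must fit within the model's input limit.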

In [None]:
results = []

for question in questions_and_answers:
    result = qa.invoke({'query': question})
    print("result: ", result)
    results.append(result)

Display the answer and the retrieved source passages for each question.

In [None]:
for idx, result in enumerate(results):
    print("=========")
    print("Question = ", result['query'])
    print("Answer = ", result['result'])
    print("Expected Answer(s) (may not appear with exact wording in the dataset) = ", questions_and_answers[result['query']])
    print("\n")
    print("Source documents:")
    print(*(x.page_content for x in result['source_documents']), sep='\n')
    print("\n")
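
The expected answers in `questions_and_answers` appear to list acceptable alternatives separated by `::`. Under that assumption, a rough check of a generated answer could look like the sketch below. The helper names are hypothetical, and case-insensitive substring matching is only a simple heuristic, not the metric used to score this dataset.

```python
from typing import List


def split_expected(expected: str, delimiter: str = "::") -> List[str]:
    """Split an expected-answer string into its non-empty alternatives."""
    return [part.strip() for part in expected.split(delimiter) if part.strip()]


def matched_alternatives(answer: str, expected: str) -> List[str]:
    """Return the expected alternatives that occur in the answer (case-insensitive)."""
    answer_lower = answer.lower()
    return [alt for alt in split_expected(expected) if alt.lower() in answer_lower]
```

For instance, `matched_alternatives(result['result'], questions_and_answers[result['query']])` would list which expected items the model mentioned for a given result.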
    

---

Copyright © 2023 IBM. This notebook and its source code are released under the terms of the MIT License.