# Build AI Apps with RAG using watsonx.ai

## Overview
This Jupyter Notebook provides an example of how to:

1. Create a Watson Discovery collection and upload documents to it.

2. Customize this notebook to perform a simple RAG exercise.

A prompt (query) is passed in through this notebook. The code performs the **Retrieval** task against the document(s) in the Watson Discovery collection. The retrieved passages are then sent, together with the prompt, to the Large Language Model (LLM) of your choice (as named in the notebook) to generate the result.

In [None]:
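# Install libraries: the Watson Discovery SDK and the Watson Machine Learning (watsonx.ai) SDK imported below
!pip install --upgrade ibm-watson ibm-watson-machine-learning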

In [None]:
# Import libraries
import json
import os

from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# WML python SDK
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes, DecodingMethods

## 1. Watson Discovery set up

When you set up Watson Discovery, you should have saved the credentials in a file called **ibm-credentials.env**. You will need to use the values from that file. You can open the file using a simple text editor. 

1. Find the value for **DISCOVERY_APIKEY** from the file and paste it as the value for **IAMAuthenticator** below (between the 2 single quotes).

2. Find the value for **DISCOVERY_URL** from the file and paste it as the value for **discovery.set_service_url** below (between the 2 single quotes).

The cell below initializes a connection to a Watson Discovery instance with a preloaded PDF document (the IBM Annual Report 2022).

In [None]:
#Set up Watson Discovery credentials
authenticator = IAMAuthenticator('<YOUR WATSON DISCOVERY API KEY HERE>') # DISCOVERY_APIKEY  
discovery = DiscoveryV2(
    version='2020-08-30',
    authenticator=authenticator
)

discovery.set_service_url('<YOUR WATSON DISCOVERY URL HERE>') # DISCOVERY_URL
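
As an alternative to pasting the values by hand, the credentials file can be read programmatically. The following is a minimal sketch, assuming **ibm-credentials.env** contains plain `KEY=VALUE` lines; the helper name `load_env_file` is hypothetical, and the demo writes a throwaway file with placeholder values rather than real credentials.

```python
import tempfile
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Demo with a temporary file holding placeholder values (assumed layout)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "ibm-credentials.env"
    demo_path.write_text(
        "# placeholder values for illustration only\n"
        "DISCOVERY_APIKEY=example-key\n"
        "DISCOVERY_URL=https://example.invalid/instances/demo\n"
    )
    creds = load_env_file(demo_path)

print(sorted(creds))
```

With a real file, `creds['DISCOVERY_APIKEY']` could be passed to `IAMAuthenticator` and `creds['DISCOVERY_URL']` to `discovery.set_service_url`, which keeps the secrets out of the notebook itself.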

## 2. Watson Discovery Search 

The cell below defines a simple question (prompt) that will be posed to the model. In an application, the question could be surfaced through a Streamlit GUI, which is not the focus of this lab; clients may have other GUI tools. Here we focus on the underlying Watson Discovery service, and later on watsonx.ai.

In [None]:
question = 'I’m interested in IBM’s effect on the environment. What efforts have they been making in sustainability?'

In [None]:
#question = 'I’m interested in IBM’s initiatives on the business in AI. What efforts have they been making in AI?'

In [None]:
#question = 'What is IBM net profit and revenue in 2022?'

For the block below, you will need to provide the proper information from the Watson Discovery project you created. 

1. Find the **Project ID** and paste it in as the value for **project_id** below (between the 2 single quotes).

2. Find the **Collection ID** (for the collection that contains the IBM Annual Report 2022) and paste it in as the value for **collection_ids** below (between the 2 single quotes).

There are a few parameters defined for Watson Discovery Search:

* **passages.enabled**: A Boolean that specifies whether the service returns a set of the most relevant passages from the documents that were returned by a query that uses the natural_language_query parameter. Watson Discovery uses sophisticated algorithms to determine the best passages of text from all of the documents that are returned by a query. The passages are displayed as a section within each document query result and are ordered by passage relevance. Including passage retrieval in queries increases the response time because it takes more time to score the passages.

* **passages.max_per_document**: One passage is returned per document by default. You can increase the maximum number of passages to return per document by specifying a higher number in the passages.max_per_document parameter.

* **passages.find_answers**: By default, Watson Discovery provides answers by returning the entire passage that contains the answer to a natural language query. When the answer-finding feature is enabled, Watson Discovery also provides a "short answer" within the passage, and a confidence score to show whether the "short answer" answers the question that is explicit or implicit in the user query.

* **natural_language_query**: Use a natural language query to enter queries that are expressed in natural language, as might be received from a user in a conversational or free-text interface, such as IBM Watson Assistant. The parameter uses the entire input as the query text. It does not recognize operators. The maximum query string length for a natural language query is 2048.

For more details on the query parameters, see https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-query-parameters.
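
The parameters described above can be assembled in one place before the query is sent. This is only a sketch: `build_query_args` is a hypothetical helper, and it assumes the 2048 limit is counted in characters. It returns a dictionary shaped like the keyword arguments used in the query cell of this notebook.

```python
MAX_NLQ_LENGTH = 2048  # limit quoted above; assumed here to mean characters


def build_query_args(question, max_per_document=5, find_answers=True):
    """Return keyword arguments shaped like those passed to discovery.query()."""
    if len(question) > MAX_NLQ_LENGTH:
        raise ValueError(
            f"Query is {len(question)} characters; the limit is {MAX_NLQ_LENGTH}."
        )
    return {
        "natural_language_query": question,
        "passages": {
            "enabled": True,                       # return relevant passages
            "max_per_document": max_per_document,  # default in the service is 1
            "find_answers": find_answers,          # also return short answers
        },
    }


query_args = build_query_args("What efforts has IBM been making in sustainability?")
print(query_args["passages"])
```

The result could then be unpacked into the call, for example `discovery.query(project_id=..., collection_ids=[...], **query_args)`.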

In [None]:
# Utilize the IBM Watson Discovery service to query a collection for information based on a natural language query
response = discovery.query(
  project_id='<YOUR PROJECT ID HERE>',
  collection_ids = ['<YOUR COLLECTION ID HERE>'],
  passages = {'enabled': True,
              'max_per_document': 5,
              'find_answers': True},
  natural_language_query = question
).get_result()

with open('data.json', 'w') as f:
    json.dump(response, f)

The next 4 blocks parse the output. You should not need to change them.

In [None]:
# Inspecting the key fields in the WD output
response.keys()

In [None]:
# Only one result is expected, because the collection holds a single document
len(response['results'])

In [None]:
# Removing some tags
passages = response['results'][0]['document_passages']
passages = [p['passage_text'].replace('<em>', '').replace('</em>', '').replace('\n','') for p in passages]
passages

In [None]:
# Concatenating passages
context = '\n '.join(passages)
context
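
The parsing cells above assume exactly one result that contains passages. A slightly more defensive variant is sketched below for collections with several documents. The function name `collect_passages` and the sample response are made up for illustration; only the keys `results`, `document_passages`, and `passage_text` come from the cells above. Unlike the original cell, this sketch replaces newlines with a space rather than removing them, so that words on adjacent lines are not fused.

```python
def collect_passages(response):
    """Gather cleaned passage texts from every result in a query response."""
    cleaned = []
    for result in response.get("results", []):
        # A result may have no passages at all, so default to an empty list
        for passage in result.get("document_passages", []):
            text = passage.get("passage_text", "")
            text = text.replace("<em>", "").replace("</em>", "").replace("\n", " ")
            if text.strip():
                cleaned.append(text.strip())
    return cleaned


# Made-up response for illustration, not real Watson Discovery output
sample_response = {
    "results": [
        {"document_passages": [{"passage_text": "First <em>sample</em>\npassage."}]},
        {},  # a result without any passages
        {"document_passages": [{"passage_text": "Second passage."}]},
    ]
}

passages_demo = collect_passages(sample_response)
context_demo = "\n ".join(passages_demo)
print(context_demo)
```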

## 3. Creating Prompt

This section creates a prompt with instructions and context to allow the LLM to generate answers based on the passages retrieved by Watson Discovery, and on the rules specified below.

In [None]:
# https://huggingface.co/blog/llama2#how-to-prompt-llama-2

# Adjacent string literals are joined automatically, so each rule sits on its own line
prompt = (
    "<s>[INST] <<SYS>> "
    "Please answer the following question in one sentence using this text. "
    "If the question is unanswerable, say 'unanswerable'. "
    "If you responded to the question, don't say 'unanswerable'. "
    "Do not include information that's not relevant to the question. "
    "Do not answer other questions. "
    "Make sure the language used is English. "
    "Do not use repetitions. "
    "Question: " + question +
    " <</SYS>> " + context + " [/INST]"
)

# complete_prompt = context + instruction + question

print("----------------------------------------------------------------------------------------------------")
print("*** Prompt:" + prompt + "***")
print("----------------------------------------------------------------------------------------------------")
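
For comparison, the Hugging Face post linked in the cell above describes a layout in which only the instructions sit between `<<SYS>>` and `<</SYS>>`, and the user message follows. The sketch below arranges the same ingredients that way. It is a variation, not the prompt used in this notebook; the function name `build_llama2_prompt` and the shortened rule text are illustrative assumptions.

```python
# Shortened rule set, adapted from the prompt in this notebook
SYSTEM_RULES = (
    "Please answer the following question in one sentence using this text. "
    "If the question is unanswerable, say 'unanswerable'. "
    "Do not include information that's not relevant to the question."
)


def build_llama2_prompt(question, context, system_rules=SYSTEM_RULES):
    """Place the rules in the system block and the text plus question after it."""
    user_message = f"Text: {context}\n\nQuestion: {question}"
    return f"<s>[INST] <<SYS>>\n{system_rules}\n<</SYS>>\n\n{user_message} [/INST]"


demo_prompt = build_llama2_prompt(
    question="What is covered?",
    context="Placeholder passage text.",
)
print(demo_prompt)
```

Keeping the template in a function makes it easier to compare prompt variants against the same question and context.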

## 4. Configuring watsonx.ai

The following section defines the function that configures the Large Language Model (LLM). You do not need to edit this block: it reads **api_key**, **url**, and **watsonx_project_id** from the credentials cell that follows, so run that cell before calling the function.

In [None]:
# Initialize the watsonx model
def get_model(model_type, max_tokens, min_tokens, decoding, temperature):

    # Generation parameters applied to every call made with this model
    generate_params = {
        GenParams.MAX_NEW_TOKENS: max_tokens,
        GenParams.MIN_NEW_TOKENS: min_tokens,
        GenParams.DECODING_METHOD: decoding,
        GenParams.TEMPERATURE: temperature,
    }

    # api_key, url, and watsonx_project_id are defined in the credentials cell below
    model = Model(
        model_id=model_type,
        params=generate_params,
        credentials={
            "apikey": api_key,
            "url": url
        },
        project_id=watsonx_project_id
    )

    return model

This section provides the credentials for watsonx.ai.

1. Find the watsonx.ai **Project ID** (not the one for Watson Discovery) and paste the value into **watsonx_project_id** (between the 2 double quotes).

2. Find the IBM Cloud **API Key** (not the one for Watson Discovery) and paste the value into **api_key** (between the 2 double quotes).

In [None]:
# URL of the hosted LLMs is hardcoded because at this time all LLMs share the same endpoint
url = "https://us-south.ml.cloud.ibm.com"

# Replace with your watsonx project id (look up in the project Manage tab)
watsonx_project_id = "<YOUR WATSONX.AI PROJECT ID HERE>"

# Replace with your IBM Cloud key
api_key = "<YOUR IBM CLOUD API KEY HERE>"

The following block sets the parameters for the LLM. In a PoX, you may want to vary these values to show a client how they can get the best results.

1. **model_type** specifies the LLM being used. In the example below it is the llama-2-70b-chat model. You can change it to other models. Note that the size of the model has implications for resource usage. You may wish to try some of the others in a PoX to see whether they provide different results. The block below lists 5 models, 4 of them commented out so that llama2 is used; to try a different one, uncomment it and comment out the current one.

2. **max_tokens** specifies the maximum number of output tokens. Keep in mind that 1 token does not equal 1 word. As a rough rule of thumb for English text, expect about 4 tokens for every 3 words, although the exact ratio depends on the model's tokenizer.

3. **min_tokens** specifies the minimum number of output tokens.

4. **decoding** specifies the decoding method. You can also choose **sampling** decoding, in which case you can specify more parameters (such as **Top-P** and **Top-K**). More information on these additional parameters can be found in the watsonx.ai Technical Sales Level 3 class (https://learn.ibm.com/course/view.php?id=13452).

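5. **temperature** specifies how conservative or creative the model will be. The lower it is, the more conservative it is. The range is from 0 to 2. Temperature only takes effect with **sampling** decoding; with greedy decoding, as configured in the block below, it is ignored.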
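
To make the sampling option in item 4 concrete, here is a minimal sketch of a parameter dictionary for sampling decoding. The helper `build_sampling_params` is hypothetical. The plain string keys are assumed to match the values behind the `GenParams` constants, and `"sample"` is assumed to match `DecodingMethods.SAMPLE`. The default Top-P and Top-K values are arbitrary examples, not recommendations.

```python
def build_sampling_params(max_tokens=100, min_tokens=50, temperature=0.7,
                          top_p=0.9, top_k=50, random_seed=None):
    """Build generation parameters for sampling decoding, with basic checks."""
    if min_tokens > max_tokens:
        raise ValueError("min_tokens cannot exceed max_tokens")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 < top_p <= 1:
        raise ValueError("top_p is a probability mass and must be in (0, 1]")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    params = {
        "decoding_method": "sample",  # assumed value of DecodingMethods.SAMPLE
        "max_new_tokens": max_tokens,
        "min_new_tokens": min_tokens,
        "temperature": temperature,
        "top_p": top_p,  # sample from the smallest token set reaching this mass
        "top_k": top_k,  # sample only from the k most likely tokens
    }
    if random_seed is not None:
        params["random_seed"] = random_seed  # makes sampled output repeatable
    return params


sampling_params = build_sampling_params(temperature=0.5, random_seed=42)
print(sampling_params)
```

A dictionary like this could be supplied as `params` in place of `generate_params` when constructing the model, which would let a PoX compare greedy and sampled answers for the same prompt.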

In [None]:
# Set up watsonx model and parameters
model_type = "meta-llama/llama-2-70b-chat"
# model_type = "google/flan-t5-xxl"
# model_type = "ibm/granite-13b-chat-v1"
# model_type = "ibm/granite-13b-instruct-v1"
# model_type = "ibm/mpt-7b-instruct2"
max_tokens = 100
min_tokens = 50
decoding = DecodingMethods.GREEDY
temperature = 0.7

# Get the watsonx model
model = get_model(model_type, max_tokens, min_tokens, decoding, temperature)

## 5. Answer Generation

This block generates the answer based on the input prompt, the specified parameters, and above all the specified Watson Discovery collection of data.

In [None]:
# Send a prompt to model
generated_response = model.generate(prompt)
response_text = generated_response['results'][0]['generated_text']

# Print model response
print("--------------------------------- Generated response -----------------------------------")
print(response_text)
print("*********************************************************************************************")
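
Because the prompt tells the model to reply 'unanswerable' when the passages do not contain an answer, an application would typically check for that reply before showing the text to a user. The sketch below shows one way to do so. `extract_answer` is a hypothetical helper, and the sample dictionaries are made up; they mimic only the `results` / `generated_text` structure indexed in the cell above.

```python
def extract_answer(generated_response):
    """Return the generated text, or None when the model declined to answer."""
    results = generated_response.get("results") or []
    if not results:
        return None
    text = results[0].get("generated_text", "").strip()
    # Treat an empty reply or a bare 'unanswerable' as "no answer"
    if not text or text.lower().strip(" .'\"") == "unanswerable":
        return None
    return text


# Made-up responses for illustration only
answered = {"results": [{"generated_text": "  A one-sentence answer.  "}]}
declined = {"results": [{"generated_text": " Unanswerable. "}]}

print(extract_answer(answered))
print(extract_answer(declined))
```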