## RAG - Query Documents from watsonx.data Milvus in watsonx.ai (Web)

### Overview
This Jupyter Notebook provides a step-by-step guide to building retrieval-augmented generation (RAG) with watsonx.data Milvus as the vector database (knowledge base).
The documents are already stored as vector embeddings in Milvus, so we are now ready to run queries against the vector database.
We will use the same `sentence-transformers/all-MiniLM-L6-v2` embedding model to generate the query vector and then use Milvus to find the most similar vectors in the vector database.

- Author: ahmad.muzaffar@ibm.com (APAC Ecosystem Technical Enablement).
- This material has been adapted from material originally produced by Katherine Ciaravalli, Ken Bailey and George Baklarz.
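
To make the idea of "finding the most similar vectors" concrete, here is a minimal sketch of a brute-force nearest-neighbour search using L2 (Euclidean) distance. The toy vectors, document IDs, and the helper names `l2_distance` and `nearest` are illustrative assumptions, not part of the notebook. Milvus typically answers the same question through an index, so the distances it reports need not match this exact computation; only the ranking idea carries over.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query_vec, stored, k):
    """Return the k (doc_id, distance) pairs closest to query_vec, nearest first."""
    scored = [(doc_id, l2_distance(query_vec, vec)) for doc_id, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Toy 3-dimensional "embeddings" (hypothetical values; the real ones have 384 dimensions)
toy_store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.9, 0.1, 0.0],
}

top2 = nearest([1.0, 0.0, 0.0], toy_store, k=2)
print(top2)  # doc_a first (identical vector), then doc_c
```

A smaller distance means a closer match, which is why the later filtering step keeps results whose distance is at or below a threshold.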

### 1. Install and import libraries

In [None]:
# Install libraries
!pip install grpcio==1.60.0 
!pip install pymilvus
!pip install ipython-sql==0.4.1
!pip install sqlalchemy==1.4.46 "pyhive[presto]"
!pip install sentence_transformers
!pip install python-dotenv
!pip install ibm-cloud-sdk-core

In [None]:
# Import libraries
import os

from dotenv import load_dotenv
from ibm_cloud_sdk_core import IAMTokenManager
from ibm_watson_studio_lib import access_project_or_space
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams

from sentence_transformers import SentenceTransformer
from pymilvus import (
    Milvus,
    IndexType,
    Status,
    connections,
    FieldSchema,
    DataType,
    Collection,
    CollectionSchema,
)
from pymilvus import utility

import warnings
warnings.filterwarnings('ignore')

### 2. Credential Settings
To streamline the credential setup process, we'll create a config.env file to consolidate all necessary credentials. 
1. Download the config.env file here (https://ibm.box.com/s/f1ku32ekh8jmfmievvxmpnwsvuttwa2x) and populate it with the required credentials listed below.
2. Upload the config.env file into your watsonx.ai project.

watsonx.ai:
- PROJECT_ID
- ACCESS_TOKEN
- IBM_CLOUD_URL (Example: https://us-south.ml.cloud.ibm.com)
- API_KEY 

watsonx.data:  
- LH_HOST_NAME (Example: useast.services.cloud.techzone.ibm.com)
- LH_PORT (From TechZone: Watsonx UI:xxxxx)

Milvus:
- MILVUS_HOST (Example: useast.services.cloud.techzone.ibm.com)
- MILVUS_PORT (From TechZone: Milvus Port - Server:xxxxx)
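
Because a blank or missing entry in config.env only surfaces later as a confusing connection error, it can help to check the settings up front. The sketch below is one possible check, assuming the key names listed above; the helper name `missing_keys` and the example values are hypothetical.

```python
# Keys listed in the credential section above
REQUIRED_KEYS = [
    "PROJECT_ID", "ACCESS_TOKEN", "IBM_CLOUD_URL", "API_KEY",
    "LH_HOST_NAME", "LH_PORT", "MILVUS_HOST", "MILVUS_PORT",
]

def missing_keys(settings, required=REQUIRED_KEYS):
    """Return the required keys that are absent or blank in a mapping of settings."""
    return [key for key in required if not str(settings.get(key) or "").strip()]

# Example with a partially filled mapping (placeholder values only)
example = {"PROJECT_ID": "<id>", "API_KEY": "", "MILVUS_HOST": "host.example"}
print(missing_keys(example))
```

In the notebook, the same function could be called as `missing_keys(os.environ)` once `load_dotenv('config.env')` has run.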


In [None]:
# Credential settings
wslib = access_project_or_space({
        'token': '<YOUR WATSONX.AI PROJECT ACCESS TOKEN HERE>',
        'project_id': '<YOUR WATSONX.AI PROJECT ID>'
})

# Download the config.env file and load the content
wslib.download_file('config.env')
load_dotenv('config.env')

# Define connection variables
api_key = os.getenv("API_KEY", None)
ibm_cloud_url = os.getenv("IBM_CLOUD_URL", None) 
project_id = os.getenv("PROJECT_ID", None)

creds = {
    "url": ibm_cloud_url,
    "apikey": api_key 
}

access_token = IAMTokenManager(
    apikey = api_key,
    url = "https://iam.cloud.ibm.com/identity/token"
).get_token()

In [None]:
# Download the cert.crt file.
# .crt is a standard extension for certificate files, usually encoded in PEM
# (Privacy-Enhanced Mail) format. The file contains the public key and
# certificate information used in SSL/TLS communications to validate the
# identity of a server or client.
wslib.download_file('cert.crt')
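
A PEM-encoded certificate is plain text wrapped in `-----BEGIN CERTIFICATE-----` and `-----END CERTIFICATE-----` marker lines. As a rough sanity check that the downloaded file is what it should be, one could count those markers. This is a sketch only: the helper name `looks_like_pem` and the sample text are assumptions, and the check does not validate the certificate itself.

```python
PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"

def looks_like_pem(text):
    """True if the text has at least one certificate block with matching begin/end markers."""
    begins = text.count(PEM_BEGIN)
    ends = text.count(PEM_END)
    return begins > 0 and begins == ends

# Placeholder body, not a real certificate
sample = PEM_BEGIN + "\nTUlJQ0ZBS0U=\n" + PEM_END + "\n"
print(looks_like_pem(sample))
print(looks_like_pem("<html>error page</html>"))
```

In the notebook this could be applied to the contents of `cert.crt`, for example `looks_like_pem(open('cert.crt').read())`.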

### 3. Set Up Connection

In [None]:
# Retrieve the credential information
host = os.getenv("MILVUS_HOST", None)
port = os.getenv("MILVUS_PORT", None)
password = 'password'
user = 'ibmlhadmin'
server_pem_path = 'cert.crt'

# Set connection
connections.connect(alias = 'default',
                   host = host,
                   port = port,
                   user = user,
                   password = password,
                   server_pem_path = server_pem_path,
                   server_name = 'watsonxdata',
                   secure = True)

### 4. Load Milvus Collection 

In [None]:
# Check collection name
utility.list_collections()

In [None]:
# Load collection
basic_collection = Collection("rag_docs")      
basic_collection.load()

### 5. Query Milvus

In [None]:
# Query function that vectorizes the query, runs a semantic search in Milvus, and returns the search results
def query_milvus(query, num_results):
    
    # Vectorize query (this must be the same embedding model that produced the stored vectors)
    model = SentenceTransformer('sentence-transformers/all-minilm-l12-v2') # 384 dim
    query_embeddings = model.encode([query])

    # Search
    search_params = {
        "metric_type": "L2", 
        "params": {"nprobe": 5}
    }
    results = basic_collection.search(
        data=query_embeddings, 
        anns_field="vector", 
        param=search_params,
        limit=num_results,
        expr=None, 
        output_fields=['article_text'],
    )
    return results

### 6. Prompt watsonx.ai LLM with Context (Query Results)

In [None]:
# Sample query on topics related to how climate change may relate to other industries and processes related to your business

question_text = "How do businesses negatively affect climate change?"
#question_text = "What can a business do to have a positive effect on climate change?"
#question_text = "How can a business reduce their carbon footprint?"

# Irrelevant sample query
#question_text = "How much is the processing fee for credit card replacement?"

In [None]:
# Define a distance threshold (adjust based on your data and model)
threshold = 1.5  # Example value, tune it as necessary
print(f"Threshold value is {threshold}.")

num_results = 3
results = query_milvus(question_text, num_results)
hits = results[0]  # hits for the single query vector

relevant_chunks = []
# Iterate over the hits actually returned, which may be fewer than num_results
for i in range(len(hits)):
    hit_id = hits.ids[i]
    distance = hits.distances[i]
    
    # Filter results based on the distance threshold (smaller L2 distance = more similar)
    if distance <= threshold:
        print(f"id: {hit_id}")
        print(f"distance: {distance}")
        
        text = hits[i].entity.get('article_text')
        relevant_chunks.append(text)
        
        print(f"Relevant Chunk {i+1}:")
        print(text)
        print("\n")
    else:
        print(f"Result {i+1} skipped due to high distance ({distance}).")

# Fall back to a placeholder only when no result passed the threshold
if not relevant_chunks:
    relevant_chunks = "NO RELEVANT CONTEXT FOUND"
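
The threshold step can also be viewed as a small pure function, which makes it easy to try different threshold values without a Milvus connection. This is a sketch under assumptions: `filter_hits` is a hypothetical helper, and the sample texts and distances are made up for illustration.

```python
def filter_hits(hits, threshold):
    """Keep the text of (text, distance) pairs whose distance is within the threshold."""
    return [text for text, distance in hits if distance <= threshold]

# Made-up (text, distance) pairs, ordered from nearest to farthest
sample_hits = [
    ("chunk about emissions", 0.82),
    ("chunk about supply chains", 1.31),
    ("unrelated chunk", 1.87),
]

kept = filter_hits(sample_hits, threshold=1.5)
print(kept)
print(filter_hits(sample_hits, threshold=0.5))  # nothing passes a very strict threshold
```

An empty result is the signal that no relevant context was found for the question.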

In [None]:
print(relevant_chunks)

In [None]:
# This function constructs a prompt from the retrieved context and the question
def make_prompt(context, question_text):
    return (f"{context}\n\nPlease answer a question using this text. "
          + f"If there is no text found, say \"unanswerable\"."
          + f"\n\nQuestion: {question_text}")

# Build prompt w/ Milvus results
# Embed retrieved passages (context) and user question into the prompt text

# Separate passages with a blank line; a plain string (the fallback message) is used as-is
if isinstance(relevant_chunks, str):
    context = relevant_chunks
else:
    context = "\n\n".join(relevant_chunks)

prompt = make_prompt(context, question_text)

print(prompt)
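
How the retrieved passages are joined affects what the model sees: joining with a blank line rather than an empty string keeps the boundary between passages visible. The sketch below illustrates this with a hypothetical helper, `build_context`, and made-up sample passages; it also drops empty passages and trims stray whitespace.

```python
def build_context(chunks, separator="\n\n"):
    """Join retrieved passages, keeping a visible boundary between them."""
    return separator.join(chunk.strip() for chunk in chunks if chunk and chunk.strip())

# Made-up passages; the empty string stands for a blank chunk that should be skipped
sample_chunks = ["Factories emit greenhouse gases.", "  Shipping burns fossil fuels.  ", ""]
sample_context = build_context(sample_chunks)
print(sample_context)
```

The resulting string could then be passed to `make_prompt` in place of the directly joined chunks.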

### 7. Set up LLM, parameters and inferencing

In [None]:
# Model inferencing parameters
params = {
        GenParams.DECODING_METHOD: "greedy",
        GenParams.MIN_NEW_TOKENS: 1,
        GenParams.MAX_NEW_TOKENS: 500,
        GenParams.TEMPERATURE: 0,
}

# LLM
model = Model(
        model_id='ibm/granite-13b-chat-v2', 
        params=params, credentials=creds, 
        project_id=project_id
)

# Inferencing
response = model.generate_text(prompt)
print(f"Question: {question_text}\nAnswer: {response}")