# 1. Build AI Apps with RAG using watsonx.ai, LangChain & Vector DB

## Overview

In this hands-on lab, you will use LangChain, a framework for building LLM applications.
You will learn about:
- 1.1 Simple Prompt to LLM using LangChain
- 1.2 Zero-Shot Prompt and Few-Shot Prompt using Prompt Template
- 1.3 Sequential Prompts using Simple Sequential Chain
- 1.4 Retrieval Question Answering (QA)
- 1.5 Documents Summarization

## 1.1 Simple Prompt to LLM using LangChain
- Basic use case of sending prompts to an LLM in watsonx (without using LangChain).
- In this example, we send a simple prompt directly to the LLM (Google flan-ul2).

In [None]:
# Install library
!pip install chromadb==0.4.2
# Keep LangChain pinned; a later `--upgrade` would override this version
!pip install langchain==0.0.312
!pip install flask-sqlalchemy --user
!pip install pypdf 
!pip install sentence-transformers
!pip install langchain_openai

In [None]:
# Import libraries
import os
import warnings

#from dotenv import load_dotenv
from time import sleep
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM
from langchain import PromptTemplate # Langchain Prompt Template
from langchain.chains import LLMChain, SimpleSequentialChain # Langchain Chains
from langchain.document_loaders import PyPDFLoader
from langchain.indexes import VectorstoreIndexCreator # Vectorize db index with chromadb
from langchain.embeddings import HuggingFaceEmbeddings # For using HuggingFace embedding models
from langchain.text_splitter import CharacterTextSplitter # Text splitter

warnings.filterwarnings("ignore")

In [None]:
# Set the API key, URL, and project ID (replace the placeholders below)
#load_dotenv()
api_key = "<YOUR API KEY HERE>"
ibm_cloud_url = "https://us-south.ml.cloud.ibm.com"
project_id = "<YOUR PROJECT ID HERE>"

# Fail early if any value is empty or missing
if not api_key or not ibm_cloud_url or not project_id:
    raise Exception("One or more credentials are missing!")

creds = {
    "url": ibm_cloud_url,
    "apikey": api_key
}

In [None]:
# Initialize the watsonx model
params = {
    GenParams.DECODING_METHOD: "sample",
    GenParams.TEMPERATURE: 0.2,
    GenParams.TOP_P: 1,
    GenParams.TOP_K: 25,
    GenParams.REPETITION_PENALTY: 1.0,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 20
}

llm_model = Model(
    model_id="google/flan-ul2",
    params=params,
    credentials=creds,
    project_id=project_id
)

print("Done initializing LLM.")

In [None]:
# Send a simple prompt to model
countries = ["France", "Japan", "Australia"]

try:
  for country in countries:
    question = f"What is the capital of {country}"
    res = llm_model.generate_text(question)
    print(f"The capital of {country} is {res.capitalize()}")
except Exception as e:
  print(e)

## 1.2 Zero-Shot Prompt and Few-Shot Prompt using Prompt Template
- Real use cases can be more complex. Instead of sending plain prompts to the LLM, we use a LangChain Prompt Template.
- In this example, we use a LangChain Prompt Template to send the prompt to the LLM (Google flan-ul2).
- Advantages of using a Prompt Template:
    1. **Modularity**: With a prompt template, you can define a structured template once and reuse it with different input variables. This makes your code more modular and easier to maintain.
    2. **Dynamic Input**: Prompt templates allow for dynamic input variables, such as "country" in this example. This means you can easily change the input value without modifying the entire prompt structure.
    3. **Readability**: Templates provide a clear structure for the prompt, making it easier for other developers to understand the purpose of the prompt and how it interacts with the model.
    4. **Flexibility**: You can customize the template to suit your specific use case or domain requirements. This flexibility enables you to adapt the prompt to different scenarios without rewriting the entire prompt logic.
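
The reuse described above can be illustrated without any library. The following is a minimal sketch under stated assumptions: the helper `render_prompt` is hypothetical and only mimics the idea with Python's `str.format`; it is not how LangChain implements `PromptTemplate`.

```python
# Minimal sketch of the prompt-template idea using only the standard library.
# `render_prompt` is a hypothetical helper, not part of LangChain.
TEMPLATE = "What is the capital of {country}?"


def render_prompt(template: str, **variables: str) -> str:
    """Fill the named placeholders in `template` with the given values."""
    return template.format(**variables)


# One template, reused for several inputs
prompts = [render_prompt(TEMPLATE, country=c) for c in ["France", "Japan", "Australia"]]
for p in prompts:
    print(p)
```

The template is defined once, and only the `country` value changes between calls.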

## Zero-shot Prompt
- A zero-shot prompt is the simplest type of prompt. It gives the model no examples, only the instruction.
- You can phrase the instruction as a question, e.g. *"Explain the concept of Generative AI."*
- You can also give the model a 'role', e.g. *"You are a Data Scientist. Explain the concept of Generative AI."*

In [None]:
# Define the prompt template
prompt = PromptTemplate(
  input_variables=["country"],
  template= "What is the capital of {country}?",
)

try:
  # To use LangChain, wrap the watsonx model in the LangChain extension
  lc_llm_model = WatsonxLLM(model=llm_model)
  
  # Define a chain based on model and prompt
  chain = LLMChain(llm=lc_llm_model, prompt=prompt)

  # Getting predictions
  countries = ["France", "Japan", "Australia"]
  for country in countries:
    response = chain.run(country)
    print(prompt.format(country=country) + " = " + response.capitalize())
    sleep(0.5)
except Exception as e:
  print(e)

## Few-shot Prompt
- A few-shot prompt gives the model a few examples so that it can work out how to handle similar tasks.
- It helps the model understand the task better.
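
To make the idea concrete, here is a rough sketch of how solved examples can be placed in front of a new question. The function `build_few_shot_prompt` and the `Human:` / `AI:` labels are assumptions chosen for illustration; the exact text that LangChain sends to the model may differ.

```python
# Sketch: turn few-shot examples into a single text prompt.
# The "Human:" / "AI:" labels are an assumed layout for illustration only.
demo_examples = [
    {"input": "What is the capital of Sweden?", "output": "Stockholm"},
    {"input": "What is the capital of Malaysia?", "output": "Kuala Lumpur"},
]


def build_few_shot_prompt(examples: list[dict], question: str) -> str:
    """Place solved examples before the new question so the model can imitate them."""
    lines = []
    for ex in examples:
        lines.append(f"Human: {ex['input']}")
        lines.append(f"AI: {ex['output']}")
    # The new question comes last, with the answer slot left open
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)


few_shot_text = build_few_shot_prompt(demo_examples, "What is the capital of France?")
print(few_shot_text)
```

Changing the example outputs (for instance to full sentences) is a simple way to steer the format of the model's answer.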

In [None]:
from langchain.prompts import FewShotChatMessagePromptTemplate, ChatPromptTemplate

# Few-shot examples
examples = [
    {"input": "What is the capital of Sweden?", "output": "Stockholm"},
    {"input": "What is the capital of Malaysia?", "output": "Kuala Lumpur"},
]

example_prompt = ChatPromptTemplate.from_messages(
    [('human', '{input}'), ('ai', '{output}')]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
)

final_prompt = ChatPromptTemplate.from_messages(
    [
        #('system', 'You are a helpful AI Assistant'),
        few_shot_prompt,
        ('human', '{input}'),
    ]
)

In [None]:
try:
  # To use LangChain, wrap the watsonx model in the LangChain extension
  lc_llm_model = WatsonxLLM(model=llm_model)
  
  # Define a chain based on model and prompt
  chain = LLMChain(llm=lc_llm_model, prompt=final_prompt)

  # Getting predictions
  countries = ["France", "Japan", "Australia"]
  for country in countries:
    prompt = f"What is the capital of {country}?"
    print(prompt)
    response = chain.run(prompt)
    print(response)
    #print(prompt.format(country=country) + " = " + response.capitalize())
    sleep(0.5)
except Exception as e:
  print(e)

In [None]:
from langchain.prompts import FewShotChatMessagePromptTemplate, ChatPromptTemplate

# Few-shot examples (outputs written as full sentences)
examples = [
    {"input": "What is the capital of Sweden?", "output": "The capital of Sweden is Stockholm"},
    {"input": "What is the capital of Malaysia?", "output": "The capital of Malaysia is Kuala Lumpur"},
]

example_prompt = ChatPromptTemplate.from_messages(
    [('human', '{input}'), ('ai', '{output}')]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
)

final_prompt = ChatPromptTemplate.from_messages(
    [
        #('system', 'You are a helpful AI Assistant'),
        few_shot_prompt,
        ('human', '{input}'),
    ]
)

In [None]:
try:
  # To use LangChain, wrap the watsonx model in the LangChain extension
  lc_llm_model = WatsonxLLM(model=llm_model)
  
  # Define a chain based on model and prompt
  chain = LLMChain(llm=lc_llm_model, prompt=final_prompt)

  # Getting predictions
  countries = ["France", "Japan", "Australia"]
  for country in countries:
    prompt = f"What is the capital of {country}?"
    print(prompt)
    response = chain.run(prompt)
    print(response)
    #print(prompt.format(country=country) + " = " + response.capitalize())
    sleep(0.5)
except Exception as e:
  print(e)

## 1.3 Sequential Prompts using Simple Sequential Chain
- By using Simple Sequential Chain in LangChain, you can easily chain multiple prompts to create sequential prompts.
- Prompt chaining, also known as sequential prompting, feeds the response to one prompt into the next prompt in the sequence.
- Each subsequent prompt is informed by the AI's previous response, creating a chain of interactions that progressively refines the model's output.
- Reference: [SimpleSequentialChain](https://api.python.langchain.com/en/latest/chains/langchain.chains.sequential.SimpleSequentialChain.html)
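
The chaining mechanism can be sketched in plain Python. This is an illustration under assumptions, not LangChain's implementation: `fake_llm` is a hypothetical stub that lets the example run offline, and `run_sequential` is a made-up helper that passes each reply into the next template.

```python
from typing import Callable


# Sketch of prompt chaining with a stand-in "LLM".
# `fake_llm` is a hypothetical stub; a real model call would replace it.
def fake_llm(prompt: str) -> str:
    """Echo the prompt so the data flow between steps stays visible."""
    return f"<reply to: {prompt}>"


def run_sequential(templates: list[str], first_input: str, llm: Callable[[str], str]) -> str:
    """Feed each step's output into the single `{input}` slot of the next template."""
    value = first_input
    for template in templates:
        prompt = template.format(input=value)
        value = llm(prompt)
    return value


steps = [
    "Generate a random question about {input}: Question: ",
    "Answer the following question: {input}",
]
final_answer = run_sequential(steps, "an animal", fake_llm)
print(final_answer)
```

Because the stub echoes its prompt, the printed result shows the first reply nested inside the second prompt, which is the essence of a sequential chain.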

In [None]:
# Create two sequential prompts 
pt1 = PromptTemplate(input_variables=["topic"], template="Generate a random question about {topic}: Question: ")
pt2 = PromptTemplate(
    input_variables=["question"],
    template="Answer the following question: {question}",
)

In [None]:
# Instantiate 2 models (Note, these could be different models depending on use case)
# Note the .to_langchain() method which returns a WatsonxLLM wrapper, like above.
model_1 = Model(
    model_id="google/flan-ul2",
    params=params,
    credentials=creds,
    project_id=project_id
).to_langchain()

model_2 = Model(
    model_id="google/flan-ul2",
    credentials=creds,
    project_id=project_id
).to_langchain()

In [None]:
# Construct the sequential chain
prompt_to_model_1 = LLMChain(llm=model_1, prompt=pt1)
prompt_to_model_2 = LLMChain(llm=model_2, prompt=pt2)
qa = SimpleSequentialChain(chains=[prompt_to_model_1, prompt_to_model_2], verbose=True)

In [None]:
# Run our chain with the topic: "an animal"
# Play around with providing different topics to see the output. eg. cars, the Roman empire
try:
  print(qa.run("an animal"))
except Exception as e:
  print(e)

## 1.4 Retrieval Question Answering (QA)
- Using Retrieval Question Answering (QA) in LangChain, you can retrieve relevant passages from your documents and use them to answer your prompt (question).
- To begin, download a sample PDF file from this link: [what_is_generative_ai.pdf](https://ibm.box.com/v/what-is-generative-ai)
- Then, upload the file to your project and create an access token.
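
Before running the real pipeline, it may help to see the retrieval step in miniature. The sketch below is a toy under stated assumptions: word-count vectors stand in for real embeddings, a Python list stands in for the vector database, and the sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter


# Sketch of retrieval: score chunks against the question, keep the best, build a prompt.
# Word-count vectors stand in for real embeddings; the chunks are made-up sample text.
def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


sample_chunks = [
    "Machine learning lets computers learn patterns from data.",
    "Generative AI can create text, images and code.",
    "Vector databases store embeddings for similarity search.",
]
sample_question = "What is machine learning?"
context = "\n".join(retrieve(sample_question, sample_chunks))
rag_prompt = f"Use the context to answer.\nContext:\n{context}\nQuestion: {sample_question}"
print(rag_prompt)
```

The retrieved chunk is placed into the prompt as context, so the model answers from the document rather than from memory alone.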

In [None]:
# Import library
from ibm_watson_studio_lib import access_project_or_space
from langchain.chains import RetrievalQA

In [None]:
# Create access token in project
token = "<YOUR ACCESS TOKEN HERE>"
wslib = access_project_or_space({"token":token})
wslib.download_file("what_is_generative_ai.pdf")

In [None]:
# Load PDF document
pdf = 'what_is_generative_ai.pdf'
loaders = [PyPDFLoader(pdf)]

In [None]:
# Index loaded PDF
index = VectorstoreIndexCreator(
    embedding = HuggingFaceEmbeddings(),
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)).from_loaders(loaders)

In [None]:
# Initialize watsonx google/flan-ul2 model
params = {
    GenParams.DECODING_METHOD: "sample",
    GenParams.TEMPERATURE: 0.2,
    GenParams.TOP_P: 1,
    GenParams.TOP_K: 100,
    GenParams.MIN_NEW_TOKENS: 50,
    GenParams.MAX_NEW_TOKENS: 300
}

model = Model(
    model_id="google/flan-ul2",
    params=params,
    credentials=creds,
    project_id=project_id
).to_langchain()

In [None]:
# Initialize RAG chain
chain = RetrievalQA.from_chain_type(llm=model, 
                                    chain_type="stuff", 
                                    retriever=index.vectorstore.as_retriever(), 
                                    input_key="question")

In [None]:
# Answer based on the document
res = chain.run("What is Machine Learning?")
print(res)

In [None]:
# Answer based on the document
res = chain.run("What are the problems generative AI can solve?")
print(res)

In [None]:
# Answer based on the document
res = chain.run("What are the risks of Generative AI?")
print(res)

## 1.5 Documents Summarization
- Text summarization is an NLP task that produces short but informative summaries of long texts. LLMs can be used to summarize news articles, research papers, technical documents, and other kinds of text.
- Summarizing long documents can be challenging. To generate summaries, you need to apply a summarization strategy to your indexed documents.
- In this example, we will summarize long documents from these 3 websites:
     - https://www.ibm.com/blog/what-can-ai-and-generative-ai-do-for-governments/
     - https://www.govexec.com/technology/2023/07/what-will-federal-government-do-generative-ai/388595/
     - https://www.thomsonreuters.com/en-us/posts/government/ai-use-government-agencies/
- When building a summarizer app, these are two methods to pass your documents into the LLM’s context window:
    1. **Method 1: Stuff** - Simply “stuff” all documents into a single prompt. (Simplest method)
    2. **Method 2: MapReduce** - Summarize each document on its own in a “map” step and then “reduce” the summaries into a final summary.

In [None]:
# Install library
!pip3 install transformers chromadb langchain

In [None]:
# Import libraries
import os
#from dotenv import load_dotenv
from langchain.document_loaders import WebBaseLoader
from langchain.chains.summarize import load_summarize_chain
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.foundation_models.extensions.langchain import WatsonxLLM

## Method 1: Stuff
- This method simply “stuffs” all documents into a single prompt.
- To use it, set `stuff` as the `chain_type` of your chain.

### Stuff without using Prompt Template
- The prompt and LLM pipeline is wrapped in a single call: `load_summarize_chain`.
- Set `stuff` as the `chain_type`.
- In this example, you will see that the relatively short document will be summarized successfully.

In [None]:
# Initialize document loader
loader = WebBaseLoader("https://www.ibm.com/blog/what-can-ai-and-generative-ai-do-for-governments/")
doc = loader.load()

# Initialize watsonx google/flan-t5-xxl model
# You might need to tweak some of the runtime parameters to optimize the results
params = {
    GenParams.DECODING_METHOD: "sample",
    GenParams.TEMPERATURE: 0.15,
    GenParams.TOP_P: 1,
    GenParams.TOP_K: 20,
    GenParams.REPETITION_PENALTY: 1.0,
    GenParams.MIN_NEW_TOKENS: 20,
    GenParams.MAX_NEW_TOKENS: 205
}

flan_model = Model(
    model_id="google/flan-t5-xxl", 
    params=params,
    credentials=creds,
    project_id=project_id
).to_langchain()

# Set chain_type as 'stuff'
chain = load_summarize_chain(flan_model, chain_type="stuff")

# Run summarization task
res = chain.run(doc)
print(res)

### Stuff using Prompt Template
- You will load the document into a prompt template and run a "stuffed document chain". Note that we can stuff a list of documents as well.
- `StuffDocumentsChain` is the chain that `load_summarize_chain` builds when `chain_type` is `stuff`; here you construct it directly.
- In this example, you should see a summary similar to the one above (sampling can cause small differences).
- Reference: [StuffDocumentsChain](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.StuffDocumentsChain.html#langchain.chains.combine_documents.stuff.StuffDocumentsChain)

In [None]:
# Import libraries
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

# Define prompt
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# Define LLM chain
llm_chain = LLMChain(llm=flan_model, prompt=prompt)

# Define StuffDocumentsChain
stuff_chain = StuffDocumentsChain(
    llm_chain=llm_chain, document_variable_name="text"
)

# Run summarization task 
res = stuff_chain.run(doc)
print(res)

### Limitation of 'Stuff' Method due to LLMs token limit
- In this example, you will see that as we add more documents (which increases the number of tokens), this error is raised: `the number of input tokens 5222 cannot exceed the total tokens limit 4096 for this model`
- This is due to the model's token limit (its maximum context window length).
- With LangChain, you can work around this by using `MapReduce`, which splits the input into chunks and summarizes them recursively.
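
A quick pre-check can show when stuffing is likely to fail. The sketch below is only an approximation under stated assumptions: whitespace-separated words stand in for tokens (a real tokenizer typically produces more), the helper names are hypothetical, and reserving room for the generated tokens is an assumption rather than documented behavior of the service. The limit of 4096 comes from the error message above and 205 from `MAX_NEW_TOKENS`.

```python
# Sketch: check whether documents fit the context window before "stuffing" them.
# Whitespace-separated words are a rough stand-in for tokens, so treat the
# count as an optimistic estimate rather than an exact figure.
def approx_tokens(text: str) -> int:
    """Approximate the token count by counting whitespace-separated words."""
    return len(text.split())


def fits_context(texts: list[str], limit: int = 4096, reserved_for_output: int = 205) -> bool:
    """True if the combined input still leaves room for the generated tokens."""
    total = sum(approx_tokens(t) for t in texts)
    return total + reserved_for_output <= limit


short_doc = "word " * 1000  # stands in for one short article
long_doc = "word " * 4500   # stands in for a much longer article

print(fits_context([short_doc]))
print(fits_context([short_doc, long_doc]))
```

When the check fails, the documents need to be chunked and summarized in stages instead of being sent in a single prompt.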

In [None]:
# Load a new document from URL
loader_2 = WebBaseLoader('https://www.govexec.com/technology/2023/07/what-will-federal-government-do-generative-ai/388595/')
doc_2 = loader_2.load()

# Combine the new document to the previous document
docs = doc + doc_2

# Run the stuff chain
try:
  res = stuff_chain.run(docs)
  print(res)
except Exception as e:
  print(e)

## Method 2: MapReduce
- This method summarizes each document on its own in a “map” step and then “reduces” the summaries into a final summary.
- Reference: [ReduceDocumentsChain](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.reduce.ReduceDocumentsChain.html#langchain.chains.combine_documents.reduce.ReduceDocumentsChain)
- Reference: [MapReduceDocumentsChain](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.map_reduce.MapReduceDocumentsChain.html#langchain.chains.combine_documents.map_reduce.MapReduceDocumentsChain)
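
The control flow of map-reduce can be sketched with stand-in functions. Everything here is hypothetical and for illustration only: `map_stub` and `reduce_stub` replace the two LLM calls, `map_reduce` is a made-up helper, and the sample documents are invented text.

```python
from typing import Callable


# Sketch of map-reduce summarization with stand-ins for the two LLM calls.
def map_stub(text: str) -> str:
    """Toy 'map' step: keep only the first sentence of a document."""
    return text.split(".")[0].strip() + "."


def reduce_stub(summaries: list[str]) -> str:
    """Toy 'reduce' step: join the partial summaries into one text."""
    return " ".join(summaries)


def map_reduce(
    docs: list[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[list[str]], str],
) -> tuple[list[str], str]:
    # Map: summarize each document independently
    partial = [map_fn(d) for d in docs]
    # Reduce: combine the partial summaries into a final summary
    return partial, reduce_fn(partial)


sample_docs = [
    "AI helps agencies answer citizens faster. It also cuts paperwork.",
    "Governments worry about bias. Rules are being drafted.",
]
partial_summaries, final_summary = map_reduce(sample_docs, map_stub, reduce_stub)
print(partial_summaries)
print(final_summary)
```

Each document only ever passes through the map step on its own, so no single call has to hold all of the input at once.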

In [None]:
from transformers import AutoTokenizer
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ReduceDocumentsChain, MapReduceDocumentsChain
from time import perf_counter

# Add a 3rd document
print("Loading 3rd document...")
loader_3 = WebBaseLoader("https://www.thomsonreuters.com/en-us/posts/government/ai-use-government-agencies/")
doc_3 = loader_3.load()
docs = docs + doc_3

# Map
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes 
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
print("Init map chain...")
map_chain = LLMChain(llm=flan_model, prompt=map_prompt)

# Reduce
reduce_template = """The following is a set of summaries:
{doc_summaries}
Take these and distill it into a final, consolidated summary of the main themes. 
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)
print("Init reduce chain...")
reduce_chain = LLMChain(llm=flan_model, prompt=reduce_prompt)

# Takes a list of documents, combines them into a single string, and passes this to an LLMChain
print("Stuff documents using reduce chain...")
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="doc_summaries"
)

# Combines and iteratively reduces the mapped documents
reduce_documents_chain = ReduceDocumentsChain(
    # This is final chain that is called.
    combine_documents_chain=combine_documents_chain,
    # If documents exceed context for `StuffDocumentsChain`
    collapse_documents_chain=combine_documents_chain,
    # The maximum number of tokens to group documents into.
    token_max=4000
)

# Combining documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    # Map chain
    llm_chain=map_chain,
    # Reduce chain
    reduce_documents_chain=reduce_documents_chain,
    # The variable name in the llm_chain to put the documents in
    document_variable_name="docs",
    # Return the results of the map steps in the output
    return_intermediate_steps=True,
    verbose=True
)

# Note here we are using a pretrained tokenizer from Hugging Face, specifically for the flan-t5-xxl model.
# You might want to play around with different tokenizers and text splitters to see how the results change.
print("Init chunk splitter...")
try:
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl") # Hugging Face tokenizer for flan-t5-xxl
    text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(
        tokenizer=tokenizer
    )
    split_docs = text_splitter.split_documents(docs)
    print(f"Using {len(split_docs)} chunks: ")
except Exception as ex:
    print(ex)

print("Run map-reduce chain. This should take ~15-30 seconds...")
try:
    t1_start = perf_counter()
    results = map_reduce_chain(split_docs)
    steps = results["intermediate_steps"]
    output = results["output_text"]
    t1_stop = perf_counter()
    print("Elapsed time:", round((t1_stop - t1_start), 2), "seconds.\n") 

    print("Results from each chunk: \n")
    for idx, step in enumerate(steps):
        print(f"{idx + 1}. {step}\n")
    
    print("\n\nFinal output:\n")
    print(output)

    print("\nDone.")
except Exception as e:
    print(e)

- As you can see, LangChain, together with a tokenizer for the model, can quickly divide a large amount of text into chunks and recursively summarize it into a concise sentence or two. You might want to try different documents, tweak the model runtime parameters, or try a different model altogether to see how things behave. One of the most important things to note is that the way the input is chunked and tokenized matters a lot: poor map results lead to a lower-quality final summary.
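
To experiment with how chunking affects the result, a small splitter is enough. This is a simplified sketch under assumptions: `split_into_chunks` is a hypothetical helper that counts whitespace-separated words instead of model tokens, so it only approximates what a tokenizer-based splitter does.

```python
# Sketch: split text into word windows of a fixed size with optional overlap.
# Words stand in for tokens here; a tokenizer-based splitter counts differently.
def split_into_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Return windows of `chunk_size` words; neighbours share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop before a window that would contain only already-covered overlap words
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


sample_text = " ".join(f"w{i}" for i in range(10))
print(split_into_chunks(sample_text, chunk_size=4))
print(split_into_chunks(sample_text, chunk_size=4, overlap=1))
```

Overlap repeats a few words at each boundary, which can help when a sentence would otherwise be cut between two chunks.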