Install required dependencies for watsonx, llama-index...

In [1]:
!pip install ibm-watsonx-ai -q
!pip install wget -q
!pip install llama-index-llms-watsonx -q
!pip install llama-index-embeddings-huggingface -q
!pip install llama-index -q


Download the PDF file that will be interrogated...

In [2]:
import os

import wget

pdf_url = "https://www.ibm.com/annualreport/assets/downloads/IBM_Annual_Report_2023.pdf"
input_filename = "IBM_Annual_Report_2023.pdf"

# The PDF is saved to, and later read from, the current working directory.
download_dir = os.getcwd()

# Skip the download if the file is already present.
if not os.path.isfile(input_filename):
    wget.download(pdf_url, out=input_filename)
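
The download step above relies on the third-party "wget" package. As a hedged alternative, the same download-if-missing logic can be sketched with the standard library only. The helper name "download_if_missing" is hypothetical and is not part of the original notebook.

```python
import os
import urllib.request


def download_if_missing(url: str, filename: str) -> bool:
    """Download url to filename unless the file already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.isfile(filename):
        return False
    urllib.request.urlretrieve(url, filename)
    return True


# Example call (requires network access), using the names from the cell above:
# download_if_missing(pdf_url, input_filename)
```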



Load the file using llama-index's SimpleDirectoryReader. The file is loaded and split into a list of Document objects...

In [3]:
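from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Document loader: read only the downloaded PDF rather than every file in the directory.
documents = SimpleDirectoryReader(
    input_files=[os.path.join(download_dir, input_filename)]
).load_data()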

Set up credentials for the Watson Machine Learning inference endpoint that hosts the foundation models: supply the API key and the watsonx project ID. The cell is hidden; its contents are shown in this blog post: https://community.ibm.com/community/user/blogs/ravi-bansal/2024/03/16/document-interrogation-using-retrieval-augmented-g

In [4]:
# The code was removed by Watson Studio for sharing.
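
Since the cell above is hidden, here is a minimal sketch of one way the "credentials" dictionary and "project_id" could be built without hard-coding secrets. This is not the author's hidden cell. The environment variable names and the helper "load_watsonx_settings" are hypothetical, and the dictionary keys "url" and "apikey" are an assumption based on the description of the credentials dictionary (URL plus API key) given below.

```python
import os


def load_watsonx_settings(env=os.environ):
    """Build a credentials dict and project ID from environment variables.

    The variable names used here are illustrative assumptions.
    """
    required = ("WATSONX_URL", "WATSONX_APIKEY", "WATSONX_PROJECT_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Assumed shape: endpoint URL plus API key.
    credentials = {"url": env["WATSONX_URL"], "apikey": env["WATSONX_APIKEY"]}
    return credentials, env["WATSONX_PROJECT_ID"]


# Example call, once the three variables are set in the environment:
# credentials, project_id = load_watsonx_settings()
```

Reading the values from the environment keeps the API key out of the notebook, which matters when the notebook is shared.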

An instance of the IBM watsonx wrapper is initialized using the WatsonX class from llama-index. Note that not all IBM-hosted foundation models are supported by llama-index yet. Version 1 of IBM's Granite models is supported but will be deprecated soon, so we use Meta's llama-2-70b-chat model instead.

We set generation parameters for the model; in the cell below these are the temperature and the maximum number of new tokens. The model ID, the credentials dictionary (URL and API key), and the project ID are also provided as inputs. The WatsonX instance gives us a handle to the "LLAMA_2_70B_CHAT" foundation model so that we can run inference against it in the next step.

In [5]:
from llama_index.llms.watsonx import WatsonX

llm = WatsonX(
    model_id="meta-llama/llama-2-70b-chat",
    credentials=credentials,
    project_id=project_id,
    temperature=0.3,
    max_new_tokens=200,
)
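
The cell above sets temperature=0.3. As a rough illustration of what that parameter conventionally does, the sketch below applies the standard temperature-scaled softmax to made-up logits. How the hosted service applies temperature internally is not shown in this notebook, and the function name and logit values are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.3)
flat = softmax_with_temperature(logits, 1.0)
print("T=0.3:", [round(p, 3) for p in sharp])
print("T=1.0:", [round(p, 3) for p in flat])
```

A lower temperature concentrates probability on the highest-scoring token, which generally makes sampled output more predictable.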

Let us first pose a series of questions directly to the "unaugmented" LLM, that is, before it is given any additional context via RAG. Without that context, it should not be able to provide accurate answers.

In [6]:

separator = "-" * 177
queries = [
    "Which IBM segment had the highest revenue in 2023?",
    "In 2023, who acquired the company Equine Global?",
    "What was IBM revenue (in millions) from the United States in 2023?",
    "Generate a bulleted list of names of companies acquired by IBM in 2023.",
]

# Send each query to the LLM directly, with no retrieved context.
for query in queries:
    print(separator)
    resp = llm.complete(query)
    print(query, resp)
print(separator)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Which IBM segment had the highest revenue in 2023? 
In 2023, the IBM segment with the highest revenue was Cloud & Cognitive Software, which generated $24.3 billion in revenue, accounting for 53% of the company's total revenue. This segment includes IBM's cloud computing, artificial intelligence, and data analytics offerings.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
In 2023, who acquired the company Equine Global? 

Answer: In 2023, the company Equine Global was acquired by Elanco Animal Health.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
What wa

As expected, the answers are not accurate. Let us now provide the LLM with additional relevant context via RAG. First we create vector embeddings, that is, semantically meaningful numerical representations of the text chunks of our document, and build a vector index from them. Here we use the Hugging Face "BAAI/bge-small-en-v1.5" model to generate the embeddings and store them in a VectorStoreIndex, which uses the default in-memory vector store provided by LlamaIndex.

The index makes it quick to find contextually relevant data in the vector store. Note that llama-index provides the method "VectorStoreIndex.from_documents()", which abstracts away the underlying chunking (the chunks are referred to as "Nodes" in llama-index) and directly returns an in-memory index built from the Document objects passed in. In this notebook the index is not persisted to a vector database such as Chroma or Milvus, although that can easily be done with two additional calls. If persisted, the index has to be loaded from the vector database before it can be searched.
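
To make the chunking step that "from_documents()" hides more concrete, here is a deliberately simplified sketch that splits text into fixed-size, overlapping character windows. It is not LlamaIndex's actual splitter, whose behavior and defaults are not described in this notebook. The function name "chunk_text" and the sizes used are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij" * 5  # 50 placeholder characters
for i, chunk in enumerate(chunk_text(sample, chunk_size=40, overlap=10)):
    print(i, len(chunk), chunk)
```

The overlap means that a sentence falling on a chunk boundary still appears intact in at least one chunk, at the cost of some duplicated text in the index.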

In [7]:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use BAAI/bge-small-en-v1.5 as the embedding model for indexing and querying.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.embed_model = embed_model

# Chunk the documents, embed each chunk, and build the in-memory vector index.
index = VectorStoreIndex.from_documents(documents)

Next, from the VectorStoreIndex, we obtain a query engine that will respond to our queries; the LLM instance is passed to it.
When the query engine receives a user query, it creates an embedding of the query and uses the index to retrieve the top 2 most semantically similar (that is, relevant) chunks via a similarity search. These chunks are then supplied to the LLM as context alongside the query. In the code that follows, the same 4 queries that were sent to the "unaugmented" LLM are now passed to the LLM with RAG. The responses are found to be much more accurate.
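
The retrieval step described above can be sketched in plain Python, assuming cosine similarity as the measure and using tiny made-up vectors in place of real embeddings (the bge-small-en-v1.5 model produces 384-dimensional vectors). The function names, the toy chunks, and the prompt layout are all hypothetical. The prompt template that LlamaIndex actually uses is not shown in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


def build_prompt(query, context_chunks):
    """Combine retrieved chunks and the query into one prompt (illustrative layout)."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Toy data: three placeholder chunks with hand-picked 3-dimensional vectors.
chunks = ["chunk about software revenue", "chunk about the weather", "chunk about consulting revenue"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.7, 0.3, 0.1]]
query_vec = [1.0, 0.0, 0.0]

top = retrieve_top_k(query_vec, chunk_vecs, chunks, k=2)
print(build_prompt("Which segment had the highest revenue?", top))
```

Only the two closest chunks reach the LLM, so the quality of the final answer depends directly on how well the embeddings rank the relevant passages.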

In [8]:
query_engine = index.as_query_engine(llm=llm)

separator = "-" * 178
queries = [
    "Which IBM segment had the highest revenue in 2023?",
    "In 2023, who acquired the company Equine Global?",
    "What was IBM revenue (in millions) from the United States in 2023?",
    "Generate a bulleted list of names of companies acquired by IBM in 2023",
]

# Send each query through the RAG query engine, which retrieves context before calling the LLM.
for query in queries:
    print(separator)
    resp = query_engine.query(query)
    print(query, resp)
print(separator)


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Which IBM segment had the highest revenue in 2023? 
The highest IBM segment revenue in 2023 was Software, with a revenue of $26,308 million, which increased 5.1% as reported (5% adjusted for currency) compared to the prior year.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
In 2023, who acquired the company Equine Global?  IBM acquired Equine Global in 2023.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
What was IBM revenue (in millions) from the United States in 2023? 25,309.
-------------------------------------------------------------------------