## Installation

Install the required libraries:

In [None]:
!pip install llama-index-llms-ibm==0.1.0 --user
!pip install llama-index-embeddings-ibm==0.1.0 --user
!pip install llama-index==0.10.65 --user

Import the required libraries:

In [None]:
# Use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

from llama_index.llms.ibm import WatsonxLLM
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.ibm import WatsonxEmbeddings
from llama_index.core import VectorStoreIndex

## Loading the data

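`SimpleDirectoryReader` loads local files into LlamaIndex as `Document` objects. It can read every file in a directory or, as here, only the files listed in `input_files`, and it picks a reader based on each file's extension (plain text is the fallback). For PDFs, the default reader returns one `Document` per page, so `documents[0]` is the first page of the paper.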

A `Document` is a container around any data source.
Key attributes of a `Document`:
* metadata: annotations that can be appended to the text (e.g. author name, document title)
* relationships: references to other documents or nodes
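To make the two attributes concrete, here is a minimal sketch using a hypothetical `MiniDocument` dataclass (not the real LlamaIndex class). It assumes metadata is rendered as `key: value` lines placed before the text, which roughly mirrors how LlamaIndex includes metadata in a node's content; the metadata keys and the `"page-2"` relationship target are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class MiniDocument:
    """Hypothetical stand-in for a LlamaIndex Document (not the real class)."""
    text: str
    metadata: dict = field(default_factory=dict)
    # Maps a relationship name (e.g. "next") to the id of another document/node.
    relationships: dict = field(default_factory=dict)

    def content_with_metadata(self) -> str:
        # Render each metadata entry as "key: value", one per line,
        # then separate the metadata block from the text with a blank line.
        metadata_str = "\n".join(f"{k}: {v}" for k, v in self.metadata.items())
        return f"{metadata_str}\n\n{self.text}" if metadata_str else self.text


doc = MiniDocument(
    text="Sample text from the first page.",
    metadata={"file_name": "lora-paper.pdf", "page_label": "1"},
    relationships={"next": "page-2"},
)
print(doc.content_with_metadata())
```

Keeping metadata separate from the text lets it be attached to every chunk later without being duplicated inside the source text itself.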

In [None]:
!wget "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/pfSOEORnYBZppsnhmZ1a8A/lora-paper.pdf"
documents = SimpleDirectoryReader(input_files=["lora-paper.pdf"]).load_data()
# get first page of paper
documents[0]

## Splitting the data

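Next, split the documents into smaller chunks called nodes. A `Node` is a chunk of a source document (a piece of text, an image, or other data) that carries its own metadata and relationships to other nodes.
`chunk_size` sets the maximum size of each node, measured in tokens.
Here we use `SentenceSplitter` with a `chunk_size` of 500; it prefers to keep complete sentences together within a chunk.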
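The idea behind sentence-aware splitting can be sketched in a few lines. This is a simplified, hypothetical `split_into_chunks` function, not LlamaIndex's implementation: it counts words instead of tokens, splits sentences with a naive regular expression, and ignores the overlap between chunks that the real `SentenceSplitter` supports via `chunk_overlap`.

```python
import re


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `chunk_size` words.

    Simplified sketch: words stand in for tokens, and a single sentence longer
    than `chunk_size` is kept as its own (oversized) chunk rather than split.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and current_len + n_words > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "LoRA freezes the weights. It adds low-rank matrices. Training is cheaper."
for chunk in split_into_chunks(text, chunk_size=8):
    print(chunk)
# LoRA freezes the weights. It adds low-rank matrices.
# Training is cheaper.
```

Packing whole sentences means a chunk rarely ends mid-thought, which tends to make each chunk's embedding more meaningful.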

In [None]:
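from llama_index.core.schema import MetadataMode

# Split the documents into nodes of at most 500 tokens each
splitter = SentenceSplitter(chunk_size=500)
nodes = splitter.get_nodes_from_documents(documents)
print(len(nodes))  # outputs 87

# Show the first node's text together with its metadata
node_content = nodes[0].get_content(metadata_mode=MetadataMode.ALL)
print(node_content)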

## Indexing the chunks

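An `Index` is the data structure at the core of a RAG pipeline: it organizes the nodes so that the ones relevant to a query can be found quickly.

Embedding means converting text into numerical vectors, so that texts with similar meaning end up close together in vector space.

Here we use LlamaIndex's `VectorStoreIndex`, which converts each node into a vector representation and stores the vectors in a vector store.

After indexing, we can use the index as a retriever to fetch the nodes most relevant to a query.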
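Retrieval over a vector store boils down to a similarity search. The sketch below is a brute-force cosine-similarity search over a plain dictionary, which is roughly what a small in-memory vector store does; the node ids and the 3-dimensional vectors are made up for illustration (real embedding models produce hundreds of dimensions).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, top_k):
    """Return the top_k (score, node_id) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), node_id) for node_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy "embeddings" keyed by made-up node ids.
store = {
    "node-gpt2": [0.9, 0.1, 0.0],
    "node-lora": [0.1, 0.9, 0.1],
    "node-adam": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
for score, node_id in retrieve(query, store, top_k=2):
    print(f"{node_id}: {score:.3f}")
# node-gpt2: 0.994
# node-lora: 0.110
```

`similarity_top_k=3` in the retriever below plays the same role as `top_k` here: only the highest-scoring nodes are returned.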

In [None]:
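# Embedding model hosted on watsonx.ai; project_id "skills-network" assumes
# the lab environment this notebook was written for.
watsonx_embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="skills-network",
    # Truncate each input to the model's (assumed) 512-token limit;
    # a value like 3 would embed only the first 3 tokens of every chunk.
    truncate_input_tokens=512,
)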

index = VectorStoreIndex(
    nodes=nodes, 
    embed_model=watsonx_embedding, 
    show_progress=True
)

# Use the index as a retriever that returns the 3 most similar nodes
base_retriever = index.as_retriever(similarity_top_k=3)
source_nodes = base_retriever.retrieve("GPT-2")  # query about GPT-2
for node in source_nodes:
    # print(node.metadata)
    print("---------------------------------------------")
    print(f"Score: {node.score:.3f}")
    print(node.get_content())
    print("---------------------------------------------\n\n")

## Querying

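Now integrate an LLM to generate responses. In a RAG pipeline, the LLM receives the nodes retrieved from the `VectorStoreIndex` together with the user's question and generates an answer based on them.
First, define the model.

The Granite model used here is a decoder-only language model: it generates text one token at a time.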
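The step where retrieved nodes are handed to the LLM can be sketched as prompt assembly. `build_rag_prompt` and its template wording are hypothetical; LlamaIndex's query engines use their own prompt templates, but the principle of placing the retrieved text in the prompt as context is the same.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Hypothetical template that places retrieved chunks before the question."""
    # Separate chunks so the model can tell where one source ends.
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer: "
    )


prompt = build_rag_prompt("What is LoRA?", ["chunk A", "chunk B"])
print(prompt)
```

In LlamaIndex itself, this wiring is typically handled by a query engine, for example `index.as_query_engine(llm=watsonx_llm)` once `watsonx_llm` is defined in the next cell.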

In [None]:
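temperature = 0.1      # low temperature: close to deterministic output
max_new_tokens = 75    # upper bound on the length of the generated answer
additional_params = {
    "decoding_method": "sample",  # sample tokens instead of greedy decoding
    "min_new_tokens": 1,
    "top_k": 50,                  # consider only the 50 most likely tokens
    "top_p": 1,                   # 1 = no nucleus (top-p) filtering
}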

watsonx_llm = WatsonxLLM(
    model_id="ibm/granite-3-8b-instruct",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="skills-network",
    temperature=temperature,
    max_new_tokens=max_new_tokens,
    additional_params=additional_params,
)

response = watsonx_llm.complete("What is generative AI?")
print(response)
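To see how `temperature`, `top_k` and `top_p` interact under `"decoding_method": "sample"`, here is a generic sketch of temperature plus top-k plus nucleus sampling over a toy set of token logits. The order in which watsonx.ai applies these filters internally is an assumption, and the tokens and logit values are invented for illustration.

```python
import math
import random


def sample_next_token(logits, temperature=0.1, top_k=50, top_p=1.0, rng=None):
    """Sketch of temperature + top-k + top-p (nucleus) sampling over token logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for sampling")
    rng = rng or random.Random()
    # 1. Temperature: divide logits; values < 1 sharpen the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # 2. Softmax (subtract the max for numerical stability).
    max_logit = max(scaled.values())
    exp = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exp.values())
    probs = sorted(((p / total, tok) for tok, p in exp.items()),
                   key=lambda pair: pair[0], reverse=True)
    # 3. Top-k: keep only the k most likely tokens.
    probs = probs[:top_k]
    # 4. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, tok in probs:
        kept.append((p, tok))
        cumulative += p
        if cumulative >= top_p:
            break
    # 5. Sample from the survivors (choices() normalises the weights).
    tokens = [tok for _, tok in kept]
    weights = [p for p, _ in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = {"AI": 2.0, "models": 1.0, "the": 0.5}
print(sample_next_token(logits, temperature=0.1, top_p=0.9))  # → AI
```

With `temperature=0.1` the most likely token receives almost all of the probability mass, which is why the settings above behave nearly like greedy decoding while still technically sampling.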