# Step 1: Retrieving Information from Different Sources (Websites, Files, Databases, ...)

## 1A: Loading Web Content 

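We'll use LangChain's `SeleniumURLLoader` (from `langchain_community`) to retrieve content from websites. It loads each page in a browser driven by Selenium and extracts the page text with the Unstructured library: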

In [1]:
from langchain_community.document_loaders import SeleniumURLLoader

In [2]:
urls = ["https://raw.githubusercontent.com/synaptrixai/SpiLLI/refs/heads/main/README.md"]
loader = SeleniumURLLoader(urls=urls, browser='chrome', headless=True)
docs = loader.load()

In [3]:
print(docs[0].page_content)

# SpiLLI
SpiLLI provides infrastructure to manage, host, deploy and run decentralized AI inference

SpiLLI infrastructure comprises of two components: 1. SpiLLI SDK (a library / framework to write decentralized AI applications) and 2. SpiLLIHost (a host software allowing decentralized nodes to execute and connect AI models to peer nodes)

**Note SpiLLI is currently in beta testing and comes with no warranties, can have bugs and we appreciate you helping us iron out all its flaws with feedback and suggestions in the Issues and Discussions tabs in this repository**

# System requirements

We currently support Ubuntu 24.04 and Windows 10/11 operating systems for **SpiLLIHost** (host nodes for AI models). SpiLLIHost currently requires a NVidia GPU (with its driver installed) to run the AI models. Support for other OS and GPU/CPU variants will be coming soon.

SpiLLI SDK currently provides an interface to python 3.12 (support for other python versions and languages coming soon).

# Installa

### Text Splitting

Split the loaded content into smaller, manageable chunks so that each one can be embedded and retrieved on its own:

In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
documents_web = text_splitter.split_documents(docs)
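To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter first tries to break on separators such as blank lines, newlines and spaces); it only illustrates how consecutive chunks share `chunk_overlap` characters, so text near a chunk boundary appears in both neighbouring chunks. The function name and toy input are our own.

```python
def split_fixed_window(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous
    one, so consecutive chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Toy example: 26 letters, windows of 10 characters overlapping by 4
chunks = split_fixed_window("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)  # ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

With the real splitter above, `chunk_overlap=200` plays the same role: roughly the last 200 characters of one chunk can reappear at the start of the next.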

## 1B: Working with Files

### Loading PDF Documents

For local PDF files, we'll use `PyPDFLoader` (it requires the `pypdf` package).

For the purpose of this tutorial, let's download an example PDF (a survey on how Agentic RAG systems are built):

In [5]:
import os

import requests

url = "https://arxiv.org/pdf/2501.09136"
filename = "AgenticRAG.pdf"

# Only download the PDF if it is not already present locally
if not os.path.exists(filename):
    try:
        # Send a GET request to the URL (the timeout avoids hanging indefinitely)
        response = requests.get(url, timeout=60)
        # Raise an exception for non-2xx status codes
        response.raise_for_status()
        with open(filename, "wb") as f:
            f.write(response.content)
        print(f"PDF downloaded successfully and saved as {filename}")
    except (requests.RequestException, OSError) as e:
        print(f"An error occurred while downloading the PDF: {e}")
        print(f"Please open {url} in your browser, download the file and save it as "
              f"{filename} in the scripts folder to use with this tutorial")

PDF downloaded successfully and saved as AgenticRAG.pdf


In [6]:
from langchain_community.document_loaders import PyPDFLoader
# Load content from a PDF file (PyPDFLoader returns one Document per page)
loader = PyPDFLoader("AgenticRAG.pdf")
documents_pdf = loader.load()

In [7]:
len(documents_pdf)

39

## 1C: Working with Databases

In [8]:
import sqlite3

# Create a SQLite database file
conn = sqlite3.connect("tutorial.db")
cursor = conn.cursor()

# Create a table
cursor.execute("""
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    content TEXT
)
""")

# Insert sample data
cursor.executemany("INSERT INTO documents (content) VALUES (?)", [
    ("LangChain makes it easy to work with LLMs.",),
    ("SQLLoader allows you to load data from SQL databases.",),
    ("This is a sample document for the tutorial.",),
])

conn.commit()
conn.close()

In [9]:
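from langchain_community.document_loaders import SQLDatabaseLoader
from langchain_community.utilities import SQLDatabase

# Connect to the SQLite database created above (other SQLAlchemy URIs, e.g. for PostgreSQL, work the same way)
db = SQLDatabase.from_uri("sqlite:///tutorial.db")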
loader = SQLDatabaseLoader(
    db=db,
    query="SELECT content FROM documents;"
)
documents_db = loader.load()
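Conceptually, the loader runs the query and turns each result row into one document. The sketch below reproduces that idea with the standard-library `sqlite3` module alone, against a throwaway in-memory database. The `Record` type, the `load_rows_as_records` helper and the `column: value` text layout are hypothetical choices for illustration, not necessarily the exact format `SQLDatabaseLoader` produces.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_rows_as_records(conn: sqlite3.Connection, query: str) -> list[Record]:
    """Run a query and convert each result row into one Record."""
    cursor = conn.execute(query)
    # cursor.description holds one tuple per result column; item 0 is the column name
    columns = [col[0] for col in cursor.description]
    records = []
    for rownum, row in enumerate(cursor.fetchall()):
        # Render the row as "column: value" lines (an assumed layout)
        text = "\n".join(f"{name}: {value}" for name, value in zip(columns, row))
        records.append(Record(page_content=text, metadata={"row": rownum}))
    return records


# Build an in-memory copy of the tutorial table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY AUTOINCREMENT, content TEXT)")
conn.executemany(
    "INSERT INTO documents (content) VALUES (?)",
    [("LangChain makes it easy to work with LLMs.",),
     ("This is a sample document for the tutorial.",)],
)
records = load_rows_as_records(conn, "SELECT content FROM documents ORDER BY id;")
conn.close()
print(records[0].page_content)  # content: LangChain makes it easy to work with LLMs.
```

The `ORDER BY id` makes the row order deterministic; without it SQL does not guarantee any particular order.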

Many other data sources may be of interest to you. You can find a comprehensive list of community document loaders at:

https://python.langchain.com/api_reference/community/document_loaders.html

# Step 2: Encoding & Indexing Retrievable Data

To search and retrieve the relevant information efficiently later, we encode each document as an embedding vector and build an index over these vectors.

We will use FAISS, a library for efficient similarity search over dense vectors, as the vector store for this index.
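Under the hood, an exact ("flat") vector index keeps one embedding per document and answers a query by comparing the query embedding against every stored vector. LangChain's `FAISS.from_documents` builds such a flat index and, by default, ranks by L2 (Euclidean) distance. The pure-Python sketch below shows the same idea on made-up 3-dimensional vectors (real `all-MiniLM-L6-v2` embeddings have 384 dimensions). `FlatIndex` is a hypothetical illustration of exact nearest-neighbour search, not FAISS's implementation.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FlatIndex:
    """Exact nearest-neighbour index: brute-force search over every stored vector."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.texts: list[str] = []

    def add(self, vector: list[float], text: str) -> None:
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query: list[float], k: int = 2) -> list[tuple[str, float]]:
        # Score every stored vector, then keep the k closest (smallest distance first)
        scored = [(text, l2_distance(query, vec)) for vec, text in zip(self.vectors, self.texts)]
        return sorted(scored, key=lambda pair: pair[1])[:k]


# Made-up 3-d "embeddings" for three documents
index = FlatIndex()
index.add([1.0, 0.0, 0.0], "SpiLLI system requirements")
index.add([0.9, 0.1, 0.0], "Installing the SpiLLI SDK")
index.add([0.0, 0.0, 1.0], "Agentic RAG survey")

for text, dist in index.search([1.0, 0.05, 0.0], k=2):
    print(f"{dist:.3f}  {text}")
# 0.050  SpiLLI system requirements
# 0.112  Installing the SpiLLI SDK
```

Brute-force search costs one distance computation per stored vector, which is fine for a tutorial-sized index; FAISS also offers approximate index types for much larger collections.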

In [10]:
from langchain_community.vectorstores import FAISS  # requires the faiss-cpu (or faiss-gpu) package

In [11]:
from langchain_huggingface import HuggingFaceEmbeddings

In [12]:
embedding_function = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


#### Create a vectorstore for our index 

In [13]:
# Create the vector store
vectorstore = FAISS.from_documents(documents_web+documents_pdf, embedding_function)

#### Create a retriever object to get indexed documents from the vectorstore

In [14]:
retriever = vectorstore.as_retriever()

# Step 3: Use the Retriever to Augment the LLM Context

We set up a chain of operations using LangChain's pipe (`|`) syntax. For a given query, the chain:

1. Retrieves the relevant documents from the vector database.
2. Calls the LLM with the user query and the retrieved documents as input (instead of the user query alone).
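Stripped of LangChain, these two steps are plain function composition. The sketch below mimics the chain with hypothetical stand-ins: `toy_retrieve` (a word-overlap ranker in place of the FAISS retriever), a shortened prompt template in the spirit of `rlm/rag-prompt`, and `fake_llm`, which only reports the prompt length instead of calling a model. `format_docs` is the same helper used in the RAG chain cell.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical minimal document type with a page_content attribute."""
    page_content: str


DOCS = [
    Doc("SpiLLI provides infrastructure to run decentralized AI inference."),
    Doc("SpiLLIHost currently requires an NVidia GPU."),
    Doc("FAISS builds an index for similarity search."),
]

# Shortened template in the spirit of the rlm/rag-prompt template
PROMPT = "Use the following context to answer the question.\nQuestion: {question}\nContext: {context}\nAnswer:"


def toy_retrieve(query: str, k: int = 2) -> list[Doc]:
    """Stand-in retriever: rank documents by the number of shared lowercase words."""
    words = set(query.lower().split())
    return sorted(DOCS, key=lambda d: -len(words & set(d.page_content.lower().split())))[:k]


def format_docs(docs: list[Doc]) -> str:
    return "\n\n".join(doc.page_content for doc in docs)


def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM call; a real chain would send the prompt to a model."""
    return f"(model answer based on a {len(prompt)}-character prompt)"


def rag_answer(question: str) -> tuple[str, str]:
    """Step 1: retrieve context. Step 2: call the LLM with query + context. Returns (prompt, answer)."""
    context = format_docs(toy_retrieve(question))
    prompt = PROMPT.format(question=question, context=context)
    return prompt, fake_llm(prompt)


prompt, answer = rag_answer("What does SpiLLIHost require?")
print(prompt)
```

In the LangChain version, the dictionary `{"context": retriever | format_docs, "question": RunnablePassthrough()}` plays the role of the first two lines of `rag_answer`, and `rag_prompt | llm | StrOutputParser()` plays the role of the rest.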

### Let's get an LLM object using SpiLLI

In [15]:
from SpinLLM import SpinLLM
# models to try: llama3.2:latest, gemma3:1b
llm = SpinLLM(
    model_name="llama3.2:latest",
    encryption_path='/root/.spilli/SpiLLI.pem',
    temperature=0.8,
    max_tokens=512
)

Connecting using cus_id: cus_RYZvwKJNJ4M6Jc


Invalid message format: Missing command field


### Create the RAG chain

In [16]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Get the RAG prompt template from LangChain Hub (you can ignore any LangSmith API key warnings)
rag_prompt = hub.pull("rlm/rag-prompt")



In [17]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

qa_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | rag_prompt
    | llm
    | StrOutputParser()
)

In [18]:
rag_prompt  # Inspect what the prompt looks like

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])

### Run queries using the RAG chain

In [19]:
query = "What can you tell me about SpiLLI?"
response = qa_chain.invoke(query)
print(response)

SpiLLI is a decentralized network of AI hosts and AI users that allows applications to connect to the best available computing resources without relying on centralized cloud providers. It uses three key principles: dynamic resource allocation, decentralized nodes, and AI models. SpiLLI is currently in beta testing and has no warranties, with potential bugs and limitations.


In [20]:
query = "What are the principles on which decentralized AI is built?"
response = qa_chain.invoke(query)
print(response)

The decentralized AI is built on three key principles: Dynamic Resource Allocation, Decentralized Infrastructure, and Real-time Adaptation.


#### Naive RAG

The code above implements a naive RAG system: a similarity search is run directly between the user query and the documents, and the closest documents are added to the LLM's context to generate a response to the query.

This works in principle, but it does not always yield the best results.

Why, you may ask?

When we retrieve context this way, documents are ranked only by how similar their embeddings are to the embedding of the raw query, and that similarity is often driven by shared keywords rather than by the intent behind the question. For example, "Tell me about SpiLLI" retrieves "To install SpiLLI SDK, ...", which does not quite capture what we asked. The installation document scores high on similarity and is relevant, but it is not what we were looking for. We had to put the keywords "decentralized AI" and "principles" explicitly into our second query to coax the retriever into returning the document that would have been a more meaningful answer to "What can you tell me about SpiLLI?".

This is a common problem with naive RAG, which is why more advanced RAG techniques are often used in practice (see AgenticRAG.pdf for more ideas and details).

In [21]:
query = "What can you tell me about Naive RAG in Agentic RAG systems?"
response = qa_chain.invoke(query)
print(response)

I don't know.


In [22]:
query = "What is the role of ranking and agents in RAG systems?"
response = qa_chain.invoke(query)
print(response)

According to the context, the role of ranking and agents in RAG systems is to enable autonomous decision-making processes, allowing the system to adapt to complex queries and handle diverse data sources.


In [23]:
query = "What are the different RAG strategies?"
response = qa_chain.invoke(query)
print(response)

I don't know.


#### As you may have noticed, naive RAG has no step that interprets the intent of the query or checks whether the retrieved documents actually answer it, so it often fails to retrieve the relevant information even when that information is present in the vector database. This is where additional LLM and agentic steps are needed to improve retrieval performance.

#### We can improve retrieval performance using ideas like "re-ranking", "chain-of-thought" and agentic retrieval, where AI models are used in a multi-step retrieval process to build a better set of documents to feed into the final response-generation LLM.

# Step 4: Performance improvement

## Re-ranking Models for RAG

### Exercise: Implement document re-ranking as an intermediate step between retrieval and response generation
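One possible starting point, sketched under assumptions: over-fetch candidates from the retriever (for example with `vectorstore.as_retriever(search_kwargs={"k": 10})`), re-score every (query, document) pair with a more careful relevance model, and keep only the best few for the prompt. In practice that scorer is often a cross-encoder model; here `overlap_score` is a hypothetical, deliberately simple stand-in so the sketch runs without extra dependencies, and the candidate strings are made up.

```python
from typing import Callable


def rerank(query: str, candidates: list[str], score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Re-order retrieved candidates by a (query, document) relevance score and keep the top_n."""
    scored = [(score(query, doc), i, doc) for i, doc in enumerate(candidates)]
    # Highest score first; ties keep the retriever's original order (lower index first)
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [doc for _, _, doc in scored[:top_n]]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query words found in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


candidates = [
    "to install spilli sdk run pip install",
    "spilli principles dynamic resource allocation decentralized nodes",
    "system requirements ubuntu windows",
]
print(rerank("spilli principles", candidates, overlap_score, top_n=2))
# ['spilli principles dynamic resource allocation decentralized nodes', 'to install spilli sdk run pip install']
```

Because `rerank` takes the scorer as a parameter, swapping the toy scorer for a real model only changes the `score` argument, not the re-ranking logic.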

## Chain-of-Thought for RAG

### Exercise: Implement a multi-step chain of thought for response generation from the retrieved documents
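A hedged sketch of one way to structure this exercise: a first LLM call extracts the facts from the retrieved documents that bear on the question, and a second call reasons step by step over only those notes to produce the answer. The prompt wording and the helper names are our own assumptions, and `llm` is any callable from a prompt string to a response string. Adapting the `SpinLLM` object from Step 3 to that interface (for example through its `invoke` method) is an assumption you would need to verify. The `echo_llm` stub exists only so the sketch runs on its own.

```python
from typing import Callable

EXTRACT_PROMPT = (
    "Question: {question}\n\nDocuments:\n{context}\n\n"
    "Step 1: List, one per line, only the facts from the documents that help answer the question."
)
ANSWER_PROMPT = (
    "Question: {question}\n\nRelevant facts:\n{notes}\n\n"
    "Step 2: Using only these facts, reason step by step and then give a concise answer."
)


def multi_step_answer(question: str, docs: list[str], llm: Callable[[str], str]) -> tuple[str, str]:
    """Two-step chain: extract relevant facts first, then answer from those facts."""
    context = "\n\n".join(docs)
    notes = llm(EXTRACT_PROMPT.format(question=question, context=context))
    answer = llm(ANSWER_PROMPT.format(question=question, notes=notes))
    return notes, answer


calls: list[str] = []


def echo_llm(prompt: str) -> str:
    """Stub model: records each prompt and returns its last line, so the flow can be traced."""
    calls.append(prompt)
    return prompt.splitlines()[-1]


notes, answer = multi_step_answer("What is SpiLLI?", ["SpiLLI runs decentralized AI inference."], echo_llm)
print(len(calls))  # 2
```

Splitting extraction from answering lets you inspect (or filter) the intermediate notes before the final generation, which is the main benefit over a single prompt.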