### Short Theory

<b> Knowing GPT4All before getting started </b>

GPT4All is an open‑source ecosystem and chatbot developed by Nomic AI that lets you run large language models entirely on your local machine - no internet, no external API, and no expensive GPU hardware required. Instead, you download compact model files (typically 3 to 8 GB) and run them on everyday CPUs, ensuring complete privacy and offline functionality. It can use one of many supported models (e.g. GPT4All‑Falcon, Wizard, Meta‑Llama variants). It's a lightweight, privacy-first alternative to cloud-based LLM services, designed to democratize access to powerful language models for individual and enterprise use alike.

<b> Getting to know LlamaIndex, which we will use with both GPT4All and Gemini </b>

LlamaIndex (formerly known as GPT Index) is an open-source data framework designed to help large language models (LLMs) interact intelligently with custom or private data sources. It allows users to ingest and index various data types - such as PDFs, web pages, databases, and APIs - and then structure that data into formats like vector stores, keyword tables, or graphs. Once indexed, users can efficiently query the data using LLMs for tasks like question-answering, summarization, and document analysis.

LlamaIndex supports integration with popular tools like OpenAI, LangChain, Hugging Face, and even local models like GPT4All. It works with different vector databases (e.g., FAISS, Chroma, Weaviate) and provides connectors for a wide range of input formats. The framework is especially useful for building chatbots or retrieval-augmented generation (RAG) systems that require LLMs to work with enterprise or local documents. Overall, LlamaIndex makes it easy to bring structure, searchability, and intelligence to your data when working with language models.

### Using LlamaIndex and GPT4All to read a web page and a document

Libraries used here:

```text
llama-index
llama-index-embeddings-huggingface
python-dotenv
html2text
llama-index-readers-web
protobuf==3.20.3
llama-index-llms-langchain
```

At the time of writing (July 2025), the latest version of llama-index is 0.12.52.

By default, llama-index tries to use OpenAI embeddings, which require an OPENAI_API_KEY. Since I will be working offline with GPT4All, I explicitly specify a local embedding model.

<b> Reading from webpage </b>

Building blocks used:

* Settings: Lets you define globally which LLM, embedding model, or storage backend LlamaIndex should use.
* GPT4All: Integrates a local, private LLM into the system, which suits offline and security-sensitive use cases.
* HuggingFaceEmbedding: Embeds text into vectors using pre-trained Hugging Face models, which is essential for similarity search, indexing, and retrieval.
* Settings.llm / Settings.embed_model: Register these components globally so that LlamaIndex uses them during indexing and querying.
* SimpleWebPageReader: Fetches and parses the content of web pages into clean, readable text.
* VectorStoreIndex: Builds a searchable vector index from the fetched content so that queries can return contextually relevant parts of the document.
* SimpleDirectoryReader: Loads all files from a folder so LlamaIndex can work with your custom documents.
* query_engine.query(): Allows natural language querying over your document content using the LLM.

Why are we using GPT4All from LangChain?

GPT4All is imported from LangChain rather than from LlamaIndex because LlamaIndex does not provide its own native wrapper for GPT4All. Instead, it relies on integrations with external libraries - such as LangChain - to support local language models like GPT4All. This modular design lets LlamaIndex stay focused on its core responsibilities: document ingestion, index construction (including vector and keyword-based indices), query routing, and retrieval-augmented generation (RAG).

For the language model that actually answers queries, LlamaIndex delegates to external providers. These can include LangChain, OpenAI, Hugging Face, or direct integrations with tools like llama-cpp. LangChain includes a built-in wrapper for GPT4All, which lets you load and run a local GPT4All model in a few lines of code. LlamaIndex can then use this LangChain-wrapped GPT4All model (through the llama-index-llms-langchain package listed above) as its underlying LLM for answering queries over indexed data.

> What is a wrapper? A wrapper is like a gift box around something. It doesn't change what's inside; it just makes it easier to use. In code, a wrapper adds a convenient interface around an existing function or object without modifying it.
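
The wrapper idea can be sketched in a few lines of plain Python. `LocalModel` and `ModelWrapper` are hypothetical names made up for this illustration; they are not part of LangChain or LlamaIndex. The wrapper leaves the inner object untouched and only exposes it under the method name a caller expects.

```python
class LocalModel:
    """Stand-in for a local model with its own method name (hypothetical)."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ModelWrapper:
    """Adapts LocalModel to callers that expect a `complete` method."""

    def __init__(self, inner: LocalModel) -> None:
        self.inner = inner  # the wrapped object is stored, not modified

    def complete(self, prompt: str) -> str:
        # Delegate to the wrapped model; the wrapper only translates the call
        return self.inner.generate(prompt)


wrapped = ModelWrapper(LocalModel())
print(wrapped.complete("hello"))
```

LangChain's GPT4All class plays a similar role: it presents the local model through the interface that the rest of the framework expects.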

In [1]:
# Importing Settings to configure global options for LlamaIndex (like embedding model, LLM, etc.)
from llama_index.core import Settings

# Importing GPT4All LLM wrapper from LangChain to use a local GPT4All model as the language model
from langchain.llms import GPT4All

# Importing HuggingFaceEmbedding to use Hugging Face models for generating text embeddings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

In [2]:
# Load a local GPT4All model (Meta-Llama-3) for generating responses using a local LLM
model = GPT4All(model="Meta-Llama-3-8B-Instruct.Q4_0.gguf")

# Use Hugging Face's BAAI/bge-small-en-v1.5 model for generating embeddings of text (used in similarity search)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

In [3]:
# Set the LLM in LlamaIndex's global settings to the local GPT4All model
Settings.llm = model

# Set the embedding model in LlamaIndex's global settings
Settings.embed_model = embed_model

In [4]:
# Import a simple web page reader to scrape and convert web content to plain text
from llama_index.readers.web import SimpleWebPageReader

# Import the vector index builder for enabling similarity-based search over embedded documents
from llama_index.core import VectorStoreIndex

In [12]:
# Define a function to fetch a web page, build an index, and query it using an LLM
def reading_webpage(url: str, user_query: str) -> str:
    
    # Load and process the web page into a text document
    document = SimpleWebPageReader(html_to_text=True).load_data(urls=[url])

    # Create a vector index from the document for efficient semantic search - in memory database
    index = VectorStoreIndex.from_documents(documents=document)

    # Create a query engine from the index to handle user queries
    query_engine = index.as_query_engine()

    # Query the index and return the LLM-generated response
    response = query_engine.query(user_query)
    
    # Convert the Response object to plain text so the return value matches the -> str annotation
    return str(response)

In [6]:
url_to_pass = "https://medium.com/@nishi.paul.in/primary-key-in-sql-c6505990c6d7"
question = "What the content is dealing with? Tell me within 20 words."
webpage_data = reading_webpage(url=url_to_pass, user_query = question)

In [7]:
print(webpage_data)

 The article discusses Primary Key in SQL database management system.
---------------------


Context Information:
https://medium.com/@nishi.paul.in/primary-key-in-sql-c6505990c6d7

Please provide your answer based on the given context information.

I'll wait for your response before proceeding.


<b> Reading from document </b>

In [8]:
# Import a reader to load documents from a local directory
from llama_index.core.readers import SimpleDirectoryReader

In [9]:
# Define a function that reads documents from a folder and answers a user query
def document_reader(doc_file: str, user_query: str) -> str:
    
    # Load documents (e.g., .txt, .pdf, .md) from the specified directory
    document = SimpleDirectoryReader(doc_file).load_data()

    # Create a vector index from the loaded documents for semantic search
    index = VectorStoreIndex.from_documents(documents=document)

    # Build a query engine to process user queries against the index
    query_engine = index.as_query_engine()

    # Query the engine with the user's input and get the response
    response = query_engine.query(user_query)

    # Convert the Response object to plain text and return it
    return str(response)

In [10]:
document_location = "docs"  # SimpleDirectoryReader needs a folder path; my "docs" folder contains a text file with a short story
query = "What the document is talking about? Tell me within 15 words"

In [11]:
document_data = document_reader(doc_file = document_location, user_query = query)
print(document_data)

 The story of an old lighthouse keeper's secret love and devotion to his past.


<b> How does the process work? </b>

The process works as follows:

* Data is loaded from a source (here, a web page or a local folder).
* The data is split into chunks, and each chunk is converted into an embedding vector.
* The vectors are stored and indexed in a **vector store** (an in-memory one in this example) to enable efficient retrieval.
* When a user submits a query, the query is embedded and matched against the index.
* The index returns the most relevant chunks based on similarity scores.
* These relevant chunks, along with the original query, are sent to the **LLM** (Large Language Model) as part of the prompt.
* The LLM then generates a response to the user's query, grounded in the retrieved chunks.
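
The steps above can be sketched without any external library. This is a toy illustration, not how LlamaIndex is implemented: word counts stand in for a real embedding model, and `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical helper names chosen for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 8) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[str], query: str, top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:top_k]


text = (
    "A primary key uniquely identifies each row in a table. "
    "Lighthouses guide ships safely along the coast at night."
)
chunks = chunk_text(text, max_words=10)
query = "What does a primary key identify?"
top_chunks = retrieve(chunks, query)

# The retrieved chunk and the query together form the prompt sent to the LLM
prompt = f"Context:\n{top_chunks[0]}\n\nQuery: {query}"
print(prompt)
```

Only the chunk about primary keys shares words with the query, so it is the one placed in the prompt. A real embedding model matches on meaning rather than exact words, but the flow is the same.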


### Using LlamaIndex and Gemini to read a web page and a document

To use the Gemini API from Google, you need to:
* Run pip install google-generativeai
* Create a project in Google Cloud
* Use that project to create an API key in Google AI Studio
* Store the API key in an environment variable. I named mine GOOGLEAI_API_KEY
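
A missing environment variable is an easy mistake to make here, because `os.getenv` silently returns `None`. One way to fail early is a small check like the sketch below. `require_env` and `DEMO_API_KEY` are hypothetical names used only for this illustration; in the notebook the variable would be GOOGLEAI_API_KEY.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo with a placeholder; a real key would come from the shell or a .env file
os.environ["DEMO_API_KEY"] = "placeholder"
demo_key = require_env("DEMO_API_KEY")
print("Key loaded:", bool(demo_key))
```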

<b> Reading Webpage content </b>

In [14]:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core import VectorStoreIndex
import google.generativeai as genai
from dotenv import load_dotenv
import os

In [15]:
load_dotenv()

True

In [25]:
api_key = os.getenv("GOOGLEAI_API_KEY")

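The core llama-index package does not bundle a Gemini LLM. Gemini is available through separate integration packages (for example llama-index-llms-google-genai) or through LangChain. To keep the setup simple, I call google.generativeai directly here and use LlamaIndex only for loading, indexing, and retrieval.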

In [22]:
# Configure Gemini
genai.configure(api_key=api_key)
gemini_model = genai.GenerativeModel("gemini-2.5-pro")
gemini_model

genai.GenerativeModel(
    model_name='models/gemini-2.5-pro',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=None,
    cached_content=None
)

In [23]:
# Set embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.embed_model = embed_model

def reading_webpage(url: str, user_query: str) -> str:
    # Load web page content
    document = SimpleWebPageReader(html_to_text=True).load_data(urls=[url])

    # Create vector index from the web document
    index = VectorStoreIndex.from_documents(documents=document)

    # Manually retrieve relevant chunks
    retriever = index.as_retriever()
    relevant_nodes = retriever.retrieve(user_query)

    # Combine retrieved content
    context = "\n".join([node.get_content() for node in relevant_nodes])

    # Build prompt for Gemini
    prompt = f"Answer the following query based on the context below:\n\nContext:\n{context}\n\nQuery: {user_query}"

    # Use Gemini to generate a response
    response = gemini_model.generate_content(prompt)
    return response.text


In [24]:
url_to_pass = "https://medium.com/@nishi.paul.in/primary-key-in-sql-c6505990c6d7"
question = "What the content is dealing with? Tell me within 20 words."
webpage_data_by_gemini = reading_webpage(url=url_to_pass, user_query = question)

In [26]:
print(webpage_data_by_gemini)

Based on the context, the content is an article explaining the concept of a Primary Key in SQL, which uniquely identifies records.


<b> Reading from document </b>

In [28]:
def document_reader(doc_file : str, user_query : str) -> str:
    document = SimpleDirectoryReader(doc_file).load_data()
    index = VectorStoreIndex.from_documents(documents=document)

    retriever = index.as_retriever()
    relevant_nodes = retriever.retrieve(user_query)

    context = "\n".join([node.get_content() for node in relevant_nodes])

    prompt = f"Answer the following query based on the context below:\n\nContext:\n{context}\n\nQuery: {user_query}"
    response = gemini_model.generate_content(prompt)
    return response.text

In [29]:
document_location = "docs"  
query = "What the document is talking about? Tell me within 15 words"
document_data = document_reader(doc_file = document_location, user_query = query)
print(document_data)

A lighthouse keeper's lifelong dedication to a promise he made to his lost love.
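
Both Gemini helpers above join the retrieved chunks and build the same prompt string. That logic could live in one place, as in the sketch below. `build_prompt` is a hypothetical helper name; the template text is copied from the two functions above.

```python
def build_prompt(context_chunks: list[str], user_query: str) -> str:
    """Join retrieved chunks and wrap them in the prompt template used above."""
    context = "\n".join(context_chunks)
    return (
        "Answer the following query based on the context below:\n\n"
        f"Context:\n{context}\n\nQuery: {user_query}"
    )


# Example with made-up chunks standing in for retriever output
example_prompt = build_prompt(["chunk one", "chunk two"], "What is this about?")
print(example_prompt)
```

Inside `reading_webpage` or `document_reader`, the call would be `build_prompt([node.get_content() for node in relevant_nodes], user_query)`.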


Now you know how to use llama-index to read from a web page and from local documents, and to query the relevant content using both an open-source local model and Gemini. There are many more possibilities, and LlamaIndex has well-prepared documentation covering them.