# Agentic Chunking with LangChain & watsonx.ai

This notebook demonstrates how to use an LLM to split text dynamically into semantically meaningful chunks (agentic chunking) so that Retrieval-Augmented Generation (RAG) retrieves more coherent context.

In [None]:
# Install the packages required by UnstructuredPDFLoader (use %pip inside a notebook)
%pip install -q unstructured
%pip install -q pdfminer.six

In [None]:
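# Step 1: Install Dependencies
# langchain-community, langchain-classic, pymupdf, python-dotenv and requests are imported in later cells
%pip install -q langchain langchain-ibm langchain_experimental langchain-text-splitters langchain_chroma transformers bs4 langchain_huggingface sentence-transformers langchain-community langchain-classic pymupdf python-dotenv requests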

In [2]:
# Step 2: Import Libraries

# 1. LangChain Core - Base abstractions (Prompt, Messages, Documents)
from langchain_core.documents import Document
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate
from langchain_core.tools import tool
from langchain_community.document_loaders import UnstructuredPDFLoader


# 2. Partner Packages - Specific integrations (The modern way)
# specific pip install: langchain-huggingface
from langchain_huggingface import HuggingFaceEmbeddings 
# specific pip install: langchain-chroma
from langchain_chroma import Chroma 
# specific pip install: langchain-ibm
from langchain_ibm import WatsonxLLM 

# 3. LangChain Main - Chains, Agents, and Memory
from langchain_classic.agents.agent import (
    Agent,
    AgentExecutor,
    AgentOutputParser,
    BaseMultiActionAgent,
    BaseSingleActionAgent,
    LLMSingleActionAgent,
)
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_classic.chains.retrieval import create_retrieval_chain
from langchain_classic.memory.buffer import  ConversationBufferMemory

# 4. Text Splitters
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 5. IBM Watsonx SDK & Transformers (Backend utilities)
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from transformers import AutoTokenizer

In [3]:
# Step 2 (continued): Utility imports. The commented-out lines are the older import paths, kept for reference.
import getpass
import requests
import os
from bs4 import BeautifulSoup
from dotenv import load_dotenv
# from langchain_ibm import WatsonxLLM
# from langchain_huggingface import HuggingFaceEmbeddings
# from langchain_community.document_loaders import WebBaseLoader
# from langchain.schema import SystemMessage, HumanMessage, Document
# from langchain.text_splitter import RecursiveCharacterTextSplitter
# from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate
# from langchain.vectorstores import Chroma
# from langchain.tools import tool
# from langchain.agents import AgentExecutor
# from langchain.memory import ConversationBufferMemory
# from langchain.chains.combine_documents import create_stuff_documents_chain
# from langchain.chains import create_retrieval_chain
# from transformers import AutoTokenizer
# from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes
# from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

In [None]:
# # Step 3: Set Up Credentials
# # Ensure you have a .env file or manually enter your API key and Project ID below.
# load_dotenv(os.getcwd()+"/.env", override=True)

# credentials = {
#     "url": "https://us-south.ml.cloud.ibm.com",
#     "apikey": os.getenv("WATSONX_APIKEY", "YOUR_API_KEY_HERE"),
# }
# project_id = os.getenv("PROJECT_ID", "YOUR_PROJECT_ID_HERE")

In [None]:
# # Step 4: Initialize Language Model (IBM Granite)
# llm = WatsonxLLM(
#     model_id="ibm/granite-3-8b-instruct",
#     url=credentials.get("url"),
#     apikey=credentials.get("apikey"),
#     project_id=project_id,
#     params={
#         GenParams.DECODING_METHOD: "greedy",
#         GenParams.TEMPERATURE: 0,
#         GenParams.MIN_NEW_TOKENS: 5,
#         GenParams.MAX_NEW_TOKENS: 250,
#         GenParams.STOP_SEQUENCES: ["Human:", "Observation"],
#     },
# )

In [None]:
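# Ollama runs inside WSL here. A bare `!wsl` opens an interactive shell and blocks the cell,
# so the Ollama command is passed to WSL directly instead.
# `ollama pull` downloads the model without starting an interactive chat session (unlike `ollama run`).
!wsl ollama pull gemini-3-pro-preview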


In [5]:
# from langchain_ollama import ChatOllama
from langchain_community.llms.ollama import Ollama
from langchain_core.prompts import ChatPromptTemplate

# 1. Initialize the model through your local Ollama instance
# Models tagged "-cloud" are served remotely; the local Ollama instance forwards the requests
llm = Ollama(
    # model="gemini-3-pro-preview",
    model="cogito-2.1:671b-cloud",
    temperature=0.7,
    # Optional: context window size in tokens; increase it for longer documents
    num_ctx=32000,
)

# 2. Create a prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant."),
    ("human", "{input}")
])

# 3. Create chain
chain = prompt | llm

# 4. Run
response = chain.invoke({"input": "Explain the main advantages of agentic chunking."})

print(response)

I notice that you're asking about "agentic chunking," but that specific term doesn't appear to be a widely recognized concept in mainstream AI or cognitive science. I'd be happy to help, but first I want to clarify what you mean by this term.

Could you provide more context about where you encountered this phrase or what aspects of it you're interested in? For example, are you referring to:
- Chunking in cognitive psychology?
- Agent-based modeling?
- Task decomposition in AI systems?
- Or perhaps a specific framework or paper you've come across?

This will help me give you a more accurate and useful explanation.


In [None]:
# Step 5: Load Document Function
def get_text_from_url(url):
    # A timeout keeps the cell from hanging if the server never responds
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise ValueError(f"Failed to fetch the page, status code: {response.status_code}")
    
    soup = BeautifulSoup(response.text, "html.parser")
    for script in soup(["script", "style"]):
        script.decompose()
    return soup.get_text(separator="\n", strip=True)

# Load example text
url = "https://www.ibm.com/think/topics/machine-learning"
web_text = get_text_from_url(url)
# print(web_text[:500]) # Optional: verify text loaded

In [None]:
print(f"Loaded {len(web_text)} characters from the web page. First 500 characters:\n{web_text[:500]}")

In [None]:


# Extract the PDF content with langchain_community; the PDF may contain images as well as text

from langchain_community.document_loaders import UnstructuredPDFLoader
import os

file = r"files\Bandhan.pdf"
if not os.path.exists(file):
    print(f"File {file} does not exist. Please check the path.")
else:
    print(f"File {file} found. Proceeding to load.")


loader = UnstructuredPDFLoader(
    file_path=file,
    mode="elements"
)

print(f"loader contents: {loader}")

# With mode="elements", each returned Document is one parsed element (title, paragraph, ...), not a page
try:
    pages = loader.load()
except ImportError as e:
    # If installation did not take effect in the current kernel, inform the user.
    raise ImportError("The 'unstructured' package is required but not available. "
                      "Please ensure the install cell above ran successfully and then restart the kernel.") from e

print(f"Loaded {len(pages)} elements from the PDF document. First element content:\n{pages[0].page_content[:500]}")


In [6]:
# file from path
import os

# file_path = r"files\Bandhan.pdf"
file_path = r"files\ES Mod1@AzDOCUMENTS2.pdf"
if not os.path.exists(file_path):
    print(f"File {file_path} does not exist. Please check the path.")
else:
    print(f"File {file_path} found. Proceeding to load.")

File files\ES Mod1@AzDOCUMENTS2.pdf found. Proceeding to load.


In [7]:
from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader(file_path)
pages = loader.load()

print(f"Loaded {len(pages)} pages from the PDF document. First page content:\n{pages[0].page_content[:500]}")

Loaded 20 pages from the PDF document. First page content:
Embedded Systems-18EC62 
Azdocuments.in
www.azdocuments.in 
Page 1 
 
 
 
EMBEDDED SYSTEMS 
 Course Code : 18EC62 
 CIE Marks :40 
 Lecture Hours/Week : 03 + 2 (Tutorial) 
 SEE marks :60 
 Total Number of Lecture Hours : 50 (10 Hrs / Module)  
 Exam Hours : 03 
 CREDITS : 04 
MODULE -1 
ARM-32 bit Microcontroller: Thumb-2 technology and applications of ARM,Architecture of 
ARM Cortex M3, Various Units in the architecture, Debuggingsupport, General Purpose 
Registers, Special Registers, ex
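
The chunking step that follows sends the whole document to the model in a single prompt, which only works while the text fits inside the context window. One possible way to handle longer PDFs is to group consecutive pages into batches under a size budget and chunk each batch separately. The sketch below is an illustration, not part of the original notebook: `batch_pages`, `max_chars`, and the 8000-character default are hypothetical, and character count is only a rough proxy for tokens.

```python
from typing import Iterable, List


def batch_pages(page_texts: Iterable[str], max_chars: int = 8000) -> List[str]:
    """Greedily join consecutive page texts into batches of at most max_chars characters.

    A single page longer than max_chars is kept whole in its own batch.
    The default budget of 8000 characters is an arbitrary, illustrative value.
    """
    batches: List[str] = []
    current: List[str] = []
    current_len = 0
    for text in page_texts:
        text = text.strip()
        if not text:
            continue  # skip blank pages
        # The extra 2 characters account for the "\n\n" separator between pages
        added = len(text) + (2 if current else 0)
        if current and current_len + added > max_chars:
            # Budget exceeded: close the current batch and start a new one
            batches.append("\n\n".join(current))
            current, current_len = [text], len(text)
        else:
            current.append(text)
            current_len += added
    if current:
        batches.append("\n\n".join(current))
    return batches
```

With the `pages` list loaded above, a call such as `batch_pages(page.page_content for page in pages)` would return a list of strings, each of which could be passed to the chunking function in turn.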


In [8]:
# Step 6: Agentic Chunking Function
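def agentic_chunking(text):
    """
    Dynamically splits text into meaningful chunks using an LLM.
    """
    system_message = SystemMessage(content="You are an AI assistant helping to split text into meaningful chunks based on topics.")
    human_message = HumanMessage(content=f"Please divide the following text into semantically different, separate and meaningful chunks:\n\n{text}")
    
    # Invoke LLM
    response = llm.invoke([system_message, human_message])
    # Plain LLM wrappers return a string; chat models return a message with a .content attribute
    response_text = response if isinstance(response, str) else response.content
    # Split on blank lines and drop empty pieces
    return [chunk.strip() for chunk in response_text.split("\n\n") if chunk.strip()]

# Execute chunking on the page text, not on the list of Document objects
full_text = "\n\n".join(page.page_content for page in pages)
chunks = agentic_chunking(full_text)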

# Print chunks
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}:\n{chunk}\n{'-'*40}")

Chunk 1:
I'll help you split this text into meaningful chunks based on topics. Here's the organized breakdown:
----------------------------------------
Chunk 2:
**Chunk 1: Course Information and Module Overview**
- Embedded Systems course details (18EC62)
- Course structure including marks distribution, credits, and hours
- Module 1 overview: ARM-32 bit Microcontroller topics
----------------------------------------
Chunk 3:
**Chunk 2: Introduction to ARM Cortex-M3**
- Market context for microcontrollers
- ARM Cortex-M3 processor introduction
- Key requirements addressed by Cortex-M3
----------------------------------------
Chunk 4:
**Chunk 3: Features and Benefits of Cortex-M3**
- Performance efficiency
- Low power consumption
- Determinism
- Code density
- Ease of use
- Cost reduction
- Development tools
----------------------------------------
Chunk 5:
**Chunk 4: ARM History and Architecture Versions**
- ARM formation and early history
- Evolution of ARM processors
- ARM architectur
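
In the run above, splitting on blank lines turns the model's introductory sentence into "Chunk 1", so conversational filler ends up in the vector store. One possible mitigation is to ask the model to put an explicit marker line before every chunk and to parse on that marker instead. The sketch below assumes such a prompt instruction: `CHUNK_DELIMITER` and `parse_delimited_chunks` are hypothetical names, and whether a given model follows the instruction reliably is untested here.

```python
from typing import List

# Hypothetical marker the prompt would ask the model to emit before each chunk
CHUNK_DELIMITER = "<<<CHUNK>>>"


def parse_delimited_chunks(response_text: str, delimiter: str = CHUNK_DELIMITER) -> List[str]:
    """Split an LLM response on a delimiter and drop any text before the first one."""
    if delimiter in response_text:
        # Everything before the first delimiter is treated as preamble and discarded
        parts = response_text.split(delimiter)[1:]
    else:
        # Fallback if the model ignored the instruction: split on blank lines
        parts = response_text.split("\n\n")
    return [part.strip() for part in parts if part.strip()]
```

To use it, the human message would need an added sentence along the lines of "Start every chunk with a line containing only <<<CHUNK>>>", and the function's return value would replace the `split("\n\n")` result.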

In [9]:
# Step 7: Create Vector Store and RAG Chain

# Initialize Embeddings
embeddings_model = HuggingFaceEmbeddings(model_name="ibm-granite/granite-embedding-30m-english")

# Create Vector Database
vector_db = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings_model
)

# Add Documents
documents = [Document(page_content=chunk) for chunk in chunks]
vector_db.add_documents(documents)

# Create RAG Chain
prompt_template = """<|start_of_role|>user<|end_of_role|>Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {input}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

qa_chain_prompt = PromptTemplate.from_template(prompt_template)
combine_docs_chain = create_stuff_documents_chain(llm, qa_chain_prompt)
rag_chain = create_retrieval_chain(vector_db.as_retriever(), combine_docs_chain)

In [10]:
# Step 8: Execute RAG Query
rag_output = rag_chain.invoke({"input": "Write a short note on the applications of the Cortex-M3."})
print(rag_output['answer'])

The ARM Cortex-M3 processor is widely used across various applications due to its efficient performance, low power consumption, and cost-effectiveness. Key application areas include:

1. **Low-cost microcontrollers**: Ideal for cost-sensitive embedded systems where performance and power efficiency are crucial.

2. **Automotive applications**: Used in automotive control systems, body electronics, and sensor interfaces due to its reliability and determinism.

3. **Data communications**: Supports communication protocols and network interfaces in networking equipment.

4. **Industrial control**: Employed in automation systems, motor control, and real-time monitoring due to its deterministic behavior and robustness.

5. **Consumer products**: Found in devices like home appliances, wearables, and IoT devices, benefiting from its low power consumption and ease of integration.

These applications leverage the Cortex-M3's 32-bit Harvard architecture, efficient code density, and comprehensive de
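
To judge whether an answer like the one above is grounded in the stored chunks, it can help to look at what the retriever returned. The dictionary produced by `create_retrieval_chain` carries the retrieved documents under the `"context"` key alongside `"input"` and `"answer"`. The helper below is a minimal sketch; `format_retrieved_context` and its `max_chars` parameter are hypothetical, and the demo uses stand-in objects rather than real retrieval results.

```python
from types import SimpleNamespace


def format_retrieved_context(result: dict, max_chars: int = 200) -> str:
    """Return a numbered, truncated preview of the documents under result["context"]."""
    lines = []
    for i, doc in enumerate(result.get("context", []), 1):
        # Collapse whitespace so each preview fits on one line
        snippet = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"[{i}] {snippet}")
    return "\n".join(lines)


# Stand-in objects that mimic the page_content attribute of a Document (illustrative data only)
demo_result = {
    "context": [
        SimpleNamespace(page_content="Low power\nconsumption"),
        SimpleNamespace(page_content="Automotive   control"),
    ]
}
print(format_retrieved_context(demo_result))
```

In the notebook, `print(format_retrieved_context(rag_output))` would show the chunks the answer was built from.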