<a href="https://colab.research.google.com/github/kenjihilasak/agenticRAG/blob/main/AgenticRAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installation


In [1]:
# Installation
! pip install smolagents
# To install from source instead of the latest release, comment out the command above and uncomment the one below.
# ! pip install git+https://github.com/huggingface/smolagents.git

Collecting smolagents
  Downloading smolagents-1.14.0-py3-none-any.whl.metadata (15 kB)
Collecting markdownify>=0.14.1 (from smolagents)
  Downloading markdownify-1.1.0-py3-none-any.whl.metadata (9.1 kB)
Collecting duckduckgo-search>=6.3.7 (from smolagents)
  Downloading duckduckgo_search-8.0.1-py3-none-any.whl.metadata (16 kB)
Collecting python-dotenv (from smolagents)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Collecting primp>=0.15.0 (from duckduckgo-search>=6.3.7->smolagents)
  Downloading primp-0.15.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Downloading smolagents-1.14.0-py3-none-any.whl (114 kB)
Downloading duckduckgo_search-8.0.1-py3-none-any.whl (18 kB)
Downloading markdownify-1.1.0-py3-none-any.whl (13 kB)
Downloading python_dotenv-1.1.0-py3-none-any.whl (20 kB)
Downloading primp-0.15.0-cp38-abi3

In [10]:
!pip install smolagents langchain langchain-community sentence-transformers python-dotenv rank_bm25 -q
!pip install pandas==2.2.2 fsspec==2024.12.0 -q
!pip install datasets -q
!pip install --upgrade --no-deps gcsfs
!pip install pypdf

Collecting pypdf
  Downloading pypdf-5.4.0-py3-none-any.whl.metadata (7.3 kB)
Downloading pypdf-5.4.0-py3-none-any.whl (302 kB)
Installing collected packages: pypdf
Successfully installed pypdf-5.4.0


In [1]:
from dotenv import load_dotenv
load_dotenv()

False
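
The `False` above is the return value of `load_dotenv()`, which suggests that no variables were loaded (no `.env` file was found, or it was empty). A minimal sketch of a pre-flight check follows. The variable name `HF_TOKEN` and the helper `missing_env_vars` are assumptions for illustration and do not appear in the original notebook.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# HF_TOKEN is an assumed variable name; substitute whatever your model backend expects.
required = ["HF_TOKEN"]

# A plain dict stands in for os.environ so the example is reproducible.
missing = missing_env_vars(required, env={"HF_TOKEN": ""})
print(missing)
```

Calling `missing_env_vars(required)` without the `env` argument checks the real process environment, which is probably what you want before constructing the model client.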

## Loading Order Database

In [2]:
import pandas as pd
from langchain.docstore.document import Document

# 1. Read your CSV (or query your DB)
df = pd.read_csv("order_status.csv", parse_dates=["timestamp"])

# 2. Sort by (order_id, timestamp) so events are chronological
df = df.sort_values(["order_id", "timestamp"])

# 3. Create one Document per status‐change event
docs = []
for _, row in df.iterrows():
    content = (
        f"Order {row.order_id} changed status to '{row.status}' "
        f"at {row.timestamp.isoformat()}. Details: {row.details}"
    )
    metadata = {
        "order_id": str(row.order_id),
        "status": row.status,
        "timestamp": row.timestamp.isoformat(),
    }
    docs.append(Document(page_content=content, metadata=metadata))
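
To make the shape of each event document concrete, here is a standard-library sketch of the same row-to-text conversion. The column names match the cell above; the sample rows and the helper `rows_to_events` are hypothetical, since the real `order_status.csv` is not shown.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows; the real order_status.csv is not shown in the notebook.
SAMPLE_CSV = """order_id,status,timestamp,details
12345,delivered,2025-04-05 15:30:00,Left at front door
12345,shipped,2025-04-03 09:00:00,Handed to carrier
"""


def rows_to_events(csv_text):
    """Parse CSV text and return chronologically sorted (content, metadata) pairs."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["timestamp"] = datetime.fromisoformat(row["timestamp"])
    # Same ordering as df.sort_values(["order_id", "timestamp"])
    rows.sort(key=lambda r: (r["order_id"], r["timestamp"]))
    events = []
    for row in rows:
        iso = row["timestamp"].isoformat()
        content = (
            f"Order {row['order_id']} changed status to '{row['status']}' "
            f"at {iso}. Details: {row['details']}"
        )
        metadata = {"order_id": row["order_id"], "status": row["status"], "timestamp": iso}
        events.append((content, metadata))
    return events


events = rows_to_events(SAMPLE_CSV)
for content, _ in events:
    print(content)
```

Each pair corresponds to the `page_content` and `metadata` of one `Document` in the cell above.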


In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    add_start_index=True,
    separators=["\n", ".", " ", ""],
)
docs_processed = text_splitter.split_documents(docs)
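
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a deliberately simplified sliding-window splitter. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to break on the listed separators); `sliding_chunks` is a hypothetical helper that only shows what the two parameters mean. Because each event string is well under 300 characters, splitting should usually leave the events as single chunks.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Hypothetical event text in the same format as the documents built earlier.
event = (
    "Order 12345 changed status to 'delivered' at 2025-04-05T15:30:00. "
    "Details: Left at front door"
)
event_chunks = sliding_chunks(event, chunk_size=300, chunk_overlap=50)
toy_chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(len(event_chunks), toy_chunks)
```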


## Order Retriever Tool

In [14]:
from langchain_community.retrievers import BM25Retriever
from smolagents import Tool
from datetime import datetime

class OrderRetrieverTool(Tool):
    name = "order_retriever"
    description = (
        "Retrieve order‐history events by order ID, status, or timestamp. "
        "Useful to answer questions like 'What is the status of order 12345?' "
        "or 'When did order 67890 get cancelled?'"
    )
    inputs = {"query": {"type": "string", "description": "Your search terms"}}
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        self.retriever = BM25Retriever.from_documents(docs, k=5)

    def forward(self, query: str) -> str:
        hits = self.retriever.invoke(query)
        results = []
        for doc in hits:
            timestamp_str = doc.metadata['timestamp']
            timestamp = datetime.fromisoformat(timestamp_str)
            formatted_timestamp = timestamp.strftime("%d %B %Y %I:%M %p")  # Customize format as needed
            results.append(f"🔎 {formatted_timestamp} — {doc.page_content}")
        return "\n".join(results)

# instantiate with your processed docs
order_tool = OrderRetrieverTool(docs_processed or docs)
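
BM25 ranks documents by how many query terms they contain, weighted by how rare each term is across the corpus. The sketch below is a textbook-style scorer written for illustration; it is not the implementation used by `BM25Retriever` or `rank_bm25`, and the whitespace tokenization and the `k1`/`b` defaults are assumptions. The sample events are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in corpus against query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


events = [
    "Order 12345 changed status to 'shipped' at 2025-04-03T09:00:00.",
    "Order 12345 changed status to 'delivered' at 2025-04-05T15:30:00.",
    "Order 67890 changed status to 'cancelled' at 2025-04-02T11:15:00.",
]
id_scores = bm25_scores("order 12345", events)
punct_scores = bm25_scores("12345?", events)
print(id_scores, punct_scores)
```

With whitespace tokenization, `12345?` is a different token from `12345`, so the second query matches nothing. If the retriever you use tokenizes the same way, it may help to pass it a bare order ID rather than a full question.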


## Loading Cancellation policy PDF

In [13]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 1. Load the PDF
loader = PyPDFLoader("Cancellation Policy.pdf")
policy_docs = loader.load()

# 2. Chunk it (tune chunk_size/chunk_overlap as you like)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", " ", ""],
)
policy_chunks = text_splitter.split_documents(policy_docs)


## Cancellation Policy tool

In [20]:
from smolagents import Tool
from langchain_community.retrievers import BM25Retriever

class CancellationPolicyTool(Tool):
    name = "cancellation_policy"
    description = (
        "Answer questions about the company’s cancellation policy "
        "(e.g. “How many days do I have to cancel for a full refund?”, "
        "“What conditions void the refund?”)."
    )
    inputs = {
        "query": {
            "type": "string",
            "description": "Your question about cancellation and refund rules"
        }
    }
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        # Build BM25 retriever over your PDF chunks
        self.retriever = BM25Retriever.from_documents(docs, k=5)

    def forward(self, query: str) -> str:
        # Fetch top-k policy snippets
        hits = self.retriever.invoke(query)
        if not hits:
            return "⚠️ No relevant policy information found."
        # Return each chunk as a quoted snippet
        return "\n\n".join(
            f"🔎 “{chunk.page_content.strip()}”"
            for chunk in hits
        )

# Instantiate it with your pre‐computed policy_chunks:
policy_tool = CancellationPolicyTool(policy_chunks)


## Cancellation eligibility tool

In [15]:
import re
from datetime import datetime, timezone
from smolagents import Tool
from langchain_community.retrievers import BM25Retriever

class OrderCancellationEligibilityTool(Tool):
    name = "order_cancellation_eligibility"
    description = (
        "Determine if an order can still be cancelled under the policy. "
        "Returns yes/no plus details of the cutoff and current status date."
    )
    inputs = {
        "order_id": {
            "type": "string",
            "description": "The ID of the order to check"
        }
    }
    output_type = "string"

    def __init__(self, order_tool: OrderRetrieverTool, policy_docs, **kwargs):
        super().__init__(**kwargs)
        # reuse your existing order retriever
        self.order_tool = order_tool
        # build a retriever over the PDF chunks
        self.policy_retriever = BM25Retriever.from_documents(policy_docs, k=5)

    def forward(self, order_id: str) -> str:
        # 1) Get raw order history hits (BM25 returns them ranked by relevance, not by time)
        history = self.order_tool.forward(order_id)
        # 2) Extract timestamps only from lines that belong to this order
        #    assume lines like "🔎 05 April 2025 03:30 PM — Order 12345 changed status to 'delivered'…"
        pattern = re.compile(r"🔎\s+(.+?)\s+—\s+Order " + re.escape(order_id) + r"\b")
        timestamps = pattern.findall(history)
        if not timestamps:
            return f"⚠️ Could not find any status event for order {order_id}."
        # parse every match and keep the most recent one (timestamps are assumed to be UTC)
        last_dt = max(
            datetime.strptime(ts, "%d %B %Y %I:%M %p") for ts in timestamps
        ).replace(tzinfo=timezone.utc)

        # 3) Retrieve cancellation policy text
        policy_hits = self.policy_retriever.invoke("cancel")
        policy_text = "\n".join(chunk.page_content for chunk in policy_hits)

        # 4) Extract the numeric window (e.g. “14 days”) via regex
        m = re.search(r"(\d+)\s+days", policy_text, re.IGNORECASE)
        if not m:
            return "⚠️ Unable to parse the cancellation window (number of days) from the policy."
        window_days = int(m.group(1))

        # 5) Compute elapsed time. Strictly, the window should run from the original
        #    placement date; here it is measured from the most recent status change.
        now = datetime.now(timezone.utc)
        elapsed = (now - last_dt).days

        # 6) Build response
        if elapsed <= window_days:
            return (
                f"✅ Order {order_id} last changed status on {last_dt.strftime('%d %B %Y %I:%M %p')} UTC. "
                f"You are within the {window_days}-day cancellation window (elapsed: {elapsed} days), "
                "so you **can** still cancel."
            )
        else:
            return (
                f"❌ Order {order_id} last changed status on {last_dt.strftime('%d %B %Y %I:%M %p')} UTC. "
                f"You are now {elapsed} days past that, which exceeds the {window_days}-day window, "
                "so you **cannot** cancel under the standard policy."
            )
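
The eligibility decision combines three steps: find the latest event time, read the window length from the policy text, and compare elapsed days. The sketch below isolates those steps as pure functions so they can be checked without a retriever or a live clock. The helper names, the sample history lines, and the policy sentence are hypothetical; the 14-day figure is only the example value used elsewhere in this notebook.

```python
import re
from datetime import datetime, timezone

TS_FORMAT = "%d %B %Y %I:%M %p"


def latest_event_time(history, order_id):
    """Return the most recent timestamp among history lines that mention order_id."""
    stamps = []
    for line in history.splitlines():
        match = re.match(r"🔎\s+(.+?)\s+—", line)
        if match and f"Order {order_id} " in line:
            parsed = datetime.strptime(match.group(1), TS_FORMAT)
            stamps.append(parsed.replace(tzinfo=timezone.utc))  # assumes UTC timestamps
    return max(stamps) if stamps else None


def extract_window_days(policy_text):
    """Return the first 'N days' figure in the policy text, or None if absent."""
    match = re.search(r"(\d+)\s+days", policy_text, re.IGNORECASE)
    return int(match.group(1)) if match else None


def can_cancel(last_dt, window_days, now):
    """True when the whole days elapsed since last_dt do not exceed window_days."""
    return (now - last_dt).days <= window_days


# Hits deliberately listed out of chronological order, as a ranked retriever may return them.
history = "\n".join([
    "🔎 05 April 2025 03:30 PM — Order 12345 changed status to 'delivered' at 2025-04-05T15:30:00.",
    "🔎 03 April 2025 09:00 AM — Order 12345 changed status to 'shipped' at 2025-04-03T09:00:00.",
    "🔎 02 April 2025 11:15 AM — Order 67890 changed status to 'cancelled' at 2025-04-02T11:15:00.",
])
last_dt = latest_event_time(history, "12345")
window = extract_window_days("Orders may be cancelled within 14 days of purchase.")
now = datetime(2025, 4, 10, tzinfo=timezone.utc)  # fixed clock so the result is reproducible
print(last_dt.isoformat(), window, can_cancel(last_dt, window, now))
```

Passing `now` as an argument, rather than calling `datetime.now()` inside the function, keeps the comparison deterministic and easy to test. Note that taking the first "N days" match is fragile if the policy mentions several different periods.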


## Instantiate & wire into agent

In [21]:
# assuming `order_tool` is already your OrderRetrieverTool(docs_processed)
eligibility_tool = OrderCancellationEligibilityTool(
    order_tool=order_tool,
    policy_docs=policy_chunks
)

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(
    tools=[order_tool, policy_tool, eligibility_tool],
    model=InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
    max_steps=3,
)


## Test agent

In [18]:
print(agent.run("What is the status of order 12345?"))
# → e.g. “Your order 12345 was delivered on 05 April 2025 at 03:30 PM.” (illustrative; actual wording varies by run)

print(agent.run("How many days do I have to cancel and get a full refund?"))
# → e.g. “You can cancel up to 14 days after purchase for a full refund.” (illustrative; actual wording varies by run)




delivered


1


Refunds are prorated under the following conditions: the customer terminates a service or subscription before the end of the billing cycle, the service or subscription is not pro-rated at the start of the billing cycle, the company has a policy of prorating refunds for partial service usage, and the customer can provide evidence of reduced service usage.


In [19]:
print(agent.run("can i still cancel my order 12345 ?"))

Yes, you can still cancel your order 12345.


In [23]:
print(agent.run("Will the original shipping costs be refunded if I return an item?"))

Shipping charges may not be refunded unless the return is due to our error.


In [22]:
print(agent.run("Will the original shipping costs be refunded if I return an item that isn't faulty?"))

No, the original shipping costs will not be refunded if you return an item that isn't faulty, unless the return is due to the company's error.


## Earlier test of agent

In [8]:
from smolagents import InferenceClientModel, CodeAgent

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

agent = CodeAgent(
    tools=[order_tool],
    model=InferenceClientModel(model_id=model_name),
    max_steps=4,
    verbosity_level=2,
)

# Now you can ask:
#print(agent.run("What is the status of my order 12345?"))
print(agent.run("Can you please tell me the status of my order 12345 and when it was updated?"))
#print(agent.run("Can you please tell me the status of my order 12345?"))
# → Should retrieve the most recent ‘status’ event for 12345.


The current status of order 12345 is delivered and it was last updated on 2025-04-05T15:30:00.
