### 📦 Installing Required Libraries

This cell installs the core libraries needed for semantic search.

- **faiss-cpu**  
  FAISS is a high-performance vector similarity search library.  
  It allows us to store embeddings and retrieve the most similar vectors efficiently.

- **sentence-transformers**  
  This library provides pre-trained embedding models that convert text into dense numerical vectors.
  We use it to generate semantic embeddings for documents and queries.

The `-q` flag keeps installation output minimal.


In [50]:
!pip install -q faiss-cpu sentence-transformers


### 🔑 Environment Variable Setup

This cell loads the Gemini API key from Google Colab's secure storage.

- `userdata.get("RAGAGENTKEY")` fetches the API key you saved in Colab secrets.
- The key is stored as an environment variable so it can be accessed anywhere in the notebook.
- Even though Gemini is not used in this pipeline, keeping the key configured allows easy extension later.

Keeping the key in Colab secrets, rather than hard-coding it in a cell, avoids leaking it when the notebook is shared.


In [51]:
import os
from google.colab import userdata

os.environ["GEMINI_API_KEY"] = userdata.get("RAGAGENTKEY")
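
Outside Colab, `google.colab` cannot be imported, so the cell above only works there. A minimal sketch of a fallback, assuming the key may already be exported as an environment variable; the helper name `load_api_key` is hypothetical and not part of the notebook:

```python
import os


def load_api_key(secret_name="RAGAGENTKEY", env_name="GEMINI_API_KEY"):
    """Return the key from Colab secrets if available, else from the environment."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Not running in Colab: fall back to an already-exported variable.
        return os.environ.get(env_name)
    return userdata.get(secret_name)


key = load_api_key()
if key is not None:
    os.environ["GEMINI_API_KEY"] = key
```

If neither source has the key, `load_api_key()` returns `None` outside Colab and the environment is left untouched.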


### 🧠 Core Imports and Shared State

This cell imports core libraries and initializes a shared memory dictionary.

- **faiss** → vector indexing and similarity search
- **numpy** → numerical operations on embeddings
- **SentenceTransformer** → text embedding model

The `shared` dictionary acts as common memory across PocketFlow nodes.
Each node reads from and writes to this dictionary, so all pipeline state lives in one place.
Note that `shared` is itself a module-level variable: the nodes below access it directly, and the same object is passed to `Flow.run()`.

Stored items:
- documents → raw text
- chunks → processed text units
- embeddings → vector representations
- index → FAISS vector index


In [52]:
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

shared = {
    "documents": [],
    "chunks": [],
    "embeddings": [],
    "index": None
}
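
To make the data flow concrete, here is a minimal sketch of how a node can read from and write to `shared` through the `prep()`, `exec()`, and `post()` hooks. `MiniNode` is a hypothetical stand-in that mirrors the call order of the PocketFlow `BaseNode._run` defined later in this notebook; `UppercaseNode` and `demo_shared` are illustrative only.

```python
class MiniNode:
    """Hypothetical stand-in that mimics the prep -> exec -> post call order."""

    def prep(self, shared):
        return None

    def exec(self, prep_res):
        return None

    def post(self, shared, prep_res, exec_res):
        return None

    def run(self, shared):
        prep_res = self.prep(shared)    # read inputs from shared
        exec_res = self.exec(prep_res)  # do the work without touching shared
        return self.post(shared, prep_res, exec_res)  # write outputs to shared


class UppercaseNode(MiniNode):
    def prep(self, shared):
        return shared["documents"]

    def exec(self, documents):
        return [doc.upper() for doc in documents]

    def post(self, shared, prep_res, exec_res):
        shared["chunks"] = exec_res


demo_shared = {"documents": ["namaste", "vanakkam"], "chunks": []}
UppercaseNode().run(demo_shared)
print(demo_shared["chunks"])  # ['NAMASTE', 'VANAKKAM']
```

Keeping `exec()` free of `shared` access is a design choice in this sketch: it makes the work step easy to test on its own.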


### 🔢 Embedding Model Initialization

This cell loads a pre-trained sentence embedding model.

- **all-MiniLM-L6-v2** is a lightweight model that runs quickly even on CPU and is accurate enough for general-purpose semantic search.
- It converts text into 384-dimensional vectors.
- These vectors capture semantic meaning rather than exact words.

This model runs locally and does not require an API call.


In [53]:
embedder = SentenceTransformer("all-MiniLM-L6-v2")


### ✨ Text to Vector Conversion Function

This function converts a text string into a numerical vector.

Steps:
1. The text is passed to the SentenceTransformer model.
2. The output embedding is converted to a NumPy array.
3. The data type is forced to float32 because FAISS requires it.

This function is reused for both documents and search queries.


In [54]:
def embed_text(text):
    return embedder.encode(text, convert_to_numpy=True).astype("float32")


# **POCKETFLOW CODE**

In [63]:
import asyncio, warnings, copy, time

class BaseNode:
    def __init__(self): self.params,self.successors={},{}
    def set_params(self,params): self.params=params
    def next(self,node,action="default"):
        if action in self.successors: warnings.warn(f"Overwriting successor for action '{action}'")
        self.successors[action]=node; return node
    def prep(self,shared): pass
    def exec(self,prep_res): pass
    def post(self,shared,prep_res,exec_res): pass
    def _exec(self,prep_res): return self.exec(prep_res)
    def _run(self,shared): p=self.prep(shared); e=self._exec(p); return self.post(shared,p,e)
    def run(self,shared):
        if self.successors: warnings.warn("Node won't run successors. Use Flow.")
        return self._run(shared)
    def __rshift__(self,other): return self.next(other)
    def __sub__(self,action):
        if isinstance(action,str): return _ConditionalTransition(self,action)
        raise TypeError("Action must be a string")

class _ConditionalTransition:
    def __init__(self,src,action): self.src,self.action=src,action
    def __rshift__(self,tgt): return self.src.next(tgt,self.action)

class Node(BaseNode):
    def __init__(self,max_retries=1,wait=0): super().__init__(); self.max_retries,self.wait=max_retries,wait
    def exec_fallback(self,prep_res,exc): raise exc
    def _exec(self,prep_res):
        for self.cur_retry in range(self.max_retries):
            try: return self.exec(prep_res)
            except Exception as e:
                if self.cur_retry==self.max_retries-1: return self.exec_fallback(prep_res,e)
                if self.wait>0: time.sleep(self.wait)

class BatchNode(Node):
    def _exec(self,items): return [super(BatchNode,self)._exec(i) for i in (items or [])]

class Flow(BaseNode):
    def __init__(self,start=None): super().__init__(); self.start_node=start
    def start(self,start): self.start_node=start; return start
    def get_next_node(self,curr,action):
        nxt=curr.successors.get(action or "default")
        if not nxt and curr.successors: warnings.warn(f"Flow ends: '{action}' not found in {list(curr.successors)}")
        return nxt
    def _orch(self,shared,params=None):
        curr,p,last_action =copy.copy(self.start_node),(params or {**self.params}),None
        while curr: curr.set_params(p); last_action=curr._run(shared); curr=copy.copy(self.get_next_node(curr,last_action))
        return last_action
    def _run(self,shared): p=self.prep(shared); o=self._orch(shared); return self.post(shared,p,o)
    def post(self,shared,prep_res,exec_res): return exec_res

class BatchFlow(Flow):
    def _run(self,shared):
        pr=self.prep(shared) or []
        for bp in pr: self._orch(shared,{**self.params,**bp})
        return self.post(shared,pr,None)

class AsyncNode(Node):
    async def prep_async(self,shared): pass
    async def exec_async(self,prep_res): pass
    async def exec_fallback_async(self,prep_res,exc): raise exc
    async def post_async(self,shared,prep_res,exec_res): pass
    async def _exec(self,prep_res):
        for self.cur_retry in range(self.max_retries):
            try: return await self.exec_async(prep_res)
            except Exception as e:
                if self.cur_retry==self.max_retries-1: return await self.exec_fallback_async(prep_res,e)
                if self.wait>0: await asyncio.sleep(self.wait)
    async def run_async(self,shared):
        if self.successors: warnings.warn("Node won't run successors. Use AsyncFlow.")
        return await self._run_async(shared)
    async def _run_async(self,shared): p=await self.prep_async(shared); e=await self._exec(p); return await self.post_async(shared,p,e)
    def _run(self,shared): raise RuntimeError("Use run_async.")

class AsyncBatchNode(AsyncNode,BatchNode):
    async def _exec(self,items): return [await super(AsyncBatchNode,self)._exec(i) for i in items]

class AsyncParallelBatchNode(AsyncNode,BatchNode):
    async def _exec(self,items): return await asyncio.gather(*(super(AsyncParallelBatchNode,self)._exec(i) for i in items))

class AsyncFlow(Flow,AsyncNode):
    async def _orch_async(self,shared,params=None):
        curr,p,last_action =copy.copy(self.start_node),(params or {**self.params}),None
        while curr: curr.set_params(p); last_action=await curr._run_async(shared) if isinstance(curr,AsyncNode) else curr._run(shared); curr=copy.copy(self.get_next_node(curr,last_action))
        return last_action
    async def _run_async(self,shared): p=await self.prep_async(shared); o=await self._orch_async(shared); return await self.post_async(shared,p,o)
    async def post_async(self,shared,prep_res,exec_res): return exec_res

class AsyncBatchFlow(AsyncFlow,BatchFlow):
    async def _run_async(self,shared):
        pr=await self.prep_async(shared) or []
        for bp in pr: await self._orch_async(shared,{**self.params,**bp})
        return await self.post_async(shared,pr,None)

class AsyncParallelBatchFlow(AsyncFlow,BatchFlow):
    async def _run_async(self,shared):
        pr=await self.prep_async(shared) or []
        await asyncio.gather(*(self._orch_async(shared,{**self.params,**bp}) for bp in pr))
        return await self.post_async(shared,pr,None)

### 📚 LoadNode: Loading Source Documents

This node loads eight paragraph-length text documents about Indian culture.

Key points:
- Each document is a long paragraph covering a specific cultural aspect.
- Topics include diversity, religion, family, language, food, clothing, art, and modernization.
- The data is stored in `shared["documents"]`.

This node does not process data; it only loads raw content into memory.


In [55]:
class LoadNode(Node):
    def exec(self, _):
        shared["documents"] = [
             """
India is a country known for its immense cultural diversity, shaped by thousands
of years of history, geography, religion, and social traditions. Indian culture is
not a single unified system but a collection of regional cultures that coexist and
influence one another. From the Himalayan regions in the north to the coastal belts
in the south, and from the deserts of Rajasthan to the forests of the northeast,
each region has developed its own language, customs, food habits, clothing styles,
and social practices. Despite these differences, a shared sense of identity rooted
in values such as family, respect, spirituality, and community binds the nation
together.
            """,

            """
Religion plays a significant role in shaping Indian culture. Hinduism, Buddhism,
Jainism, and Sikhism originated in India, while Islam, Christianity, Judaism, and
Zoroastrianism have also flourished here for centuries. Religious festivals are an
integral part of everyday life and are often celebrated with great enthusiasm.
Diwali symbolizes the victory of light over darkness, Eid emphasizes charity and
brotherhood, Christmas celebrates peace and goodwill, and Guru Nanak Jayanti honors
the teachings of Sikhism. These festivals are frequently celebrated across religious
boundaries, reflecting India's tradition of coexistence and tolerance.
            """,

            """
Indian society traditionally places strong emphasis on family and social
relationships. Joint families, where multiple generations live together under one
roof, have historically been common, especially in rural areas. Elders are deeply
respected, and their guidance often influences important life decisions such as
education, marriage, and career. Even in modern urban settings, family bonds remain
strong, and cultural values such as hospitality, mutual support, and collective
responsibility continue to shape social behavior.
            """,

            """
Languages form one of the most visible expressions of Indian cultural diversity.
India is home to hundreds of languages and dialects belonging to different language
families, including Indo-Aryan, Dravidian, Austroasiatic, and Tibeto-Burman.
Languages such as Hindi, Bengali, Telugu, Marathi, Tamil, Urdu, and Kannada are
spoken by millions, while many regional and tribal languages preserve local
traditions and oral histories. Multilingualism is common, and many Indians grow up
speaking more than one language fluently.
            """,

            """
Indian food culture is deeply connected to geography, climate, and tradition.
Northern Indian cuisine often includes wheat-based breads and rich gravies, while
southern Indian food emphasizes rice, lentils, and coconut. Coastal regions rely
heavily on seafood, whereas desert regions use dry spices and preserved foods.
Spices play a central role not just in flavor but also in traditional medicine.
Meals are often considered sacred, and sharing food is seen as an expression of
love, respect, and social bonding.
            """,

            """
Traditional clothing in India varies widely by region and occasion. Sarees, salwar
kameez, lehengas, dhotis, and kurtas reflect local climate and cultural aesthetics.
Clothing often carries symbolic meaning, with colors, patterns, and fabrics chosen
for festivals, weddings, and religious ceremonies. Handloom textiles such as silk,
cotton, and wool have been produced for centuries, and traditional crafts continue
to support millions of artisans across the country.
            """,

            """
Indian art, music, and dance have evolved over centuries and remain deeply connected
to spiritual and philosophical traditions. Classical dance forms such as Bharatanatyam,
Kathak, Odissi, and Kathakali tell stories through gesture, rhythm, and expression.
Indian music includes both classical traditions and folk forms that vary by region.
These art forms are not merely entertainment but are often considered forms of
devotion and storytelling passed down through generations.
            """,

            """
Despite rapid modernization and globalization, Indian culture continues to adapt
without losing its core identity. Urbanization, technology, and global influences
have transformed lifestyles, especially among younger generations. However,
traditional values, rituals, and cultural practices remain relevant and are often
blended with modern ways of living. This ability to absorb change while preserving
heritage is one of the defining characteristics of Indian culture.
            """
        ]
        return "default"



### ✂️ ChunkNode: Preparing Text Chunks

This node prepares text for embedding.

In this example:
- Each document is treated as a single chunk.
- No splitting logic is applied.

Why this exists:
- In real systems, documents are often split into smaller chunks.
- Keeping this node makes the pipeline extensible.

Output is stored in `shared["chunks"]`.


In [56]:
class ChunkNode(Node):
    def exec(self, _):
        shared["chunks"] = shared["documents"]
        return "default"
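
If the documents were longer, this node is where splitting would happen. A minimal sketch of fixed-size word chunking with overlap; the function name `split_into_chunks` and the `chunk_size` and `overlap` values are assumptions for illustration, not part of the notebook:

```python
def split_into_chunks(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to `chunk_size` words, sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
print(split_into_chunks(sample, chunk_size=5, overlap=2))
# ['w0 w1 w2 w3 w4', 'w3 w4 w5 w6 w7', 'w6 w7 w8 w9 w10', 'w9 w10 w11']
```

Inside `ChunkNode`, a function like this could be applied to each entry of `shared["documents"]`, with the resulting pieces collected into `shared["chunks"]`.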


### 🔬 EmbedNode: Generating Semantic Embeddings

This node converts text chunks into vectors.

Process:
1. Iterates over all text chunks.
2. Calls `embed_text()` for each chunk.
3. Stores the resulting vectors in `shared["embeddings"]`.

Each embedding represents the semantic meaning of a text chunk.


In [57]:
class EmbedNode(Node):
    def exec(self, _):
        shared["embeddings"] = [embed_text(t) for t in shared["chunks"]]
        return "default"


### 🗄️ StoreNode: Building the FAISS Vector Index

This node creates and stores the FAISS index.

Steps:
1. All embeddings are stacked into a single NumPy matrix.
2. The vector dimension is detected automatically.
3. A FAISS `IndexFlatL2` index is created.
4. All vectors are added to the index.
5. The index is stored in `shared["index"]`.

`IndexFlatL2` performs exact (brute-force) L2 search, which is fast enough for a small corpus like this one.


In [58]:
class StoreNode(Node):
    def exec(self, _):
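        # Stack the per-chunk vectors into one (n_chunks, dim) float32 matrix.
        vectors = np.vstack(shared["embeddings"])
        dim = vectors.shape[1]

        # Flat index: stores the raw vectors and performs exact L2 search.
        index = faiss.IndexFlatL2(dim)
        index.add(vectors)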

        shared["index"] = index
        return "default"
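
Conceptually, a flat L2 index compares the query against every stored vector and keeps the closest ones. The pure-Python sketch below illustrates that idea on toy 2-D vectors; it is not how FAISS is implemented, and the function name `flat_l2_search` is hypothetical. Like `IndexFlatL2`, it reports squared L2 distances.

```python
def flat_l2_search(vectors, query, k):
    """Return (distances, indices) of the k stored vectors closest to query."""
    scored = []
    for idx, vec in enumerate(vectors):
        # Squared Euclidean distance between the query and one stored vector.
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first; ties broken by lower index
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [2.0, 2.0]]
distances, indices = flat_l2_search(toy_vectors, query=[1.0, 1.0], k=2)
print(distances, indices)  # [1.0, 2.0] [1, 0]
```

The work grows linearly with the number of stored vectors, which is fine for eight documents; larger collections usually call for an approximate index.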


### 🔁 Indexing Flow Execution

This flow connects all indexing nodes in sequence:

LoadNode → ChunkNode → EmbedNode → StoreNode

Running this flow:
- Loads documents
- Prepares chunks
- Generates embeddings
- Stores vectors in FAISS

After this cell runs, the system is ready for semantic search.


In [None]:
index_flow = Flow()
index_flow.start(LoadNode()) >> ChunkNode() >> EmbedNode() >> StoreNode()

index_flow.run(shared)
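
The chain works because `start()` returns the start node and each `>>` returns its right-hand node, so every link attaches to the node before it. A minimal sketch of that mechanism with a hypothetical stand-in class `Link` (the real logic is `BaseNode.next` and `__rshift__` in the PocketFlow cell above):

```python
class Link:
    """Hypothetical stand-in for a node that only tracks its successors."""

    def __init__(self, name):
        self.name = name
        self.successors = {}

    def next(self, node, action="default"):
        self.successors[action] = node
        return node  # returning the target lets the chain continue from it

    def __rshift__(self, other):
        return self.next(other)


load, chunk, embed, store = (Link(n) for n in ("load", "chunk", "embed", "store"))
load >> chunk >> embed >> store

# Walk the "default" successors from the first node.
order, current = [], load
while current is not None:
    order.append(current.name)
    current = current.successors.get("default")
print(" -> ".join(order))  # load -> chunk -> embed -> store
```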


### 🔍 QueryNode: Semantic Search Execution

This node performs semantic search.

Steps:
1. Reads the user query from flow parameters.
2. Converts the query into an embedding.
3. Searches the FAISS index for the top-K closest vectors.
4. Retrieves matching text chunks.
5. Stores results in `shared["results"]`.

The `exec()` method runs the search and writes the matches to `shared["results"]`.
The `post()` method returns `None`. In PocketFlow, the value returned by `post()` is the action name used to pick the next node, so the results are read from `shared` instead of being returned.


In [None]:
class QueryNode(Node):
    def exec(self, _):
        query = self.params["query"]

        # FAISS expects a 2-D float32 array of shape (n_queries, dim).
        q_vec = embed_text(query).reshape(1, -1)
        # D holds the distances, I the row indices of the nearest vectors.
        D, I = shared["index"].search(q_vec, k=4)

        shared["results"] = [shared["chunks"][i] for i in I[0]]

    def post(self, shared, prep_res, exec_res):
        # Returning a list here would be used as a dictionary key by
        # Flow.get_next_node and fail; None selects the "default" action.
        return None


### 🚀 Running Semantic Search Query

This cell executes the search flow.

- A query is passed using `set_params`.
- The query is embedded and compared against stored vectors.
- The most semantically relevant Indian culture texts are read from `shared["results"]`.

Printing the results shows how semantic search retrieves meaning-based matches,
not keyword matches.


In [62]:
query_flow = Flow()
query_flow.start(QueryNode())

query_flow.set_params({
    "query": "what is india"
})

query_flow.run(shared)
for item in shared["results"]:
    print(item)


India is a country known for its immense cultural diversity, shaped by thousands
of years of history, geography, religion, and social traditions. Indian culture is
not a single unified system but a collection of regional cultures that coexist and
influence one another. From the Himalayan regions in the north to the coastal belts
in the south, and from the deserts of Rajasthan to the forests of the northeast,
each region has developed its own language, customs, food habits, clothing styles,
and social practices. Despite these differences, a shared sense of identity rooted
in values such as family, respect, spirituality, and community binds the nation
together.
            

Religion plays a significant role in shaping Indian culture. Hinduism, Buddhism,
Jainism, and Sikhism originated in India, while Islam, Christianity, Judaism, and
Zoroastrianism have also flourished here for centuries. Religious festivals are an
integral part of everyday life and are often celebrated with great enth