## Setup

First, you must install the packages and set the necessary environment variables.

### Installation

Install LangChain's Python library, `langchain`, together with `langchain-core` and `langchain-google-genai`, LangChain's integration package for Gemini. Next, install `langchain-pinecone`, LangChain's integration package for Pinecone, and `pinecone`, Pinecone's Python SDK (formerly published as `pinecone-client`). Then install `langchain-community`, which provides the `WebBaseLoader` module. Finally, install `langchain-cohere` and `langgraph`, which the reranking and agent cells below import.

In [None]:
%pip install --quiet langchain-core
%pip install --quiet langchain
%pip install --quiet langchain-google-genai
%pip install --quiet -U langchain-community
%pip install --quiet pinecone
%pip install --quiet langchain-pinecone
%pip install --quiet langchain-cohere
%pip install --quiet langgraph

## Configure your API key

The following cell reads three API keys from the environment variables `GOOGLE_API_KEY`, `COHERE_API_KEY`, and `PINECONE_API_KEY`. If you keep your keys in Colab Secrets, load them into the environment under the same names first. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.


In [29]:
import os

# Read the keys from the environment and fail early if any is missing.
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")
COHERE_API_KEY = os.environ.get("COHERE_API_KEY")
PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY")

missing = [
    name
    for name in ("GOOGLE_API_KEY", "COHERE_API_KEY", "PINECONE_API_KEY")
    if not os.environ.get(name)
]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

## Import the required libraries

In [None]:
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate, format_document
from langchain_core.runnables import RunnablePassthrough
from langchain_pinecone import PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

from pinecone import PodSpec

In [31]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-exp-03-07")


### Set up the vector store

In [44]:
from pinecone import Pinecone as pc

PINECONE_HOST="DENSE_HOST"
PINECONE_INDEX="DENSE_INDEX"

# Initialize Pinecone client
pine_client = pc(api_key=PINECONE_API_KEY)
index_name = PINECONE_INDEX
index = pine_client.Index(index_name, PINECONE_HOST)

vectorstore = PineconeVectorStore(
    namespace="NAMESPACE",
    index=index,
    embedding=gemini_embeddings
)


In [37]:
from pinecone import Pinecone as pc

PINECONE_SPARSE_HOST="SPARSE_HOST"
PINECONE_SPARSE_INDEX="SPARSE_INDEX"

# Initialize Pinecone client
pine_sparse_client = pc(api_key=PINECONE_API_KEY)
sparse_index_name = PINECONE_SPARSE_INDEX
sparse_index = pine_sparse_client.Index(sparse_index_name, PINECONE_SPARSE_HOST)

In [46]:
from langchain_cohere import CohereRerank

reranker = CohereRerank(
    cohere_api_key=COHERE_API_KEY,
    model="rerank-multilingual-v3.0"  # multilingual reranking model
)

In [50]:
from langchain_google_genai import ChatGoogleGenerativeAI

# To configure model parameters use the `generation_config` parameter.
# eg. generation_config = {"temperature": 0.7, "topP": 0.8, "topK": 40}
# If you only want to set a custom temperature for the model use the
# "temperature" parameter directly.

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-preview-05-20")

### Hybrid Embedding

### Add dense documents

In [None]:
import requests

books = {
    "quran-37": {
        "name": "The Holy Quran with Five Volume Commentary (Vol 1)",
        "start_page": 377,
        "end_page": 816
    }
}

documents = []

for book_id, book_data in books.items():
    for i in range(book_data["start_page"], book_data["end_page"] + 1):
        # Fetch one page of the book as JSON
        url = f"https://new.alislam.org/api/books/text?id={book_id}&pages={i}"
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
        entry = response.json()[0]
        text = entry['content']
        page_number = entry.get('printPageNum', '')
        link = (
            "https://new.alislam.org/library/books/quran-english-five-volume-1"
            f"?option=options&page={entry.get('pageNum', '')}"
        )
        documents.append(
            Document(
                page_content=text,
                metadata={
                    "chunk_index": i,
                    "volume": "1",
                    "title": book_data["name"],
                    "page": page_number,
                    "link": link,
                },
            )
        )


In [None]:
import time
from random import uniform

def add_with_backoff(documents_batch, max_retries=10):
    """Add a batch to the vector store, retrying with exponential backoff on HTTP 429."""
    for retry in range(max_retries + 1):
        try:
            vectorstore.add_documents(documents_batch)
            return  # Success: exit function
        except Exception as e:
            # Re-raise anything that is not a rate-limit error, and give up
            # (instead of returning silently) once the retries are exhausted.
            if '429' not in str(e) or retry == max_retries:
                raise
            wait = 2 ** retry + uniform(0, 1)  # jitter to avoid thundering herd
            print(f"429 Too Many Requests. Retrying in {wait:.2f} seconds...")
            time.sleep(wait)

# Add documents in batches of 50, backing off when rate limited
for i in range(0, len(documents), 50):
    batch = documents[i:i+50]
    add_with_backoff(batch)
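
The retry pattern used above can be tried without a vector store. The following is a minimal sketch, not the notebook's own helper: `retry_with_backoff`, `flaky_upload`, and the injectable `sleep` argument are hypothetical names introduced for illustration, and the check for `"429"` in the error message is an assumption about how rate-limit errors are reported.

```python
import random
import time


def retry_with_backoff(operation, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds, backing off exponentially on rate limits."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception as exc:
            # Assumption: rate-limit errors mention "429" in their message.
            if "429" not in str(exc) or attempt == max_retries:
                raise
            # The delay doubles on each attempt; jitter spreads out concurrent clients.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))


# Toy operation that is rate limited twice before it succeeds.
calls = {"count": 0}
waits = []


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"


# Record the waits instead of sleeping so the example runs instantly.
outcome = retry_with_backoff(flaky_upload, sleep=waits.append)
print(outcome, calls["count"], len(waits))
```

Passing `sleep=waits.append` records the computed delays rather than pausing, which makes the schedule easy to inspect.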

### Add sparse documents

In [None]:
%pip install sentence-transformers

In [3]:
# coding=utf-8
# Copyright 2024 The GTE Team Authors and Alibaba Group.
# Licensed under the Apache License, Version 2.0 (the "License");

from collections import defaultdict
from typing import Dict, List, Tuple

import numpy as np
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers.utils import is_torch_npu_available


class GTEEmbeddidng(torch.nn.Module):
    def __init__(self,
                 model_name: str = None,
                 normalized: bool = True,
                 use_fp16: bool = True,
                 device: str = None
                ):
        super().__init__()
        self.normalized = normalized
        if device:
            self.device = torch.device(device)
        else:
            if torch.cuda.is_available():
                self.device = torch.device("cuda")
            elif torch.backends.mps.is_available():
                self.device = torch.device("mps")
            elif is_torch_npu_available():
                self.device = torch.device("npu")
            else:
                self.device = torch.device("cpu")
                use_fp16 = False
        self.use_fp16 = use_fp16
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForTokenClassification.from_pretrained(
            model_name, trust_remote_code=True, torch_dtype=torch.float16 if self.use_fp16 else None
        )
        self.vocab_size = self.model.config.vocab_size
        self.model.to(self.device)

    def _process_token_weights(self, token_weights: np.ndarray, input_ids: list):
        # convert to dict
        result = defaultdict(int)
        unused_tokens = set([self.tokenizer.cls_token_id, self.tokenizer.eos_token_id, self.tokenizer.pad_token_id,
                             self.tokenizer.unk_token_id])
        # token_weights = np.ceil(token_weights * 100)
        for w, idx in zip(token_weights, input_ids):
            if idx not in unused_tokens and w > 0:
                token = self.tokenizer.decode([int(idx)])
                if w > result[token]:
                    result[token] = w
        return result

    @torch.no_grad()
    def encode(self,
               texts: None,
               dimension: int = None,
               max_length: int = 8192,
               batch_size: int = 16,
               return_dense: bool = True,
               return_sparse: bool = False):
        if dimension is None:
            dimension = self.model.config.hidden_size
        if isinstance(texts, str):
            texts = [texts]
        num_texts = len(texts)
        all_dense_vecs = []
        all_token_weights = []
        for i in range(0, num_texts, batch_size):
            batch = texts[i: i + batch_size]
            results = self._encode(batch, dimension, max_length, batch_size, return_dense, return_sparse)
            if return_dense:
                all_dense_vecs.append(results['dense_embeddings'])
            if return_sparse:
                all_token_weights.extend(results['token_weights'])
        # torch.cat raises on an empty list, so only concatenate when dense vectors were requested
        all_dense_vecs = torch.cat(all_dense_vecs, dim=0) if return_dense else None
        return {
            "dense_embeddings": all_dense_vecs,
            "token_weights": all_token_weights
        }

    @torch.no_grad()
    def _encode(self,
                texts: Dict[str, torch.Tensor] = None,
                dimension: int = None,
                max_length: int = 1024,
                batch_size: int = 16,
                return_dense: bool = True,
                return_sparse: bool = False):

        text_input = self.tokenizer(texts, padding=True, truncation=True, return_tensors='pt', max_length=max_length)
        text_input = {k: v.to(self.model.device) for k,v in text_input.items()}
        model_out = self.model(**text_input, return_dict=True)

        output = {}
        if return_dense:
            dense_vecs = model_out.last_hidden_state[:, 0, :dimension]
            if self.normalized:
                dense_vecs = torch.nn.functional.normalize(dense_vecs, dim=-1)
            output['dense_embeddings'] = dense_vecs
        if return_sparse:
            token_weights = torch.relu(model_out.logits).squeeze(-1)
            token_weights = list(map(self._process_token_weights, token_weights.detach().cpu().numpy().tolist(),
                                                    text_input['input_ids'].cpu().numpy().tolist()))
            output['token_weights'] = token_weights

        return output

    def _compute_sparse_scores(self, embs1, embs2):
        scores = 0
        for token, weight in embs1.items():
            if token in embs2:
                scores += weight * embs2[token]
        return scores

    def compute_sparse_scores(self, embs1, embs2):
        scores = [self._compute_sparse_scores(emb1, emb2) for emb1, emb2 in zip(embs1, embs2)]
        return np.array(scores)

    def compute_dense_scores(self, embs1, embs2):
        scores = torch.sum(embs1*embs2, dim=-1).cpu().detach().numpy()
        return scores

    @torch.no_grad()
    def compute_scores(self, 
        text_pairs: List[Tuple[str, str]], 
        dimension: int = None,
        max_length: int = 1024,
        batch_size: int = 16,
        dense_weight=1.0,
        sparse_weight=0.1):
        text1_list = [text_pair[0] for text_pair in text_pairs]
        text2_list = [text_pair[1] for text_pair in text_pairs]
        embs1 = self.encode(text1_list, dimension, max_length, batch_size, return_dense=True, return_sparse=True)
        embs2 = self.encode(text2_list, dimension, max_length, batch_size, return_dense=True, return_sparse=True)
        scores = self.compute_dense_scores(embs1['dense_embeddings'], embs2['dense_embeddings']) * dense_weight + \
            self.compute_sparse_scores(embs1['token_weights'], embs2['token_weights']) * sparse_weight
        scores = scores.tolist()
        return scores
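
The sparse score computed by `_compute_sparse_scores` is a dot product over the tokens two texts share, and `compute_scores` adds it to the dense score with the weights `dense_weight=1.0` and `sparse_weight=0.1`. A minimal sketch with made-up token weights follows; `sparse_dot`, the toy weights, and the dense score of `0.8` are illustrative values, not model output.

```python
def sparse_dot(weights_a, weights_b):
    """Sum the products of weights for tokens present in both mappings."""
    return sum(
        weight * weights_b[token]
        for token, weight in weights_a.items()
        if token in weights_b
    )


# Hypothetical token weights for a query and a passage.
query_weights = {"prayer": 2.0, "meaning": 0.5}
passage_weights = {"prayer": 1.5, "fasting": 1.0}

# Only "prayer" appears in both, so the sparse score is 2.0 * 1.5.
sparse_score = sparse_dot(query_weights, passage_weights)

# Combine with a made-up dense similarity using the default weights above.
dense_score = 0.8
hybrid_score = dense_score * 1.0 + sparse_score * 0.1
print(sparse_score, hybrid_score)
```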


In [None]:
# remove empty documents

updated_documents = []
for document in documents:
    if document.page_content != "":
        updated_documents.append(document)

In [35]:
import uuid
from time import sleep

model_name_or_path = 'Alibaba-NLP/gte-multilingual-base'
model = GTEEmbeddidng(model_name_or_path)

BATCH_SIZE = 25

# Helper to chunk data
def chunked(iterable, size):
    for i in range(0, len(iterable), size):
        yield iterable[i:i + size]

for batch_num, doc_batch in enumerate(chunked(updated_documents, BATCH_SIZE), start=1):
    # Prepare records
    records = [
        {
            # A UUID avoids the ID collisions (and silent overwrites on
            # upsert) that small random integers can cause.
            "id": str(uuid.uuid4()),
            "text": document.page_content,
            **document.metadata
        }
        for document in doc_batch
    ]

    # Get documents and embed
    docs = [r["text"] for r in records]
    embs = model.encode(docs, return_sparse=True)

    # Prepare vectors for upsert
    vectors = []
    for i, record in enumerate(records):
        token_weights = embs["token_weights"][i]
        merged = {}

        for token, weight in token_weights.items():
            token_id = model.tokenizer.convert_tokens_to_ids(token)
            merged[token_id] = merged.get(token_id, 0.0) + float(weight)

        indices = list(merged.keys())
        values = list(merged.values())

        vectors.append({
            "id": record["id"],
            "sparse_values": {
                "indices": indices,
                "values": values
            },
            "metadata": {
                **record,
                "text": record["text"]
            }
        })

    # Upsert batch
    sparse_index.upsert(vectors=vectors, namespace="NAMESPACE")
    print(f"✅ Upserted batch {batch_num}")
    sleep(3)
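
The conversion from token weights to the `indices`/`values` layout used in the upsert above can be checked on toy data. This is a sketch under assumptions: `to_sparse_vector`, `toy_vocab`, and `UNKNOWN_ID` are hypothetical stand-ins for the tokenizer lookup, chosen to show why weights are summed when two token strings map to the same id.

```python
def to_sparse_vector(token_weights, token_to_id):
    """Convert {token: weight} into parallel index and value lists."""
    merged = {}
    for token, weight in token_weights.items():
        token_id = token_to_id(token)
        # Distinct token strings can share an id (for example the unknown-token
        # id), so weights are accumulated to keep the indices unique.
        merged[token_id] = merged.get(token_id, 0.0) + float(weight)
    return {"indices": list(merged.keys()), "values": list(merged.values())}


# Hypothetical vocabulary; anything missing maps to UNKNOWN_ID.
toy_vocab = {"prayer": 11, "fast": 42}
UNKNOWN_ID = 3


def lookup(token):
    return toy_vocab.get(token, UNKNOWN_ID)


sparse = to_sparse_vector(
    {"prayer": 1.5, "salat": 0.25, "sawm": 0.5, "fast": 1.0}, lookup
)
print(sparse)
```

In this toy run, `salat` and `sawm` are both out of vocabulary, so their weights collapse into a single entry for `UNKNOWN_ID`.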

### Hybrid Agent

### Commentary Search Function

In [121]:
def search_commentary(query):
    """
    Search the commentary for the given query.

    Args:
        query (str): The query to search the commentary for.

    Returns:
        list: A list of matches from the commentary.
    """
    dense_results = index.query(
        namespace="NAMESPACE",
        vector=gemini_embeddings.embed_query(query), 
        top_k=40,
        include_metadata=True,
        include_values=False
    )

    # Encode the query and extract sparse token weights

    q_tw = model.encode([query], return_sparse=True)["token_weights"][0]

    # Merge duplicate token indices by summing their weights

    merged = {}
    for token, weight in q_tw.items():
        token_id = model.tokenizer.convert_tokens_to_ids(token)
        if token_id in merged:
            merged[token_id] += float(weight)
        else:
            merged[token_id] = float(weight)

    q_indices = list(merged.keys())
    q_values = list(merged.values())

    # Perform sparse query

    sparse_results = sparse_index.query(
        top_k=40,
        sparse_vector={
            "indices": q_indices,
            "values": q_values
        },
        namespace="NAMESPACE",
        include_metadata=True,
    )

    def merge_chunks(dense_results, sparse_results):
        def normalize_hit(hit, source='dense'):
            # Dense and sparse hits share the same metadata layout; only the
            # `source` label differs.
            return {
                'chunk_text': hit['metadata'].get('text', '').strip(),
                '_score': hit.get('score', 0),
                'link': hit['metadata'].get('link'),
                'page': hit['metadata'].get('page'),
                'title': hit['metadata'].get('title'),
                'source': source
            }

        # Normalize both result sets

        dense_hits = [normalize_hit(hit, source='dense') for hit in dense_results.get('matches', [])]
        sparse_hits = [normalize_hit(hit, source='sparse') for hit in sparse_results.get('matches', [])]
        
        # Deduplicate based on `chunk_text`

        unique_hits = {}
        for hit in dense_hits + sparse_hits:
            text_key = hit['chunk_text']
            if text_key and text_key not in unique_hits:
                unique_hits[text_key] = hit
            elif text_key and text_key in unique_hits:
                # Keep the one with higher score
                if hit['_score'] > unique_hits[text_key]['_score']:
                    unique_hits[text_key] = hit

        # Sort by score

        sorted_hits = sorted(unique_hits.values(), key=lambda x: x['_score'], reverse=True)

        return sorted_hits

    merged_results = merge_chunks(dense_results, sparse_results)

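    # Rerank on the chunk text only; scores, links, and page numbers are
    # not useful ranking signals.
    reranked_results = reranker.rerank(
        [hit['chunk_text'] for hit in merged_results], query, top_n=5
    )

    # Map the reranker's indices back to the merged hits
    final_results = [merged_results[result['index']] for result in reranked_results]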
    
    # enhance results with context from nearby pages
    
    enhanced_results = []

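    for result in final_results:
        # Skip hits whose page label is missing or not a plain number
        # (for example an empty string), since int() would raise on them.
        try:
            page = int(result['page'])
        except (TypeError, ValueError):
            continue
        pages = [str(page - 1), str(page), str(page + 1)]

        dummy_vector = [0.0] * 3072  # Adjust to your index's dimensionality

        # Use a separate name so the loop variable `result` is not overwritten
        neighbors = index.query(
            vector=dummy_vector,
            namespace="NAMESPACE",
            top_k=3,
            filter={
                "page": { "$in": pages }
            },
            include_metadata=True
        )

        enhanced_results.extend(neighbors['matches'])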
    
    return enhanced_results
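
The deduplication step inside `merge_chunks` can be exercised on toy hits. The sketch below is a standalone approximation of that step; `dedupe_by_text` and the sample hits are hypothetical, and the scores are invented to show that dense and sparse scores need not share a scale.

```python
def dedupe_by_text(hits):
    """Keep the highest-scoring hit per chunk text, sorted by score descending."""
    best = {}
    for hit in hits:
        key = hit["chunk_text"]
        # Empty texts are dropped; otherwise the higher score wins.
        if key and (key not in best or hit["_score"] > best[key]["_score"]):
            best[key] = hit
    return sorted(best.values(), key=lambda h: h["_score"], reverse=True)


# Invented hits: "Page A" is returned by both indexes.
dense_hits = [
    {"chunk_text": "Page A", "_score": 0.82, "source": "dense"},
    {"chunk_text": "Page B", "_score": 0.70, "source": "dense"},
]
sparse_hits = [
    {"chunk_text": "Page A", "_score": 3.10, "source": "sparse"},
    {"chunk_text": "", "_score": 9.0, "source": "sparse"},
]

merged_hits = dedupe_by_text(dense_hits + sparse_hits)
for hit in merged_hits:
    print(hit["chunk_text"], hit["_score"], hit["source"])
```

Because the two score types are compared directly, the sparse copy of "Page A" wins here. The order after merging matters little in `search_commentary`, since the reranker reorders the merged hits afterwards.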

### Commentary Agent

In [None]:
from langgraph.prebuilt import create_react_agent

graph = create_react_agent(
    llm,
    tools=[search_commentary],
    prompt="""You are an Ahmadi scholar who answers questions from the Holy Quran 5 Volume Commentary by Hazrat Musleh Maud R.A.

Guidelines:

SEARCH AND TRANSLATION:
- ALWAYS translate the user's query before searching:
  * If query contains English words, translate them to Arabic equivalents
  * If query contains Arabic words, also try English equivalents
- Use the search_commentary tool with BOTH original query AND translated versions
- Try multiple search variations systematically:
  1. Search with original query
  2. Search with Arabic translation
  3. Search with English translation
  4. Search with alternative spellings/transliterations

REFERENCING AND CITATIONS:
- Always add page references in format: (Pg. X) or (Pages X-Y) 
- Include Quran verse references as clickable links: https://www.alislam.org/quran/app/CHAPTER:VERSE
- Example: https://www.alislam.org/quran/app/2:255 for Ayat-ul-Kursi (Chapter 2, Verse 255)
- When referencing multiple verses, provide separate links for each

TRANSLATION EXAMPLES:
- "prayer" → search: "prayer", "salah", "salat", "صلاة"
- "fasting" → search: "fasting", "sawm", "صوم"
- "charity" → search: "charity", "zakat", "زكاة"

SEARCH STRATEGY:
- Always perform multiple searches with different terms
- Don't stop after first search - try at least 2-3 variations
- If searching for Arabic concepts, include both Arabic and English terms
- If a specific verse number is queried but doesn't exist, respond: "Verse doesn't exist. Please check the chapter and verse number and try again."
- If no relevant information is found after searching, respond: "I didn't find information about this topic in the commentary. Please try rephrasing your question or ask about a different topic."
- If the query is unclear, ask for clarification before searching

ANSWER FORMATTING:
- Provide comprehensive answers when information is found
- Structure responses with clear explanations
- Include both the Arabic term and its meaning when discussing Arabic words
- Quote relevant passages from the commentary when applicable
- Maintain scholarly tone appropriate for religious discourse

QUALITY ASSURANCE:
- Verify verse numbers before creating links
- Cross-reference information when multiple sources are mentioned
- Ensure accuracy of Arabic transliterations
- Cross-reference the translation with the Quran to ensure accuracy"""
)

await graph.ainvoke({"messages": [{"role": "user", "content": "inamal amalu biniat what is the meaning of this phrase?"}]})
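
The call above returns the agent's final state rather than a plain string. A minimal sketch of pulling out the answer follows, assuming the state is a dict whose `"messages"` list ends with the model's reply and that the reply's `content` is plain text; `final_answer` and `fake_result` are hypothetical, with `SimpleNamespace` objects standing in for real message objects.

```python
from types import SimpleNamespace


def final_answer(result):
    """Return the text of the last message in an agent result."""
    return result["messages"][-1].content


# Stand-in for the state returned by the agent (assumed shape).
fake_result = {
    "messages": [
        SimpleNamespace(content="What does this phrase mean?"),
        SimpleNamespace(content="Actions are judged by intentions. (Pg. 12)"),
    ]
}

answer = final_answer(fake_result)
print(answer)
```

With a real run, the same helper would be applied to the value returned by `await graph.ainvoke(...)`.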