# BridgeMind: Agentic Hybrid RAG
This section loads the core libraries the system needs to run.
- The system is configured to run seamlessly both on local machines and on **Google Colab**.
- Key components such as `LangChain`, `ChromaDB`, and `LlamaCpp` are imported here.

In [None]:
import sys
import os
import requests
import warnings
import gc
import time

# --- STEP 1: COLAB ENVIRONMENT SETUP & REPO CLONING ---
# This block ensures that when the professor opens the link, files are pulled automatically.
if 'google.colab' in sys.modules:
    print("Colab environment detected. Cloning repository and installing dependencies...")
    
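    # Clone the repo to get PDFs and other files into Colab's local storage.
    # /content already holds Colab's sample_data folder, and git refuses to clone
    # into a non-empty directory, so clone into a subfolder and copy its contents over.
    !git clone https://github.com/serifeeroglu/BridgeMind-RAG.git BridgeMind-RAG
    !cp -r BridgeMind-RAG/. .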
    
    # Install required libraries specifically for Colab environment
    !pip install -q duckduckgo-search langchain-community langchain-huggingface chromadb llama-cpp-python pypdf sentence-transformers
    
    FOLDER_PATH = "/content"
else:
    # Local path for VS Code environment
    FOLDER_PATH = r"C:\Users\MSI\OneDrive\Masaüstü\ragproje"

# --- STEP 2: LIBRARY IMPORTS ---
try:
    from duckduckgo_search import DDGS
except ImportError:
    print("ERROR: 'duckduckgo-search' library not found.")

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import LlamaCpp

# --- STEP 3: SYSTEM CONFIGURATIONS & LOGGING ---
warnings.filterwarnings("ignore")
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

# Initialize performance stats for the final report
perf_stats = {"total_queries": 0, "pdf_hits": 0, "web_hits": 0, "total_time": 0}

print(f"System Ready. Working Directory: {FOLDER_PATH}")

## 2. Dynamic Path Configuration and Automatic Model Downloader
To let the project run without errors in different environments (Colab/local):
- The code detects the runtime environment automatically and sets the file paths accordingly.
- If the **Phi-3-mini GGUF** model is missing, it is downloaded automatically from Hugging Face.
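
A download that is interrupted midway can leave a truncated file at the model path, and a plain existence check would then treat it as a valid model. The sketch below shows one way to guard against that using only the standard library: write to a temporary file and rename it once every chunk has been written. The helper name `save_atomically` and the `.part` suffix are hypothetical and not part of the project.

```python
import os
import tempfile


def save_atomically(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path via a temporary .part file."""
    part_path = dest_path + ".part"  # hypothetical suffix for the in-progress file
    try:
        with open(part_path, "wb") as f:
            for chunk in chunks:
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)
        # os.replace swaps the finished file into place in a single step
        os.replace(part_path, dest_path)
    except BaseException:
        # Remove the partial file so a later existence check is not fooled
        if os.path.exists(part_path):
            os.remove(part_path)
        raise
    return dest_path


# In-memory chunks stand in for response.iter_content(...)
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "model.gguf")
save_atomically([b"GG", b"UF", b""], demo_path)
with open(demo_path, "rb") as f:
    demo_bytes = f.read()
```

With this pattern the final file name only ever appears once the write has completed, so a check such as `os.path.exists(MODEL_PATH)` can be trusted after a crash or a dropped connection.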

In [None]:
#  STEP 2: Path Configuration & Automatic Model Download 
# Dynamic path selection: Uses local path for VS Code, root for Colab
if 'google.colab' in sys.modules:
    FOLDER_PATH = "/content/"
else:
    FOLDER_PATH = r"C:\Users\MSI\OneDrive\Masaüstü\ragproje"

MODEL_PATH = os.path.join(FOLDER_PATH, "Phi-3-mini-4k-instruct-q4.gguf")

# Automated model downloader (ensures the model exists for the evaluator)
if not os.path.exists(MODEL_PATH):
    print("Model file not found. Downloading Phi-3-mini (approx. 2.2GB)...")
    url = "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf"
    # Stream to disk in chunks; raise on HTTP errors instead of saving an error page as the model
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(MODEL_PATH, "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
    print("Model downloaded successfully.")

## 3. External Knowledge Source: Web Search Tool
This tool gives the system its agentic behavior: it fetches information that is not present in the local PDF documents.
- If the local database cannot answer the query, the system autonomously searches current web data through DuckDuckGo (via the `duckduckgo-search` library).
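
To make the shape of the tool's output concrete, the sketch below isolates the formatting step and runs it on hand-written sample data instead of a live search. It assumes each result is a dict with `title` and `body` keys, which are the fields the search tool in this notebook reads. The function name `format_results` and the sample entries are hypothetical.

```python
def format_results(raw_results):
    """Turn search-result dicts into one context string for the LLM prompt."""
    blocks = []
    for r in raw_results:
        # .get avoids a KeyError if a result is missing a field
        title = r.get("title", "(no title)")
        body = r.get("body", "")
        blocks.append(f"Source: {title}\nSnippet: {body}")
    return "\n\n".join(blocks)


# Hand-written sample data (not real search output)
sample = [
    {"title": "Page A", "body": "First snippet."},
    {"title": "Page B", "body": "Second snippet."},
]
web_context = format_results(sample)
print(web_context)
```

An empty result list produces an empty string, so the caller may want to treat that case as a failed search rather than pass an empty context to the model.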

In [None]:
#  STEP 3: Web Search Tool 
def web_search_tool(query):
    print(f"\n AGENT: Knowledge not found in local PDFs. Searching the web for: {query}")
    try:
        with DDGS() as ddgs:
            search_query = f"{query} detailed explanation and steps"
            results = []
            raw_results = ddgs.text(search_query, max_results=5)
            for r in raw_results:
                results.append(f"Source: {r['title']}\nSnippet: {r['body']}")
            return "\n\n".join(results)
    except Exception as e:
        return f"Web search failed: {str(e)}"

## 4. Academic PDF Indexing and Vector Database (RAG)
In this stage, the loaded academic papers are processed and made searchable:
- **Recursive Character Text Splitting:** Texts are split into 800-character chunks with a 200-character overlap to preserve semantic coherence across chunk boundaries.
- **HuggingFace Embeddings:** Chunks are converted into vector representations and stored in **ChromaDB** for fast similarity search.
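
The effect of chunk size and overlap can be illustrated with a plain sliding window. This is a simplified stand-in, not the recursive splitter itself: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line breaks, which this sketch ignores. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=800, chunk_overlap=200):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# A 2000-character dummy document made of repeating digits
document = "".join(str(i % 10) for i in range(2000))
chunks = sliding_chunks(document)
print(len(chunks), [len(c) for c in chunks])
```

Each window starts 600 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.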

In [None]:
#  STEP 4: Vector Database (PDF Indexing) 
print("INITIALIZING SYSTEM & INDEXING LOCAL DOCUMENTS...")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

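# Default to None so later cells can tell that indexing did not happen
vectorstore = None

# Check if directory exists
if not os.path.exists(FOLDER_PATH):
    print(f"ERROR: Directory {FOLDER_PATH} not found!")
else:
    # Match the extension case-insensitively (.pdf and .PDF)
    pdf_files = [f for f in os.listdir(FOLDER_PATH) if f.lower().endswith(".pdf")]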
    if not pdf_files:
        print("WARNING: No PDF files found! Please ensure PDFs are in the folder.")
    else:
        all_docs = []
        for pdf in pdf_files:
            loader = PyPDFLoader(os.path.join(FOLDER_PATH, pdf))
            all_docs.extend(loader.load())

        # Optimized chunking for academic papers
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
        texts = text_splitter.split_documents(all_docs)
        
        # Build Vector Store
        vectorstore = Chroma.from_documents(documents=texts, embedding=embeddings)
        print(f"SUCCESS: {len(pdf_files)} PDFs indexed into {len(texts)} chunks.")

INITIALIZING SYSTEM & INDEXING LOCAL DOCUMENTS...
SUCCESS: 10 PDFs indexed into 1441 chunks.


## 5. Agent Logic and Decision Mechanism (Prompt Engineering)
The system's decision-making layer comes into play here:
- The **Phi-3-mini** model is configured with `temperature=0.0` so that its output is as deterministic as possible and stays close to the retrieved context.
- **Strict Evaluator:** The model first looks for the answer in the retrieved documents; if the information is not there, it returns the `SEARCH_WEB` command instead of making something up.
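
The decision step can be sketched without loading the model: one helper assembles the prompt and another interprets the model's reply. The helper names `build_decision_prompt` and `route_decision` are hypothetical, and the prompt layout assumes the `<|system|>`, `<|user|>`, `<|assistant|>` and `<|end|>` markers that this notebook uses for Phi-3.

```python
def build_decision_prompt(context_text, query):
    """Assemble a Phi-3 style prompt with system, user and assistant turns."""
    system = (
        "You are a strict academic researcher.\n"
        "1. Use ONLY the Context below to answer.\n"
        "2. If the answer is NOT in the Context, reply ONLY with the word: SEARCH_WEB"
    )
    return (
        f"<|system|>\n{system}<|end|>\n"
        f"<|user|>\nContext: {context_text}\nQuestion: {query}<|end|>\n"
        f"<|assistant|>\n"
    )


def route_decision(decision):
    """Return 'web' when the model asked for a search or said nothing, else 'pdf'."""
    text = decision.strip()
    if not text or "SEARCH_WEB" in text.upper():
        return "web"
    return "pdf"


prompt = build_decision_prompt("RAG combines retrieval and generation.", "What is RAG?")
print(route_decision("SEARCH_WEB"))             # routed to the web tool
print(route_decision("RAG combines the two."))  # answered from local PDFs
```

Because the check is a substring match, an answer that merely mentions `SEARCH_WEB` is also routed to the web. A stricter variant could require the reply to equal the keyword exactly, at the cost of missing replies that add punctuation or extra words.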

In [None]:
#  STEP 5: LLM Initialization & Strict Agent Logic 
llm = LlamaCpp(
    model_path=MODEL_PATH,
    n_ctx=4096,
    max_tokens=1024, 
    temperature=0.0, # Set to 0 for factual accuracy
    n_gpu_layers=-1 # Use GPU acceleration
)

def ask_agentic_rag(query):
    start_time = time.time()
    
    # Retrieve top 4 relevant context snippets
    docs = vectorstore.similarity_search(query, k=4)
    context_text = "\n".join([d.page_content for d in docs])
    
    # Strict prompt to force a binary decision: Answer or Search Web
    decision_prompt = f"""<|system|> 
    You are a strict academic researcher. 
    1. Use ONLY the Context below to answer. 
    2. If the answer is NOT in the Context, reply ONLY with the word: SEARCH_WEB
    3. Do NOT mention the web or search in your initial check.
    <|end|>
    <|user|> Context: {context_text} \n Question: {query} <|end|>
    <|assistant|>"""
    
    decision = llm.invoke(decision_prompt).strip()
    
    # Logic to switch between Local PDF and Web Search
    if "SEARCH_WEB" in decision.upper() or not decision:
        web_info = web_search_tool(query)
        final_prompt = f"<|system|> Summarize web results. <|end|> <|user|> Web: {web_info} \n Q: {query} <|end|> <|assistant|>"
        response = llm.invoke(final_prompt).strip()
        source = "Web Search (Live Data)"
        perf_stats["web_hits"] += 1
    else:
        response = decision
        source = "Local Academic PDFs"
        perf_stats["pdf_hits"] += 1
    
    perf_stats["total_time"] += (time.time() - start_time)
    perf_stats["total_queries"] += 1
    return response, source

llama_model_loader: loaded meta data with 24 key-value pairs and 195 tensors from C:\Users\MSI\OneDrive\Masaüstü\ragproje\Phi-3-mini-4k-instruct-q4.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = phi3
llama_model_loader: - kv   1:                               general.name str              = Phi3
llama_model_loader: - kv   2:                        phi3.context_length u32              = 4096
llama_model_loader: - kv   3:                      phi3.embedding_length u32              = 3072
llama_model_loader: - kv   4:                   phi3.feed_forward_length u32              = 8192
llama_model_loader: - kv   5:                           phi3.block_count u32              = 32
llama_model_loader: - kv   6:                  phi3.attention.head_count u32              = 32
llama_model_loader: - kv   7:               phi

## 6. Query Loop and Performance Analysis
This is the system's interactive workspace:
- It takes questions from the user and reports the source of each answer (**Local PDF** or **Web**).
- At the end of the loop, it reports the total number of queries, the PDF and web hit counts, and the average latency.
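
The raw counters can also be turned into rates. The sketch below derives the PDF and web hit rates and the average latency from a dictionary with the same keys as `perf_stats`. The function name `summarize_stats` is hypothetical, and the example numbers are made up for illustration, not measured results.

```python
def summarize_stats(stats):
    """Derive hit rates and average latency from the raw counters."""
    total = stats["total_queries"]
    if total == 0:
        # No queries were run, so avoid dividing by zero
        return {"pdf_rate": 0.0, "web_rate": 0.0, "avg_latency": 0.0}
    return {
        "pdf_rate": stats["pdf_hits"] / total,
        "web_rate": stats["web_hits"] / total,
        "avg_latency": stats["total_time"] / total,
    }


# Made-up example values with the same keys as perf_stats
example_stats = {"total_queries": 5, "pdf_hits": 3, "web_hits": 2, "total_time": 150.0}
summary = summarize_stats(example_stats)
print(
    f"PDF rate: {summary['pdf_rate']:.0%} | "
    f"Web rate: {summary['web_rate']:.0%} | "
    f"Avg latency: {summary['avg_latency']:.2f}s"
)
```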

In [None]:
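# NOTE: this redefinition replaces the stricter ask_agentic_rag from Step 5
# (it retrieves k=3 chunks instead of k=4 and uses shorter prompts).
def ask_agentic_rag(query):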
    start_time = time.time()
    docs = vectorstore.similarity_search(query, k=3)
    context_text = "\n".join([d.page_content for d in docs])
    
    decision_prompt = f"""<|system|> Use the context below to answer. If not found, respond ONLY with: SEARCH_WEB <|end|>
    <|user|> Context: {context_text} \n Question: {query} <|end|>
    <|assistant|>"""
    
    decision = llm.invoke(decision_prompt).strip()
    
    if "SEARCH_WEB" in decision.upper() or not decision:
        web_info = web_search_tool(query)
        final_prompt = f"<|system|> Extract answer from web info. <|end|> <|user|> Web: {web_info} \n Q: {query} <|end|> <|assistant|>"
        response = llm.invoke(final_prompt).strip()
        source = "Web Search (Live Data)"
        perf_stats["web_hits"] += 1
    else:
        response = decision
        source = "Local Academic PDFs"
        perf_stats["pdf_hits"] += 1
    
    perf_stats["total_time"] += (time.time() - start_time)
    perf_stats["total_queries"] += 1
    return response, source

# Run the test loop
print("AGENTIC RAG SYSTEM READY (LOCAL + WEB)")
for i in range(1, 6):
    user_query = input(f"\n[{i}/5] Enter Question: ")
    if not user_query.strip(): continue
    ans, ref = ask_agentic_rag(user_query)
    print(f"\n ANALYSIS {i} \n ANSWER: {ans} \n SOURCE: {ref}")

# Display Stats
avg_latency = perf_stats["total_time"] / perf_stats["total_queries"] if perf_stats["total_queries"] > 0 else 0
print("\n FINAL PERFORMANCE EVALUATION ")
print(f"Total Queries: {perf_stats['total_queries']} | PDF Hits: {perf_stats['pdf_hits']} | Web Hits: {perf_stats['web_hits']} | Latency: {avg_latency:.2f}s")


AGENTIC RAG SYSTEM READY (LOCAL + WEB)


llama_perf_context_print:        load time =   16934.47 ms
llama_perf_context_print: prompt eval time =   16934.38 ms /   590 tokens (   28.70 ms per token,    34.84 tokens per second)
llama_perf_context_print:        eval time =   16266.64 ms /   225 runs   (   72.30 ms per token,    13.83 tokens per second)
llama_perf_context_print:       total time =   33290.68 ms /   815 tokens
llama_perf_context_print:    graphs reused =        271



 ANALYSIS 1 
 ANSWER: In the RAG framework, the 'retriever' and 'generator' play specific roles in processing user queries and generating responses. The retriever component (pη(z|x)) is responsible for returning distributions over text passages given a query x. It uses parameters η to retrieve relevant information from external data sources. This retrieved information populates the prompt template, which is then passed to the generator component.

The 'generator' component (pθ(yi|x,z,y1:i−1)) takes in the query x, the retrieved passages z, and previous generated sequences y1:i-1 as input parameters. It parametrizes a model that generates the target sequence y based on this information. The generator uses the context provided by the retriever to produce an appropriate response to the user's query.

In summary, the 'retriever' gathers relevant data from external sources and populates the prompt template, while the 'generator' creates a coherent and accurate response using this informati

Llama.generate: 29 prefix-match hit, remaining 425 prompt tokens to eval
llama_perf_context_print:        load time =   16934.47 ms
llama_perf_context_print: prompt eval time =   12106.41 ms /   425 tokens (   28.49 ms per token,    35.11 tokens per second)
llama_perf_context_print:        eval time =   12647.87 ms /   183 runs   (   69.11 ms per token,    14.47 tokens per second)
llama_perf_context_print:       total time =   24824.05 ms /   608 tokens
llama_perf_context_print:    graphs reused =        217



 ANALYSIS 2 
 ANSWER: GraphRAG uses community summaries by recursively creating increasingly global summaries. It does this by using the LLM to create summaries spanning a hierarchy of nested modular communities of closely related nodes, which are partitioned into these communities based on their inherent modularity. Specifically, it generates community summaries by adding various element summaries (for nodes, edges, and related claims) to a community summary template. Community summaries from lower-level communities are used to generate summaries for higher-level communities in the following way: leaf-level communities have their element summaries prioritized and iteratively added to the LLM context window until the token limit is reached. This approach allows GraphRAG to create more comprehensive and globally relevant summaries compared to standard RAG methods, which may not utilize community hierarchies or modularity in the same way. 
 SOURCE: Local Academic PDFs


Llama.generate: 29 prefix-match hit, remaining 290 prompt tokens to eval
llama_perf_context_print:        load time =   16934.47 ms
llama_perf_context_print: prompt eval time =    8312.44 ms /   290 tokens (   28.66 ms per token,    34.89 tokens per second)
llama_perf_context_print:        eval time =     411.20 ms /     6 runs   (   68.53 ms per token,    14.59 tokens per second)
llama_perf_context_print:       total time =    8725.95 ms /   296 tokens
llama_perf_context_print:    graphs reused =         31



 AGENT: Knowledge not found in local PDFs. Searching the web for: What is the current price of Bitcoin in USD right now?


Llama.generate: 2 prefix-match hit, remaining 731 prompt tokens to eval
llama_perf_context_print:        load time =   16934.47 ms
llama_perf_context_print: prompt eval time =   21412.22 ms /   731 tokens (   29.29 ms per token,    34.14 tokens per second)
llama_perf_context_print:        eval time =    4473.58 ms /    67 runs   (   66.77 ms per token,    14.98 tokens per second)
llama_perf_context_print:       total time =   25906.83 ms /   798 tokens
llama_perf_context_print:    graphs reused =        132



 ANALYSIS 3 
 ANSWER: I'm unable to provide real-time data. However, you can check the current price of Bitcoin in USD by visiting financial news websites like CoinDesk or CNBC, using cryptocurrency market tracking apps, or checking reputable financial news sources online for up-to-date information. 
 SOURCE: Web Search (Live Data)


Llama.generate: 2 prefix-match hit, remaining 521 prompt tokens to eval
llama_perf_context_print:        load time =   16934.47 ms
llama_perf_context_print: prompt eval time =   14839.71 ms /   521 tokens (   28.48 ms per token,    35.11 tokens per second)
llama_perf_context_print:        eval time =    7832.27 ms /   117 runs   (   66.94 ms per token,    14.94 tokens per second)
llama_perf_context_print:       total time =   22712.08 ms /   638 tokens
llama_perf_context_print:    graphs reused =        162



 ANALYSIS 4 
 ANSWER: In Self-RAG, 'critique tokens' are special reflection tokens that help the model generate and evaluate its own text generation. These tokens enable the LM to predict not only the next token from its original vocabulary but also critique or assess its own generated content. By using these tokens, Self-RAG can provide a more comprehensive evaluation of answers by incorporating self-reflection into the process. This allows for better alignment with human annotators' assessments and enhances the quality and factuality of LLMs through retrieval on demand. 
 SOURCE: Local Academic PDFs


Llama.generate: 29 prefix-match hit, remaining 847 prompt tokens to eval
llama_perf_context_print:        load time =   16934.47 ms
llama_perf_context_print: prompt eval time =   24971.34 ms /   847 tokens (   29.48 ms per token,    33.92 tokens per second)
llama_perf_context_print:        eval time =     408.06 ms /     6 runs   (   68.01 ms per token,    14.70 tokens per second)
llama_perf_context_print:       total time =   25381.49 ms /   853 tokens
llama_perf_context_print:    graphs reused =         83



 AGENT: Knowledge not found in local PDFs. Searching the web for: Can you give me a recipe for making a traditional Italian pizza?


Llama.generate: 2 prefix-match hit, remaining 894 prompt tokens to eval
llama_perf_context_print:        load time =   16934.47 ms
llama_perf_context_print: prompt eval time =   26314.37 ms /   894 tokens (   29.43 ms per token,    33.97 tokens per second)
llama_perf_context_print:        eval time =   43395.12 ms /   625 runs   (   69.43 ms per token,    14.40 tokens per second)
llama_perf_context_print:       total time =   70082.32 ms /  1519 tokens
llama_perf_context_print:    graphs reused =        688



 ANALYSIS 5 
 ANSWER: A traditional Italian pizza recipe typically includes the following ingredients and steps:

Ingredients:
- 2 cups all-purpose flour (plus extra for dusting)
- 1/2 teaspoon salt
- 3/4 cup warm water
- 1 tablespoon olive oil
- 1/2 teaspoon sugar
- 1 packet of active dry yeast (about 2 and 1/4 teaspoons)
- 1/2 cup tomato sauce
- 8 ounces fresh mozzarella cheese, sliced or shredded
- Fresh basil leaves
- Salt to taste
- Olive oil for brushing the pizza crust

Steps:
1. In a small bowl, dissolve sugar in warm water and sprinkle yeast over it. Let it sit until the mixture becomes frothy (about 5 minutes).
2. In a large mixing bowl, combine flour and salt. Make a well in the center and pour in the yeast-water mixture along with olive oil. Mix everything together to form a dough.
3. Knead the dough on a lightly floured surface for about 5 minutes until it becomes smooth and elastic. Add more flour if necessary, but be careful not to overwork the dough.
4. Place the dough