# 🏥 Adaptive Learning Assistant (Offline/Local Version)

## 📋 Project Overview
This notebook implements a **Guideline-Based Adaptive RAG System** for Antimicrobial Stewardship.
It converts a User Query into a Verified Answer using a strict 10-Phase Pipeline.

## 🔄 The Linear Workflow (Execution Path)
This is the exact path every query takes through this notebook:

1.  **User Query**
    ↓
2.  **Orchestrator Loop** (Phase 10 - Starts here)
    ↓
3.  **Central Control Node** (Phase 4)
    ↓
4.  **Query Analysis & Restructuring** (Phase 1)
    ↓
5.  **Relevance Check** (Phase 2) → Uses Embedding Similarity
    ↓
6.  **Safety Validation** (Phase 3)
    ↓
7.  **Decide Strategy** (Rewritten vs Original)
    ↓
8.  **Retriever** (Phase 5) → FAISS Vector Search
    ↓
9.  **Retrieval Grader** (Phase 6)
    ↓
10. **Generator** (Phase 7) → Tone-Aware LLM
    ↓
11. **Hallucination Checker** (Phase 8)
    ↓
12. **Final Relevance Checker** (Phase 9)
    ↓
13. **FINAL ANSWER**

---


## 🛠️ Infrastructure Setup

### 🔹 Cell 1: Install & Import Libraries
**What it does**: Installs the necessary tools (FAISS for search, Sentence-Transformers for embeddings, etc.) and imports them.


In [1]:
# @title üì¶ Install & Import
!pip install -q faiss-cpu gradio ipykernel jupyter numpy opencv-python pdf2image pickle-mixin pillow pytesseract requests scikit-learn sentence-transformers tqdm

import faiss
import numpy as np
import pickle
import json
import os
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import gradio as gr
import requests

print("✅ Libraries installed and imported.")


✅ Libraries installed and imported.


### 🔹 Cell 2: LLM API Configuration
**What it does**: Sets up the connection to the Large Language Model (LLM). This is the engine that does the thinking (analysis, writing, checking).
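The retry policy in `call_llm` (retry only on HTTP 429, waiting `2 * (attempt + 1)` seconds) can be exercised offline if it is separated from the HTTP request. A minimal sketch, assuming a hypothetical `RateLimitError` raised by an injected `send` callable and an injectable `sleep` function so that a test does not actually wait:

```python
import time
from typing import Callable, List, Optional


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


def call_with_retry(send: Callable[[], str], max_retries: int = 2,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call `send`, retrying only on RateLimitError with a linearly growing wait."""
    for i in range(max_retries):
        try:
            return send()
        except RateLimitError:
            if i < max_retries - 1:
                sleep(2 * (i + 1))  # 2 s, 4 s, ... mirrors call_llm's wait_time
                continue
            return None  # retries exhausted
        except Exception:
            return None  # any other failure is not retried
    return None


# Usage: a fake sender that is rate-limited once, then succeeds.
waits: List[float] = []
replies = iter([RateLimitError(), "ok"])


def fake_send() -> str:
    item = next(replies)
    if isinstance(item, Exception):
        raise item
    return item


result = call_with_retry(fake_send, max_retries=2, sleep=waits.append)
print(result, waits)  # ok [2]
```

Injecting `sleep` keeps the policy identical while letting the waits be recorded instead of slept through.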


In [None]:
# @title üß† LLM API Configuration
API_KEY = ""
 # @param {type:"string"}
BASE_URL = "https://api.groq.com/openai/v1" # @param {type:"string"}
MODEL_NAME = "llama-3.3-70b-versatile"

import time

def call_llm(messages, temperature=0.1):
    if not API_KEY:
        return None
    headers = { "Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json" }
    payload = { "model": MODEL_NAME, "messages": messages, "temperature": temperature }
    
    max_retries = 2
    for i in range(max_retries):
        try:
            response = requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)
            response.raise_for_status()
            return response.json()['choices'][0]['message']['content']
        except Exception as e:
            # Check for Rate Limit (429)
            is_rate_limit = False
            if hasattr(e, 'response') and e.response is not None:
                if e.response.status_code == 429:
                    is_rate_limit = True
            
            if is_rate_limit:
                wait_time = 2 * (i + 1)
                print(f"‚ö†Ô∏è Rate Limit (429). Retrying in {wait_time}s...")
                time.sleep(wait_time)
                continue
            
            print(f"‚ùå LLM Call Failed: {e}")
            return None
    return None


### 🔹 Cell 3: Load Knowledge Base (Vector Store)
**What it does**: Loads the medical guidelines we indexed offline.
- **FAISS Index**: Fast search engine.
- **Embedder**: Converts text to numbers (vectors).
- **Domain Text**: Defines what topics we cover (Antimicrobial Stewardship).
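The retriever later calls `metadata_store.get(idx, {})` and `text_store.get(idx, '')`, which suggests that both pickles hold dictionaries keyed by the integer FAISS row id. A minimal sketch of that assumed layout, written and re-read with the standard `pickle` module; the file names match Cell 3, `source` is the only metadata field the notebook reads, and the sample values are placeholders rather than real guideline content:

```python
import os
import pickle
import tempfile

# Assumed layout: FAISS row id -> chunk text, and row id -> metadata dict.
# The sample values are placeholders, not real guideline content.
text_store = {0: "Example chunk about stewardship.", 1: "Example chunk about resistance."}
metadata_store = {0: {"source": "guideline_a.pdf"}, 1: {"source": "guideline_b.pdf"}}

store_dir = tempfile.mkdtemp()
for name, obj in [("texts.pkl", text_store), ("metadata.pkl", metadata_store)]:
    with open(os.path.join(store_dir, name), "wb") as f:
        pickle.dump(obj, f)

# Reload the same way Cell 3 does.
with open(os.path.join(store_dir, "texts.pkl"), "rb") as f:
    loaded_texts = pickle.load(f)
with open(os.path.join(store_dir, "metadata.pkl"), "rb") as f:
    loaded_meta = pickle.load(f)

# Row ids must line up with the order in which vectors were added to the FAISS index.
idx = 1
print(f"Source: {loaded_meta.get(idx, {}).get('source', '?')}\nContent: {loaded_texts.get(idx, '')}")
```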


In [3]:
# @title üìÇ Load Artifacts & Model
VECTOR_STORE_DIR = './vector_store'
EMBEDDING_MODEL_NAME = 'sentence-transformers/all-MiniLM-L6-v2'

try:
    print("Loading FAISS Index...")
    index = faiss.read_index(os.path.join(VECTOR_STORE_DIR, 'index.faiss'))
    with open(os.path.join(VECTOR_STORE_DIR, 'metadata.pkl'), 'rb') as f: metadata_store = pickle.load(f)
    with open(os.path.join(VECTOR_STORE_DIR, 'texts.pkl'), 'rb') as f: text_store = pickle.load(f)
    print(f"‚úÖ Vector Store Loaded. Total Vectors: {index.ntotal}")
    
    print(f"Loading Embedding Model: {EMBEDDING_MODEL_NAME}...")
    embedder = SentenceTransformer(EMBEDDING_MODEL_NAME)
    print("‚úÖ Embedding Model Loaded.")
    
    DOMAIN_TEXT = "Rational antibiotic use, antimicrobial resistance, stewardship, microbiology, guideline-based reasoning"
    domain_embedding = embedder.encode(DOMAIN_TEXT)
    
except Exception as e:
    print(f"‚ùå Critical Error: {e}")
    print("‚ö†Ô∏è PLEASE UPLOAD THE 'vector_store' DIRECTORY ‚ö†Ô∏è")
    index, metadata_store, text_store = None, None, None


Loading FAISS Index...
✅ Vector Store Loaded. Total Vectors: 1076
Loading Embedding Model: sentence-transformers/all-MiniLM-L6-v2...

BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


✅ Embedding Model Loaded.


---
## 🚀 The 10-Phase Execution Pipeline
This is where the pipeline logic begins, following the workflow above. The cells are not in strict phase order: Phases 2 and 3 are defined before Phase 1.


### Phase 2 Helper: Embedding Calculation
**What it does**: Computes the cosine similarity between the embedding of the user's query and the embedding of `DOMAIN_TEXT`, the description of our medical domain.
**Why**: To quickly flag irrelevant queries (spam filter).
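`cosine_similarity` scores the angle between two vectors, cos θ = (a · b) / (‖a‖ ‖b‖), which ranges from -1 to 1. A small NumPy sketch of the same computation on toy 2-D vectors (real all-MiniLM-L6-v2 embeddings are 384-dimensional):

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


query_vec = np.array([1.0, 0.0])   # toy "query" embedding
domain_vec = np.array([1.0, 1.0])  # toy "domain" embedding

print(round(cosine(query_vec, domain_vec), 4))  # 0.7071, i.e. a 45 degree angle
```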


In [4]:
# @title üõ†Ô∏è Embed Helper
def get_embedding_relevance(query_text):
    if not query_text: return 0.0
    q_emb = embedder.encode(query_text)
    return cosine_similarity([q_emb], [domain_embedding])[0][0]


### Phase 2: Relevance Check (Domain Filter)
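**What it does**: The Gatekeeper. It pairs the embedding score (from above) with the LLM's `is_relevant` flag from Phase 1.
**Decision**: In the current code, the LLM's flag alone decides: if it says "Irrelevant", we stop immediately. The embedding score is logged alongside it for transparency.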
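If the embedding score should take part in the decision as well, one option is to reject a query only when both signals agree that it is off-topic. A minimal sketch; `combined_relevance` is a hypothetical helper, and the 0.25 threshold is an untuned, assumed value that would need calibrating on real queries:

```python
from typing import Tuple


def combined_relevance(llm_says_relevant: bool, emb_score: float,
                       emb_threshold: float = 0.25) -> Tuple[bool, str]:
    """Reject only when the LLM and the embedding score both say 'irrelevant'.

    emb_threshold is a hypothetical, untuned cut-off.
    """
    emb_says_relevant = emb_score >= emb_threshold
    if not llm_says_relevant and not emb_says_relevant:
        return False, f"Both signals irrelevant (Emb Score: {emb_score:.2f})"
    if not llm_says_relevant:
        return True, f"LLM unsure, embedding relevant (Emb Score: {emb_score:.2f})"
    return True, f"Relevant (Emb Score: {emb_score:.2f})"


print(combined_relevance(False, 0.10))  # (False, 'Both signals irrelevant (Emb Score: 0.10)')
print(combined_relevance(False, 0.40))  # (True, 'LLM unsure, embedding relevant (Emb Score: 0.40)')
```

This rule is more permissive than letting the LLM decide alone; whether that trade-off is worthwhile depends on how often the LLM wrongly rejects on-topic questions.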


In [5]:
# @title ⚡ Phase 2: Relevance Check
def check_relevance(query, llm_analysis_result):
    # The embedding score is reported for transparency; the LLM's flag makes the decision.
    emb_score = get_embedding_relevance(query)
    llm_relevant = llm_analysis_result.get('is_relevant', False)
    if not llm_relevant:
        return False, f"LLM deemed irrelevant. (Emb Score: {emb_score:.2f})"
    return True, f"Relevant (Emb Score: {emb_score:.2f})"


### Phase 3: Safety Validation (Rewrite Checker)
**What it does**: Checks the LLM's work.
**Logic**: "Did the rewritten query add fake details or change the meaning?"
**Output**: Risk Level (Low, Medium, or High).
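The validator treats anything it cannot parse as high risk. A slightly more tolerant parser, sketched below, strips Markdown code fences, extracts the `{...}` object if the model wraps it in prose, and still fails closed to `high` for missing or unknown values. `parse_risk_level` is a hypothetical helper, not part of the notebook:

```python
import json
import re
from typing import Optional

ALLOWED_RISK = {"low", "medium", "high"}


def parse_risk_level(response: Optional[str]) -> str:
    """Return 'low', 'medium' or 'high'; anything unparseable fails closed to 'high'."""
    if not response:
        return "high"
    text = response.replace("```json", "").replace("```", "")
    match = re.search(r"\{.*\}", text, re.DOTALL)  # first '{' through last '}'
    if not match:
        return "high"
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return "high"
    risk = str(data.get("risk_level", "high")).strip().lower()
    return risk if risk in ALLOWED_RISK else "high"


print(parse_risk_level('```json\n{"risk_level": "Low"}\n```'))              # low
print(parse_risk_level('Sure! {"risk_level": "medium"} Hope this helps.'))  # medium
print(parse_risk_level(None))                                               # high
```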


In [6]:
# @title üõ°Ô∏è Phase 3: Safety Validator
def agent_validate_rewrite(original, rewritten):
    system_prompt = """
You are a query rewrite validator.
Check for ADDED entities, CHANGED constraints, or HALLUCINATIONS.
Output JSON: { "risk_level": "low" | "medium" | "high" }
    """
    response = call_llm([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Original: {original}\nRewritten: {rewritten}"}
    ], temperature=0.0)
    try:
        return json.loads(response.replace('```json', '').replace('```', ''))
    except:
        return {"risk_level": "high"}


### Phase 1: Query Analysis & Restructuring (LLM)
**What it does**: The Brain. It analyzes your question to understand:
1. **Intent**: What do you want?
2. **Category**: What guideline applies?
3. **Tone**: Educational or Clinical?
4. **Rewrite**: What are the best keywords to search for?

**Output**: A JSON plan.
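Because the analysis agent's JSON drives every later phase, it can help to check it against the contract in the prompt before using it. A minimal sketch; `validate_analysis` is hypothetical, the category and tone lists are copied from the system prompt below, and substituting defaults (rather than rejecting) mirrors the notebook's own JSON-error fallback:

```python
from typing import Any, Dict

ALLOWED_CATEGORIES = {
    "Infection Context Explanation", "Antibiotic Class Reasoning", "Resistance Mechanism",
    "Stewardship Principle", "Safety / Adverse Effects", "Guideline Explanation",
}
ALLOWED_TONES = {"Simplified Educational", "Structured Clinical"}


def validate_analysis(analysis: Dict[str, Any], user_query: str) -> Dict[str, Any]:
    """Return a copy of `analysis` with out-of-contract fields replaced by safe defaults."""
    fixed = dict(analysis)

    # JSON booleans parse to True/False, but some models emit the string "false".
    rel = fixed.get("is_relevant", True)
    if isinstance(rel, str):
        rel = rel.strip().lower() == "true"
    fixed["is_relevant"] = bool(rel)

    category = fixed.get("category")
    if not (isinstance(category, str) and category in ALLOWED_CATEGORIES):
        fixed["category"] = "General"  # same default the notebook uses on JSON errors
    tone = fixed.get("answer_tone")
    if not (isinstance(tone, str) and tone in ALLOWED_TONES):
        fixed["answer_tone"] = "Simplified Educational"

    fixed["original_query"] = user_query
    rewritten = fixed.get("rewritten_query")
    if not isinstance(rewritten, str) or not rewritten.strip():
        fixed["rewritten_query"] = user_query  # empty rewrite: keep the user's query
    return fixed


raw = {"is_relevant": True, "category": "Resistance", "answer_tone": "Structured Clinical",
       "rewritten_query": ""}
checked = validate_analysis(raw, "How does MRSA resist beta-lactams?")
print(checked["category"], "|", checked["rewritten_query"])
```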


In [7]:
# @title üß† Phase 1: Analysis Agent
def agent_analyze_query_structured(user_query):
    system_prompt = """
You are a query analysis and restructuring engine for an Adaptive RAG system.
CRITICAL RULES:
- Query rewriting MUST be LOSSLESS.
- Do NOT add entities, tools, datasets, years, domains, or assumptions.
- Output VALID JSON ONLY.

REQUIRED JSON OUTPUT CONTRACT:
{
  "is_relevant": true,
  "category": "<Infection Context Explanation|Antibiotic Class Reasoning|Resistance Mechanism|Stewardship Principle|Safety / Adverse Effects|Guideline Explanation>",
  "answer_tone": "<Simplified Educational|Structured Clinical>",
  "original_query": "...",
  "rewritten_query": "...",
  "rewrite_rationale": "..."
}
    """
    response = call_llm([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Query: {user_query}"}
    ], temperature=0.1)
    
    try:
        if not response: return None
        clean_resp = response.replace('```json', '').replace('```', '')
        return json.loads(clean_resp)
    except json.JSONDecodeError:
        return {
            "is_relevant": True, "category": "General", "answer_tone": "Simplified Educational",
            "original_query": user_query, "rewritten_query": user_query
        }


### Phase 4: Central Control Node (Pipeline)
**What it does**: Ties Phases 1, 2, and 3 together.
1.  Calls Phase 1 (Analysis).
2.  Calls Phase 2 (Relevance).
3.  Calls Phase 3 (Validator).
4.  **Decides**: Should we use the *Rewritten Query* or fall back to the *Original*?
**Result**: The final, safe query string ready for retrieval.


In [8]:
# @title üéõÔ∏è Phase 4: Central Control Node
def decide_query_strategy(original, rewritten, validation):
    if original.strip().lower() == rewritten.strip().lower(): return original, "Identical"
    risk = validation.get("risk_level", "high").lower()
    if risk in ["medium", "high"]:
        return original, f"Fallback (Risk: {risk})"
    return rewritten, "Rewrite Accepted"

def query_reconstructor_pipeline(user_query, feedback_reason=None):
    # 1. Analyze
    q_in = f"{user_query} (Fix: {feedback_reason})" if feedback_reason else user_query
    analysis = agent_analyze_query_structured(q_in)
    if not analysis: return None
    
    # 2. Relevance
    is_rel, rel_msg = check_relevance(user_query, analysis)
    if not is_rel: return {"is_relevant": False, "logs": [f"Irrelevant: {rel_msg}"]}
    
    # 3. Validate
    validation = agent_validate_rewrite(user_query, analysis.get('rewritten_query', user_query))
    
    # 4. Decide
    final_q, note = decide_query_strategy(user_query, analysis.get('rewritten_query'), validation)
    
    return {
        "is_relevant": True,
        "final_query": final_q,
        "category": analysis.get('category', 'General'),
        "answer_tone": analysis.get('answer_tone', 'Simplified Educational'),
        "logs": [f"Category: {analysis.get('category')}", f"Tone: {analysis.get('answer_tone')}", f"Strategy: {note}", f"Final Query: {final_q}"]
    }


### Phase 5: Retriever (FAISS Vector Search)
**What it does**: This is the search engine.
**Input**: The plain text query string from Phase 4.
**Action**: Converts string to vector -> Finds top 3 matches in FAISS index.
**Output**: The actual text content from the medical guidelines.
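For intuition about what `index.search` returns, here is a brute-force NumPy sketch of exact search by squared L2 distance, assuming a flat L2 index such as `faiss.IndexFlatL2` (the notebook does not show how its index was built). It mimics the FAISS result shape, a distance matrix `D` and an id matrix `I` of shape `(n_queries, k)`, and pads missing ids with `-1`, which is why the retriever skips `idx == -1`. Padding distances with `inf` is this sketch's own choice:

```python
import numpy as np


def brute_force_search(database: np.ndarray, queries: np.ndarray, k: int):
    """Exact k-nearest-neighbour search by squared L2 distance.

    Returns (D, I) of shape (n_queries, k); missing slots get I = -1 and D = inf.
    """
    # Squared distances between every query and every stored vector: (n_queries, n_db)
    diffs = queries[:, None, :] - database[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)

    n_q, n_db = dists.shape
    D = np.full((n_q, k), np.inf, dtype=np.float32)
    I = np.full((n_q, k), -1, dtype=np.int64)
    take = min(k, n_db)
    order = np.argsort(dists, axis=1)[:, :take]  # nearest first
    D[:, :take] = np.take_along_axis(dists, order, axis=1)
    I[:, :take] = order
    return D, I


db = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], dtype=np.float32)
q = np.array([[0.9, 0.1]], dtype=np.float32)
D, I = brute_force_search(db, q, k=4)
print(I.tolist())  # [[1, 0, 2, -1]]
```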


In [9]:
# @title üîç Phase 5: Retriever (FAISS)
def retrieve_documents(query_text, k=3):
    if index is None: return ["[ERROR] Index not loaded"]
    vector = embedder.encode(query_text)
    D, I = index.search(np.array([vector]).astype('float32'), k)
    retrieved = []
    for idx in I[0]:
        if idx == -1: continue
        meta = metadata_store.get(idx, {})
        retrieved.append(f"Source: {meta.get('source', '?')}\nContent: {text_store.get(idx, '')}")
    return retrieved


### Phase 6: Retrieval Grader (Quality Check)
**What it does**: An LLM reads the retrieved documents.
**Question**: "Do these documents actually help answer the user's question?"
**Decision**: If BAD, we trigger a retry (Feedback Loop).
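The grader (like the Phase 8 and 9 checkers) accepts any reply that merely contains the keyword, so a reply such as "NOT GOOD" is read as GOOD. A stricter option, sketched below as a hypothetical helper, takes the first word of the reply after skipping Markdown and punctuation, and falls back to a conservative default otherwise:

```python
import re
from typing import Iterable, Optional


def parse_verdict(reply: Optional[str], allowed: Iterable[str], default: str) -> str:
    """Return the first word of `reply` if it is an allowed verdict, else `default`."""
    if not reply:
        return default
    words = re.findall(r"[A-Za-z]+", reply)
    first = words[0].upper() if words else ""
    return first if first in {a.upper() for a in allowed} else default


print(parse_verdict("**GOOD** - the context covers the question.", ["GOOD", "BAD"], "BAD"))  # GOOD
print(parse_verdict("NOT GOOD", ["GOOD", "BAD"], "BAD"))  # BAD
print(parse_verdict(None, ["GOOD", "BAD"], "BAD"))        # BAD
```

The trade-off is that a reply like "Verdict: GOOD" now maps to the default, so the prompt should ask for the keyword alone.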


In [10]:
# @title ⚖️ Phase 6: Retrieval Grader
def agent_grade_retrieval(query, contexts):
    ctx = "\n\n".join(contexts)  # readable text rather than a Python list repr
    prompt = f"Query: {query}\nContext: {ctx}\nRelevant? Output GOOD or BAD."
    res = call_llm([{"role": "user", "content": prompt}])
    return "GOOD" if res and "GOOD" in res.upper() else "BAD"


### Phase 7: Answer Generator (Tone-Aware)
**What it does**: The Writer. It writes the final response.
**Key Feature**: It changes its writing style based on the `Tone` decided in Phase 1 (Simple vs Clinical).
**Constraint**: Must ONLY use the provided Context.
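One way to make the "use ONLY the provided Context" constraint explicit is to number the retrieved chunks and state the rule in the system message, so that the model can cite chunk numbers. A minimal sketch of building the `messages` list that `call_llm` expects; `build_grounded_messages` and the wording of its rules are assumptions, not the notebook's exact prompt:

```python
from typing import Dict, List


def build_grounded_messages(query: str, contexts: List[str],
                            category: str, tone: str) -> List[Dict[str, str]]:
    """Build OpenAI-style chat messages that restrict the answer to the given contexts."""
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    system = (
        f"Educational medical assistant. Category: {category}. Tone: {tone}.\n"
        "Answer ONLY from the numbered context in the user message and cite chunks like [1].\n"
        "If the context does not contain the answer, say so. No prescriptions."
    )
    user = f"Context:\n{numbered}\n\nQuestion: {query}"
    return [{"role": "system", "content": system}, {"role": "user", "content": user}]


msgs = build_grounded_messages(
    "What is antimicrobial stewardship?",
    ["Source: a.pdf\nContent: ...", "Source: b.pdf\nContent: ..."],
    "Stewardship Principle", "Simplified Educational",
)
print(msgs[1]["content"])
```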


In [11]:
# @title ✍️ Phase 7: Answer Generator
def agent_generate_answer(query, contexts, category, tone):
    c_str = "\n".join(contexts)
    # State the grounding rule explicitly: answer only from the retrieved context.
    sys_prompt = (
        f"Educational medical assistant. Category: {category}. Tone: {tone}. "
        "Answer ONLY from the provided context; if it does not cover the question, say so. "
        "No prescriptions."
    )
    res = call_llm([
        {"role": "system", "content": sys_prompt},
        {"role": "user", "content": f"Context: {c_str}\nQuestion: {query}"}
    ])
    return res


### Phase 8: Hallucination Checker (Fact Verification)
**What it does**: The Fact Checker.
**Logic**: Compares the Generated Answer (Phase 7) against the Source Documents (Phase 5).
**Check**: "Did the writer make anything up?"


In [12]:
# @title üîé Phase 8: Hallucination Checker
def agent_check_hallucination(answer, contexts):
    prompt = f"Context: {contexts}\nAnswer: {answer}\nUnsupported claims? Output YES or NO."
    res = call_llm([{"role": "user", "content": prompt}])
    return "YES" if res and "YES" in res.upper() else "NO"


### Phase 9: Final Relevance Checker
**What it does**: The Final Exam.
**Logic**: Compares the Final Answer against the User's Original Question.
**Check**: "Did we actually answer what the user asked?"


In [13]:
# @title üéØ Phase 9: Final Relevance Checker
def agent_check_relevance(answer, original_query):
    prompt = f"Query: {original_query}\nAnswer: {answer}\nDoes it answer? Output YES or NO."
    res = call_llm([{"role": "user", "content": prompt}])
    return "YES" if res and "YES" in res.upper() else "NO"


In [14]:
# @title üÜò Fallback Agent (Transparency Mode)
def agent_generate_transparent_fallback(user_query, category, tone):
    system_prompt = """
You are a medical education assistant operating in STRICT TRANSPARENCY MODE.

RULES:
- The system has NO relevant data in its local medical knowledge base
- You MUST explicitly disclose this limitation to the user
- You may ONLY provide general, high-level educational information
- You MUST NOT claim guideline support, studies, or evidence
- You MUST NOT provide prescriptions, dosages, or treatment plans
- You MUST NOT claim to be a doctor or medical professional
- You MUST NOT sound authoritative or definitive

MANDATORY OUTPUT STRUCTURE:

1. A clear upfront disclosure:
   "⚠️ There is no relevant data available in the current medical knowledge base."

2. A reassurance sentence:
   Explain that you can still offer general educational information.

3. A safe, general explanation related to the user's question:
   - Use common medical understanding
   - Avoid numbers, protocols, or recommendations
   - Avoid certainty

4. A closing safety note:
   Encourage consulting a qualified healthcare professional.


   FOLLOW THIS STRUCTURE FOR THE OUTPUT: write each of the four parts above as its own paragraph (about four lines each).

STYLE:
- Match the provided tone
- Match the provided category
- Calm, educational, and transparent

DO NOT:
- Output JSON
- Mention pipelines, retries, FAISS, or retrieval
- Cite sources
- Hallucinate facts
    """
    
    response = call_llm([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Category: {category}\nTone: {tone}\nQuestion: {user_query}"}
    ], temperature=0.3)
    
    return response


In [15]:
# @title üõ°Ô∏è KB Coverage Guard
def is_kb_actually_covering(query_text, retrieved_contexts, threshold=0.45):
    if not retrieved_contexts or not query_text: return False
    
    valid_contexts = [c for c in retrieved_contexts if "[ERROR]" not in c]
    if not valid_contexts: return False

    try:
        q_vec = embedder.encode(query_text)
        max_score = -1.0
        
        for ctx in valid_contexts:
            # Extract content after "Content: " marker if present
            parts = ctx.split("Content: ", 1)
            content = parts[1] if len(parts) > 1 else ctx
            
            c_vec = embedder.encode(content[:1000]) # Limit length for speed
            
            score = cosine_similarity([q_vec], [c_vec])[0][0]
            if score > max_score: max_score = score
            
        return max_score >= threshold
    except Exception as e:
        print(f"Coverage check error: {e}")
        return True # Fail open (allow) if check errors, to avoid blocking valid flows


### Phase 10: Orchestrator Loop (Main Logic)
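**What it does**: The Manager. It runs the retry loop (at most `MAX_RETRIES` cycles) and switches to the Transparency-Mode fallback if every cycle fails.
1. Start -> Phase 4 (Control Node)
2. Get Query -> Phase 5 (Retrieve), then the KB Coverage Guard (Weak Match -> Retry)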
3. Grade -> Phase 6 (If Bad -> Retry)
4. Write -> Phase 7 (Generate)
5. Check -> Phases 8 & 9 (Verify)
6. **Return Final Answer**
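Stripped of the LLM calls, the orchestrator is a bounded retry loop in which each failed check sets a `feedback_reason` for the next cycle, with a fallback once the budget is spent. A minimal sketch of that control flow with a stubbed cycle; `retry_loop` and `stub_cycle` are hypothetical placeholders, and the sketch does not model the early exit for irrelevant queries:

```python
from typing import Callable, List, Optional, Tuple


def retry_loop(run_cycle: Callable[[Optional[str]], Tuple[Optional[str], Optional[str]]],
               fallback: Callable[[], str], max_retries: int = 2) -> Tuple[str, List[str]]:
    """Run up to `max_retries` cycles; each cycle returns (answer, feedback_reason).

    A non-None answer ends the loop; otherwise the feedback is passed to the next cycle.
    """
    logs: List[str] = []
    feedback: Optional[str] = None
    for attempt in range(1, max_retries + 1):
        answer, feedback = run_cycle(feedback)
        if answer is not None:
            logs.append(f"Cycle {attempt}: success")
            return answer, logs
        logs.append(f"Cycle {attempt}: retry ({feedback})")
    logs.append("Max retries exhausted: fallback")
    return fallback(), logs


def stub_cycle(feedback: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Stub cycle: fails the first time, succeeds once it receives feedback."""
    if feedback is None:
        return None, "Retrieved documents were irrelevant."
    return "grounded answer", None


answer, logs = retry_loop(stub_cycle, fallback=lambda: "fallback answer")
print(answer)  # grounded answer
print(logs)    # ['Cycle 1: retry (Retrieved documents were irrelevant.)', 'Cycle 2: success']
```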


In [16]:
# @title ⚙️ Phase 10: Orchestrator Loop
def adaptive_rag_orchestrator(user_query):
    try:
        MAX_RETRIES = 2
        attempt = 0
        logs = []
        feedback_reason = None
        recon = None  # most recent reconstruction; the fallback reuses its category/tone

        while attempt < MAX_RETRIES:
            attempt += 1
            logs.append(f"\n--- 🔄 Cycle {attempt} ---")

            # 1. Reconstruct (Phases 1-4)
            recon = query_reconstructor_pipeline(user_query, feedback_reason)

            if recon is None:
                logs.append("❌ LLM Service Unavailable (Rate Limit or Error).")
                return "Unable to process query due to high server load. Please try again in 1 minute.", logs

            if not recon['is_relevant']:
                if attempt == 1:
                    logs.extend(recon.get('logs', []))
                    return "I can only answer relevant questions.", logs
                logs.append("⚠️ Re-evaluated as locally irrelevant. Continuing retry loop...")
                continue

            logs.extend(recon['logs'])

            # 2. Retrieve (Phase 5)
            contexts = retrieve_documents(recon['final_query'])

            # KB Coverage Guard: retry if no retrieved chunk is similar enough to the query
            if not is_kb_actually_covering(recon['final_query'], contexts):
                logs.append("⚠️ KB Coverage Failure (Weak Match). Retrying...")
                feedback_reason = "Knowledge Base has no strong match for this specific medical subdomain."
                continue

            # 3. Grade (Phase 6)
            if agent_grade_retrieval(recon['final_query'], contexts) == "BAD":
                logs.append("⚠️ Retrieval BAD. Retrying...")
                feedback_reason = "Retrieved documents were irrelevant."
                continue

            # 4. Generate (Phase 7)
            answer = agent_generate_answer(recon['final_query'], contexts, recon['category'], recon['answer_tone'])

            # 5. Check Hallucination (Phase 8): regenerate once; the new answer is not re-checked
            if answer and agent_check_hallucination(answer, contexts) == "YES":
                logs.append("⚠️ Hallucination detected. Regenerating...")
                answer = agent_generate_answer(recon['final_query'], contexts, recon['category'], recon['answer_tone'])

            if not answer:
                logs.append("⚠️ Answer generation failed. Retrying...")
                continue

            # 6. Check Relevance (Phase 9)
            if agent_check_relevance(answer, user_query) == "NO":
                logs.append("⚠️ Answer not relevant. Retrying...")
                feedback_reason = "Answer missed intent."
                continue

            logs.append("✅ Success.")
            return f"**Category:** {recon['category']}\n**Tone:** {recon['answer_tone']}\n\n{answer}", logs

        logs.append("\n⚠️ Max retries exhausted. Generating Fallback...")

        # Fallback Logic
        current_category = recon.get('category', 'General') if recon else 'General'
        current_tone = recon.get('answer_tone', 'Simplified Educational') if recon else 'Simplified Educational'

        fallback_ans = agent_generate_transparent_fallback(user_query, current_category, current_tone)

        if fallback_ans:
            logs.append("✅ Fallback Generated.")
            return f"🚨 **Knowledge Base Notice**\n\n{fallback_ans}", logs

        return "❌ Failed to find a valid answer and fallback generation failed.", logs

    except Exception as e:
        import traceback
        err_msg = f"❌ SYSTEM CRASH: {str(e)}"
        tb = traceback.format_exc().splitlines()
        return err_msg, tb


## 🖥️ User Interface
**What it does**: Creates the chat window.


In [17]:
# @title üñ•Ô∏è Launch UI
import gradio as gr

with gr.Blocks(title="Adaptive AMR Assistant") as demo:
    
    gr.Markdown("# üõ°Ô∏è Adaptive AMR Assistant")
    
    with gr.Row():
        q_in = gr.Textbox(label="Question")
        btn = gr.Button("Ask")
    
    with gr.Row():
        ans_out = gr.Markdown(label="Answer")
        logs_out = gr.Textbox(label="Logs", lines=12)
    
    btn.click(
        lambda q: [
            adaptive_rag_orchestrator(q)[0],
            "\n".join(adaptive_rag_orchestrator(q)[1])
        ],
        inputs=q_in,
        outputs=[ans_out, logs_out]
    )

demo.launch()


* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.


