# Local RAG Agent with Causal LLM (No API, Token Budgeting, Infinite Loop Prevention)
This notebook demonstrates a **retrieval-augmented generation agent** that uses a **local causal LLM**, manages token usage, and avoids infinite loops.

In [ ]:
# Install required packages (`datasets` is not used in this notebook, so it is omitted)
!pip install sentence-transformers faiss-cpu transformers

In [ ]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# 1. Load local causal LLM (small model, no API cost)
model_name = 'gpt2-medium'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 2. Load embedding model for retrieval
embed_model = SentenceTransformer('all-MiniLM-L6-v2')

In [ ]:
# 3. Sample documents
documents = [
    "The capital of France is Paris.",
    "Python is a programming language used for AI and web development.",
    "RAG agents combine retrieval with generative LLMs."
]

# 4. Create embeddings (FAISS expects a float32 array)
doc_embeddings = np.asarray(embed_model.encode(documents), dtype=np.float32)
dim = doc_embeddings.shape[1]

# 5. Build FAISS index (exact search by L2 distance)
index = faiss.IndexFlatL2(dim)
index.add(doc_embeddings)

In [ ]:
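def retrieve(query, top_k=2):
    """Return the top_k documents closest to the query in embedding space."""
    top_k = min(top_k, len(documents))  # FAISS pads missing results with -1
    q_emb = np.asarray(embed_model.encode([query]), dtype=np.float32)
    _, indices = index.search(q_emb, top_k)
    return [documents[i] for i in indices[0] if i != -1]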
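
For intuition about what the flat index does: `IndexFlatL2` performs exact search, comparing the query against every stored vector and ranking by squared L2 distance. The sketch below shows that idea in pure Python. The function `brute_force_l2` and the toy 2-D vectors are hypothetical names and data introduced only for illustration; they are not part of FAISS.

```python
def brute_force_l2(vectors, query, top_k=2):
    """Exact nearest-neighbour search by squared L2 distance (illustrative only)."""
    scored = []
    for idx, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first
    top = scored[:top_k]
    distances = [d for d, _ in top]
    indices = [i for _, i in top]
    return distances, indices


# Toy data (assumption): three 2-D "embeddings" and one query vector
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
distances, indices = brute_force_l2(toy_vectors, [0.9, 0.1], top_k=2)
print(indices)
```

This scan is linear in the number of documents, which is fine for a handful of documents like the sample list in this notebook.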

In [ ]:
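def generate_answer(prompt, max_tokens=100):
    inputs = tokenizer(prompt, return_tensors='pt')
    prompt_len = inputs['input_ids'].shape[1]
    # GPT-2 has no pad token; reuse EOS to silence the generate() warning
    outputs = model.generate(**inputs, max_new_tokens=max_tokens, pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens; outputs[0] also contains the prompt
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()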

In [ ]:
# 6. Agentic loop with infinite-loop prevention
query = "Explain RAG agents in simple terms."
history = []
max_steps = 5  # prevent infinite loops

for step in range(max_steps):
    retrieved_docs = retrieve(query, top_k=2)
    context = ' '.join(retrieved_docs)
    prompt = f"Answer the question using the context: {context}\nQuestion: {query}\nAnswer:"
    answer = generate_answer(prompt, max_tokens=100)
    print(f"Step {step+1} Answer: {answer}")
    
    if answer in history:  # avoid loops
        print("Repeating answer detected. Stopping loop.")
        break
    history.append(answer)

    # Crude sufficiency check: stop once the answer mentions 'RAG'.
    # The query is never refined here, so later steps would repeat the same prompt.
    if 'RAG' in answer:
        print("Sufficient answer found.")
        break
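
The loop above stops on three conditions: a step cap, a repeated answer, or a "sufficient" answer. As a minimal sketch, the same guards can be factored into a reusable helper that can be tested without loading a model. The names `run_agent`, `stub_generate`, and `is_sufficient` are hypothetical, and normalising case and whitespace before comparing answers is an added assumption, not something the notebook does.

```python
def run_agent(generate, query, max_steps=5, is_sufficient=None):
    """Call `generate` until an answer repeats, is sufficient, or max_steps is hit."""
    seen = set()
    answers = []
    for step in range(max_steps):
        answer = generate(query, step)
        # Normalise case and whitespace so near-identical answers count as repeats
        key = " ".join(answer.lower().split())
        if key in seen:
            return answers, "repeat"
        seen.add(key)
        answers.append(answer)
        if is_sufficient is not None and is_sufficient(answer):
            return answers, "sufficient"
    return answers, "max_steps"


def stub_generate(query, step):
    """Stand-in for the LLM call (assumption): returns canned answers by step."""
    canned = ["I am not sure.", "i am   not sure.", "RAG retrieves documents first."]
    return canned[min(step, len(canned) - 1)]


answers, reason = run_agent(stub_generate, "Explain RAG agents.")
print(reason, answers)
```

Returning the stop reason alongside the answers makes it easier to tell a genuine answer from a loop that simply ran out of steps.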

### ✅ Key Features in This Notebook
- Uses **local causal LLM** (`AutoModelForCausalLM`) to avoid API cost.
- Uses **FAISS + sentence-transformers** for retrieval.
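- **Token budgeting** via `max_new_tokens`, which caps the generated output only; the prompt length is not limited.
- **Infinite loop prevention** via `max_steps` and a repeated-answer check.
- **Cost-saving tips:** use a small model, retrieve only the top-K documents, and keep the context short (for example by summarizing it, which this notebook does not implement).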
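
The token budgeting listed above limits only the generated output. One way to also bound the prompt is to add retrieved chunks in rank order until a token budget is reached. The sketch below is illustrative: `fit_context` is a hypothetical helper, and its default token counter is a whitespace word count standing in for a real tokenizer.

```python
def fit_context(chunks, budget, count_tokens=None):
    """Keep chunks, in rank order, while their total token count fits the budget."""
    if count_tokens is None:
        # Assumption: approximate tokens by whitespace-separated words
        def count_tokens(text):
            return len(text.split())
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # stop at the first chunk that would exceed the budget
        kept.append(chunk)
        used += cost
    return kept, used


docs = [
    "RAG agents combine retrieval with generative LLMs.",
    "Python is a programming language used for AI and web development.",
]
kept, used = fit_context(docs, budget=10)
print(kept, used)
```

In this notebook, passing `count_tokens=lambda t: len(tokenizer.encode(t))` would count real GPT-2 tokens. Since GPT-2 has a 1024-token context window, a reasonable budget would be 1024 minus `max_new_tokens` minus the tokens used by the prompt template and question.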