
#  EduFlowTech — Multi‑Agent Workflow with LangChain (Improved)

This notebook implements a **three‑agent workflow** for ticket management in a fictional e‑learning company.  

##  Learning goals
- Build a modular, multi‑agent system with LangChain (Query Processor → Content Searcher → Response Generator).
- Query a **simulated course/lesson database** with `pandas`.
- Coordinate agents with a clear, reproducible **workflow**.



## 1) Setup & Imports

We import the libraries, set a **fake OpenAI API key**, and prepare optional LangChain components.  
> Note: We include deterministic fallbacks to keep the notebook runnable without real API calls.


In [None]:

# Core
import os
import json
import textwrap
import pandas as pd

# LangChain (classic-style imports; LLMChain is deprecated in recent releases, tools optional)
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain


try:
    from langchain_community.llms.fake import FakeListLLM  # works in modern LangChain
    _HAS_FAKE_LLM = True
except Exception:
    _HAS_FAKE_LLM = False

# Fictitious OpenAI API key as required by the brief
import openai
openai.api_key = "sk-YOUR_FAKE_API_KEY"

print("Environment ready. Fake LLM available:", _HAS_FAKE_LLM)


Environment ready. Fake LLM available: True



## 2) Simulated EduFlowTech Database

We enrich the dataset so the system can answer a wider variety of queries.  
Columns: `course`, `topic`, `lesson`, `exercise`, `level`, `link`.


In [1]:

data = {
    "course": [
        "Basic Deep Learning", "Python for AI", "Computer Vision", "Natural Language Processing",
        "MLOps Foundations", "Data Engineering Basics", "Reinforcement Learning Intro", "Advanced Deep Learning"
    ],
    "topic": [
        "Neural Networks", "Python Programming", "Image Classification", "Text Analysis",
        "MLOps & Deployment", "ETL & Pipelines", "Policy & Value Functions", "Transformers & Attention"
    ],
    "lesson": [
        "Introduction to Neural Networks", "Loops and Functions", "Basic CNNs", "Embeddings and Transformers",
        "Model Serving with FastAPI", "Building ETL with Airflow", "Q-Learning Essentials", "Efficient Attention Mechanisms"
    ],
    "exercise": [
        "Implement a simple network", "Create a factorial function", "Classify cats and dogs",
        "Translate a text using embeddings", "Deploy a model endpoint", "Create an Airflow DAG",
        "Implement epsilon-greedy", "Fine-tune a transformer"
    ],
    "level": [
        "Intermediate", "Beginner", "Intermediate", "Advanced", "Intermediate", "Intermediate", "Intermediate", "Advanced"
    ],
    "link": [
        "https://eduflowtech.com/deep-learning/intro-nn",
        "https://eduflowtech.com/python/loops",
        "https://eduflowtech.com/vision/cnn-basics",
        "https://eduflowtech.com/nlp/embeddings-transformers",
        "https://eduflowtech.com/mlops/fastapi-serving",
        "https://eduflowtech.com/data/airflow-etl",
        "https://eduflowtech.com/rl/q-learning",
        "https://eduflowtech.com/deep/attention-efficiency"
    ]
}

df = pd.DataFrame(data)
df


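A minimal sketch of how this table might be queried with `pandas`, using three rows copied from the dataset above. The variable names `catalog`, `advanced`, and `vision` are illustrative and do not appear elsewhere in the notebook.

```python
import pandas as pd

# Three rows copied from the dataset above, enough to show the query pattern
catalog = pd.DataFrame({
    "course": ["Python for AI", "Computer Vision", "Advanced Deep Learning"],
    "topic": ["Python Programming", "Image Classification", "Transformers & Attention"],
    "level": ["Beginner", "Intermediate", "Advanced"],
})

# Boolean mask: exact match on the level column
advanced = catalog[catalog["level"] == "Advanced"]

# Case-insensitive literal substring match on the topic column
# (regex=False treats the search text literally, not as a pattern)
vision = catalog[catalog["topic"].str.contains("image", case=False, regex=False)]

print(advanced["course"].tolist())
print(vision["course"].tolist())
```

The same two patterns (equality masks and `str.contains`) are what the search utility in the next section builds on.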


## 3) Utilities & Robust Search

We ensure results are **never empty**: queries that match no keyword and no lesson name fall back to a single row sampled from the catalogue.


In [None]:

def _non_empty(df_like: pd.DataFrame) -> pd.DataFrame:
    """Return at least one row to keep the system responsive."""
    if df_like is None or df_like.empty:
        # Fixed seed so the fallback row is the same on every run
        return df.sample(1, random_state=0).reset_index(drop=True)
    return df_like.reset_index(drop=True)

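def search_content(category: str, query: str) -> pd.DataFrame:
    """Heuristic content search across topics and lessons (always returns ≥1 row).

    `category` is accepted for interface symmetry but is not used yet;
    routing depends only on keywords found in `query`.
    """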
    q = (query or "").lower()
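    # Map broad intents to topics. Keywords are substring-matched, so avoid very short
    # ones: "nn" would also hit "running" and "cnns" and hijack those queries.
    if any(k in q for k in ["neural", "network"]):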
        res = df[df["topic"].str.contains("Neural", case=False)]
    elif "python" in q:
        res = df[df["topic"].str.contains("Python", case=False)]
    elif any(k in q for k in ["image", "vision", "cnn"]):
        res = df[df["topic"].str.contains("Image|Vision", case=False, regex=True)]
    elif any(k in q for k in ["text", "nlp", "language", "embedding", "transformer"]):
        res = df[df["topic"].str.contains("Text|Transformers", case=False, regex=True)]
    elif any(k in q for k in ["mlops", "deploy", "serve", "production"]):
        res = df[df["topic"].str.contains("MLOps", case=False)]
    elif any(k in q for k in ["etl", "pipeline", "airflow", "data engineering"]):
        res = df[df["topic"].str.contains("ETL|Pipelines", case=False, regex=True)]
    elif any(k in q for k in ["reinforcement", "q-learning", "policy", "value"]):
        res = df[df["topic"].str.contains("Policy|Value|Reinforcement", case=False, regex=True)]
    elif any(k in q for k in ["bug", "error", "crash", "exception", "traceback"]):
        # If it's a technical error, route to something practical like MLOps/Data Eng for triage
        res = df[df["topic"].str.contains("MLOps|ETL", case=False, regex=True)]
    else:
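        # Fallback: literal substring match on lesson names (regex=False, so "?" or "+"
        # in the query cannot break the search); _non_empty supplies a row if nothing matches
        res = df[df["lesson"].str.lower().str.contains(q, regex=False)] if q else pd.DataFrame()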
    return _non_empty(res)
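Substring checks such as `k in q` also match inside longer words, so a short keyword like "nn" is found in "running" or "cnns". One possible refinement, sketched here with only the standard library, is to match whole words or phrases. The helper name `has_keyword` is hypothetical and is not part of the notebook.

```python
import re

def has_keyword(query: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs as a whole word (or phrase) in the query."""
    q = (query or "").lower()
    return any(
        # \b anchors the keyword at word boundaries; re.escape keeps it literal
        re.search(rf"\b{re.escape(k.lower())}\b", q) is not None
        for k in keywords
    )

# Plain substring test: "nn" is found inside "running"
print("nn" in "i got a crash while running the pipeline")
# Whole-word test: there is no standalone "nn" token here
print(has_keyword("I got a crash while running the pipeline", ["nn"]))
# Whole-word test: "NN" appears as its own token
print(has_keyword("Explain NN basics", ["nn"]))
```

The trade-off is that whole-word matching no longer catches plurals: "network" would not match "networks", so each form would need to be listed explicitly.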



## 4) Agent 1 — Query Processor

**Goal:** classify the user query as `lesson`, `exercise`, or `technical error`.

We provide:
- A **LangChain `LLMChain`** (satisfies the brief's requirement for a formal agent component), and
- A **deterministic fallback classifier**, which the orchestrator in Section 8 calls so the notebook runs without external APIs.


In [None]:

processor_prompt = PromptTemplate(
    input_variables=["query"],
    template=textwrap.dedent("""
        You are a classifier for student queries. Categorize the query into exactly one of:
        - lesson
        - exercise
        - technical error
        
        Query: {query}
        
        Category:
    """)
)

if _HAS_FAKE_LLM:
    # Returns 'lesson' deterministically for demo (you can extend the list to drive different flows)
    processor_llm = FakeListLLM(responses=["lesson"])
    processor_agent = LLMChain(llm=processor_llm, prompt=processor_prompt)
else:
    processor_agent = None  # We'll rely on the heuristic fallback below.

def classify_query(query: str) -> str:
    """Deterministic classifier to keep the notebook runnable offline."""
    q = (query or "").lower()
    if any(k in q for k in ["bug", "error", "crash", "exception", "traceback"]):
        return "technical error"
    if any(k in q for k in ["exercise", "task", "practice", "quiz"]):
        return "exercise"
    # default
    return "lesson"
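If the chain's text output were used instead of the heuristic, it would need to be mapped onto one of the three allowed labels, since a model may add whitespace, capitals, or extra words. The following is a sketch under that assumption; `ALLOWED` and `normalize_category` are hypothetical names, and the raw strings are made-up examples rather than real model output.

```python
ALLOWED = ("lesson", "exercise", "technical error")

def normalize_category(raw: str, fallback: str = "lesson") -> str:
    """Map free-form model text onto one allowed label, else use the fallback."""
    text = (raw or "").strip().lower().strip(".:'\"` ")
    if text in ALLOWED:
        return text
    # Accept answers that merely mention a label, e.g. "Category: exercise."
    # The two-word label is checked first because it is the most specific.
    for label in ("technical error", "exercise", "lesson"):
        if label in text:
            return label
    return fallback

print(normalize_category(" Lesson\n"))
print(normalize_category("Category: Exercise."))
print(normalize_category("I am not sure"))
```

Passing the result of `classify_query(query)` as `fallback` would keep the deterministic classifier as the safety net whenever the model's answer cannot be parsed.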



## 5) Agent 2 — Content Searcher

**Goal:** retrieve the most relevant content from the database.  
We expose a simple **Python tool** (`search_content`) that other agents can call.

> Optional: In a full LangChain agent loop you could wrap `search_content` as a `Tool` and
> initialize an agent with `initialize_agent([...])`. We keep it lightweight and fully runnable here.
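As a rough sketch of the optional wrapping mentioned above, a search function can be registered as a `Tool` with a name, a callable, and a description. The block tries the `Tool` class from `langchain_core.tools` and, if the package is missing, falls back to a hypothetical stand-in dataclass with the same three fields. The two-row `_CATALOGUE` and the function `search_content_text` are illustrative assumptions; in the notebook the function would query `df` instead.

```python
import json

try:
    # LangChain's Tool class, if the package is installed
    from langchain_core.tools import Tool
except Exception:
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Tool:  # hypothetical stand-in with the same three fields
        name: str
        func: Callable[[str], str]
        description: str

# Stand-in catalogue (two rows copied from the dataset); not the notebook's DataFrame
_CATALOGUE = [
    {"topic": "Python Programming", "lesson": "Loops and Functions",
     "link": "https://eduflowtech.com/python/loops"},
    {"topic": "MLOps & Deployment", "lesson": "Model Serving with FastAPI",
     "link": "https://eduflowtech.com/mlops/fastapi-serving"},
]

def search_content_text(query: str) -> str:
    """Return matching records as a JSON string, since tools exchange plain text."""
    words = (query or "").lower().split()
    hits = [r for r in _CATALOGUE if any(w in r["topic"].lower() for w in words)]
    # Mirror the notebook's "never empty" rule with a fixed default row
    return json.dumps(hits or _CATALOGUE[:1])

search_tool = Tool(
    name="search_content",
    func=search_content_text,
    description="Look up EduFlowTech lessons for a student query.",
)

print(search_tool.name)
print(search_tool.func("python loops"))
```

Returning a string rather than a DataFrame is a deliberate choice here: an agent loop passes tool results back to the model as text.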



## 6) Agent 3 — Response Generator

**Goal:** produce a friendly, concise answer referencing the best‑fit lesson/exercise and link.

We include:
- A **LangChain `LLMChain`** (the formal agent component), and
- A **deterministic renderer** (`render_response`) that formats the final message and is what the orchestrator calls.


In [None]:

response_prompt = PromptTemplate(
    input_variables=["query", "records"],
    template=textwrap.dedent("""
        You are EduFlowTech's assistant. Based on the retrieved records, write a concise, helpful reply.
        
        User query: {query}
        
        Records:
        {records}
        
        Your reply should:
        - Recommend the single best lesson/exercise.
        - Include the link.
        - Be friendly and actionable.
    """)
)

if _HAS_FAKE_LLM:
    # deterministic response sample
    response_llm = FakeListLLM(responses=["Here is a recommended lesson with a link."])
    response_agent = LLMChain(llm=response_llm, prompt=response_prompt)
else:
    response_agent = None

def render_response(query: str, rec: pd.Series) -> str:
    return (
        f"Recommended lesson for your query: **{rec['lesson']}** "
        f"from the course **{rec['course']}** (level: {rec['level']}).\n"
        f"Topic: {rec['topic']}.\n"
        f"Link: {rec['link']}"
    )
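The prompt above expects a `records` string, but the notebook does not show how retrieved rows would be turned into one. A minimal sketch, assuming the rows arrive as a list of dicts (the shape produced by `DataFrame.to_dict(orient="records")`): `TEMPLATE` is a plain-string copy of the prompt's two placeholders and `format_records` is a hypothetical helper.

```python
import json
import textwrap

# Same placeholders as response_prompt, shown here as a plain string template
TEMPLATE = textwrap.dedent("""
    User query: {query}

    Records:
    {records}
""").strip()

def format_records(records: list[dict]) -> str:
    """Serialize each record as one compact JSON line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# One row copied from the dataset
records = [{
    "course": "Python for AI",
    "lesson": "Loops and Functions",
    "level": "Beginner",
    "link": "https://eduflowtech.com/python/loops",
}]

prompt_text = TEMPLATE.format(
    query="I need a Python exercise about loops",
    records=format_records(records),
)
print(prompt_text)
```

One JSON object per line keeps field names visible to the model while staying compact; the same two keyword arguments would be passed to `response_prompt.format(...)`.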



## 7) Visual Diagram — Agent Interaction

### ASCII Diagram (no dependencies)
```
User Query
    │
    ▼
[Agent 1: Query Processor]
    │  category (lesson / exercise / technical error)
    ▼
[Agent 2: Content Searcher]
    │  top‑K records (DataFrame rows)
    ▼
[Agent 3: Response Generator]
    │  final message (with link)
    ▼
System Reply
```

> Optionally, if you have `graphviz` installed, run the next cell to render a flowchart.


In [None]:

# Optional: Graphviz diagram (safe to skip if not installed)
try:
    from graphviz import Digraph
    dot = Digraph(comment="EduFlowTech Agents")
    dot.attr(rankdir="TB", nodesep="0.4", fontsize="10")
    dot.node("U", "User Query")
    dot.node("A1", "Agent 1:\nQuery Processor")
    dot.node("A2", "Agent 2:\nContent Searcher")
    dot.node("A3", "Agent 3:\nResponse Generator")
    dot.node("R", "System Reply")

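    # One edge per hop; dot.edges() only accepts two-character "tail+head" strings,
    # so node names such as "A1" must go through dot.edge() instead.
    dot.edge("U", "A1")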
    dot.edge("A1", "A2", label="category")
    dot.edge("A2", "A3", label="top‑K records")
    dot.edge("A3", "R", label="final message")

    display(dot)
except Exception as e:
    print("Graphviz not available (skipping). Reason:", e)



## 8) Orchestrator — End‑to‑End Workflow

This function wires the three agents:
1. **Agent 1** classifies the query.
2. **Agent 2** fetches the best matching content.
3. **Agent 3** composes the final reply.

We also provide **multiple test scenarios** to showcase adaptability.


In [None]:

def workflow(query: str, k: int = 1) -> dict:
    """End‑to‑end coordination with deterministic fallbacks for full reproducibility."""
    # Agent 1 — get category
    category = classify_query(query)
    # Agent 2 — content search
    results = search_content(category, query)
    # select top-1 for a concise answer (you could extend to top‑k)
    best = results.iloc[0]
    # Agent 3 — final response
    answer = render_response(query, best)
    return {
        "category": category,
        "results": results.head(k).to_dict(orient="records"),
        "answer": answer
    }

tests = [
    "What is the best lesson to learn neural networks?",
    "I need a Python exercise about loops",
    "How do I deploy a model to production?",
    "I got a crash error while running the training pipeline",
    "Any intro to image classification with CNNs?"
]

for q in tests:
    out = workflow(q)
    print("\n---")
    print("Query:", q)
    print("Category:", out["category"])
    print("Answer:", out["answer"])
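Because the classifier is deterministic, the printed categories can also be checked automatically. The sketch below copies `classify_query` from Section 4 so that it runs on its own, and pairs each test query with the category the keyword rules produce. The `expected` mapping is an illustrative addition, not part of the notebook.

```python
def classify_query(query: str) -> str:
    """Copy of the notebook's deterministic classifier."""
    q = (query or "").lower()
    if any(k in q for k in ["bug", "error", "crash", "exception", "traceback"]):
        return "technical error"
    if any(k in q for k in ["exercise", "task", "practice", "quiz"]):
        return "exercise"
    return "lesson"

# Category each test query should receive under the keyword rules above
expected = {
    "What is the best lesson to learn neural networks?": "lesson",
    "I need a Python exercise about loops": "exercise",
    "How do I deploy a model to production?": "lesson",
    "I got a crash error while running the training pipeline": "technical error",
    "Any intro to image classification with CNNs?": "lesson",
}

for query, want in expected.items():
    got = classify_query(query)
    assert got == want, f"{query!r}: expected {want!r}, got {got!r}"

print("all", len(expected), "classifications match")
```

A check like this catches keyword changes that silently reroute a query, which printed output alone makes easy to miss.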
