# Theoretical Report – Joni Eats Chatbot (Groq + Streamlit)
This section answers the guideline’s questions (1–6) in narrative form for reporting and submission.

## 1) Use Case, Target Users, Benefits, Limitations
Use case: A customer-facing cafe assistant for Joni Eats. The bot answers menu questions, dietary info (vegan/halal/gluten‑free), opening hours, location, specials, wait times, and simple ordering guidance.

- Target users: Walk‑in customers, regulars, and first‑time visitors who want quick answers without waiting for staff during rush hours.
- Expected benefits: Faster service, fewer repetitive questions for staff, consistent information, and better discovery of items (e.g., seasonal specials).
- Limitations: Not a payment system, no real‑time inventory unless integrated, may not reflect last‑second menu changes, and should not give medical or legal advice.

## 2) Justification
A chatbot is well‑suited because cafe inquiries are short, frequent, and repetitive. Compared to browsing a website/menu PDF, chat supports quick follow‑ups (e.g., “any nut‑free desserts?”) and clarifications. Versus human staff, the bot can answer 24/7 and reduce queue pressure while keeping tone consistent. It will perform retrieval‑augmented Q&A from a curated knowledge base (menu, policies, specials) and follow a conversation flow for greetings, clarifying questions, and goodbyes.

## 3) Groq Model Selection
Primary model: llama3‑8b‑8192 on Groq for fast, low‑latency chat. The backend maps this name to `llama-3.1-8b-instant` via `MODEL_ALIASES`. Rationale:
- Size/speed: an 8B model is cost‑effective and responsive for real‑time interactions; a 70B model is the upgrade path for tougher reasoning.
- Language: Good English fluency; multilingual support acceptable for basic hospitality queries.
- Latency: Groq’s inference speed improves user experience during live chat.

## 4) Conversation Flow (with samples)
Flow:
1. Greeting → “Hi, welcome to Joni Eats! How can I help?”
2. Understand intent → Identify if the user asks about menu, hours, location, dietary needs, or specials.
3. Retrieve context → Pull top KB snippets relevant to the query.
4. Respond → Provide concise, friendly answer; list 3–5 options if appropriate.
5. Clarify → Offer follow‑ups: “Want vegan or gluten‑free options?”
6. Goodbye → Close politely or hand off to staff if needed.
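
The intent step could be prototyped before any model call with a simple keyword router. This is a hedged sketch: `INTENT_KEYWORDS` and `detect_intent` are hypothetical names, and the keyword lists are assumptions rather than part of the project. The project's backend leaves intent recognition to the LLM and the retriever.

```python
# Hypothetical keyword table; tune the terms against real customer questions.
# Dict order sets priority: the first intent with a matching keyword wins.
INTENT_KEYWORDS: dict[str, tuple[str, ...]] = {
    "dietary": ("vegan", "vegetarian", "halal", "gluten", "nut-free", "dairy"),
    "hours": ("open", "close", "hours", "time"),
    "location": ("where", "address", "location", "parking"),
    "specials": ("special", "seasonal", "deal", "offer"),
    "menu": ("menu", "lunch", "breakfast", "dessert", "drink", "price"),
}


def detect_intent(query: str) -> str:
    """Return the first intent whose keywords appear in the query, else 'other'."""
    q = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in q for k in keywords):
            return intent
    return "other"


print(detect_intent("Do you have vegan lunch options under $12?"))  # dietary
print(detect_intent("What time do you open on Saturdays?"))  # hours
print(detect_intent("Tell me a joke"))  # other
```

Because the first match wins, a dietary question that also mentions lunch is routed to `dietary`. Plain substring matching is crude ("time" would also match "sometimes"), so the LLM remains the final judge of what the user wants.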

Sample inputs/responses:
- Q: “Do you have vegan lunch options under $12?”
  A: “Yes—try our Vegan Power Bowl and Roasted Veggie Wrap (both under $12). Would you like a protein add‑on?”
- Q: “What time do you open on Saturdays?”
  A: “We open at 8:00 AM on Saturdays and close at 8:00 PM.”

## 5) Prompting Technique
Technique: Few‑shot prompting with examples from the corpus’s CHAT_PATTERNS section, falling back to zero‑shot when that section is empty. The system prompt defines the Joni Eats persona and scope, and it asks for bullet points when listing options so answers stay skimmable. It also restricts the model to cafe‑related questions. Out‑of‑scope queries get a brief, polite decline that steers back to menu, hours, or specials.
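
The backend below pastes CHAT_PATTERNS into a single system message. An alternative few‑shot layout replays each example as alternating user and assistant turns, sketched here under the assumption that the patterns are written as `User:` / `Bot:` lines (the real corpus format may differ). With this layout, an empty pattern string naturally yields the zero‑shot message list. `few_shot_messages` and `build_messages` are hypothetical helpers.

```python
def few_shot_messages(patterns: str) -> list[dict]:
    """Convert 'User:' / 'Bot:' lines (assumed format) into chat messages."""
    msgs: list[dict] = []
    for raw in patterns.splitlines():
        line = raw.strip()
        if line.lower().startswith("user:"):
            msgs.append({"role": "user", "content": line[len("user:"):].strip()})
        elif line.lower().startswith("bot:"):
            msgs.append({"role": "assistant", "content": line[len("bot:"):].strip()})
    return msgs


def build_messages(system_prompt: str, patterns: str, query: str) -> list[dict]:
    # No parsable examples -> only system + user messages, i.e. the zero-shot fallback.
    shots = few_shot_messages(patterns)
    return [{"role": "system", "content": system_prompt}, *shots, {"role": "user", "content": query}]


demo = "User: Any nut-free desserts?\nBot: Let me check the dessert menu for you."
for m in build_messages("You are Joni Eats’ cafe assistant.", demo, "Are you open on Sundays?"):
    print(m["role"], "->", m["content"])
```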

## 6) Frontend
A Streamlit UI makes the bot easy to try: chat interface, settings (model, temperature, top‑k), and an expander that shows retrieved snippets for transparency. This supports quick feedback from staff and customers before deeper integrations (ordering, inventory).

# Joni Eats Chatbot – Project Report
A Groq LLM chatbot with a Streamlit frontend for the cafe "Joni Eats." This notebook documents design choices, implementation steps, and validation.

## 1) Verify Environment and Install Dependencies
Use `%pip` to install into the active kernel (make sure your .venv is selected in VS Code). We'll install groq, streamlit, python-dotenv, scikit-learn, numpy, pandas, tiktoken (optional), and pytest. joblib, used for caching, comes in as a scikit-learn dependency. Then we run a quick import check and print the installed versions.

## 2) Load Secrets and Paths
Load GROQ_API_KEY from the workspace .env using dotenv. Define paths for the Week-03 assets and the cache directory, then validate that the key and the corpus file exist.
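
The validation step can collect every setup problem before failing, so that a missing key and a missing corpus file are reported together. This sketch assumes the variable and paths used in the backend cell (`GROQ_API_KEY`, `week03`, `corpus_path`). `missing_setup` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def missing_setup(env_var: str, required_paths: list[Path]) -> list[str]:
    """Return a list of setup problems; an empty list means the setup looks usable."""
    problems: list[str] = []
    if not os.getenv(env_var):
        problems.append(f"environment variable {env_var} is not set")
    problems += [f"missing path: {p}" for p in required_paths if not p.exists()]
    return problems


# Notebook usage after load_dotenv(...), with names from the backend cell:
#   problems = missing_setup("GROQ_API_KEY", [week03, corpus_path])
#   if problems:
#       raise RuntimeError("; ".join(problems))

# Offline demo: the temp directory exists, but the corpus file inside it does not.
with tempfile.TemporaryDirectory() as tmp:
    print(missing_setup("GROQ_API_KEY", [Path(tmp), Path(tmp) / "joni_eats_corpus.txt"]))
```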

## 3) Ingest Knowledge Base and Prompt Assets
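Read the combined corpus (joni_eats_corpus.txt), split it into its RESTAURANT_KB, CHAT_PATTERNS, and CONTEXT_FLOW sections, chunk the KB for retrieval, and preview a few lines.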
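
The backend's `parse_corpus` expects section headers of the form `### SECTION: NAME`. The sketch below shows that layout on a made-up two-section sample and prints the first line of each section as a preview. `SAMPLE` and `split_sections` are illustrative only.

```python
import re

# Made-up sample in the "### SECTION: NAME" layout that parse_corpus expects.
SAMPLE = """### SECTION: RESTAURANT_KB
Hours: see the posted schedule.

### SECTION: CHAT_PATTERNS
User: Hi
Bot: Welcome to Joni Eats!
"""


def split_sections(corpus: str) -> dict[str, str]:
    """Map each '### SECTION: NAME' header to the text that follows it."""
    # With a capture group, re.split returns [preamble, name1, body1, name2, body2, ...].
    parts = re.split(r"^### SECTION: (.+)$", corpus, flags=re.M)
    return {parts[i].strip().upper(): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


for name, body in split_sections(SAMPLE).items():
    preview = body.splitlines()[0] if body else "(empty)"
    print(f"{name}: {preview}")
```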

## 4) Build Lightweight Retriever (TF‑IDF cosine)
Implement TF‑IDF vectorization and cosine similarity retrieval. Cache artifacts with joblib to accelerate reruns.
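
To show what the TF‑IDF retriever computes, here is a dependency-free sketch of TF‑IDF cosine ranking. It uses the smoothed idf, ln((1 + n) / (1 + df)) + 1, and the L2 normalisation that scikit-learn's `TfidfVectorizer` applies by default. It simplifies tokenisation and omits the stop-word removal and bigrams the backend configures, so its scores will not match scikit-learn exactly. All names here are hypothetical, and the sample chunks reuse item names from the sample answers above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer: lowercase alphanumeric runs (no stop words, no bigrams).
    return re.findall(r"[a-z0-9]+", text.lower())


def _weigh(counts: Counter, idf: dict[str, float]) -> dict[str, float]:
    # Raw term count times idf, then L2-normalise; terms unseen in the chunks are dropped.
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """L2-normalised TF-IDF vectors with smoothed idf: ln((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [_weigh(Counter(toks), idf) for toks in tokenized], idf


def top_k(query: str, docs: list[str], k: int = 2) -> list[tuple[int, float]]:
    doc_vecs, idf = tfidf_vectors(docs)
    q = _weigh(Counter(tokenize(query)), idf)
    # Both vectors are L2-normalised (or empty), so the dot product is the cosine similarity.
    scores = [sum(w * dv.get(t, 0.0) for t, w in q.items()) for dv in doc_vecs]
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)[:k]


chunks = ["Vegan Power Bowl", "Saturday hours: 8:00 AM to 8:00 PM", "Roasted Veggie Wrap (vegan)"]
print(top_k("vegan lunch", chunks))
```

Only `vegan` contributes to the query vector, because `lunch` never appears in the chunks. The shorter Power Bowl chunk ranks first since length normalisation dilutes that term less than in the four-word wrap chunk.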

## 5) Define System Prompt and Prompting Strategy
Persona: You are Joni Eats’ friendly, concise, and helpful cafe assistant. Use few‑shot patterns from the corpus’s CHAT_PATTERNS section and context flow guidance from the CONTEXT_FLOW section; fall back to zero‑shot if needed.

## 6) Initialize Groq Client and Sanity Check
Create a client and send a tiny test to verify credentials.
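
A sanity check only needs one very short completion. The sketch takes the client as a parameter so it can be dry-run offline with a stand-in object. In the notebook you would pass `Groq(api_key=GROQ_API_KEY)`. `sanity_check` and `FakeCompletions` are hypothetical, and the fake only mimics the `choices[0].message.content` shape of a real response.

```python
from types import SimpleNamespace


def sanity_check(client, model: str = "llama-3.1-8b-instant") -> str:
    """Send a one-line prompt; the call raises if the key or model name is rejected."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Reply with the single word: ok"}],
        max_tokens=5,
        temperature=0,
    )
    return (resp.choices[0].message.content or "").strip()


# Live usage (needs network and a valid key):
#   from groq import Groq
#   print(sanity_check(Groq(api_key=GROQ_API_KEY)))


class FakeCompletions:
    """Offline stand-in returning an object with the same attribute path as a response."""

    def create(self, **kwargs):
        message = SimpleNamespace(content=" ok ")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
print(sanity_check(fake_client))
```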

## 7) Compose RAG Pipeline: retrieve‑then‑generate
Combine retriever with Groq to answer queries using KB snippets and few‑shot patterns.

## 8) Chat Memory and Safety Guards
Maintain rolling history and simple guardrails to keep the bot on-topic and safe.

```python
# Final Backend Code (single cell)
import hashlib
import logging
import os
import re
from pathlib import Path
from typing import Any

import joblib
from dotenv import load_dotenv
from groq import Groq
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

logger = logging.getLogger(__name__)

# --- Paths & Secrets ---
# Machine-specific checkout location; adjust before running elsewhere.
workspace_root = Path("d:/fellowship/Buildables-Artificial-Intelligence-Fellowship").resolve()
env_path = workspace_root / ".env"
load_dotenv(env_path)
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

week03 = workspace_root / "Week-03"
corpus_path = week03 / "joni_eats_corpus.txt"
cache_dir = week03 / ".cache"
cache_dir.mkdir(parents=True, exist_ok=True)


# --- Helpers: IO & Parsing ---
def read_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


def parse_corpus(corpus: str) -> tuple[str, str, str]:
    # The corpus is one file whose sections start with "### SECTION: NAME" lines.
    # re.split with a capture group yields [preamble, name1, body1, name2, body2, ...].
    sections = re.split(r"^### SECTION: (.+)$", corpus, flags=re.M)
    mapping = {"RESTAURANT_KB": "", "CHAT_PATTERNS": "", "CONTEXT_FLOW": ""}
    for i in range(1, len(sections), 2):
        name = sections[i].strip().upper()
        body = sections[i + 1] if i + 1 < len(sections) else ""
        if name in mapping:
            mapping[name] = body.strip()
    return mapping["RESTAURANT_KB"], mapping["CHAT_PATTERNS"], mapping["CONTEXT_FLOW"]


def split_into_chunks(text: str, min_len: int = 250) -> list[str]:
    # Split on blank lines, then merge consecutive short paragraphs so each
    # chunk carries enough text for TF-IDF to score meaningfully.
    parts = [p.strip() for p in re.split(r"\n\s*\n+", text) if p.strip()]
    chunks: list[str] = []
    for p in parts:
        if len(p) < min_len and chunks and len(chunks[-1]) < min_len:
            chunks[-1] = chunks[-1] + "\n" + p
        else:
            chunks.append(p)
    return chunks


def build_dietary_index(kb_text: str) -> dict[str, list[str]]:
    # Collect KB lines that mention each dietary label (including common spelling variants).
    idx: dict[str, set[str]] = {"vegan": set(), "vegetarian": set(), "gluten-free": set(), "halal": set()}
    for line in (l.strip() for l in kb_text.splitlines()):
        if not line:
            continue
        low = line.lower()
        if "vegan" in low:
            idx["vegan"].add(line)
        if "vegetarian" in low or "veggie" in low:
            idx["vegetarian"].add(line)
        if "gluten-free" in low or "gluten free" in low:
            idx["gluten-free"].add(line)
        if "halal" in low:
            idx["halal"].add(line)
    return {k: sorted(v) for k, v in idx.items()}


# --- Load corpus & prepare chunks ---
corpus_text = read_text(corpus_path)
kb_text, chat_patterns, context_flow = parse_corpus(corpus_text)
kb_chunks = split_into_chunks(kb_text, min_len=250)
diet_index = build_dietary_index(kb_text)


# --- Retriever (TF-IDF + cosine) with caching ---
def build_tfidf_retriever(chunks: list[str], ngram_range: tuple[int, int] = (1, 2)) -> tuple[TfidfVectorizer, Any]:
    vectorizer = TfidfVectorizer(ngram_range=ngram_range, stop_words="english")
    matrix = vectorizer.fit_transform(chunks)  # sparse matrix, one row per chunk
    return vectorizer, matrix


# Key the cache files on the chunk contents: after a corpus edit, a stale matrix
# would no longer line up row-for-row with kb_chunks.
corpus_hash = hashlib.sha256("\x00".join(kb_chunks).encode("utf-8")).hexdigest()[:12]
cache_vec = cache_dir / f"tfidf_vectorizer_{corpus_hash}.joblib"
cache_mat = cache_dir / f"tfidf_matrix_{corpus_hash}.joblib"
if cache_vec.exists() and cache_mat.exists():
    vectorizer = joblib.load(cache_vec)
    matrix = joblib.load(cache_mat)
else:
    vectorizer, matrix = build_tfidf_retriever(kb_chunks)
    joblib.dump(vectorizer, cache_vec)
    joblib.dump(matrix, cache_mat)


def _expand_query(query: str) -> str:
    # Append synonyms to plant-based queries so they match however the menu phrases it.
    ql = query.lower()
    if any(t in ql for t in ["veg ", " veg", "vegan", "vegetarian", "veggie", "plant based", "plant-based"]):
        query = query + " vegan vegetarian veggie plant-based meatless dairy-free"
    return query


def retrieve(query: str, top_k: int = 5) -> list[tuple[int, float, str]]:
    q_vec = vectorizer.transform([_expand_query(query)])
    sims = cosine_similarity(q_vec, matrix).ravel()
    idxs = sims.argsort()[::-1][:top_k]
    # Skip zero-score chunks so the "(no relevant context)" fallback can apply.
    return [(int(i), float(sims[i]), kb_chunks[int(i)]) for i in idxs if sims[i] > 0]


# --- Prompting ---
def build_system_prompt(flow_guidance: str = "") -> str:
    prompt = (
        "You are Joni Eats’ cafe assistant. Be friendly, concise, and accurate. "
        "Answer only cafe-related questions (menu, hours, location, dietary info, specials, ordering, events). "
        "If a question is outside scope, briefly decline and steer back to cafe topics. "
        "If the user asks about vegan/vegetarian and the KB lists such items, do not say we don't have them. "
        "Cite menu items or policies when relevant. Use bullet points when listing options."
    )
    # The report's prompting strategy uses CONTEXT_FLOW guidance; append it when present.
    if flow_guidance.strip():
        prompt += "\n\nConversation flow guidance:\n" + flow_guidance.strip()
    return prompt


def render_prompt(
    system_prompt: str,
    query: str,
    retrieved_context: list[tuple[int, float, str]],
    few_shots: str | None = None,
    dietary_hint: str | None = None,
    history: list[dict] | None = None,
) -> list[dict]:
    context_block = "\n\n".join(f"[Snippet {i}]\n{txt}" for i, _s, txt in retrieved_context)
    msgs = [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": "Context from cafe knowledge base:\n" + (context_block or "(no relevant context)")},
    ]
    if dietary_hint:
        msgs.append({"role": "system", "content": dietary_hint})
    # Few-shot examples when CHAT_PATTERNS is non-empty; otherwise this is zero-shot.
    if few_shots and few_shots.strip():
        msgs.append({"role": "system", "content": "Example interactions:\n" + few_shots.strip()})
    # Prior turns (already trimmed by roll_history) go before the new question.
    if history:
        msgs.extend(history)
    msgs.append({"role": "user", "content": query})
    return msgs


# --- Groq Client & Model Aliases ---
client = Groq(api_key=GROQ_API_KEY)

# Map older model names to the IDs the backend calls. Groq retires models over
# time, so check its model list if calls to one of these IDs start failing.
MODEL_ALIASES = {
    "llama3-8b-8192": "llama-3.1-8b-instant",
    "llama3-70b-8192": "llama-3.1-70b-versatile",
}


def normalize_model(model: str) -> str:
    return MODEL_ALIASES.get(model, model)


def groq_chat(messages: list[dict], model: str = "llama-3.1-8b-instant", temperature: float = 0.2) -> str:
    resp = client.chat.completions.create(
        model=normalize_model(model),
        messages=messages,
        temperature=temperature,
    )
    return resp.choices[0].message.content or ""


# --- Dietary Hint ---
def _dietary_hint_for(query: str) -> str | None:
    ql = query.lower()
    if not any(t in ql for t in ["vegan", "plant-based", "plant based", "veggie", "vegetarian"]):
        return None
    items = diet_index.get("vegan", [])[:3] + diet_index.get("vegetarian", [])[:3]
    items = list(dict.fromkeys(items))[:5]  # a line tagged both vegan and vegetarian appears once
    if not items:
        return None
    return (
        "Dietary guidance: The menu lists vegan/vegetarian options. Prefer offering them if relevant.\n"
        + "\n".join(f"- {it}" for it in items)
    )


# --- Memory & Safety ---
MAX_HISTORY = 16  # messages kept, i.e. about 8 user/assistant turns
PROFANITY_RE = re.compile(r"\b(fuck|shit|bitch)\b", re.I)


def filter_input(text: str) -> str:
    # Cap input length and redact a small profanity list before retrieval and generation.
    if len(text) > 1000:
        text = text[:1000] + "…"
    return PROFANITY_RE.sub("(redacted)", text)


def roll_history(history: list[dict], new_user: str, new_assistant: str | None = None) -> list[dict]:
    history.append({"role": "user", "content": new_user})
    if new_assistant:
        history.append({"role": "assistant", "content": new_assistant})
    if len(history) > MAX_HISTORY:
        del history[:-MAX_HISTORY]  # keep only the most recent messages
    return history


# --- Public API ---
def answer_query(
    query: str,
    top_k: int = 5,
    temperature: float = 0.2,
    model: str = "llama-3.1-8b-instant",
    history: list[dict] | None = None,
) -> dict:
    # Callers keep `history` across turns, e.g. roll_history(history, query, result["text"]).
    query = filter_input(query)
    model = normalize_model(model)
    hits = retrieve(query, top_k=top_k)
    system_p = build_system_prompt(context_flow)
    dietary_hint = _dietary_hint_for(query)
    messages = render_prompt(system_p, query, hits, chat_patterns, dietary_hint=dietary_hint, history=history)
    try:
        text = groq_chat(messages, model=model, temperature=temperature)
    except Exception:
        # Log the real cause (auth, rate limit, retired model) instead of hiding it.
        logger.exception("Groq chat call failed")
        text = "I'm not sure about that right now. Please ask about our menu, hours, or specials."
    return {
        "text": text,
        "retrieved": hits,
        "messages": messages,
        "model": model,
    }
```
