# 04: Gradio UI for FIA IBMS Chatbot
**Interface:** Dark-themed chat UI with smart routing, token-by-token streaming.

**Routing:** Database queries → NL2SQL pipeline | General questions → qwen3 direct

**Qwen3 parameters:** Officially recommended settings for non-thinking mode.

**Gradio:** 6.4.0 | **Port:** 7861

## Cell 1: Imports & Configuration

In [1]:
import oracledb
from sqlalchemy import create_engine, text
import pandas as pd
import re
import time
import json
import ollama
import gradio as gr
from pathlib import Path

# === Paths ===
PROJECT_DIR = Path.home() / "ml-projects" / "python-projects" / "IBMS_LLM"
CONFIG_DIR  = PROJECT_DIR / "Config"

# === Models ===
SQL_MODEL       = "qwen2.5-coder:14b"
NARRATION_MODEL = "qwen3-14b-fixed"
CHAT_MODEL      = "qwen3-14b-fixed"

# === Official Qwen3 Non-Thinking Mode Parameters ===
# Source: https://ollama.com/dengcao/Qwen3-14B
QWEN3_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0,
    "repeat_penalty": 1.5,   # Critical for quantized models: stops repetition
    "num_predict": 2048,
}

print(f"Gradio version: {gr.__version__}")
print(f"SQL model:       {SQL_MODEL}")
print(f"Chat/Narration:  {CHAT_MODEL}")
print(f"Qwen3 options:   {QWEN3_OPTIONS}")

Gradio version: 6.4.0
SQL model:       qwen2.5-coder:14b
Chat/Narration:  qwen3-14b-fixed
Qwen3 options:   {'temperature': 0.7, 'top_p': 0.8, 'top_k': 20, 'min_p': 0, 'repeat_penalty': 1.5, 'num_predict': 2048}


## Cell 2: Oracle Connection

In [2]:
engine = create_engine(
    "oracle+oracledb://ibms_user:ibms_pass@localhost:1521/?service_name=FREEPDB1",
    pool_pre_ping=True,
    pool_size=3,
)

with engine.connect() as conn:
    result = conn.execute(text("SELECT 1 FROM dual"))
    print("Oracle connection OK:", result.fetchone())

Oracle connection OK: (1,)
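
The connection string above embeds the username and password directly. As a minimal sketch, the same URL could be assembled from environment variables instead. The `IBMS_DB_*` variable names and the `build_oracle_url` helper are hypothetical (they do not appear in this notebook), and the defaults simply mirror the values used in Cell 2.

```python
import os
from urllib.parse import quote_plus


def build_oracle_url(env=os.environ) -> str:
    """Assemble the SQLAlchemy URL from environment variables.

    The IBMS_DB_* names are hypothetical; the defaults mirror the
    values hard-coded in Cell 2.
    """
    user = env.get("IBMS_DB_USER", "ibms_user")
    password = env.get("IBMS_DB_PASSWORD", "ibms_pass")
    host = env.get("IBMS_DB_HOST", "localhost")
    port = env.get("IBMS_DB_PORT", "1521")
    service = env.get("IBMS_DB_SERVICE", "FREEPDB1")
    # quote_plus escapes characters such as '@' or '/' that would
    # otherwise break URL parsing.
    return (
        f"oracle+oracledb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?service_name={service}"
    )


print(build_oracle_url({}))
```

The returned string can be passed to `create_engine(...)` in place of the literal URL, keeping `pool_pre_ping` and `pool_size` unchanged.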


## Cell 3: Load SQL Prompt Template

In [3]:
PROMPT_TEMPLATE = (CONFIG_DIR / "prompt_template.txt").read_text()
print(f"SQL prompt loaded: {len(PROMPT_TEMPLATE):,} chars")

SQL prompt loaded: 13,474 chars


## Cell 4: Query Classifier

In [4]:
CLASSIFIER_PROMPT = """Classify this message as DATABASE or GENERAL.

DATABASE = needs data from IBMS database (counts, lists, lookups, comparisons, statistics)
GENERAL = greeting, follow-up, explanation, opinion, or anything NOT needing a new database query

Reply with one word only: DATABASE or GENERAL\n/no_think

Message: {message}"""


def classify_query(message: str, history: list[dict]) -> str:
    prompt = CLASSIFIER_PROMPT.replace("{message}", message)
    if history:
        recent = history[-4:]
        context = "\n".join(f"{m.get('role','')}: {m.get('content','')[:150]}" for m in recent)
        prompt += f"\n\nContext:\n{context}"
    
    try:
        response = ollama.chat(
            model=CHAT_MODEL,
            messages=[{"role": "user", "content": prompt}],
            options={"temperature": 0.0, "num_predict": 10, "repeat_penalty": 1.5},
        )
        result = response["message"]["content"].strip().upper()
        return "DATABASE" if "DATABASE" in result else "GENERAL"
    except Exception:
        return "DATABASE"


# Quick test
for msg in ["How many travelers?", "Hello", "Why is that so high?", "Show top 10"]:
    print(f"  [{classify_query(msg, []):>8}] {msg}")

  [DATABASE] How many travelers?
  [ GENERAL] Hello
  [ GENERAL] Why is that so high?
  [DATABASE] Show top 10
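
The mapping from the raw model reply to a routing label can be checked without a running model. The sketch below isolates the substring rule used at the end of `classify_query`; `normalize_label` is a hypothetical helper name introduced only for illustration.

```python
def normalize_label(raw_reply: str) -> str:
    """Map a raw model reply to one of the two routing labels."""
    reply = raw_reply.strip().upper()
    # Any reply mentioning DATABASE routes to the NL2SQL pipeline;
    # everything else is treated as general conversation.
    return "DATABASE" if "DATABASE" in reply else "GENERAL"


samples = ["DATABASE", "database.", "<think>\n\n</think>\n\nGENERAL", ""]
for reply in samples:
    print(f"{reply!r:>35} -> {normalize_label(reply)}")
```

Note the asymmetry this makes visible: an empty reply falls through to `GENERAL`, whereas an exception inside `classify_query` returns `DATABASE`. A reply such as "GENERAL, not DATABASE" would also route to `DATABASE`, because the rule is a plain substring check.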


## Cell 5: Prompts for General Chat & Narration

In [5]:
SYSTEM_PROMPT_GENERAL = """You are an AI assistant for FIA (Federal Investigation Agency) Pakistan, specializing in the IBMS (Integrated Border Management System).

Guidelines:
- Match your response length to the question. Short questions get short answers. Detailed questions get detailed answers.
- For greetings (hello, hi, etc.), respond briefly and warmly. Introduce yourself in 1-2 sentences and ask how you can help.
- For follow-up questions about previous answers, provide relevant analysis or clarification.
- For FIA/IBMS concept questions, explain clearly with relevant context.
- If the officer needs specific data, suggest they ask a data question.
- Be professional but conversational. Do NOT pad responses with unnecessary information.
- Do NOT invent stories or hypothetical scenarios unless explicitly asked."""


NARRATION_PROMPT = """You are a senior FIA intelligence analyst. An officer asked a question and the system queried the IBMS database. Below are the results.

Write a professional intelligence briefing based ONLY on the data provided.

Rules:
- Lead with the key finding that directly answers the question.
- Include all important numbers exactly as shown (counts, percentages, dates).
- If results have rankings/tables, present and analyze them.
- If 0 rows returned, say "No records found" and suggest why.
- Do NOT invent data. Do NOT mention SQL or databases.
- Match response length to complexity: simple counts get 2-3 sentences, complex analyses get detailed paragraphs.
- End with a brief operational insight when the data warrants it.
- Do NOT pad your response to fill space. Be thorough but not verbose.

QUESTION: {question}

RESULTS:
{results}

Briefing:"""


print("Prompts defined.")

Prompts defined.


## Cell 6: NL2SQL Pipeline Functions

In [6]:
def extract_sql(raw: str) -> str:
    raw = raw.strip()
    match = re.search(r'```(?:sql)?\s*\n?(.*?)\n?```', raw, re.DOTALL | re.IGNORECASE)
    if match:
        sql = match.group(1).strip()
    else:
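        # Prefer a CTE that opens a line (validate_sql accepts WITH), else the first SELECT
        match = (
            re.search(r'^\s*(WITH\s+\w+\s+AS\s*\(.*)', raw, re.DOTALL | re.IGNORECASE | re.MULTILINE)
            or re.search(r'(SELECT\b.*)', raw, re.DOTALL | re.IGNORECASE)
        )
        sql = match.group(1).strip() if match else raw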
    if ';' in sql:
        sql = sql[:sql.index(';')].strip()
    return sql


def generate_sql(question: str) -> tuple[str, str, float]:
    prompt = PROMPT_TEMPLATE.replace("{question}", question)
    t0 = time.time()
    response = ollama.chat(
        model=SQL_MODEL,
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": 0.0, "num_predict": 1024},
    )
    latency = time.time() - t0
    raw = response["message"]["content"]
    return raw, extract_sql(raw), latency


IBMS_TABLES = {
    "countries", "ports_of_entry", "visa_categories", "sponsors",
    "travelers", "document_registry", "visa_applications", "travel_records",
    "asylum_claims", "removal_orders", "detention_records",
    "family_relationships", "watchlist", "ecl_entries",
    "trafficking_cases", "illegal_crossings", "offloading_records",
    "risk_profiles", "suspect_networks", "audit_log",
}

BLOCKED_KEYWORDS = [
    r"\bINSERT\b", r"\bUPDATE\b", r"\bDELETE\b", r"\bMERGE\b",
    r"\bCREATE\b", r"\bDROP\b", r"\bALTER\b", r"\bTRUNCATE\b", r"\bRENAME\b",
    r"\bGRANT\b", r"\bREVOKE\b",
    r"\bDBMS_", r"\bUTL_", r"\bSYS\.", r"\bDBA_",
    r"\bV\$", r"\bEXECUTE\s+IMMEDIATE\b",
    r"\bBEGIN\b", r"\bDECLARE\b", r"\bEXEC\b",
]


def validate_sql(sql: str) -> tuple[bool, str]:
    if not sql or not sql.strip():
        return False, "Empty SQL"
    sql_upper = sql.strip().upper()
    if not (sql_upper.startswith("SELECT") or sql_upper.startswith("WITH")):
        return False, f"Must start with SELECT/WITH. Got: {sql_upper[:30]}"
    if ';' in sql:
        return False, "Multiple statements detected"
    for pattern in BLOCKED_KEYWORDS:
        match = re.search(pattern, sql, re.IGNORECASE)
        if match:
            return False, f"Blocked: {match.group()}"
    if '--' in sql or '/*' in sql:
        return False, "SQL comments not allowed"
    sql_lower = sql.lower()
    if not any(t in sql_lower for t in IBMS_TABLES):
        return False, "No known IBMS table referenced"
    return True, "OK"


def execute_sql(sql: str) -> tuple[bool, pd.DataFrame | None, str, float]:
    is_valid, reason = validate_sql(sql)
    if not is_valid:
        return False, None, f"Validation failed: {reason}", 0.0
    try:
        t0 = time.time()
        with engine.connect() as conn:
            df = pd.read_sql(text(sql), conn)
        exec_time = time.time() - t0
        return True, df, f"{len(df)} rows in {exec_time:.2f}s", exec_time
    except Exception as e:
        return False, None, f"Execution error: {str(e)[:200]}", 0.0


print("Pipeline functions loaded.")

Pipeline functions loaded.
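
The table check in `validate_sql` is a substring test, so a name such as `watchlist_archive` would satisfy it because it contains `watchlist`. One possible tightening, sketched here under the assumption that whole-identifier matching is what is wanted, tokenizes the statement first. `referenced_tables` and `KNOWN_TABLES` are hypothetical names; the set is a small subset of `IBMS_TABLES` repeated so the example runs on its own.

```python
import re

# Subset of IBMS_TABLES, repeated here so the sketch is self-contained.
KNOWN_TABLES = {"travelers", "watchlist", "audit_log"}


def referenced_tables(sql: str, tables: set[str]) -> set[str]:
    """Return the known table names that appear as whole identifiers.

    This is a token-level check, not a SQL parser: a table name inside
    a string literal would still be counted.
    """
    # Oracle identifiers may contain letters, digits, _, $ and #.
    identifiers = set(re.findall(r"[a-z_][a-z0-9_$#]*", sql.lower()))
    return identifiers & tables


ok_sql = "SELECT COUNT(*) FROM travelers t JOIN watchlist w ON w.traveler_id = t.id"
odd_sql = "SELECT * FROM watchlist_archive"

print(sorted(referenced_tables(ok_sql, KNOWN_TABLES)))
print(sorted(referenced_tables(odd_sql, KNOWN_TABLES)))
# The substring test used in validate_sql accepts the second statement:
print(any(t in odd_sql.lower() for t in KNOWN_TABLES))
```

Whether the stricter behavior is desirable depends on the schema; if archive or suffixed tables are legitimate targets, they would need to be added to the allow-list.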


## Cell 7: Streaming Helper
Uses `ollama.chat(stream=True)` with official Qwen3 parameters for token-by-token output.

In [7]:
def clean_qwen3_output(text: str) -> str:
    """Clean up common qwen3 artifacts."""
    text = re.sub(r'<think>.*?</think>\s*', '', text, flags=re.DOTALL)
    text = re.sub(r'<think>(?:(?!</think>).)*$', '', text, flags=re.DOTALL)
    text = re.sub(r'^(A:\s*\n?)+', '', text)
    text = re.sub(r'(Okay,.*?(done|ready|complete|wrap it up|finalize|all set)[.\s]*)+$', '', text, flags=re.DOTALL)
    return text.strip()


def stream_llm_response(model: str, messages: list[dict], options: dict | None = None):
    """Generator yielding accumulated cleaned text token-by-token.
    
    NOTE: Ollama think=False is unreliable for qwen3.
    Workaround: System prompt embedded in user message + /no_think appended.
    """
    opts = dict(QWEN3_OPTIONS)
    if options:
        opts.update(options)
    
    accumulated = ""
    
    stream = ollama.chat(
        model=model,
        messages=messages,
        options=opts,
        stream=True,
    )
    
    for chunk in stream:
        token = chunk.get("message", {}).get("content", "")
        if token:
            accumulated += token
            cleaned = clean_qwen3_output(accumulated)
            if cleaned:
                yield cleaned
    
    yield clean_qwen3_output(accumulated)


print("stream_llm_response() defined.")



stream_llm_response() defined.
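
To see how accumulation and cleaning interact without an Ollama server, the loop can be exercised against a simulated stream. This is a sketch only: `fake_stream`, `strip_think`, and `stream_cleaned` are hypothetical stand-ins, and `strip_think` reproduces just the two `<think>` patterns from `clean_qwen3_output`.

```python
import re


def strip_think(text: str) -> str:
    """Remove closed <think> blocks and any still-open trailing one."""
    text = re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)
    text = re.sub(r"<think>(?:(?!</think>).)*$", "", text, flags=re.DOTALL)
    return text.strip()


def fake_stream(tokens):
    """Stand-in for a streaming chat call: yields chunk-shaped dicts."""
    for token in tokens:
        yield {"message": {"content": token}}


def stream_cleaned(chunks):
    """Yield the accumulated, cleaned text after each non-empty token."""
    accumulated = ""
    for chunk in chunks:
        token = chunk.get("message", {}).get("content", "")
        if not token:
            continue
        accumulated += token
        cleaned = strip_think(accumulated)
        if cleaned:  # nothing is shown while only <think> content exists
            yield cleaned


tokens = ["<think>", "plan", "</think>", "\n\n", "Hello", ", officer", "."]
updates = list(stream_cleaned(fake_stream(tokens)))
print(updates)
```

Because the whole accumulated string is re-cleaned on every token, an unfinished `<think>` block never reaches the UI. If a tag were split across chunks, the partial tag would be visible briefly until the remainder arrived.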


## Cell 8: Main Chat Handler

In [8]:
def chat_handler_streaming(message: str, history: list[dict]):
    """Smart routing + streaming. Ollama workarounds applied."""
    if not message.strip():
        yield "Please enter a question."
        return
    
    yield "🔍 **Analyzing your question...**"
    query_type = classify_query(message, history)
    
    # ════════════════════════════════════════
    # GENERAL PATH
    # ════════════════════════════════════════
    if query_type == "GENERAL":
        context = ""
        if history:
            for msg in history[-10:]:
                role = msg.get("role", "")
                content = msg.get("content", "")[:500]
                context += f"{role}: {content}\n"
        
        # Embed system prompt in user message (Ollama ignores system role for qwen3)
        combined = SYSTEM_PROMPT_GENERAL + "\n\n"
        if context:
            combined += f"Previous conversation:\n{context}\n\n"
        combined += f"Officer's message: {message}\n\nRespond now. /no_think"
        
        messages = [{"role": "user", "content": combined}]
        
        t0 = time.time()
        last_text = ""
        
        for partial_text in stream_llm_response(CHAT_MODEL, messages):
            last_text = partial_text
            yield partial_text
        
        latency = time.time() - t0
        
        footer = f"\n\n---\n"
        footer += f"<details>\n"
        footer += f"<summary>ℹ️ Response Info</summary>\n\n"
        footer += f"**Mode:** General conversation\n\n"
        footer += f"**Model:** {CHAT_MODEL}\n\n"
        footer += f"**Time:** {latency:.1f}s\n"
        footer += f"</details>"
        
        yield last_text + footer
        return
    
    # ════════════════════════════════════════
    # DATABASE PATH
    # ════════════════════════════════════════
    pipeline_start = time.time()
    
    yield "⏳ **Generating SQL...**"
    try:
        raw, sql, gen_time = generate_sql(message)
    except Exception as e:
        yield f"❌ SQL generation failed: {str(e)[:200]}"
        return
    
    yield f"✅ SQL generated ({gen_time:.1f}s)\n⏳ **Validating...**"
    is_valid, val_msg = validate_sql(sql)
    if not is_valid:
        yield f"⚠️ Query blocked: {val_msg}\n\nPlease rephrase your question."
        return
    
    yield f"✅ SQL generated ({gen_time:.1f}s)\n✅ Validated\n⏳ **Executing on Oracle...**"
    try:
        exec_success, df, exec_msg, exec_time = execute_sql(sql)
    except Exception as e:
        yield f"❌ Database error: {str(e)[:200]}"
        return
    if not exec_success:
        yield f"⚠️ Execution failed: {exec_msg}\n\nPlease rephrase."
        return
        return
    
    row_count = len(df) if df is not None else 0
    
    yield (
        f"✅ SQL generated ({gen_time:.1f}s)\n"
        f"✅ Validated\n"
        f"✅ Executed ({row_count} rows, {exec_time:.2f}s)\n\n"
        f"⏳ **Narrating results...**"
    )
    
    if df is None or df.empty:
        results_text = "(No results: 0 rows returned)"
    else:
        display_df = df.head(50)
        results_text = display_df.to_string(index=False)
        if len(df) > 50:
            results_text += f"\n\n... ({len(df)} total rows, showing first 50)"
    
    nar_prompt = NARRATION_PROMPT.replace("{question}", message).replace("{results}", results_text)
    nar_prompt += "\n/no_think"
    nar_messages = [{"role": "user", "content": nar_prompt}]
    
    nar_start = time.time()
    last_text = ""
    
    for partial_narration in stream_llm_response(NARRATION_MODEL, nar_messages):
        last_text = partial_narration
        yield partial_narration
    
    nar_time = time.time() - nar_start
    total_time = time.time() - pipeline_start
    
    footer = f"\n\n---\n"
    footer += f"<details>\n"
    footer += f"<summary>📊 Query Details (click to expand)</summary>\n\n"
    footer += f"**Mode:** Database query (NL2SQL)\n\n"
    footer += f"**Generated SQL:**\n```sql\n{sql}\n```\n\n"
    footer += f"**Execution:** {row_count} rows in {exec_time:.2f}s\n\n"
    footer += f"**Timings:** SQL Gen: {gen_time:.1f}s │ Exec: {exec_time:.2f}s │ Narration: {nar_time:.1f}s │ **Total: {total_time:.1f}s**\n"
    footer += f"</details>"
    
    yield last_text + footer


print("Chat handler defined.")



Chat handler defined.
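
The narration prompt is filled with two chained `.replace()` calls. If an officer's message happens to contain the literal text `{results}`, the second call substitutes the result table into the question as well. A single-pass substitution avoids this; the sketch below is one way to do it, with `fill_template` being a hypothetical helper rather than part of the notebook.

```python
import re


def fill_template(template: str, values: dict[str, str]) -> str:
    """Substitute {name} placeholders in a single pass.

    Unknown placeholders and any braces inside the inserted values are
    left untouched, because inserted text is never rescanned.
    """
    def lookup(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\{(\w+)\}", lookup, template)


template = "QUESTION: {question}\n\nRESULTS:\n{results}"
question = "What does {results} mean?"
results = "COUNT\n   42"

# Chained replace, as in the handler: the question is rewritten too.
chained = template.replace("{question}", question).replace("{results}", results)
# Single pass: the question keeps its literal braces.
single = fill_template(template, {"question": question, "results": results})

print(chained)
print("-----")
print(single)
```

The same helper would also work for `PROMPT_TEMPLATE` and `CLASSIFIER_PROMPT`, since both use the `{name}` placeholder style.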


## Cell 9: Custom CSS

In [9]:
CUSTOM_CSS = """
.gradio-container {
    max-width: 1000px !important;
    margin: auto !important;
}
details summary {
    cursor: pointer;
    font-weight: 600;
    opacity: 0.8;
    font-size: 0.85rem;
}
details summary:hover {
    opacity: 1.0;
}
"""

print("CSS defined.")

CSS defined.


## Cell 10: Build & Launch

In [10]:
EXAMPLE_QUERIES = [
    "How many travelers are in the system?",
    "List all off-loaded passengers at Islamabad Airport in 2025",
    "Which airlines have the highest off-loading rate?",
    "How many watchlist alerts are currently active?",
    "Top 10 most frequent travelers this year",
    "Compare off-loading rates across all airports",
    "What is IBMS?",
    "What does offloading mean in FIA context?",
]

app = gr.ChatInterface(
    fn=chat_handler_streaming,
    title="🔐 FIA IBMS Intelligence Assistant",
    description="Ask data questions (auto-queries the database) or general questions (answered directly).",
    examples=EXAMPLE_QUERIES,
    chatbot=gr.Chatbot(height=650),
)

app.launch(
    server_name="0.0.0.0",
    server_port=7861,
    share=False,
    show_error=True,
    theme=gr.themes.Soft(primary_hue="slate", neutral_hue="slate"),
    css=CUSTOM_CSS,
)

* Running on local URL:  http://0.0.0.0:7861
* To create a public link, set `share=True` in `launch()`.




In [11]:
app.close()

Closing server running on port: 7861
