            User Input
               ↓
            Policy (mutable, persisted)
               ↓
            Prompt Builder
               ↓
            LLM
               ↓
            Response
               ↓
            Human Feedback
               ↓
            Evaluator (LLM-as-judge)
               ↓
            Policy Update (runtime)




 * Policy controls behavior (verbosity, tone, structure)
 * Evaluator converts free-text feedback → structured policy changes
 * Prompt Builder reads policy on every run

## Define a Runtime Policy (mutable behavior)

In [1]:
# policy.py
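class ResponsePolicy:
    # Allowed values per field; update() rejects anything else.
    ALLOWED = {
        "verbosity": {"short", "medium", "detailed"},
        "tone": {"neutral", "friendly", "formal"},
    }

    def __init__(self):
        self.verbosity = "medium"
        self.tone = "neutral"

    def update(self, updates: dict):
        # Validate every change first so a bad update is never half-applied.
        for k, v in updates.items():
            if v not in self.ALLOWED.get(k, ()):
                raise ValueError(f"invalid policy update: {k}={v!r}")
        for k, v in updates.items():
            setattr(self, k, v)

    def __repr__(self):
        return f"Policy(verbosity={self.verbosity}, tone={self.tone})"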


In [2]:
rp = ResponsePolicy()

In [4]:
rp.update({"verbosity": "detailed", "tone": "friendly"})

 * This is NOT a prompt.
 * This is a control layer.
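
The diagram labels the policy "mutable, persisted", but the class above only lives in memory. Below is a minimal sketch of persistence, assuming a JSON file is an acceptable store. `save_policy`, `load_policy`, and the file name are hypothetical and not part of the original code; a small stand-in class keeps the block runnable on its own.

```python
import json
import os
import tempfile


class ResponsePolicy:
    """Stand-in mirroring the class above so this block runs on its own."""

    def __init__(self, verbosity="medium", tone="neutral"):
        self.verbosity = verbosity
        self.tone = tone


def save_policy(policy, path):
    """Write the policy fields to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"verbosity": policy.verbosity, "tone": policy.tone}, f)


def load_policy(path):
    """Rebuild a policy from disk, falling back to defaults if no file exists."""
    if not os.path.exists(path):
        return ResponsePolicy()
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return ResponsePolicy(**data)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "policy.json")
    save_policy(ResponsePolicy(verbosity="short", tone="friendly"), path)
    restored = load_policy(path)
    print(restored.verbosity, restored.tone)
```

Calling `save_policy` after every `update` would let the adapted behavior survive a process restart; whether a file, a database row, or a per-user key is the right store depends on the deployment.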

## Prompt Builder (policy → system instruction)

In [None]:
# prompt_builder.py
def build_system_prompt(policy):
    return f"""
You are an assistant.
Response style:
- Verbosity: {policy.verbosity}
- Tone: {policy.tone}

Follow these strictly.
"""


## Simulated LLM Call

In [5]:
# llm.py
def call_llm(system_prompt, user_input):
    # Stand-in for a real model call: echoes the system prompt and the query.
    return f"""
[{system_prompt.strip()}]

Answering user query:
{user_input}

(pretend this is an LLM response)
"""


## Runtime Evaluator (feedback → policy delta)

In [6]:
# evaluator.py
def evaluate_feedback(feedback: str):
    """
    Convert human feedback into structured policy updates
    """

    feedback = feedback.lower()

    updates = {}

    if "too long" in feedback or "verbose" in feedback:
        updates["verbosity"] = "short"

    if "too short" in feedback:
        updates["verbosity"] = "detailed"

    if "too formal" in feedback:
        updates["tone"] = "friendly"

    if "too casual" in feedback:
        updates["tone"] = "formal"

    return updates
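
The diagram calls the evaluator an LLM-as-judge, while the function above uses keyword rules. Below is a sketch of the judge variant, assuming the judge model is asked to reply with a JSON object of policy fields. `fake_judge` is a hard-coded stand-in for a real model call, and `parse_judge_output` and `ALLOWED` are hypothetical names. The part worth noting is the validation: model output is untrusted, so only known fields with known values should reach the policy.

```python
import json

ALLOWED = {
    "verbosity": {"short", "medium", "detailed"},
    "tone": {"neutral", "friendly", "formal"},
}


def fake_judge(feedback: str) -> str:
    """Stand-in for an LLM call; returns a canned JSON reply (assumption)."""
    # Includes an unknown field and an invalid value to exercise the filter.
    return '{"verbosity": "short", "tone": "sarcastic", "language": "fr"}'


def parse_judge_output(raw: str) -> dict:
    """Keep only known fields with allowed values; return {} on malformed JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    return {
        k: v for k, v in data.items()
        if k in ALLOWED and isinstance(v, str) and v in ALLOWED[k]
    }


judge_updates = parse_judge_output(fake_judge("This is too verbose"))
print(judge_updates)  # {'verbosity': 'short'}
```

Returning an empty dict on bad output means a failed judgment leaves the policy unchanged, which is usually a safer default than applying a partial or malformed update.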


## Runtime Loop (this is adaptation)

In [None]:
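# main.py
# Assumes each cell above is saved under the file name in its first comment.
# In a single notebook session the definitions already exist, so the imports can be dropped.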
from policy import ResponsePolicy
from prompt_builder import build_system_prompt
from llm import call_llm
from evaluator import evaluate_feedback

policy = ResponsePolicy()

# ---- Interaction 1 ----
print("Initial policy:", policy)

system_prompt = build_system_prompt(policy)
response = call_llm(system_prompt, "Explain what Reinforcement Learning is")
print(response)

# ---- Human feedback at runtime ----
feedback = "This is too verbose, just give key points"
print("\nHuman feedback:", feedback)

updates = evaluate_feedback(feedback)
policy.update(updates)

print("Updated policy:", policy)

# ---- Interaction 2 (no redeploy, no retrain) ----
system_prompt = build_system_prompt(policy)
response = call_llm(system_prompt, "Explain what Reinforcement Learning is")
print(response)
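
The two interactions above generalize to a loop: build the prompt from the current policy, answer, collect feedback, apply the resulting delta. Below is a compact sketch that uses a trimmed copy of the evaluator and a plain dict in place of `ResponsePolicy` so that it runs on its own. `run_session` and the sample turns are hypothetical.

```python
def evaluate_feedback(feedback):
    """Trimmed copy of the keyword evaluator above."""
    feedback = feedback.lower()
    updates = {}
    if "too long" in feedback or "verbose" in feedback:
        updates["verbosity"] = "short"
    if "too formal" in feedback:
        updates["tone"] = "friendly"
    return updates


def run_session(turns, policy=None):
    """Run (query, feedback) turns; return the policy snapshot used for each turn."""
    policy = dict(policy or {"verbosity": "medium", "tone": "neutral"})
    history = []
    for query, feedback in turns:
        # The prompt would be rebuilt from `policy` here, before calling the model.
        history.append(dict(policy))
        if feedback:
            policy.update(evaluate_feedback(feedback))
    return history, policy


history, final_policy = run_session([
    ("Explain what Reinforcement Learning is", "This is too verbose, just give key points"),
    ("Explain what Reinforcement Learning is", "Too formal for me"),
    ("Explain Q-learning", None),
])
for step, snapshot in enumerate(history, start=1):
    print(step, snapshot)
print("final:", final_policy)
```

Each snapshot records the policy in force when that turn's prompt would be built, so the effect of a piece of feedback shows up from the following turn onward.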
