## Objective

This notebook walks through, step by step, how an agentic AI system executes one feedback cycle.

Specifically, it shows:
- How the policy is defined
- How the policy is injected into prompts using `partial`
- How user input is applied later, at invocation time
- How feedback changes the policy
- How the updated policy changes the agent's behavior


In [1]:
from dotenv import load_dotenv
import os
# Load .env (safe to call even if another cell calls it)
load_dotenv()

# main_safe.py
from policy import ResponsePolicy
from prompt import build_prompt
from llm import llm, evaluator_llm
from langchain.prompts import ChatPromptTemplate
import json

In [2]:
# Step 1: Initialize Policy

policy = ResponsePolicy()
print("Initial policy:", policy)



Initial policy: Policy(verbosity=medium, tone=neutral)
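The source of `policy.py` is not shown in this notebook, so the following is only a minimal sketch of what `ResponsePolicy` might look like. The class body, the `ALLOWED` table, and the validation in `update` are assumptions. The default values and the printed form are taken from the output above, and the allowed values from the evaluator prompt used later in the notebook.

```python
from dataclasses import dataclass

# Hypothetical stand-in for policy.ResponsePolicy; the real class is not shown.
# Allowed values mirror the evaluator prompt used later in this notebook.
ALLOWED = {
    "verbosity": {"short", "medium", "detailed"},
    "tone": {"neutral", "friendly", "formal"},
}


@dataclass(repr=False)
class ResponsePolicy:
    verbosity: str = "medium"
    tone: str = "neutral"

    def update(self, delta: dict) -> None:
        """Apply known fields that carry an allowed value; ignore everything else."""
        for field, value in delta.items():
            if value in ALLOWED.get(field, ()):
                setattr(self, field, value)

    def __repr__(self) -> str:
        # Matches the printed form seen in the notebook output.
        return f"Policy(verbosity={self.verbosity}, tone={self.tone})"
```

Ignoring unknown fields and values, rather than raising, is a design choice made here so that a malformed delta from the evaluator leaves the policy unchanged.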


In [3]:
# Build the prompt. As the repr below shows, the policy fields are pre-filled as
# partial variables (ChatPromptTemplate.partial), leaving only {input} open.
prompt = build_prompt(policy)
prompt

ChatPromptTemplate(input_variables=['input'], input_types={}, partial_variables={'verbosity': 'medium', 'tone': 'neutral'}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['tone', 'verbosity'], input_types={}, partial_variables={}, template='You are a helpful assistant.\n\nResponse rules:\n- Verbosity: {verbosity}\n- Tone: {tone}\n\nFollow these strictly.\n'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
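The repr above shows `verbosity` and `tone` already bound as `partial_variables`, with `input` still open. The two-stage idea can be illustrated with the standard library alone. This is an analogy built on `functools.partial`, not LangChain's implementation, and `render` is a hypothetical helper; the template text is copied from the repr.

```python
from functools import partial

# System template text copied from the prompt repr shown above.
SYSTEM_TEMPLATE = (
    "You are a helpful assistant.\n\n"
    "Response rules:\n"
    "- Verbosity: {verbosity}\n"
    "- Tone: {tone}\n\n"
    "Follow these strictly.\n"
)


def render(verbosity: str, tone: str, user_input: str) -> list:
    """Hypothetical helper: fill the template and return (role, text) pairs."""
    system = SYSTEM_TEMPLATE.format(verbosity=verbosity, tone=tone)
    return [("system", system), ("human", user_input)]


# Stage 1: bind the policy fields now, before any user input exists.
render_with_policy = partial(render, verbosity="medium", tone="neutral")

# Stage 2: supply the user input later, at call time.
messages = render_with_policy(user_input="Explain Reinforcement Learning")
```

Binding the policy early means the calling code only ever passes the user's message; changing behavior is a matter of rebuilding the bound prompt from the updated policy.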

In [4]:
chain = prompt | llm
resp = chain.invoke("Explain Reinforcement Learning")

In [5]:
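# Placeholder response built from the policy alone; the next cell overwrites it
# with the real LLM output.
agent_text = f"[FALLBACK] Verbosity={policy.verbosity}, Tone={policy.tone} -- RL is learning from rewards."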
agent_text

'[FALLBACK] Verbosity=medium, Tone=neutral -- RL is learning from rewards.'

In [6]:
agent_text = resp.content
agent_text

"Reinforcement Learning (RL) is a subfield of machine learning where an agent learns to make decisions by interacting with an environment. The agent takes actions in the environment and receives feedback in the form of rewards or penalties, which it uses to learn optimal behaviors over time.\n\nThe key components of reinforcement learning include:\n\n1. **Agent**: The learner or decision maker that takes actions to achieve a goal.\n2. **Environment**: The external system with which the agent interacts. The environment responds to the agent's actions.\n3. **State**: A representation of the current situation in the environment, which the agent uses to decide on actions.\n4. **Action**: A decision made by the agent that affects the state of the environment.\n5. **Reward**: A numerical feedback signal received after taking an action, indicating the success of that action in achieving the desired goal.\n\nThe learning process typically involves exploring different actions to discover their 

In [13]:
feedback = "This is too long. I just want crisp bullet points."

In [20]:
from feedback_interpreter import interpret_feedback

In [22]:
interpret_feedback(feedback, llm)

{'overall_sentiment': 'negative',
 'dimensions': {'verbosity': 'decrease', 'tone': 'no_change'}}
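The interpreter returns directions (`decrease`, `no_change`) rather than concrete policy values. One way such a direction could be turned into a value is to step along an ordered list of levels, as sketched below. The helper `shift_level`, the ordering short < medium < detailed, and the `increase` direction are all assumptions; the notebook itself does not show this step and uses a separate evaluator chain instead.

```python
# Assumed ordering of verbosity levels, from least to most verbose.
VERBOSITY_LEVELS = ["short", "medium", "detailed"]

# "decrease" and "no_change" appear in the output above; "increase" is assumed.
STEPS = {"decrease": -1, "no_change": 0, "increase": 1}


def shift_level(current: str, direction: str, levels: list) -> str:
    """Move one step along `levels`, clamping at both ends.

    Unknown directions are treated as no change.
    """
    step = STEPS.get(direction, 0)
    index = levels.index(current) + step
    index = max(0, min(index, len(levels) - 1))
    return levels[index]


new_verbosity = shift_level("medium", "decrease", VERBOSITY_LEVELS)
```

Tone has no natural ordering, so a stepping scheme like this would only suit the verbosity dimension.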

In [14]:
evaluator_prompt = ChatPromptTemplate.from_messages([
    ("system",
     """You are a policy evaluator.
Return valid JSON with allowed fields:
- verbosity: short | medium | detailed
- tone: neutral | friendly | formal
Return ONLY JSON."""),
    ("human", "{feedback}")
])

evaluator_chain = evaluator_prompt | evaluator_llm

In [15]:
evaluator_chain

ChatPromptTemplate(input_variables=['feedback'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are a policy evaluator.\nReturn valid JSON with allowed fields:\n- verbosity: short | medium | detailed\n- tone: neutral | friendly | formal\nReturn ONLY JSON.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['feedback'], input_types={}, partial_variables={}, template='{feedback}'), additional_kwargs={})])
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x13679f5f0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x13679fce0>, root_client=<openai.OpenAI object at 0x135a65490>, root_async_client=<openai.AsyncOpenAI object at 0x13679f650>, model_name='gpt-4o-mini', model_kwargs={}, openai_api_key=SecretStr('**********'), max_tokens=150)

In [16]:
eval_resp = evaluator_chain.invoke({"feedback": feedback})
delta = json.loads(eval_resp.content)

In [13]:
delta

{'verbosity': 'short', 'tone': 'neutral'}
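Calling `json.loads` directly on the model output works here, but it raises if the model wraps the JSON in a Markdown code fence or returns a value outside the allowed set. Below is a defensive sketch, assuming a hypothetical helper `parse_delta`; the allowed values are copied from the evaluator prompt, and returning an empty dict on failure is a choice made for this example.

```python
import json

# Allowed values copied from the evaluator prompt above.
ALLOWED = {
    "verbosity": {"short", "medium", "detailed"},
    "tone": {"neutral", "friendly", "formal"},
}

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable


def parse_delta(raw: str) -> dict:
    """Hypothetical helper: parse evaluator output into a validated policy delta."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (possibly tagged "json") and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    # Keep only known fields whose value is one of the allowed strings.
    return {
        key: value
        for key, value in data.items()
        if isinstance(value, str) and value in ALLOWED.get(key, ())
    }
```

With a helper like this, `delta = parse_delta(eval_resp.content)` could stand in for the bare `json.loads` call, and an empty result would simply leave the policy as it was.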

In [17]:
print("Policy delta:", delta)
policy.update(delta)
print("Updated policy:", policy)

Policy delta: {'verbosity': 'short', 'tone': 'neutral'}
Updated policy: Policy(verbosity=short, tone=neutral)


In [18]:
# Next interaction (shows adapted behavior)
prompt = build_prompt(policy)
chain = prompt | llm
resp = chain.invoke("Explain Reinforcement Learning")

In [19]:
print(resp.content)

Reinforcement Learning (RL) is a machine learning paradigm where an agent learns to make decisions by taking actions in an environment to maximize cumulative rewards. The agent receives feedback in the form of rewards or penalties based on its actions and uses this feedback to improve its future decisions. Key components include:

1. **Agent**: The learner or decision maker.
2. **Environment**: The context in which the agent operates.
3. **Actions**: Choices made by the agent.
4. **States**: The current situation of the agent in the environment.
5. **Rewards**: Feedback received after taking actions, guiding learning.

The agent typically uses algorithms like Q-learning or policy gradients to learn optimal strategies over time.
