## Objective

This notebook walks step by step through the execution of an agentic AI system.

We will see explicitly:
- How the policy is defined
- How the policy is injected into the prompt using `partial`
- How the user input is applied later
- How feedback changes the policy
- How the updated policy changes behavior
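
The two-stage filling in the list above (policy first, user input later) can be sketched without any dependencies. This is only an illustration of the idea, not the notebook's `build_prompt`: it uses `string.Template.safe_substitute` from the standard library as a stand-in for LangChain's `ChatPromptTemplate.partial`, and both the template text and the `bind_policy` helper are hypothetical.

```python
from string import Template

# Hypothetical template text; the real prompt lives in prompt.py and is not shown here.
BASE = Template(
    "System: Answer with verbosity=$verbosity and tone=$tone.\n"
    "Human: $question"
)


def bind_policy(template: Template, verbosity: str, tone: str) -> Template:
    """Fill in the policy fields now and leave $question for later."""
    # safe_substitute leaves any placeholder it was not given untouched.
    return Template(template.safe_substitute(verbosity=verbosity, tone=tone))


# Stage 1: inject the policy.
bound = bind_policy(BASE, verbosity="medium", tone="neutral")

# Stage 2: apply the user input.
final_prompt = bound.substitute(question="Explain Reinforcement Learning")
print(final_prompt)
```

Because the policy is bound in a separate step, changing the policy only requires rebuilding the bound template; the code that supplies the question stays the same.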


In [3]:
import json
import os

from dotenv import load_dotenv

# Load .env (safe to call even if another cell calls it)
load_dotenv()

# main_safe.py
from policy import ResponsePolicy
from prompt import build_prompt
from llm import llm, evaluator_llm
from langchain.prompts import ChatPromptTemplate

In [4]:
# Step 1: Initialize Policy

policy = ResponsePolicy()
print("Initial policy:", policy)



Initial policy: Policy(verbosity=medium, tone=neutral)
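
The `policy` module is not shown in this notebook. A minimal sketch of what such a class could look like follows, assuming a plain dataclass with two fields and an `update` method that accepts a dict. The class name `ResponsePolicySketch` and the validation behavior are assumptions; only the printed format and the allowed values (from the evaluator prompt later in the notebook) come from the source.

```python
from dataclasses import dataclass

# Allowed values, copied from the evaluator prompt used later in the notebook.
ALLOWED = {
    "verbosity": {"short", "medium", "detailed"},
    "tone": {"neutral", "friendly", "formal"},
}


@dataclass
class ResponsePolicySketch:
    """Hypothetical stand-in for policy.ResponsePolicy."""

    verbosity: str = "medium"
    tone: str = "neutral"

    def update(self, delta: dict) -> None:
        """Apply recognised fields with allowed values; ignore everything else."""
        for field_name, value in delta.items():
            if field_name in ALLOWED and value in ALLOWED[field_name]:
                setattr(self, field_name, value)

    def __repr__(self) -> str:
        # Mirrors the format printed by the notebook.
        return f"Policy(verbosity={self.verbosity}, tone={self.tone})"


sketch = ResponsePolicySketch()
print("Initial policy:", sketch)

# An unknown tone is ignored, a valid verbosity is applied.
sketch.update({"verbosity": "short", "tone": "sarcastic"})
print("Updated policy:", sketch)
```

Ignoring invalid values rather than raising is a design choice made for this sketch; a stricter variant could raise `ValueError` instead.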


In [6]:
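# Build the prompt from the current policy. The policy fields are bound first
# (LangChain's ChatPromptTemplate.partial, used elsewhere); the user input is supplied at invoke time.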
prompt = build_prompt(policy)
chain = prompt | llm
resp = chain.invoke("Explain Reinforcement Learning")

In [None]:
agent_text = f"[FALLBACK] Verbosity={policy.verbosity}, Tone={policy.tone} -- RL is learning from rewards."
agent_text

'[FALLBACK] Verbosity=medium, Tone=neutral -- RL is learning from rewards.'
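
The cell above hard-codes a fallback string. One way such a fallback might be wired in is sketched below, assuming the intent is to keep the notebook runnable when the model call fails. The helper `answer_or_fallback` and the failing stub `broken_llm` are hypothetical and not part of the notebook's modules.

```python
from typing import Callable


def answer_or_fallback(ask: Callable[[str], str], question: str, fallback: str) -> str:
    """Return ask(question), or the fallback text if the call raises."""
    try:
        return ask(question)
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        print(f"LLM call failed ({exc!r}); using fallback text")
        return fallback


def broken_llm(question: str) -> str:
    # Stand-in for a model call that fails, e.g. because no API key is configured.
    raise RuntimeError("no API key")


fallback_text = "[FALLBACK] Verbosity=medium, Tone=neutral -- RL is learning from rewards."
text = answer_or_fallback(broken_llm, "Explain Reinforcement Learning", fallback_text)
print(text)
```

In the notebook, `ask` could be something like `lambda q: chain.invoke(q).content`, so that `agent_text` is always defined for the following cells.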

In [8]:
agent_text = resp.content
agent_text

'Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. The goal of the agent is to maximize a cumulative reward through a process of trial and error. Here are the key components and concepts involved in reinforcement learning:\n\n1. **Agent and Environment**: The agent is the learner or decision-maker, while the environment is everything the agent interacts with. The agent takes actions in the environment and receives feedback in the form of rewards.\n\n2. **States and Actions**: The environment can be in various states, and the agent can take different actions. A state represents a specific configuration of the environment, while an action is a choice made by the agent that affects the state.\n\n3. **Reward**: After taking an action and transitioning to a new state, the agent receives a reward, which is a numerical value. This reward serves as feedback for the agent to evaluate the effectiveness of its act

In [9]:
feedback = "This is too long. I just want crisp bullet points."

In [10]:
evaluator_prompt = ChatPromptTemplate.from_messages([
    ("system",
     """You are a policy evaluator.
Return valid JSON with allowed fields:
- verbosity: short | medium | detailed
- tone: neutral | friendly | formal
Return ONLY JSON."""),
    ("human", "{feedback}")
])

evaluator_chain = evaluator_prompt | evaluator_llm

In [11]:
eval_resp = evaluator_chain.invoke({"feedback": feedback})
delta = json.loads(eval_resp.content)

In [13]:
delta

{'verbosity': 'short', 'tone': 'neutral'}
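
Calling `json.loads` directly on the model output works when the model returns bare JSON, as it did here, but chat models sometimes wrap JSON in a Markdown fence or add surrounding text. A more defensive parse could look like the sketch below. The helper name `parse_policy_delta` is hypothetical; the allowed fields and values are taken from the evaluator prompt.

```python
import json
import re

# Allowed fields and values, copied from the evaluator prompt.
ALLOWED_VALUES = {
    "verbosity": {"short", "medium", "detailed"},
    "tone": {"neutral", "friendly", "formal"},
}


def parse_policy_delta(raw: str) -> dict:
    """Extract a JSON object from model output and keep only allowed fields and values."""
    # Take the span from the first "{" to the last "}" so fences or extra text are skipped.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return {}
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    return {
        key: value
        for key, value in data.items()
        if key in ALLOWED_VALUES
        and isinstance(value, str)
        and value in ALLOWED_VALUES[key]
    }


# Simulated model output wrapped in a Markdown fence, with one unexpected field.
fence = "`" * 3
raw_output = fence + 'json\n{"verbosity": "short", "tone": "neutral", "length": 3}\n' + fence
print(parse_policy_delta(raw_output))
```

Returning an empty dict on any parse failure means the policy is simply left unchanged when the evaluator output is unusable.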

In [14]:
print("Policy delta:", delta)
policy.update(delta)
print("Updated policy:", policy)

Policy delta: {'verbosity': 'short', 'tone': 'neutral'}
Updated policy: Policy(verbosity=short, tone=neutral)


In [15]:
# Next interaction (shows adapted behavior)
prompt = build_prompt(policy)
chain = prompt | llm
resp = chain.invoke("Explain Reinforcement Learning")

In [16]:
print(resp.content)

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative rewards. The agent receives feedback in the form of rewards or penalties based on its actions. It uses this feedback to improve its strategy over time, typically through trial and error. Key concepts include states, actions, rewards, and policies. RL is widely used in robotics, game playing, and autonomous systems.
