## Let's get our creative juices flowing!
> *How do we improve on the agent's CoT reasoning and tool calling capabilities?*

##### Trial thought process as an informed reader of "How to Win Friends and Influence People"
- Goal: use the book's principles to prepare for conversations, for example by anticipating likely questions.
- Query: *Harry wants {} from Dumbledore. He knows {}. Design a hypothetical scenario showing how he should communicate his wants using principles from the resource.*
- Follow the Introspective Agent's "Reflection AI agentic pattern" and redesign the workflow accordingly.
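
The query above is a template with two blanks. A minimal sketch of filling it programmatically is shown here; the field names and the sample values for the goal and context are hypothetical and only illustrate the shape of the final query string.

```python
# Hypothetical template mirroring the query above; field names are illustrative.
QUERY_TEMPLATE = (
    "{speaker} wants {goal} from {listener}. {speaker} knows {context}. "
    "Design a hypothetical scenario showing how {speaker} should communicate "
    "this want using principles from the resource."
)


def build_query(speaker: str, listener: str, goal: str, context: str) -> str:
    """Fill the template's placeholders to produce a concrete agent query."""
    return QUERY_TEMPLATE.format(
        speaker=speaker, listener=listener, goal=goal, context=context
    )


# Sample values are made up for illustration only.
query = build_query(
    speaker="Harry",
    listener="Dumbledore",
    goal="permission to visit Hogsmeade",
    context="that the castle is under tighter security",
)
print(query)
```

The resulting string can be passed as the `input` argument when the agent is run.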

##### Implementation of a simple reflective agent: an upgrade from the function-calling agent
> *Instead of heading straight to the stop event, pass the tool results to a reflective LLM that consolidates any additional information the answer needs (a corrective step).*

> *Try this! After the ToolCallEvent is handled, add a "reflection loop" that supplements the answer with a relevant example.*
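
Before wiring this into a real workflow, it can help to see the intended control flow in isolation. The sketch below is a plain-Python stand-in, not LlamaIndex code: `fake_llm`, `fake_tool`, and `fake_reflect` are hypothetical stubs, and the event names are recorded as strings only to show the order in which steps would fire.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    history: list[str] = field(default_factory=list)  # flattened chat history
    trace: list[str] = field(default_factory=list)    # order of events fired


def fake_llm(history: list[str]) -> dict:
    """Stub LLM: requests a tool once, then answers from the latest message."""
    if not any(msg.startswith("tool:") for msg in history):
        return {"tool_call": "Resource", "content": ""}
    return {"tool_call": None, "content": "final answer based on: " + history[-1]}


def fake_tool(name: str) -> str:
    """Stub tool: returns a canned result tagged as a tool message."""
    return f"tool:{name} returned a principle"


def fake_reflect(history: list[str]) -> str:
    """Stub judge: supplements the tool result with an example."""
    return "reflection: principle plus a context-specific example"


def run(query: str) -> tuple[str, list[str]]:
    state = AgentState(history=[f"user:{query}"])
    while True:
        state.trace.append("InputEvent")
        reply = fake_llm(state.history)
        if reply["tool_call"] is None:
            # No tool requested: the loop ends with the final answer.
            state.trace.append("StopEvent")
            return reply["content"], state.trace
        state.trace.append("ToolCallEvent")
        state.history.append(fake_tool(reply["tool_call"]))
        # Reflection runs after the tool call and before the LLM sees the history again.
        state.trace.append("ReflectEvent")
        state.history.append(fake_reflect(state.history))


answer, trace = run("How should Harry phrase a disagreement?")
print(" -> ".join(trace))
```

The reflection output is appended to the history, so the next LLM call answers with the supplemented context rather than the raw tool result alone.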

In [None]:
import os

os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX HOSTNAME"

# If you created your Phoenix Cloud instance before June 24th, 2025,
# you also need to set the API key as a header
#os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"

In [15]:
from phoenix.otel import register

# configure the Phoenix tracer
tracer_provider = register(
  project_name="facinating-things", # Default is 'default'
  auto_instrument=True, # See 'Trace all calls made to a library' below
)
tracer = tracer_provider.get_tracer(__name__)

DependencyConflict: requested: "openai-agents >= 0.1.0" but found: "None"


🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: facinating-things
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/s/fascinating-things/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {'authorization': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



In [17]:
from llama_index.core.llms import ChatMessage
from llama_index.core.tools import ToolSelection, ToolOutput
from llama_index.core.workflow import Event


class InputEvent(Event):
    input: list[ChatMessage]


class StreamEvent(Event):
    delta: str


class ToolCallEvent(Event):
    tool_calls: list[ToolSelection]


class FunctionOutputEvent(Event):
    output: ToolOutput

# Additional event: carries the chat history into the reflection step
class ReflectEvent(Event):
    chat_history: list[ChatMessage]

In [36]:
from typing import Any, List

from llama_index.core.llms.function_calling import FunctionCallingLLM
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.tools.types import BaseTool
from llama_index.core.workflow import (
    Context,
    Workflow,
    StartEvent,
    StopEvent,
    step,
)
from llama_index.llms.openai import OpenAI


class FunctionCallingAgent(Workflow):
    def __init__(
        self,
        *args: Any,
        llm: FunctionCallingLLM | None = None,
        tools: List[BaseTool] | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(*args, **kwargs)
        self.tools = tools or []

        self.llm = llm or OpenAI()
        assert self.llm.metadata.is_function_calling_model

        ## Define judge chat engine
        self.judge = SimpleChatEngine.from_defaults(llm=OpenAI(model="gpt-4o-mini"))

    @step
    async def prepare_chat_history(
        self, ctx: Context, ev: StartEvent
    ) -> InputEvent:
        # clear sources
        await ctx.store.set("sources", [])

        # check if memory is setup
        memory = await ctx.store.get("memory", default=None)
        if not memory:
            memory = ChatMemoryBuffer.from_defaults(llm=self.llm)

        # get user input
        user_input = ev.input
        user_msg = ChatMessage(role="user", content=user_input)
        memory.put(user_msg)

        # get chat history
        chat_history = memory.get()

        # update context
        await ctx.store.set("memory", memory)

        return InputEvent(input=chat_history)

    @step
    async def handle_llm_input(
        self, ctx: Context, ev: InputEvent
    ) -> ToolCallEvent | StopEvent:
        chat_history = ev.input

        # stream the response
        response_stream = await self.llm.astream_chat_with_tools(
            self.tools, chat_history=chat_history
        )
        async for response in response_stream:
            ctx.write_event_to_stream(StreamEvent(delta=response.delta or ""))

        # save the final response, which should have all content
        memory = await ctx.store.get("memory")
        memory.put(response.message)
        await ctx.store.set("memory", memory)

        # get tool calls
        tool_calls = self.llm.get_tool_calls_from_response(
            response, error_on_no_tool_call=False
        )

        if not tool_calls:
            sources = await ctx.store.get("sources", default=[])
            return StopEvent(
                result={"response": response, "sources": [*sources]}
            )
        else:
            return ToolCallEvent(tool_calls=tool_calls)

    @step
    async def handle_tool_calls(
        self, ctx: Context, ev: ToolCallEvent
    ) -> ReflectEvent:
        tool_calls = ev.tool_calls
        tools_by_name = {tool.metadata.get_name(): tool for tool in self.tools}

        tool_msgs = []
        sources = await ctx.store.get("sources", default=[])

        # call tools -- safely!
        for tool_call in tool_calls:
            tool = tools_by_name.get(tool_call.tool_name)
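            # Use the requested name so an unknown tool cannot raise
            # AttributeError before the "does not exist" branch runs
            additional_kwargs = {
                "tool_call_id": tool_call.tool_id,
                "name": tool_call.tool_name,
            }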
            if not tool:
                tool_msgs.append(
                    ChatMessage(
                        role="tool",
                        content=f"Tool {tool_call.tool_name} does not exist",
                        additional_kwargs=additional_kwargs,
                    )
                )
                continue

            try:
                tool_output = tool(**tool_call.tool_kwargs)
                sources.append(tool_output)
                tool_msgs.append(
                    ChatMessage(
                        role="tool",
                        content=tool_output.content,
                        additional_kwargs=additional_kwargs,
                    )
                )
            except Exception as e:
                tool_msgs.append(
                    ChatMessage(
                        role="tool",
                        content=f"Encountered error in tool call: {e}",
                        additional_kwargs=additional_kwargs,
                    )
                )

        # update memory
        memory = await ctx.store.get("memory")
        for msg in tool_msgs:
            memory.put(msg)

        await ctx.store.set("sources", sources)
        await ctx.store.set("memory", memory)

        chat_history = memory.get()
        return ReflectEvent(chat_history=chat_history)

    @step  # reflection step: the judge LLM decides whether an example is needed
    async def reflect(
        self, ctx: Context, ev: ReflectEvent
    ) -> InputEvent:
        chat_history = ev.chat_history
        response = await self.judge.achat(
            f"""
            Given the response to user query with tool calls: {chat_history}, determine if a context-specific example scenario is needed. 
            If yes, return an updated response with a context-specific example scenario. 
            If no, return the original response.
            """
        )
        # format response as chat message
        formatted_response = ChatMessage(role="assistant", content=str(response))

        # update memory
        memory = await ctx.store.get("memory")
        memory.put(formatted_response)
        await ctx.store.set("memory", memory)
        chat_history = memory.get()

        # Return chat history to InputEvent
        return InputEvent(input=chat_history)
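
The tool-calling step reports failures back to the LLM as tool messages instead of raising. A minimal sketch of that pattern with plain callables follows; `dispatch_tool_calls`, `lookup`, and `broken` are hypothetical names, and dictionaries stand in for `ToolSelection` and `ChatMessage`.

```python
from typing import Any, Callable


def dispatch_tool_calls(
    tool_calls: list[dict[str, Any]],
    tools_by_name: dict[str, Callable[..., str]],
) -> list[dict[str, str]]:
    """Run each requested tool, turning every failure into a message."""
    messages = []
    for call in tool_calls:
        name = call["tool_name"]
        tool = tools_by_name.get(name)
        if tool is None:
            # Unknown tool: report it using the requested name.
            messages.append({"name": name, "content": f"Tool {name} does not exist"})
            continue
        try:
            content = tool(**call["tool_kwargs"])
        except Exception as e:
            # Tool raised: pass the error text back instead of crashing.
            content = f"Encountered error in tool call: {e}"
        messages.append({"name": name, "content": content})
    return messages


def lookup(topic: str) -> str:
    return f"notes on {topic}"


def broken(**kwargs: Any) -> str:
    raise ValueError("boom")


results = dispatch_tool_calls(
    [
        {"tool_name": "lookup", "tool_kwargs": {"topic": "criticism"}},
        {"tool_name": "missing", "tool_kwargs": {}},
        {"tool_name": "broken", "tool_kwargs": {}},
    ],
    {"lookup": lookup, "broken": broken},
)
for message in results:
    print(message)
```

Every requested call yields exactly one message, so the LLM always receives a reply for each tool call it made.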


#### Draw workflow with pyvis

In [7]:
from llama_index.utils.workflow import draw_all_possible_flows
draw_all_possible_flows(FunctionCallingAgent, filename="functionagentv2_workflow.html")

<class 'NoneType'>
<class '__main__.ToolCallEvent'>
<class 'workflows.events.StopEvent'>
<class '__main__.ReflectEvent'>
<class '__main__.InputEvent'>
<class '__main__.InputEvent'>
functionagentv2_workflow.html


#### Run FunctionCallingAgent workflow with Query Engine Tool

In [8]:
# import dependencies
import qdrant_client
from IPython.display import Markdown, display
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.core import Settings
from llama_index.core.chat_engine import SimpleChatEngine



In [9]:
# load documents from the "../data/books" folder
docs = SimpleDirectoryReader("../data/books").load_data(show_progress=True)

# create Qdrant clients - sync & async
client = qdrant_client.QdrantClient(host='localhost', port=6333)
aclient = qdrant_client.AsyncQdrantClient(host="localhost", port=6333)

# Set LLM & embedding model
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.1, max_tokens=1024, streaming=True)
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")

# Initialize Qdrant vector store
vector_store = QdrantVectorStore(
    "interview_notes",
    client=client, 
    aclient = aclient,
    enable_hybrid = True,
    fastembed_sparse_model="Qdrant/bm25"
    )

# Create storage context container
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build vector store index -> Query Engine
index = VectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
    embed_model=Settings.embed_model
)

Loading files: 100%|██████████| 1/1 [00:02<00:00,  2.23s/it]


In [12]:
# build a hybrid query engine - retrieve 2 dense and 2 sparse results, then keep the top 3 after fusion
query_engine = index.as_query_engine(
    similarity_top_k=2, 
    sparse_top_k=2, 
    hybrid_top_k=3,
    vector_store_query_mode="hybrid", 
    llm=Settings.llm, 
    # use_async=True,
)
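
To make the 2 dense + 2 sparse to 3 hybrid filtering concrete, here is a simplified illustration of relative score fusion in plain Python. It is a sketch under assumptions, not the library's code: the `alpha` weight, the tie-break by chunk id, and the sample scores are all hypothetical, and the actual fusion depends on how the vector store is configured.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(
    dense: dict[str, float],
    sparse: dict[str, float],
    alpha: float = 0.5,  # assumed weight on the dense side
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Blend normalized scores per chunk and keep the top_k results."""
    d, s = min_max(dense), min_max(sparse)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in d.keys() | s.keys()
    }
    # Sort by score descending; break ties by id so the order is deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Made-up scores: 2 dense hits and 2 sparse hits, mirroring the settings above.
dense_hits = {"chunk_a": 0.82, "chunk_b": 0.74}
sparse_hits = {"chunk_c": 7.1, "chunk_d": 5.3}

top = fuse(dense_hits, sparse_hits)
print(top)
```

Four candidates go in and three come out, which is the role `hybrid_top_k` plays relative to `similarity_top_k` and `sparse_top_k`.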

In [37]:
tools = [
    QueryEngineTool.from_defaults(
        query_engine=query_engine,
        name="Resource",
        description=("Provides information on fundamental techniques in handling people. "
                     "Use a detailed plain text question as input to the tool."
        ),
    ),
]

book_agent = FunctionCallingAgent(
    tools=tools,
    llm = Settings.llm,
    timeout=120,
    verbose = False,
)

In [38]:
handler = book_agent.run(input="With principles from the given resource, how should Harry respond to Dumbledore if he found that the latter said something wrong?")

async for event in handler.stream_events():
    if isinstance(event, StreamEvent):
        print(event.delta, end="", flush=True)

response = await handler

Harry should approach the situation by expressing humility and openness. For example, if Dumbledore made a statement about the importance of following rules, Harry could respond by saying, "Well, I thought otherwise, but I may be wrong. I frequently am. If I am mistaken, I would like to understand better. Let’s examine the facts together." He might add, "I remember when we discussed the importance of standing up against unjust rules, like when we talked about the Triwizard Tournament. Could we look at that in light of what you just said?" This way, he invites a constructive dialogue without directly confronting Dumbledore, while also referencing a specific situation that illustrates his point.