# Talk to your LLM

### Resources

Based on Module 2 of the DeepLearning.AI course linked below (AI Agents in LangGraph)

* [A simple Python implementation of the ReAct pattern for LLMs](https://arc.net/l/quote/duflzttq)
  * Simon Willison Blog Article
* [Deep Learning AI Course, AI Agents with LangGraph](https://learn.deeplearning.ai/courses/ai-agents-in-langgraph/lesson/1/introduction)

Credits to examples here:
* [OpenAI Documents](https://platform.openai.com/docs/overview)

### Assumptions

* Python set up (I created a 3.11 venv)
* OpenAI key (Granite struggled, though ollama/granite q4 8b vs `gpt-4o` is perhaps not a fair comparison)

In [1]:
import openai
import re
import httpx
import os
import rich
import json
from openai import OpenAI
from agents import Agent, ModelSettings, function_tool, Runner
from rich.pretty import pprint


In [2]:
# Boilerplate for swapping in Granite via ollama
#model = "granite3-dense:8b"
#model = "granite3.1-dense:2b"
#client = OpenAI(
#     base_url='http://localhost:11434/v1',
#     api_key='ollama',
# ) 

from dotenv import load_dotenv
load_dotenv()

model = "gpt-4o"
client = OpenAI() 

In [3]:
# Quick test code - verify LLM connectivity etc. (disable via Raw)

chat_completion = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Write a simple Python example class called User"}],
    temperature=0,
)
print(model)
print(f"{chat_completion.choices[0].message.content}")

gpt-4o
Certainly! Below is a simple example of a Python class called `User`. This class includes basic attributes like `username` and `email`, and a method to display user information.

```python
class User:
    def __init__(self, username, email):
        self.username = username
        self.email = email

    def display_user_info(self):
        print(f"Username: {self.username}")
        print(f"Email: {self.email}")

# Example usage:
if __name__ == "__main__":
    user1 = User("john_doe", "john@example.com")
    user1.display_user_info()
```

### Explanation:
- **`__init__` Method**: This is the constructor method that initializes a new instance of the `User` class with a `username` and `email`.
- **Attributes**: `username` and `email` are instance attributes that store the user's information.
- **`display_user_info` Method**: This method prints the user's username and email to the console.
- **Example Usage**: The `if __name__ == "__main__":` block is used to demonstrate how to c

# Chain of Thought

A Chain of Thought (CoT) prompt is a prompting technique used with large language models (LLMs) where you explicitly guide the model to reason step-by-step before arriving at the final answer.

Instead of asking the model to directly give you an answer, a CoT prompt encourages it to "think aloud"—breaking down the problem, analyzing it in stages, and only then giving a conclusion. This helps improve accuracy, especially for complex reasoning, math, logic, or multi-hop questions.

CoT can be combined with few-shot prompting, i.e. giving several worked CoT examples before the actual question.

In [4]:
# Static example to demonstrate format (few-shot learning)
prompt = """
You are a thoughtful and logical assistant. For every question, you will:
- Think step-by-step under a “Thought” section.
- Then write the final result under “Answer”.
- Always follow the structure shown below.

Use this format:
Question: <the question>
Thought: <your detailed reasoning>
Answer: <final answer>

Here are some examples:

Question: If a train leaves at 2 PM and takes 3 hours to reach its destination, what time does it arrive?
Thought: The train departs at 2 PM. If it travels for 3 hours, it will arrive at 2 + 3 = 5 PM.
Answer: 5 PM

Question: What is the capital of the country whose official language is French and borders Germany?
Thought: France is a country that borders Germany and has French as its official language. The capital of France is Paris.
Answer: Paris

Question: What is the sum of the first three even numbers?
Thought: The first three even numbers are 2, 4, and 6. Their sum is 2 + 4 + 6 = 12.
Answer: 12

Now answer the next question using the same format:
"""

In [5]:
question = "what is elevation range for the area that the eastern sector of colorado orogeny extends into?"
#question = "If Tom has 5 cookies and eats 2, how many does he have left?"

In [6]:
messages=[
        {"role": "system", "content": prompt},
        {"role": "user", "content": question}
    ]

In [7]:
completion = client.chat.completions.create(
    model=model,
    messages=messages,
)

print(completion.choices[0].message.content)

Question: What is the elevation range for the area that the eastern sector of Colorado Orogeny extends into?

Thought: The Colorado Orogeny refers to a series of geological events that formed much of the Rocky Mountains and other surrounding mountainous regions. The eastern sector of this orogeny would extend into parts of the Great Plains, which border the Rocky Mountains to the east. Elevations in the eastern sectors, particularly moving into the Great Plains, would range approximately from 1,000 meters (3,280 feet) at the higher foothills down to about 500 meters (1,640 feet) as you move eastward toward the interior of the United States. It is important to note that specific ranges might vary based on precise geological delineations and current topographical measurements.

Answer: Approximately 500 meters (1,640 feet) to 1,000 meters (3,280 feet).
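
Because the reply follows a fixed `Question / Thought / Answer` layout, it can be post-processed with a few lines of code. The sketch below is one possible way to do that; `parse_cot_reply` is a hypothetical helper (not part of the OpenAI SDK), and it assumes each label starts at the beginning of a line, as in the examples above.

```python
import re


def parse_cot_reply(text: str) -> dict:
    """Split a 'Question / Thought / Answer' reply into its labelled sections."""
    # Each label starts a line; its body runs until the next label or the end of the text.
    pattern = re.compile(
        r"^(Question|Thought|Answer):\s*(.*?)(?=^(?:Question|Thought|Answer):|\Z)",
        re.MULTILINE | re.DOTALL,
    )
    return {label.lower(): body.strip() for label, body in pattern.findall(text)}


# Sample reply taken from the few-shot examples in the prompt above.
reply = """Question: What is the sum of the first three even numbers?

Thought: The first three even numbers are 2, 4, and 6. Their sum is 2 + 4 + 6 = 12.

Answer: 12"""

parsed = parse_cot_reply(reply)
print(parsed["answer"])
```

Keeping the reasoning and the final answer separate like this makes it easy to show only the answer to a user while logging the thought for inspection.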


# ReAct Style
1. Encourages explicit reasoning, not just end answers
1. Allows the model to interleave thoughts and actions
1. Great for use with tools or plugins (e.g., search, code exec, database)
1. Makes model behavior transparent and verifiable

## Template

1. Question: [user question]
1. Thought: [model's internal reasoning]
1. Action: [some action like Search(), Calculator(), API call]
1. Observation: [result of the action]

...repeat Thought → Action → Observation...

Answer: [final response to the question]
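
As a rough illustration of the template, the sketch below renders a list of Thought/Action/Observation steps into that text layout. The `Step` dataclass and `render_trace` function are hypothetical names chosen for this example, and the `Calculator(...)` action is only illustrative text, not a real tool call.

```python
from dataclasses import dataclass


@dataclass
class Step:
    thought: str
    action: str
    observation: str


def render_trace(question: str, steps: list, answer: str) -> str:
    """Render a ReAct trace in the Question/Thought/Action/Observation/Answer layout."""
    lines = [f"Question: {question}"]
    for step in steps:
        lines.append(f"Thought: {step.thought}")
        lines.append(f"Action: {step.action}")
        lines.append(f"Observation: {step.observation}")
    lines.append(f"Answer: {answer}")
    return "\n".join(lines)


trace = render_trace(
    "If a train leaves at 2 PM and takes 3 hours, what time does it arrive?",
    [Step("I need to add 3 hours to 2 PM.", "Calculator('2 + 3')", "5")],
    "5 PM",
)
print(trace)
```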

In [8]:
# Static example to demonstrate format (few-shot learning)
few_shot_example = (
    "Question: What is the capital of the country that borders Germany and has Vienna as its capital?\n"
    "Thought: I need to find which country has Vienna as its capital.\n"
    "Action:  Lookup('country with capital Vienna')\n"
    "Observation: Austria\n"
    "Thought: Now check if Austria borders Germany.\n"
    "Action:  Lookup('Does Austria border Germany?')\n"
    "Observation: Yes\n"    
    "Answer: The capital of Austria, which borders Germany, is Vienna."
    )


In [9]:
question = "what is elevation range for the area that the eastern sector of colorado orogeny extends into?"
#question = "If Tom has 5 cookies and eats 2, how many does he have left?"

In [10]:
# Combine few-shot example with dynamic question
prompt = f"""{few_shot_example}

Question: {question}
Thought:"""

Note:
- In the CoT example, the examples were placed in the system prompt.
- In this example, we pass the example data along with each question in the user message.

These are just two different styles to play with.
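
To make the difference between the two styles concrete, here is a small sketch that builds the `messages` list both ways. The function names `examples_in_system` and `examples_in_user` are hypothetical and exist only for this comparison.

```python
def examples_in_system(instructions: str, examples: str, question: str) -> list:
    """Style 1: the few-shot examples live in the system prompt."""
    return [
        {"role": "system", "content": f"{instructions}\n\n{examples}"},
        {"role": "user", "content": question},
    ]


def examples_in_user(instructions: str, examples: str, question: str) -> list:
    """Style 2: the few-shot examples travel with every user question."""
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": f"{examples}\n\nQuestion: {question}\nThought:"},
    ]


instructions = "Answer questions step by step."
examples = "Question: 1 + 1?\nThought: 1 + 1 = 2.\nAnswer: 2"
question = "2 + 2?"

style_1 = examples_in_system(instructions, examples, question)
style_2 = examples_in_user(instructions, examples, question)
print(style_1[1]["content"])
print(style_2[1]["content"])
```

With the first style the user message stays short; with the second, the examples can be swapped per question without touching the system prompt.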

In [11]:
messages=[
        {"role": "system", "content": "Answer questions using a ReAct format: Thought → Action → Observation → Answer."},
        {"role": "user", "content": prompt}
    ]

In [12]:
completion = client.chat.completions.create(
    model=model,
    messages=messages,
)

print(completion.choices[0].message.content)

The Colorado Orogeny refers to a mountain-building event, and the eastern sector typically extends into parts of the Great Plains. I need to determine the elevation range of these areas.

Action: Lookup('elevation range Great Plains')
Observation: The elevation of the Great Plains generally ranges from about 500 to 2,000 meters (1,640 to 6,560 feet) above sea level as they extend eastward from the Rocky Mountains.

Answer: The eastern sector of the Colorado Orogeny extends into the Great Plains, where the elevation ranges from approximately 500 to 2,000 meters (1,640 to 6,560 feet) above sea level.


# Chain of Thought (CoT) vs ReAct (Reasoning + Acting)

## Chain of Thought (CoT)
What it is:
Just reasoning — step-by-step thoughts leading to an answer.


|Strengths|Limitations|
|---|---|
|Good for pure reasoning tasks (math, logic, factual multi-step questions).|Doesn’t interact with external tools or sources.|
|Easy to implement.|Limited when answers need fresh data, search, or database queries.|
|Transparent: you can see the reasoning path.||


## ReAct (Reasoning + Acting)
What it is:
A prompt style that combines reasoning (like CoT) with actions, such as calling tools, web searches, or internal functions. Often used with agents.


|Strengths|Limitations|
|---|---|
|Perfect for agent workflows, e.g., answering based on tool output, RAG systems, browsing, calling APIs.|More complex to implement (you need tool handlers or agents).|
|Enables decision-making with dynamic data.|Harder to debug if the chain gets too long or recursive.|
|You can plug in your own tools, like databases, vector search, etc.||





# ReAct Agent Prompt with real actions and Agents

We said that the ReAct approach fits well when actions are involved. The examples so far only showed lookup actions written into the prompt; below, the actions become real function calls.

Once again, we set up a system prompt for ReAct.
Notice that it lists the available actions, which we did not do for CoT because actions were not applicable there.
It also gives an example session, as we did for CoT.

In [13]:
prompt = """
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run through one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.

Your available actions are:

get_provision_status:
e.g. get_provision_status: guid
returns the status of a cloud deployment such as a virtual machine when gived a guid (globally unique identifier)

log_error:
e.g log_error: status
When a guid has a provision_status ERROR call this with the return value of get_provision_status

log_status:
e.g log_status: status
When a guid does not have an ERROR status call this with the return value of get_provision_status

Example session:

Question: What is the staus of cloud deployment with guid <guid>
Thought: I should look up the status with get_provision_status 
Action: get_provision_status: guid 
PAUSE:

You will be called again with this:

Observation: Guid status "SUCCESS: Completed"

You then call any necessary logging tools before outputing the status:

Answer: Guid status "SUCCESS: Completed"
""".strip()

### PAUSE

`PAUSE` is a very important part of the prompt.
- After an Action (e.g., lookup, calculation, API call), the model stops (pauses) to wait for the result.
- Because it stops, it does not hallucinate the result.
- It needs real information from the environment (tools, web, database, code, etc.).
- Only after the result (Observation) comes back does it continue reasoning.
- Without the pause, the model would guess the observation, which can lead to wrong or invented answers.
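
In the earlier ReAct run the model wrote its own `Observation:` line instead of waiting for one. As an extra guard on the application side, the reply can be cut right after the first `Action:` line so that any invented observation is discarded. This is a minimal sketch; `cut_after_first_action` is a hypothetical helper, and the sample reply is made up for illustration.

```python
def cut_after_first_action(reply: str) -> str:
    """Keep the reply up to and including its first 'Action:' line."""
    kept = []
    for line in reply.splitlines():
        kept.append(line)
        if line.startswith("Action:"):
            break  # everything after the action must come from a real tool
    return "\n".join(kept)


reply = (
    "Thought: I need the elevation range of the Great Plains.\n"
    "Action: Lookup('elevation range Great Plains')\n"
    "Observation: (text invented by the model)\n"
    "Answer: (based on the invented observation)"
)

trimmed = cut_after_first_action(reply)
print(trimmed)
```

A reply without any `Action:` line is returned unchanged, so a final `Answer:` passes through. The Chat Completions API also accepts a `stop` argument, so passing a stop sequence such as `"Observation:"` may be another way to get a similar effect; that variant is not tested here.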


Below is a simple agent definition.
Note the roles it uses: system, user, assistant.

In the OpenAI SDK, when you're calling models like gpt-4, gpt-4o, or gpt-3.5-turbo using the chat/completions endpoint, the roles used are:

|Role | Purpose|
|---|---|
|system | Sets the behavior, identity, style, or rules of the assistant. (e.g., "You are a helpful assistant.")|
|user | Represents input/questions from the human user.|
|assistant | Represents the model's previous responses.|


- system: Optional, but powerful. Sets context at the start. You usually have one.
- user: Each time a person talks, you add a user message.
- assistant: Each time the model replies, its response is logged as an assistant message.
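
The sketch below shows how the message history grows across turns using these three roles. It runs without an API key: `fake_llm` is a hypothetical stand-in for the model that simply reports how many messages it received.

```python
def fake_llm(messages: list) -> str:
    """Hypothetical stand-in for the model: reports how many messages it was sent."""
    return f"(reply after seeing {len(messages)} messages)"


# One system message at the start sets the context.
messages = [{"role": "system", "content": "You are a helpful assistant."}]

for user_text in ["Hello", "What did I just say?"]:
    # Each human turn is appended as a user message...
    messages.append({"role": "user", "content": user_text})
    reply = fake_llm(messages)
    # ...and each model reply is appended as an assistant message.
    messages.append({"role": "assistant", "content": reply})

for m in messages:
    print(f"{m['role']:>9}: {m['content']}")
```

The full list is sent on every call, which is how the model "remembers" earlier turns.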




In [14]:
class Agent:
    def __init__(self, system=""):
        self.system = system
        self.messages = []
        if self.system:
            self.messages.append({"role": "system", "content": system})

    def __call__(self, message):
        self.messages.append({"role": "user", "content": message})
        result = self.execute()
        self.messages.append({"role": "assistant", "content": result})
        return result

    def execute(self):
        completion = client.chat.completions.create(
            model=model,
            temperature=0,
            messages=self.messages)
        return completion.choices[0].message.content

Next, we define the actions

In [15]:
import random

def get_provision_status(guid):
    """
    First of the *fake* functions to test whether the LLM/prompt will ReAct correctly,
    taking different paths on different results.
    """
    # A real implementation would call out here, e.g. to the MCP Server AAP2 Controller
    status_messages = [
        "INFO: Initializing",
        "INFO: In progress",
        "ERROR: Failed",
        "ERROR: API Timeout",
        "ERROR: Rate Limited",
        "WARNING: Minor errors",
        "SUCCESS: Completed"
    ]
    # Pick a random status so repeated runs exercise different branches
    status = random.choice(status_messages)
    return f"{guid} status: {status}"
    

In [16]:
def log_error(status):
    print(f"{status} Logged stateus to Slack.")
    print(f"{status} Opened Jira Ticket with Status.")
    return 0

def log_status(status):
    print(f"{status} Logged status to Slack.")
    return 0

In [17]:
known_actions = {
   "log_status": log_status,
   "log_error": log_error,
   "get_provision_status": get_provision_status,
}

Now we create the agent, which sets up the system prompt

In [18]:
abot = Agent(prompt)
# Run the cell below with the next line commented out,
# then run it again with the line uncommented.
# See where the flow stops?

#abot("I have a deployments running with guid: 1adr4 what is its status")

In [19]:
for i, message in enumerate(abot.messages):
    if message["role"] != "assistant":
        print(f"Step {i}: App -> LLM:\n")
    else:
        print(f"Step {i}: App <- LLM:\n")
    print(f"Role: {message['role']}\nContent:\n\n{message['content']}\n\n\n")

Step 0: App -> LLM:

Role: system
Content:

You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run through one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.

Your available actions are:

get_provision_status:
e.g. get_provision_status: guid
returns the status of a cloud deployment such as a virtual machine when gived a guid (globally unique identifier)

log_error:
e.g log_error: status
When a guid has a provision_status ERROR call this with the return value of get_provision_status

log_status:
e.g log_status: status
When a guid does not have an ERROR status call this with the return value of get_provision_status

Example session:

Question: What is the staus of cloud deployment with guid <guid>
Thought: I should look up the status with get_provision_status 
Action: get_provision_sta

## Now Let's automate all this

In [20]:
# Raw string so that \w is passed to the regex engine without an invalid-escape warning
action_re = re.compile(r"^Action: (\w+): (.*)$")
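
To see what this pattern captures, the sketch below runs it against a few lines of the kind the model produces. The sample lines are illustrative. Note that a trailing space after the guid is captured as part of the input, which would explain the double space in an observation like `1adr4  status: ...`; calling `.strip()` on the captured input is one way to avoid that.

```python
import re

action_re = re.compile(r"^Action: (\w+): (.*)$")

samples = [
    "Action: get_provision_status: 1adr4 ",    # trailing space is captured
    "Action: log_error: ERROR: Rate Limited",  # only the first ': ' splits name from input
    "PAUSE",
    "Thought: I should look up the status.",
]

for line in samples:
    match = action_re.match(line)
    if match:
        name, arg = match.groups()
        print(f"{line!r} -> action={name!r}, input={arg.strip()!r}")
    else:
        print(f"{line!r} -> no action")
```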


In [21]:
def query(question, max_turns=5):
    """Run the ReAct loop: ask the model, run any requested action, feed back the observation."""
    i = 0
    bot = Agent(prompt)
    next_prompt = question
    print("Step 0")
    while i < max_turns:
        i += 1
        result = bot(next_prompt)
        print(result)
        # Collect every line of the reply that looks like "Action: <name>: <input>"
        actions = [
            match
            for line in result.split('\n')
            if (match := action_re.match(line))
        ]
        if actions:
            print(f"\nStep {i}")
            # Only the first requested action is executed on each turn
            action, action_inputs = actions[0].groups()
            if action not in known_actions:
                raise Exception(f"Unknown action: {action}: -- running {action} {action_inputs}")
            observation = known_actions[action](action_inputs)
            print(f"Observation: {observation}")
            next_prompt = f"Observation: {observation}"
        else:
            # No action requested: the model has given its final Answer
            return
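
The loop can also be exercised without an API key, which helps when debugging the parsing and dispatch logic. The following is a sketch under stated assumptions: `fake_bot` and `scripted_replies` are hypothetical stand-ins for the model, the actions are deterministic stubs, and unlike `query` this `run` function strips the action input and returns the final reply.

```python
import re

action_re = re.compile(r"^Action: (\w+): (.*)$")

# Hypothetical canned replies standing in for the model, one per turn.
scripted_replies = iter([
    "Thought: I should look up the status.\nAction: get_provision_status: 1adr4\nPAUSE",
    "Thought: The status is an error, so I should log it.\nAction: log_error: ERROR: Failed\nPAUSE",
    'Answer: Guid status "ERROR: Failed"',
])


def fake_bot(_prompt: str) -> str:
    return next(scripted_replies)


calls = []  # records which actions were run, in order


def get_provision_status(guid):
    calls.append(("get_provision_status", guid))
    return f"{guid} status: ERROR: Failed"  # deterministic stub


def log_error(status):
    calls.append(("log_error", status))
    return 0


known_actions = {"get_provision_status": get_provision_status, "log_error": log_error}


def run(question: str, max_turns: int = 5):
    next_prompt = question
    for _ in range(max_turns):
        result = fake_bot(next_prompt)
        matches = [m for line in result.split("\n") if (m := action_re.match(line))]
        if not matches:
            return result  # no action requested: this is the final answer
        action, action_input = matches[0].groups()
        observation = known_actions[action](action_input.strip())
        next_prompt = f"Observation: {observation}"
    return None  # ran out of turns before an answer arrived


final = run("What is the status of guid 1adr4?")
print(final)
print(calls)
```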
        

In [22]:
question = """
I have a deployments running with guid: 1adr4 what is its status
"""

query(question)

Step 0
Thought: I should look up the status of the deployment using the provided guid.
Action: get_provision_status: 1adr4 
PAUSE.

Step 1
Observation: 1adr4  status: ERROR: Rate Limited
Thought: Since the status of the deployment is "ERROR: Rate Limited", I should log this error.
Action: log_error: ERROR: Rate Limited
PAUSE.

Step 2
ERROR: Rate Limited Logged stateus to Slack.
ERROR: Rate Limited Opened Jira Ticket with Status.
Observation: 0
Answer: The status of the deployment with guid 1adr4 is "ERROR: Rate Limited".


In [23]:

question = """
I have deployments running with guids: 1adr4, aabf5, 45663, and 45ghb
First get each provision status and log to any services that need to know
Once finished with all deployments output their guids, status, and logging services:

* In a simple table
* As JSON 
"""

query(question, max_turns=10)

Step 0
Thought: I will start by checking the provision status of each deployment using their respective guids. I will then log the status accordingly and finally output the results in both a table and JSON format.

Action: get_provision_status: 1adr4
PAUSE

Step 1
Observation: 1adr4 status: INFO: In progress
Action: log_status: INFO: In progress
PAUSE

Step 2
INFO: In progress Logged status to Slack.
Observation: 0
Action: get_provision_status: aabf5
PAUSE

Step 3
PAUSE

Step 4
Observation: 0
Action: get_provision_status: 45663
PAUSE

Step 5
Observation: 45663 status: INFO: In progress
Action: log_status: INFO: In progress
PAUSE

Step 6
INFO: In progress Logged status to Slack.
Observation: 0
Action: get_provision_status: 45ghb
PAUSE

Step 7
Observation: 45ghb status: INFO: In progress
Action: log_status: INFO: In progress
PAUSE

Step 8
INFO: In progress Logged status to Slack.
Observation: 0
Answer:

Here is the status of each deployment in a table format:

| GUID  | Status           

# Reasoning
Capabilities are improving every day. The o3-mini model accepts a `reasoning_effort` parameter and can reason without the elaborate prompt scaffolding used above.

In [24]:
model = "o3-mini"
prompt = """
Write a bash script that takes a matrix represented as a string with 
format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.
"""

response = client.chat.completions.create(
    model = model,
    reasoning_effort="medium",
    messages=[
        {
            "role": "user", 
            "content": prompt
        }
    ]
)

pprint(response)
print('-----------------')
print(response.choices[0].message.content)

-----------------
Below is one acceptable solution. Save this script (for example as transpose.sh), then run it with the matrix string as an argument.

#!/bin/bash
# This script takes a matrix represented as a string (e.g. "[1,2],[3,4],[5,6]")
# and prints its transpose in the same format.
#
# Usage: ./transpose.sh "[1,2],[3,4],[5,6]"

if [ "$#" -ne 1 ]; then
  echo "Usage: $0 \"[1,2],[3,4],[5,6]\""
  exit 1
fi

# Get the input string.
matrix_string="$1"

# Remove the outermost square brackets from the first and last row.
matrix_string="${matrix_string#\[}"
matrix_string="${matrix_string%\]}"

# The rows are separated by '],[' so we replace that with a semicolon for easier splitting.
matrix_string="${matrix_string//],[/;}"

# Split the rows into an array.
IFS=';' read -r -a rows <<< "$matrix_string"

num_rows=${#rows[@]}

# Assume that the first row gives the number of columns.
IFS=',' read -r -a first_row <<< "${rows[0]}"
num_cols=${#first_row[@]}

# We’ll use a 2D array (associative 