# Developing with OpenAI: AIM Edition

## Exploring LLM Prompting Strategies for Economic Reasoning  
### *Inflation & Interest Rate Case Study*

This notebook investigates how different prompting strategies (zero-shot, few-shot, reasoning vs non-reasoning models) affect the ability of large language models (LLMs) to reason about inflation, interest rates, and overall market dynamics.  

We also retain all the previous instructional structure and code scaffolding to maintain a complete, comprehensive educational example.

## 1. Getting Started

The first thing we'll do is load the [OpenAI Python Library](https://github.com/openai/openai-python/tree/main)!

In [1]:
# Used for Google Colab
#!pip install openai -q


## Discussion and Problem Framing

We aim to answer:  
> *"What is the best prompting approach and model type to understand how the market is performing today?"*  

### Types of LLM Tasks Involved

| Type | Description | Example Output |
|------|--------------|----------------|
| **Retrieval** | Factual recall | “Inflation in 2025 is around 3.1% in the U.S.” |
| **Reasoning** | Logical chain between variables | “Higher inflation led the Fed to raise rates → borrowing costs rose → slower GDP.” |
| **Generation** | Narrative creation / summary | “The market shows cooling signals despite moderate inflation…” |

Each prompt and model will be evaluated on reasoning depth, factual correctness, and structure quality.


### Models used in this repo

| Rank | Model Name | Primary Purpose | OpenAI's Official Claim |
|------|------------|-----------------|------------------------|
| 1 | **GPT-5** | Advanced reasoning for complex economic analysis | Uses a dynamic router that chooses between quick responses and deeper 'thinking' when needed; performs at PhD-level across domains |
| 2 | **GPT-4.1** | Enhanced coding and long-context comprehension | Offers significant advancements in coding capabilities, long context comprehension (up to 1M tokens), and instruction following |
| 3 | **GPT-4-turbo** | General-purpose non-reasoning model for structured responses | Improved version of GPT-4 with enhanced performance, lower latency, and updated knowledge cutoff |
| 4 | **GPT-4o-mini** | Fast, efficient model for quick responses | Cost-efficient AI model designed to make advanced AI technology more affordable and accessible |


## 2. Setting Environment Variables

Because we'll frequently call endpoints and APIs hosted by others, we'll often need to handle "secrets" such as API keys.

We'll use the following pattern throughout this bootcamp - but you can use whichever method you're most familiar with.

In [2]:
# For Google Colab
# import os
# import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key")

In [3]:
# For local development
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
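
A quick way to confirm that the key was picked up, without echoing the secret into the notebook output, is sketched below. The helper name `describe_api_key` is hypothetical and not part of the bootcamp code; it only reads the environment variable set above.

```python
import os

def describe_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Report whether an environment variable is set, without revealing it."""
    value = os.getenv(var_name)
    if not value:
        return f"{var_name} is not set"
    # Show only the length, never the secret itself.
    return f"{var_name} is set ({len(value)} characters)"

print(describe_api_key())
```

If this reports that the variable is not set, check that the `.env` file sits in the working directory and contains an `OPENAI_API_KEY=...` line.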


## 3. Using the OpenAI Python Library

Let's jump right into it!

> NOTE: You can, and should, reference OpenAI's [documentation](https://platform.openai.com/docs/api-reference/authentication?lang=python) whenever you get stuck, have questions, or want to dive deeper.

### Creating a Client

The core feature of the OpenAI Python Library is the `OpenAI()` client. It's how we're going to interact with OpenAI's models, and it's under the hood of a lot of what we'll touch on throughout this course.

> NOTE: We could manually provide our API key here, but we're going to instead rely on the fact that we put our API key into the `OPENAI_API_KEY` environment variable!

In [4]:
from openai import OpenAI

client = OpenAI()

### Using the Client

Now that we have our client - we're going to use the `.chat.completions.create` method to interact with the model.

There are a few things we'll get out of the way first, starting with the idea of "roles".

First it's important to understand the object that we're going to use to interact with the endpoint. It expects us to send an array of objects of the following format:

```python
{"role" : "ROLE", "content" : "YOUR CONTENT HERE", "name" : "THIS IS OPTIONAL"}
```

Second, there are three "roles" available to use to populate the `"role"` key:

- `system`
- `assistant`
- `user`

OpenAI provides some context for these roles [here](https://help.openai.com/en/articles/7042661-moving-from-completions-to-chat-completions-in-the-openai-api).

We'll explore these roles in more depth as they come up - but for now we're going to just stick with the basic role `user`. The `user` role is, as it would seem, the user!
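
To make the message format concrete, here is a minimal sketch that builds a message list and checks it against the three roles listed above. `validate_messages` and `ALLOWED_ROLES` are hypothetical names introduced for illustration. The API itself accepts more than this check allows (other roles and non-string content exist), so treat it as a teaching aid rather than a faithful model of the endpoint.

```python
# Roles covered in this section; an assumption of the sketch, not an API limit.
ALLOWED_ROLES = {"system", "assistant", "user"}

def validate_messages(messages: list[dict]) -> list[dict]:
    """Raise ValueError if a message is malformed; otherwise return the list."""
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    return messages

messages = validate_messages([
    {"role": "system", "content": "You are a concise economic analyst."},
    {"role": "user", "content": "Hello!"},
])
print([m["role"] for m in messages])
```

A list that passes this check has the shape expected by the `messages` argument of `client.chat.completions.create`.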

Third, it expects us to specify a model!

We'll use the `gpt-5-mini` model for this first example.

Let's look at an example!



In [5]:
response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

Let's look at the response object.

In [6]:
response

ChatCompletion(id='chatcmpl-Ca8NKpezZMKSlyKLzKhCsTPgbUrqy', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hi there! How can I help you today? (You can ask a question, request help with writing or code, get a summary, or anything else.)', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1762728038, model='gpt-5-mini-2025-08-07', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=169, prompt_tokens=8, total_tokens=177, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=128, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

In [7]:
print(response.choices[0].message.content)

Hi there! How can I help you today? (You can ask a question, request help with writing or code, get a summary, or anything else.)


>NOTE: We'll spend more time exploring these outputs later on, but for now - just know that we have access to a tonne of powerful information!
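
As a small taste of that information, the sketch below summarizes the `usage` block of a response. It uses a stand-in object filled with the numbers from the recorded output above so that it runs without an API call; `summarize_usage` is a hypothetical helper. It assumes that reasoning tokens are counted inside `completion_tokens`, which is consistent with the recorded response (169 completion tokens for a two-sentence reply).

```python
from types import SimpleNamespace

def summarize_usage(usage) -> dict:
    """Split completion tokens into hidden reasoning and visible output."""
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", 0) or 0
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "reasoning_tokens": reasoning,
        "visible_completion_tokens": usage.completion_tokens - reasoning,
    }

# Stand-in for `response.usage`, using the values from the recorded output.
recorded_usage = SimpleNamespace(
    prompt_tokens=8,
    completion_tokens=169,
    total_tokens=177,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=128),
)
print(summarize_usage(recorded_usage))
```

With a live response, the same helper can be called as `summarize_usage(response.usage)`.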

### System Role

Now we can extend our prompts to include a system prompt.

A system prompt steers the behaviour of the LLM without being something the model responds to directly. Let's see it in action!

In the newest OpenAI API, the **system message** still defines the model’s behavior.  
Sometimes it is referred to as an *instruction block*.

Example system prompt for our economics case:

In [8]:
system_prompt = """
You are an experienced economic analyst explaining how inflation and interest rates interact.   
Use 2025 U.S. market context when relevant.
Your answer should not exceed 5 sentences. 
"""
print(system_prompt)

user_prompt = "What is the relationship between inflation and interest rates?"
print(user_prompt)

list_of_prompts = [

    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

irate_response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=list_of_prompts
)

print(irate_response.choices[0].message.content)


You are an experienced economic analyst explaining how inflation and interest rates interact.   
Use 2025 U.S. market context when relevant.
Your answer should not exceed 5 sentences. 

What is the relationship between inflation and interest rates?
Inflation and interest rates are closely related economic variables where the central bank, such as the Federal Reserve in the United States, often adjusts interest rates to control inflation. Generally, when inflation is high, central banks may increase interest rates to cool down the economy by making borrowing more expensive, which can reduce spending and slow the rate of inflation. Conversely, if inflation is low, the central bank might lower interest rates to encourage borrowing and spending, thereby stimulating economic activity. As of 2025, if the U.S. experiences higher inflation rates, it is likely that the Federal Reserve would implement higher interest rates as a countermeasure to stabilize the purchasing power of the dollar and 

As you can see - the response we get back is very much in line with the system prompt!

Let's try the same user prompt, but with a different system prompt to see the difference.

In [9]:
system_prompt = """
You are a cool and fun elementary teacher explaining to 6-year olds how inflation and interest rates interact.   
Use 2025 U.S. market context when relevant.
Your answer should not exceed 5 sentences.
"""
print(system_prompt)

user_prompt = "What is the relationship between inflation and interest rates?"
print(user_prompt)

list_of_prompts = [

    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

irate_response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=list_of_prompts
)

print(irate_response.choices[0].message.content)


You are a cool and fun elementary teacher explaining to 6-year olds how inflation and interest rates interact.   
Use 2025 U.S. market context when relevant.
Your answer should not exceed 5 sentences.

What is the relationship between inflation and interest rates?
Imagine you have a piggy bank where you save money to buy toys. If things at the store start to cost more than before - like your favorite toy car - it's like inflation, where prices go up. Now, the people who decide on how much money everyone should be able to spend try to control this by changing interest rates, which is like deciding how much it costs to borrow money from your friend. If they think things are getting too expensive, they might make borrowing money cost more, hoping that fewer people will buy stuff and prices won't rise too much. So, if in 2025 things are getting more expensive, these people might raise interest rates to help keep everything affordable for everyone.


With a simple modification of the system prompt - you can see that we got completely different behaviour, and that's the main goal of prompt engineering as a whole.

Also, congrats, you just engineered your first prompt!

### Few-shot Prompting

Now that we have a basic handle on the `system` role and the `user` role - let's examine what we might use the `assistant` role for.

The most common usage pattern is to "pretend" that we're answering our own questions. This helps us further guide the model toward our desired behaviour. While this is an oversimplification, it's conceptually well aligned with few-shot learning.

In [10]:
# Zero-shot prompt
prompt_zero = "Explain how inflation affects interest rate decisions."
list_of_prompts = [
    {"role": "user", "content": prompt_zero}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=list_of_prompts
)

print('zero-shot response:', response.choices[0].message.content)

zero-shot response: Inflation significantly influences interest rate decisions made by central banks and financial institutions. Here’s how it works:

1. **Inflation Expectations**: When inflation is expected to rise, central banks may increase interest rates to curtail spending and borrowing. Higher rates make loans more expensive and encourage saving, which can help control inflation by reducing demand in the economy.

2. **Cost of Borrowing**: If inflation is high, lenders demand higher interest rates to compensate for the declining purchasing power of money over time. This is because the real interest rate (nominal interest rate minus inflation) affects the returns on loans. If inflation is higher than the nominal rate, lenders effectively lose money.

3. **Central Bank Mandates**: Many central banks have an inflation target (usually around 2% for many developed economies). When inflation rises above this target, the central bank may adjust interest rates upwards to restore price

In [11]:
# Few-shot prompt template

question = "Explain how inflation affects interest rate decisions."

few_shot_prompt = f"""
Example 1:
Q: The price of pizza slices jumps from $2 to $4. What might the central bank do?
A: They turn down the oven heat 🍕🔥 - raise interest rates so people buy fewer slices and cool off the price party.

Example 2:
Q: Interest rates drop and borrowing gets cheaper. What happens at Snack City?
A: Everyone's grabbing extra fries and milkshakes 🍟🥤 - cheap credit means more spending, which can make prices rise again.

Now answer:
Q: {question}
"""

list_of_prompts = [
    {"role": "user", "content": few_shot_prompt}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=list_of_prompts
)

print('few-shot response:', response.choices[0].message.content)

few-shot response: A: When inflation rises, the central bank tightens the purse strings 💸✋ - they often raise interest rates to curb spending and cool off the economy. Higher rates mean loans cost more, making people think twice before splurging, which helps slow down the inflation spiral. Conversely, if inflation is low, they might lower rates to encourage borrowing and spending 🍥💵, revving up the economy.


### Helper functions

We're going to create some helper functions to aid in using the OpenAI API - just to make our lives a bit easier.

> NOTE: Take some time to understand these functions between class!

In [34]:
from IPython.display import display, Markdown
from openai import OpenAI
from openai.types.chat import ChatCompletion

def get_response(client: OpenAI, messages: list, model: str = "gpt-4o-mini") -> ChatCompletion:
    # Returns the full ChatCompletion object, not just the reply text.
    return client.chat.completions.create(
        model=model,
        messages=messages
    )

def system_prompt(message: str) -> dict:
    return {"role": "system", "content": message}

def assistant_prompt(message: str) -> dict:
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}

def pretty_print(message: ChatCompletion) -> None:
    # Render the first choice's text as Markdown in the notebook.
    display(Markdown(message.choices[0].message.content))

Here is another way to build prompts, this time using the helper functions:

In [13]:
# Now, show the economic example with both user and assistant prompts
few_shot_prompts = [
    user_prompt("Inflation rises fast. How does the central bank react - dating analogy please!"),
    assistant_prompt("They play hard to get - raise rates - to cool off the economy's over-eager spending habits."),

    user_prompt("What happens when interest rates are too low for too long?"),
    assistant_prompt("Everyone gets too comfortable - too many relationships (loans) form, and eventually hearts (bubbles) break."),

    user_prompt("Explain deflation using a dating metaphor."),
    assistant_prompt("No one's asking anyone out - everyone waits for a better deal, so the economy gets lonely and quiet."),
    # 👇 Here's the actual question we want the model to answer
    user_prompt("Describe quantitative easing")
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=few_shot_prompts
)

print(response.choices[0].message.content)


Quantitative easing is like the central bank playing matchmaker, bringing people together by flooding the dating scene with new suitors (money). They buy up lots of those potential partners (assets) to encourage everyone to mingle more, hoping to spark romance (spending and investment) and boost the overall mood (economic growth).
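
The alternating user/assistant pattern above can be generated from a list of question/answer pairs. The sketch below shows one way to do that; `build_few_shot_messages` is a hypothetical helper, and it builds plain dictionaries so it does not depend on the helper functions defined earlier.

```python
def build_few_shot_messages(examples, question, system=None):
    """Turn (question, answer) pairs plus a final question into a message list."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    for example_question, example_answer in examples:
        # Each example becomes a user turn followed by a "pretend" assistant turn.
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    # The real question always comes last, as a user turn.
    messages.append({"role": "user", "content": question})
    return messages

demo_messages = build_few_shot_messages(
    examples=[
        ("What happens when interest rates are too low for too long?",
         "Everyone gets too comfortable, and eventually hearts (bubbles) break."),
        ("Explain deflation using a dating metaphor.",
         "No one's asking anyone out, so the economy gets lonely and quiet."),
    ],
    question="Describe quantitative easing",
)
print([m["role"] for m in demo_messages])
```

The resulting list can be passed as `messages=` to `client.chat.completions.create` in the same way as `few_shot_prompts`.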


### 🏗️ Activity #1:
Mission:
Experiment with how different prompt structures (system, user, and assistant roles, plus zero-shot and few-shot prompting) can transform an AI’s response.
Your goal: craft the most effective prompt and see how GPT-4-Turbo reacts!

You’ll test how GPT-4-Turbo behaves under four different setups:
1. System/User roles only (Zero-shot)
2. System/User roles + examples (Few-shot)
3. No system role at all (User only)
4. Creative system prompt twist



In [35]:
activity_question = "What is the primary impact of high bond yields on long-term corporate investment?"
# Define the two examples for the few-shot prompt
example_user_1 = "If inflation is 10% and my savings interest is 5%, what's happening to my money?"
example_assistant_1 = "Your purchasing power is shrinking by 5% - you're losing money in real terms because inflation is eating away at your interest gains."

example_user_2 = "What does a high Price-to-Earnings (P/E) ratio suggest about a stock?"
example_assistant_2 = "It suggests investors have high expectations for future earnings growth, or the stock is overvalued."

system_instruction = "You are a senior financial analyst providing concise, executive-level summaries. Your answers must not exceed two sentences."


In [36]:
# Setup 1: System/User roles only (Zero-shot)


list_of_prompts = [
    {"role": "system", "content": system_instruction},
    {"role": "user", "content": activity_question}
]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=list_of_prompts
)
print("--- 1. System/User Only (Zero-shot) ---")
pretty_print(response)

--- 1. System/User Only (Zero-shot) ---


High bond yields typically increase borrowing costs for corporations, making it more expensive to finance long-term investments. This generally results in reduced capital spending and slower expansion for businesses due to higher interest expenses.

In [37]:
# Setup 2: System/User roles + Examples (Few-shot)

list_of_prompts_2 = [
    system_prompt(system_instruction),
    # Example 1
    user_prompt(example_user_1),
    assistant_prompt(example_assistant_1),
    # Example 2
    user_prompt(example_user_2),
    assistant_prompt(example_assistant_2),
    # The actual question
    user_prompt(activity_question)
]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=list_of_prompts_2
)
print("--- 2. System/User + Examples (Few-shot) ---")
pretty_print(response)

--- 2. System/User + Examples (Few-shot) ---


High bond yields increase borrowing costs for companies, potentially leading them to delay or reduce long-term investments due to higher financing expenses.

In [38]:
# Setup 3: No system role at all (User only)

list_of_prompts_3 = [
    user_prompt(activity_question)
]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=list_of_prompts_3
)

print("--- 3. User Role Only ---")
pretty_print(response)



--- 3. User Role Only ---


The primary impact of high bond yields on long-term corporate investment is that it increases the cost of borrowing for companies. Higher bond yields make it more expensive for corporations to issue new debt in the form of bonds. Here are some of the key ways in which this can affect long-term corporate investment:

1. **Increased Financing Costs**: When bond yields are high, the interest rates on new bonds also increase. This leads to higher interest expenses for companies issuing bonds to finance long-term investments such as infrastructure projects, technology upgrades, or expansion into new markets.

2. **Reduced Capital Expenditure**: As borrowing costs rise, companies may reconsider or scale down planned capital expenditures. High interest expenses can reduce the overall return on investment for new projects, making them less financially viable, particularly if these projects are capital-intensive.

3. **Impact on Profitability**: Higher interest costs can squeeze corporate profits, especially for companies that rely heavily on debt financing. This reduction in profitability can also affect the amount of internal funding available for reinvestment in long-term projects.

4. **Altered Corporate Strategies**: Companies might shift their strategies in response to higher bond yields by delaying or canceling long-term investment projects. Alternatively, they may seek financing through other means such as equity funding, which might not be preferable if it leads to dilution of existing shareholders' equity.

5. **Influence on Stock Prices and Investor Sentiment**: High bond yields can lead to reduced investor confidence in stocks, particularly for highly leveraged companies. This can depress stock prices, making it more difficult and expensive for companies to raise capital through equity markets.

6. **Sector-Specific Effects**: The impact of high bond yields can vary by sector. Industries that require significant upfront capital investment (like utilities or telecommunications) might be disproportionately affected due to their reliance on debt financing.

7. **Overall Economic Influence**: At a macroeconomic level, high bond yields can lead to lower levels of corporate investment across the economy. This can slow economic growth, as investment is a key component of GDP.

In summary, high bond yields can have multiple adverse effects on long-term corporate investment, ranging from increased borrowing costs and reduced project viability to potential shifts in overall corporate strategy and economic growth dynamics.

In [39]:
# Setup 4: Creative system prompt twist

system_instruction_4 = " You are a medical doctor. You don't actually know anything about finance. Your goal is to make up a silly answer to any finance related questions, the problem is, you don't even really know the subject enough to make up something silly. Your responses should be short and direct. You are a busy person!"

list_of_prompts_4 = [
    system_prompt(system_instruction_4),
    user_prompt(activity_question)
]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=list_of_prompts_4
)

print("--- 4. Creative System Prompt Twist ---")
pretty_print(response)

--- 4. Creative System Prompt Twist ---


High bond yields? Oh, they probably make corporations feel more energetic, like a good vitamin boost! More energy, more investment, I guess!

### Chain of Thought Prompting

We'll head one level deeper and explore the world of Chain of Thought prompting (CoT).

CoT prompting asks the model to write out its intermediate reasoning steps before giving a final answer, which can help it handle slightly more complex tasks.

Let's look at a simple reasoning-based example without CoT.

In [14]:
reasoning_problem = """
The central bank increases the policy rate by 1.5 pp in response to 5 % inflation while nominal wage growth is 3 %.
What happens to real wages?
"""

list_of_prompts = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

To understand what happens to real wages when the central bank increases the policy rate, we need to consider the concepts of nominal wages, inflation, and real wages.

1. **Nominal wages** are the wages that workers receive in current dollars, which in this case is growing at 3%.

2. **Inflation** measures how much prices for goods and services are rising, which is at 5% in this scenario.

3. **Real wages** adjust nominal wages for inflation. The formula to calculate real wages is:

   \[
   \text{Real Wages} = \frac{\text{Nominal Wages}}{(1 + \text{Inflation Rate})}
   \]

In this specific scenario:

- **Nominal wage growth** = 3% (0.03)
- **Inflation rate** = 5% (0.05)

To see how real wages are affected:

1. First, calculate the nominal wage after growth:
   - If we assume an initial nominal wage of 100 (for simplicity), after 3% growth, nominal wages would become 100 × (1 + 0.03) = 103.

2. Now consider the effects of 5% inflation:
   - The effective price level after inflation would be 100 × (1 + 0.05) = 105.

3. Finally, calculate the real wages:
   - Real wages = Nominal wages / Price level = 103 / 105 ≈ 0.98095 (or about 98.09% of the original purchasing power).

This indicates that real wages have decreased because nominal wage growth (3%) is less than the inflation rate (5%). In practical terms, this means that even though workers are earning more in nominal terms, their purchasing power is declining due to higher inflation. 

**Conclusion:** Real wages are effectively decreasing because nominal wage growth (3%) is less than inflation (5%). Workers are losing purchasing power. The real change can be represented as a decrease in real wages by roughly 1.1% (since 100 - 98.09 ≈ 1.1).
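
Because model arithmetic can slip, it helps to check the numbers independently. The sketch below computes the real wage change for the scenario in the prompt (3% nominal wage growth, 5% inflation); `real_wage_change` is a hypothetical helper name. The policy rate does not enter this calculation, which only compares wage growth with price growth.

```python
def real_wage_change(nominal_growth: float, inflation: float) -> float:
    """Exact change in real wages: (1 + wage growth) / (1 + inflation) - 1."""
    return (1 + nominal_growth) / (1 + inflation) - 1

exact = real_wage_change(0.03, 0.05)
approx = 0.03 - 0.05  # common shortcut: nominal growth minus inflation
print(f"exact: {exact:.2%}, approximation: {approx:.2%}")
# prints "exact: -1.90%, approximation: -2.00%"
```

The recorded response reaches the right direction (real wages fall) and the right ratio (103 / 105 ≈ 0.981), but its closing figure of roughly 1.1% does not follow from its own numbers: 100 - 98.10 ≈ 1.9, so the decline is about 1.9%.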

Let's see if we can leverage a simple CoT prompt to improve our model's performance on this task:

In [15]:
list_of_prompts = [
    user_prompt(reasoning_problem + "Think step-by-step about how nominal wages, prices, and interest rates interact through the labor market and aggregate demand. Then explain the real wage effect.")
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

To analyze the impact on real wages following a central bank's increase in the policy rate, we need to understand how nominal wages, inflation, and interest rates interact.

### Step 1: Definitions
- **Nominal wages**: The monetary compensation workers receive, not adjusted for inflation.
- **Real wages**: The purchasing power of nominal wages; it is adjusted for inflation. The formula is:
  \[
  \text{Real Wage} = \frac{\text{Nominal Wage}}{1 + \text{Inflation Rate}}
  \]
- **Policy rate**: The interest rate set by the central bank to influence economic activity, which affects borrowing costs and consumer spending.

### Step 2: Current Situation
- **Inflation Rate**: 5% (0.05 in decimal)
- **Nominal Wage Growth**: 3% (0.03 in decimal)
- **Increase in Policy Rate**: 1.5 percentage points

### Step 3: Expected Impact of Higher Policy Rate
By raising the policy rate, the central bank aims to reduce inflation. A higher interest rate typically leads to:
- Increased borrowing costs for both consumers and businesses.
- Reduced spending and investment, which can lead to decreased aggregate demand.
- Slower price increases (or even downward pressure on prices) as spending slows down.

### Step 4: Real Wage Calculation
When we look at real wages after the changes, we have:
- The nominal wage growth is at 3%, which means nominal wages are increasing by this percentage.
- The inflation rate, currently at 5%, affects the purchasing power of these wages.

To understand the effect on real wages, we need to look at how real wages respond to the changes:

1. **Initial Real Wages Calculation**: Assuming an initial nominal wage of 100, at 5% inflation:
   - Real Wage = \( \frac{100}{1 + 0.05} = \frac{100}{1.05} \approx 95.24 \)

2. **Updated Nominal Wage Post-3% Increase**:
   - New Nominal Wage = \( 100 \times (1 + 0.03) = 103 \)

3. **Recalculate Real Wages with 5% Inflation** (assuming no change in inflation immediately):
   - Real Wage = \( \frac{103}{1.05} \approx 98.10 \)

### Step 5: Real Wage Effect Analysis
Despite the increase in nominal wages by 3%, the inflation rate of 5% exceeds the growth in nominal wages. The real wages can be calculated and compared:

- Initially: \( 95.24 \)
- After nominal wage increase: \( 98.10 \)

In this case, real wages have increased from \( 95.24 \) to \( 98.10 \). However, this increase might not last if the central bank’s policy effectively lowers inflation in the future. If the higher interest rate successfully reduces inflation below the nominal wage growth rate, real wages could improve further.

### Conclusion
In summary, the initial effects of the policy rate increase and nominal wage growth suggest an increase in real wages temporarily. However, the longer-term impact will depend greatly on how the inflation rate evolves in response to the contractionary monetary policy. Should the inflation rate decrease as a result of the policy measures, real wages could improve further. If inflation stays above nominal wage growth, real wages will decline. Therefore, although nominal wages have risen, the implications for real wages depend significantly on the dynamic relationship with inflation.


## 3. Running a Comparative Experiment

We'll test combinations of model type (reasoning vs non-reasoning) and prompting style (zero-shot vs few-shot).
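
The experiment is a small grid: every model is paired with every prompting style. The sketch below lays out that grid with `itertools.product`. `run_grid` and `fake_ask` are hypothetical names, and `fake_ask` is a stand-in that returns a placeholder string, so the sketch runs without calling the API.

```python
from itertools import product

def run_grid(models, prompts, ask):
    """Call ask(model, prompt) for every model/style pair; key results by pair."""
    results = {}
    for model, (style, prompt) in product(models, prompts.items()):
        results[(model, style)] = ask(model, prompt)
    return results

def fake_ask(model, prompt):
    # Stand-in for a real API call; returns a placeholder, not a model answer.
    return f"[{model}] would answer: {prompt[:30]}"

grid = run_grid(
    models=["gpt-4-turbo", "gpt-5"],
    prompts={
        "zero-shot": "What is the impact of inflation on real wages?",
        "few-shot": "Example 1: (omitted)\nNow answer:\nQ: What is the impact of inflation on real wages?",
    },
    ask=fake_ask,
)
for key in grid:
    print(key)
```

To run the real comparison, `ask` could be a function that wraps `client.chat.completions.create(model=model, messages=[...])` and returns `.choices[0].message.content`. The cells that follow make the same four calls one at a time.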


In [16]:
# --------------------------------------------------
# 🧩 Comparing GPT Models: Reasoning vs Non-Reasoning
# --------------------------------------------------

from openai import OpenAI
client = OpenAI()

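# NOTE: this assignment rebinds `system_prompt`, shadowing the helper function
# of the same name; re-run the helper cell before calling `system_prompt(...)` again.
system_prompt = """
You are an experienced economic analyst.
"""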

question = """What is the impact of inflation on real wages? Respond in a concise manner."""

prompt_few = f"""
Use this exact format to answer the question:
Example 1:
{{
  "possible_explanation": "Wage catch-up effect",
  "mechanism": "Workers negotiate higher nominal wages to preserve purchasing power as prices rise.",
  "impact_on_wages": "Nominal wages increase roughly in line with inflation, keeping real wages stable in the short run.",
  "time_frame": "Short to medium run",
  "economic_context": "Inflationary periods with strong labor bargaining power or cost-of-living adjustments."
}}

Example 2:
{{
  "possible_explanation": "Real wage erosion",
  "mechanism": "When nominal wages lag behind price growth, workers lose purchasing power.",
  "impact_on_wages": "Real wages decline despite nominal wage increases, reducing workers’ living standards.",
  "time_frame": "Immediate term",
  "economic_context": "High inflation environments with weak wage indexation or rigid labor contracts."
}}

Now answer:
Q: {question}
"""


# --------------------------------------------------
# MODEL 1: GPT-4-turbo  → Non-Reasoning
# --------------------------------------------------
print("\n==============================")
print("MODEL 1: GPT-4-turbo (Non-Reasoning)")
print("==============================\n")

# Zero-shot
answer_nonreasoning_zero_shot = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question}
    ],
)
print("Zero-Shot Prompting (no examples):\n")
print("A:", answer_nonreasoning_zero_shot.choices[0].message.content, "\n")



MODEL 1: GPT-4-turbo (Non-Reasoning)

Zero-Shot Prompting (no examples):

A: Inflation erodes the purchasing power of money, thereby diminishing the real value of wages. If nominal wages do not increase at a rate that matches or exceeds the rate of inflation, workers experience a decrease in their real incomes, meaning they can afford fewer goods and services than before. This effectively reduces their economic well-being and standard of living if inflation is not adequately compensated by wage growth. 



In [17]:
# --------------------------------------------------
# MODEL 2: GPT-5  → Reasoning
# --------------------------------------------------
print("\n==============================")
print("MODEL 2: GPT-5 (Reasoning-Tuned)")
print("==============================\n")

# Zero-shot
answer_reasoning_zero_shot = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question}
    ],
)
print("Zero-Shot Prompting (no examples):\n")
print("A:", answer_reasoning_zero_shot.choices[0].message.content, "\n")



MODEL 2: GPT-5 (Reasoning-Tuned)

Zero-Shot Prompting (no examples):

A: - Real wages equal nominal wages adjusted for prices; real wage growth ≈ nominal wage growth − inflation.
- If inflation outpaces nominal wage growth, purchasing power falls.
- Because wages adjust with lags and are often rigid, inflation spikes typically reduce real wages in the short run; catch-up bargaining or indexation can later restore them.
- Unexpected inflation shifts income from workers to firms/debtors; anticipated, indexed inflation has smaller effects on real wages.
- Low, stable inflation can ease relative wage adjustments; persistent high inflation erodes real wages. 



In [18]:
print("\n==============================")
print("MODEL 1: GPT-4-turbo (Non-Reasoning)")
print("==============================\n")

# Few-shot
answer_nonreasoning_few_shot = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt_few}
    ],
)
print("Few-Shot Prompting (with examples):\n")
print("A:", answer_nonreasoning_few_shot.choices[0].message.content, "\n")


MODEL 1: GPT-4-turbo (Non-Reasoning)

Few-Shot Prompting (with examples):

A: {
  "possible_explanation": "Inflation's impact on real wages",
  "mechanism": "Inflation reduces the purchasing power of nominal wages if wage growth does not keep pace with rising prices.",
  "impact_on_wages": "Real wages decline if nominal wage increases are smaller than the inflation rate.",
  "time_frame": "Immediate to short term",
  "economic_context": "Periods of accelerating inflation without corresponding adjustments in nominal wages."
} 



In [19]:
print("\n==============================")
print("MODEL 2: GPT-5 (Reasoning-Tuned)")
print("==============================\n")

# Few-shot
answer_reasoning_few_shot = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt_few}
    ],
)
print("Few-Shot Prompting (with examples):\n")
print("A:", answer_reasoning_few_shot.choices[0].message.content, "\n")


MODEL 2: GPT-5 (Reasoning-Tuned)

Few-Shot Prompting (with examples):

A: Example 1:
{
  "possible_explanation": "Real wage squeeze",
  "mechanism": "Prices rise faster than nominal pay.",
  "impact_on_wages": "Real wages fall, reducing purchasing power.",
  "time_frame": "Immediate to short run",
  "economic_context": "High inflation with weak bargaining power or limited indexation."
}

Example 2:
{
  "possible_explanation": "Wage catch-up",
  "mechanism": "COLAs and bargaining push pay increases alongside prices.",
  "impact_on_wages": "Real wages stabilize or slip only slightly.",
  "time_frame": "Short to medium run",
  "economic_context": "Inflation with tight labor markets or indexed contracts."
} 
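
Note that GPT-5 echoed the `Example 1:` / `Example 2:` labels, so its reply is not a single JSON document and `json.loads` would fail on it. One way to handle such replies is to scan the text for embedded JSON objects and check their keys, as in the minimal sketch below. The helper names (`extract_json_objects`, `missing_keys`) and the abbreviated sample reply are assumptions for illustration; the second sample object deliberately omits one key.

```python
import json

# Keys used by the few-shot format shown in the outputs above.
EXPECTED_KEYS = {
    "possible_explanation",
    "mechanism",
    "impact_on_wages",
    "time_frame",
    "economic_context",
}


def extract_json_objects(text):
    """Return every JSON object found in free text, in order of appearance."""
    decoder = json.JSONDecoder()
    objects = []
    idx = 0
    while True:
        start = text.find("{", idx)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index where it ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            idx = start + 1  # not valid JSON here; keep scanning
            continue
        if isinstance(obj, dict):
            objects.append(obj)
        idx = end
    return objects


def missing_keys(obj):
    """List the expected keys that are absent from one parsed object."""
    return sorted(EXPECTED_KEYS - set(obj))


# Hypothetical, abbreviated reply in the style of the output above.
sample_reply = """Example 1:
{
  "possible_explanation": "Real wage squeeze",
  "mechanism": "Prices rise faster than nominal pay.",
  "impact_on_wages": "Real wages fall.",
  "time_frame": "Short run",
  "economic_context": "High inflation with limited indexation."
}

Example 2:
{
  "possible_explanation": "Wage catch-up",
  "mechanism": "COLAs push pay up alongside prices.",
  "impact_on_wages": "Real wages stabilize.",
  "time_frame": "Medium run"
}
"""

for i, obj in enumerate(extract_json_objects(sample_reply), start=1):
    print(f"object {i}: missing keys = {missing_keys(obj)}")
```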




## 4. Evaluation Framework

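We use an **LLM as a judge**: a separate model grades the non-reasoning and reasoning few-shot answers to the same question on a 0–4 scale and returns its verdict as JSON.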


In [20]:
import json

# --------------------------------------------------
# ⚖️ LLM-as-a-Judge Evaluation Script
# --------------------------------------------------

# Define evaluation scale (0–4)
# 0 = completely incorrect / irrelevant
# 1 = partially correct but weak or inaccurate reasoning
# 2 = fair factual accuracy, minimal reasoning
# 3 = accurate and somewhat reasoned
# 4 = highly accurate, clear causal explanation, correct logic

evaluation_prompt = f"""
You are an impartial economics teacher grading two student answers to the same question.

Question:
{question}

Answer A (non-reasoning model):
{answer_nonreasoning_few_shot.choices[0].message.content}

Answer B (reasoning model):
{answer_reasoning_few_shot.choices[0].message.content}

Evaluate both answers on accuracy and reasoning quality on a 0–4 scale:
- 0 = completely incorrect or irrelevant
- 1 = partially correct, but flawed
- 2 = fair factual accuracy, limited reasoning
- 3 = mostly correct, some reasoning
- 4 = fully accurate and clearly reasoned, ability to see the interdependencies between variables.

Return your evaluation as a JSON object in this exact format:
{{
  "Answer A Score": <0-4>,
  "Answer B Score": <0-4>,
  "Better Answer": "A" or "B",
  "Explanation": "Why the better answer is more accurate or reasoned"
}}
"""

# Judge model: this cell uses gpt-5-mini (GPT-4.1 is another good option for judging)
evaluation = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": "You are an impartial LLM evaluator for economics-related answers."},
        {"role": "user", "content": evaluation_prompt}
    ],
)

# Parse and display the evaluation
response_text = evaluation.choices[0].message.content

# Try to parse the reply as JSON; fall back to showing the raw text
try:
    result = json.loads(response_text)
    print("\nParsed JSON Result:")
    print(json.dumps(result, indent=2))
except json.JSONDecodeError:
    print("\nNote: Could not parse JSON; the model may have returned free text instead.")
    print(response_text)



Parsed JSON Result:
{
  "Answer A Score": 3,
  "Answer B Score": 4,
  "Better Answer": "B",
  "Explanation": "Answer A is accurate and concise about the basic mechanism (real wages fall when nominal wages lag inflation) but is limited in reasoning and misses alternative outcomes. Answer B gives the same core point plus plausible counterfactuals (indexation, COLAs, tight labor markets) and links outcomes to institutional and market conditions, showing the interdependencies between inflation, nominal wage setting, and bargaining\u2014hence it is more complete and better reasoned."
}
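
A reply that parses as JSON can still break the requested format, for example with a score outside 0–4 or a "Better Answer" that contradicts the scores. The sketch below shows one possible sanity check. `validate_verdict` is a hypothetical helper, not part of the OpenAI library, and the example verdict reuses the scores printed above with the explanation shortened.

```python
def validate_verdict(verdict):
    """Return a list of problems found in a judge verdict (empty if none)."""
    problems = []

    scores = {}
    for label in ("A", "B"):
        key = f"Answer {label} Score"
        score = verdict.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 4:
            problems.append(f"{key!r} must be an integer from 0 to 4, got {score!r}")
        else:
            scores[label] = score

    better = verdict.get("Better Answer")
    if better not in ("A", "B"):
        problems.append(f"'Better Answer' must be 'A' or 'B', got {better!r}")
    elif len(scores) == 2:
        other = "B" if better == "A" else "A"
        if scores[better] < scores[other]:
            problems.append("'Better Answer' has the lower score")

    explanation = verdict.get("Explanation")
    if not isinstance(explanation, str) or not explanation.strip():
        problems.append("'Explanation' must be a non-empty string")

    return problems


# Scores taken from the parsed result above; explanation shortened.
verdict = {
    "Answer A Score": 3,
    "Answer B Score": 4,
    "Better Answer": "B",
    "Explanation": "Answer B links outcomes to institutional conditions.",
}
print(validate_verdict(verdict) or "Verdict looks well-formed.")
```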


### 🏗️ Activity #2:

Evaluate different prompting strategies using your own example.

In [None]:
# --------------------------------------------------
# 🧩 Comparing GPT Models: Reasoning vs Non-Reasoning
# --------------------------------------------------

from openai import OpenAI
client = OpenAI()

system_prompt = """
You are an experienced historian and journalist. You are an expert in the history of Scotland.
"""

question = """Who was the most controversial king in Scotland's history?"""

prompt_few = f"""
Use this exact format to answer the question:
Example 1:
{{
  "Name": "Robert I (The Bruce)",
  "Reign": "1306 – 1329",
  "Reason for Controversy": "Seized the throne after committing sacrilege by murdering his rival, John Comyn, in a church. He was excommunicated, making him a king condemned by the Church, despite achieving national independence.",
  "Cause of Death": "Natural causes (possibly leprosy or another chronic illness)."
}}

Example 2:
{{
  "Name": "Macbeth (Mac Bethad mac Findlaích)",
  "Reign": "1040 – 1057",
  "Reason for Controversy": "His historical reputation was permanently and falsely smeared by later pro-dynastic propaganda and William Shakespeare's portrayal of him as a tyrannical, power-mad usurper who unjustly murdered his way to the throne.",
  "Cause of Death": "Killed in battle by the forces of Máel Coluim mac Donnchada (the future Malcolm III) at the Battle of Lumphanan."
}}

Now answer:
Q: {question}
"""


# --------------------------------------------------
# MODEL 1: GPT-4-turbo  → Non-Reasoning
# --------------------------------------------------
print("\n==============================")
print("MODEL 1: GPT-4-turbo (Non-Reasoning)")
print("==============================\n")

# Zero-shot
answer_nonreasoning_zero_shot = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question}
    ],
)
print("Zero-Shot Prompting (no examples):\n")
print("A:", answer_nonreasoning_zero_shot.choices[0].message.content, "\n")

In [None]:
# --------------------------------------------------
# MODEL 2: GPT-5  → Reasoning
# --------------------------------------------------
print("\n==============================")
print("MODEL 2: GPT-5 (Reasoning-Tuned)")
print("==============================\n")

# Zero-shot
answer_reasoning_zero_shot = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question}
    ],
)
print("Zero-Shot Prompting (no examples):\n")
print("A:", answer_reasoning_zero_shot.choices[0].message.content, "\n")

In [None]:
print("\n==============================")
print("MODEL 1: GPT-4-turbo (Non-Reasoning)")
print("==============================\n")

# Few-shot
answer_nonreasoning_few_shot = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt_few}
    ],
)
print("Few-Shot Prompting (with examples):\n")
print("A:", answer_nonreasoning_few_shot.choices[0].message.content, "\n")

In [None]:
print("\n==============================")
print("MODEL 2: GPT-5 (Reasoning-Tuned)")
print("==============================\n")

# Few-shot
answer_reasoning_few_shot = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt_few}
    ],
)
print("Few-Shot Prompting (with examples):\n")
print("A:", answer_reasoning_few_shot.choices[0].message.content, "\n")

In [None]:
import json

# --------------------------------------------------
# ⚖️ LLM-as-a-Judge Evaluation Script
# --------------------------------------------------

# Define evaluation scale (0–4)
# 0 = completely incorrect / irrelevant
# 1 = partially correct but weak or inaccurate reasoning
# 2 = fair factual accuracy, minimal reasoning
# 3 = accurate and somewhat reasoned
# 4 = highly accurate, clear causal explanation, correct logic

evaluation_prompt = f"""
You are an impartial history teacher who teaches the history of Scotland. You are grading two student answers to the same question.

Question:
{question}

Answer A (non-reasoning model):
{answer_nonreasoning_few_shot.choices[0].message.content}

Answer B (reasoning model):
{answer_reasoning_few_shot.choices[0].message.content}

Evaluate both answers on accuracy and reasoning quality on a 0–4 scale:
- 0 = completely incorrect or irrelevant
- 1 = partially correct, but flawed
- 2 = fair factual accuracy, limited reasoning
- 3 = mostly correct, some reasoning
- 4 = fully accurate and clearly reasoned, ability to see the interdependencies between variables.

Return your evaluation as a JSON object in this exact format:
{{
  "Answer A Score": <0-4>,
  "Answer B Score": <0-4>,
  "Better Answer": "A" or "B",
  "Explanation": "Why the better answer is more accurate or reasoned"
}}
"""

# Judge model: this cell uses gpt-5-mini (GPT-4.1 is another good option for judging)
evaluation = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": "You are an impartial LLM evaluator for history-related answers."},
        {"role": "user", "content": evaluation_prompt}
    ],
)

# Parse and display the evaluation
response_text = evaluation.choices[0].message.content

# Try to parse the reply as JSON; fall back to showing the raw text
try:
    result = json.loads(response_text)
    print("\nParsed JSON Result:")
    print(json.dumps(result, indent=2))
except json.JSONDecodeError:
    print("\nNote: Could not parse JSON; the model may have returned free text instead.")
    print(response_text)

## Saving results

In [21]:
import os  # needed for os.path.abspath below

# Create markdown content from the question and the reasoning model's few-shot answer
markdown_content = f"""
# 🧠 Reasoning Model Answer
### Question:
{question}

### Model Used:
`gpt-5` (Reasoning-tuned)

### Response:
{answer_reasoning_few_shot.choices[0].message.content}

---

*This answer was generated by a reasoning model to illustrate step-by-step economic reasoning.*
"""

output_path = "./results.md"
# Save to file
with open(output_path, "w", encoding="utf-8") as f:
    f.write(markdown_content)

print(f"✅ Reasoning model answer saved to: {os.path.abspath(output_path)}")

✅ Reasoning model answer saved to: /Users/andrewmacvean/Documents/AIO-amacvean/Session_01_LLM_APIs_&_AI-Assisted_Development/results.md


## Conclusion

- **Few-shot prompts** improve structure and reasoning consistency.  
- **Reasoning models** (like GPT-5) deliver more coherent causal explanations of the links between inflation, interest rates, and growth indicators.  
- **Non-reasoning models** (e.g., GPT-4-turbo) provide faster, surface-level insights ideal for retrieval or summarization tasks.  
- Future work could add **RAG pipelines** with real-time macroeconomic data or integrate with financial dashboards for live LLM reasoning visualization.