# Reasoning usecases : Model comparison o1 vs gpt-4o

In [9]:
from pydantic import BaseModel, Field
from openai import AzureOpenAI
import os
from dotenv import load_dotenv
import json
import copy
import textwrap
import warnings
warnings.filterwarnings('ignore')
from IPython.display import display, Image, Markdown
# Load environment variables
load_dotenv("./.env")

client = AzureOpenAI(  
    api_version="2024-12-01-preview",  
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),  
    api_key=os.getenv("AZURE_OPENAI_API_KEY")  
)  

GPT_MODEL = 'gpt-40-globalStandard'
O1_MODEL = 'o1'

## Create



In [10]:
def get_chat_completion(model, prompt):
    """
    Calls the OpenAI API to get a chat completion.

    :param model: The model to use for the completion.
    :param prompt: The prompt to send to the model.
    :return: The completion response from the model.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

In [11]:
demo_prompt = """case_study_task = "<title>
Commercial Underwriting Analysis for TechEdge Manufacturing Co.: Balancing Risk and Compliance under State Regulations
</title>

<datasets>
**Dataset 1: Business Profile**
This dataset provides details about TechEdge Manufacturing Co., a small business seeking commercial insurance coverage. The dataset includes key information on the business profile.

| Field                         | Details                               |
|-------------------------------|---------------------------------------|
| Business Name                 | TechEdge Manufacturing Co.            |
| Industry                      | Electronics Manufacturing             |
| Years in Operation            | 12                                    |
| Annual Revenue ($)            | 8,500,000                             |
| Employee Count                | 120                                   |
| Coverage Requested            | Commercial Property & Liability       |
| Location                      | Texas                                 |
| Risk Management Measures      | Fire alarms, sprinkler system installed |

---

**Dataset 2: Claims History**
This table outlines the historical insurance claims filed by TechEdge Manufacturing Co. over the past 5 years. (Also visualized as a line graph showing claim amounts over time.)

| Date       | Claim Type           | Claim Amount ($) | Cause of Loss           | Resolution Status |
|------------|----------------------|------------------|-------------------------|-------------------|
| 2018-05-20 | Fire Damage          | 125,000          | Electrical Fault        | Settled           |
| 2019-08-15 | Equipment Breakdown  | 75,000           | Machinery Failure       | Settled           |
| 2020-11-10 | Property Damage      | 200,000          | Structural Issues       | Open              |
| 2021-03-05 | Theft                | 50,000           | Burglary                | Settled           |
| 2022-07-22 | Fire Damage          | 150,000          | Overheating Equipment   | Open              |

---

**Dataset 3: Risk Assessment Survey**
This dataset provides risk factor scores based on an on-site evaluation. (Visualized as a radar chart.)

| Risk Factor                   | Score (out of 10) |
|-------------------------------|-------------------|
| Fire Risk                     | 8                 |
| Flood Risk                    | 4                 |
| Equipment Malfunction Risk    | 7                 |
| Theft Risk                    | 6                 |
| Employee Safety               | 5                 |

---

**Dataset 4: Commercial Property Inspection Report**
An inspection report detailing key findings about the property including structural integrity and fire safety systems.

| Inspection Area        | Finding                                  | Rating (1-10) |
|------------------------|------------------------------------------|---------------|
| Structural Integrity   | Minor cracks; overall acceptable         | 7             |
| Fire Safety Systems    | Sprinkler system partially outdated      | 5             |
| Electrical Systems     | Inconsistent maintenance practices       | 6             |
| Building Maintenance   | Signs of wear; upgrades recommended        | 5             |

---

**Dataset 5: State Regulation Compliance Checklist**
This checklist evaluates the property’s compliance with current state safety and insurance regulations.

| Regulation Category    | Requirement                             | Compliance Status         |
|------------------------|-----------------------------------------|---------------------------|
| Fire Code              | Fully compliant with sprinkler requirements | Non-Compliant             |
| Electrical Safety      | Updated wiring and periodic inspections | Compliant                 |
| Structural Safety      | Seismic retrofitting as per code         | Conditionally Compliant   |
| Health & Safety        | Regular safety drills mandated           | Non-Compliant             |

---

**Dataset 6: Underwriting Guidelines Document** *DS*
A text document outlining standard underwriting procedures for commercial risks. (Presented as bullet points.)

- Base premium is determined by annual revenue and inherent risk rating.
- Adjustments are applied for historical claims frequency and severity.
- Additional loadings are imposed for non-compliance with state regulations.
- Discounts can be applied for proactive risk management measures already in place.

---

**Dataset 7: Financial Performance and Revenue Trends** *DS*
Monthly revenue data over the past 12 months, depicted as a line graph.

| Month | Revenue ($) |
|-------|-------------|
| Jan   | 700,000     |
| Feb   | 680,000     |
| Mar   | 720,000     |
| Apr   | 710,000     |
| May   | 690,000     |
| Jun   | 705,000     |
| Jul   | 715,000     |
| Aug   | 700,000     |
| Sep   | 690,000     |
| Oct   | 710,000     |
| Nov   | 720,000     |
| Dec   | 730,000     |

---

**Dataset 8: Email Transcript** *DS*
An email conversation between the business owner and an insurance broker discussing premium quotes and potential risk improvements.

"Subject: Insurance Premium Inquiry
Hi,
I am seeking a detailed quote for our property insurance. While we have installed fire alarms and a sprinkler system, we are aware that some systems still need updating. Please advise on how these factors might affect our premium.
Regards,
Owner – TechEdge Manufacturing Co."

---

**Dataset 9: Employee Satisfaction Survey Results** *DS*
Survey data regarding employee satisfaction levels. (Presented as a bar chart.)

| Department      | Satisfaction Score (out of 10) |
|-----------------|-------------------------------|
| Operations      | 7                             |
| Maintenance     | 6                             |
| Administration  | 8                             |

---

**Dataset 10: Social Media Marketing Performance** *DS*
Metrics depicting the company’s online engagement, shown as pie charts and trend lines.

| Metric                    | Value    |
|---------------------------|----------|
| Monthly Engagement Rate (%) | 12     |
| Social Media Reach          | 50,000 |
| Ad Spend ($)                | 15,000 |

</datasets>

<question>
As an insurance consultant, perform a comprehensive underwriting analysis for TechEdge Manufacturing Co. Based on the provided datasets, determine the appropriate premium rate adjustment for insuring their commercial property. Specifically:
1. Calculate the risk adjustment factor based on their historical claims frequency and severity (Dataset 2).
2. Assess the risk score derived from the on-site risk assessment (Dataset 3) and property inspection findings (Dataset 4).
3. Incorporate penalties due to non-compliance with state regulations (Dataset 5).
4. Recommend a final premium loading percentage above the base rate and propose risk mitigation strategies.
The final answer should clearly state the recommended premium loading percentage and justify your conclusion with detailed calculations referencing the datasets.
</question>

<instruction>
Please provide a detailed solution including all working calculations, intermediate steps, and references to the provided datasets. Begin with a brief executive summary outlining key findings and key numerical results. Use markdown formatting for headers, tables, and calculations. Ensure your final answer explicitly states the premium adjustment percentage and lists the recommended risk mitigation strategies.
</instruction>
"""

In [12]:
gpt_output = get_chat_completion(GPT_MODEL, demo_prompt)

In [13]:
print(gpt_output)

Below is a detailed analysis that combines the historical claims data, on‐site risk evaluation, inspection findings, and state compliance issues to determine the appropriate premium adjustment for TechEdge Manufacturing Co.

---

# Executive Summary

• Based on the historical claims data (Dataset 2), TechEdge has filed 5 claims over 5 years with an average severity of approximately $120,000 per claim. This suggests an elevated loss experience, leading to an estimated +35% loading on the base premium.  
• An evaluation of the risk survey (Dataset 3) shows high scores in key areas such as fire (8/10) and equipment risk (7/10) while the property inspection (Dataset 4) reveals only moderate ratings, with fire safety (5/10) and building maintenance (5/10) as concerns. Together, these assessments contribute an additional +10% loading.  
• The state compliance checklist (Dataset 5) shows non-compliance in critical areas (Fire Code and Health & Safety) and partial issues with Structural Safety

In [14]:
o1_output = get_chat_completion(O1_MODEL, demo_prompt)

In [15]:
print(o1_output)

Below is a detailed underwriting analysis for TechEdge Manufacturing Co. that walks through the calculations and findings from each dataset.

──────────────────────────────
Executive Summary
──────────────────────────────
• Based on the historical claims data, the company’s claims frequency and severity (with two open claims and multiple high‐value losses) suggest a notable risk that merits a +15% adjustment over the base premium.  
• The on‐site risk assessment (Dataset 3) and the property inspection report (Dataset 4) both indicate moderate-to-high concerns—especially in fire risk and electrical system maintenance—with an overall combined risk impact of about +10%.  
• Non-compliance with key state regulations as shown in Dataset 5 (fire code and health & safety, in particular) further increases exposure by another +10%.  
• Taken together, we recommend applying a total premium loading of 35% (15% + 10% + 10%) above the base rate.  
• Recommended risk mitigation strategies include sy

In [17]:
result = get_chat_completion(
    O1_MODEL,
    f"""
You are a seasoned expert in evaluating and comparing outputs from large language models. Given a question and two responses, please analyze and compare the answers in detail. Your evaluation should include:

1. **Clarity:** How clear and easy to understand is each answer?
2. **Accuracy & Correctness:** Are the facts and reasoning correct?
3. **Completeness:** Does the answer fully address the question?
4. **Relevance & Adherence:** How well does each answer follow the question’s instructions?

For each answer, highlight its strengths and weaknesses, then provide a concise summary stating which answer is better and why, using specific examples from the responses.

Question: "{gpt_output}"  
Answer 1 (GPT-4o): {gpt_output}  
Answer 2 (O1): {o1_output}
"""
)
display(Markdown(result))


Below is a detailed evaluation comparing Answer 1 and Answer 2 according to clarity, accuracy/correctness, completeness, and adherence to the question’s instructions.

──────────────────────────────
1. Clarity

• Answer 1 is very clear and easy to follow. It uses the same structure as the provided analysis—with sections such as “Executive Summary,” “Detailed Calculations and Analysis,” and “Final Recommendation.” The use of tables and bullet points makes the narrative highly organized and readable.

• Answer 2 is also organized—it clearly breaks the content into an “Executive Summary,” “Detailed Analysis & Calculations,” and “Final Answer.” However, some parts (for example the explanation of the historical claims and risk scores) are slightly less descriptive, and the structure is not as detailed in laying out the numerical work as in Answer 1.

──────────────────────────────
2. Accuracy & Correctness

• Answer 1 correctly reflects the numbers from the question’s analysis. It accurately computes:
  – A +35% adjustment due to historical claims (by summing claims data and comparing with industry standards)
  – A +10% loading from the on-site risk evaluation and inspection findings
  – A +25% penalty from state compliance issues
When these are summed (35% + 10% + 25%), the final premium loading is correctly stated as 70%.

• Answer 2, on the other hand, significantly departs from the provided data. It assigns:
  – +15% for historical claims (instead of +35%)
  – +10% for on-site risk
  – +10% for compliance issues (instead of +25%)
This produces a final total loading of only 35%, which does not align with the detailed reasoning given in the question. These discrepancies make Answer 2 factually incorrect relative to the original analysis.

──────────────────────────────
3. Completeness

• Answer 1 covers every aspect of the analysis in full. It explains the calculations based on each dataset (historical claims, risk survey, property inspection, and compliance checklist), presents detailed tables, and even reiterates the selection of risk mitigation strategies. All the elements of the question are fully addressed.

• Answer 2, while including all major sections (claims, risk assessment, compliance, premium calculation, and mitigation strategies), does not fully incorporate the detailed breakdown provided by the question. The simplified and lower loading percentages mean that it neglects to incorporate the full scale of risk depicted in the numbers. Thus, it is less complete relative to the details of the analysis.

──────────────────────────────
4. Relevance & Adherence

• Answer 1 adheres exactly to the task. It mirrors the structure, data points, and recommendations exactly as laid out in the original detailed analysis:
  – It clearly explains each dataset and shows the same numerical calculations.
  – It outlines risk mitigation strategies that match the scenario.
There is a high degree of relevance as every calculated component and guideline is explicitly covered.

• Answer 2 is relevant in the sense that it covers the same thematic issues (claims, risk evaluations, regulatory compliance, and risk mitigation) but does not adhere as well to the original information. Using different percentage adjustments (15% vs. 35% for claims, 10% vs. 25% for compliance) results in a recommendation (35% overall) that is at odds with the provided detailed data and computed final premium loading in the question. This divergence makes it less faithful to the instructions.

──────────────────────────────
Summary Evaluation

Answer 1 is clearly superior:
 • It is crisp and clear—the organization of content (sections, tables, bullet points) makes it very easy to follow.
 • It is factually accurate: It uses the correct percentage adjustments (+35% for historical claims, +10% for risk survey/inspection, and +25% for state compliance) leading to a final 70% loading, exactly as the provided analysis requires.
 • It is complete: All datasets and risk mitigation strategies are fully reflected.
 • It adheres completely to the instructions and the details of the question.

In contrast, Answer 2, although structured and generally clear, introduces inaccuracies by using lower loading percentages (+15% for claims and +10% for compliance) which lead to a final loading recommendation (35%) that does not reflect the complete risk profile. This deviation from the detailed analysis means Answer 2 fails to fully follow the question’s instructions.

Therefore, Answer 1 is better because it is more accurate, complete, and strictly adheres to the detailed information provided in the question.