# Analyzing Data and Interpreting Images with OpenAI's o1 Reasoning Model vs. GPT

## Introduction
OpenAI's o1 reasoning model is designed for complex problem-solving, data analysis, and image interpretation by simulating a multi-step thought process before generating responses. Unlike traditional GPT models, which produce output in a single pass, reasoning models use internal **reasoning tokens** to explore multiple approaches before finalizing an answer.
<p align="center">
    <img src="https://cdn.openai.com/API/images/guides/reasoning_tokens.png" alt="Reasoning Tokens" width="600">
</p>  

*Source: [OpenAI Reasoning Models Guide](https://platform.openai.com/docs/guides/reasoning)*

**Key Differences: o1 Reasoning Model vs. GPT**
- Multi-step reasoning: o1 evaluates different solutions before selecting the best response.
- Deeper analytical capabilities: Optimized for complex data interpretation tasks.
- Context-aware image analysis: Provides more structured and insightful image descriptions.
- Reasoning effort control: Users can adjust the depth of reasoning (`low`, `medium`, `high`).


For more details, refer to the [OpenAI Reasoning Models Guide](https://platform.openai.com/docs/guides/reasoning).


## Purchase and Store API Key

You need to **purchase** your [OpenAI](https://openai.com/) API key and store it securely, such as in **AWS Secrets Manager**.

- **Key Name:** `api_key`  
- **Key Value:** `<your OpenAI API key>`  
- **Secret Name:** `openai`  

## Install Python Libraries

- **openai**: Used to call `o1` and `GPT` models for data analysis and image interpretation.

In [1]:
pip install openai -q

Note: you may need to restart the kernel to use updated packages.


## Import Required Libraries

The following libraries are used in this notebook:

- **boto3**: AWS SDK for Python, used to interact with AWS services.
- **json**: Standard Python library for handling JSON data.
- **IPython.display**: Provides tools to display images, Markdown content, and other rich media in Jupyter Notebook.
- **openai**: Used to call `o1` and `GPT` models for data analysis and image interpretation.
- **pandas**: A powerful library for data manipulation and analysis.
- **pprint**: Pretty prints data structures for better readability.

In [2]:
import boto3
import json
from IPython.display import display, Image, Markdown
from openai import OpenAI
import pandas as pd
from pprint import pprint

## Retrieve API Keys Securely from AWS Secrets Manager

The following function, `get_secret()`, retrieves a secret from **AWS Secrets Manager**. This is a secure way to store and access sensitive credentials, such as API keys, without hardcoding them into the script.

In [3]:
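from botocore.exceptions import ClientError


def get_secret(secret_name):
    """Return the secret stored under secret_name in AWS Secrets Manager as a dict."""
    region_name = "us-east-1"

    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError as e:
        # Surface AWS errors (missing secret, access denied, and so on) to the caller
        raise e

    # The secret is stored as a JSON string, e.g. {"api_key": "..."}
    secret = get_secret_value_response['SecretString']

    return json.loads(secret)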
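
Because `get_secret()` returns the parsed JSON of the secret, the value stored under the `openai` secret is expected to look like `{"api_key": "..."}`. The following is a minimal offline sketch of that parsing step, with a clearer error when the field is missing. The helper name `extract_api_key` and the sample string are hypothetical and not part of the original notebook.

```python
import json


def extract_api_key(secret_string, key_name="api_key"):
    """Parse a SecretString payload and return the value stored under key_name.

    Hypothetical helper for illustration only; it makes no AWS call.
    """
    secret = json.loads(secret_string)
    if key_name not in secret:
        raise KeyError(f"Secret JSON has no '{key_name}' field")
    return secret[key_name]


# Sample payload in the shape described above (placeholder value, not a real key).
sample_secret_string = '{"api_key": "sk-placeholder"}'
print(extract_api_key(sample_secret_string))
```

Checking the field explicitly makes a misnamed key in Secrets Manager fail with a readable message rather than a bare `KeyError` at client initialization.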

## Initialize OpenAI Client

The following code initializes the OpenAI client using a securely stored API key retrieved from AWS Secrets Manager.

In [4]:
client = OpenAI(api_key= get_secret('openai')['api_key'])

## Load and Analyze the Asylum Applications Dataset

This notebook uses an asylum applications dataset (`full_asylum_apps_copy.csv`) that records, for each country and year, the number of asylum applications received (`Count`) and the number accepted (`Accepted`).

The question of interest is how the number of accepted applicants in **France** and **Germany** fluctuates from year to year.

In [6]:
df = pd.read_csv('full_asylum_apps_copy.csv')
data_json = df.to_json(orient="records")
df.head()

| Unnamed: 0 | Country | Year | Count | Accepted |
|---|---|---|---|---|
| 0 | France | 2008 | 31760 | 5150 |
| 1 | France | 2009 | 35295 | 5055 |
| 2 | France | 2010 | 37610 | 5100 |
| 3 | France | 2011 | 42215 | 4620 |
| 4 | France | 2012 | 59805 | 8640 |
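
One quantity that helps when reading these columns is the acceptance rate, `Accepted / Count`. The sketch below computes it for the five rows shown above using plain Python, with the rows written in the list-of-dicts shape that `to_json(orient="records")` serializes. The helper name `acceptance_rates` is hypothetical, and the rate is an illustrative metric rather than something the notebook itself computes.

```python
# First five rows of the dataset, copied from the df.head() output above.
records = [
    {"Country": "France", "Year": 2008, "Count": 31760, "Accepted": 5150},
    {"Country": "France", "Year": 2009, "Count": 35295, "Accepted": 5055},
    {"Country": "France", "Year": 2010, "Count": 37610, "Accepted": 5100},
    {"Country": "France", "Year": 2011, "Count": 42215, "Accepted": 4620},
    {"Country": "France", "Year": 2012, "Count": 59805, "Accepted": 8640},
]


def acceptance_rates(rows):
    """Return {(country, year): accepted / count} for each row."""
    return {(r["Country"], r["Year"]): r["Accepted"] / r["Count"] for r in rows}


rates = acceptance_rates(records)
for (country, year), rate in sorted(rates.items()):
    print(f"{country} {year}: {rate:.1%}")
```

For these rows, the rate falls from about 16% in 2008 to about 11% in 2011 before recovering in 2012, even though the application count rises every year.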


## Generate Data Analysis Prompt for OpenAI Model

To investigate how France and Germany fluctuate in accepting asylum applicants, we generate a prompt for the OpenAI model that embeds the dataset as JSON. The model is asked to analyze the data and provide insights, including **Python code for visualizations**.


In [7]:
data_prompt = f"Analyze the provided data and determine how France and Germany fluctuate in accepting asylum applicants. Provide Python-generated charts to support your conclusion. Data: {data_json}"
# print(data_prompt)

## Define a Function to Get Assistance from OpenAI GPT-4o

The following function, `openai_gpt_help()`, sends a prompt to OpenAI's **GPT-4o model** and returns a response. It also prints the number of tokens used in the request.

In [8]:
def openai_gpt_help(prompt):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=messages,
        temperature=0
    )
    token_usage = response.usage

    pprint(f"Tokens used: {token_usage}")

    return response.choices[0].message.content

In [14]:
gpt_result = openai_gpt_help(prompt=data_prompt)

('Tokens used: CompletionUsage(completion_tokens=1502, prompt_tokens=681, '
 'total_tokens=2183, '
 'completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, '
 'audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), '
 'prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))')


In [15]:
display(Markdown(gpt_result))

To analyze the fluctuations in asylum applications and acceptance rates for France and Germany, we can visualize the data using Python. We'll use libraries such as `pandas` for data manipulation and `matplotlib` for plotting the charts. Let's create line charts to show the trends over the years for both countries.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data
data = [
    {"Country": "France", "Year": 2008, "Count": 31760, "Accepted": 5150},
    {"Country": "France", "Year": 2009, "Count": 35295, "Accepted": 5055},
    {"Country": "France", "Year": 2010, "Count": 37610, "Accepted": 5100},
    {"Country": "France", "Year": 2011, "Count": 42215, "Accepted": 4620},
    {"Country": "France", "Year": 2012, "Count": 59805, "Accepted": 8640},
    {"Country": "France", "Year": 2013, "Count": 61715, "Accepted": 10705},
    {"Country": "France", "Year": 2014, "Count": 68500, "Accepted": 14810},
    {"Country": "France", "Year": 2015, "Count": 77910, "Accepted": 20635},
    {"Country": "France", "Year": 2016, "Count": 87485, "Accepted": 28750},
    {"Country": "France", "Year": 2017, "Count": 110945, "Accepted": 32565},
    {"Country": "France", "Year": 2018, "Count": 115050, "Accepted": 32720},
    {"Country": "France", "Year": 2019, "Count": 113895, "Accepted": 28140},
    {"Country": "France", "Year": 2020, "Count": 86330, "Accepted": 19130},
    {"Country": "France", "Year": 2021, "Count": 137015, "Accepted": 33880},
    {"Country": "France", "Year": 2022, "Count": 129655, "Accepted": 35675},
    {"Country": "France", "Year": 2023, "Count": 132640, "Accepted": 41660},
    {"Country": "France", "Year": 2024, "Count": 138385, "Accepted": 52125},
    {"Country": "Germany", "Year": 2008, "Count": 19245, "Accepted": 7870},
    {"Country": "Germany", "Year": 2009, "Count": 26765, "Accepted": 9765},
    {"Country": "Germany", "Year": 2010, "Count": 45305, "Accepted": 10445},
    {"Country": "Germany", "Year": 2011, "Count": 40280, "Accepted": 9675},
    {"Country": "Germany", "Year": 2012, "Count": 58605, "Accepted": 17140},
    {"Country": "Germany", "Year": 2013, "Count": 76170, "Accepted": 20120},
    {"Country": "Germany", "Year": 2014, "Count": 97280, "Accepted": 53820},
    {"Country": "Germany", "Year": 2015, "Count": 249280, "Accepted": 140910},
    {"Country": "Germany", "Year": 2016, "Count": 631090, "Accepted": 433905},
    {"Country": "Germany", "Year": 2017, "Count": 524200, "Accepted": 261625},
    {"Country": "Germany", "Year": 2018, "Count": 179110, "Accepted": 75935},
    {"Country": "Germany", "Year": 2019, "Count": 154180, "Accepted": 70315},
    {"Country": "Germany", "Year": 2020, "Count": 128585, "Accepted": 62475},
    {"Country": "Germany", "Year": 2021, "Count": 132675, "Accepted": 59850},
    {"Country": "Germany", "Year": 2022, "Count": 197540, "Accepted": 128460},
    {"Country": "Germany", "Year": 2023, "Count": 217430, "Accepted": 135275},
    {"Country": "Germany", "Year": 2024, "Count": 250255, "Accepted": 133710}
]

# Convert to DataFrame
df = pd.DataFrame(data)

# Separate data for France and Germany
france_data = df[df['Country'] == 'France']
germany_data = df[df['Country'] == 'Germany']

# Plotting
plt.figure(figsize=(14, 6))

# France
plt.subplot(1, 2, 1)
plt.plot(france_data['Year'], france_data['Count'], label='Applications', marker='o')
plt.plot(france_data['Year'], france_data['Accepted'], label='Accepted', marker='o')
plt.title('France Asylum Applications and Acceptances')
plt.xlabel('Year')
plt.ylabel('Number of People')
plt.legend()
plt.grid(True)

# Germany
plt.subplot(1, 2, 2)
plt.plot(germany_data['Year'], germany_data['Count'], label='Applications', marker='o')
plt.plot(germany_data['Year'], germany_data['Accepted'], label='Accepted', marker='o')
plt.title('Germany Asylum Applications and Acceptances')
plt.xlabel('Year')
plt.ylabel('Number of People')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
```

### Analysis

1. **France:**
   - The number of asylum applications in France has generally increased over the years, with a notable rise from 2011 to 2017.
   - The number of accepted applications also increased, particularly from 2015 onwards, indicating a more accommodating stance or increased processing capacity.
   - There was a dip in both applications and acceptances in 2020, likely due to the COVID-19 pandemic, but numbers have since rebounded.

2. **Germany:**
   - Germany experienced a significant surge in asylum applications in 2015 and 2016, corresponding to the European migrant crisis.
   - The acceptance rate also peaked during these years, reflecting Germany's open-door policy at the time.
   - Post-2016, both applications and acceptances decreased significantly, stabilizing at lower levels compared to the peak years.
   - Recent years show a gradual increase in applications and acceptances, indicating a potential new trend of rising asylum seekers.

These visualizations and analyses provide insights into the trends and fluctuations in asylum applications and acceptances in France and Germany over the years.
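
The response above embeds its chart code in a fenced Markdown block. To run that code, it first has to be separated from the surrounding prose. The following is a minimal sketch of one way to do so with a regular expression. The helper `extract_code_blocks` is hypothetical, and the sample string stands in for a real model response.

```python
import re

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3


def extract_code_blocks(markdown_text, language="python"):
    """Return the contents of every fenced block tagged with the given language."""
    pattern = re.compile(
        re.escape(FENCE) + re.escape(language) + r"[ \t]*\n(.*?)" + re.escape(FENCE),
        re.DOTALL,
    )
    return [block.strip() for block in pattern.findall(markdown_text)]


# Stand-in for a model response containing one Python block.
sample = f"Intro text\n{FENCE}python\nprint('hello')\n{FENCE}\nClosing text"
blocks = extract_code_blocks(sample)
print(blocks)
```

Model-generated code should be reviewed before it is executed, since it may not match the actual dataset or environment.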

## Define a Function to Get Assistance from OpenAI o1 Model  

The following function, `openai_o_help()`, sends a prompt to OpenAI's **o1 reasoning model** and returns a response.  

### Key Differences Between o1 and GPT Models:
- **Reasoning Effort**: The o1 model allows users to control reasoning depth using `reasoning_effort` (`low`, `medium`, `high`).  
- **No Temperature Parameter**: Unlike GPT models, **o1 does not support `temperature`**.  
- **Developer Messages Replace System Messages**:  
  - Starting with `o1-2024-12-17`, **developer messages** replace **system messages** to align with chain-of-command behavior.  

### Best Practices for Prompting o1  
- **Keep prompts simple and direct.**  
- **Avoid chain-of-thought prompts.** o1 reasons internally, so step-by-step instructions aren't needed.  
- **Use delimiters for clarity.** Use Markdown, XML tags, or section titles.  
- **Try zero-shot first.** If needed, add few-shot examples that closely match your goal.  
- **Be explicit.** Clearly define success criteria and constraints.  
- **Markdown is disabled by default.** To re-enable it, include the string `Formatting re-enabled` on the first line of the developer message.  

Source: [OpenAI Reasoning Models Best Practices Guide](https://platform.openai.com/docs/guides/reasoning-best-practices).  
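
As an illustration of the delimiter and explicit-criteria practices listed above, the sketch below assembles a prompt from clearly separated sections. The function name `build_delimited_prompt` and the tag names (`task`, `success_criteria`, `data`) are hypothetical choices; any consistent delimiter scheme would serve the same purpose.

```python
def build_delimited_prompt(task, success_criteria, data_json):
    """Wrap each part of the prompt in its own XML-style section."""
    criteria = "\n".join(f"- {item}" for item in success_criteria)
    return (
        f"<task>\n{task}\n</task>\n"
        f"<success_criteria>\n{criteria}\n</success_criteria>\n"
        f"<data>\n{data_json}\n</data>"
    )


example_prompt = build_delimited_prompt(
    task="Determine how France and Germany fluctuate in accepting asylum applicants.",
    success_criteria=[
        "Compare both countries year by year",
        "Include Python code for charts",
    ],
    data_json='[{"Country": "France", "Year": 2008, "Count": 31760, "Accepted": 5150}]',
)
print(example_prompt)
```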


In [11]:
def openai_o_help(prompt):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model='o1',
        reasoning_effort="high",  # low, medium or high
        messages=messages,
    )
    token_usage = response.usage

    pprint(f"Tokens used: {token_usage}")

    return response.choices[0].message.content
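
The function above hardcodes `reasoning_effort="high"` and sends only a user message. The sketch below shows how the request arguments might be assembled when the effort level is configurable and a developer message is used to re-enable Markdown. It only builds a dictionary and makes no API call. The helper `build_o1_request` and its parameters are hypothetical.

```python
VALID_EFFORTS = {"low", "medium", "high"}


def build_o1_request(prompt, effort="medium", enable_markdown=False):
    """Build keyword arguments for a chat completion request to o1."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {sorted(VALID_EFFORTS)}, got {effort!r}")

    messages = []
    if enable_markdown:
        # Developer message, following the formatting note in the best practices above.
        messages.append({"role": "developer", "content": "Formatting re-enabled"})
    messages.append({"role": "user", "content": prompt})

    # No temperature key: the notes above state that o1 does not support it.
    return {"model": "o1", "reasoning_effort": effort, "messages": messages}


request = build_o1_request("Summarize the trend.", effort="low", enable_markdown=True)
print(request)

# With a configured client, the request could then be sent as:
# response = client.chat.completions.create(**request)
```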

In [12]:
o1_result = openai_o_help(prompt=data_prompt)

('Tokens used: CompletionUsage(completion_tokens=3658, prompt_tokens=680, '
 'total_tokens=4338, '
 'completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, '
 'audio_tokens=0, reasoning_tokens=1280, rejected_prediction_tokens=0), '
 'prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))')
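
The two usage printouts in this notebook can be compared directly. The sketch below copies their figures into plain dictionaries and separates the visible output tokens from the hidden reasoning tokens. It assumes, as the nesting under `completion_tokens_details` suggests, that `reasoning_tokens` is counted within `completion_tokens`. The helper names are hypothetical.

```python
# Figures copied from the two "Tokens used" outputs in this notebook.
gpt4o_usage = {"completion_tokens": 1502, "reasoning_tokens": 0, "total_tokens": 2183}
o1_usage = {"completion_tokens": 3658, "reasoning_tokens": 1280, "total_tokens": 4338}


def visible_output_tokens(usage):
    """Completion tokens that appear in the response text (assumption: reasoning is a subset)."""
    return usage["completion_tokens"] - usage["reasoning_tokens"]


def reasoning_share(usage):
    """Fraction of completion tokens spent on internal reasoning."""
    return usage["reasoning_tokens"] / usage["completion_tokens"]


for name, usage in [("gpt-4o", gpt4o_usage), ("o1", o1_usage)]:
    print(
        f"{name}: visible={visible_output_tokens(usage)}, "
        f"reasoning share={reasoning_share(usage):.0%}, "
        f"total={usage['total_tokens']}"
    )
```

Under that assumption, roughly a third of the o1 completion tokens in this run went to reasoning that never appears in the returned text.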


In [13]:
print(o1_result)

Below is a step-by-step analysis of the provided data for France and Germany, focusing on how they fluctuate in “accepting” asylum applicants. The Python code at the end creates charts that illustrate these trends.

────────────────────────────────────────────
1. Data Overview
────────────────────────────────────────────

• The dataset includes yearly asylum application counts (Count) and accepted asylum outcomes (Accepted) from 2008 through 2024 for both France and Germany.  
• Key columns:  
  – Country: France or Germany  
  – Year: 2008–2024  
  – Count: Number of asylum applications  
  – Accepted: Number of accepted applications  

────────────────────────────────────────────
2. General Observations
────────────────────────────────────────────

• France (2008–2024):
  – The total number of asylum applications (“Count”) grows steadily over time, from around 31,760 in 2008 to 138,385 in 2024.  
  – Accepted applications also increase, though at a slower pace initially. Notably, 201

## References  
- **OpenAI Reasoning Models Guide**: [OpenAI](https://platform.openai.com/docs/guides/reasoning)  
- **OpenAI Reasoning Models Best Practices Guide**: [OpenAI](https://platform.openai.com/docs/guides/reasoning-best-practices)  
- **Colin Jarvis. “Reasoning with O1.” DeepLearning.AI.** Accessed February 14, 2025. [DeepLearning.AI](https://www.deeplearning.ai/short-courses/reasoning-with-o1/)  

## Reflection
- The model's reasoning supported my analysis. The dataset came from my capstone project, and the model reached the same conclusion my team did.
- This approach could be applied to real-world intelligence workflows, as the model was able to bring in outside context (references to COVID-19, the European migrant crisis, etc.) and relate it to the analysis.
- One ethical concern is that same ability to bring in outside context. I asked a fairly simple, unbiased question, but it would be easy for someone to build bias into a prompt and then use the model's response to justify the claim.