Prompt Engineering:

1. Give the model a persona.
2. Split complex tasks into smaller subtasks.
3. Give instructions as explicit steps, and carry the context forward (or restate it) for later steps.
4. Give the model time to think:
    Example: First work out your own solution to the problem. Then compare your solution
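A minimal sketch of how the four tips above might be combined into a single chat payload. The helper name `build_messages`, the persona text, and the step wording are illustrative assumptions, not part of the notebook.

```python
def build_messages(persona, steps, task):
    """Combine a persona, explicit numbered steps, and a 'think first' instruction."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    system = (
        f"{persona}\n\n"
        "Work through the following steps in order:\n"
        f"{numbered}\n\n"
        "First work out your own solution before giving a final answer."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]


# Hypothetical persona, steps, and task, used only to show the shape of the result.
messages = build_messages(
    persona="You are a careful data-visualization analyst.",
    steps=[
        "Read the caption.",
        "List the chart's attributes.",
        "Explain how each attribute is drawn.",
    ],
    task="Analyze the attached chart.",
)
print(messages[0]["content"])
```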
    

In [3]:
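import base64
from pathlib import Path

import pandas as pd

# Spreadsheet mapping each chart's numeric image ID to its full caption.
df = pd.read_excel("ChartswithCaptions.xlsx")

def encode_image(image_path):
    """Return (base64-encoded image, caption) for a chart such as "Charts/2065.png"."""
    # The file stem is the numeric image ID, e.g. 2065.
    image_id = int(Path(image_path).stem)
    caption = df[df["imageid"] == image_id]["full_caption"].values[0]

    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8"), caption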
    

# image_2130, caption_2130 = encode_image("Charts/2130.png")
# image_2107, caption_2107 = encode_image("Charts/2107.png")
image_2065, caption_2065 = encode_image("Charts/2065.png")
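The base64 string returned above is later embedded in a `data:image/png;base64,...` URL. The following sketch shows that step in isolation. The helper name `to_data_url` is hypothetical, and the PNG signature bytes stand in for a real chart file so the example runs without the notebook's data.

```python
import base64


def to_data_url(image_bytes, mime_type="image/png"):
    """Return a base64 data URL for raw image bytes."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte PNG signature is used here as a stand-in for real image content.
png_signature = b"\x89PNG\r\n\x1a\n"
url = to_data_url(png_signature)
print(url)

# Decoding the payload after the comma recovers the original bytes.
payload = url.split(",", 1)[1]
recovered = base64.b64decode(payload)
print(recovered == png_signature)
```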

In [4]:
caption_2065

"Figure 3.35: Deployment of primary energy technologies across post-2001 \nscenarios by 2030 and 2100: Left-side 'error' bars show baseline (non-intervention) \nscenarios and right-side ones show intervention and stabilization scenarios. The full \nranges of the distributions (full vertical line with two extreme tic marks), the 25th and \n75th percentiles (blue area) and the median (middle tic mark) are also shown."

In [5]:
system_prompt = """
You are GPT-4o-mini, a reasoning model specialized in analyzing charts and their captions. Your task is to carefully examine the provided chart and caption, then clearly identify and list the attributes used in the chart.

When presented with a chart and its caption:
    1. Carefully inspect the provided chart and caption.
    2. **Let's think step-by-step.** Identify and list all attributes used in the chart.
    3. **Briefly explain** how each attribute is represented or visualized in the chart (e.g., axis labels, legends, colors, data points).
    4. **Clearly separate** your response into two sections:
        a. "**Identified Attributes:**" (list attributes succinctly)
        b. "**Visualization Explanation:**" (briefly describe how each attribute is visualized)
    5. **Clearly summarize** the specific chart variables used (e.g., x-axis, y-axis, legend, color encoding) separately at the end of your response.

Important instructions for optimal performance:

1. Be concise and direct in your analysis.
2. Do not include unnecessary examples or additional context beyond what is provided.
3. Do not generate or assume any information not explicitly present in the provided chart and caption.
4. Leverage your internal chain-of-thought reasoning capability without explicit prompting for step-by-step reasoning.
5. Ensure your output is accurate, consistent, and directly based on the provided data only.
"""

In [6]:
import os

from openai import OpenAI

# The API key is read from the OPENAI_API_KEY environment variable.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": f"Here is the caption for the image: {caption_2065}"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_2065}"
                    }
                }
            ],
        }
    ]
)
chart_2065 = response.choices[0].message.content

In [7]:
print(chart_2065)

### Identified Attributes:
1. **Categories of Energy Technologies**: Coal, Gas, Oil, Nuclear, Biomass (with distinctions for baseline (B) and stabilization (S) scenarios).
2. **Years**: 2030 (top chart) and 2100 (bottom chart).
3. **Energy Units**: Exajoules (EJ).
4. **Error Bars**: Baseline (non-intervention) and intervention/stabilization scenarios.
5. **Statistical Measures**: Full range (vertical lines), 25th and 75th percentiles (blue area), and median (middle tic mark) for each category.

### Visualization Explanation:
1. **Categories of Energy Technologies**: Represented on the x-axis with labels for each technology, indicating both baseline and stabilization scenarios.
2. **Years**: The two separate charts clearly label the years (2030 and 2100) at the top of each section.
3. **Energy Units**: The y-axis is labeled "EJ" (Exajoules), representing the amount of energy.
4. **Error Bars**: Left-side error bars indicate baseline scenarios, while right-side bars represent interventio

In [8]:
image_2064, caption_2064 = encode_image("Charts/2064.png")
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": f"Here is the caption for the image: {caption_2064}"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_2064}"
                    }
                }
            ],
        }
    ]
)
chart_2064 = response.choices[0].message.content
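The request for chart 2064 repeats the payload built earlier for chart 2065. One possible way to avoid the duplication is a small helper that builds the `messages` list. The name `build_chart_messages` is an assumption, and the placeholder arguments below exist only so the sketch runs on its own.

```python
def build_chart_messages(system_prompt, caption, image_b64):
    """Build the messages list for one chart-analysis request."""
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": f"Here is the caption for the image: {caption}"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        },
    ]


# Placeholder values; in the notebook these would be system_prompt,
# caption_2064, and image_2064.
demo = build_chart_messages("You analyze charts.", "Figure 1: demo", "AAAA")
print(demo[1]["content"][0]["text"])
```

In the notebook, the result could then be passed as `messages=build_chart_messages(system_prompt, caption_2064, image_2064)`.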

In [9]:
print(chart_2064)

**Identified Attributes:**

1. **Time (Ma)** - Horizontal axis across all sections.
2. **Atmospheric CO2 (ppm)** - Vertical axis in the top and bottom sections.
3. **Continental glaciation (° palaeolatitude)** - Secondary vertical axis in the top section.
4. **Deep Ocean Temperature (°C)** - Vertical axis in the middle section.
5. **CO2 Proxies** - Various lines/bars representing different methods (e.g., Stomata, Phytoplankton, Boron).
6. **Geochemical Models** - Area representing plausible ranges of CO2 based on models (e.g., GEOCARB III).
7. **Isotope Values (‰ 18O)** - Vertical axis in the middle section, represented by line data points.
8. **Data layers** - Different colored sections in the plots indicating ranges, averages, or types of data.

**Visualization Explanation:**

1. **Time (Ma)** - Represented on the horizontal axis, spanning from 400 million years ago to the present.
2. **Atmospheric CO2 (ppm)** - Displayed on the vertical axis in both the top and bottom sections with 
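The two responses above use different heading styles for the same sections (`### Identified Attributes:` versus `**Identified Attributes:**`). If the sections need to be processed separately, a tolerant splitter along these lines might help. The function name `split_sections` and the sample strings are assumptions made for illustration, not actual model output.

```python
import re

# Matches a heading-only line such as "### Identified Attributes:"
# or "**Identified Attributes:**".
HEADING = re.compile(r"^\s*(?:#{1,6}\s*|\*\*)([A-Za-z ]+?):?\s*(?:\*\*)?:?\s*$")


def split_sections(text):
    """Map each heading title to the non-empty lines that follow it."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


# Short made-up samples covering both heading styles.
sample_a = "### Identified Attributes:\n1. Years\n\n### Visualization Explanation:\n1. Two panels"
sample_b = "**Identified Attributes:**\n\n1. Time (Ma)\n\n**Visualization Explanation:**\n\n1. Horizontal axis"
for sample in (sample_a, sample_b):
    print(split_sections(sample))
```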

In [13]:
bridging_gpt_system_prompt = """
You are a Bridging GPT responsible for generating meaningful questions based on synthesized attributes from two charts.

You are given the attributes of Chart A and Chart B.
Each chart's description contains "Identified Attributes", "Visualization Explanation", and "Summary of Chart Variables".

1. Analyze the relationships between the attributes of Chart A and Chart B.
2. Generate insightful questions, each with an answer, that explore:
   - Comparisons between the two charts.
   - Trends, patterns, or correlations suggested by their attributes.
   - Real-world implications or hypotheses based on their common attributes (if applicable).

Structure your response as follows:

## Generated Questions and Answers:
- [List meaningful questions here]
- [List corresponding answers here]

Be creative, logical, and ensure that your questions are relevant to the provided attributes and their relationships.
"""

In [14]:
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": bridging_gpt_system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": f"Here is the synthesis of chart A: {chart_2064} and chart B: {chart_2065}"}
            ],
        }
    ]
)
print(response.choices[0].message.content)

## Generated Questions and Answers:

1. **Question:** How do the trends in atmospheric CO2 levels represented in Chart A compare to the projected energy outputs for fossil fuels in Chart B for the years 2030 and 2100?
   **Answer:** Chart A shows a historical rise in atmospheric CO2 levels, particularly correlating with periods of industrialization. In contrast, Chart B projects significant energy outputs from fossil fuel technologies (Coal, Gas, Oil) for both 2030 and 2100, which is likely to further exacerbate CO2 emissions if no stabilization interventions are applied. As such, increased reliance on fossil fuels directly correlates with rising CO2 levels, signifying a concerning trend for climate change.

2. **Question:** What potential implications might the increase in deep ocean temperatures shown in Chart A have on the energy technology adoption patterns depicted in Chart B?
   **Answer:** The increase in deep ocean temperatures highlighted in Chart A suggests a rising trend in 