## Pipeline: LLM-powered program generation for solving ARC-AGI

### Imports

In [289]:
import numpy as np
import ollama
import os
import json
from dotenv import load_dotenv
from openai import OpenAI
import re

### Shared Variables

In [290]:
# Shared system prompt for all tasks
system_prompt = """
You are a visual reasoning and Python programming expert solving ARC-AGI (Abstraction and Reasoning Corpus - Artificial General Intelligence) tasks.

Each integer in the grid represents a color:
0 = black, 1 = blue, 2 = red, 3 = green, 4 = yellow,
5 = grey, 6 = pink, 7 = orange, 8 = light blue, 9 = brown.
"""
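The color legend in the system prompt can also be kept as a Python mapping, so grids and model answers can be described with the same names the model sees. This is a minimal sketch. `COLOR_NAMES` and `color_histogram` are hypothetical helpers and are not part of the pipeline below.

```python
from collections import Counter
from typing import Dict, List

# Same integer-to-color legend as in system_prompt (hypothetical helper constant)
COLOR_NAMES: Dict[int, str] = {
    0: "black", 1: "blue", 2: "red", 3: "green", 4: "yellow",
    5: "grey", 6: "pink", 7: "orange", 8: "light blue", 9: "brown",
}


def color_histogram(grid: List[List[int]]) -> Dict[str, int]:
    """Count how many cells of each color appear in a grid, keyed by color name."""
    counts = Counter(cell for row in grid for cell in row)
    return {COLOR_NAMES[value]: n for value, n in sorted(counts.items())}


print(color_histogram([[0, 0, 2], [2, 3, 0]]))
```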


### Prompts

#### Basic Prompt

In [291]:
base_prompt = """
Write a Python function that correctly transforms each input grid into its corresponding output grid based on the given examples.

- The function must be named: `solve(grid: List[List[int]]) -> List[List[int]]`
- Include only the code and necessary imports (e.g., `import numpy as np`)
- Do not include comments, explanations, or print statements
- Do not hard-code values or specific grid sizes — the function must generalize based on the patterns in the examples
- Ensure your solution works for all provided input-output pairs
"""
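As an illustration, a response that satisfies these constraints could look like the hypothetical program below. The transformation (mirroring each row left to right) is invented for the example and is not the rule of any particular task. The required signature uses `List`, so a conforming program also needs `from typing import List`. Otherwise, defining the function raises a `NameError`.

```python
from typing import List


# Hypothetical example; an actual model response would omit comments, as the prompt requires
def solve(grid: List[List[int]]) -> List[List[int]]:
    return [list(reversed(row)) for row in grid]
```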


#### Prompt 1

In [293]:
prompt_1 = """
List visual observations from the training pairs.

- Use bullet points (max 10).
- Focus on colors, shapes, object counts, positions, and differences.
- Avoid reasoning or explanations.
- Be concise. No full sentences, no extra formatting.
"""
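The model is asked for at most 10 bullet points, so its reply can be checked cheaply before it is passed to the next prompt. This is a minimal sketch. It assumes bullets start with `-`, `*`, or `•`. `extract_bullets` and `check_bullet_response` are hypothetical helpers.

```python
from typing import List


def extract_bullets(text: str) -> List[str]:
    """Return the bullet lines of a response, without their bullet markers."""
    bullets = []
    for line in text.splitlines():
        stripped = line.strip()
        # Treat lines starting with a common bullet marker as bullet points
        if stripped[:1] in ("-", "*", "•"):
            bullets.append(stripped[1:].strip())
    return bullets


def check_bullet_response(text: str, max_bullets: int = 10) -> bool:
    """True if the response contains between 1 and max_bullets bullet points."""
    return 1 <= len(extract_bullets(text)) <= max_bullets
```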


#### Prompt 2

In [295]:
prompt_2 = """
Describe the transformation(s) from input to output grids.

- Use 3 to 5 short sentences.
- Focus on what changes: movement, color, shape, duplication, etc.
- Mention if the transformation is based on position, context, or rules.
- Avoid implementation hints or code.
"""


#### Prompt 3

In [297]:
prompt_3 = """
Reflect on how you would solve the task in Python.

- Use 3 to 5 sentences.
- Mention your overall approach, logical steps, and possible uncertainties.
- Do not return code or pseudocode.
"""
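Prompts 2 and 3 both ask for 3 to 5 sentences. You can check this roughly by splitting on sentence-ending punctuation. This is a heuristic sketch: abbreviations such as "e.g." or decimals such as "0.5" may be miscounted. `count_sentences` and `within_sentence_limit` are hypothetical helpers.

```python
import re


def count_sentences(text: str) -> int:
    """Approximate sentence count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return sum(1 for part in parts if part)


def within_sentence_limit(text: str, low: int = 3, high: int = 5) -> bool:
    """True if the approximate sentence count lies in [low, high]."""
    return low <= count_sentences(text) <= high
```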


#### Prompt 4

In [299]:
# TODO: Feed the outputs of earlier secondary prompts into later ones (especially prompt 3), possibly via a generic "buildPrompt" function.
# TODO: Create a revision prompt.

In [300]:
prompt_4 = ""
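The TODO mentions a generic prompt builder. One possible shape is a single function that appends any number of labelled earlier responses to a template, which could stand in for the separate `combine_prompts_*` helpers. This is a hypothetical sketch. The name `chain_prompt` and the section labels are illustrative only.

```python
from typing import Dict


def chain_prompt(template: str, context: Dict[str, str], closing: str = "") -> str:
    """Append labelled context sections (e.g. earlier model responses) to a prompt template."""
    parts = [template.strip()]
    for label, text in context.items():
        # Each earlier response becomes its own titled section, in insertion order
        parts.append(f"{label}:\n{text.strip()}")
    if closing:
        parts.append(closing.strip())
    return "\n\n".join(parts)
```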

### Functions

#### Load Tasks

In [301]:
def load_tasks(folder):
    tasks = []
    for filename in sorted(os.listdir(folder)):
        if filename.endswith(".json"):
            with open(os.path.join(folder, filename), "r") as f:
                data = json.load(f)
                tasks.append({"filename": filename, "data": data})
    return tasks
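Each ARC task file holds a JSON object with a `train` list and a `test` list. Training entries contain an `input` and an `output` grid, where a grid is a list of lists of integers 0 to 9. Test entries contain at least an `input`. The `data` field returned by `load_tasks` has this shape. A small validation sketch can catch malformed files before any API calls are made. `is_grid` and `is_valid_task` are hypothetical helpers.

```python
from typing import Any, Dict, List


def is_grid(value: Any) -> bool:
    """A grid is a non-empty rectangular list of lists of integers in 0..9."""
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(row, list) and row for row in value):
        return False
    width = len(value[0])
    return all(
        len(row) == width and all(isinstance(c, int) and 0 <= c <= 9 for c in row)
        for row in value
    )


def is_valid_task(data: Dict[str, Any]) -> bool:
    """Check the train/test structure of an ARC task dictionary."""
    train: List[Dict[str, Any]] = data.get("train", [])
    test: List[Dict[str, Any]] = data.get("test", [])
    if not train or not test:
        return False
    train_ok = all(is_grid(p.get("input")) and is_grid(p.get("output")) for p in train)
    # Test outputs may be withheld, so only the input grid is required here
    test_ok = all(is_grid(p.get("input")) for p in test)
    return train_ok and test_ok
```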

#### Load API-Key

In [302]:
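def load_api_key(file_path="key.env"):
    # Load OPENAI_API_KEY (and any other variables) from the .env file into os.environ
    load_dotenv(file_path)
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError(f"No API key found. Please set OPENAI_API_KEY in {file_path}.")
    # Create the shared client that call_gpt() uses
    global client
    client = OpenAI(api_key=api_key)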

#### Building and Combining Prompts

Appends the task's demonstration pairs to a prompt:

In [303]:
def build_prompt(prompt, task_data):
    full_prompt = prompt.strip() + "\n\nHere are the demonstration pairs (JSON data):\n"
    for i, pair in enumerate(task_data['train']):
        full_prompt += f"\nTrain Input {i+1}: {pair['input']}\n"
        full_prompt += f"Train Output {i+1}: {pair['output']}\n"
    return full_prompt
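`build_prompt` inserts each grid as a Python list literal. An alternative is to print one row per line with space-separated digits, which may make the row structure easier for the model to see. This pipeline does not do that; whether it helps is an assumption you would need to test. Here is a sketch with the hypothetical helper `format_grid`:

```python
from typing import List


def format_grid(grid: List[List[int]]) -> str:
    """Render a grid with one row per line and space-separated cell values."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


print(format_grid([[2, 2, 0], [0, 2, 2]]))
```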

Combines secondary prompts 1 and 2:

In [304]:
def combine_prompts_1_and_2(prompt_1_response, prompt_2_template):
    combined_prompt = f"""{prompt_2_template.strip()}

Here are visual observations of the task at hand that may help you identify the transformation:

{prompt_1_response.strip()}

Now provide your transformation analysis based on these observations."""
    return combined_prompt

Combines secondary prompts 1, 2, and 3:

In [305]:
def combine_prompts_1_2_and_3(prompt_1_response, prompt_2_response, prompt_3_template):
    combined_prompt = f"""{prompt_3_template.strip()}

Here are visual observations of the task that may help inform your implementation:
{prompt_1_response.strip()}

Here are the transformation rules that have been identified based on the task:
{prompt_2_response.strip()}

Now reflect on how you would implement a solution to this task in Python, following the instructions above.
"""
    return combined_prompt


Combines secondary prompt 3 with the base prompt:

In [306]:
def combine_prompts_3_and_base(prompt_3_response, prompt_base_template):
    combined_prompt = f"""
Implementation Reflection:
{prompt_3_response.strip()}

{prompt_base_template.strip()}
"""
    return combined_prompt.strip()
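Together, the combine helpers pass each response into the next prompt. The observations (1) feed the transformation analysis (2), both feed the implementation reflection (3), and the reflection is prepended to the base prompt. The same flow can be written as one function that takes the model call as a parameter, so it can be tested offline with a stub. This is a hypothetical sketch. `run_prompt_chain` and its `ask` parameter are not part of the original code, the prompt wording is simplified, and the demonstration pairs that `build_prompt` would append are left out.

```python
from typing import Callable, List, Tuple


def run_prompt_chain(
    templates: List[str],
    final_template: str,
    ask: Callable[[str], str],
) -> Tuple[List[str], str]:
    """Run secondary prompts in order, feeding all earlier responses into each later prompt."""
    responses: List[str] = []
    for template in templates:
        if responses:
            context = "\n\n".join(responses)
            prompt = f"{template}\n\nEarlier analysis:\n{context}"
        else:
            prompt = template
        responses.append(ask(prompt))
    # The last secondary response (the reflection) is prepended to the final prompt
    final_prompt = f"{responses[-1]}\n\n{final_template}" if responses else final_template
    return responses, ask(final_prompt)
```

In the notebook, `ask` would be `call_gpt`, with each prompt also passed through `build_prompt`.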

#### Save Programs

In [307]:
def save_program(program_text, task_id):
    # Define the base and task-specific folder paths
    base_folder = "Candidate_programs"
    task_folder = os.path.join(base_folder, f"task_{task_id}")

    # Create the task-specific folder if it doesn't exist
    os.makedirs(task_folder, exist_ok=True)

    # Remove ```python / ``` fence lines if present (whole lines only, so code indentation is preserved)
    cleaned_text = re.sub(r"^```(?:python)?[ \t]*$", "", program_text.strip(), flags=re.MULTILINE)

    # Find the next available version number (only files named exactly solution_vN.py count)
    existing_files = os.listdir(task_folder)
    version_numbers = [
        int(match.group(1))
        for fname in existing_files
        if (match := re.fullmatch(r"solution_v(\d+)\.py", fname))
    ]
    next_version = max(version_numbers, default=0) + 1

    # Define the full path to the new Python file
    file_path = os.path.join(task_folder, f"solution_v{next_version}.py")

    # Save the program text to the file
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(cleaned_text.strip())

    
    print(f"✅ Saved program for task {task_id} as version {next_version}: {file_path}")
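`save_program` does two text-processing steps: it removes Markdown code fences and picks the next version number. Both can be isolated as pure functions and tested without touching the file system. This is a sketch. `strip_code_fences` and `next_version` are hypothetical names. The line-based fence removal assumes the model wraps its answer in at most one fenced block that opens on the first line and closes on the last.

```python
import re
from typing import Iterable


def strip_code_fences(text: str) -> str:
    """Drop an opening fence line (e.g. a python fence) and a closing fence line, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    return "\n".join(lines).strip()


def next_version(filenames: Iterable[str]) -> int:
    """Return 1 + the highest N among files named exactly solution_vN.py (1 if there are none)."""
    versions = [
        int(m.group(1))
        for name in filenames
        if (m := re.fullmatch(r"solution_v(\d+)\.py", name))
    ]
    return max(versions, default=0) + 1
```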


#### Call GPT

In [308]:
def call_gpt(prompt):
    # Send the shared system prompt plus one user prompt to the model and return the reply text
    response = client.chat.completions.create(
        model="o3-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content.strip()
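API calls can fail transiently, for example because of rate limits or timeouts. The `openai` Python client already accepts a `max_retries` argument when constructing `OpenAI(...)`. Another option, not part of the original pipeline, is an explicit retry loop with exponential backoff around `call_gpt`. This is a generic sketch for any zero-argument callable. `with_retries` is hypothetical. In practice you would narrow `retry_on` to the client's error types instead of catching every `Exception`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(), retrying on the given exceptions with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")
```

Usage in the pipeline would look like `response_1 = with_retries(lambda: call_gpt(full_prompt_1))`.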


### Pipeline

In [312]:
tasks = load_tasks("evaluation_set")
load_api_key()

# Select the task with filename "da2b0fe3.json"
target_filename = "da2b0fe3.json"
task = next(t for t in tasks if t["filename"] == target_filename)

# Use the task directly (no loop needed for a single task)
task_id = tasks.index(task) + 1  # 1-based position of the task in the sorted task list

# Build and run secondary prompts:
full_prompt_1 = build_prompt(prompt_1, task["data"])
response_1 = call_gpt(full_prompt_1)
print(f"Prompt 1 Response for task {task_id}:\n{response_1}\n")

combined_prompt_2 = combine_prompts_1_and_2(response_1, prompt_2)
full_prompt_2 = build_prompt(combined_prompt_2, task["data"])
response_2 = call_gpt(full_prompt_2)
print(f"Prompt 2 Response for task {task_id}:\n{response_2}\n")

combined_prompt_3 = combine_prompts_1_2_and_3(response_1, response_2, prompt_3)
full_prompt_3 = build_prompt(combined_prompt_3, task["data"])
response_3 = call_gpt(full_prompt_3)
print(f"Prompt 3 Response for task {task_id}:\n{response_3}\n")

combined_prompt_base = combine_prompts_3_and_base(response_3, base_prompt)
tailored_prompt = build_prompt(combined_prompt_base, task["data"])
response = call_gpt(tailored_prompt)
print(f"Base Prompt Response for task {task_id}:\n{response}\n")

save_program(response, task_id)


Prompt 1 Response for task 83:
- Predominantly blank backgrounds (color 0) in all grids.
- Central patterns formed by nonzero colors (2, 1, 5) in inputs.
- Train 1: Color 2 arranged in a rectangular/square pattern.
- Train 2: Color 1 arranged in an uneven, connected pattern.
- Train 3: Color 5 arranged in a blocky, repeated pattern.
- Outputs add a continuous line of color 3 (green) across one row in Train 1 and 2.
- Output 3 shows a vertical stripe of color 3 (green) replacing part of the structure.
- Underlying object shapes remain while a green line is superimposed.

Prompt 2 Response for task 83:
In each case the base pattern is preserved, but a new continuous green (color 3) line is added. In the first two examples, a horizontal green stripe is inserted along the fifth row of the grid, overriding any existing values in that row. In the third example, a vertical green stripe is drawn down the center column instead. The transformation is positional, replacing an entire row or column
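The base prompt asks for a program that works on all demonstration pairs, but the pipeline saves the model's response without running it. A natural follow-up is to execute the saved program and compare `solve(input)` with each expected output. This is a minimal sketch, assuming the program defines `solve` at top level. `score_program` is hypothetical. `exec` runs model-generated code with full privileges, so only do this in a sandbox.

```python
from typing import Any, Dict, List


def score_program(program_text: str, pairs: List[Dict[str, Any]]) -> float:
    """Return the fraction of input/output pairs that the program's solve() reproduces."""
    namespace: Dict[str, Any] = {}
    try:
        exec(program_text, namespace)  # WARNING: runs untrusted, model-generated code
        solve = namespace["solve"]
    except Exception:
        return 0.0  # code that fails to load, or lacks solve(), scores zero
    correct = 0
    for pair in pairs:
        try:
            result = solve(pair["input"])
            # Normalise NumPy arrays to nested lists before comparing
            if hasattr(result, "tolist"):
                result = result.tolist()
            if [list(row) for row in result] == pair["output"]:
                correct += 1
        except Exception:
            pass  # a crash counts as a wrong answer
    return correct / len(pairs) if pairs else 0.0
```

For example, `score_program(open(path).read(), task["data"]["train"])` would score a saved candidate on the training pairs.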

In [311]:
print(tailored_prompt)

Write a Python function that correctly transforms each input grid into its corresponding output grid based on the given examples.

- The function must be named: `solve(grid: List[List[int]]) -> List[List[int]]`
- Include only the code and necessary imports (e.g., `import numpy as np`)
- Do not include comments, explanations, or print statements
- Do not hard-code values or specific grid sizes — the function must generalize based on the patterns in the examples
- Ensure your solution works for all provided input-output pairs

Here are the demonstration pairs (JSON data):

Train Input 1: [[2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0], [2, 2, 0, 0, 2, 2, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0], [0, 0, 0, 0, 2, 2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0], [0, 2, 2, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 2, 2, 0, 0], [0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 2, 0, 0], [0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 2, 2], [0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 2, 2]]
Train Output 1: [[8, 8, 0, 0, 0, 0,