## Pipeline: LLM-powered program generation for solving ARC-AGI

### Imports

In [44]:
import numpy as np
import ollama
import os
import json
from dotenv import load_dotenv
from openai import OpenAI
import re

### Shared Variables

In [45]:
# Shared system prompt for all tasks
system_prompt = """
You are a visual reasoning and Python programming expert solving ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) tasks.

Each integer in the grid represents a color:
0 = black, 1 = blue, 2 = red, 3 = green, 4 = yellow,
5 = grey, 6 = pink, 7 = orange, 8 = light blue, 9 = brown.
"""


### Prompts

#### Basic Prompt

In [46]:
base_prompt = """
Write a Python function that correctly transforms each input grid into its corresponding output grid based on the given examples.

- The function must be named: `solve(grid: List[List[int]]) -> List[List[int]]`
- Include only the code and necessary imports (e.g., `import numpy as np`)
- Do not include comments, explanations, or print statements
- Do not hard-code values or specific grid sizes — the function must generalize based on the patterns in the examples
- Ensure your solution works for all provided input-output pairs
"""

In [47]:
print(base_prompt)


Write a Python function that correctly transforms each input grid into its corresponding output grid based on the given examples.

- The function must be named: `solve(grid: List[List[int]]) -> List[List[int]]`
- Include only the code and necessary imports (e.g., `import numpy as np`)
- Do not include comments, explanations, or print statements
- Do not hard-code values or specific grid sizes — the function must generalize based on the patterns in the examples
- Ensure your solution works for all provided input-output pairs
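For illustration, the sketch below shows a response that meets these constraints for a hypothetical task whose rule is "mirror every row left to right". The required signature uses `List`, so a generated program needs `from typing import List` in order to import cleanly:

```python
# Illustrative candidate for a hypothetical "mirror each row" task
# (a real model response would omit this comment, as the prompt requires)
from typing import List

import numpy as np


def solve(grid: List[List[int]]) -> List[List[int]]:
    return np.fliplr(np.array(grid)).tolist()


print(solve([[1, 2, 3], [4, 5, 6]]))
# [[3, 2, 1], [6, 5, 4]]
```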



#### Prompt 1

In [48]:
prompt_1 = """
List visual observations from the training pairs.

- Use bullet points (max 10).
- Focus on colors, shapes, object counts, positions, and differences.
- Avoid reasoning or explanations.
- Be concise. No full sentences, no extra formatting.
"""

In [49]:
print(prompt_1)


List visual observations from the training pairs.

- Use bullet points (max 10).
- Focus on colors, shapes, object counts, positions, and differences.
- Avoid reasoning or explanations.
- Be concise. No full sentences, no extra formatting.



#### Prompt 2

In [50]:
prompt_2 = """
Describe the transformation(s) from input to output grids.

- Use 3 to 5 short sentences.
- Focus on what changes: movement, color, shape, duplication, etc.
- Mention if the transformation is based on position, context, or rules.
- Avoid implementation hints or code.
"""

In [51]:
print(prompt_2)


Describe the transformation(s) from input to output grids.

- Use 3 to 5 short sentences.
- Focus on what changes: movement, color, shape, duplication, etc.
- Mention if the transformation is based on position, context, or rules.
- Avoid implementation hints or code.



#### Prompt 3

In [52]:
prompt_3 = """
Reflect on how you would solve the task in Python.

- Use 3 to 5 sentences.
- Mention your overall approach, logical steps, and possible uncertainties.
- Do not return code or pseudocode.
"""

In [53]:
print(prompt_3)


Reflect on how you would solve the task in Python.

- Use 3 to 5 sentences.
- Mention your overall approach, logical steps, and possible uncertainties.
- Do not return code or pseudocode.



#### Prompt 4

In [54]:
# TODO: Feed the responses of earlier secondary prompts into later ones (especially prompt 3), possibly via a "buildPrompt" function.
# TODO: Create a revision prompt.

In [55]:
prompt_4 = ""

### Functions

#### Load Tasks

In [56]:
def load_tasks(folder):
    tasks = []
    for filename in sorted(os.listdir(folder)):
        if filename.endswith(".json"):
            with open(os.path.join(folder, filename), "r") as f:
                data = json.load(f)
                tasks.append({"filename": filename, "data": data})
    return tasks
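Each loaded entry pairs the file name with the parsed task. ARC task files contain `train` and `test` lists of `{"input": grid, "output": grid}` pairs, and each grid is a nested list of integers. Before prompting, a small helper such as the hypothetical `summarize_task` below can show the grid sizes:

```python
from typing import Any, Dict, List, Tuple

import numpy as np


def summarize_task(data: Dict[str, Any]) -> List[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Return (input_shape, output_shape) for every training pair of an ARC task."""
    shapes = []
    for pair in data["train"]:
        in_shape = np.array(pair["input"]).shape
        out_shape = np.array(pair["output"]).shape
        shapes.append((in_shape, out_shape))
    return shapes


# Toy task in the ARC JSON layout (made-up grids)
example_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 2, 2]], "output": [[2], [2], [2]]},
    ],
    "test": [{"input": [[3, 0]], "output": [[0, 3]]}],
}
print(summarize_task(example_task))
# [((2, 2), (2, 2)), ((1, 3), (3, 1))]
```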

#### Load API Key

In [57]:
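def load_api_key(file_path="key.env"):
    # Load OPENAI_API_KEY from the .env file into the process environment
    load_dotenv(file_path)
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of a later client error
        raise RuntimeError(f"No API key found. Please set OPENAI_API_KEY in {file_path}.")
    # Module-level client used by call_gpt()
    global client
    client = OpenAI(api_key=api_key)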

#### Call GPT

In [58]:
def call_gpt(prompt):
    response = client.chat.completions.create(
        model="o3-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content.strip()

#### Building and Combining Prompts

Appends the task's demonstration (training) pairs to a prompt:

In [59]:
def add_tasks(prompt, task_data):
    full_prompt = prompt.strip() + "\n\nHere are the demonstration pairs (JSON data):\n"
    for i, pair in enumerate(task_data['train']):
        full_prompt += f"\nTrain Input {i+1}: {pair['input']}\n"
        full_prompt += f"Train Output {i+1}: {pair['output']}\n"
    return full_prompt
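The grids are interpolated with f-strings, so they appear in the prompt as Python list literals. For integer grids these are identical to JSON. Only the `train` pairs are added; the test inputs are not part of the prompt. The sketch below repeats `add_tasks` unchanged so that it runs on its own, then applies it to a toy one-pair task to show the resulting text:

```python
from typing import Any, Dict


def add_tasks(prompt: str, task_data: Dict[str, Any]) -> str:
    # Same logic as the add_tasks cell above
    full_prompt = prompt.strip() + "\n\nHere are the demonstration pairs (JSON data):\n"
    for i, pair in enumerate(task_data["train"]):
        full_prompt += f"\nTrain Input {i+1}: {pair['input']}\n"
        full_prompt += f"Train Output {i+1}: {pair['output']}\n"
    return full_prompt


# Toy task with a single training pair (made-up grids)
toy_task = {"train": [{"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}]}
print(add_tasks("Describe the transformation.", toy_task))
```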

Combines the response to secondary prompt 1 with the template of secondary prompt 2:

In [60]:
def combine_prompts_1_and_2(prompt_1_response, prompt_2_template):
    combined_prompt = f"""{prompt_2_template.strip()}

Here are visual observations of the task at hand that may help you identify the transformation:

{prompt_1_response.strip()}

Now provide your transformation analysis based on these observations."""
    return combined_prompt

Combines the responses to secondary prompts 1 and 2 with the template of secondary prompt 3:

In [61]:
def combine_prompts_1_2_and_3(prompt_1_response, prompt_2_response, prompt_3_template):
    combined_prompt = f"""{prompt_3_template.strip()}

Here are visual observations of the task that may help inform your implementation:
{prompt_1_response.strip()}

Here are the transformation rules that have been identified based on the task:
{prompt_2_response.strip()}

Now reflect on how you would implement a solution to this task in Python, following the instructions above.
"""
    return combined_prompt


Combines the response to secondary prompt 3 with the base prompt:

In [62]:
def combine_prompts_3_and_base(prompt_3_response, prompt_base_template):
    combined_prompt = f"""
Implementation Reflection:
{prompt_3_response.strip()}

{prompt_base_template.strip()}
"""
    return combined_prompt.strip()

Chains the secondary prompts, passing each response into the next prompt, and then combines the final implementation reflection with the base prompt to create a task-tailored prompt:

In [63]:
def build_prompts(task_data):
    # Build secondary prompt 1
    full_prompt_1 = add_tasks(prompt_1, task_data)
    response_1 = call_gpt(full_prompt_1)
    print('Built prompt 1')
    
    # Build secondary prompt 2
    combined_prompt_2 = combine_prompts_1_and_2(response_1, prompt_2)
    full_prompt_2 = add_tasks(combined_prompt_2, task_data)
    response_2 = call_gpt(full_prompt_2)
    print('Built prompt 2')
    
    # Build secondary prompt 3
    combined_prompt_3 = combine_prompts_1_2_and_3(response_1, response_2, prompt_3)
    full_prompt_3 = add_tasks(combined_prompt_3, task_data)
    response_3 = call_gpt(full_prompt_3)
    print('Built prompt 3')
    
    # Build task-tailored prompt
    combined_prompt_base = combine_prompts_3_and_base(response_3, base_prompt)
    tailored_prompt = add_tasks(combined_prompt_base, task_data)
    
    return tailored_prompt
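`build_prompts` reaches the model through the global `call_gpt`, so exercising it requires live API calls. Below is a minimal sketch of the same chaining pattern with an injectable `llm` callable, which lets the flow be dry-run offline with a stub. The names `run_chain`, `fake_llm`, and `chain_steps` are hypothetical:

```python
from typing import Callable, List

# A step receives all earlier responses and returns the next prompt
Step = Callable[[List[str]], str]


def run_chain(steps: List[Step], llm: Callable[[str], str]) -> List[str]:
    """Run prompts in order, feeding every earlier response into the next prompt."""
    responses: List[str] = []
    for build_prompt in steps:
        responses.append(llm(build_prompt(responses)))
    return responses


# Stub model: records each prompt and answers with a numbered placeholder
sent_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    sent_prompts.append(prompt)
    return f"response {len(sent_prompts)}"


chain_steps: List[Step] = [
    lambda r: "Observations?",                    # plays the role of prompt_1
    lambda r: f"Transformation?\n{r[0]}",          # prompt_2 plus response 1
    lambda r: f"Reflection?\n{r[0]}\n{r[1]}",      # prompt_3 plus responses 1 and 2
]
responses = run_chain(chain_steps, fake_llm)
print(responses)  # ['response 1', 'response 2', 'response 3']
```

In the notebook, a similar effect could be had by passing `call_gpt` (or a stub) into `build_prompts` as a parameter instead of relying on the global.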

#### Save Programs

In [64]:
def save_program(program_text, task_id):
    # Define the base and task-specific folder paths
    base_folder = "Candidate_programs"
    task_folder = os.path.join(base_folder, f"task_{task_id}")

    # Create the task-specific folder if it doesn't exist
    os.makedirs(task_folder, exist_ok=True)

    # Remove Markdown code-fence lines (```python or ```) if present
    cleaned_text = re.sub(r"^```(?:python)?\s*|```$", "", program_text.strip(), flags=re.MULTILINE)

    # Find the next available version number from existing solution_v<N>.py files
    version_numbers = []
    for fname in os.listdir(task_folder):
        match = re.fullmatch(r"solution_v(\d+)\.py", fname)
        if match:
            version_numbers.append(int(match.group(1)))
    next_version = max(version_numbers, default=0) + 1

    # Define the full path to the new Python file
    file_path = os.path.join(task_folder, f"solution_v{next_version}.py")

    # Save the cleaned program text, ending with a newline
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(cleaned_text.strip() + "\n")

    print(f"Saved program for task {task_id} as version {next_version}: {file_path}")


#### Create Programs

In [65]:
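def create_programs(tailored_prompt, task_index, n_programs=2):
    responses = []
    # Sample n_programs independent candidate programs from the same prompt
    for _ in range(n_programs):
        # Call the model with the tailored prompt
        response = call_gpt(tailored_prompt)
        # Save the generated program as the next solution_v<N>.py for this task
        save_program(response, task_index)
        responses.append(response)
    # Return all raw responses, not only the last one
    return responses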
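The saved files are only candidates. One way to rank them is to import each file and count how many training pairs it reproduces exactly. The sketch below assumes each file defines `solve` as the base prompt requires. `score_candidate` is a hypothetical helper, and because it executes model-generated code, it should only be run in a sandboxed environment:

```python
import importlib.util
import os
import tempfile
from typing import Any, Dict, List


def score_candidate(program_path: str, task_data: Dict[str, Any]) -> float:
    """Fraction of training pairs whose output the candidate's solve() reproduces exactly."""
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    if spec is None or spec.loader is None:
        return 0.0
    module = importlib.util.module_from_spec(spec)
    # Pre-seed List so the required signature works even if the model omitted the import
    module.__dict__["List"] = List
    try:
        spec.loader.exec_module(module)
    except Exception:
        return 0.0  # missing file, syntax error, failing import, ...
    solve = getattr(module, "solve", None)
    pairs = task_data["train"]
    if not callable(solve) or not pairs:
        return 0.0
    correct = 0
    for pair in pairs:
        try:
            # Pass a copy so a mutating solve() cannot alter the task data
            predicted = solve([row[:] for row in pair["input"]])
            if hasattr(predicted, "tolist"):  # accept NumPy arrays as well as lists
                predicted = predicted.tolist()
            if predicted == pair["output"]:
                correct += 1
        except Exception:
            pass  # a crash on one pair counts as a miss
    return correct / len(pairs)


# Usage: a toy candidate that mirrors rows, scored against two made-up pairs
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "solution_v1.py")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("def solve(grid: List[List[int]]) -> List[List[int]]:\n"
            "    return [row[::-1] for row in grid]\n")
demo_task = {"train": [{"input": [[1, 2]], "output": [[2, 1]]},
                       {"input": [[3, 4]], "output": [[3, 4]]}]}
print(score_candidate(demo_path, demo_task))  # 0.5
```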

### Pipeline

In [None]:
tasks = load_tasks("evaluation_set")
load_api_key()

for i, task in enumerate(tasks[:1]):  # only the first task; drop the slice to run the whole set
    # Build secondary prompts and create tailored_prompt from their responses
    tailored_prompt = build_prompts(task['data'])
    # Create two programs based on the tailored prompt
    create_programs(tailored_prompt, i+1)

Built prompt 1
Built prompt 2
Built prompt 3
Saved program for task 0 as version 1: Candidate_programs\task_0\solution_v1.py
Saved program for task 0 as version 2: Candidate_programs\task_0\solution_v2.py


In [67]:
print(tailored_prompt)

Implementation Reflection:
I would use a flood-fill or connected-components algorithm to scan the grid and group pixels by color into connected regions. Then I would compute the bounding boxes for the significant clusters, filtering out those that are too small or peripheral, and fill in each valid region uniformly with its dominant color. I would then reassemble the grid by replacing the clusters with these solid blocks while leaving the background unchanged. One possible uncertainty is tuning the thresholds that determine when a cluster is considered stray or peripheral versus part of a central, valid shape.

Write a Python function that correctly transforms each input grid into its corresponding output grid based on the given examples.

- The function must be named: `solve(grid: List[List[int]]) -> List[List[int]]`
- Include only the code and necessary imports (e.g., `import numpy as np`)
- Do not include comments, explanations, or print statements
- Do not hard-code values or speci