## Pipeline: LLM-powered program generation for solving ARC-AGI

### Imports

In [154]:
import numpy as np
import ollama
import os
import json
from dotenv import load_dotenv
from openai import OpenAI

### Shared Variables

In [155]:
# Shared system prompt for all tasks
system_prompt = """
You are a visual reasoning and Python programming expert solving ARC-AGI (Abstraction and Reasoning Corpus - Artificial General Intelligence) tasks.

Each integer in the grid represents a color:
0 = black, 1 = blue, 2 = red, 3 = green, 4 = yellow,
5 = grey, 6 = pink, 7 = orange, 8 = light blue, 9 = brown.
"""


### Prompts

#### Basic Prompt

In [None]:
basic_prompt = f"""
You are a programming expert specialized in solving ARC-AGI (Abstraction and Reasoning Corpus - Artificial General Intelligence) tasks. 
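Your task is to write efficient and correct Python functions that solve given ARC tasks based on example input-output pairs.

Each integer of the grids corresponds to a color using the following mapping:

0 = black
1 = blue
2 = red
3 = green
4 = yellow
5 = grey
6 = pink
7 = orange
8 = light blue
9 = brown

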
Your Python solution should include a function called `solve(grid: List[List[int]]) -> List[List[int]]` that performs the transformation.
Only return the function definition and any necessary imports (e.g., `import numpy as np`).
Avoid explanation, comments, or print statements—only return valid code that could be run as-is in a script or notebook cell.
"""

In [158]:
print(basic_prompt)


You are a programming expert specialized in solving ARC-AGI (Abstraction and Reasoning Corpus - Artificial General Intelligence) tasks. 
Your task is to write efficient and correct Python functions that solve given ARC tasks based on example input-output pairs.

Each integer of the grids corresponds to a color using the following mapping:

0 = black  
1 = blue  
2 = red  
3 = green  
4 = yellow  
5 = grey  
6 = pink  
7 = orange  
8 = light blue  
9 = brown


Your Python solution should include a function called `solve(grid: List[List[int]]) -> List[List[int]]` that performs the transformation.
Only return the function definition and any necessary imports (e.g., `import numpy as np`).
Avoid explanation, comments, or print statements—only return valid code that could be run as-is in a script or notebook cell.
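Because the prompt asks the model to return nothing but a `solve` function, the reply can in principle be executed and checked against a task's training pairs. The following is a minimal sketch of such a check, not part of the notebook: `strip_code_fences`, `run_candidate`, `toy_task`, and `candidate` are hypothetical names, and the toy task is made up. Note that `exec` runs arbitrary code, so real model output should only be executed in a sandbox.

```python
import copy
from typing import Dict


def strip_code_fences(text: str) -> str:
    """Drop a leading/trailing Markdown fence line if the model added one."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]
    return "\n".join(lines)


def run_candidate(code: str, task_data: Dict) -> bool:
    """Return True only if the generated `solve` reproduces every training output.

    Hypothetical helper. Any exception (syntax error, missing `solve`,
    runtime failure) counts as a failed candidate.
    """
    namespace: Dict = {}
    try:
        exec(strip_code_fences(code), namespace)  # unsafe outside a sandbox
        solve = namespace["solve"]
        for pair in task_data["train"]:
            # Pass a copy so a candidate that mutates its input cannot alter the task
            result = solve(copy.deepcopy(pair["input"]))
            if hasattr(result, "tolist"):  # accept NumPy arrays as well as lists
                result = result.tolist()
            if result != pair["output"]:
                return False
        return True
    except Exception:
        return False


# Made-up toy task: the output is the input mirrored left to right
toy_task = {"train": [{"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]}]}
candidate = "def solve(grid):\n    return [row[::-1] for row in grid]\n"
print(run_candidate(candidate, toy_task))
```

A candidate that passes all training pairs is not guaranteed to generalize to the test input, so this check is best read as a filter rather than a proof of correctness.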



#### Prompt 1

In [159]:
prompt_1 = """
List visual observations from the training pairs.

- Use bullet points (max 10).
- Focus on colors, shapes, object counts, positions, and differences.
- Avoid reasoning or explanations.
- Be concise. No full sentences, no extra formatting.
"""

In [160]:
print(prompt_1)


List visual observations from the training pairs.

- Use bullet points (max 10).
- Focus on colors, shapes, object counts, positions, and differences.
- Avoid reasoning or explanations.
- Be concise. No full sentences, no extra formatting.



#### Prompt 2

In [161]:
prompt_2 = """
Describe the transformation(s) from input to output grids.

- Use 3 to 5 short sentences.
- Focus on what changes: movement, color, shape, duplication, etc.
- Mention if the transformation is based on position, context, or rules.
- Avoid implementation hints or code.
"""

In [162]:
print(prompt_2)


Describe the transformation(s) from input to output grids.

- Use 3 to 5 short sentences.
- Focus on what changes: movement, color, shape, duplication, etc.
- Mention if the transformation is based on position, context, or rules.
- Avoid implementation hints or code.



#### Prompt 3

In [163]:
prompt_3 = """
Reflect on how you would solve the task in Python.

- Use 3 to 5 sentences.
- Mention your overall approach, logical steps, and possible uncertainties.
- Do not return code or pseudocode.
"""

In [164]:
print(prompt_3)


Reflect on how you would solve the task in Python.

- Use 3 to 5 sentences.
- Mention your overall approach, logical steps, and possible uncertainties.
- Do not return code or pseudocode.



#### Prompt 4

In [165]:
# TODO: Feed the outputs of earlier secondary prompts into later secondary prompts (especially prompt 3), perhaps via a generic "buildPrompt" function.
# TODO: Create a revision prompt.

In [166]:
prompt_4 = ""

### Functions

#### Load Tasks

In [167]:
def load_tasks(folder):
    tasks = []
    for filename in sorted(os.listdir(folder)):
        if filename.endswith(".json"):
            with open(os.path.join(folder, filename), "r") as f:
                data = json.load(f)
                tasks.append({"filename": filename, "data": data})
    return tasks
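Later cells read `task_data['train']` and the `input`/`output` keys of each pair, so every file is assumed to follow the usual ARC JSON layout with a `train` list and a `test` list of grid pairs. The sketch below writes a made-up toy task to a temporary folder and loads it; `load_tasks` is copied from the cell above only so the snippet runs on its own, and the file names are hypothetical.

```python
import json
import os
import tempfile


def load_tasks(folder):
    # Copy of the notebook function, repeated so this sketch is self-contained
    tasks = []
    for filename in sorted(os.listdir(folder)):
        if filename.endswith(".json"):
            with open(os.path.join(folder, filename), "r") as f:
                data = json.load(f)
                tasks.append({"filename": filename, "data": data})
    return tasks


# Made-up task in the assumed layout: lists of input/output grid pairs
toy = {
    "train": [{"input": [[0, 1]], "output": [[1, 0]]}],
    "test": [{"input": [[0, 2]], "output": [[2, 0]]}],
}

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "toy_task.json"), "w") as f:
        json.dump(toy, f)
    # Files without a .json extension are skipped by load_tasks
    with open(os.path.join(folder, "notes.txt"), "w") as f:
        f.write("ignored")
    loaded = load_tasks(folder)

print(loaded[0]["filename"])
print(loaded[0]["data"]["train"][0]["input"])
```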

#### Load API-Key

In [168]:
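def load_api_key(file_path="key.env"):
    # Read OPENAI_API_KEY from the env file and create the shared client used by call_gpt
    global client
    load_dotenv(file_path)
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        # Fail early instead of printing and then crashing when the client is created
        raise RuntimeError(f"No API key found. Please set OPENAI_API_KEY in {file_path}.")
    client = OpenAI(api_key=api_key)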

#### Building Prompts

Adds the task's demonstration pairs to the prompt:

In [169]:
def build_prompt(prompt, task_data):
    full_prompt = prompt.strip() + "\n\nHere are the demonstration pairs (JSON data):\n"
    for i, pair in enumerate(task_data['train']):
        full_prompt += f"\nTrain Input {i+1}: {pair['input']}\n"
        full_prompt += f"Train Output {i+1}: {pair['output']}\n"
    return full_prompt
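To make the resulting text concrete, here is a small usage sketch with a made-up single-pair task. `build_prompt` is copied from the cell above so the snippet runs on its own; `toy_task_data` and the instruction string are hypothetical.

```python
def build_prompt(prompt, task_data):
    # Copy of the notebook function, repeated so this sketch is self-contained
    full_prompt = prompt.strip() + "\n\nHere are the demonstration pairs (JSON data):\n"
    for i, pair in enumerate(task_data['train']):
        full_prompt += f"\nTrain Input {i+1}: {pair['input']}\n"
        full_prompt += f"Train Output {i+1}: {pair['output']}\n"
    return full_prompt


# Made-up task with a single demonstration pair
toy_task_data = {"train": [{"input": [[0, 1]], "output": [[1, 0]]}]}

example_prompt = build_prompt("  List observations.  ", toy_task_data)
print(example_prompt)
# List observations.
#
# Here are the demonstration pairs (JSON data):
#
# Train Input 1: [[0, 1]]
# Train Output 1: [[1, 0]]
```

The grids are inserted using Python's list representation, which for lists of integers is also valid JSON.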

Combines secondary prompts 1 and 2 by appending the response to prompt 1 to the prompt 2 template:

In [170]:
def combine_prompts_1_and_2(prompt_1_response, prompt_2_template):
    combined_prompt = f"""{prompt_2_template.strip()}

Here are visual observations of the task at hand, that may assist you in identifying the transformation:

{prompt_1_response.strip()}

Now provide your transformation analysis based on these observations."""
    return combined_prompt

Combines secondary prompts 1, 2 and 3 by appending the responses to prompts 1 and 2 to the prompt 3 template:

In [171]:
def combine_prompts_1_2_and_3(prompt_1_response, prompt_2_response, prompt_3_template):
    combined_prompt = f"""{prompt_3_template.strip()}

Here are visual observations of the task that may help inform your implementation:
{prompt_1_response.strip()}

Here are the transformation rules that have been identified based on the task:
{prompt_2_response.strip()}

Now reflect on how you would implement a solution to this task in Python, following the instructions above.
"""
    return combined_prompt
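Both combiner functions follow the same pattern: a template, one or more labeled blocks of earlier responses, and a closing instruction. One possible way to generalize this is sketched below. `combine_with_context` is a hypothetical helper that does not exist in the notebook, and the example strings are made up.

```python
from typing import List, Tuple


def combine_with_context(
    template: str,
    context: List[Tuple[str, str]],
    closing: str = "",
) -> str:
    """Join a prompt template, labeled context blocks, and an optional closing line.

    Hypothetical generalization of the two combiner functions above.
    Each context entry is a (label, text) pair taken from an earlier response.
    """
    parts = [template.strip()]
    for label, text in context:
        parts.append(f"{label}\n{text.strip()}")
    if closing:
        parts.append(closing.strip())
    # Blank lines separate the template, each context block, and the closing line
    return "\n\n".join(parts)


combined = combine_with_context(
    "Reflect on how you would solve the task in Python.",
    [
        ("Here are visual observations of the task:", "- two red squares"),
        ("Here are the transformation rules:", "Squares move right."),
    ],
    closing="Now reflect on your implementation.",
)
print(combined)
```

With a helper like this, adding a fourth stage would only require passing one more `(label, text)` pair rather than writing another combiner.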


#### Call GPT

In [174]:
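def call_gpt(prompt):
    # Send one user prompt with the shared system prompt; load_api_key() must have created `client` first
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}
        ]
    )
    # message.content can be None, so fall back to an empty string before stripping
    content = response.choices[0].message.content
    return (content or "").strip()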
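Since `call_gpt` relies on the global `client`, the prompt chain cannot be exercised without an API key. A sketch of one alternative, under the assumption that passing the client as an argument is acceptable: the function below takes any object exposing `chat.completions.create(model=..., messages=...)`, and a stub with the same response shape stands in for the real client. `call_llm`, `StubCompletions`, and `make_stub_client` are hypothetical names, not part of the notebook or the OpenAI library.

```python
from types import SimpleNamespace


def call_llm(client, prompt, system_prompt, model="gpt-4o-mini"):
    """Variant of call_gpt that receives the client explicitly (hypothetical)."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
    )
    content = response.choices[0].message.content
    return (content or "").strip()


class StubCompletions:
    """Stand-in that mimics only the response attributes call_llm reads."""

    def __init__(self, reply):
        self.reply = reply

    def create(self, model, messages):
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def make_stub_client(reply):
    # Build an object with the same attribute path as the real client
    return SimpleNamespace(chat=SimpleNamespace(completions=StubCompletions(reply)))


stub_reply = call_llm(make_stub_client("  - grid is 3x3 \n"), "hello", "You are a test.")
print(stub_reply)
```

The stub returns a fixed reply regardless of the prompt, which is enough to check how the responses are threaded through the three stages without spending API calls.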


### Pipeline

In [175]:
tasks = load_tasks("evaluation_set")
load_api_key()

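# tasks[:1] limits this run to the first task; remove the slice to process the whole set
for i, task in enumerate(tasks[:1]):
    # Build and run the secondary prompts in sequence; each stage receives the earlier responses: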
    full_prompt_1 = build_prompt(prompt_1, task["data"])
    response_1 = call_gpt(full_prompt_1)
    print(f"Prompt 1 Response for task {i + 1}:\n{response_1}\n")
    combined_prompt_2 = combine_prompts_1_and_2(response_1, prompt_2)
    full_prompt_2 = build_prompt(combined_prompt_2, task["data"])
    response_2 = call_gpt(full_prompt_2)
    print(f"Prompt 2 Response for task {i + 1}:\n{response_2}\n")
    combined_prompt_3 = combine_prompts_1_2_and_3(response_1, response_2, prompt_3)
    full_prompt_3 = build_prompt(combined_prompt_3, task["data"])
    response_3 = call_gpt(full_prompt_3)
    print(f"Prompt 3 Response for task {i + 1}:\n{response_3}\n")


Prompt 1 Response for task 1:
- Dominant colors are zero (black), three (green), eight (light blue), and six (pink).
- Uniform rows of color eight in various rows.
- Presence of color two in patterns and structures around greens.
- Sequential arrangement of three and two with clear adjacency.
- Multi-color patterns with two, three, and one were observed.
- Distinct areas of concentration for colors six and three.
- Black color (zero) predominantly occupies the borders and voids.
- Shapes formed by three and two exhibit repetitive motifs.
- Clear contrast between grouped colors (three, one, two) against black.
- Regular occurrences of vertical and horizontal symmetry in patterns.

Prompt 2 Response for task 1:
The transformation from input grids to output grids predominantly involves cleaning up the patterns while retaining recognizable structures. Color 8 (light blue) is condensed into uniform sections, while background black (color 0) is maintained to enhance contrast. Red (color 2) a

In [None]:
print(full_prompt_1)
print("="*50)
print(full_prompt_2)
print("="*50)
print(full_prompt_3)

List visual observations from the training pairs.

- Use bullet points (max 10).
- Focus on colors, shapes, object counts, positions, and differences.
- Avoid reasoning or explanations.
- Be concise. No full sentences, no extra formatting.

Here are the demonstration pairs (JSON data):

Train Input 1: [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0], [0, 0, 6, 6, 6, 6, 6, 6, 0, 0, 6, 6, 6, 6, 3, 6, 0, 0, 0, 0, 0, 0], [0, 0, 8, 8, 3, 3, 8, 8, 0, 0, 8, 3, 3, 3, 8, 8, 0, 0, 0, 0, 0, 0], [0, 0, 8, 8, 3, 3, 8, 8, 0, 0, 8, 8, 3, 3, 8, 8, 0, 0, 3, 0, 0, 0], [0, 3, 8, 8, 3, 3, 8, 8, 0, 0, 8, 8, 3, 3, 8, 3, 0, 0, 0, 3, 0, 0], [0, 3, 8, 8, 3, 3, 8, 8, 0, 0, 8, 8, 3, 3, 8, 8, 3, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 3, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 6, 6, 6, 6, 6, 6, 0, 0, 6, 6, 3, 6, 6, 6, 0, 0, 0, 0, 0, 0], [0, 0, 8, 8, 3, 3, 8, 8, 0, 0, 3, 8, 3, 3, 8, 3, 0, 3, 0, 0, 0, 3], [0, 0, 8, 8, 3, 