## <div  style="color:black;  font-size:100%; text-align:left;padding:12.0px; background:#ffffff"> Welcome to the Abstraction and Reasoning Challenge (ARC), a potential major step towards achieving artificial general intelligence (AGI)! In this competition, we are challenged to build an algorithm that can perform reasoning tasks it has never seen before. Classic machine learning problems generally involve one specific task which can be solved by training on millions of data samples. But in this challenge, we need to build an algorithm that can learn patterns from a minimal number of examples. </div>

## <div  style="color:#D35142;  font-weight:bold; font-size:100%; text-align:center;padding:12.0px; background:#ffffff"> Thank you for your attention! Please upvote this kernel if you like it. It motivates me to produce more quality content) </div>





<center>
<img src="https://i.postimg.cc/26RtyM0s/3221asdf.jpg" width=1100>
</center>



# <div  style="color:white; border:lightgreen solid;  font-weight:bold; font-size:120%; text-align:center;padding:12.0px; background:black">1. OVERVIEW</div>


# Additional notebooks

### For convenience, there is a visualization of all 800 tasks, covering the training set (400) and the evaluation set (400):
- to give a complete picture of each task, and
- to show the true scale and complexity of the problem

### To save memory and time, the visualization is split into **two** notebooks:

- [Visualizing the **training** set](https://www.kaggle.com/code/allegich/arc-2024-show-all-400-tasks-training-set/)
- [Visualizing the **evaluation** set](https://www.kaggle.com/code/allegich/arc-2024-show-all-400-evaluating-tasks/)

# Goal
The objective of this competition is to create an algorithm that is capable of solving **abstract reasoning** tasks. Critically, these are **novel** tasks: tasks that the algorithm has never seen before. Hence, **simply memorizing** a set of reasoning templates will **not suffice**.

The goal is to construct the output grid(s) corresponding to the test input grid(s), with 2 attempts allowed for each test input.

# Approach overview
- Artificial general intelligence (**AGI**) is a type of artificial intelligence (**AI**) that matches or surpasses human capabilities across a wide range of cognitive tasks. This is in contrast to narrow AI, which is designed for specific tasks. AGI is considered one of various definitions of strong AI.
- Creating AGI is a primary goal of AI research and of companies such as OpenAI, DeepMind, Anthropic and others. A 2020 survey identified 72 active AGI R&D projects spread across 37 countries.

# Data overview
- A "grid" is a **rectangular** matrix (list of lists) of integers between 0 and 9 (**inclusive**). The smallest possible grid size is **1x1** and the largest is **30x30**
- The public evaluation set differs from the public training set, **which is significantly easier**
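
The grid definition above can be turned into a small sanity check. The sketch below is only an illustration: the helper name `is_valid_grid` is hypothetical (it is not part of the competition data or API), and it simply encodes the stated constraints (rectangular list of lists, integers 0 to 9, sizes from 1x1 to 30x30).

```python
def is_valid_grid(grid):
    """Return True if `grid` is a rectangular list of lists of ints in 0..9
    whose height and width are both between 1 and 30 (hypothetical helper)."""
    if not isinstance(grid, list) or not 1 <= len(grid) <= 30:
        return False
    # The width of the first row is the reference width for all rows
    width = len(grid[0]) if isinstance(grid[0], list) else 0
    if not 1 <= width <= 30:
        return False
    for row in grid:
        # Every row must be a list of the same length (rectangular grid)
        if not isinstance(row, list) or len(row) != width:
            return False
        # Every cell must be an integer color code between 0 and 9
        if not all(isinstance(v, int) and not isinstance(v, bool) and 0 <= v <= 9
                   for v in row):
            return False
    return True
```

Such a check could be run on predicted outputs before writing a submission, so that malformed grids are caught early.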

The following **three datasets** are associated with the ARC Prize competition:
- Public training set
- Public evaluation set
- Private evaluation set





**PUBLIC:**
- The publicly available data is to be used for training and evaluation
- The **public training set** contains 400 task files you will use to train your algorithm
- The **public evaluation set** contains 400 task files you can use to test the performance of your algorithm



**PRIVATE:**
- The **private evaluation set** contains 100 task files
- The ARC-AGI leaderboard is scored on these 100 private evaluation tasks, which are held privately on Kaggle so that models cannot be trained on them. They are not included in the public tasks, but they use the **same structure and cognitive priors**
- Public training set consists of **simpler tasks** whereas the public evaluation set is roughly the **same level** of difficulty as the private test set



**DIFFICULTY OF SETS:**
- The public training set is **significantly easier** than the others (public evaluation and private evaluation set) since it contains many "curriculum" type tasks intended to demonstrate Core Knowledge systems. It's like a tutorial level
- The public evaluation set and the private test set are intended to be the **same difficulty**


# Evaluation
- The competition evaluates submissions on the **percentage of correct predictions** on the private evaluation set (**100 tasks**)
- For each task, you should predict **exactly 2 outputs** for every test input grid contained in the task (attempt_1, attempt_2). **All cells** must match the expected answer; otherwise the score for that output is 0. Each test output has **one ground truth**
- Tasks can have **more than one test input** that needs a predicted output, although most tasks have only one
- The **final score** is the sum of the highest score per task test output, divided by the total number of task test outputs. *Ex: If there are two task outputs, and one is 100% correct and the other is 0% correct, your score is 0.5*
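
The scoring rule above can be sketched as a short function. This is a minimal illustration, not the official scorer: the name `score_submission` is hypothetical, and the submission layout (task id mapped to a list with one `{'attempt_1': ..., 'attempt_2': ...}` dictionary per test input) is an assumption based on the attempt names mentioned above. The solutions are assumed to map each task id to a list of expected output grids, as in the solution files loaded later in this notebook.

```python
def score_submission(submission, solutions):
    """Fraction of test outputs matched exactly by either attempt.

    Assumed layout (hypothetical):
      submission: {task_id: [{'attempt_1': grid, 'attempt_2': grid}, ...]}
      solutions:  {task_id: [expected_grid, ...]}
    """
    correct = 0
    total = 0
    for task_id, expected_outputs in solutions.items():
        predictions = submission.get(task_id, [])
        for k, expected in enumerate(expected_outputs):
            total += 1
            # A missing prediction counts as an incorrect output
            attempts = predictions[k] if k < len(predictions) else {}
            # An output scores 1 only if one attempt matches every cell
            if expected in (attempts.get('attempt_1'), attempts.get('attempt_2')):
                correct += 1
    return correct / total if total else 0.0
```

With two test outputs, one matched by an attempt and one missed by both, this function returns 0.5, which agrees with the example in the last bullet.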


# Transformation overview

We can classify several typical transformations:

- **Geometry**
    - do nothing
    - rotate / mirror / shift image
    - crop image background
    - draw border


- **Objects**
    - rotate / mirror / shift objects
    - move two objects together
    - move objects to edge
    - extend / repeat an object
    - delete an object
    - count unique objects and select the object that appears the most times
    - create pattern based on image colors
    - overlay object
    - replace objects


- **Coloring**
    - select colors for objects
    - select dominant/smallest color in image
    - denoise
    - fill in empty spaces


- **Lines**
    - color edges
    - extrapolate a straight/diagonal line
    - draw a line between two dots / or intersections between such lines
    - draw a spiral


- **Grids**
    - select grid squares with most pixels


- **Patterns**
    - complete a symmetrical/repeating pattern 


- **Subtasks**
    - object detection / cohesion / separation
    - object persistence
    - counting or sorting objects


If such problem classes can be correctly detected, it may be possible to get some quick wins by writing a library of simple solvers for known problems. These could be tried in sequence before resorting to more advanced general-purpose algorithms.
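
As a rough sketch of the "library of simple solvers" idea, the code below tries a few geometric transformations in sequence and keeps the first one that reproduces every train output. All names here (`identity`, `mirror_horizontal`, `mirror_vertical`, `rotate_clockwise`, `SIMPLE_SOLVERS`, `find_solver`) are hypothetical, and the four solvers are only a tiny sample of the transformation classes listed above.

```python
def identity(grid):
    """Return an unchanged copy of the grid ("do nothing")."""
    return [row[:] for row in grid]

def mirror_horizontal(grid):
    """Mirror the grid left to right."""
    return [row[::-1] for row in grid]

def mirror_vertical(grid):
    """Mirror the grid top to bottom."""
    return [row[:] for row in grid[::-1]]

def rotate_clockwise(grid):
    """Rotate the grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

# Solvers are tried in this order (hypothetical, illustrative list)
SIMPLE_SOLVERS = [identity, mirror_horizontal, mirror_vertical, rotate_clockwise]

def find_solver(train_pairs, solvers=SIMPLE_SOLVERS):
    """Return the first solver that maps every train input to its output,
    or None if no simple solver fits."""
    for solver in solvers:
        if all(solver(pair['input']) == pair['output'] for pair in train_pairs):
            return solver
    return None
```

A possible usage would be `solver = find_solver(task['train'])`, applying `solver` to each test input when it is not `None` and falling back to a more general method otherwise.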

# <div  style="color:white; border:lightgreen solid;  font-weight:bold; font-size:120%; text-align:center;padding:12.0px; background:black">2. DATA LOADING AND PREPARATION</div>


## Import libraries and define parameters

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from   matplotlib import colors
import seaborn as sns

import json
import os
from pathlib import Path

from subprocess import Popen, PIPE, STDOUT
from glob import glob

In [None]:
base_path='../../data/arc-prize-2024/'
# Loading JSON data
def load_json(file_path):
    with open(file_path) as f:
        data = json.load(f)
    return data

In [None]:
# Reading files
training_challenges =  load_json(base_path +'arc-agi_training_challenges.json')
training_solutions =   load_json(base_path +'arc-agi_training_solutions.json')
evaluation_challenges =load_json(base_path +'arc-agi_evaluation_challenges.json')
evaluation_solutions = load_json(base_path +'arc-agi_evaluation_solutions.json')


# <div  style="color:white; border:lightgreen solid;  font-weight:bold; font-size:120%; text-align:center;padding:12.0px; background:black">3. DATA EXPLORATION</div>


Each public dataset contains 400 tasks:

In [None]:
print(f'Number of training challenges = {len(training_challenges)}')

In [None]:
print(f'Number of training solutions = {len(training_solutions)}')

In [None]:
print(f'Number of evaluation challenges = {len(evaluation_challenges)}')

In [None]:
print(f'Number of evaluation solutions = {len(evaluation_solutions)}')

The names of the first five "training challenges" are shown below:

In [None]:
for i in range(5):
    t=list(training_challenges)[i]
    task=training_challenges[t]
    print(f'Set #{i}, {t}')

In each task, there are **two** dictionary keys, **train** and **test**. We learn the pattern from the train input-output pairs, and then apply the pattern to the test input, to predict an output.

In [None]:
task = training_challenges['007bbfb7']
print(task.keys())

Tasks have multiple train input-output pairs. Most tasks have a single test input-output pair, although some have more than one.

In [None]:
n_train_pairs = len(task['train'])
n_test_pairs = len(task['test'])

print(f'task contains {n_train_pairs} training pairs')
print(f'task contains {n_test_pairs} test pairs')
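
To check the claim about test pairs across a whole dataset rather than a single task, the counts can be tallied. This is a small sketch: the helper name `count_test_pairs` is hypothetical, and it assumes the challenges dictionary maps task ids to tasks with a `test` list, as loaded above.

```python
from collections import Counter

def count_test_pairs(challenges):
    """Map 'number of test inputs' to 'number of tasks with that many'
    (hypothetical helper)."""
    return Counter(len(task['test']) for task in challenges.values())
```

Calling `count_test_pairs(training_challenges)` would show how many tasks have one, two, or more test inputs.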

Diving into the first train input-output pair, we can see that the grids are expressed as 2D lists of integers 0-9:

In [None]:
display(task['train'][0]['input'])
display(task['train'][0]['output'])

### Function to plot input/output pairs of a task

In [None]:
# 0:black, 1:blue, 2:red, 3:green, 4:yellow, 5:gray, 6:magenta, 7:orange, 8:sky, 9:brown

_cmap = colors.ListedColormap(
    ['#000000', '#0074D9','#FF4136','#2ECC40','#FFDC00',
     '#AAAAAA', '#F012BE', '#FF851B', '#7FDBFF', '#870C25'])
norm = colors.Normalize(vmin=0, vmax=9)

plt.figure(figsize=(4, 1), dpi=200)
plt.imshow([list(range(10))], cmap=_cmap, norm=norm)
plt.xticks(list(range(10)))
plt.yticks([])
plt.show()

In [None]:
# ARC color scheme shared by all plots
ARC_CMAP = colors.ListedColormap(
    ['#000000', '#0074D9', '#FF4136', '#2ECC40', '#FFDC00',
     '#AAAAAA', '#F012BE', '#FF851B', '#7FDBFF', '#870C25'])
ARC_NORM = colors.Normalize(vmin=0, vmax=9)


def plot_one(ax, matrix, title, **title_kwargs):
    """Draw a single grid on `ax` with cell borders and no tick labels."""
    ax.imshow(matrix, cmap=ARC_CMAP, norm=ARC_NORM)
    ax.grid(True, which='both', color='lightgrey', linewidth=0.5)
    # Put the grid lines on the cell boundaries
    ax.set_xticks([x - 0.5 for x in range(1 + len(matrix[0]))])
    ax.set_yticks([y - 0.5 for y in range(1 + len(matrix))])
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.set_title(title, fontweight='bold', **title_kwargs)


def plot_task(task, task_solution, i, t):
    """Plot the train pairs and the first test pair of a task,
    using the same color scheme as the ARC app."""
    num_train = len(task['train'])

    # One column per train pair plus one column for the first test pair
    # (only the first test pair is shown, so no empty panels are drawn)
    w = num_train + 1
    fig, axs = plt.subplots(2, w, figsize=(3 * w, 3 * 2))
    plt.suptitle(f'Set #{i}, {t}:', fontsize=20, fontweight='bold', y=1)

    # The grids are passed explicitly instead of relying on a global `task`
    for j in range(num_train):
        plot_one(axs[0, j], task['train'][j]['input'], 'train input')
        plot_one(axs[1, j], task['train'][j]['output'], 'train output')

    plot_one(axs[0, num_train], task['test'][0]['input'], 'test input')
    plot_one(axs[1, num_train], task_solution, 'TEST OUTPUT', color='green')

    fig.patch.set_linewidth(5)
    fig.patch.set_edgecolor('black')
    fig.patch.set_facecolor('#dddddd')

    plt.tight_layout()
    plt.show()

    print()
    print()

# Visualization of the training set

In [None]:
for i in range(0,20):
    t=list(training_challenges)[i]
    task=training_challenges[t]
    task_solution = training_solutions[t][0]
    plot_task(task,  task_solution, i, t)

# Visualization of the evaluation set

In [None]:
for i in range(0,20):
    t=list(evaluation_challenges)[i]
    task=evaluation_challenges[t]
    task_solution = evaluation_solutions[t][0]
    plot_task(task,  task_solution, i, t)
