# Setup

Download and install Julia (we use version 1.10.4; newer versions will probably work too), then clone https://github.com/mlb2251/Stitch.jl to ../Stitch.jl.

Then install the packages listed in requirements.txt for this repository. You should then be able to run this notebook.
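
Before running the cells, it can help to confirm that both prerequisites are visible from Python. The following is a minimal sketch, not part of this repository: `check_setup` is a hypothetical helper name, and it assumes that Julia is exposed as a `julia` executable on your `PATH` and that Stitch.jl was cloned to the `../Stitch.jl` path mentioned above.

```python
import shutil
from pathlib import Path


def check_setup(stitch_dir="../Stitch.jl"):
    """Report whether the two prerequisites from the setup notes look present.

    Hypothetical helper for illustration; it only checks that things exist,
    not that the Julia version or the Stitch.jl checkout is usable.
    """
    return {
        # Assumes the Julia binary is named `julia` and is on PATH.
        "julia_on_path": shutil.which("julia") is not None,
        # Path is resolved relative to the current working directory.
        "stitch_dir_exists": Path(stitch_dir).is_dir(),
    }


print(check_setup())
```

Because the path is relative, the result depends on the working directory; run the check from the same directory the notebook uses when it calls `run_julia_stitch`.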

# Example

The following code is an example of compression, run on two small Python programs.

In [1]:
import os

os.chdir("..")

In [2]:
%load_ext autoreload
%autoreload 2
%load_ext jupyter_black


In [3]:
import neurosym as ns
import tqdm.auto as tqdm

from datasets import load_dataset

from imperative_stitch.parser import converter
from imperative_stitch.compress.julia_stitch import run_julia_stitch
from imperative_stitch.compress.abstraction import Abstraction
from imperative_stitch.compress.manipulate_abstraction import abstraction_calls_to_stubs

from s_expression_parser import nil

In [4]:
file_1 = """
def f(x):
    y = function_1(x ** 2 + x ** x + x * 2 - x + 3, x)
    print(y)
    z = function_2(x, y ** x + y ** y)
    t = function_3(x, z)
    return x, y, t
"""

file_2 = """
def g(a, y=2):
    b = function_1(a ** 2 + a ** 3 + a * 2 - a + 3, a)
    c = function_2(a, b ** a + b ** b)
    d = b ** 2 + c ** 3 + a - 2
    return x, y, d
"""

dataset = [ns.python_to_s_exp(x) for x in (file_1, file_2)]
stitch_jl_dir = "../Stitch.jl"
iters = 10
max_arity = 3

In [5]:
_, abstrs, rewritten = run_julia_stitch(
    dataset,
    stitch_jl_dir=stitch_jl_dir,
    iters=iters,
    max_arity=max_arity,
    quiet=True,
    root_states=("S", "seqS", "E"),
    metavariable_statements=True,
    metavariables_anywhere=False,
    minimum_number_matches=2,
    application_utility_metavar=-1,
    application_utility_symvar=-0.2,
    application_utility_fixed=-0.5,
)

In [6]:
abstrs = [Abstraction.of(name=f"fn_{i}", **x) for i, x in enumerate(abstrs, 1)]
abstrs_d = {abstr.name: abstr for abstr in abstrs}

In [7]:
abstraction_code = {
    abstr.name: abstraction_calls_to_stubs(
        abstr.body_with_variable_names(), abstrs_d
    ).to_python()
    for abstr in abstrs
}
rewritten_code = [
    abstraction_calls_to_stubs(
        converter.s_exp_to_python_ast(ns.parse_s_expression(rewr)), abstrs_d
    ).to_python()
    for rewr in rewritten
]

## Abstractions

Only one abstraction was found here. It appears below as a 2-line piece of code whose structure is identical in both programs except for

- the variables (abstracted away as %1 through %3)
- the second exponent, which is abstracted away as #0
- the extra print statement, abstracted away as ?0
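
To make the three kinds of placeholder listed above concrete, here is a small sketch that groups the placeholders in the abstraction body by their leading character. It is plain string processing on the text this notebook prints, not a call into `imperative_stitch`; the kind labels in `KINDS` and the function name `placeholders_by_kind` are my own, chosen for illustration.

```python
import re
from collections import defaultdict

# Abstraction body, copied from the notebook's printed output.
ABSTRACTION_BODY = """\
%2 = function_1(%1 ** 2 + %1 ** #0 + %1 * 2 - %1 + 3, %1)
?0
%3 = function_2(%1, %2 ** %1 + %2 ** %2)"""

# Illustrative labels only; the library may use different terminology.
KINDS = {"%": "variable", "#": "expression", "?": "statement"}


def placeholders_by_kind(body):
    """Group placeholders such as %1, #0, ?0 by their leading character."""
    found = defaultdict(set)
    for sigil, index in re.findall(r"([%#?])(\d+)", body):
        found[KINDS[sigil]].add(f"{sigil}{index}")
    return {kind: sorted(names) for kind, names in found.items()}


print(placeholders_by_kind(ABSTRACTION_BODY))
```

The pattern is deliberately naive: it would also match a Python modulo expression such as `x %2`, which is acceptable here only because the body contains none.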

In [8]:
print("ABSTRACTIONS")
for name in abstraction_code:
    print()
    print(name)
    print(abstraction_code[name])

ABSTRACTIONS

fn_1
%2 = function_1(%1 ** 2 + %1 ** #0 + %1 * 2 - %1 + 3, %1)
?0
%3 = function_2(%1, %2 ** %1 + %2 ** %2)


## Rewritten programs

Here we can see the rewritten programs, which differ in structure before and after the abstraction call and pass different arguments to the abstraction.

In [9]:
print("REWRITTEN")
for rewr in rewritten_code:
    print("*" * 80)
    print(rewr)

REWRITTEN
********************************************************************************
def f(x):
    fn_1(__code__('x'), __ref__(x), __ref__(y), __ref__(z), __code__('print(y)'))
    t = function_3(x, z)
    return (x, y, t)
********************************************************************************
def g(a, y=2):
    fn_1(__code__('3'), __ref__(a), __ref__(b), __ref__(c), __code__(''))
    d = b ** 2 + c ** 3 + a - 2
    return (x, y, d)
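
One way to check that the rewrite preserves the original code is to substitute each call's arguments back into the abstraction body. The sketch below does this with plain text substitution. It is an illustration, not how `imperative_stitch` expands abstractions, and it assumes the argument order `#0, %1, %2, %3, ?0`, which is inferred from the two printed `fn_1` calls rather than from the library's documentation. The name `expand` is hypothetical.

```python
import re

# Abstraction body, copied from the notebook's printed output.
FN_1_BODY = """\
%2 = function_1(%1 ** 2 + %1 ** #0 + %1 * 2 - %1 + 3, %1)
?0
%3 = function_2(%1, %2 ** %1 + %2 ** %2)"""

# Assumed parameter order, read off the printed calls to fn_1.
PARAMETERS = ("#0", "%1", "%2", "%3", "?0")


def expand(body, arguments):
    """Fill each placeholder with its argument and drop lines left empty."""
    binding = dict(zip(PARAMETERS, arguments))
    filled = re.sub(r"[%#?]\d+", lambda m: binding[m.group(0)], body)
    return "\n".join(line for line in filled.splitlines() if line.strip())


# Arguments taken from the rewritten f and g, with the __code__ and
# __ref__ wrappers stripped by hand.
f_lines = expand(FN_1_BODY, ("x", "x", "y", "z", "print(y)"))
g_lines = expand(FN_1_BODY, ("3", "a", "b", "c", ""))

print(f_lines)
print("-" * 40)
print(g_lines)
```

The expanded text for `f` reproduces the first three statements of the original function, and the one for `g` reproduces its first two; the empty `?0` argument in `g` simply removes the optional statement line.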
