Skip to content

Transposition of my MDL-based approach to ARC with strings instead of grids as data.

License

Notifications You must be signed in to change notification settings

sebferre/ARC-MDL-strings

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

91 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MDL-based Machine Learning for Abstraction and Reasoning Tasks on Strings

Suppose that you have a spreadsheet with a number of columns (the inputs), and that you want to automatically fill additional columns (the outputs) as a function of the input columns. Instead of programming a formula or a program to do the job, you would prefer to provide a few examples, and have an AI system to synthesize a program matching your intention.

input output
marie Dupont - 19/4/2000 M. Dupont, 2000
Jean Martin - 12/8/1999 J. Martin, 1999
Marc Bonpain - 13/12/2002 ???
... ???

A correct program could be printf("%s. %s, %d", uppercase(substr(firstname,0,1)), lastname, year), where firstname, lastname and year are substrings of the input string, possibly extracted with regexps.

This is known as program synthesis or programming by examples, and this has been studied for already a long time. We can cite in particular the work of Gulwani, Automating string processing in spreadsheets using input-output examples, which has shown some success and has been integrated into Microsoft Excel.

A parallel can be made with the Abstraction and Reasoning Corpus (ARC), where the objective is the same, to learn by examples how to generate an output from an input, except that inputs and outputs are colored grids. ARC has been introduced as a kind of IQ test for AI, and to foster AI research towards broad generalization capabilities.

I started to develop an approach to ARC based on object-centered models for grids, and the Minimum Description Length (MDL) principle. The ARC-MDL-strings project aims at evaluating the genericity of the approach by transposing it from grids to strings. Another motivation is that it also has obvious practical applications for data wrangling.

Installation Instructions

The tool is developed in OCaml, and compiled to Javascript with the js_of_ocaml tool. It is strongly recommended to use the opam tool to manage OCaml dependencies.

The dependencies are:

  • OCaml compilers version 4.11.1+
  • opam packages: yojson, ppx_deriving_yojson, js_of_ocaml, js_of_ocaml-lwt, js_of_ocaml-ppx
  • personal libraries: ocaml-lib, fablis

When all dependencies are installed, the tool can be compiled by moving in the src directory, and executing the command make lis. This generates a src/html/script.js file that contains all the code of a web application, along with other files in the src/html directory (HTML, CSS, ...).

User Interface

To run the tool, simply open src/html/index.html file in your browser (Firefox or Chrome recommended). Here is a screenshot of the tool UI.

screenshot

In short, here are the main UI components:

  • top left: the model under construction, in the form INPUT-PATTERN => OUTPUT-TEMPLATE, along with a number of description lengths (DL);
  • top right: a list of suggestions for refining the model, from the most compressive one to the less compressive one. Suggestions in red are not compressive but they are useful for testing. The button Choose enables to load a new task, which resets the model to the initial empty model;
  • bottom: examples, one per row, with contents dispatched into three columns:
    • Example: the input and the expected output
    • Description: the input-output pair, as seen by the current model. Void for test examples.
    • Prediction: the input as seen by the input pattern, and the predicted output according to the output template, to be compared with the expected output in the left column.

JSON schema for Tasks

The tool accepts task descriptions, i.e. input-output pairs for training and for test, along the following JSON schema.

{"train": [
   {"input": [A1, B1], "output": [C1, D1]},
   {"input": [A2, B2], "output": [C2, D2]},
   ...],
 "test": [
   {"input": [A3, B3], "output": [C3, D3]},
   ...]
}

A1, B1, A2... must be strings where A, B, C... denote columns, and indices 1, 2... denote rows/examples. Singleton lists like ["foo"] can be flatten as string "foo".

Credits

Author: Sébastien Ferré

Affiliation: Univ. Rennes 1, team LACODAM at IRISA

Copyright © 2022 Sébastien Ferré, IRISA, Université de Rennes 1, France

Licence: GPLv3

About

Transposition of my MDL-based approach to ARC with strings instead of grids as data.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published