This is a modular and extensible benchmark of progressively more difficult AI tasks designed to measure the learning speed of ML systems.
This repository contains the code to generate the incremental task dataset used in [1].
This package can also be used as a library. Just install it from PyPI (ideally in a virtual environment if you don't want the CLI command to pollute your PATH).
pip install incremental_tasks
This installs the library as well as an executable, generate_tasks_cli.
The command generate_tasks_cli can be used to generate sequences directly from
the command line. They are printed to stdout and can be saved to a file to
quickly create a dataset.
A user can try the tasks themselves by running generate_tasks_cli. This will
start an interactive session that shows random examples from the tasks of
the benchmark, starting with the easiest.
Once a task is solved, the session switches to a new, harder one.
An example interactive session:
$ generate_tasks_cli --interactive
======================================================================
0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 {?} {?} {?} {?} {?}
Type you answers (space separated) 0 0 0 1 1
OK!
0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1
======================================================================
1 0 0 0 1 0 0 0 {?} {?} {?} {?} {?}
Type you answers (space separated) 0 1 1 1 0
Wrong! right answer was:
1 0 0 0 1 0 0 0 1 0 0 0 1
In [1], the human evaluation scores were computed using this interactive
game with the extra flag --human-eval, which maps every token to a random one
so the player has no prior knowledge about the text and has to rely on
pattern matching, just as a neural network would.
You can use the library in your own code to generate the data on the fly:
from incremental_tasks import ElementaryLanguageWithWorldDef
task = ElementaryLanguageWithWorldDef()
To generate a single sentence from the task, use generate_single:
print(task.generate_single())
# This will print (['I', 'DO', 'NOT', 'SMELL', 'PETER', '.', 'DO', 'I', 'SMELL', 'PETER', '?', 'NO'], [11])
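Here the second element of the returned pair appears to list the indices of the tokens that have to be predicted (index 11 is the answer NO in the example above). A minimal sketch, assuming that interpretation, which splits a generated sequence into a prompt and the expected answers:

# Assumes the second element of generate_single() holds the positions
# of the tokens to predict, as in the example above.
tokens, target_indices = task.generate_single()
prompt = [tok for i, tok in enumerate(tokens) if i not in target_indices]
targets = [tokens[i] for i in target_indices]
print(prompt, targets)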
To generate n unique sequences (fewer than n will be returned if there aren't
enough unique sequences available):
task.generate_tasks(max_n_seq=n)
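For example, assuming generate_tasks mirrors generate_single and returns the generated sequences together with their masks (check the actual return value in your version), a call might look like:

# Sketch: request up to 100 unique sequences. The unpacking below assumes
# the call returns (sequences, masks); adapt it if the return format differs.
sequences, masks = task.generate_tasks(max_n_seq=100)
print(len(sequences))  # may be fewer than 100 for small tasks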
A task can also create a generator that will yield an endless stream of sequences (not necessarily unique):
task.generate_tasks_generator(max_n_seq=None)
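Since the stream is endless, it is convenient to cap it explicitly, for example with itertools.islice (a small usage sketch; the items are simply whatever the generator yields):

from itertools import islice

# Take 5 sequences from the endless stream and print them.
for seq in islice(task.generate_tasks_generator(max_n_seq=None), 5):
    print(seq)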