
A toy project to explore reinforcement learning for pattern-rewriting on compute DAGs.


simoll/apo


APO - Automatic Program Optimizer

APO is a toy project to explore reinforcement learning for program optimization: pattern-rewriting on compute DAGs in search of the shortest program. The project is inspired by AlphaZero (https://arxiv.org/abs/1712.01815).

The toy language

This toy program computes the expression 7 * ((b + a) - b):

0: add %b %a
1: sub %0 %b
2: 7
3: mul %2 %1
4: ret %3

A toy program consists of a sequence of operations on unsigned 32-bit integers. Every statement has the form

<line_no>: <op> (<id>)*

<id> - Identifiers

Identifiers can refer to line numbers or arguments. For example, %2 is the result of the statement in line 2 (general case %n, with n an integer >= 0), and %c is the value of the third parameter (parameters range from %a to %z).

<op> - Binary Operators

3: add %0 %1

adds the values of variables %0 and %1 and stores the result in %3. The available operators are add / sub / mul / and / or / xor.

<op> - Constant-yielding

2: 42

Stores 42 in variable %2.

<op> - pipe (fake use)

3: # %2

Assigns to %3 the contents of %2.

<op> - return value

6: ret %5

%5 is the return value of the program.
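To make the semantics above concrete, here is a minimal interpreter sketch for the toy language in Python. The parsing details and the wrap-around semantics of 32-bit arithmetic are assumptions; the repository's actual interpreter is in C++ and may differ.

```python
# Minimal sketch of an interpreter for the toy language.
# Wrap-around semantics for unsigned 32-bit arithmetic are assumed.

MASK = 0xFFFFFFFF  # mask results to unsigned 32-bit range

BINOPS = {
    "add": lambda x, y: (x + y) & MASK,
    "sub": lambda x, y: (x - y) & MASK,
    "mul": lambda x, y: (x * y) & MASK,
    "and": lambda x, y: x & y,
    "or":  lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
}

def resolve(tok, values, args):
    """Map an identifier token to its value: %n -> result of line n, %a..%z -> argument."""
    name = tok.lstrip("%")
    if name.isdigit():
        return values[int(name)]
    return args[ord(name) - ord("a")]

def run(program, args):
    """Execute a toy program given as a list of source lines; return the ret value."""
    values = []
    for line in program:
        _, rhs = line.split(":", 1)
        toks = rhs.split()
        op = toks[0]
        if op == "ret":
            return resolve(toks[1], values, args)
        elif op == "#":  # pipe (fake use): copy the operand's value
            values.append(resolve(toks[1], values, args))
        elif op in BINOPS:
            x, y = (resolve(t, values, args) for t in toks[1:3])
            values.append(BINOPS[op](x, y))
        else:  # constant-yielding op
            values.append(int(op) & MASK)

prog = [
    "0: add %b %a",
    "1: sub %0 %b",
    "2: 7",
    "3: mul %2 %1",
    "4: ret %3",
]
print(run(prog, [3, 5]))  # 7 * ((5 + 3) - 5) = 21
```

Note that the example program 7 * ((b + a) - b) always reduces to 7 * a, which is exactly the kind of shortening rewrite APO searches for.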

Configuration files

devices.conf

Purpose:

Tensorflow tower configuration file

Syntax:

<tensorflow_device> <name> <task1>[,<task2>]*  <rating>

Interpretation:

Creates a tower on <tensorflow_device> with the op prefix <name>/. The tower is instantiated for the tasks given in the task list. The interpretation of <rating> depends on the task. It is possible to create multiple towers per device.

infer task:
Inference tower used during reinforcement learning. <rating> translates to the number of concurrent search threads using this inference device. An infer tower uses a StagingArea (dev/put_stage is required to pull data into the model before dev/infer_X_dist can be used).
loss task:
Inference tower used for loss computation (logging). This device is unbuffered (fed directly from tf.placeholders). <rating> has no meaning here.
train task:
At the moment there must be exactly one train device. In the future, <rating> may be used to distribute gradient computation across all train devices automatically.
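A devices.conf following the syntax above might look like the fragment below. The device names, tower names, and ratings are purely illustrative (not taken from the repository): t0 feeds four concurrent search threads, t1 computes losses for logging, and t2 is the single training device.

```
/gpu:0  t0  infer  4
/cpu:0  t1  loss   0
/gpu:1  t2  train  1
```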

server.conf

SampleServer configuration (mostly the training sample queue).

train.conf

Soft model parameters. These do not affect the MetaGraph, so no rebuilding is necessary.

model.conf

Hard model parameters. If any entry changes, the Tensorflow MetaGraph needs to be rebuilt (which takes forever).
