Genetic Algorithm for the Weight Maximization Problem on Weighted Automata

We propose a genetic-algorithm-based metaheuristic to approximate the so-called Weight Maximization Problem (WMP), i.e., the problem of computing the word with the highest weight on a weighted automaton (WFA) with weights over the rationals [1]. Since the WMP is undecidable [4], we consider the problem that results from bounding the length of the words in the search space by a fixed value k ≥ 1. We call the latter problem the Bounded Weight Maximization Problem (BWMP), and it is this problem that our algorithm approximates.
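For reference (following standard definitions; see [1]), a WFA A with n states over an alphabet Σ is given in matrix representation by an initial row vector α, one n×n transition matrix M_a per letter a ∈ Σ, and a final column vector η, all with rational entries. The weight of a word w = a_1…a_m is then

weight_A(w) = α · M_{a_1} · … · M_{a_m} · η

and the BWMP asks for a word of length at most k maximizing this value.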

Our algorithm is implemented in C. The procedure takes as input the matrix representation of a WFA A and a value k ≥ 1. Weights are represented as arbitrary-precision rationals by means of GMP, the GNU library for arbitrary-precision arithmetic [2]. The lookup table is implemented with uthash.h [3], a header file written in C that provides a hash table for C structures.
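As an illustration of how such weights can be evaluated with GMP rationals, here is a minimal self-contained sketch over a toy 2-state WFA (this is illustrative only, not the repository's actual code or input format):

#include <stdio.h>
#include <gmp.h>

#define N 2   /* number of states in this toy WFA */

/* cur <- cur * M  (1xN row vector times NxN matrix) */
static void vec_mat_mul(mpq_t cur[N], mpq_t M[N][N]) {
    mpq_t next[N], prod;
    mpq_init(prod);
    for (int j = 0; j < N; j++) {
        mpq_init(next[j]);                    /* mpq_init sets the value to 0/1 */
        for (int i = 0; i < N; i++) {
            mpq_mul(prod, cur[i], M[i][j]);
            mpq_add(next[j], next[j], prod);
        }
    }
    for (int j = 0; j < N; j++) {
        mpq_set(cur[j], next[j]);
        mpq_clear(next[j]);
    }
    mpq_clear(prod);
}

int main(void) {
    /* toy WFA over {a, b}: alpha = (1, 0), eta = (0, 1)^T */
    mpq_t alpha[N], eta[N], Ma[N][N], Mb[N][N], w, prod;
    for (int i = 0; i < N; i++) {
        mpq_init(alpha[i]);
        mpq_init(eta[i]);
        for (int j = 0; j < N; j++) { mpq_init(Ma[i][j]); mpq_init(Mb[i][j]); }
    }
    mpq_set_ui(alpha[0], 1, 1);
    mpq_set_ui(eta[1], 1, 1);
    mpq_set_ui(Ma[0][0], 1, 2);   /* reading 'a' splits the weight in half ... */
    mpq_set_ui(Ma[0][1], 1, 2);
    mpq_set_ui(Mb[1][1], 2, 1);   /* ... and 'b' doubles the weight in state 1 */

    const char *word = "ab";
    for (const char *c = word; *c; c++)
        vec_mat_mul(alpha, *c == 'a' ? Ma : Mb);

    /* weight(word) = alpha * M_{a_1} ... M_{a_m} * eta */
    mpq_init(w);
    mpq_init(prod);
    for (int i = 0; i < N; i++) {
        mpq_mul(prod, alpha[i], eta[i]);
        mpq_add(w, w, prod);
    }
    gmp_printf("weight(\"%s\") = %Qd\n", word, w);   /* prints 1 */
    return 0;
}

Compile with, e.g., gcc weight.c -lgmp.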

[1] E. Gutiérrez, T. Okudono, M. Waga and I. Hasuo. 2020. Genetic Algorithm for the Weight Maximization Problem on Weighted Automata. To appear in GECCO 2020. arXiv: https://arxiv.org/abs/2004.06581.

[2] The GNU Multiple Precision Arithmetic Library. 1991. GNU Project. Retrieved January 28, 2020 from https://gmplib.org.

[3] Troy D. Hanson and Arthur O’Dwyer. 2006. Uthash. Retrieved January 28, 2020 from https://troydhanson.github.io/uthash/.

[4] Azaria Paz. 1971. Introduction to Probabilistic Automata (Computer Science and Applied Mathematics). Academic Press, Inc., Orlando, FL, USA.

Folders

  • bin: executables
  • inc: header files
  • obj: object files
  • plots: plots generated by the experiments
  • res: .txt result files generated by the experiments
  • scripts: scripts to run the experiments
  • src: source files
  • tests: benchmarks for the experiments

General

We strongly recommend reading the extended version of our article [1].

Generally speaking, our algorithm takes as input a WFA matrix description and a value k ≥ 1 that bounds the length of the words searched. The WFA matrix description is provided in .txt format (examples can be found in tests/). The value k and the other parameters of the genetic algorithm and of the experiment specification can be supplied on the command line or by modifying the header file inc/params.h.

A demo can be run as follows:

make env
make
cd bin/
./main <input_file.txt> bwmp <k_value> <size_hash_block>

This runs our algorithm in mode 'bwmp', i.e., it approximates a solution of the BWMP instance given by 'input_file.txt' and k = 'k_value'. The maximum block size of the hash table is set to 'size_hash_block'. The output of the algorithm is printed to stdout. Additionally, a log file ('log.txt') can be inspected for further details of the execution.

For instance, run:

./main ../tests/random/Random/1-abcdef-9.txt bwmp 20 6

Experiments

The following experiments can be carried out using our tool and the set of benchmarks Random, in the folder tests/random/Random. See [1] for further details on these experiments.

Please execute the following command inside the folder ga-wfas/ to create the directories obj/, plots/ and res/, together with the subdirectories they need:

make env

Genetic Algorithm vs Random Search experiment

This experiment compares our genetic-algorithm-based metaheuristic against random search for approximating a solution to the WMP. It uses the set of benchmarks located in tests/random/Random. We perform the comparison by plotting a histogram for each procedure and test case; the histogram represents the distribution of weights observed by each procedure.
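For intuition, the random-search baseline can be pictured as the following minimal sketch. The fitness() function here is a hypothetical stand-in (it counts 'a's) for the actual WFA weight evaluation; this is not the code built by make random-search:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define K 20              /* length bound on the sampled words */
#define SAMPLES 100000    /* number of random words drawn */

/* stand-in fitness; the real tool evaluates the WFA weight instead */
static double fitness(const char *w) {
    double s = 0.0;
    for (; *w; w++) if (*w == 'a') s += 1.0;
    return s;
}

int main(void) {
    const char alphabet[] = "ab";
    char word[K + 1], best_word[K + 1] = "";
    double best = -1.0;
    srand((unsigned)time(NULL));
    for (int s = 0; s < SAMPLES; s++) {
        int len = rand() % (K + 1);          /* pick a length in [0, K] ...   */
        for (int i = 0; i < len; i++)        /* ... then sample each letter    */
            word[i] = alphabet[rand() % (sizeof alphabet - 1)];
        word[len] = '\0';
        double f = fitness(word);
        if (f > best) { best = f; strcpy(best_word, word); }
    }
    printf("best weight %g for word \"%s\"\n", best, best_word);
    return 0;
}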

Run the following commands on the command line, inside the folder ga-wfas:

make clean
make
make random-search
cd scripts
./exp-comparison.sh

Note:

  • Set the variable N_SAMPLES in exp-comparison.sh to repeat each execution N_SAMPLES times.
  • Beware of disk space on your machine: each generated file is 15-70 MB.

When the execution of exp-comparison.sh finishes, results can be found in ga-wfas/res/ga/random/exp-comparison/ and ga-wfas/res/random-search/random/exp-comparison/ for the genetic algorithm and for random search, respectively.

To process the results of the experiment, run from the folder scripts:

python plotting-comparison.py

Note: Please change the value of the variable N_SAMPLES in plotting-comparison.py to the value set in exp-comparison.sh.

Memoization Experiment

This experiment compares a version of the algorithm that memoizes matrix-multiplication results, to improve efficiency, against a version without memoization. It uses the set of benchmarks located in tests/random/Random. We perform the comparison in terms of the number of words processed by the two versions.
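One way such a memoization can look, sketched with uthash (a simplified illustration using doubles instead of GMP rationals; this is not the repository's actual code): the row vector α · M_{a_1} ⋯ M_{a_i} obtained after reading a prefix is cached under that prefix, so words sharing a prefix never recompute its matrix product.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "uthash.h"

#define N 2

typedef struct {
    char prefix[64];        /* key: the prefix read so far (short, for the sketch) */
    double vec[N];          /* value: alpha times the prefix's matrix product */
    UT_hash_handle hh;      /* makes this struct hashable by uthash */
} entry;

static entry *table = NULL;

/* toy transition matrices for the letters 'a' and 'b' */
static const double M[2][N][N] = {
    { {0.5, 0.5}, {0.0, 0.0} },   /* 'a' */
    { {0.0, 0.0}, {0.0, 2.0} },   /* 'b' */
};

/* Return alpha * M_{w_1} ... M_{w_i} for `word`, reusing cached prefixes. */
static void eval_prefix(const char *word, double out[N]) {
    entry *e;
    HASH_FIND_STR(table, word, e);
    if (e) { memcpy(out, e->vec, sizeof e->vec); return; }   /* cache hit */

    size_t len = strlen(word);
    if (len == 0) { out[0] = 1.0; out[1] = 0.0; }            /* alpha = (1, 0) */
    else {
        char parent[64];
        double prev[N];
        memcpy(parent, word, len - 1);
        parent[len - 1] = '\0';
        eval_prefix(parent, prev);                           /* recurse on the prefix */
        const double (*m)[N] = M[word[len - 1] - 'a'];
        for (int j = 0; j < N; j++) {
            out[j] = 0.0;
            for (int i = 0; i < N; i++) out[j] += prev[i] * m[i][j];
        }
    }
    e = malloc(sizeof *e);                                   /* cache the result */
    strcpy(e->prefix, word);
    memcpy(e->vec, out, sizeof e->vec);
    HASH_ADD_STR(table, prefix, e);
}

int main(void) {
    double v[N];
    eval_prefix("ab", v);    /* fills the cache for "", "a" and "ab" */
    eval_prefix("aa", v);    /* reuses the cached vector for "a" */
    printf("alpha * M_a * M_a = (%g, %g)\n", v[0], v[1]);
    return 0;
}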

Run the following commands on the command line, inside the folder ga-wfas:

make clean
make
cd scripts
./exp-memoization.sh

Note:

  • Set the variable N_SAMPLES in exp-memoization.sh to repeat each execution N_SAMPLES times.
  • Beware of disk space on your machine: each generated file is 10-100 KB.

When the execution of exp-memoization.sh finishes, results can be found in ga-wfas/res/ga/random/exp-memoization/. There are two folders there:

  • memo/
  • no-memo/

For each test file, memo/ stores the number of words processed by the algorithm when using memoization, whereas no-memo/ stores the number of words processed when not using it.

To process the results of the experiment, run:

python plotting-memoization.py

Note: Please change the value of the variable N_SAMPLES in plotting-memoization.py to the value set in exp-memoization.sh.

Examples with known maximum weight

This experiment validates the performance of our algorithm by comparing its solution with the highest weight in the automaton.

To do so, we simply select a subset of 9 benchmarks of Random, located in tests/random/Random-bf-search, and perform an exhaustive search to solve the BWMP exactly. For this, we choose an appropriate length bound k that makes the search feasible in a reasonable amount of time; this value varies with the size of the alphabet of the input WFA. In addition, we run our algorithm on the same set of benchmarks with the same length bound k.
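A minimal sketch of such an exhaustive search, again with a hypothetical stand-in fitness() rather than the repository's bf-search code: it enumerates every word of length at most k, and since there are |Σ|^0 + … + |Σ|^k such words, k must stay small for it to finish.

#include <stdio.h>

static double fitness(const char *w) {   /* stand-in, as in the earlier sketch */
    double s = 0.0;
    for (; *w; w++) if (*w == 'a') s += 1.0;
    return s;
}

/* Depth-first enumeration of all words over `sigma` of length <= k. */
static void enumerate(char *word, int len, int k, const char *sigma,
                      double *best) {
    word[len] = '\0';
    double f = fitness(word);
    if (f > *best) *best = f;
    if (len == k) return;                /* length bound reached */
    for (const char *c = sigma; *c; c++) {
        word[len] = *c;                  /* extend the current word by one letter */
        enumerate(word, len + 1, k, sigma, best);
    }
}

int main(void) {
    char word[8];
    double best = -1.0;
    enumerate(word, 0, 7, "ab", &best);  /* all words w over {a, b}, |w| <= 7 */
    printf("max weight: %g\n", best);
    return 0;
}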

To repeat this experiment, run the following commands in the folder ga-wfas:

make clean
make
make bf-search
cd scripts
./exp-bf-search.sh

Note:

  • Set the variable N_SAMPLES in exp-bf-search.sh to repeat each execution N_SAMPLES times.
  • Beware of the disk space on your machine.

When the execution of exp-bf-search.sh finishes, results can be found in ga-wfas/res/ga/random/exp-bf-search/

To process the results of the experiment, run:

python plotting-bf-search.py

Note: Please change the value of the variable N_SAMPLES in plotting-bf-search.py to the value set in exp-bf-search.sh.

Case study

This experiment shows a case study of applying our tool to detect misclassified input sequences in a Recurrent Neural Network (RNN). We use our algorithm in combination with the procedure that extracts a WFA from a given RNN [5] to estimate the error between the extracted WFA and the WFA describing the specification of the RNN over a bounded-length set of words. This yields an estimate of the error, together with evidence of why the network does not properly approximate its specification. Given the WFA extracted from the RNN (described in tests/paren/paren.txt) and the WFA describing the RNN's specification (in tests/paren/spec.txt), we compute the weight of the highest-weighted word of each automaton over a bounded-length set of words.

Run the following commands on the command line, from the folder containing ga-wfas:

cd ga-wfas/
make clean
make
cd scripts
./exp-case-study.sh

When the execution of the latter script finishes, results will be stored in ga-wfas/res/ga/paren/exp-paren/ in two files, one corresponding to the extracted WFA and the other one corresponding to the specification WFA.

To process the data of the experiment and create a plot, run:

python plotting-case-study.py

The resulting plot (a .pdf file) is stored in plots/paren/exp-paren/.

[5] Takamasa Okudono, Masaki Waga, Taro Sekiyama and Ichiro Hasuo. 2020. Weighted Automata Extraction from Recurrent Neural Networks via Regression on State Spaces. To appear in AAAI 2020. arXiv: https://arxiv.org/abs/1904.02931.

Clean options

To remove the folders bin/, obj/ and plots/, use:

make clean

To additionally remove res/, use:

make clean-all

Contact

email: elena.gutierrez[at]imdea.org
