We propose a genetic-algorithm-based metaheuristic to approximate the so-called Weighted Maximization Problem (WMP), i.e., the problem of computing the word with the highest weight on a weighted automaton (WFA) with weights over the rationals [1]. Since the WMP is undecidable [4], we consider the problem that results from bounding the length of the words in the search space by a fixed value k ≥ 1. We call the latter problem the Bounded Weight Maximization Problem (BWMP), and this is the problem our algorithm approximates.
Our algorithm is implemented in the C language. The procedure takes as input the matrix representation of a WFA A and a value k ≥ 1. Weights are represented as arbitrary-precision rationals by means of GMP [2], the GNU library for arbitrary-precision arithmetic on rational numbers. For the lookup table implementation we use uthash.h [3], a C header file that implements a hash table for C structures.
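In the matrix representation, the weight of a word is obtained by multiplying the initial vector through the word's transition matrices and into the final vector. The following is a minimal self-contained sketch of this computation with GMP rationals; the type wfa_t and its fields are hypothetical illustrations, not the tool's actual API. Compile with, e.g., gcc weight.c -lgmp.

#include <gmp.h>
#include <stdlib.h>

/* hypothetical matrix representation of a WFA (not the tool's actual types) */
typedef struct {
    int n_states;               /* number of states */
    int n_letters;              /* alphabet size */
    mpq_t *init;                /* initial vector, length n_states */
    mpq_t *final;               /* final vector, length n_states */
    mpq_t **trans;              /* trans[a]: n x n matrix of letter a, row-major */
} wfa_t;

/* weight(w) = init^T * M(w[0]) * ... * M(w[len-1]) * final */
void word_weight(mpq_t out, const wfa_t *A, const int *w, int len) {
    int n = A->n_states;
    mpq_t *v = malloc(n * sizeof(mpq_t));
    mpq_t *u = malloc(n * sizeof(mpq_t));
    mpq_t tmp;
    mpq_init(tmp);
    for (int i = 0; i < n; i++) {
        mpq_init(v[i]); mpq_set(v[i], A->init[i]); mpq_init(u[i]);
    }
    for (int p = 0; p < len; p++) {            /* v <- v * M(w[p]) */
        mpq_t *M = A->trans[w[p]];
        for (int j = 0; j < n; j++) {
            mpq_set_ui(u[j], 0, 1);
            for (int i = 0; i < n; i++) {
                mpq_mul(tmp, v[i], M[i * n + j]);
                mpq_add(u[j], u[j], tmp);
            }
        }
        for (int j = 0; j < n; j++) mpq_set(v[j], u[j]);
    }
    mpq_set_ui(out, 0, 1);                     /* out <- v . final */
    for (int i = 0; i < n; i++) {
        mpq_mul(tmp, v[i], A->final[i]);
        mpq_add(out, out, tmp);
    }
    for (int i = 0; i < n; i++) { mpq_clear(v[i]); mpq_clear(u[i]); }
    free(v); free(u); mpq_clear(tmp);
}

int main(void) {
    /* toy 2-state WFA over a 1-letter alphabet, M(a) = [[1/2, 1/2], [0, 1]] */
    mpq_t init[2], fin[2], m[4], w;
    mpq_t *trans[1] = { m };
    wfa_t A = { 2, 1, init, fin, trans };
    for (int i = 0; i < 2; i++) { mpq_init(init[i]); mpq_init(fin[i]); }
    for (int i = 0; i < 4; i++) mpq_init(m[i]);         /* all entries = 0 */
    mpq_set_ui(init[0], 1, 1);                          /* init  = (1, 0)   */
    mpq_set_ui(fin[1], 1, 1);                           /* final = (0, 1)^T */
    mpq_set_ui(m[0], 1, 2); mpq_set_ui(m[1], 1, 2); mpq_set_ui(m[3], 1, 1);
    int word[3] = { 0, 0, 0 };                          /* the word "aaa"   */
    mpq_init(w);
    word_weight(w, &A, word, 3);
    gmp_printf("weight(aaa) = %Qd\n", w);               /* prints 7/8       */
    return 0;
}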
[1] Elena Gutiérrez, Takamasa Okudono, Masaki Waga, and Ichiro Hasuo. 2020. Genetic Algorithm for the Weighted Maximization Problem on Weighted Automata. To appear in GECCO 2020. arXiv:2004.06581. https://arxiv.org/abs/2004.06581
[2] The GNU Multiple Precision Arithmetic Library. 1991. GNU Project. Retrieved January 28, 2020 from https://gmplib.org
[3] Troy D. Hanson and Arthur O'Dwyer. 2006. uthash. Retrieved January 28, 2020 from https://troydhanson.github.io/uthash/
[4] Azaria Paz. 1971. Introduction to Probabilistic Automata (Computer Science and Applied Mathematics). Academic Press, Inc., Orlando, FL, USA.
- bin: executable files
- inc: header files
- obj: object files
- plots: folder that stores plots generated by experiments
- res: folder that stores the .txt files generated by experiments
- scripts: script files to perform experiments
- src: source files
- tests: benchmarks for experiments
We strongly recommend reading the extended version of our article [1].
Generally speaking, our algorithm takes as input a WFA matrix description and a value k ≥ 1 that bounds the length of the words searched. The WFA matrix description is provided in .txt format (examples can be found in tests/). The value k and the other parameters of the genetic algorithm and the experiment specification can be provided on the command line or by modifying the header file 'inc/params.h'.
A demo can be run as follows:
make env
make
cd bin/
./main <input_file.txt> bwmp <k_value> <size_hash_block>
This line shows how to run our algorithm in mode 'bwmp', i.e., it approximates a solution of the BWMP instance given by 'input_file.txt' and k = 'k_value'. The maximum block size of the hash table is set to 'size_hash_block'. The output of the algorithm is shown on stdout. Additionally, a log file ('log.txt') can be inspected for further information about the execution.
For instance, run:
./main ../tests/random/Random/1-abcdef-9.txt bwmp 20 6
The following experiments can be carried out using our tool and the set of benchmarks Random, in the folder tests/random/Random. See [1] for further details on these experiments.
Please execute the following command from the folder ga-wfas/ to create the directories obj/, plots/, and res/, together with the subdirectories they need.
make env
This experiment compares our genetic-algorithm-based metaheuristic against random search for approximating a solution to the WMP. It uses the set of benchmarks located in tests/random/Random. We perform the comparison by plotting a histogram for each procedure and test case; the histogram shows the distribution of the weights observed by each procedure.
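As an illustration of the random-search baseline, the following minimal self-contained sketch samples uniformly random words of length at most k on a toy 2-state WFA (double weights instead of the tool's GMP rationals) and prints each observed weight; the histograms summarize exactly this kind of weight sample.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 2   /* states of the toy WFA */
#define S 2   /* alphabet size         */

static const double INIT[N]    = { 1.0, 0.0 };
static const double FINAL[N]   = { 0.0, 1.0 };
static const double M[S][N][N] = {
    { { 0.5, 0.5 }, { 0.0, 1.0 } },   /* matrix of letter 'a' */
    { { 1.0, 0.0 }, { 0.5, 0.5 } },   /* matrix of letter 'b' */
};

/* weight of a uniformly random word of length exactly len */
static double random_word_weight(int len) {
    double v[N], u[N];
    for (int i = 0; i < N; i++) v[i] = INIT[i];
    for (int p = 0; p < len; p++) {
        int a = rand() % S;                    /* draw one letter */
        for (int j = 0; j < N; j++) {          /* v <- v * M(a)   */
            u[j] = 0.0;
            for (int i = 0; i < N; i++) u[j] += v[i] * M[a][i][j];
        }
        for (int j = 0; j < N; j++) v[j] = u[j];
    }
    double w = 0.0;                            /* w <- v . FINAL  */
    for (int i = 0; i < N; i++) w += v[i] * FINAL[i];
    return w;
}

int main(void) {
    int k = 20, n_samples = 1000;
    srand((unsigned) time(NULL));
    for (int s = 0; s < n_samples; s++)        /* one weight per line */
        printf("%f\n", random_word_weight(1 + rand() % k));
    return 0;
}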
Run the following commands from the folder ga-wfas:
make clean
make
make random-search
cd scripts
./exp-comparison.sh
Note:
- Set the variable N_SAMPLES in ./exp-comparison.sh as desired to repeat each execution N_SAMPLES times.
- Beware of disk space on your machine: each generated file is 15-70 MB.
When the execution of exp-comparison.sh finishes, results can be found in ga-wfas/res/ga/random/exp-comparison/ and ga-wfas/res/random-search/random/exp-comparison/ for the genetic algorithm and for random search, respectively.
To process the results of the experiment, run from the folder scripts:
python plotting-comparison.py
Note:
Please change the value of the variable N_SAMPLES in plotting-comparison.py to the value set in ./exp-comparison.sh.
This experiment evaluates a simple memoization technique for matrix multiplication operations, introduced to improve the algorithm's efficiency, by comparing it against a version of the algorithm without memoization. It uses the set of benchmarks located in tests/random/Random. We perform the comparison in terms of the number of words processed by the two versions.
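The following minimal self-contained sketch illustrates the memoization idea with uthash on a toy WFA (double weights instead of GMP rationals; names such as memo_t and prefix_vector are hypothetical): the state vector reached by each word prefix is cached in a hash table, so words sharing a prefix pay for the prefix's matrix multiplications only once.

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "uthash.h"

#define N 2          /* states of the toy WFA */
#define MAX_WORD 64  /* assumed maximum prefix length */

static const double INIT[N]    = { 1.0, 0.0 };
static const double M[2][N][N] = {
    { { 0.5, 0.5 }, { 0.0, 1.0 } },   /* matrix of letter 'a' */
    { { 1.0, 0.0 }, { 0.5, 0.5 } },   /* matrix of letter 'b' */
};

typedef struct {
    char prefix[MAX_WORD + 1];   /* key: the prefix itself      */
    double vec[N];               /* cached INIT^T * M(prefix)   */
    UT_hash_handle hh;           /* makes the struct hashable   */
} memo_t;

static memo_t *table = NULL;

/* returns the cached vector for prefix, computing it on a miss
 * (assumes strlen(prefix) <= MAX_WORD and letters in 'a'..'b') */
static double *prefix_vector(const char *prefix) {
    memo_t *e;
    HASH_FIND_STR(table, prefix, e);
    if (e) return e->vec;                         /* hit: no multiplications */
    e = malloc(sizeof *e);
    strcpy(e->prefix, prefix);
    size_t len = strlen(prefix);
    if (len == 0) {
        memcpy(e->vec, INIT, sizeof INIT);        /* empty prefix */
    } else {
        char shorter[MAX_WORD + 1];
        memcpy(shorter, prefix, len - 1);
        shorter[len - 1] = '\0';
        double *v = prefix_vector(shorter);       /* reuse the shorter prefix */
        int a = prefix[len - 1] - 'a';
        for (int j = 0; j < N; j++) {             /* one vector-matrix step */
            e->vec[j] = 0.0;
            for (int i = 0; i < N; i++) e->vec[j] += v[i] * M[a][i][j];
        }
    }
    HASH_ADD_STR(table, prefix, e);
    return e->vec;
}

int main(void) {
    /* "ab" and "aa" share the prefix "a": its product is computed once */
    double *v1 = prefix_vector("ab"), *v2 = prefix_vector("aa");
    printf("ab: (%f, %f)  aa: (%f, %f)\n", v1[0], v1[1], v2[0], v2[1]);
    return 0;
}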
Run the following commands from the folder ga-wfas:
make clean
make
cd scripts
./exp-memoization.sh
Note:
- Set the variable N_SAMPLES in ./exp-memoization.sh as desired to repeat each execution N_SAMPLES times.
- Beware of disk space on your machine: each generated file is 10-100 KB.
When the execution of exp-memoization.sh finishes, results can be found in ga-wfas/res/ga/random/exp-memoization/. There are two folders there:
- memo/: stores, for each test file, the number of words processed by the algorithm when using memoization.
- no-memo/: stores the number of words processed by the algorithm when not using memoization.
To process the results of the experiment, run from the folder scripts:
python plotting-memoization.py
Note:
Please change the value of the variable N_SAMPLES in plotting-memoization.py to the value set in ./exp-memoization.sh.
This experiment validates the performance of our algorithm by comparing its solution with the highest weight in the automaton.
To do so, we select a subset of 9 benchmarks of Random, located in tests/random/Random-bf-search, and perform an exhaustive (brute-force) search to solve the BWMP exactly. For this, we choose an appropriate length bound k that makes the search feasible in a reasonable amount of time; since there are |Σ|^l words of each length l, the feasible value of k varies with the size of the alphabet Σ of the input WFA. In addition, we run our algorithm on the same set of benchmarks with the same length bound k.
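For illustration, the following minimal self-contained sketch performs such an exhaustive search on a toy 2-state WFA over a two-letter alphabet (double weights instead of the tool's GMP rationals): it recursively enumerates all words of length at most k (of which there are sum_{l=0}^{k} |Σ|^l), updating the state vector one letter at a time, and keeps the maximum weight seen.

#include <stdio.h>
#include <math.h>

#define N 2   /* states of the toy WFA */
#define S 2   /* alphabet size         */

static const double INIT[N]    = { 1.0, 0.0 };
static const double FINAL[N]   = { 0.0, 1.0 };
static const double M[S][N][N] = {
    { { 0.5, 0.5 }, { 0.0, 1.0 } },
    { { 1.0, 0.0 }, { 0.5, 0.5 } },
};

static double best;

/* v = INIT^T * M(prefix); record its weight, then extend by every letter */
static void search(const double *v, int depth, int k) {
    double w = 0.0;
    for (int i = 0; i < N; i++) w += v[i] * FINAL[i];
    if (w > best) best = w;                    /* weight of the prefix */
    if (depth == k) return;
    for (int a = 0; a < S; a++) {
        double u[N];
        for (int j = 0; j < N; j++) {          /* u <- v * M(a)        */
            u[j] = 0.0;
            for (int i = 0; i < N; i++) u[j] += v[i] * M[a][i][j];
        }
        search(u, depth + 1, k);
    }
}

int main(void) {
    best = -HUGE_VAL;
    search(INIT, 0, 10);                       /* all words of length <= 10 */
    printf("exact max weight for k = 10: %f\n", best);
    return 0;
}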
To repeat this experiment, run the following commands in the folder ga-wfas:
make clean
make
make bf-search
cd scripts
./exp-bf-search.sh
Note:
- Set the variable N_SAMPLES in ./exp-bf-search.sh as desired to repeat each execution N_SAMPLES times.
- Beware of disk space on your machine.
When the execution of exp-bf-search.sh finishes, results can be found in ga-wfas/res/ga/random/exp-bf-search/.
To process the results of the experiment, run from the folder scripts:
python plotting-bf-search.py
Note:
Please change the value of the variable N_SAMPLES in plotting-bf-search.py to the value set in ./exp-bf-search.sh.
This experiment shows a case study of the application of our tool to detecting misclassified input sequences in a Recurrent Neural Network (RNN). We use our algorithm in combination with the procedure that extracts a WFA from a given RNN [5] to estimate the error between the extracted WFA and the WFA that describes the specification of the RNN, over a bounded-length set of words. This yields an estimate of the error together with evidence of why the network is not properly approximating its specification. Given the WFA extracted from the RNN (described in tests/paren/paren.txt) and the WFA describing the specification of the RNN (in tests/paren/spec.txt), we compute the maximum between the weight of the highest-weighted word of the difference automaton (extracted minus specification) and that of the reverse difference (specification minus extracted), over a bounded-length set of words.
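As a rough illustration of this error-estimation step, the following self-contained sketch replaces the two WFAs with toy evaluators and the genetic algorithm with exhaustive search: it maximizes the weight difference in each direction over words of length at most k, and the maximizing words serve as evidence of misclassified sequences.

#include <stdio.h>

/* toy stand-ins for evaluating the extracted WFA (tests/paren/paren.txt)
 * and the specification WFA (tests/paren/spec.txt) on words over "()" */
static double extracted_weight(const int *w, int len) {
    int d = 0, ok = 1;
    for (int i = 0; i < len; i++) { d += w[i] ? -1 : 1; if (d < 0) ok = 0; }
    return (ok && d == 0) ? 1.0 : 0.1;   /* imperfect: leaks weight 0.1 */
}
static double spec_weight(const int *w, int len) {
    int d = 0, ok = 1;
    for (int i = 0; i < len; i++) { d += w[i] ? -1 : 1; if (d < 0) ok = 0; }
    return (ok && d == 0) ? 1.0 : 0.0;   /* the intended specification  */
}

static double max_ab = 0.0, max_ba = 0.0;   /* max(A-B) and max(B-A)    */

static void search(int *w, int len, int k) {
    double a = extracted_weight(w, len), b = spec_weight(w, len);
    if (a - b > max_ab) max_ab = a - b;
    if (b - a > max_ba) max_ba = b - a;
    if (len == k) return;
    for (int c = 0; c < 2; c++) { w[len] = c; search(w, len + 1, k); }
}

int main(void) {
    int w[12];
    search(w, 0, 12);
    /* the larger of the two maxima bounds |A(w) - B(w)| for |w| <= 12 */
    printf("max(A-B) = %f, max(B-A) = %f\n", max_ab, max_ba);
    return 0;
}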
Run the following commands from the folder containing ga-wfas:
cd ga-wfas/
make clean
make
cd scripts
./exp-case-study.sh
When the execution of the latter script finishes, results will be stored in ga-wfas/res/ga/paren/exp-paren/ in two files, one corresponding to each direction of the difference.
To process the data of the experiment and create a plot, run:
python plotting-case-study.py
The resulting plot (.pdf) is in plots/paren/exp-paren/.
[5] Takamasa Okudono, Masaki Waga, Taro Sekiyama, and Ichiro Hasuo. 2019. Weighted Automata Extraction from Recurrent Neural Networks via Regression on State Spaces. To appear in AAAI 2020. arXiv:1904.02931.
To remove the folders bin/, obj/ and plots/, use:
make clean
To additionally remove res/, use:
make clean-all
email: elena.gutierrez[at]imdea.org