:Author: Lukas Prokop
:Date: April to August 2016
:Version: 1.0.0
:License: CC-0
cnf-analysis-go is a tool to analyze DIMACS CNF files. Those files are commonly used to encode SAT problems, and a few basic features of a CNF file can tell you something about the problem it states.
This tool evaluates a list of features, which are thoroughly documented by the project, and stores them in a JSON file.
To install, use `go get and go install <https://golang.org/cmd/go/>`_::
    $ go get github.com/prokls/cnf-analysis-go
    $ go install github.com/prokls/cnf-analysis-go
    $ ls -hl $GOROOT/bin/cnf-analysis-go
    -rwxrwxr-x 1 prokls prokls 3.4M Aug 14 16:18 /opt/go-1.6.2/bin/cnf-analysis-go
Then the command line tool to analyze CNF files is available::
$ echo "p cnf 3 2\n1 -3 0\n2 3 -1 0\n" > example.cnf $ cnf-analysis-go example.cnf 2016/08/14 14:45:42 considering example.cnf 2016/08/14 14:45:42 writing file example.stats.json $ cat example.stats.json [{"@cnfhash":"cnf2$7d16f8d71b7097a2f931936ae6d03d738117b2c6","@filename":"exa $ cat example.stats.json | jq [ { "@cnfhash": "cnf2$7d16f8d71b7097a2f931936ae6d03d738117b2c6", "@filename": "example.cnf", "@md5sum": "04f6bf2c537242f15082867e66847bd7", "@sha1sum": "23dd9e64ae0fb4806661b49a31e7f5e692f2d045", "@timestamp": "2016-08-14T12:45:42.262244157", "@version": "1.0.0", "featuring": { "clause_variables_sd_mean": 0.9082483053207397, "clauses_count": 2, "clauses_length_largest": 3, "clauses_length_mean": 2.5, "clauses_length_median": 2.5, "clauses_length_sd": 0.5000000000000001, "clauses_length_smallest": 2, "connected_literal_components_count": 3, "connected_variable_components_count": 1, "definite_clauses_count": 1, "existential_literals_count": 1, "existential_positive_literals_count": 1, "false_trivial": true, "goal_clauses_count": 0, "literals_count": 5, "literals_frequency_0_to_5": 1, "literals_frequency_5_to_10": 0, "literals_frequency_10_to_15": 0, "literals_frequency_15_to_20": 0, "literals_frequency_20_to_25": 0, "literals_frequency_25_to_30": 0, "literals_frequency_30_to_35": 0, "literals_frequency_35_to_40": 0, "literals_frequency_40_to_45": 0, "literals_frequency_45_to_50": 0, "literals_frequency_50_to_55": 5, "literals_frequency_55_to_60": 0, "literals_frequency_60_to_65": 0, "literals_frequency_65_to_70": 0, "literals_frequency_70_to_75": 0, "literals_frequency_75_to_80": 0, "literals_frequency_80_to_85": 0, "literals_frequency_85_to_90": 0, "literals_frequency_90_to_95": 0, "literals_frequency_95_to_100": 0, "literals_frequency_entropy": 2.5, "literals_frequency_largest": 0.5, "literals_frequency_mean": 0.4166666666666667, "literals_frequency_median": 0.5, "literals_frequency_sd": 0.18633899812498247, "literals_frequency_smallest": 0, "literals_occurence_one_count": 5, "nbclauses": 2, "nbvars": 3, "negative_literals_in_clause_largest": 1, "negative_literals_in_clause_mean": 1, "negative_literals_in_clause_smallest": 1, "negative_unit_clause_count": 0, "positive_literals_count": 3, "positive_literals_in_clause_largest": 2, "positive_literals_in_clause_mean": 1.5, "positive_literals_in_clause_median": 1.5, "positive_literals_in_clause_sd": 0.5000000000000001, "positive_literals_in_clause_smallest": 1, "positive_negative_literals_in_clause_ratio_entropy": 0.8899749834391559, "positive_negative_literals_in_clause_ratio_stdev": 0.08333334326744612, "positive_negative_literals_in_clause_ratio_mean": 0.5833333730697632, "positive_unit_clause_count": 0, "tautological_literals_count": 0, "true_trivial": true, "two_literals_clause_count": 1, "variables_frequency_0_to_5": 0, "variables_frequency_5_to_10": 0, "variables_frequency_10_to_15": 0, "variables_frequency_15_to_20": 0, "variables_frequency_20_to_25": 0, "variables_frequency_25_to_30": 0, "variables_frequency_30_to_35": 0, "variables_frequency_35_to_40": 0, "variables_frequency_40_to_45": 0, "variables_frequency_45_to_50": 0, "variables_frequency_50_to_55": 1, "variables_frequency_55_to_60": 0, "variables_frequency_60_to_65": 0, "variables_frequency_65_to_70": 0, "variables_frequency_70_to_75": 0, "variables_frequency_75_to_80": 0, "variables_frequency_80_to_85": 0, "variables_frequency_85_to_90": 0, "variables_frequency_90_to_95": 0, "variables_frequency_95_to_100": 2, "variables_frequency_entropy": 0.5, 
"variables_frequency_largest": 1, "variables_frequency_mean": 0.8333333333333334, "variables_frequency_median": 1, "variables_frequency_sd": 0.23570226039551584, "variables_frequency_smallest": 0.5, "variables_largest": 3, "variables_smallest": 1, "variables_used_count": 3 } } ]
cnf-analysis-go creates a number of workers (their number can be set with -u) and puts CNF files in a queue. Whenever a worker has finished its work, it receives the next CNF file from the queue. A worker is modelled as a goroutine. By default, 4 workers are used.
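A minimal sketch of this worker pattern, assuming hypothetical names (analyze stands in for the actual feature extraction; the real implementation may differ)::

    package main

    import (
        "fmt"
        "sync"
    )

    // analyze is a hypothetical stand-in for the actual feature
    // extraction; the real tool writes a <name>.stats.json file.
    func analyze(path string) {
        fmt.Println("considering", path)
    }

    func main() {
        files := []string{"a.cnf", "b.cnf", "c.cnf"}
        units := 4 // corresponds to the -u option

        queue := make(chan string)
        var wg sync.WaitGroup

        // spawn one goroutine per unit; each worker pulls the next
        // CNF file from the queue as soon as it becomes idle
        for i := 0; i < units; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for path := range queue {
                    analyze(path)
                }
            }()
        }

        // feed the queue, then wait for all workers to drain it
        for _, f := range files {
            queue <- f
        }
        close(queue)
        wg.Wait()
    }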
This implementation is noticeably faster than cnf-analysis-py and uses less memory. The following data was retrieved for SAT competition files:

- esawn_uw3.debugged.cnf (1.4 GB) in app16 took 1 hour and 13 minutes
- bench_573.smt2.cnf (1.6 MB) in Agile took 1 second
- the SAT competition 2009 files (5.9 GB) were processed in 2 hours and 16 minutes
I used my Thinkpad x220t with 16 GB RAM and an Intel Core i5-2520M CPU (2.50 GHz) as the reference system.
The major advantage of cnf-analysis-go over cnf-analysis-py is its memory usage:

- esawn_uw3.debugged.cnf (1.4 GB) in app16 used approx. 3 GB of RAM
- bench_573.smt2.cnf (1.6 MB) in Agile used approx. 2 MB of memory
Considering a factor of 10 for cnf-analysis-py, I assume a factor (ratio of RAM usage to file size) of at most 3 for cnf-analysis-go; e.g. the 1.4 GB file above should require at most about 4.2 GB of RAM. Its memory usage is generally much more constant, but on the other hand the results are less accurate.
Based on these data, I recommend up to 6 units for 8 GB of RAM or 12 units for 16 GB of RAM.
I achieved my goal of making this implementation memory-efficient.
- Go, specifically 'go1.6.2 linux/amd64', was used.
- It has only one external dependency.
The following command line options are supported:

--ignore c --ignore x
    Ignore any lines starting with "c" or "x". If none is specified, "c" and "%" are ignored.
--units 4 or -u 4
    Use (at most) 4 parallel units.
--no-hashes or -n
    Skip hash computations (SHA1, MD5, cnfhash).
--fullpath or -p
    Print the full path, not the basename.
--skip-existing or -s
    Skip stats computation if the stats.json file already exists.
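For example, several options can be combined in a single invocation (the file names here are hypothetical)::

    $ cnf-analysis-go --units 8 --no-hashes --skip-existing bench/*.cnf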
DIMACS files are read by skipping any lines starting with characters from --ignore. The remaining content is parsed: the header line provides nbvars and nbclauses, and from the remaining lines integers are retrieved and passed on. Hence the parser yields a sequence of literals.
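A simplified sketch of such a parser (parseDIMACS and the channel-based interface are assumptions for illustration, not the tool's actual API; the real parser also evaluates the header values)::

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
    )

    // parseDIMACS skips ignored lines and the "p cnf" header,
    // then sends every integer it reads (literals and the
    // clause-terminating zeros) to the lits channel.
    func parseDIMACS(path string, ignore string, lits chan<- int) error {
        defer close(lits)
        f, err := os.Open(path)
        if err != nil {
            return err
        }
        defer f.Close()

        scanner := bufio.NewScanner(f)
        for scanner.Scan() {
            line := strings.TrimSpace(scanner.Text())
            if line == "" || strings.ContainsAny(line[:1], ignore) {
                continue // empty, comment, or ignored line
            }
            if strings.HasPrefix(line, "p cnf") {
                continue // header with nbvars and nbclauses, handled elsewhere
            }
            for _, tok := range strings.Fields(line) {
                lit, err := strconv.Atoi(tok)
                if err != nil {
                    return err
                }
                lits <- lit
            }
        }
        return scanner.Err()
    }

    func main() {
        lits := make(chan int)
        go parseDIMACS("example.cnf", "c%", lits)
        for lit := range lits {
            fmt.Println(lit)
        }
    }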
Features are documented in my paper "Analyzing CNF benchmarks".
Cheers, prokls