This repository contains the experimental data for the paper
Fuchs, Bach, Iser. "Active Learning for SAT Solver Benchmarking"
accepted at the conference TACAS 2023 as well as an extended (currently unpublished) version at the Journal of Automated Reasoning.
The paper source and code are available in a separate repository.
Clone this repository and the code repository in the same parent directory.
Instructions to reproduce the experiments are included in the code repository's README.
The data were obtained on a server with an AMD EPYC 7551
CPU (32 physical cores, base clock of 2.0 GHz) and 128 GB RAM.
The Python version was 3.9.12
.
With this configuration, the experimental pipeline (run.sh
) took about a week.
In the following, we describe how to use the data as well as how this repository is structured.
Most of our files are pickled Python objects, e.g., lists, sets, dictionaries, and pandas
data frames.
This has the advantage of not needing to preprocess any data.
Use the following code snippet to inspect a file's contents (tested with Python 3.9.12
).
import pickle
with open("pickled-data/anni_final_df.pkl", "rb") as file:
anni_final_df = pickle.load(f)
print(anni_final_df.head())
Some of the experimental pipeline's input and output files are plain CSVs.
You can easily read any of them with pandas
.
import pandas as pd
results = pd.read_csv("pickled-data/active_learning_instance_sampling_frequencies.csv")
Instance features and runtimes come in the form of sqlite3
databases.
To inspect them, type for example sqlite3 gbd-data/meta.db
.
Thereafter, you may use, e.g., the .tables
command to list available databases or the .schema [tablename]
command to examine a table's schema.
- The folder
gbd-data
contains the relevant data of the Global Benchmark Database (GBD; originally from here).anni-seq.csv
is a CSV file containing the runtimes of 28 SAT solvers on all 5355 SAT Competition 2022 Anniversary Track instances.base.db
is asqlite3
database containing thefeatures
table. It contains values of 54 instance features for all instances.meta.db
is asqlite3
database with metadata containing thefeatures
table. It shows the instance family of each instance among other data.
- The folder
pickled-data
contains final results, intermediate results, and preprocessed data for reproducing our experiments.- The folder
pickled-data/end_to_end
contains both pickle and CSV files. The pickle files are binarynumpy
arrays indicating which instances are selected for each particular hyperparameter choice. The CSV files contain results for the respective hyperparameter choice. The README's section Using the Data shows how to inspect those files. active_learning_instance_sampling_frequencies.csv
is a CSV file showing how often instances are selected by the best-performing active-learning approach.anni_final_df.pkl
andanni_full_df.pkl
are pickledpandas
data frames that include the data fromgbd-data/anni-seq.csv
in a preprocessed form.anni_train_df.pkl
is a pickledpandas
data frame including a subset ofanni_final_df.pkl
for hyper-parameter optimization purposes.base_features_df.pkl
is a pickledpandas
data frame including the already normalized and preprocessed features fromgbd-data/base.db
.meta_df.pkl
is a pickledpandas
data frame including the already preprocessed features fromgbd-data/meta.db
.
- The folder
README.md
is this document.