NOTE
This fork is adapted to (kind of) work with kernel versions 4.x onward (tested with 4.13.3). The original `kconfigreader` relies on `dumpconf`, which can fail if the `imply` keyword is used in `Kconfig` files (this keyword was added to Kconfig in Linux 4.10).
Edits:
- added a script to replace `imply` with `select` in the relevant `Kconfig` files
- modified `src/main/scala/KConfigReader.scala` following the issue in the original repository
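The fork's `do.sh` is not reproduced here. To give a rough idea of what the replacement step involves, the following is a hypothetical sketch (the function name `replace_imply_with_select` is ours, not the fork's) that rewrites `imply` attribute lines as `select` in every file named `Kconfig*` below a directory. Keep in mind that `select` is stronger than `imply`: `imply` only provides a default that the user may override, whereas `select` forces the target option on, so the extracted model is more constrained than the real one.

```shell
# Hypothetical sketch, not the fork's do.sh: rewrite Kconfig "imply" attributes as "select".
# Assumes "imply" only starts a (possibly indented) attribute line; a help-text
# line that happens to begin with the word "imply" would be rewritten too.
replace_imply_with_select() {
  local root=$1 f tmp
  # Visit Kconfig, Kconfig.debug, ... anywhere below $root
  find "$root" -type f -name 'Kconfig*' | while IFS= read -r f; do
    if grep -Eq '^[[:space:]]*imply[[:space:]]' "$f"; then
      tmp=$(mktemp)
      # Keep the indentation, swap only the keyword
      sed -E 's/^([[:space:]]*)imply([[:space:]])/\1select\2/' "$f" > "$tmp"
      # Copy back instead of mv so the file keeps its permissions
      cat "$tmp" > "$f"
      rm -f "$tmp"
    fi
  done
}
```

For example, `replace_imply_with_select "$LINUX_SRC_DIR"` would run it over a whole kernel tree; it edits files in place, so use it on a scratch copy.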
Usage: follow the build instructions below, then run:

    chmod +x do.sh && ./do.sh
    cd "$LINUX_SRC_DIR"
    export SRCARCH=x86 && ../kconfigreader/binary/dumpconf Kconfig > ../kconfigreader/out.rsf
    cp -r ../kconfigreader/binary/ .
    export SRCARCH=x86 && ./../kconfigreader/run.sh de.fosd.typechef.kconfig.KConfigReader --writeDimacs Kconfig out
This will place an `out.dimacs` file in the `$LINUX_SRC_DIR` folder. I added my generated copy to this fork for convenience.
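A quick sanity check of the generated `out.dimacs` is to read its problem line, which in the standard DIMACS CNF format has the form `p cnf <variables> <clauses>`. The helper below is a hypothetical sketch (the name `dimacs_summary` is ours); it relies only on the DIMACS format itself, not on anything specific to kconfigreader.

```shell
# Hypothetical helper: print variable and clause counts of a DIMACS CNF file.
dimacs_summary() {
  # The problem line has the form: p cnf <num-variables> <num-clauses>
  awk '$1 == "p" && $2 == "cnf" { print "variables=" $3, "clauses=" $4; found = 1; exit }
       END { if (!found) exit 1 }' "$1"
}
```

Usage: `dimacs_summary "$LINUX_SRC_DIR/out.dimacs"`. It exits with status 1 if the file contains no problem line.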
Tooling to read kconfig files and convert them into formulas for further reasoning (primarily for the TypeChef infrastructure, but it may be used elsewhere).
Build with sbt
To extract the raw data from kconfig files, this tool relies on a utility, `dumpconf` (inspired by Undertaker), that builds on top of Linux's kconfig infrastructure to dump the internal representation in XML format. The file `dumpconf/dumpconf.c` must be compiled against the Linux source tree that is to be analyzed. A binary version compiled on Ubuntu 12.04 for Linux 2.6.33.3 is available in the `binary` folder for convenience.
With `sbt mkrun` you can create a `run.sh` file that configures all dependencies correctly.
To extract information from a model, run `./run.sh de.fosd.typechef.kconfig.KConfigReader [kconfig] [out]`, where `kconfig` refers to the kconfig file that should be analyzed and `out` is the base name of the files that should be written (the tool creates multiple output files with different extensions).
By default, the tool creates two files: `out.rsf` contains the output of `dumpconf`, a raw dump of the kconfig information in an intermediate format, and `out.model` contains the boolean constraints sorted by the feature they belong to.
The feature names in this file use the following encoding: feature names in quotes are referenced but never defined in the kconfig model and are hence dead. A tristate feature X is represented by two variables, X and X_MODULE, that are mutually exclusive, just as in Linux. Nonboolean options are represented by multiple variables, one for each value explicitly mentioned in the kconfig files (X=n means the option is deactivated, X=1 means it has value 1, and so on).
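In CNF, "X and X_MODULE are mutually exclusive" is the single clause (not X) or (not X_MODULE). If, for illustration, the two variables were numbered 1 and 2 in a DIMACS encoding, that clause would be the line `-1 -2 0`. The hypothetical helpers below (names are ours) emit such a clause for a given pair of variable numbers and decode a pair of boolean values back to `y`/`m`/`n`; the actual numbering that kconfigreader assigns has to be read from its own output.

```shell
# Hypothetical helpers illustrating the tristate encoding described above.

# Emit the clause "not X or not X_MODULE" in DIMACS syntax (0 ends a clause).
tristate_exclusion_clause() {
  # $1: variable number of X, $2: variable number of X_MODULE
  printf '%s\n' "-$1 -$2 0"
}

# Decode the boolean values of (X, X_MODULE) to a tristate value.
tristate_value() {
  case "$1,$2" in
    1,0) echo y ;;
    0,1) echo m ;;
    0,0) echo n ;;
    *)   return 1 ;;  # 1,1 is ruled out by the exclusion clause; other input is malformed
  esac
}
```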
Additional options:

- `--fast` skips the consistency check. This speeds up extraction significantly and uses much less memory, since the SAT solver is not involved. However, it may silently produce unsatisfiable models. Omit `--fast` to check consistency and to debug inconsistent models.
- `--dumpconf [file]` provides the path to the `dumpconf` tool to be called from within this tool.
- `--writeDimacs` writes an `out.dimacs` file that can be used with any SAT solver or directly as a TypeChef feature model (e.g., through `FeatureExprFactory.dflt.featureModelFactory.createFromDimacsFile`). Comments at the beginning of the DIMACS file provide a mapping to the option names (including the variables for nonboolean options). The DIMACS file contains additional variables to avoid an exponential blowup during the transformation into CNF (the transformation is equisatisfiable, but not equivalent; an equivalent transformation into CNF is possible for small models, but not for the larger constraints in Linux; this is easily changed in the source code by changing the parameter to `DimacsWriter.writeAsDimacs2`).
- `--writeCompletedConf` writes `out.completed.h` and `out.open` files. For every option, it checks whether the option can be activated and whether it can be deactivated. If it is activated in all valid configurations, it is defined as a macro in the `.h` file; if it is deactivated in all valid configurations, it is undefined in the `.h` file; if it is activated in some valid configurations and deactivated in others, it is listed in the `.open` file. These files are used as input for TypeChef to reduce the search space (using TypeChef's `--include` and `--openFeat` parameters). Since it requires two SAT calls for every option, it is expensive to compute. For reasonable performance, it requires writing a `.dimacs` file.
- `--writeNonBoolean` writes an `out.nonbool.h` file that defines all nonboolean options to their defaults using `#define` directives. Additionally, `#ifdef` directives are used if different defaults are defined in different configurations or if defaults are not available for all configurations.
- `--reduceConstraints` eliminates all redundant constraints (i.e., constraints that are already implied by previous constraints) before writing the `.dimacs` file. This is a very expensive operation that can, however, reduce the size of the `.dimacs` file by a few percent.
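The name-to-variable mapping in the comments of `out.dimacs` can be queried with standard tools. The sketch below assumes comment lines of the form `c <number> <name>`; check this against your own output. Comment lines whose second field is not a plain number (for example, any that describe the auxiliary variables introduced by the CNF transformation) are skipped. The function name `dimacs_var_id` is hypothetical.

```shell
# Hypothetical helper; assumes mapping comments of the form "c <number> <name>".
dimacs_var_id() {
  # $1: DIMACS file, $2: option name; prints the variable number, fails if absent
  awk -v name="$2" '
    $1 == "c" && $2 ~ /^[0-9]+$/ && $3 == name { print $2; found = 1; exit }
    END { if (!found) exit 1 }' "$1"
}
```

For example, `dimacs_var_id out.dimacs MODULES` would print the variable number of `MODULES` under that assumption.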
For an example of how to use this, see `genFMs.sh` in https://github.com/ckaestne/TypeChef-LinuxAnalysis.
Tristate and boolean options (with and without prompts) are handled accurately as far as we know. If you find a mistake, please provide a small kconfig file as a test case where our extraction differs from the default kconfig behavior (the test infrastructure tests all value combinations of such files in a brute-force fashion; there are many examples of such files in the test directory).
Select statements (and depends) are potentially order dependent and may cause kconfig to produce otherwise invalid configurations. Kconfig issues a warning when this happens, which is rather unlikely. Our tool does not model this extreme behavior correctly. It would be worth writing an extension that detects such potential issues and reports them to the Linux maintainers (who try to avoid these cases as well).
The precise handling of tristate and nonprompt options often leads to large constraints. This is unavoidable unless imprecise approximations are desired. (Such approximations could easily be added on top of our infrastructure.)
Nonboolean options require a finite abstraction of an infinite domain. The current encoding precisely represents what is enforced within kconfig, but is limited to the values mentioned in the kconfig files as defaults or in constraints. A different abstraction would be possible by changing the implementation.
Range expressions currently may depend only on constants, not on other configuration values. There is only a single case in Linux-x86 where range expressions are ignored due to this limitation.
We currently create constraints for each option separately. Select statements are listed under the selected option, not the selecting option. It would be a straightforward extension to additionally distinguish the kinds of constraints and to maintain traceability information back to the `.rsf` file, if desired.
The MODULES option (if used in the model) must be named MODULES. It is matched by name, not by the additional Kconfig attribute (this could be changed by modifying both this tool and `dumpconf`).
The semantics of Kconfig are nontrivial, not only with regard to tristate options, but also with regard to nonboolean functions, items and choices with and without prompts, and so forth.
We invested significant effort in a testing infrastructure to ensure that the Kconfig behavior is correctly captured. The idea is to use a differential testing approach. We use kconfig itself (more precisely, the tool `conf --olddefconfig`) to check whether a specific configuration is valid. This tool modifies the values in an invalid configuration to make it valid; that is, it provides a means to establish ground truth about which configurations are valid.
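The validity check can be sketched in a few lines of shell. This is not the project's actual test harness; it is a hypothetical illustration (function names are ours) assuming a compiled kconfig `conf` binary that reads and rewrites the file named by the `KCONFIG_CONFIG` environment variable, with any other environment (e.g., `SRCARCH`) already set up as for `dumpconf`. A candidate counts as valid if `conf --olddefconfig` leaves all `CONFIG_` assignments unchanged; `=n`, `# CONFIG_X is not set`, and absent options are all treated as off. It also assumes the candidate sets every option explicitly, since defaults filled in by `conf` would otherwise show up as changes.

```shell
# Hypothetical sketch of the differential check, not the project's harness.
normalize_config() {
  # Keep explicit assignments only; "=n" is the same as "not set" or absent
  grep '^CONFIG_' "$1" | grep -v '=n$' | LC_ALL=C sort
}

config_is_valid() {
  # $1: conf binary, $2: top-level Kconfig file, $3: candidate .config
  local conf=$1 kconfig=$2 candidate=$3 tmp status
  tmp=$(mktemp)
  cp "$candidate" "$tmp"
  # conf rewrites $KCONFIG_CONFIG, repairing anything that is invalid
  if ! KCONFIG_CONFIG="$tmp" "$conf" --olddefconfig "$kconfig" >/dev/null 2>&1; then
    rm -f "$tmp"
    return 2  # conf itself failed
  fi
  if [ "$(normalize_config "$candidate")" = "$(normalize_config "$tmp")" ]; then
    status=0  # kconfig kept the configuration as-is: valid
  else
    status=1  # kconfig had to change it: invalid
  fi
  rm -f "$tmp"
  return "$status"
}
```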
On small kconfig models we apply a brute-force strategy where we execute kconfig on all possible valid and invalid configurations (or a sampled subset of values for nonboolean options). We compare for every configuration whether our propositional abstraction yields the same result as kconfig.
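For tristate options, the brute-force enumeration means iterating over every combination of `y`, `m`, and `n`; with three values per option, n options yield 3^n candidate configurations, which is why this is only feasible for small models. A hypothetical sketch (the name `enumerate_tristate` is ours) that prints one assignment per line:

```shell
# Hypothetical sketch: enumerate all y/m/n assignments for the given option names.
enumerate_tristate() {
  # Base case: no options left, exactly one (empty) assignment
  if [ "$#" -eq 0 ]; then
    echo ""
    return
  fi
  local first=$1
  shift
  # Combine each value of the first option with every assignment of the rest
  enumerate_tristate "$@" | while IFS= read -r rest; do
    for v in y m n; do
      echo "$first=$v${rest:+ $rest}"
    done
  done
}
```

For example, `enumerate_tristate A B` prints nine lines, starting with `A=y B=y`; each line could then be turned into a candidate `.config` and checked against kconfig.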
On large models such as the Linux kernel model, a brute-force strategy obviously does not work. Instead, we can provide a partial configuration or a subset of configuration options that should be explored in a brute-force way. Our testing infrastructure completes the configuration with the abstracted model and checks whether our abstraction and the kconfig behavior agree on the validity of the configuration.
Note that this requires a relatively recent version of Kconfig's `conf` tool (part of the Linux kernel) in which the option `--olddefconfig` is available (after 2011).
For all bug reports, please provide a failing test case in terms of a small kconfig file where the brute-force analysis finds differences between the kconfig behavior and our abstraction.
Tristate to `CONFIG_x` translation:

    x=y  =>  #define CONFIG_x
             #undef  CONFIG_x_MODULE
    x=m  =>  #undef  CONFIG_x
             #define CONFIG_x_MODULE
    x=n  =>  #undef  CONFIG_x
             #undef  CONFIG_x_MODULE
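The tristate translation can be written as a small generator. The following hypothetical shell function (the name `tristate_to_macros` is ours) prints the two preprocessor lines for one option, following exactly the mapping given for `y`, `m`, and `n`.

```shell
# Hypothetical helper: print the CONFIG_x / CONFIG_x_MODULE lines for one tristate value.
tristate_to_macros() {
  # $1: option name without the CONFIG_ prefix, $2: y, m, or n
  case "$2" in
    y) printf '#define CONFIG_%s\n#undef CONFIG_%s_MODULE\n' "$1" "$1" ;;
    m) printf '#undef CONFIG_%s\n#define CONFIG_%s_MODULE\n' "$1" "$1" ;;
    n) printf '#undef CONFIG_%s\n#undef CONFIG_%s_MODULE\n' "$1" "$1" ;;
    *) return 1 ;;  # not a tristate value
  esac
}
```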
This tool was developed by Christian Kaestner at Carnegie Mellon University. Please contact him with any questions.
It builds on the dumpconf infrastructure of the Undertaker project and reuses some test cases from that project. It was inspired by Undertaker and the LVAT infrastructure (which is unfortunately no longer maintained).