Join GitHub today
The Soufflé project designs and implements novel compilation techniques for the efficient translation of Datalog programs to C++. The aim of the Souffle Datalog compiler is to use it as a domain-specific language tool for static program analysis including points-to, taint, constant-propagation, and security analyses for large-scale code bases with million lines of code. Soufflé provides the ability to rapid prototype and make deep design space explorations possible for the tool designers writing static program analyses with the aim to increase the productivity.
One of the major challenges in a declarative programming approach like Datalog is to make the execution of static program analyses written in Datalog scalable for large code bases. Hence, new high-performance execution models are required that go beyond state of-the art techniques. Soufflé differs from existing Datalog implementations. Soufflé aims to be a computational machinery for static program analyses solely without the database management aspects. For this reason, new compilation techniques are investigated such as improved semi-naive evaluation, staged-compilation with a new abstract machine, partial evaluation, and parallelisation for achieving high performance.
The major research questions in the Soufflé project are centered around:
- How to translate declarative rules to imperative programs efficiently on modern computer hardware including multi-core computers?
- What domain specific language extensions of Datalog are necessary in Soufflé expressing static program analyses and its development more effectively?
We introduce a novel compilation technique that translates a Datalog program to a native C++ program. The relational algebra operations for the execution of Datalog programs are performed on highly optimized C++ data structures in memory, since for static program analysis no persistent storage or concurrency control is required, and hence the compilation to pure C++ yields high performance. To obtain a highly optimized C++ program, we employ staged compilation: a Datalog program is translated to an abstract machine using a semi-naive evaluation scheme, and then further translated to a C++ program. The abstract machine is a relational algebra machine, that can express the imperative execution of relational algebra operation and fixed-point computations for the evaluation of Datalog programs. By lowering the Datalog program to C++, computations can be moved from runtime to compile-time. This new compilation technique overcomes performance bottlenecks observed in traditional Datalog implementations that are (1) implemented as an interpreter and (2) perform the relational algebra operations on disk using a relational database.
Bernhard Scholz, Herbert Jordan, Pavle Subotic, Till Westmann: On fast large-scale program analysis in Datalog. CC 2016: 196-206
Bernhard Scholz, Kostyantyn Vorobyov, Padmanabhan Krishnan, Till Westmann: A Datalog Source-to-Source Translator for Static Program Analysis: An Experience Report. ASWEC 2015: 28-37
Herbert Jordan, Bernhard Scholz, Pavle Subotic: Souffle:On Synthesis of Program Analyzers?. CAV'2016 (to appear)
- Bernhard Scholz: project lead, design and implementation
- Herbert Jordan: high-performance parallel data-structures, language extensions including records and components, static type system, major refactoring of the system
- Paul Subotic: index selection for relations
- Raihaan Amod: profiler
- Alex Jordan: support designing/implementing the object-oriented C++ interface for Soufflé
- Nicholas Allen: refactoring of the front-end
Soufflé's history can be found here.