Skip to content

Glossary

Christoph Matheja edited this page Feb 5, 2019 · 20 revisions

This page contains brief and mostly informal descriptions of most definitions, concepts and terminology used within Attestor. Further details are found in the publications section of Attestor's documentation.

Abstract Semantics

The abstract semantics determines how a program is executed on abstract heap configurations. The state space generated by Attestor is obtained by executing the abstract semantics of a given program until the state space stabilises, i.e. a fixed point is reached.

The abstract semantics used within Attestor is derived automatically from both the concrete semantics and the user-given graph grammars that together determine canonicalisation and materialisation operations.

Intuitively, the abstract semantics of executing a program P on abstract state S corresponds to first materialising S to ensure that the concrete semantics of P can be executed. Applying the concrete semantics of P is the next step. The result is then canonicalized into an abstract heap configuration again.

Abstraction

Abstraction refers to the process of computing an abstract heap configuration, say K, from a given (abstract) heap configuration, say H, such that the heap language of K subsumes the heap language of H.

Abstraction within Attestor is performed by applying graph grammar derivations backwards. In each derivation step, an embedding of a rule's right-hand side is replaced by a hyperedge that is labelled by the left-hand side nonterminal.

Abstract Heap Configuration

An abstract heap configuration is a heap configuration that contains at least one hyperedge which is labelled with a nonterminal.

Admissible Heap Configuration

An admissible heap configuration is a heap configuration that contains no violation points. In other words, there exists no node with an attached variable edge and an attached nonterminal hyperedge such that at least one tentacle connected to that node is not a reduction tentacle.

Annotation

An annotation is an optional part of a selector label that is accessible by the grammar but not by the program. For example, selector labels "left [0]" and "left [1]" would both correspond to the field left in the program, but would be distinguished when computing embeddings.

Atomic Proposition

Atomic propositions are Boolean properties that are assigned to program states such as the heap of this state constitutes a doubly-linked list or program execution terminated in this state. Atomic propositions serve atoms within LTL specifications supplied to Attestor for verification.

Attached Hyperedge

A hyperedge e is attached to a node n of the same hypergraph if n appears in the sequence of attached nodes of e.

Attached Node

The attached nodes of a hyperedge e are exactly those nodes that have e as an attached hyperedge.

Attached Variable

An attached variable of a node is an attached hyperedge whose label is a variable name.

Backward Confluence

A graph grammar is backward confluent if and only if for all hypergraphs H, there is a unique abstraction of H that cannot be further abstracted.

Base Marking

A base marking is a marking that is not placed as a successor of another marking. For example, if a marking m is placed on a node and a marking m.next is placed on a direct successor of that node then m is called a base marking and m.next a composed marking.

Canonicalisation

Canonicalisation refers to the process of computing an abstract heap configuration from a given heap configuration before adding a program state to the state space. Canonicalisation is a special case of abstraction that usually implies that no further abstraction is applicable.

Note that due to heuristics and optimisations, not every form of canonicalisation yields a canonical state.

Canonical State

A canonical state is a program state in a state space with the special property that its heap language is disjoint from the heap languages of all other states in a state space.

Canonicalisation in theory yields a canonical state. In practice however, optimisations, e.g. omitting certain abstractions, may yield states that are not canonical.

Composed Marking

A marking that is derived from another marking and placed, for example, on one of the direct successors of the respective node.

Concrete Semantics

The semantics for symbolically executing a program on heap configurations. It is thus a partial transformer that, for any given program, takes a heap configuration and maps it to one or more heap configurations that capture the effect of executing the program. The concrete semantics is in general not suitable to be executed on abstract heap configurations as it is agnostic of nonterminal hyperedges.

Concretisation

Concretisation yields the set of fully concrete heap configurations, i.e. heap configurations that do not contain any nonterminals, of a given abstract heap configuration.

Concretisation in Attestor coincides with computing the heap language of an abstract heap configuration.

Constant

A literal, such as null, true, false, etc. From a modeling perspective, constants are treated within Attestor like variables. However, they cannot be modified by the program semantics.

Furthermore, certain optimisations treat constants different from variables. For example, by default, constants are ignored when checking whether an abstraction violates admissibility.

Contract

A contract is pair of abstract heap configurations (H,K), where H is interpreted as a precondition and K is interpreted as a postcondition, respectively. Contracts assign a (modular) semantics to function calls: If a function is executed with a given heap configuration as input and this heap configuration matches a precondition of a contract, it can immediately be replaced by the corresponding postcondition instead of executing the function (again).

Counterexample

A counterexample is a sequence of program states that starts in the initial state and proves why an LTL property is violated for a given state space. If a counterexample contains abstract heap configurations, it may be supurious.

Derivation

Given a graph grammar and two hypergraphs H, K, we say that the grammar derives K from H in a single step if and only if H contains a nonterminal hyperedge e and a rule with the nonterminal on its left-hand side such that H is isomorphic to the replacement e by the rule's right-hand side.

The derivation relation of a grammar is then the reflexive and transitive closure of the single-step derivation relation.

Embedding

An embedding of a hypergraph, say H, in another hypergraph, say K, is a mapping of the nodes and hyperedges of K that respects the structure of H. It thus formally describes that K contains H as a subgraph with the external nodes as the boundary between this subgraph and the remaining nodes and edges of K.

External Node

Every hypergraph is equipped with a repetition-free sequence of external nodes that serve as the "boundary" of a graph whenever it is glued together with another hypergraph. External nodes are required for hyperedge replacement.

Final State

A final state is a program state in which the anaylzed program has terminated.

Grammar Refinement

Grammar refinement refers to the process of constructing a refined hyperedge replacement grammar from a given hyperedge replacement grammar, such that validity of a property of interest can be immediately decided by inspection of the nonterminals of the refined grammar. The heap language of the refined grammar is thus a subset of the heap language of the original grammar, where only hypergraphs satisfying a property of interest are kept in the language.

For example, one can construct a refined grammar that stores for each nonterminal and each pair of external nodes whether they can reach each other or not.

Heap Configuration

A heap configuration is a hypergraph equipped with additional restrictions to ensure that it models a realistic heap together with program variables and constansts. These restrictions are as follows:

  • All hyperedges are either labelled with a selector, nonterminal, variable, constant, or marking.
  • For every node n and every selector s there is at most one edge with source n, i.e. n is the first attached node of that edge, that is labelled with s.
  • For every variable name there is at most one hyperedge that is labelled with this variable name.
  • There is a unique hyperedge labelled with null that is attached to a node which is not the source of any edges labelled with selectors.

Furthermore, we refer to a heap configuration that contains at least one hyperedge labelled with a nonterminal as an abstract heap configuration. Otherwise, a heap configuration is called concrete.

Heap Language

The language of an abstract heap configuration with respect to a graph grammar is the set of all concrete heap configurations that can be derived in finitely many steps from the original heap configuration.

Hyperedge

A hyperedge is an element of the sets of edges of a hypergraph. By default, it is equipped with a label and a sequence of attached nodes.

Hyperedge Replacement

A hyperedge e of a hypergraph H can be replaced by a hypergraph K if the rank of e coincides with the number of external nodes of K. Assuming that H and K do not have any nodes and edges in common (otherwise an isomorphic copy of K is computed first), the replacement works as follows:

  1. e is removed from K and the sequence of nodes attached to e is remembered.
  2. K is pasted into H, where the external nodes of K are identified with the sequence of attached nodes of e in the order of both sequences.

Hyperedge Replacement Grammar

A hyperedge replacement grammar (alternatively: context-free graph grammar) is a finite set of rules mapping a nonterminal symbol to a hypergraph. It is required that the number of external nodes of the rule's right-hand side, i.e. the hypergraph, coincides with the rank of the nonterminal of the rule's left-hand side.

For every hypergraph, a hyperedge replacement grammar specifies its language with respect to the grammar.

Hypergraph

A hypergraph is an extension of an ordinary undirected, edge-labelled graph in which edges may be attached to an arbitrary ordered sequence of nodes instead of just two nodes. More formally, a hypergraph consists of a finite set of nodes, a finite set of hyperedges, a function mapping each hyperedge to its sequence of attached nodes, a function mapping each hyperedge to its label, and a sequence of external nodes.

Index

Hyperedges may be additionally labelled with a finite sequence of symbols, called indices, that can be accessed by an extension of standard hyperedge replacement grammars that are called indexed graph grammars.

These grammars are more expressive than standard context-free graph grammars while preserving most of their desriable properties.

Initial State

Initial states are the first program states added to the state space during state space generation. Their program state thus corresponds to the very first statement of the top-level method that should be analyzed. Furthermore, their heap configuration corresponds to the input.

There is not necessarily a unique input state as, for example, many initial states can be generated if Markings are used.

Intermediate Reference

An internal reference for communication between methodExecution, such that this, return, param_n, etc.. From a modeling perspective they are treated within Attestor like variables. However, they cannot be referenced in the original program and are always removed on first access.

Isomorphism

Two hypergraphs are isomorphic if they are identical up to renaming of nodes and edges (but not their labels). More formally, there must exist bijective functions between the nodes and edges of two hypergraphs that preserve the hypergraph's attachment functions, labeling functions and sequences of external nodes, respectively.

Label

A label of a hyperedge is a symbol with an assigned rank. Within Attestor, a label is either a variable name, a constant name, a marking (all of rank 1), a selector (of rank 2), or a nonterminal (of arbitrary rank).

Language Inclusion

Given a graph grammar and two hypergraphs H and K, the language inclusion is the question whether the language of H with respect to the grammar is contained in the language of K with respect to the grammar.

Language inclusion is crucial, because it allows to precisely check whether a computed abstract heap configuration is already subsumed by another heap configuration that is already in the state space. This question is essential to check whether the state space generation has reached a fixed point and can thus terminate.

Since language inclusion is undecidable in general, Attestor either requires additional properties, such as backward confluence, or needs to use approximations to discharge the language inclusion problem.

Linear Temporal Logic

Linear temporal logic (LTL) is a popular logic to write specifications that is supported by Attestor for a specialized set of atomic propositions.

There is a dedicated Wiki page on writing LTL specifications for program analysis with Attestor.

Marking

A marking is a special variable that cannot be accessed by the program semantics. The purpose of markings is to track a fixed location, i.e. a node, throughout state space generation.

Materialisation

Materialisation refers to the process of partially concretising an abstract heap configuration such that one step of the concrete semantics can be executed.

This usually corresponds to applying all possible derivations that remove violation points preventing the concrete semantics from being executable on the given abstract heap configuration.

Notice that materialisation of an abstract heap configuration is not unique, i.e. an analysis usually has to explore multiple materialised heap configurations.

Node

A node is an element of a hypergraph. Within Attestor a node is equipped with an identifier and a type. However, only the type is relevant when checking whether two hypergraphs are isomorphic.

Nonterminal

A nonterminal symbol is a label that is used within Attestor to represent abstract heap shapes such as linked list segments, trees, etc.

Phase

Attestor is organized in phases, e.g. parsing, state space generation, model-checking, etc., that are executed in sequence.

Program State

A program state within Attestor consists of a program counter, i.e. the next program statement that should be executed, and an (abstract) heap configuration which determines the values of variables and the content of the heap. Furthermore, states may be assigned atomic propositions that are accessed during model-checking.

Two program states are considered isomorphic if their program counters coincides and their heap configurations are isomorphic.

Program states make up the nodes of a state space.

Rank

Every label is assigned a rank that determines the number nodes attached to a hyperedge with that label.

Reduction Tentacle

If a nonterminal edge cannot be concretised to outgoing edges at its i-th attached node, the i-th tentacle of this nonterminal is called a reduction tentacle.

Rule

A rule of a graph grammar is a mapping of a nonterminal to a hypergraph such that the rank of the nonterminal coincides with the number of external nodes of the rule's right-hand side.

Selector

A selector is a label of rank two that represents a pointer variable. For example, given a class

 class A {
      private B x;
      private C y;
 }

leads to two selectors: x and y. A hyperedge labelled with a selector corresponds to a pointer from the edge's source node, i.e. its first attached node, to the edge's target node, i.e. its second attached node.

Semantics

Both the concrete semantics and the abstract semantics used within Attestor are an operational semantics that, for each program, map an input program state to a set of output program states. The semantics has to map an input to a set of outputs, because certain statements can only be resolved nondeterministically, i.e. if the result of a Boolean condition cannot be precisely evaluated and thus both results are analyzed.

Soot

Attestor uses the Soot framework to process Java and Java Bytecode files that should be analyzed.

State

See Program State.

State Space

A state space is a labelled transition system, where each state (node of the underlying directed graph) is a program state. The labels of states are atomic propositions.

State Space Generation

State space generation refers the construction of a state space for a given program, initial state and graph grammar.

Initially, the state space consists only of all initial states. After that, the abstract program semantics is computing and adding the resulting successor states to the state space until a fixed point is reached.

Spurious Counterexample

A counterexample is spurious, if it does not correspond to an actually possible program execution. This is typically caused by the fact that program states in a counterexample contain abstract heap configurations that capture a set of heap configurations. It is thus possible that there exists a state with an abstract heap configuration such that for all heap configurations in its heap language, executing the concrete semantics does not lead to a heap configuration in the heap language of the heap configuration of the successor state in the counterexample.

Symbolic Heap

A symbolic heap is a formula in a fragment of separation logic. Symbolic heaps are formulas in prenex normal form consisting of both free variables, which in our case are interpreted as program variables (or external nodes for predicate definitions), and existentially quantified variables. In terms of heap configurations, the latter roughly correspond to internal nodes. The edges of a heap configuration are determined by the spatial part of a symbolic heap, i.e., all points-to assertions connected by the separating conjunction. Furthermore, symbolic heaps may contain predicate calls, which correspond to nonterminal edges in terms of heap configurations. In addition to that, symbolic heaps may contain pure formulas, i.e. equalities between variables. These formulas have no direct correspondence in the world of hypergraphs. Instead, every node in a hypergraph corresponds to the equivalence class of some variable with respect to all pure formulas.

For a detailed description of the syntax of symbolic heaps supported by Attestor, please confer this page.

System of Inductive Predicate Definitions

A system of inductive predicate definitions in the fragment symbolic heap fragment of separation logic, SID for short, is a finite set of rules mapping predicate symbols to symbolic heaps. Every rule of an SID is (under careful syntactic restrictions; see SID-Syntax) equivalent to a rule of a context-free graph grammar, where the predicate symbol on the left-hand side of a rule corresponds to a nonterminal symbol and the symbolic heap on the right-hand side of a rule corresponds to an abstract heap configurations.

For a detailed description of the syntax of systems of inductive predicate definitions supported by Attestor, please confer this page.

Tentacle

A tentacle is a pair consisting of a hyperedge and a natural number between 0 and the rank of the edge's label minus one. It thus uniquely identifies a position in a hyperedge's sequence of attached nodes.

Type

Every node is assigned a type which intuitively corresponds either to the class of an object modeled by a node or a constant.

Variable Edge

A variable edge is a hyperedge of rank 1 that is labelled with a variable name.

Violation Point

A pair consisting of a variable and a selector is called a violation point if the unique attached node of the variable is not the source of an edge labelled with the selector, but it is attached to a nonterminal hyperedge such that there exists some derivation which adds such an edge labelled with the selector to the node.