forked from niswegmann/copilot-discussion
/
Structure.tex
159 lines (126 loc) · 10.4 KB
/
Structure.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
\section{Implementation}~\label{sec:structure}
%\begin{figure}[ht!]
% \begin{tikzpicture}[->, node distance=2.5cm, auto, shorten >=1pt, bend angle=45,
% thick]
% \tikzstyle{every state}=[rectangle, rounded corners]
%
% \node[state] (Lang)
% {
% \begin{tabular}[b]{l}
% Copilot Libraries\\ \hline Copilot Language
% \end{tabular}};
% \node[state] (Core) [below of =Lang] {Copilot Core};
%\node[state] (Cnub) [below of = Core] {Cnub};
% \node[state] (TransSys) [left of=Core] {$\mathrm{Transition Sys}$};
% \node[state] (IL) [right of =Core] {IL};
% \node[state] (SmtLib) [right of=IL] {SMT LIB};
% \node[state] (NMTransSys)[below left of = TransSys] {$\mathrm{Transition %Sys}^{nm}$} ;
% \node[state] (MprimeTransSys)[below right of = TransSys] {$\mathrm{Transition %Sys}^{m}$} ;
% \node[state] (Kind2Native) [left of = TransSys]{Kind 2} ;
% \tikzstyle{every node}=[]
%
% \path
% (Lang) edge [ anchor=west, text width=7.00cm] node {Reification and DSL-specific %type-checking} (Core)
% (Core) edge node {} (Cnub)
% (Core) edge node {} (TransSys)
% edge node {} (IL)
% (TransSys) edge node[swap] {Inline} (NMTransSys)
% edge node {Remove Cycles} (MprimeTransSys)
% (NMTransSys) edge node {} (Kind2Native)
% (TransSys) edge node {} (Kind2Native)
% (IL) edge node{} (SmtLib) ;
% \end{tikzpicture}
% \caption{The Copilot verification toolchain.}
% \label{fig:toolchain}
%\end{figure}
%\jonathan{Add a Cnub node after Copilot Core and between TS and IL}
In this section, we shall outline the structure of the implementation
of our Copilot verification system. %% Copilot is deeply embedded in
%% Haskell. A Copilot program is \emph{reified} (i.e. transformed from
%% a recursive structure into explicit graphs) and then domain specific
%% type checking is done.
After Copilot type-checking and compilation, a Copilot program is approximated
so it can be expressed in a theory handled by most SMT solvers, as described below. Any information
of no use for the model checking process is thrown away. The result of this
process is encoded in the Cnub format, which is is structurally close to the
Copilot core format but supports fewer datatypes and operators. Then, it can be
translated into one of two available representation formats:
\begin{itemize}
\item
The \textbf{IL} format: a list
of quantifier-free equations over integer sequences, where each sequence
roughly corresponds to a stream. This
format is similar to the one developed by Hagen~\cite{HagenPhD}, but customized for Copilot. The \emph{light prover} works with this
format.
\item The \textbf{TransSys} format: a modular representation of a
\emph{state transition system}. The \emph{Kind2 prover} uses this
format, which can be printed into Kind2's native format~\cite{kind}.
\end{itemize}
\subsection{Approximating a specification}
The complexity of the models that are built from Copilot specifications is
limited by the power and expressiveness of the SMT solvers in use. For instance,
most SMT solvers do not handle real functions like trigonometric functions. Moreover, bounded integer arithmetic is often to be approximated by standard integer arithmetic.
The {Cnub} format is aimed at approximating a copilot specification in a format relying on a simple theory including basic integer arithmetic, real arithmetic and uninterpreted functions. The stream structure is kept from the copilot core format but the following differences have to be emphasized:
\begin{itemize}
\item In contrast to the great diversity of numeric types available in Copilot, we restrain ourselves to three basic types which are handled by the \textsc{SmtLib} standard: \texttt{Bool}, \texttt{Integer} and \texttt{Real}. Problems related to integer overflows and floating point arithmetic are ignored.\footnote{In fact, there are techniques to model these effects anyway, as we discuss below.}
\item {Uninterpreted functions} are used to model operators that are not
handled. They are abstract as function symbols satisfying the
equality: $$ \left( \forall i . \; x_i = y_i \right) \Longrightarrow f(x_1,
\cdots, x_n) = f(y_1, \cdots, y_n). $$ in the quantifier-free theory of
uninterpreted function symbols, as provided by most SMT solvers.
\item Copilot extern variables are modelled by unconstrained streams. Particular precautions have to be taken to model accesses to external arrays in order to express the constraint that several requests to the same index inside a clock period must yield the same result.
\end{itemize}
Excepting the first point, the approximations made are sound:\jonathan{I'm almost sure there is an official word for this} they result in a superset of possible behaviors for the RV. %% In the case this approximation is too coarse to prove the RV correct or treating machine integers as unbounded integers is unacceptable, some techniques can be used to refine the Cnub model anyway.
%% \medskip
The problem of integer overflows can be tackled by adding automatically to the property being verified some bound-checking conditions for all integer variables. However, this solution can generate a great overhead for the proof engine. Moreover, it treats every program which causes an integer overflow as wrong, although this behaviour could be intended. An intermediate way to go would be to let the developper annotate the program so he can specify which bounds have to be checked automatically.
%\medskip
%
%The problem of complex operators like trigonometry operators is more difficult to handle. A seemingly good solution would be to give some classical properties of these operators as additional constraints for the SMT solvers. For instance, we could add the following constraint to any specification where the sine function appears : \[ \forall x . \; -1 \leq \sin x \leq 1 \]
%Unfortunately, quantifiers are not handled well by the current state
%of the art SMT solvers\footnote{As we will discuss it later, it
% seems to us it is one of the main limitation of SMT-based techniques
% which keeps them from scaling well.}. An alternative would be to add
%automatically a specialized version of these constraints for each
%appropriate variable. For instance, we could add the constraint $-1
%\leq y \leq 1 $ for all variables $y$ such that we have a
%constraint $ y = \sin x $ for $x$ any variable. Further
%improvements on these techniques warrant further study.
\subsection{The Light prover and the IL format}
Our homegrown prover relies on an intermediate representation format called
{IL}. An IL specification mostly consists of a list of quantifier-free equations
over integer sequences. These equations contain a free variable $n$ which is
implicitly universally quantified. The IL format is similar to the one used by Hagen~\cite{HagenPhD}.
A stream of type $a$ is modeled by a function of type $\mathbb{N} \to a$. Each stream definition is translated into a list of constraints on such functions. For instance, the stream definition
\begin{lstlisting}
fib = [1, 1] ++ (fib + drop 1 fib)
\end{lstlisting}
is translated into the IL chunk:
$$
\begin{array}{c}
f: \mathbb{N} \to \mathbb{N} \\
f(0) = 1 \\
f(1) = 1 \\
f(n + 2) = f(n + 1) + f(n)
\end{array}
$$
Let's say we want to check the property \texttt{fib > 0} which translates into $f(n) > 0$. This can be done in two steps of the $k$-induction seen in Section~\ref{sec:background} by taking \[T[n] \;\equiv\; \left(\, f(0) = 1 \,\wedge\,
f(1) = 1 \,\wedge\, f(n + 2) = f(n + 1) + f(n) \,\right) \] \[P[n] \;\equiv\; \left(\, f(n) > 0 \,\right) \]
and checking that both \[T[0] \,\wedge\, T[1] \,\wedge\, \neg \left( P[0] \,\wedge\, P[1] \right) \] and \[T[n] \,\wedge\, T[n + 1] \,\wedge\, P[n] \,\wedge\, P[n + 1] \,\wedge\, \neg P[n + 2] \]
are non-satisfiable, the last one being equivalent to \[ \left[f(n+2) = f(n+1) + f(n)\right] \,\wedge\, \left[f(n) > 0\right] \,\wedge\, \left[f(n + 1) > 0\right] \,\wedge\, \left[f(n + 2) \leq 0\right] \,\wedge\, \cdots \]
Therefore, as the reader can see in this simple example, the construction of \textsc{SmtLib} requests from an IL specification is pretty straightforward.
\subsection{The Kind2 prover and the {TransSys} format}
Recall, a state transition system is a triple $(S,I,T),$
where $S$ is a set of states, $I \subseteq S$ is the set of initial
states and $T \subseteq S \times S $ is a transition relation over $S$.
Here, a state consists of the values of a finite set of variables, with types belonging to $\{ \texttt{Int}, \texttt{Real}, \texttt{Bool}\}$. $I$ is encoded by a logical formula whose free variables correspond to the state variables and that holds for a state $q$ if and only if $q$ is an initial state. Similarly, the transition relation is given by a formula $T$ such that $T\left[\, q, \, q' \,\right]$ holds if and only if $q \rightarrow q'$.
\medskip
The \textbf{TransSys} format is a modular encoding of such a state transition system. Related variables are grouped into \textit{nodes}, each node providing a distinct namespace and expressing some constraints between its variables. A significant task of the translation process to TransSys is to flatten the copilot specification so the value of all streams at time $n$ only depends on the values of all the streams at time $n - 1$ which is not the case in the Fibonacci example shown earlier. This is done by a simple program transformation which turns
\begin{lstlisting}
fib = [1, 1] ++ (fib + drop 1 fib)
\end{lstlisting}
into
\begin{lstlisting}
fib0 = [1] ++ fib1
fib1 = [1] ++ (fib1 + fib0)
\end{lstlisting}
After this, it is natural to associate a variable to each stream. Here, the variables \texttt{fib0} and \texttt{fib1} would be grouped into a single node, in order to keep some structure in the representation of the transition system\footnote{This can be important for two reasons. First, the model checker could use these structural informations to optimize its research. See \emph{structural abstraction} in \cite{HagenPhD}. Then, structured transition systems are easier to read, debug and transform. }. Such a modular transition system is almost ready to be translated into the Kind2 native format. However, we first have to merge each nodes pair whose components are mutually dependent as Kind2 requires a topological order on its nodes.