Permalink
Browse files

paper: extrapolation section.

  • Loading branch information...
1 parent 26cacb9 commit 07bb2f466559af1259882e07395a05ed1319c7da @leepike committed Dec 28, 2012
Showing with 159 additions and 64 deletions.
  1. +16 −10 paper/Paper.hs
  2. +13 −0 paper/paper.bib
  3. +130 −54 paper/paper.tex
View
@@ -93,8 +93,7 @@ newVals (SubVal a) sz tries =
s = sizedArbitrary sz a
>>= return . SubVal
-test :: SubTypes a
- => a -> Idx -> [SubVal]
+test :: SubTypes a => a -> Idx -> [SubVal]
-> (a -> Property) -> Maybe a
test cex idx vs prop = go vs
where
@@ -140,18 +139,25 @@ reduce0 cex prop = reduce' 1
--------------------------------------------------------------------------------
+subTree :: SubTypes a => a -> Idx -> Idx -> Bool
+subTree _ _ _ = undefined
+
extrapolate :: SubTypes a
=> a -> (a -> Property) -> IO [Idx]
extrapolate cex prop = extrapolate' 1 []
where
- extrapolate' idx idxs =
- case index cex idx of
- Nothing -> return idxs
- Just v -> do
- vs <- newVals v scMaxSize scMaxSuccess
- if allFail cex idx vs prop
- then extrapolate' (idx+1) (idx:idxs)
- else extrapolate' (idx+1) idxs
+ extrapolate' idx idxs
+ | or (map (subTree cex idx) idxs) =
+ extrapolate' (idx+1) idxs
+ extrapolate' idx idxs
+ | otherwise =
+ case index cex idx of
+ Nothing -> return idxs
+ Just v -> do
+ vs <- newVals v scMaxSize scMaxSuccess
+ if allFail cex idx vs prop
+ then extrapolate' (idx+1) (idx:idxs)
+ else extrapolate' (idx+1) idxs
allFail :: SubTypes a
=> a -> Idx -> [SubVal]
View
@@ -52,3 +52,16 @@ @article{haskell98
year = 2003,
note = {Available at \url{http://www.haskell.org/definition/}},
}
+
+@inproceedings{sc,
+ author = {Colin Runciman and
+ Matthew Naylor and
+ Fredrik Lindblad},
+ title = {Smallcheck and lazy {s}mallcheck: automatic exhaustive testing
+ for small values},
+ booktitle = {ACM SIGPLAN Symposium on Haskell},
+ publisher = {ACM},
+ booktitle = {Haskell},
+ year = {2008},
+ pages = {37-48}
+}
View
@@ -139,13 +139,13 @@
%% embedded domain-specific language, compiler, verification
-\todo{note constants are user-defined arguments}
+\todo{note constants are user-defined arguments, other simplifications}
\section{Introduction}\label{sec:intro}
The QuickCheck testing framework was a revolutionary step-forward in
type-directed random testing~\cite{qc}. Nowadays, QuickCheck is a standard
aspect of the Haskell development cycle. Because QuickCheck generates random
-values for testing, counterexamples it finds may not be substantially larger
+values for testing, counterexamples it finds may be substantially larger
than a minimal counterexample. In their original QuickCheck paper~\cite{qc},
the authors report the following user experience by Andy Gill:
%
@@ -219,23 +219,23 @@ \section{Introduction}\label{sec:intro}
it is typically defined using structural recursion for composite data types.
For example, consider the \ttp{shrink} instances given to data types \ttp{A} and
\ttp{B} in Figure~\ref{fig:initial}.\footnote{All examples and algorithms in
- this paper are in Haskell~\cite{haskell98}}. Indeed, because of the regular
+ this paper are presented in Haskell~\cite{haskell98}.} Indeed, because of the regular
structure of \ttp{shrink} implementations, the principal motivation for one of
Haskell's first generics package was for providing generic definitions for
\ttp{shrink}~\cite{syb}.
However, this straightforward approach to defining \ttp{shrink} instances may
either miss minimal counterexamples, be too inefficient to be practical, or
-both.\footnote{E.g., for a recent real-world case, see the user question posed
- on StackOverflow:
+both.\footnote{For a recent real-world case, see the user question posed on
+ StackOverflow:
\url{http://stackoverflow.com/questions/8788542/how-do-i-get-good-small-shrinks-out-of-quickcheck}.}
For example, consider again Figure~\ref{fig:initial}. The data type \ttp{B} is
a product type over lists of 16-bit signed integers, wrapped by a constructor.
Consider the property \ttp{prop} in the figure, stating that if the sum of each
-list in a field of \ttp{B} is less than 16, then the sum of the four fields of
-\ttp{B} is less than 64. The property \ttp{prop} seems reasonable at first
-glance, until one realizes that due to underflow, the property can be violated.
-For example, consider the value
+list in a field of \ttp{B} is less than 16, then the sum of the four fields is
+less than 64. The property \ttp{prop} seems reasonable at first glance, until
+one realizes that due to underflow, the property can be violated. For example,
+consider the value
%
\begin{code}
B [A (-32769)] [] [] []
@@ -251,14 +251,14 @@ \section{Introduction}\label{sec:intro}
counterexamples to test; with the definitions and property provided, which also
rely on QuickCheck's default instances for the list type, the list of potential
counterexamples generated by \ttp{shrink} can contain more than $10^{10}$
-elements. This is an enormous state space to search for a smaller example! In
-particular, if a counterexample is not found does not finish in a reasonable
-amount of time.
+elements. This is an enormous state space to search for a smaller example and
+can cause the shrinking phase to take orders-of-magnitude more time than the
+counterexample discovery phase.
It is not always straightforward to determine how to make a more efficient
implementation for \ttp{shrink}. Shrinking presents a programmer's dilemma:
shrinking is used to better understand a counterexample to understand why a
-property failed; but to implement an efficient shrink function in general
+property failed, but to implement an efficient shrink function in general
requires knowing how to generate counterexamples to the property.
Typically, a user tries a brute-force approach to reduce the number of potential
@@ -278,7 +278,7 @@ \section{Introduction}\label{sec:intro}
\begin{code}
len = 10
-shrink (B a b c d) =
+shrink (B a b c d) =
[ B w x y z | w <- tk a, x <- tk b
, y <- tk c, z <- tk d ]
where tk x = take len (shrink x)
@@ -311,8 +311,8 @@ \section{Introduction}\label{sec:intro}
\ttp{[(-d)..d]}). For a product type, SmallCheck must check values produced by
taking the cross-product of the type's fields at a given depth. In
Figure~\ref{fig:initial}, we have defined instances for generating Lazy
-SmallCheck (a more efficient version of SmallCheck) tests by defining instances
-of the \ttp{Serial} class. For the property \ttp{prop} defined above (suitably
+SmallCheck (a variant of SmallCheck) tests by defining instances
+for the \ttp{Serial} class. For the property \ttp{prop} defined above (suitably
redefined to use SmallCheck's types), SmallCheck must check to depth 16 to find
the first counterexample. Unfortunately, after a couple hours of testing, Lazy
SmallCheck is still checking values at depth six, and the number of tests scales
@@ -385,24 +385,26 @@ \section{Introduction}\label{sec:intro}
\emph{SmartCheck}. SmartCheck extends QuickCheck to efficiently and generically
find small counterexamples as well as generalize them as described in the
preceding paragraph. To motivate the benefit of SmartCheck, consider the table
-in Table~\ref{fig:results}. The table characterizes testing the program shown
-in Figure~\ref{fig:initial} that we have described so far. Four trials are
-presented. In the first three, we use QuickCheck to generate counterexamples.
-The first column contains the results of using QuickCheck without shrinking to
-provide a baseline. The next two columns show the result of using QuickCheck
-with definitions of \ttp{shrink} that limit the lists of fields of \ttp{B} to 10
-(Figure~\ref{lst:newshrink}) and 20 elements (Figure~\ref{lst:newshrink}, with
-\ttp{len = 20}), respectively. (Recall that with the original definition for
-shrink from Figure~\ref{fig:initial}, some counterexamples cause QuickCheck to
-not return after hours of computation.) In each row, we show the maximum, mean,
-and median number of seconds required by QuickCheck to generate a counterexample
-over 100 runs. In the last row, we show the mean size of the final value
-returned, measured by the taking the cumulative length of the lists of the
-fields of \ttp{B}. Notice that while typically counterexamples are returned in
-a fraction of a second, there are outliers taking 21 seconds and 125 seconds for
-the two tests. Both tests roughly cut the size of the counterexample in half;
-increasing the number of potential counterexamples can result in a significant
-performance penalty for this example without much benefit.
+in Table~\ref{table:results}, giving the results for using QuickCheck and
+SmartCheck on the program shown in Figure~\ref{fig:initial} (we omit SmallCheck,
+since as we noted, it is infeasible for an example like this). In the first
+three columns, the results of using QuickCheck to generate counterexamples are
+shown. The first column contains the results of using QuickCheck without
+shrinking to provide a baseline. The next two columns show the result of using
+QuickCheck with definitions of \ttp{shrink} that limit the lists of fields of
+\ttp{B} to 10 (Figure~\ref{lst:newshrink}) and 20 elements
+(Figure~\ref{lst:newshrink}, with \ttp{len = 20}), respectively. (Recall that
+with the original definition for shrink from Figure~\ref{fig:initial}, some
+counterexamples cause QuickCheck to not return after hours of computation.) In
+each row, we show the maximum, mean, and median number of seconds required by
+QuickCheck to generate a counterexample over 100 runs. In the last row, we show
+the mean size of the final value returned, measured by the taking the cumulative
+length of the lists of the fields of \ttp{B}. Notice that while typically
+counterexamples are returned in a fraction of a second, there are outliers
+taking 21 seconds and 125 seconds for the two tests. Both tests roughly cut the
+size of the counterexample in half; increasing the number of potential
+counterexamples can result in a significant performance penalty for this example
+without much benefit.
In the last column are the results for SmartCheck. Not only does it execute
faster than QuickCheck with shrinking, it has significantly smaller outliers.
@@ -578,8 +580,7 @@ \subsection{Reduction Algorithm Overview}\label{sec:reduct}
s = sizedArbitrary sz a
>>= return . SubVal
-test :: SubTypes a
- => a -> Idx -> [SubVal]
+test :: SubTypes a => a -> Idx -> [SubVal]
-> (a -> Property) -> Maybe a
test cex idx vs prop = go vs
where
@@ -632,7 +633,7 @@ \subsection{Reduction Algorithm Details and Optimizations}
The reduction algorithm description above omits some minor details and
optimizations we describe here.
-\paragraph{Sum-type counterexample hypothesis}
+\subsubsection{Sum-type counterexample hypothesis}
First, recall that the algorithm begins at the \emph{first} sub-value of the
counterexample rather than the 0th sub-value. The 0th sub-value of \ttp{cex} is
\ttp{cex} itself. No invariant of the algorithm would be violated by beginning
@@ -655,7 +656,7 @@ \subsection{Reduction Algorithm Details and Optimizations}
smaller than the original counterexample is something we have seen before: it is
precisely the naive approach to implementing \ttp{shrink}!
-\paragraph{Conditional properties}
+\subsubsection{Conditional properties}
A QuickCheck property may be a conditional of the form
%
\begin{code}
@@ -670,7 +671,7 @@ \subsection{Reduction Algorithm Details and Optimizations}
counterexample \ttp{cex} to have failed if it passes the property's
precondition and fails the consequent.
-\paragraph{Sub-value counterexample hypothesis}
+\subsubsection{Sub-value counterexample hypothesis}\label{sec:subval}
Sometimes, a counterexample fails a property due to a sub-value nested deep
inside the counterexample. The rest of the value is irrelevant. We call this
the \emph{sub-value counterexample hypothesis.} Thus, one way to efficiently
@@ -717,11 +718,11 @@ \subsection{Reduction Algorithm Details and Optimizations}
and then test the property
%
\begin{code}
-div_prop e = divSubTerms e ==> eval e /= Nothing
+prop_div e = divSubTerms e ==> eval e /= Nothing
\end{code}
%
\noindent
-The property \ttp{div\_prop} turns out to be false; for example,
+The property \ttp{prop\_div} turns out to be false; for example,
%
\begin{code}
Div (C 1) (Div (C 0) (C 1))
@@ -796,7 +797,7 @@ \subsection{Reduction Algorithm Details and Optimizations}
current sub-value and if it fails the property, the function tests the sub-value
directly.
-\paragraph{Base types}
+\subsubsection{Base types}
The number of constructors in an algebraic data value primarily determines how
large and understandable it is. In our experience, the size of fields that come
@@ -816,7 +817,7 @@ \subsection{Reduction Algorithm Details and Optimizations}
user knows that some portion of the structure cannot be shrunk or is irrelevant
to the property.
-\paragraph{Bounding counterexample exploration}
+\subsubsection{Bounding counterexample exploration}
SmartCheck's implementation contains flags to allow the user to customize its
behavior. Three flags that are relevant to the reduction algorithm are the
@@ -923,14 +924,18 @@ \subsection{Universal Sub-Value Generalization}
=> a -> (a -> Property) -> IO [Idx]
extrapolate cex prop = extrapolate' 1 []
where
- extrapolate' idx idxs =
- case index cex idx of
- Nothing -> return idxs
- Just v -> do
- vs <- newVals v scMaxSize scMaxSuccess
- if allFail cex idx vs prop
- then extrapolate' (idx+1) (idx:idxs)
- else extrapolate' (idx+1) idxs
+ extrapolate' idx idxs
+ | or (map (subTree cex idx) idxs) =
+ extrapolate' (idx+1) idxs
+ extrapolate' idx idxs
+ | otherwise =
+ case index cex idx of
+ Nothing -> return idxs
+ Just v -> do
+ vs <- newVals v scMaxSize scMaxSuccess
+ if allFail cex idx vs prop
+ then extrapolate' (idx+1) (idx:idxs)
+ else extrapolate' (idx+1) idxs
allFail :: SubTypes a
=> a -> Idx -> [SubVal]
@@ -949,8 +954,20 @@ \subsection{Universal Sub-Value Generalization}
algorithm is similar to the reduction algorithm in Figure~\ref{fig:reduction},
and in the implementation, the algorithms are generalized and combined. The
function \ttp{extrapolate} returns a list of indexes to be generalized in the
-original counterexample. New sub-values are generated by \ttp{newVals}, shown
-in Figure~\ref{fig:reduction}. Here, we bound the size of values to generate by
+original counterexample. In the recursive function \ttp{extrapolate'}, there is
+a function guard in which a call to \ttp{subTree}
+%
+\begin{code}
+subTree :: SubTypes a => a -> Idx -> Idx -> Bool
+\end{code}
+%
+\ttp{subTree cex idx0 idx1} is true if in \ttp{cex}, the value at index
+\ttp{idx0} is a child of index \ttp{idx1}, in a tree representation of the
+data. This prevents the algorithm from trying to generalize sub-values that are
+abstracted away already since their parents have been generalized.
+
+New sub-values are generated by \ttp{newVals}, shown in
+Figure~\ref{fig:reduction}. Here, we bound the size of values to generate by
the constant \ttp{scMaxSize}, which is independent of the size of the particular
sub-value.
@@ -970,10 +987,69 @@ \subsection{Universal Sub-Value Generalization}
constant \ttp{scMinExtrap} is the minimum number of \ttp{Just False} results
required from \ttp{fail} to extrapolate from failed tests to a universal claim.
-
+\todo{Add discussion here}
\subsection{Existential Sub-Value Generalization}
+Sum types denote choice in a data type. Sometimes, a property over a sum type
+fails because there is a bug for some of the variants but not others. For
+example, recall the calculator language from Section~\ref{sec:subval}. The
+property \ttp{prop\_div} fails only for values that contain a variant tagged
+with the \ttp{Div} constructor. For example, consider the following counterexample:
+%
+\begin{code}
+Div (Add (C 20) (C 30))
+ (Add (C 89) (C -89))
+\end{code}
+%
+\noindent
+The counterexample fails because the divsor evaluates to zero. Using universal
+sub-value generalization described above, the counterexample is generalized as
+follows:
+%
+\begin{code}
+forall x . Div x (Add (C 89) (C -89))
+\end{code}
+%
+\noindent
+So we know there is something particular about the \ttp{Add (C 89) (C -89)},
+because there are some values that we can replace this sub-value with in the
+original counterexample causing it to satisfy the property. But we might wonder
+if there is something special about variants tagged by the \ttp{Add}
+constructor, or might we finding failing sub-values with the other variants
+tagged by the constructors \ttp{C} and \ttp{Div}? If so, then we know there is
+not some bug being triggered by \ttp{Add} variants specifically, so it is more
+likely that the bug involves how the second field of the \ttp{Div} variant is
+evaluated.
+
+We therefore introduce another kind of generalization we call \emph{existential
+ sub-value generalization}. In this generalization, if there a counterexample
+containing every possible variant as a sub-value, then we abstract it. For
+example, the following all characterize counterexamples:
+%
+\begin{code}
+forall x . Div x (Add (C 89) (C -89))
+forall x . Div x (C 0)
+forall x . Div x (Div (C 0) (C 1))
+\end{code}
+%
+\noindent
+Because there are only three variants in the type, we have found counterexamples
+built from each of them in the second field of the outermost \ttp{Div}
+constructor. We therefore return a formula like the following:
+%
+\begin{code}
+forall values x .
+ forall constructors C\(\sb{0}\) .
+ there exist arguments \(\stackrel{\rightarrow}{y}\) .
+ such that Div x (C\(\sb{0} \stackrel{\rightarrow}{y}\))
+\end{code}
+%
+
+
+
+
+
\subsection{Automated Invariant Strengthening}
\begin{figure}[ht!]

0 comments on commit 07bb2f4

Please sign in to comment.