Skip to content

# leepike/SmartCheck

### Subversion checkout URL

You can clone with
or
.

paper updates.

• Loading branch information...
commit c6516f697d4f0e4ea8879a8163e030ac75ea55e0 1 parent 9b7f65f
authored
6 examples/Div0.hs
 @@ -38,9 +38,9 @@ instance Arbitrary Exp where , liftM2 Div mkM' mkM' ] where mkM' = mkM =<< choose (0,n-1) - shrink (C i) = map C (shrink i) - shrink (Add e0 e1) = [e0, e1] - shrink (Div e0 e1) = [e0, e1] + -- shrink (C i) = map C (shrink i) + -- shrink (Add e0 e1) = [e0, e1] + -- shrink (Div e0 e1) = [e0, e1] -- property: so long as 0 isn't in the divisor, we won't try to divide by 0. -- It's false: something might evaluate to 0 still.
26 paper/Example/InitEx2.hs
 @@ -84,7 +84,14 @@ instance Arbitrary T where #if defined(qcNone) || defined(sc) || defined(feat) shrink _ = [] -#else +#endif +#ifdef qcjh + shrink (T w i0 i1 i2 i3) = + let xs = shrink (w, i0, i1, i2, i3) in + let go (w', i0', i1', i2', i3') = T w' i0' i1' i2' i3' in + map go xs +#endif +#if defined(qc10) || defined(qc20) shrink (T w i0 i1 i2 i3) = [ T a b c d e | a <- tk w , b <- tk i0, c <- tk i1 @@ -135,18 +142,21 @@ test rnds run = do let app str = appendFile logFile (str ++ "\n") let avg vals = sum vals / fromIntegral rnds' let med vals = sort vals !! (rnds' div 2) + let stdDev vals = sqrt (avg distances) + where + distances = map (\x -> (x - m)^2) vals + m = avg vals app "***************" print res let times = fst $unzip res' let szs :: [Double] szs = map fromIntegral (snd$ unzip res') - app $"Num : " ++ show rnds' - app$ "Max : " ++ show (maximum times) ++ ", " ++ show (maximum szs) - app $"Avg : " ++ show (avg times) ++ ", " ++ show (avg szs) - app$ "Med : " ++ show (med times) ++ ", " ++ show (med szs) - -- app $"Size : " ++ show (fromIntegral (sum szs) / - -- fromIntegral rnds' :: Double) + app$ "Num : " ++ show rnds' + app $"std dev : " ++ show (stdDev$ map (fromRational . toRational) times) + ++ ", " ++ show (stdDev szs) + app $"Avg : " ++ show (avg times) ++ ", " ++ show (avg szs) + app$ "Med : " ++ show (med times) ++ ", " ++ show (med szs) app "" app "" @@ -189,7 +199,7 @@ main = do #ifdef sc test rnds runSC #endif -#if defined(qcNone) || defined(qc10) || defined(qc20) +#if defined(qcNone) || defined(qc10) || defined(qc20) || defined(qcjh) test rnds (runQC stdArgs prop) #endif
4 paper/Example/Makefile
 @@ -1,5 +1,5 @@ -#TARGETS=qcNone.out qc10.out qc20.out feat.out sc.out -TARGETS=qcNone.out #qc10.out feat.out sc.out +#TARGETS=qcNone.out qc10.out qc20.out feat.out qcjh.out sc.out +TARGETS=qcjh.out sc.out LOG=init.log RNDS=100 SRC=InitEx2
7 paper/Paper.hs
 @@ -62,7 +62,7 @@ class Arbitrary a => SubTypes a where subVals :: a -> Tree SubVal constr :: a -> String constrs :: a -> [String] --- baseType :: a -> Bool + baseType :: a -> Bool -------------------------------------------------------------------------------- @@ -218,13 +218,12 @@ matchesShapes :: SubTypes a => a -> [(a,[Idx])] -> Bool matchesShapes d = any (matchesShape d) --------------------------------------------------------------------------------- - -- | At each index that we generalize (either value generalization or -- constructor generalization), we replace that value from b into a. At this -- point, we check for constructor equality between the two values, decending -- their structures. -matchesShape :: SubTypes a => a -> (a, [Idx]) -> Bool +matchesShape :: SubTypes a + => a -> (a, [Idx]) -> Bool matchesShape a (b, idxs) | constr a /= constr b = False | Just a' <- aRepl
22 paper/paper.bib
 @@ -19,6 +19,28 @@ @inproceedings{feat year = 2012 } +@inproceedings{telecom, + author = {Thomas Arts and + John Hughes and + Joakim Johansson and + Ulf T. Wiger}, + title = {Testing telecoms software with quviq QuickCheck}, + booktitle = {ACM SIGPLAN Workshop on Erlang Erlang Workshop}, + year = {2006}, + publisher = {ACM}, + pages = {2-10} +} + +@inproceedings{qcjh, + author = {John Hughes}, + title = {Software Testing with QuickCheck}, + booktitle = {Central European Functional Programming School (CEFP)}, + year = {2010}, + publisher = {Springer}, + volume = {6299}, + series = {LNCS}, + pages = {183-223} +} @Misc{ghc, key = {GHC},
532 paper/paper.tex
 @@ -119,10 +119,10 @@ test-case generation. Because QuickCheck performs random testing, it can discover very large counterexamples. QuickCheck provides an interface for the user to write \emph{shrink} functions to attempt to reduce the size of -counterexamples discovered by QuickCheck. Implementations of \emph{shrink} -often require significant boilerplate and usually just generate smaller tests -than the discovered counterexample. Therefore, shrinking can be inefficient and -poor at discovering new, smaller counterexamples. In this paper, we introduce +counterexamples discovered by QuickCheck. Implementations of shrink often +require significant boilerplate and usually just generate smaller tests than the +discovered counterexample. Therefore, shrinking can be inefficient and poor at +discovering new, smaller counterexamples. In this paper, we introduce \emph{SmartCheck}, an automatic efficient counterexample reduction algorithm. SmartCheck reduces algebraic data using generic search heuristics to efficiently find smaller counterexamples. In addition to shrinking, SmartCheck generalizes @@ -174,6 +174,21 @@ \section{Introduction}\label{sec:intro} to attempt to find a smaller counterexample. Today, \ttp{smaller} is called \ttp{shrink}. +In industrial uses, shrinking is essential. Hughes, the inventor of QuickCheck +and the CEO of Quiviq, a company that commercializes the technology, has noted +that without it [shrinking], randomly generated failing cases would often be +so large as to be almost useless.''~\cite{qcjh}. Hughes~\emph{et al.} give an +extended example in which shrinking is essential in debugging telecom +software~\cite{telecom}. + +Typically, effective shrinking on complex data requires a good understanding of +how shrinking in QuickCheck works and the semantics of the property and program +being evaluated. Bad definitions can be so inefficient that they are unusable. +Shrinking presents a programmer's dilemma: shrinking is used to better +understand a counterexample to understand why a property failed, but to +implement an efficient shrink function in general requires knowing how to +generate counterexamples to the property. + \begin{figure}[ht] \begin{code} data N a = N a @@ -198,46 +213,41 @@ \section{Introduction}\label{sec:intro} \label{fig:initial} \end{figure} -While \ttp{shrink} is user-defined and can have any type-correct implementation, -it is typically defined using structural recursion for composite data types. -Indeed, because of the regular structure of \ttp{shrink} implementations, the -principal motivation for one of Haskell's first generics package was for -providing generic definitions for \ttp{shrink}~\cite{syb}. However, simple -structural recursion may either miss smaller counterexamples, be too inefficient -to be practical, or both.\footnote{For a recent real-world case, see the user - question posed on StackOverflow: - \url{http://stackoverflow.com/questions/8788542/how-do-i-get-good-small-shrinks-out-of-quickcheck}.} - -Consider the example in Figure~\ref{fig:initial}.\footnote{All examples and - algorithms in this paper are presented in Haskell. We strive to use idiomatic - Haskell, and only rely on the language extensions of existential type - quantification and scoped type variables.~\cite{haskell98}.} Data type +To illustrate, consider the example in Figure~\ref{fig:initial}.\footnote{All + examples and algorithms in this paper are presented in Haskell. We strive to + use idiomatic Haskell, and only rely on the language extensions of existential + type quantification and scoped type variables.~\cite{haskell98}.} Data type \ttp{T} is a product type containing four lists of \ttp{Int16}s and a single -\ttp{Word8}. For our purposes, it is not important what \ttp{T} models, but -for concreteness, suppose it models the values in four linked lists in an -operating system, together with an 8-bit status register. +\ttp{Word8}. For our purposes, it is not important what \ttp{T} models, but for +concreteness, suppose it models the values in four linked lists in an operating +system, together with an 8-bit status register. Now suppose we are modeling some program that serializes values of type \ttp{T}. The input to the program satisfies the invariant \ttp{pre}, that the sum of values in each list of \ttp{Int16}s is less than or equal to 256 (and by definition, the \ttp{Word8} value is less than or equal to 256). Assuming this, we want to show \ttp{post} holds, that the sum of all the values from \ttp{T} is -less than $5 * 256$. At first glance, the property seems reasonable, but we -have forgotten about underflows. For example, the value +less than $5 * 256$, where five is the number of fields in \ttp{T}. At first +glance, the property seems reasonable. But we have forgotten about underflow; +for example, the value % \begin{code} T (N 10) [N (-20000)] [N (-20000)] [] [] \end{code} % +\noindent fails \ttp{prop} (\ttp{==>} is implication from the QuickCheck library). -Unfortunately, a typical counterexample returned by QuickCheck is not quite so -simple. With standard settings, it returns a value containing just under 70 -\ttp{N} constructors, and sometimes values contain over 100 constructors. Thus, -it pays to define \ttp{shrink}! -A typical definition of \ttp{shrink} for a data type like \ttp{T} is to take the -cross-product of shrunk values over the arguments to the constructor \ttp{T}. -This can be expressed using Haskell's list-comprehension notation: +Despite the simplicity of the example, a typical counterexample returned by +QuickCheck is not quite so simple. With standard settings, the median +counterexample discovered contains around 70 \ttp{N} constructors, and +counterexamples can contain over 100 constructors. Thus, it pays to define +\ttp{shrink}! + +We might first naively try to shrink counterexamples for a data type like \ttp{T} +by taking the cross-product of shrunk values over the arguments to the +constructor \ttp{T}. This can be expressed using Haskell's list-comprehension +notation: % \begin{code} shrink (T w i0 i1 i2 i3) = @@ -247,49 +257,59 @@ \section{Introduction}\label{sec:intro} \end{code} % \noindent -The result of (\ttp{shrink t}) with this definition is null if any list -contained in \ttp{t} is null. Still, for typical counterexamples returned by -QuickCheck that do not contain empty lists, together with QuickCheck's default -\ttp{Arbitrary} instances, the length of potential counterexamples returned by -\ttp{shrink} can easily exceed $10^{10}$, which is an intractable number of -tests for QuickCheck to analyze. The time complexity is compounded by the -shrink stage being recursive, so when a new smaller counterexample is found, it -is also shrunk. In practice, QuickCheck appears to loop indefinitely. - -It is not always straightforward to determine how to make a more efficient -implementation for \ttp{shrink}. Shrinking presents a programmer's dilemma: -shrinking is used to better understand a counterexample to understand why a -property failed, but to implement an efficient shrink function in general -requires knowing how to generate counterexamples to the property. - -Typically, a programmer tries a brute-force approach to reduce the number of -potential counterexamples; for example, she may try to truncate lists using the +While the above definition appears reasonable, there are two problems with it. +First, the result of (\ttp{shrink t}) is null if any list contained in \ttp{t} +is null, then \ttp{t} will not be shrunk. More troublesome is that with +QuickCheck's default \ttp{Arbitrary} instances, the length of potential +counterexamples returned by \ttp{shrink} can easily exceed $10^{10}$, which is +an intractable number of tests for QuickCheck to analyze. The reason for the +blowup is that shrinking a list \ttp{[a]} produces a list \ttp{[[a]]}, the +length of which is significantly longer than the original list. Then, we take +the cross-product of the generated lists. For the example above, with some +counterexamples, the shrinking stage may appear to loop for hours, consuming +ever more memory, without returning a result. + +A programmer might try to control the complexity by truncating lists using the (\ttp{take n}) function that returns the first $n$ elements of a list. The trade-off is quicker shrinking with a lower probability of finding a smaller counterexample. For example, we might redefine shrink as follows: % -%\begin{figure}[ht] \begin{code} -len = 10 - shrink (T w i0 i1 i2 i3) = [ T a b c d e | a <- tk w , b <- tk i0, c <- tk i1 , d <- tk i2, e <- tk i3 ] - where tk x = take len (shrink x) + where tk x = take 10 (shrink x) \end{code} -%% \caption{Truncated shrinking function for data type \ttp{B}.} -%% \label{lst:newshrink} -%% \end{figure} +% +\noindent +Call this version truncated shrink''. Truncation controls the blowup of the +input-space; the maximum number of possible new values is $10^5$ in this case. +While truncation controls the blowup, the downside is that potentially smaller +counterexamples may be omitted. -%% \end{lstlisting} -%% \end{minipage} +A more clever programmer that understands the semantics of QuickCheck's +\ttp{shrink} implementation,\footnote{The following approach was suggested to + the author by John Hughes.} defines a \ttp{shrink} instance as follows: +% +\begin{code} +shrink (T w i0 i1 i2 i3) = + let xs = shrink (w, i0, i1, i2, i3) in + let go (w', i0', i1', i2', i3') = + T w' i0' i1' i2' i3' in + map go xs +\end{code} % \noindent -let us call this version truncated shrink''. Truncating the controls the -blowup of the input-space; the number of possible new values is $10^5$ in this -case. While truncation controls the blowup, the downside is that potentially -smaller counterexamples may be omitted. +This tuple shrink'' definition does not suffer the same shortcomings: it +shrinks a value even if it contains an empty list, and the combinatorial blowup +of shrink candidates is avoided, since a pair \ttp{(a,b)} is shrunk by +attempting to shrink \ttp{a} while holding \ttp{b} constant, then attempting to +shrink \ttp{b} while holding \ttp{a} constant, and the approach is generalized +to larger tuples. Still, with this definition, in some cases, shrinking takes +over 20 seconds and returns a result containing over 60 +constructors.\footnote{Results are with a GHC-compiled program, using \ttp{-O2}, + on a 4-core 2.7GHz Intel~i7 processor.} The programmer might ask herself: What if we omit the need for shrinking counterexamples altogether?'' SmallCheck is another testing framework for @@ -324,6 +344,7 @@ \section{Introduction}\label{sec:intro} combines some aspects of SmallCheck and QuickCheck~\cite{feat}; we discuss it more in the related work (Section~\ref{sec:related}). + %% is unable to verify the property from Figure~\ref{fig:initial}, for much the %% same reason that \ttp{shrink} is inefficient @@ -386,67 +407,52 @@ \section{Introduction}\label{sec:intro} %% violate the precondition. \end{itemize} + \begin{table}[ht] \footnotesize \begin{center} \begin{tabular}{|r||c|c|c|c|} \hline - & QC (none) & QC (10) & QC (20) & SmartCheck \\ + & QC (none) & QC (trunc) & QC (tuple) & SmartCheck\\ \hline \hline -Max. & 0.15s & 21.51s & 125.37s & 1.96s\\ -\hline -Mean & 0.07s & 1.21s & 3.80s & 0.30s\\ +Maximum & 0.40s, 124 & 139.15s, 90 & 97.53s, 53 & 0.s, 13\\ \hline -Median & 0.07s & 0.48s & 0.52s & 0.24s\\ +Mean & 0.08s, 69 & 0.73s, 35 & 2.26s, 13 & 0.07s, 6\\ \hline -Mean size & 70 & 34 & 32 & 5\\ +Median & 0.07s, 68 & 0.18s, 32 & 1.90s, 13 & 0.07s, 5\\ \hline \end{tabular} \end{center} - \caption{Results for 100 tests of the program in Figure~\ref{fig:initial} + \caption{Results for 1000 tests of the program in Figure~\ref{fig:initial} using interpreted Haskell.} \label{table:results} \end{table} - +\todo{table generated with InitEx2 on Feb. 3 --- do a final test with 1000} \paragraph{SmartCheck} -Motivated by these limitations of QuickCheck and SmallCheck, we have developed +Motivated by these limitations of QuickCheck and SmallCheck, we developed \emph{SmartCheck}. SmartCheck extends QuickCheck to efficiently and generically find small counterexamples as well as generalize them as described in the preceding paragraph. To motivate the benefit of SmartCheck, consider Table~\ref{table:results}, giving the results for using QuickCheck and SmartCheck on the program shown in Figure~\ref{fig:initial} (we omit SmallCheck, -since as we noted, it is not even feasible for an example like this). In the -first three columns, the results of using QuickCheck to generate counterexamples -are shown. The first column contains the results of using QuickCheck without -shrinking to provide a baseline. The next two columns show the result of using -QuickCheck with definitions of truncated \ttp{shrink} that limit the number of -smaller values to 10 and 20 elements, respectively. (With the original -definition for shrink from Figure~\ref{fig:initial}, some counterexamples cause -QuickCheck to not return during the shrinking phase after hours of computation, -so we omit these.) - -In each row, we show the maximum, mean, and median number of seconds required by -QuickCheck to generate a counterexample over 100 runs. In the last row, we show -the mean size of the final value returned, measured by the taking the cumulative -length of the lists of the fields of \ttp{B}. Notice that while typically -counterexamples are returned in a fraction of a second, there are outliers -taking 21 seconds and 125 seconds for the two tests. Both tests roughly cut the -size of the counterexample in half; increasing the number of potential -counterexamples can result in a significant performance penalty for this example -without much benefit. - -In the last column are the results for SmartCheck. Not only does it execute -faster than QuickCheck with shrinking, it has significantly smaller outliers. -The most striking difference, however, is the size of the counterexample -returned, reducing on average the size of a counterexample found by QuickCheck -without shrinking by a factor of 14. - -The purpose of this paper is to explain first how to efficiently and generically -generate small counterexamples for algebraic data types and second how to -generalize the counterexamples. - -\todo{check paper against this section again, especially generalization sec.} +since as noted, it is not even feasible for the example). We show the maximum, +mean, and median amount of time required to generate a counterexample and to +shrink it (and print it to standard out) as well as the corresponding number of +constructors in the final result. + +In the first three columns, the results of using QuickCheck to generate +counterexamples are shown. The first column contains the results of using +QuickCheck without shrinking to provide a baseline. The next column shows the +result of using QuickCheck with truncating shrink, which each list is truncated +to 10 elements. Next, we show the results of the tuple shrink'' instance. In +the last column are the results for SmartCheck, which includes the time required +by QuickCheck to generate the original counterexample. Not only does SmartCheck +execute faster than the other strategies, it has smaller outliers. In addition, +the size of the counterexamples are significantly smaller. And we stress that +SmartCheck achieves these results \emph{generically}. + +\todo{change max to std dev, look for max info above.} \paragraph{Contributions} This paper, and the library it describes, makes the following contributions: @@ -476,7 +482,9 @@ \section{Introduction}\label{sec:intro} \noindent In Section~\ref{sec:experiments}, we discuss some of our experiences with using -SmartCheck, including checking properties from Xmonad. +SmartCheck, including checking properties from Xmonad, and related work and +concluding remarks are made in Sections~\ref{sec:related} +and~\ref{sec:conclusions}, respectively. %% \noindent Moreover, the inability to quickly uncover understandable %% counterexamples is a real-world problem with QuickCheck (or other functional @@ -501,10 +509,6 @@ \section{Introduction}\label{sec:intro} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Preliminaries}\label{sec:preliminaries} - -\todo{make clear that preconditions shouldn't be considered logical - preconditions} - We focus on \emph{algebraic data types}~\cite{}. Algebraic data types are represented as sums of products''. We sometimes refer to the elements of a sum type as a \emph{variant} that is \emph{tagged} by a constructor. For @@ -519,9 +523,10 @@ \section{Preliminaries}\label{sec:preliminaries} respectively. To present the algorithms in the following sections, we provide some definitions -here. Our presentation simplifies the implementation -(Section~\ref{sec:implementation}) somewhat to improve the presentation. -Consider the following types and type class. +in the form of methods of a type class. SmartCheck requires instances for data +being analyzed. In Section~\ref{sec:implementation}, we describe how instances +for the type class are derived automatically. Consider the following types and +type class. % \begin{code} type Idx = Int @@ -535,6 +540,7 @@ \section{Preliminaries}\label{sec:preliminaries} constr :: a -> String constrs :: a -> [String] baseType :: a -> Bool + subVals :: a -> Tree SubVal \end{code} % \noindent @@ -553,13 +559,20 @@ \section{Preliminaries}\label{sec:preliminaries} \item \ttp{baseType} returns whether the type of the value is an interesting type'', which generally means it has structure (e.g., a sum or product type). We discuss the purpose of base types in Section~\ref{sec:base}. +\item \ttp{subVals} returns a tree of all non base-type sub-values. A tree is a + value with the type +% + \begin{code} +data Tree a = Node \{ rootLabel :: a + , subForest :: [Tree a] \} + \end{code} \end{itemize} \noindent For example, consider a binary tree type: % \begin{code} -data Tree = L | B Tree Tree +data T = L | B T T \end{code} % \noindent @@ -592,6 +605,10 @@ \section{Preliminaries}\label{sec:preliminaries} baseType (3 :: Int) = True baseType L = False + +subVals (B (B L\sub{0} L\sub{1}) L\sub{2}) = + Tree (B L\sub{0} L\sub{1}) [Tree L\sub{0} [], Tree L\sub{1} []] + (Tree L\sub{1} []) \end{code} % \noindent @@ -619,7 +636,8 @@ \section{Preliminaries}\label{sec:preliminaries} In our implementation, \ttp{constr} and \ttp{constrs} depends on GHC Generics~\cite{generics}, which we describe in Section~\ref{sec:implementation}. For simplicity, we omit here Generics-specific super-class constraints on the -\ttp{SubTypes} class. +\ttp{SubTypes} class. Moreover, our presentation simplifies the implementation +(Section~\ref{sec:implementation}) somewhat to improve the presentation. %% Finally, \ttp{baseType} is true for our tree type but false for a %% structureless type, such as \ttp{Int}. @@ -668,24 +686,22 @@ \subsection{Reduction Algorithm Overview}\label{sec:reduct} getSize :: SubVal -> Size getSize (SubVal a) = size a -newVals :: SubVal -> Size -> Int -> IO [SubVal] -newVals (SubVal a) sz tries = - replicateM tries s - where +newVals :: Size -> Int -> SubVal -> IO [SubVal] +newVals sz tries (SubVal a) = + replicateM tries s where s = liftM SubVal (sizedArbitrary sz a) reduce :: SubTypes a - => a -> (a -> Property) -> IO a -reduce cex prop = reduce' 1 + => ScArgs -> (a -> Property) -> a -> IO a +reduce args prop cex = reduce' 1 where - reduce' idx = - case index cex idx of - Nothing -> return cex - Just v -> do - vs <- newVals v (getSize v) scMaxReduce - case test cex idx vs prop of - Nothing -> reduce' (idx+1) - Just cex' -> reduce cex' prop + reduce' idx + | Just v <- index cex idx + = do vs <- newVals (getSize v) + (scMaxReduce args) v + maybe (reduce' (idx+1)) (reduce args prop) + (test cex idx vs prop) + | otherwise = return cex test :: SubTypes a => a -> Idx -> [SubVal] -> (a -> Property) -> Maybe a @@ -701,10 +717,11 @@ \subsection{Reduction Algorithm Overview}\label{sec:reduct} \end{figure} Figure~\ref{fig:reduction} shows the reduction algorithm. The function -\ttp{reduce} takes a counterexample \ttp{cex} and the property \ttp{prop}. At -the beginning of execution, we assume that a counterexample has been obtained -somehow; our implementation relies on QuickCheck as a front-end'' to generate -a counterexample that the algorithm automatically shrinks. The reduction begins +\ttp{reduce} takes arguments to customize the algorithm's behavior, a +counterexample \ttp{cex} and the property \ttp{prop}. At the beginning of +execution, we assume that a counterexample has been obtained somehow; our +implementation relies on QuickCheck as a front-end'' to generate a +counterexample that the algorithm automatically shrinks. The reduction begins at the first proper sub-value of \ttp{cex}; call it \ttp{v}. When the index \ttp{idx} becomes out-of-bounds and returns \ttp{Nothing}, the algorithm terminates. Otherwise, a list of new random values are generated. @@ -720,8 +737,6 @@ \subsection{Reduction Algorithm Overview}\label{sec:reduct} successfully larger counterexamples when generating new values with which to replace a sub-value. -\todo{explain getSize} - \ttp{sizedArbitrary} returns a value inside the \ttp{IO} monad since it depends on a random seed for generation (we could pass in the generator seed, but for simplicity, let us assume one is generated locally). The constant @@ -742,7 +757,7 @@ \subsection{Reduction Algorithm Overview}\label{sec:reduct} \ttp{cex'}, attempting to shrink it further. The algorithm is guaranteed to terminate: intuitively, the measure for the function is that either the index increases or the size of the counterexample being evaluated decreases. The -complexity of the algorithm is $\Omega(n^2)$, where $n$ is the number of +complexity of the algorithm is $\mathcal{O}(n^2)$, where $n$ is the number of constructors in the counterexample. The analysis assumes generating new sub-values and testing them is done in constant time. @@ -934,29 +949,24 @@ \subsubsection{Sub-value Counterexample Hypothesis}\label{sec:subval} \begin{figure} \begin{code} -reduce0 :: forall a . SubTypes a - => a -> (a -> Property) -> IO a -reduce0 cex prop = reduce' 1 +reduceOpt :: forall a . SubTypes a + => ScArgs -> (a -> Property) -> a -> IO a +reduceOpt args prop cex = reduce' 1 where - reduce' idx = - case index cex idx of - Nothing -> return cex - Just v -> case testHole v of - Just v' -> reduce0 v' prop - Nothing -> test' v idx + reduce' idx + | Just v <- index cex idx + = maybe (test' v idx) (reduceOpt args prop) + (testHole v) + | otherwise = return cex test' v idx = do - vs <- newVals v (getSize v) scMaxReduce - case test cex idx vs prop of - Nothing -> reduce' (idx+1) - Just cex' -> reduce0 cex' prop - - testHole (SubVal a) = - case cast a :: Maybe a of - Nothing -> Nothing - Just a' -> - if pass prop a' then Nothing - else Just a' + vs <- newVals (getSize v) (scMaxReduce args) v + maybe (reduce' (idx+1)) (reduceOpt args prop) + (test cex idx vs prop) + + testHole (SubVal a) = do + a' <- cast a :: Maybe a + if pass prop a' then Nothing else Just a' \end{code} \caption{Reduction algorithm with the sub-value counterexample optimization.} \label{fig:reduce0} @@ -967,7 +977,7 @@ \subsubsection{Sub-value Counterexample Hypothesis}\label{sec:subval} sub-value \ttp{cex'} that has the same type as \ttp{cex} and fails the property (while passing any preconditions). In this case, we can return \ttp{cex'} directly, and rerun the reduction algorithm on \ttp{cex'}. In -Figure~\ref{fig:reduce0}, we show an updated reduction algorithm, \ttp{reduce0} +Figure~\ref{fig:reduce0}, we show an updated reduction algorithm, \ttp{reduceOpt} that implements this optimization. The function \ttp{testHole} tests the current sub-value and if it fails the property, the function tests the sub-value directly. @@ -1083,26 +1093,26 @@ \subsection{Universal Sub-Value Generalization}\label{sec:universal} subTrees cex idx = any (subTree cex idx) extrapolate :: SubTypes a - => a -> (a -> Property) -> IO [Idx] -extrapolate cex prop = extrapolate' 1 [] + => ScArgs -> a -> (a -> Property) -> IO [Idx] +extrapolate args cex prop = extrapolate' 1 [] where extrapolate' idx idxs | subTrees cex idx idxs = extrapolate' (idx+1) idxs - | otherwise - = case index cex idx of - Nothing -> return idxs - Just v -> do - vs <- newVals v scMaxSize scMaxExtrap - extrapolate' (idx+1) - (if allFail cex idx vs prop - then idx:idxs else idxs) - -allFail :: SubTypes a - => a -> Idx -> [SubVal] - -> (a -> Property) -> Bool -allFail cex idx vs prop = - length res >= scMinExtrap && and res + | Just v <- index cex idx = mkNewVals v + | otherwise = return idxs + where + mkNewVals v = do + vs <- newVals (scMaxSize args) + (scMaxExtrap args) v + extrapolate' (idx+1) + (if allFail args cex idx vs prop + then idx:idxs else idxs) + +allFail :: SubTypes a => ScArgs -> a -> Idx + -> [SubVal] -> (a -> Property) -> Bool +allFail args cex idx vs prop = + length res >= scMinExtrap args && and res where res = mapMaybe go vs go = fail prop . replace cex idx @@ -1161,61 +1171,6 @@ \subsection{Universal Sub-Value Generalization}\label{sec:universal} of counterexamples to a universal claim, similar to QuickSpec~\cite{qs}. By tuning the parameters, the risk of an unsound generalization is reduced. -\subsection{Automated Precondition Strengthening}\label{sec:precondition} -The extrapolation algorithm generalizes a counterexample, but in the -neighborhood'' of he original counterexample. In particular, all -generalizations are from the same variant as the original counterexample. To -help the programmer in the generalization step, we would also like a way to test -the property again, ensuring we get counterexamples (if they exist) outside of -the neighborhood of the original one. - -\begin{figure}[ht!] - \begin{center} - \includegraphics[scale=0.49]{Figs/cex-gen} - \end{center} - \caption{Counterexample generalization.} - \label{fig:cex-gen} -\end{figure} - -Figure~\ref{fig:cex-gen} illustrates the situation in which we have some values -and a property in the case that some values fail the property of the form -(\ttp{pre ==> prop}). Our goal is to find additional counterexamples in the -shaded region, in which \ttp{pre} is satisfied, \ttp{prop} is not satisfied, and -the counterexample is not already characterized by a universal sub-value -generalization formula. Iterating, we increasingly cover the shaded region -with new formulas until no more uncovered counterexamples exist or the testing -tool fails to find new counterexamples. - -\begin{figure} - \begin{code} -matchesShapes :: SubTypes a => a -> [(a,([Idx],[Idx]))] -> Bool -matchesShapes d = any (matchesShape d) - -matchesShape :: SubTypes a => a -> (a, ([Idx],[Idx])) -> Bool -matchesShape a (b, (idxVals, idxConstrs)) - | constr a /= constr b = False - | Just a' <- aRepl = let x = subForest (subVals a') in - let y = subForest (subVals b) in - all foldEqConstrs (zip x y) - | otherwise = False - - where - foldEqConstrs :: (Tree SubVal, Tree SubVal) -> Bool - foldEqConstrs (Node (SubVal l0) sts0, Node (SubVal l1) sts1) - | constr l0 == constr l1 = next - | otherwise = False - where next = all foldEqConstrs (zip sts0 sts1) - - updateA idx d = fmap (replace d idx) (index b idx) - aRepl = foldl go (Just a) (idxVals ++ idxConstrs) - where go ma idx = maybe Nothing (updateA idx) ma - \end{code} - \caption{Shape matching algorithm.} - \label{fig:matches} -\end{figure} - -\todo{talk about matches shape algorithm, real-eval-print} - \subsection{Existential Sub-Value Generalization} @@ -1269,22 +1224,23 @@ \subsection{Existential Sub-Value Generalization} subConstrs :: SubVal -> [String] subConstrs (SubVal a) = constrs a -sumTest :: SubTypes a - => a -> (a -> Property) -> [Idx] -> IO [Idx] -sumTest cex prop exIdxs = sumTest' 1 [] +sumTest :: SubTypes a => ScArgs -> a + -> (a -> Property) -> [Idx] -> IO [Idx] +sumTest args cex prop exIdxs = sumTest' 1 [] where sumTest' idx idxs | subTrees cex idx (exIdxs ++ idxs) = sumTest' (idx+1) idxs - | otherwise - = case index cex idx of - Nothing -> return idxs - Just v -> do - vs <- newVals v scMaxSize scConstrMax - sumTest' (idx+1) - (if constrFail cex idx vs prop - (subConstr v) (subConstrs v) - then idx:idxs else idxs) + | Just v <- index cex idx = fromSumTest v + | otherwise = return idxs + where + fromSumTest v = do + vs <- newVals (scMaxSize args) + (scConstrMax args) v + sumTest' (idx+1) + (if constrFail cex idx vs prop + (subConstr v) (subConstrs v) + then idx:idxs else idxs) constrFail :: SubTypes a => a -> Idx -> [SubVal] -> (a -> Property) -> String -> [String] -> Bool @@ -1292,15 +1248,12 @@ \subsection{Existential Sub-Value Generalization} constrFail' [con] vs where constrFail' cons vs' - | length cons == length allCons - = True - | null vs' - = False + | length cons == length allCons = True + | null vs' = False | go v == Just True = constrFail' (c:cons) (tail vs') | otherwise = constrFail' cons (tail vs') - where v = head vs' c = subConstr v @@ -1346,6 +1299,65 @@ \subsection{Existential Sub-Value Generalization} is sound. The existential claim is only that for each variant, there exists at least one counterexample, so inductive reasoning is not required. +\subsection{Automated Precondition Strengthening}\label{sec:precondition} +The extrapolation algorithm generalizes a counterexample, but in the +neighborhood'' of he original counterexample. In particular, all +generalizations are from the same variant as the original counterexample. To +help the programmer in the generalization step, we would also like a way to test +the property again, ensuring we get counterexamples (if they exist) outside of +the neighborhood of the original one. + +\begin{figure}[ht!] + \begin{center} + \includegraphics[scale=0.49]{Figs/cex-gen} + \end{center} + \caption{Counterexample generalization.} + \label{fig:cex-gen} +\end{figure} + +Figure~\ref{fig:cex-gen} illustrates the situation in which we have some values +and a property in the case that some values fail the property of the form +(\ttp{pre ==> prop}). Our goal is to find additional counterexamples in the +shaded region, in which \ttp{pre} is satisfied, \ttp{prop} is not satisfied, and +the counterexample is not already characterized by a universal sub-value +generalization formula. Iterating, we increasingly cover the shaded region +with new formulas until no more uncovered counterexamples exist or the testing +tool fails to find new counterexamples. + +\begin{figure} + \begin{code} +matchesShapes :: SubTypes a + => a -> [(a,[Idx])] -> Bool +matchesShapes d = any (matchesShape d) + +matchesShape :: SubTypes a + => a -> (a, [Idx]) -> Bool +matchesShape a (b, idxs) + | constr a /= constr b = False + | Just a' <- aRepl + = let x = subForest (subVals a') in + let y = subForest (subVals b) in + all foldEqConstrs (zip x y) + | otherwise = False + where + updateA idx d = + fmap (replace d idx) (index b idx) + aRepl = foldl go (Just a) idxs where + go ma idx | Just x <- ma = updateA idx x + | otherwise = Nothing + foldEqConstrs ( Node (SubVal l0) sts0 + , Node (SubVal l1) sts1 ) + | constr l0 == constr l1 = next + | otherwise = False + where next = all foldEqConstrs (zip sts0 sts1) + \end{code} + \caption{Shape matching algorithm.} + \label{fig:matches} +\end{figure} + +\todo{talk about matches shape algorithm, real-eval-print} + + \subsection{Generalization Details} \todo{flags, optimizations} @@ -1450,6 +1462,9 @@ \section{Related Work}\label{sec:related} Feat provides a functional representation of enumerations, allowing for fast indexing of large values in an enumeration. +\todo{Instances for \ttp{shrink} are user-defined for complex data types (although a motivation for +one of Haskell's first generics package was to provide generic +shrinking~\cite{syb}).} \todo{avoid complexities \url{http://stackoverflow.com/questions/14006005/idiomatic-way-to-shrink-a-record-in-quickcheck}% % @@ -1477,12 +1492,37 @@ \section{Related Work}\label{sec:related} %------------------------------------------------------------------------------- \section{Conclusions and Future Work}\label{sec:conclusions} -\todo{how to display counterexamples} -\todo{Use Bayes' rule to do extrapolation?} - -\todo{future work: higher-kinded data, functions} - -\todo{add lazy testing---proof value is unused} +We have presented new approaches for generically shrinking and generalizing +counterexamples over algebraic data. Shrinking has thus far been as much an art +as a science, requiring experience with QuickCheck and deep understanding about +the property being tested. In many cases, SmartCheck automates the laborious +task of shrinking. + +We envision a number of potential extensions and improvements to SmartCheck. +First, we have considered only the simplest kind of data, algebraic data types, +but we would like to extend these results to shrinking functions, too. As noted +in Section~\ref{sec:implementation}, SmartCheck does not work with GADTs +currently, due to limitations with GHC Generics. + +Lazy SmallCheck can test partially-defined inputs by detecting the evaluation of +undefined values~\cite{sc}. This capability is useful in shrinking, too. For +example, the universal sub-value generalization algorithm +(Section~\ref{sec:universal}) could be extended to shortcut testing and +generalize a sub-value if it is not evaluated in testing the property. Not only +does this shortcut the generalization phase, but it gives a proof that the +sub-value can be generalized. + +SmartCheck displays (generalized) counterexamples in a form similar to default +\ttp{Show} instances or in a tree form, which can be helpful to parse the +components of the value. Better approaches for showing large data types are +needed. + +Finally, we believe there are interesting connections between counterexample +discovery and probabilistic reasoning. Often, when a bug in a program is fixed, +the fix is not correct, or it introduces new bugs. Such knowledge might be +useful in guiding the distribution of test cases or shrinking. + +\todo{final summary} \section*{Acknowledgements} \todo{Thank Rehger for comments/pointing to paper.}
Please sign in to comment.
Something went wrong with that request. Please try again.