L33.tex

\documentclass[11pt]{article}
\usepackage{listings}
\usepackage{tikz}
\usepackage{url}
%\usepackage{algorithm2e}
\usetikzlibrary{arrows,automata,shapes}
\tikzstyle{block} = [rectangle, draw, fill=blue!20, 
    text width=5em, text centered, rounded corners, minimum height=2em]
\tikzstyle{bt} = [rectangle, draw, fill=blue!20, 
    text width=1em, text centered, rounded corners, minimum height=2em]

\newtheorem{defn}{Definition}
\newtheorem{crit}{Criterion}
\newcommand{\true}{\mbox{\sf true}}
\newcommand{\false}{\mbox{\sf false}}

\newcommand{\handout}[5]{
  \noindent
  \begin{center}
  \framebox{
    \vbox{
      \hbox to 5.78in { {\bf Software Testing, Quality Assurance and Maintenance } \hfill #2 }
      \vspace{4mm}
      \hbox to 5.78in { {\Large \hfill #5  \hfill} }
      \vspace{2mm}
      \hbox to 5.78in { {\em #3 \hfill #4} }
    }
  }
  \end{center}
  \vspace*{4mm}
}

\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{#4}{Lecture #1}}
\topmargin 0pt
\advance \topmargin by -\headheight
\advance \topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt
\evensidemargin \oddsidemargin
\marginparwidth 0.5in
\textwidth 6.5in

\parindent 0in
\parskip 1.5ex
%\renewcommand{\baselinestretch}{1.25}

\lstdefinelanguage{JavaScript}{
  keywords={typeof, new, true, false, catch, function, return, null, catch, switch, var, if, in, while, do, else, case, break},
  keywordstyle=\color{blue}\bfseries,
  ndkeywords={class, export, boolean, throw, implements, import, this},
  ndkeywordstyle=\color{darkgray}\bfseries,
  identifierstyle=\color{black},
  sensitive=false,
  comment=[l]{//},
  morecomment=[s]{/*}{*/},
  commentstyle=\color{purple}\ttfamily,
  stringstyle=\color{red}\ttfamily,
  morestring=[b]',
  morestring=[b]''
}

\begin{document}

\lecture{33 --- March 27, 2015}{Winter 2015}{Patrick Lam}{version 0}

\section*{Fuzzing}

Consider the following JavaScript code\footnote{\url{http://webkit.sed.hu/blog/20130710/fuzzinator-mutation-and-generation-based-browser-fuzzer}}.
\begin{lstlisting}[language=JavaScript]
function test() {
    var f = function g() {
        if (this != 10) f();
    };
    var a = f();
}
test();
\end{lstlisting}
Turns out that it can crash WebKit (\url{https://bugs.webkit.org/show_bug.cgi?id=116853}).
Plus, it was automatically generated by the Fuzzinator tool, based on a grammar for JavaScript.

Fuzzing is the modern-day implementation of the input space based
grammar testing that we talked about in Lecture 17. While the
fundamental concepts were in the earlier lecture, we will see how
those concepts actually work in practice.  Fuzzing effectively finds
software bugs, especially security-based bugs (caused, for instance,
by a lack of sufficient input validation.)

\paragraph{Origin Story.} It starts with line noise.
In 1988, Prof. Barton Miller was using a 1200-baud dialup modem to
communicate with a UNIX system on a dark and stormy night.  He found
that the random characters inserted by the noisy line would cause his
UNIX utilities to crash. He then challenged graduate students in his
Advanced Operating Systems class to write a fuzzer---a program which
would generate (unstructured ASCII) random inputs for other programs.
The result: the students observed that 25\%-33\% of UNIX utilties
crashed on random inputs\footnote{\url{http://pages.cs.wisc.edu/~bart/fuzz/Foreword1.html}}.
% footnote: fuzzing book foreword

(That was not the earliest known example of fuzz testing. Apple
implemented ``The Monkey'' in 1983\footnote{\url{http://www.folklore.org/StoryView.py?story=Monkey_Lives.txt}}
to generate random events for
MacPaint and MacWrite. It found lots of bugs. The limiting factor was
that eventually the monkey would hit the Quit command.  The solution
was to introduce a system flag, ``MonkeyLives'', and have MacPaint and
MacWrite ignore the quit command if MonkeyLives was true.)
% footnote: folklore.org

\paragraph{How Fuzzing Works.} 
Two kinds of fuzzing: \emph{mutation-based} and
\emph{generation-based}. Mutation-based testing starts with
existing test cases and randomly modifies them to explore new behaviours.
Generation-based testing starts with a grammar and generates
inputs that match the grammar.

One detail that I didn't mention in Lecture 17 was the bug detection
part.  Back then, I just talked about generating interesting
inputs. In fuzzing, you feed these inputs to the program and find
crashes, or assertion failures, or you run the program under a dynamic
analysis tool such as Valgrind and observe runtime errors.

\paragraph{The Simplest Thing That Could Possibly Work.}
Consider generation-based testing for HTML5.  The simplest grammar---actually a regular
expression---that could possibly
work\footnote{\url{http://trevorjim.com/a-grammar-for-html5/}} is {\tt
  .*}, where {\tt .} is ``any character'' and {\tt *} means ``0 or
more''. Indeed, that grammar found the following WebKit assertion failure:
\url{https://bugs.webkit.org/show_bug.cgi?id=132179}.

The process is as described previously. Take the regular expression
and generate random strings from it.  Feed them to the browser and see
what happens. Find an assertion failure/crash.

\paragraph{More sophisticated fuzzing.} Let's say that we're trying to
generate C programs. One could propose the following hierarchy of inputs\footnote{\url{http://www.cs.dartmouth.edu/~mckeeman/references/DifferentialTestingForSoftware.pdf}}:
\begin{enumerate}
\item sequence of ASCII characters;
\item sequence of words, separators, and white space;
\item syntactically correct C program;
\item type-correct C program;
\item statically conforming C program;
\item dynamically conforming C program;
\item model conforming C program.
\end{enumerate}
Each of these levels contains a subset of the inputs from
previous levels. However, as the level increases, we are more likely
to find interesting bugs that reveal functionality specific
to the system (rather than simply input validation issues).

While the example is specific to C, the concept applies to all
generational fuzzing tools. Of course, the system under test
shouldn't ever crash on random ASCII characters. But it's hard
to find the really interesting cases without incorporating knowledge
about correct syntax for inputs (or, as in the Apple case, excluding
the ``quit'' command). Increasing the level should also increase
code coverage.

John Regehr discusses this issue at greater
length\footnote{\url{blog.regehr.org/archives/1039}} and concludes
that generational fuzzing tools should operate at all levels.

\paragraph{Mutation-based fuzzing.}
In mutation-based fuzzing, you develop a tool that randomly modifies existing
inputs. You could do this totally randomly by flipping bytes in the input,
or you could parse the input and then change some of the nonterminals.
If you flip bytes, you also need to update any applicable
checksums if you want to see anything interesting (similar to
level 3 above).

Here's a description of a mutation-based fuzzing workflow by the author of Fuzzinator.
{\small
\begin{quote}
  More than a year ago, when I started fuzzing, I was mostly focusing on mutation-based fuzzer technologies since they were easy to build and pretty effective. Having a nice error-prone test suite (e.g. LayoutTests) was the warrant for fresh new bugs. At least for a while. As expected, the test generator based on the knowledge extracted from a finite set of test cases reached the edge of its possibilities after some time and didn't generate essentially new test cases anymore. At this point, a fuzzer girl can reload her gun with new input test sets and will probably find new bugs. This works a few times but she will soon find herself in a loop testing the common code paths and running into the same bugs again and again.\footnote{\url{http://webkit.sed.hu/blog/20141023/fuzzinator-reloaded}}
\end{quote}
}

%% \paragraph{Example tool: Fuzzinator.}


%% Currently the 
%% received failing tests are too large to post without minimization to 
%% bugzilla. On the other hand, reporting every found bug automatically and 
%% immediately, regardless of its type (security or not), might not be a 
%% wise thing. However, what's sure for now is that all found bugs will be 
%% reported: security issues tagged appropriately, and others as publicly 
%% visible.

%Example Fuzzing Tool

%application: Month of Browser Bugs

\paragraph{Fuzzing Summary.} Fuzzing is a useful technique for finding
interesting test cases. It works best at interfaces between components.
Advantages: it runs automatically and really works. Disadvantages: without
significant work, it won't find sophisticated domain-specific issues.

\section*{Related: Chaos Monkey}
Instead of thinking about bogus inputs, consider instead what happens
in a distributed system when some instances (components) randomly fail
(because of bogus inputs, or for other reasons). Ideally, the system
would smoothly continue, perhaps with some graceful degradation until
the instance can come back online. Since failures are inevitable, it's
best that they occur when engineers are around to diagnose them and
prevent unintended consequences of failures.

Netflix has implemented this in the form of the Chaos
Monkey\footnote{\url{http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html}}
and its relatives. The Chaos Monkey operates at instance level, while
Chaos Gorilla disables an Availability Zone, and Chaos Kong knocks out
an entire Amazon region. These tools, and others, form the Netflix
Simian
Army\footnote{\url{http://techblog.netflix.com/2011/07/netflix-simian-army.html}}.
%https://github.com/Netflix/SimianArmy/tree/master/assets

Jeff Atwood (co-founder of StackOverflow) writes about experiences with a Chaos Monkey-like system\footnote{\url{http://blog.codinghorror.com/working-with-the-chaos-monkey/}}. Why inflict such a system on yourself? ``Sometimes you don't get a choice; the Chaos Monkey chooses you.'' In his words, software engineering benefits of the Chaos Monkey included:

\begin{itemize}
\item    ``Where we had one server performing an essential function, we switched to two.''
\item    ``If we didn't have a sensible fallback for something, we created one.''
\item    ``We removed dependencies all over the place, paring down to the absolute minimum we required to run.''
\item     ``We implemented workarounds to stay running at all times, even when services we previously considered essential were suddenly no longer available.''
\end{itemize}

\end{document}