# Nonparametric Inference, Auditing, and Litigation
## Short course at [XXIX International Forum on Statistics](http://www.upaep.mx/micrositios/29foroestadistica/)
## UPAEP, Puebla, Mexico, 29 September&ndash;3 October 2014
### [Philip B. Stark](http://www.stat.berkeley.edu/~stark)
### [Department of Statistics](http://statistics.berkeley.edu), [University of California, Berkeley](http://www.berkeley.edu)


### These materials are available at https://github.com/pbstark/MX14 

*Abstract:* Many problems that arise in financial and election auditing,
civil litigation,
and causal inference can be reduced to statistical inferences about
the mean of a nonnegative or bounded finite population.
A variety of sampling plans can be combined with common probability
inequalities to test hypotheses or make confidence intervals in these applications,
in a fully nonparametric, conservative way.
I will illustrate these methods with real and cartoon examples from election auditing,
healthcare auditing, intellectual property litigation, wage and hour litigation,
and online advertising.
An especially useful class of methods can be derived from Wald's (1945)
sequential probability ratio test (SPRT), which hinges on a generalization of the
problem of _gambler's ruin_.
Methods based on Wald's SPRT allow samples to be drawn incrementally and adaptively,
often reducing the cost of financial and electoral audits, litigation discovery,
and experiments without incurring any penalty from multiple testing.
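
To make the sequential idea concrete, here is a minimal sketch of a one-sided test in the spirit of Wald's SPRT. It is an illustration only, not code from the course notebooks. It assumes independent draws with replacement from a 0/1 population, a simple null `p = p0` against a simple alternative `p = p1 > p0`, and a rejection threshold of `1/alpha` for the likelihood ratio. The function name `sprt_bernoulli` and its parameters are hypothetical.

```python
from __future__ import division, print_function
import math


def sprt_bernoulli(draws, p0, p1, alpha=0.05):
    """One-sided sequential likelihood ratio test for a Bernoulli proportion.

    Hypothetical helper (an assumption for illustration, not from the notes).
    Assumes i.i.d. draws (sampling with replacement), null p = p0,
    alternative p = p1 with 0 < p0 < p1 < 1.

    Returns (rejected, n_used, log_likelihood_ratio).
    """
    if not (0 < p0 < p1 < 1):
        raise ValueError("need 0 < p0 < p1 < 1")
    log_threshold = math.log(1.0 / alpha)   # reject when LR >= 1/alpha
    step_success = math.log(p1 / p0)        # log LR increment for a 1
    step_failure = math.log((1 - p1) / (1 - p0))  # log LR increment for a 0
    log_lr = 0.0
    n = 0
    for n, x in enumerate(draws, start=1):
        log_lr += step_success if x else step_failure
        if log_lr >= log_threshold:
            # Stop as soon as the evidence against the null is strong enough.
            return True, n, log_lr
    # Ran out of data without rejecting the null.
    return False, n, log_lr


if __name__ == "__main__":
    # Cartoon data: every draw is a success.
    print(sprt_bernoulli([1] * 20, p0=0.5, p1=0.7, alpha=0.05))
```

Under the null, the likelihood ratio is a nonnegative martingale with expected value 1, so the chance that it ever reaches `1/alpha` is at most `alpha`. That is why the sample can be extended one draw at a time without a multiple-testing adjustment.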

## Index
1. [Canonical examples of real-world problems we will consider](canonical.ipynb)
1. [Why not use the normal approximation?](normApprox.ipynb)
1. [The duality between confidence sets and hypothesis tests](duality.ipynb)
1. [Confidence bounds for the mean of a bounded population: Binomial and Hypergeometric](binom.ipynb)
1. [Confidence bounds from the Chebychev and Hoeffding Inequalities](hoeffding.ipynb)
1. [Lower confidence bounds for the mean of a nonnegative population: Markov's Inequality & methods based on the empirical distribution](markov.ipynb)
1. [Wald's Sequential Probability Ratio Test](sprt.ipynb)
    + [Wald's Sequential Probability Ratio Test for the population percentage, sampling without replacement](pSPRTnoReplacement.ipynb)
1. [The Kaplan-Wald Confidence Bound for a Nonnegative Mean](kaplanWald.ipynb)
1. [Dollar-unit sampling and taint](dus.ipynb)
1. [Penny Sampling and Continuous Penny Sampling](pennySampling.ipynb)
1. [Method shootout](shootout.ipynb)
1. [Bibliography](bib.ipynb)

These notes were last updated and tested using:
+ Python version 2.7.11
+ IPython version 5.1.0
+ numpy version 1.11.2
+ scipy version 0.18.1
+ pandas version 0.19.0
+ matplotlib version 1.5.0
+ Mac OS Darwin 16.1.0 x86_64 i386 64bit

In [1]:
%load_ext version_information
%version_information scipy, numpy, pandas, matplotlib



| Software | Version |
|---|---|
| Python | 2.7.11 64bit [GCC 4.2.1 (Apple Inc. build 5577)] |
| IPython | 5.1.0 |
| OS | Darwin 16.1.0 x86_64 i386 64bit |
| scipy | 0.18.1 |
| numpy | 1.11.2 |
| pandas | 0.19.0 |
| matplotlib | 1.5.0 |

Thu Oct 27 17:23:56 2016 PDT


In [2]:
%run talkTools.py