# Plausibility of Lottery Luck: Group Computations

#### [Dylan D. Daniels](http://statistics.berkeley.edu/people/dylan-david-daniels) and [Philip B. Stark](http://www.stat.berkeley.edu/~stark), Department of Statistics, University of California, Berkeley
#### Based on MATLAB code by [Skip Garibaldi](http://www.garibaldibros.com/)

This tool appraises whether it is plausible that a list of individuals each won a set of lottery prizes honestly. 

User Inputs:

   + A comma-separated values file (CSV) of individuals, wins, and odds.
   + An upper bound on the potential number of players (for instance, one might assume that the number of people playing the lottery isn't greater than the number of residents of the state)
   + A tiny "threshold" probability.

For each individual, the code outputs a lower bound on the amount that every potential player
would have had to spend for there to be even a tiny chance that _any_ of them would win so often, where "tiny" is the threshold probability chosen by the user.

If the required spending amount is, for example, several times the median house price in the state, it may call into question whether the winner won honestly.
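
The arithmetic behind the threshold can be sketched as follows. This is a minimal illustration, assuming the union (Bonferroni) bound that the code below applies when it divides the threshold by the population; the variable names are chosen for this sketch only, and the population figure is the Texas value used later in the notebook.

```python
from __future__ import print_function, division

# Union (Bonferroni) bound: if each of `population` players has chance at most
# threshold / population of winning this often, then the chance that at least
# one of them does is at most `threshold`.
population = 27862600   # illustrative: the Texas figure used later in this notebook
threshold = 1e-7        # overall "tiny" probability

per_player_cutoff = threshold / population
union_bound = population * per_player_cutoff

print("per-player cutoff: {:.3e}".format(per_player_cutoff))
print("bound on the chance that any player is this lucky: {:.1e}".format(union_bound))
```

The per-player cutoff is far smaller than the threshold itself, which is why the implied spending can be so large.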

This version can analyze data for a group of players. 

The code implements the mathematics described in the first reference below. The third link is a public lecture about the method and its results for reported lottery winners in Florida.
The fourth and fifth links are news stories that relied on such calculations.

See:
+ Arratia, R., S. Garibaldi, L. Mower, and P.B. Stark, 2015. Some people have all the luck. _Mathematics Magazine_, _88_ 196–211. doi:10.4169/math.mag.88.3.196.c, Reprint: http://www.stat.berkeley.edu/~stark/Preprints/luck15.pdf http://www.jstor.org/stable/10.4169/math.mag.88.3.196
+ Arratia, R., S. Garibaldi, L. Mower, and P.B. Stark, 2015. Some people have all the luck &hellip; or do they? _MAA Focus_, August/September, 37–38. http://www.maa.org/sites/default/files/pdf/MAAFocus/Focus_AugustSeptember_2015.pdf
+ https://www.youtube.com/watch?v=s8cHHWNblA4
+ Lottery odds: To win, you’d have to be a loser. Lawrence Mower, _Palm Beach Post_, 28 March 2014. http://www.mypalmbeachpost.com/news/news/lottery-odds-to-win-youd-have-to-be-a-loser/nfL57
+ Against all Odds, Gavin Off and Adam Bell, _The Charlotte Observer_, 29 September 2016. http://www.charlotteobserver.com/news/special-reports/against-all-odds/


## Instructions:
1. Create a CSV file with results for all gamblers. The CSV file should contain five columns:  

> "name", "game", "probability", "wins", and "cost"

Each row corresponds to one type of wager for one gambler. 

+ "name" is an identifier for each gambler.
+ "game" is the name of the wager (it's OK to leave this blank).
+ "probability" is the chance of winning that wager. 
+ "wins" is the number of times the gambler collected on that wager.
+ "cost" is the cost per ticket or play on that wager. 

The computations assume that the gambler did not win any dependent bets, for instance, two bets on the same drawing.

2. Put the filename of your CSV file in the box below, along with the values of POPULATION and THRESHOLD.

3. On the toolbar of this browser window (under the jupyter logo), click "Cell" --> "Run All". Wait a bit for your results to appear at the bottom of this page. 
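
As a rough illustration of the file layout described in step 1, the sketch below parses a small hand-written CSV. Every name, game, probability, and cost in it is invented for the example. The first row holds the column names because the notebook reads the file with `csv.DictReader`.

```python
from __future__ import print_function
import csv

# Hypothetical file contents: one row per type of wager per gambler.
# The "game" column may be left blank, as in the last row.
csv_text = """name,game,probability,wins,cost
PLAYER_A,Scratch-off 1,0.0001,3,20
PLAYER_A,Scratch-off 2,0.000025,1,50
PLAYER_B,,0.001,5,5
"""

# DictReader accepts any iterable of lines, so no file is needed for the sketch.
rows = list(csv.DictReader(csv_text.splitlines()))
for row in rows:
    print(row["name"], row["game"], float(row["probability"]),
          int(row["wins"]), float(row["cost"]))
```

Saving the same text to a file and putting its name in `CSV_FILENAME` would give the notebook two gamblers to analyze.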

In [1]:
from __future__ import print_function, division

# Put the name of your CSV file here:
# CSV_FILENAME = 'FILL_ME_IN.csv'
CSV_FILENAME = 'brenda_baker.csv'

COL_NAMES = ['name', 'game', 'probability', 'wins', 'cost']

# set the population size and overall cutoff probability
POPULATION = 27862600  # Texas population, 2016
# POPULATION = 6794000   # population of MA in 2015
# POPULATION = 12784000   # population of Pennsylvania, 2016
THRESHOLD =  10**(-7) # one in ten million threshold
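
To get a feel for what these two parameters imply, the single-wager case can be worked by root-finding instead of constrained optimization. The sketch below is a simplification, not the notebook's method: it assumes one hypothetical game (the values of `p`, `n`, and `cost` are invented), treats the number of plays as a continuous quantity, and uses `scipy.optimize.brentq` to find the number of plays at which the chance of at least `n` wins reaches `THRESHOLD / POPULATION`.

```python
from __future__ import print_function, division
import numpy as np
from scipy.optimize import brentq
from scipy.special import betainc

def upper_tail(p, n, t):
    """P(X >= n) for X ~ Binomial(t, p), via the regularized incomplete beta function."""
    return betainc(n, t - n + 1, p)

# Hypothetical single wager (not taken from any real player).
p = 1e-3       # chance of winning one play
n = 10         # number of wins collected
cost = 20.0    # dollars per play
cut = 1e-7 / 27862600   # THRESHOLD / POPULATION from the cell above

def log_gap(t):
    """Log tail probability minus log cutoff; zero where the constraint is tight."""
    return np.log(upper_tail(p, n, t)) - np.log(cut)

# log_gap is negative at t = n (winning every play) and positive at t = n / p,
# so the root lies between them.
t_min = brentq(log_gap, n, n / p)
print("plays needed: {:.1f}   minimum spend: ${:,.0f}".format(t_min, cost * t_min))
```

With several games the trade-off between cheap and expensive wagers has no such one-dimensional solution, which is why the cells below use a constrained optimizer.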

In [2]:
import csv
import numpy as np
from scipy.special import betainc
from scipy.optimize import minimize

def binTail(p, n, t): # upper tail probability P(X >= n) for X ~ Binomial(t, p), elementwise over vectors of games
    return betainc(n, t - n + 1, p)

def binTailln(p, n, t): # logarithm of the upper tail probability P(X >= n) for X ~ Binomial(t, p)
    return np.log(binTail(p, n, t))

def constraintFn(p, n): # constraint function: probability of vector of wins must be at least CUT
    return lambda x: np.sum(binTailln(p, n, x)) - np.log(CUT)

def objectiveFn(c):  # construct function that gives cost of vector x of bets, for cost-per-bet vector c
    return lambda x: np.dot(x, c)

def solve(x0, upperBoundVec, p, n, c, eps, debugMode, maxiter, method='SLSQP'):  
    # invoke the constrained optimizer
    # 
    #    x0:     starting guess
    #    upperBoundVec: vector of upper bounds on the number of bets on each game
    #    p:      vector of game probabilities
    #    n:      vector of number of wins of each game
    #    c:      vector of game costs
    #    eps:    step size for the optimizer's finite-difference derivative approximation
    #    debugMode: True for verbose output
    #    maxiter: maximum iterations in optimizer
    #    method: underlying minimization algorithm
    #       
    cons = ({'type': 'ineq', 'fun': constraintFn(p, n)})   # overall probability constraint
    bnds = tuple((n[i], upperBoundVec[i]) for i in range(len(n)))  # must bet at least n times to win n times
    return minimize(objectiveFn(c), x0, method=method, jac=(lambda x: c),
                    constraints=cons, bounds=bnds,
                    options={'disp': debugMode, 'maxiter': maxiter, 'eps': eps})

def readCsv(filename):  # read the csv file of data for a player
    with open(filename, 'rU') as f:
        reader = csv.DictReader(f)
        gamblers = []; games = []; pValues = []; nValues = []; cValues = []
        for row in reader:
            gamblers.append(row[COL_NAMES[0]])
            games.append(row[COL_NAMES[1]])
            pValues.append(float(row[COL_NAMES[2]]))
            nValues.append(float(row[COL_NAMES[3]]))
            cValues.append(float(row[COL_NAMES[4]]))
    return(np.array(gamblers), np.array(games), np.array(pValues),\
           np.array(nValues), np.array(cValues))

def solveProblem(tries=5, debugMode=False, epsilon = 1e-7, epsFac=8, maxiter=10**4):
    # Try epsFac values of the finite-difference step size, each 10 times the previous one, starting from epsilon
    optimalValues = []     # candidate optima
    optimalProbs = []      # probabilities associated with those optima
    optimalSolutions = []  # detailed optimization output for candidate optima
    if debugMode:
        print("n: {} \np: {}".format(n,p))
    for meth in methods:   # try different optimization methods
        for epsIndex in range(epsFac):  # try different finite-difference step sizes
            x0 = np.array(that/divisor) # starting guess
            for i in range(tries):
                while (np.sum(np.log(binTail(p, n, x0))) - np.log(CUT)) < 0:  # ensure x0 is a feasible point
                    x0 = np.add(x0,np.ones_like(x0))  # increment every element of x
                if (debugMode):
                    print("method: {} try: {} \nx0: {} \nprobability {}:".format(meth,i,x0,\
                                np.prod(binTail(p, n, x0))))
                optimOutput = solve(x0, that, p, n, c, epsilon*10**epsIndex, debugMode, maxiter, method=meth)
                if optimOutput['success']:
                    attainedProb = np.prod(binTail(p, n, optimOutput['x']))
                    if attainedProb <= CUT:
                        optimalValues.append(optimOutput['fun'])
                        optimalProbs.append(attainedProb)
                        optimalSolutions.append(optimOutput)
                    if debugMode:
                        print(optimOutput)
                        print("attained probability: {}".format(attainedProb))
                x0 = [np.random.randint(low=n[i], high=that[i]) for i in range(len(n))] # update x0 randomly
    if len(optimalValues) == 0:
        raise Exception('No candidate optimal solution found.')
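    bestIndex = int(np.argmin(optimalValues))  # candidate with the smallest cost
    bestValue = optimalValues[bestIndex]
    bestProb = optimalProbs[bestIndex]         # probability attained at that candidate
    largestProb = np.max(tuple(optimalProbs))
    if debugMode:
        print("\nFound {} candidate minima: {}".format(len(optimalValues), optimalValues))
        print("Best value: {}".format(bestValue))
    # report the probability that belongs to the best candidate, not to whichever run finished last
    print("{} \t {} \t {} \t ${:,.0f} \t {}".format(g, int(np.sum(n)), len(n), int(bestValue), POPULATION*bestProb))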
    return optimalValues, optimalProbs
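
The expression inside `binTail` relies on a standard identity: for X ~ Binomial(t, p), P(X >= n) equals the regularized incomplete beta function I_p(n, t - n + 1). The sketch below checks that identity numerically against `scipy.stats.binom.sf`; the probabilities and counts are arbitrary illustrative values.

```python
from __future__ import print_function
import numpy as np
from scipy.special import betainc
from scipy.stats import binom

p = np.array([0.001, 0.02, 0.3])   # win probabilities (illustrative)
n = np.array([2, 5, 7])            # number of wins
t = np.array([500, 100, 12])       # number of plays

via_beta = betainc(n, t - n + 1, p)   # same expression as binTail(p, n, t)
via_binom = binom.sf(n - 1, t, p)     # P(X >= n) = P(X > n - 1)

print("incomplete beta:", via_beta)
print("binomial sf:    ", via_binom)
```

The incomplete beta form also accepts non-integer `t`, which lets the optimizer treat the number of plays as a continuous variable.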

In [7]:
# parameters common to the calculations for all players

np.random.seed(418023456) # seed NumPy's global generator, which np.random.randint uses, for reproducibility

debugMode = True  # verbose output if True; set to False for less output

CUT = THRESHOLD / POPULATION # Bonferroni cutoff probability

divisor = 5 # initial value for optimizer is expected number divided by divisor (modified to ensure feasibility)

tries = 7 # number of times to run the optimization code from different starting points

methods = ['SLSQP','COBYLA']  # COBYLA will ignore the individual bounds, but should honor the probability constraint
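
The random restarts can be made reproducible in more than one way. A `RandomState` object only affects draws made through that object, while module-level functions such as `np.random.randint` use NumPy's global generator. The sketch below shows the object-based alternative; the helper `random_start` and the arrays `n` and `upper` are hypothetical names for this example and are not part of the notebook.

```python
from __future__ import print_function
import numpy as np

n = np.array([2, 5, 7])             # wins per game (illustrative)
upper = np.array([4000, 600, 90])   # upper bounds on plays per game (illustrative)

def random_start(rng):
    """Draw one integer starting point with n[i] <= x0[i] < upper[i]."""
    return np.array([rng.randint(low=n[i], high=upper[i]) for i in range(len(n))])

# Two generators built from the same seed produce the same sequence of starts.
rng_a = np.random.RandomState(seed=418023456)
rng_b = np.random.RandomState(seed=418023456)
starts_a = [random_start(rng_a) for _ in range(3)]
starts_b = [random_start(rng_b) for _ in range(3)]

for start in starts_a:
    print(start)
```

Passing the generator explicitly keeps the restarts independent of any other code that draws from the global generator.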

In [8]:
(gg, mm, pp, nn, cc) = readCsv(CSV_FILENAME)  # read the data for all players

print("Found", len(np.unique(gg)), "gamblers:\n", np.unique(gg))
print("Assumptions:\n If {:,} people bet the amount in column 4, chance would be no larger than {} that any of them would win as much as this person".format(POPULATION, THRESHOLD))
print("Name\t wins \t games \t minimum spend \t attained probability")

for g in np.unique(gg):
    gambler = gg==g
    p = pp[gambler]
    n = nn[gambler]
    c = cc[gambler]
    that = n/p  # expected number of wagers on each bet required to win that bet n times
    if debugMode:
        print ("initial t_hat: {} \ninitialprobability: {}".format(that,np.prod(binTail(p, n, that))))

    # 'that' is used as an upper bound; ensure that it's compatible with the probability constraint
    while np.prod(binTail(p, n, that)) < CUT:
        that = 2*that
    
    if debugMode:
        print ("adjusted t_hat: {} \nadjusted probability: {}".format(that,np.prod(binTail(p, n, that))))

    optimalValues, optimalProbs = solveProblem(tries = tries, debugMode=debugMode, epsilon = 1e-7, epsFac=8, maxiter=10**4)

Found 1 gamblers:
 ['B_BAKER']
Assumptions:
 If 27,862,600 people bet the amount in column 4, chance would be no larger than 1e-07 that any of them would win as much as this person
Name	 wins 	 games 	 minimum spend 	 attained probability
initial t_hat: [  5999.97771437   4444.44115227   3750.00234375   1499.99925      2999.9985
   9574.50271629  11250.00703125  35769.21701184   9863.10016965
  12727.24958682  23999.98080002   6083.00088812   5454.54049587
   9230.76923077   8999.9955      17999.96400007  30000.03000003
   1497.57991086   1374.0598568    3025.19991529   3664.15961812
   5041.99985882] 
initialprobability: 5.68554129779e-06
adjusted t_hat: [  5999.97771437   4444.44115227   3750.00234375   1499.99925      2999.9985
   9574.50271629  11250.00703125  35769.21701184   9863.10016965
  12727.24958682  23999.98080002   6083.00088812   5454.54049587
   9230.76923077   8999.9955      17999.96400007  30000.03000003
   1497.57991086   1374.0598568    3025.19991529   3664.15961812

Optimization terminated successfully.    (Exit mode 0)
            Current function value: 3934256.18877
            Iterations: 191
            Function evaluations: 547
            Gradient evaluations: 191
     fun: 3934256.1887693424
     jac: array([ 50.,  50.,  50.,  20.,  20.,  50.,  50.,  50.,  50.,  50.,  20.,
        50.,  50.,  50.,  50.,  50.,  50.,  20.,  50.,  50.,  50.,  50.,
         0.])
 message: 'Optimization terminated successfully.'
    nfev: 547
     nit: 191
    njev: 191
  status: 0
 success: True
       x: array([ 4631.66700633,  3726.82538783,  2254.91295164,  1499.99925   ,
        2999.99849999,  7869.69833038,  5953.14525333,  1701.96738303,
        7799.77034383,  6609.87630658,  3779.04493341,  5459.4172228 ,
        2511.75478458,  3660.79287481,  5438.26672457,  4293.99437123,
        1675.95910979,  1497.57991077,  1374.0598568 ,  2457.9858313 ,
        3577.80866744,  3776.57233146])
attained probability: 3.58904033078e-15
method: SLSQP try: 0 
x0: [ 

Optimization terminated successfully.    (Exit mode 0)
            Current function value: 3930947.23569
            Iterations: 109
            Function evaluations: 312
            Gradient evaluations: 109
     fun: 3930947.2356857592
     jac: array([ 50.,  50.,  50.,  20.,  20.,  50.,  50.,  50.,  50.,  50.,  20.,
        50.,  50.,  50.,  50.,  50.,  50.,  20.,  50.,  50.,  50.,  50.,
         0.])
 message: 'Optimization terminated successfully.'
    nfev: 312
     nit: 109
    njev: 109
  status: 0
 success: True
       x: array([ 4669.32870855,  3759.1317086 ,  2184.41777741,  1499.99925   ,
        2999.9985    ,  7773.17723191,  5877.73809638,  1637.22699959,
        8305.47326689,  6702.0397468 ,  3860.32923871,  5356.78346586,
        2432.48314421,  3637.83251299,  5359.27292527,  4184.81899637,
        1629.9181493 ,  1497.57991086,  1374.05985678,  2431.44575653,
        3581.56747594,  3779.06613451])
attained probability: 3.58904019997e-15
method: SLSQP try: 1 
x0: [ 

Optimization terminated successfully.    (Exit mode 0)
            Current function value: 3930947.00697
            Iterations: 81
            Function evaluations: 122
            Gradient evaluations: 81
     fun: 3930947.0069697294
     jac: array([ 50.,  50.,  50.,  20.,  20.,  50.,  50.,  50.,  50.,  50.,  20.,
        50.,  50.,  50.,  50.,  50.,  50.,  20.,  50.,  50.,  50.,  50.,
         0.])
 message: 'Optimization terminated successfully.'
    nfev: 122
     nit: 81
    njev: 81
  status: 0
 success: True
       x: array([ 4669.66366459,  3758.01988647,  2184.72639441,  1499.99925   ,
        2999.9985    ,  7775.84531045,  5877.72551424,  1637.24156361,
        8305.87392688,  6698.7748386 ,  3860.85047262,  5355.25837499,
        2432.92705974,  3637.97689786,  5360.14566345,  4183.67883411,
        1630.22701904,  1497.57991086,  1374.05985677,  2432.04774755,
        3581.75020066,  3779.62613258])
attained probability: 3.58904039979e-15
method: SLSQP try: 1 
x0: [ 1896

Optimization terminated successfully.    (Exit mode 0)
            Current function value: 3930947.00586
            Iterations: 97
            Function evaluations: 113
            Gradient evaluations: 96
     fun: 3930947.0058607846
     jac: array([ 50.,  50.,  50.,  20.,  20.,  50.,  50.,  50.,  50.,  50.,  20.,
        50.,  50.,  50.,  50.,  50.,  50.,  20.,  50.,  50.,  50.,  50.,
         0.])
 message: 'Optimization terminated successfully.'
    nfev: 113
     nit: 97
    njev: 96
  status: 0
 success: True
       x: array([ 4669.5007675 ,  3758.02899591,  2184.71512253,  1499.99925   ,
        2999.9985    ,  7775.92579008,  5878.09110926,  1637.28865453,
        8305.90164737,  6698.74572974,  3860.5884164 ,  5355.13592973,
        2432.92658002,  3637.96762485,  5360.47755754,  4183.37598468,
        1630.23039334,  1497.57991086,  1374.0598568 ,  2431.99487169,
        3581.72755942,  3779.57951133])
attained probability: 3.58904050171e-15
method: SLSQP try: 2 
x0: [ 3889

Optimization terminated successfully.    (Exit mode 0)
            Current function value: 3930947.00593
            Iterations: 93
            Function evaluations: 95
            Gradient evaluations: 92
     fun: 3930947.0059267683
     jac: array([ 50.,  50.,  50.,  20.,  20.,  50.,  50.,  50.,  50.,  50.,  20.,
        50.,  50.,  50.,  50.,  50.,  50.,  20.,  50.,  50.,  50.,  50.,
         0.])
 message: 'Optimization terminated successfully.'
    nfev: 95
     nit: 93
    njev: 92
  status: 0
 success: True
       x: array([ 4669.49451394,  3758.01772956,  2184.71783881,  1499.99925   ,
        2999.9985    ,  7775.93385665,  5878.10387123,  1637.28382558,
        8305.91952939,  6698.7569678 ,  3860.51391663,  5355.15101581,
        2432.92746113,  3637.97504611,  5360.4689765 ,  4183.38217002,
        1630.22395208,  1497.57991086,  1374.0598568 ,  2431.98715608,
        3581.72530285,  3779.57441719])
attained probability: 3.58904050581e-15
method: SLSQP try: 3 
x0: [ 5416  

Optimization terminated successfully.    (Exit mode 0)
            Current function value: 3930947.00599
            Iterations: 87
            Function evaluations: 91
            Gradient evaluations: 86
     fun: 3930947.0059874626
     jac: array([ 50.,  50.,  50.,  20.,  20.,  50.,  50.,  50.,  50.,  50.,  20.,
        50.,  50.,  50.,  50.,  50.,  50.,  20.,  50.,  50.,  50.,  50.,
         0.])
 message: 'Optimization terminated successfully.'
    nfev: 91
     nit: 87
    njev: 86
  status: 0
 success: True
       x: array([ 4669.49242177,  3758.0026637 ,  2184.70162188,  1499.99925   ,
        2999.9985    ,  7775.94258763,  5878.11639688,  1637.27033804,
        8305.93060804,  6698.79759152,  3860.53166873,  5355.1410456 ,
        2432.91005875,  3637.9799146 ,  5360.48805256,  4183.40421271,
        1630.21305036,  1497.57991086,  1374.0598568 ,  2431.96917701,
        3581.70691991,  3779.56987014])
attained probability: 3.5890405059e-15
method: SLSQP try: 3 
x0: [ 6742  2

Optimization terminated successfully.    (Exit mode 0)
            Current function value: 3930947.0098
            Iterations: 77
            Function evaluations: 96
            Gradient evaluations: 77
     fun: 3930947.0098028481
     jac: array([ 50.,  50.,  50.,  20.,  20.,  50.,  50.,  50.,  50.,  50.,  20.,
        50.,  50.,  50.,  50.,  50.,  50.,  20.,  50.,  50.,  50.,  50.,
         0.])
 message: 'Optimization terminated successfully.'
    nfev: 96
     nit: 77
    njev: 77
  status: 0
 success: True
       x: array([ 4669.3820748 ,  3757.78298547,  2184.63610915,  1499.99924994,
        2999.9984999 ,  7776.00458625,  5878.17319267,  1637.17823754,
        8306.06233522,  6699.17009969,  3860.72750985,  5354.93189729,
        2432.83726974,  3638.08936844,  5360.70278741,  4183.63137485,
        1630.10428857,  1497.57991075,  1374.05985679,  2431.87683691,
        3581.50531947,  3779.48950762])
attained probability: 3.58904040808e-15
method: SLSQP try: 3 
x0: [ 6856  4



method: COBYLA try: 1 
x0: [ 4763  1470  3133  2149  2573  9950  9496 10543  5005 13886 22589  2785
  1548  5386  8785  6576  1490  1337  1801  2218  2598  5877] 
probability 3.61676159852e-15:
method: COBYLA try: 2 
x0: [ 5459  1643  3152  1911  2853  5567  2647 22713  9068 11177 24567  5573
  1840  2878  5040 18538  8995  1325  1435  3097  1833  3740] 
probability 3.61898825082e-15:
method: COBYLA try: 3 
x0: [ 1828  5252  3077  2042  2007  5179  9116 23046  4805 11680 14518  3891
  4756  4407  3487 18988  4452  2714  2660  2750  3101  5138] 
probability 3.6039433539e-15:
method: COBYLA try: 4 
x0: [ 3621  4027  5016  3424  4646  4929  6074 33967  6748  3752 13768  3007
  5836  3875  3056 11565  4829  3421  3230  5641  4915  5578] 
probability 3.62917914521e-15:
method: COBYLA try: 5 
x0: [ 8241  4587  4003  2735  5508  3011 10287 10256  4051  5434  8570  3367
  4695  4434  6440 19130 27019  2714  3503  3572  6101  6619] 
probability 3.60491005737e-15:
method: COBYLA try: 6 
x0: [ 56

method: COBYLA try: 3 
x0: [ 3173  5649  2351  2890  2293  4312  2166 19126  4723 12101  9802  4224
  2315  9557  9650 11363 20109  2377  2148  3255  4331  5711] 
probability 3.63140500947e-15:
method: COBYLA try: 4 
x0: [ 4093  4978  3493  1538  2827  4780  3339 33730 10095  3968 13717  5988
  1796  5397  9277 16274  3720  1644  1681  3446  2294  1923] 
probability 3.60935926411e-15:
method: COBYLA try: 5 
x0: [ 4895  3761  5877  3617  5706  3222  9642  5220  7639  4195 25863  2929
  7221  2878  4668 18309 32380  4153  3657  2900  3483  3621] 
probability 3.61662002946e-15:
method: COBYLA try: 6 
x0: [ 4629  2865  2604  2351  2974  5316  4865 32619  6322  7240 18444  2798
  3658  5897  3645  8878  2918  3583  3133  4179  2865  6742] 
probability 3.60122084574e-15:
method: COBYLA try: 0 
x0: [  4255.99554287   3944.88823045   3806.00046875   3355.99985      3655.9997
   4970.90054326   5306.00140625  10209.84340237   5028.62003393
   5601.44991736   7855.99616      4272.60017762   4146

In [5]:
# version information
%load_ext version_information
%version_information scipy, numpy, csv, pandas, matplotlib, notebook



| Software | Version |
|---|---|
| Python | 2.7.11 64bit [GCC 4.2.1 (Apple Inc. build 5577)] |
| IPython | 5.3.0 |
| OS | Darwin 16.7.0 x86_64 i386 64bit |
| scipy | 0.17.1 |
| numpy | 1.10.4 |
| csv | 1.0 |
| pandas | 0.18.1 |
| matplotlib | 1.5.0 |
| notebook | 5.0.0 |

Fri Jul 28 08:45:34 2017 PDT
