# Lucky Numbers: An Indirect Look at Lottery Players' Preferences

In a lottery game, the numbers that the lottery selects are random, but the numbers that players choose to play are not. To the best of my knowledge, data on player selections are not publicly available. However, lotteries do publish data on the numbers they draw and the amounts of the prizes they award. In games where prizes are parimutuel, that is, where a certain percentage of sales is divided equally among the winners, one can infer the popularity of the numbers drawn from the prize amounts: popular numbers result in smaller prizes because there are more winners splitting the prize money. The primary component of this project is scraping several lottery websites, using a different technique for each kind of site, to gather data for an analysis that relates prize amounts to the numbers drawn. Ultimately, I would like to build machine learning models that predict prize amounts as a function of the numbers drawn. Here, however, I simply run some hypothesis tests to investigate whether there is a relationship between prize amounts and the sum of the numbers drawn.

## Scraping Strategies

In this project a single observation is a lottery drawing, with the data comprising a date, the numbers drawn by the lottery, the number of winners at each prize level, and the prize amount at each level. In order to get all of these data components, one has to visit a separate page for each drawing. Beautiful Soup can easily scrape each of these pages, so the primary challenge was visiting each page within a site in an automated fashion.

Since I was accessing several different websites, I had to employ several different strategies. In increasing order of complexity they were: encoding dates into URLs, using Selenium to click a link, and using Selenium to fill in a form.

### Encoding Dates into a URL

Florida's Fantasy 5 game is a typical example of a website well suited to this strategy.

In [4]:
from IPython.display import HTML
HTML('<iframe src=http://www.flalottery.com/fantasy5 width=900 height=600></iframe>')

While it is possible to access individual pages using the menus on the right, visiting one of these pages reveals that the URLs have a particular format that encodes the game name and the date of the drawing. For example,
```
http://www.flalottery.com/site/winningNumberSearch?searchTypeIn=date&gameNameIn=FANTASY5&singleDateIn=10%2F13%2F2015&fromDateIn=&toDateIn=&n1In=&n2In=&n3In=&n4In=&n5In=&submitForm=Submit
```
is the URL for the page that displays the data for the Fantasy 5 drawing that occurred on October 13, 2015, the key portion of the address being the string 
```
10%2F13%2F2015
```
The following code uses the ```datetime``` library to create a date object that it uses to iterate through a specified range of dates, creating a URL string for each one that can be used to access a page which is then processed using Beautiful Soup.


In [None]:

from datetime import timedelta, date
import requests
from bs4 import BeautifulSoup
import re

def encodeDate(dateob):
    answer = dateob.strftime('%m') + '%2F'
    answer = answer + dateob.strftime('%d') + '%2F'
    answer = answer + dateob.strftime('%Y') + '&submitForm=Submit'
    return answer

fl5 = open('fl_fant_5.csv','w')
fl5.write(','.join(['drawdate','n1','n2','n3','n4','n5','winners5','winners4','winners3','prize5','prize4','prize3'])+'\n')
url_stem = 'http://www.flalottery.com/site/winningNumberSearch?searchTypeIn=date&gameNameIn=FANTASY5&singleDateIn='
start_date = date(2007,1,1)
end_date = date(2015,10,26)
current = start_date
while current < end_date:
    url = url_stem + encodeDate(current)
    page = requests.get(url).text
    bsPage = BeautifulSoup(page)
    numbers = bsPage.find_all("div",class_="winningNumbers")
    temp = numbers[0].get_text()
    draws = re.split('[-\n]',temp)
    draws = draws[1:6]
    winners = bsPage.find_all("td",class_="column2")
    winners = [tag.get_text().replace(',','') for tag in winners[:-1]]
    prizes = bsPage.find_all("td", class_="column3 columnLast")
    prizes = [tag.get_text().replace('$','').replace(',','') for tag in prizes[:-1]]
    fl5.write(','.join([current.strftime('%Y-%m-%d')] + draws + winners + prizes)+'\n')
    print current.strftime('%Y-%m-%d')
    current = current + timedelta(1)
    
fl5.close()
print 'done'
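
The `%2F` in these URLs is the percent-encoded form of `/`. As a minimal sketch (written for Python 3, unlike the Python 2 cells in this notebook), the standard library can produce the same encoding. The name `encode_date` is hypothetical and is not part of the scraper above, which also appends `&submitForm=Submit`.

```python
from datetime import date
from urllib.parse import quote

def encode_date(d):
    """Return d as MM/DD/YYYY with the slashes percent-encoded."""
    # safe='' forces quote() to encode '/', which it leaves alone by default
    return quote(d.strftime('%m/%d/%Y'), safe='')

print(encode_date(date(2015, 10, 13)))  # 10%2F13%2F2015
```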


The code for Florida's Lucky Money game is very similar. The only meaningful difference is that Lucky Money draws happen on Tuesdays and Fridays only, so the code checks the day of the week before building the URL in order to avoid getting an error caused by trying to access a non-existent page.

In [None]:
from datetime import timedelta, date
import requests
from bs4 import BeautifulSoup
import re

def encodeDate(dateob):
    answer = dateob.strftime('%m') + '%2F'
    answer = answer + dateob.strftime('%d') + '%2F'
    answer = answer + dateob.strftime('%Y') + '&submitForm=Submit'
    return answer

fllm = open('fl_lucky_money.csv','w')
fllm.write(','.join(['drawdate','n1','n2','n3','n4','luckyball','win41','win40','win31','win30','win21','win11','win20','prize41','prize40','prize31','prize30','prize21','prize11','prize20'])+'\n')
url_stem = 'http://www.flalottery.com/site/winningNumberSearch?searchTypeIn=date&gameNameIn=LUCKYMONEY&singleDateIn='
start_date = date(2014,7,4)
end_date = date(2015,10,24)
current = start_date
while current < end_date:
    while current.strftime('%w') not in ['2','5']:
        current = current + timedelta(1)
    url = url_stem + encodeDate(current)
    page = requests.get(url).text
    bsPage = BeautifulSoup(page)
    numbers = bsPage.find_all("div",class_="winningNumbers")
    temp = numbers[0].get_text()
    draws = re.split('[-\n]',temp)
    draws = draws[1:6]
    winners = bsPage.find_all("td",class_="column2")
    winners = [tag.get_text().replace(',','') for tag in winners[:-1]]
    prizes = bsPage.find_all("td", class_="column3 columnLast")
    prizes = [tag.get_text().replace('$','').replace(',','') for tag in prizes[:-1]]
    fllm.write(','.join([current.strftime('%Y-%m-%d')] + draws + winners + prizes)+'\n')
    print current.strftime('%Y-%m-%d')
    current = current + timedelta(1)

fllm.close()
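
The day-of-week check can also be factored into a generator. The following is a sketch in Python 3 under the assumption that only the date iteration is being isolated; `draw_dates` is a hypothetical helper, not something the cell above uses. Note that `weekday()` numbers Monday as 0, whereas `strftime('%w')` numbers Sunday as 0.

```python
from datetime import date, timedelta

def draw_dates(start, end, weekdays):
    """Yield each date in [start, end) whose weekday() is in weekdays."""
    current = start
    while current < end:
        if current.weekday() in weekdays:  # Monday is 0, Sunday is 6
            yield current
        current += timedelta(days=1)

# Tuesday = 1, Friday = 4
for d in draw_dates(date(2015, 10, 12), date(2015, 10, 24), {1, 4}):
    print(d.isoformat())
```

Because the end-date test and the weekday test happen in the same loop, this version cannot step past `end` while skipping non-draw days.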

North Carolina's Cash 5 game requires the same strategy. The structure of the code is the same as the Fantasy 5 code, with the differences coming from the differences in the page structures and tags. A sample data page can be found <a href = 'http://www.nc-educationlottery.org/cash5_payout.aspx?drawDate=10/21/2015'>here.</a>

In [None]:
from datetime import timedelta, date
import requests
from bs4 import BeautifulSoup
import re

ncc5 = open('nc_cash_5.csv','w')
ncc5.write(','.join(['drawdate','n1','n2','n3','n4','n5','winners5','winners4','winners3','prize5','prize4','prize3'])+'\n')
url_stem = 'http://www.nc-educationlottery.org/cash5_payout.aspx?drawDate='
start_date = date(2006,10,27)
end_date = date(2015,10,27)
current = start_date
p = re.compile('[,$]')
while current < end_date:
    print current.strftime('%Y-%m-%d')
    url = url_stem + current.strftime('%m/%d/%Y')
    page = requests.get(url).text
    bsPage = BeautifulSoup(page)
    
    draws = []
    draws.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Num1")[0].get_text()))
    draws.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Num2")[0].get_text()))
    draws.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Num3")[0].get_text()))
    draws.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Num4")[0].get_text()))
    draws.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Num5")[0].get_text()))
    
    winners = []
    winners.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Match5")[0].get_text()))
    winners.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Match4")[0].get_text()))
    winners.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Match3")[0].get_text())) 
    
    prizes = []
    prizes.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Match5Prize")[0].get_text()))
    prizes.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Match4Prize")[0].get_text()))
    prizes.append(p.sub('',bsPage.find_all("span",id="ctl00_MainContent_lblCash5Match3Prize")[0].get_text()))
    if prizes[0] == 'Rollover':
        prizes[0] = '0'
    ncc5.write(','.join([current.strftime('%Y-%m-%d')] + draws + winners + prizes)+'\n')
    current = current + timedelta(1)
    
ncc5.close()
print 'finished'
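
The prize clean-up in the cell above (strip `$` and `,`, and turn `Rollover` into `0`) can be isolated in one small function. This is a Python 3 sketch; `clean_prize` is a hypothetical name, and the whitespace stripping is an addition that the original cell does not perform.

```python
import re

_STRIP = re.compile(r'[,$]')

def clean_prize(text):
    """Normalize a scraped prize string to a plain number string."""
    text = text.strip()
    if text == 'Rollover':  # the page shows this word instead of an amount
        return '0'
    return _STRIP.sub('', text)

print(clean_prize('$39,221'))
print(clean_prize('Rollover'))
```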

### Using Selenium to Click a Link

Take a look at the Tennessee Cash website <a href = 'https://www.tnlottery.com/winningnumbers/TennesseeCashlist.aspx?TCShowall=y#TennesseeCashball'>here</a> or below.

In [3]:
HTML('<iframe src=https://www.tnlottery.com/winningnumbers/TennesseeCashlist.aspx?TCShowall=y#TennesseeCashball width=900 height=600></iframe>')

There are two types of links that are of interest here. First there are the "details" links to the right. I chose to deal with these by having Beautiful Soup read the URLs encoded in the tags and use them to access each page. A more challenging problem is to use the "Next Page" link at the bottom of the page to access the next set of 40 "details" links. For this I used the Selenium package. (Read the documentation <a href = 'http://selenium-python.readthedocs.org'>here.</a>) Fortunately, the link has an id that remains the same no matter how many times we click, so the code is straightforward.

In [None]:
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
from datetime import date
from time import sleep


def GetTnCashData(url):
    page = requests.get(url).text
    bsPage = BeautifulSoup(page)
    temp = bsPage.find_all("td",class_="SmallBlackText")
    winners = []
    prizes = []
    for i in range(1,8):
        winners.append(temp[3*i+1].get_text())
        prizes.append(temp[3*i+2].get_text().replace('$','').replace(',',''))
    return winners + prizes

def cleanDate(strdate):
    temp = strdate.split('/')
    return date(int(temp[2]),int(temp[0]),int(temp[1])).strftime('%Y-%m-%d')

tnc = open('tn_cash.csv','w')
tnc.write(','.join(['drawdate','n1','n2','n3','n4','n5','cashball','win51','win50','win41','win40','win31','win30','win21','prize51','prize50','prize41','prize40','prize31','prize30','prize21'])+'\n')
driver = webdriver.Firefox()
driver.get('https://www.tnlottery.com/winningnumbers/TennesseeCashlist.aspx?TCShowall=y#TennesseeCashball')
html = driver.page_source
nextLink = "navTennesseeCashNextPage"
soup = BeautifulSoup(html)
for pg in range(0,20):
    temp = soup.find_all("td",align="center")
    top = (len(temp)-4)/3 + 1
    print pg, len(temp)
    for i in range(1,top):
        drawDate = [cleanDate(temp[3*i].get_text())]
        NumsDrawn = temp[3*i+1].get_text().replace('-',' ').split(' ')
        drawID = temp[3*i+2].a.get('href')
        drawID = drawID[drawID.index('=')+1:]
        drawID = drawID[:drawID.index("'")]
        drawData = GetTnCashData('https://www.tnlottery.com/winningnumbers/TennesseeCashdetails_popup.aspx?id='+drawID)
        tnc.write(','.join(drawDate + NumsDrawn + drawData) + '\n')
    driver.find_element_by_id(nextLink).click()
    sleep(30)
    soup = BeautifulSoup(driver.page_source)

tnc.close()
print 'Done'

Note that this code builds each data point from two different sources: the date and numbers drawn are read from the main page while the winner counts and prize amounts are read from the pop-up window you see when you click a "details" link.
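
The two string-handling steps in that cell, pulling the draw id out of a link and normalizing the date, can be sketched on their own in Python 3. The `href` value below is invented for illustration, since the real markup is not shown here, and the regular expression is a swapped-in alternative to the `index`-based slicing used above.

```python
import re
from datetime import datetime

def extract_draw_id(href):
    """Return the text between 'id=' and the closing quote, or None."""
    match = re.search(r"id=([^'\"&]+)", href)
    return match.group(1) if match else None

def clean_date(text):
    """Convert M/D/YYYY to YYYY-MM-DD."""
    return datetime.strptime(text, '%m/%d/%Y').strftime('%Y-%m-%d')

# Hypothetical href, for illustration only
sample = "javascript:popUp('TennesseeCashdetails_popup.aspx?id=12345')"
print(extract_draw_id(sample))
print(clean_date('10/21/2015'))
```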

### Using Selenium to Fill in a Form

Past results from the <a href='http://www.oregonlottery.org/games/draw-games/megabucks/past-results'>Oregon Lottery</a> website can be accessed only by using a form on the results page. Once again, Selenium is up to the challenge. As in the Florida and North Carolina cases, the code iterates through a date object and checks for a valid day of the week (Monday, Wednesday, or Saturday). However, here Selenium enters the date into the form in two places, "Start Date" and "End Date." (Using the same date in both parts of the form simplifies both the iteration and the Beautiful Soup processing.) Then Selenium clicks the submit button.

While testing this I noticed that the code sometimes repeats results from a previous selection, most likely because the new page fails to load fast enough. The code deals with this issue in two ways. First, the ```sleep``` function from the ```time``` module pauses the code for 30 seconds, greatly reducing the likelihood of the problem occurring. As an extra safety measure, the code also checks that the date on the page matches the one entered into the form before writing the results to a file. If the dates don't match, the desired date, i.e. the one Selenium entered on the form, is written to an error log.

In [None]:
from selenium import webdriver
from datetime import timedelta, date
import requests
from bs4 import BeautifulSoup
import re
from time import sleep

ormb = open('or_megabucks.csv','a')
ormb_err = open('or_megabucks_errors.csv','w')
ormb.write(','.join(['drawdate','n1','n2','n3','n4','n5','n6','winners6','winners5','winners4','prize6','prize5','prize4'])+'\n')
start_date = date(2012,10,30)
end_date = date(2015,10,29)
current = start_date

driver = webdriver.Firefox()
driver.get('http://www.oregonlottery.org/games/draw-games/megabucks/past-results')

while current < end_date:
    while current.strftime('%w') not in ['1','3','6']:
        current = current + timedelta(1)    
    driver.find_element_by_id("FromDate").clear()
    driver.find_element_by_id("ToDate").clear()
    driver.find_element_by_id("FromDate").send_keys(current.strftime('%m/%d/%Y'))
    driver.find_element_by_id("ToDate").send_keys(current.strftime('%m/%d/%Y'))
    driver.find_element_by_css_selector(".viewResultsButton").click()
    sleep(30)
    soup = BeautifulSoup(driver.page_source)
    test1 = soup.find_all("td")
    numbers = [test1[i].get_text() for i in range(2,8)] 
    test2 = soup.find_all("strong")
    winners = [test2[1].get_text().replace(',','')]
    prizes = [test2[0].get_text().replace('$','').replace(',','')]
    for i in range(0,2):
        winners.append(test2[4*i+3].get_text().replace(',',''))
        prizes.append(test2[4*i+2].get_text().replace('$','').replace(',','')) 
    testdate = test1[0].get_text().split('/')
    testdate = date(int(testdate[2]),int(testdate[0]),int(testdate[1]))
    if current.strftime('%Y-%m-%d') == testdate.strftime('%Y-%m-%d'):
        ormb.write(','.join([testdate.strftime('%Y-%m-%d')] + numbers + winners + prizes)+'\n')
    else:
        ormb_err.write(current.strftime('%Y-%m-%d') + '\n')
    
    current = current + timedelta(1)

ormb.close()
ormb_err.close()
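
A fixed 30-second pause is simple but slow. One alternative, sketched here in Python 3 and not what the cell above does, is to poll until a condition holds or a timeout expires. The helper `wait_until` is hypothetical; in the scraper, the predicate would parse `driver.page_source` and compare the date on the page with the one entered into the form. Selenium also ships its own explicit-wait helper, `WebDriverWait`, which serves the same purpose.

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or timeout seconds pass.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Stand-in predicate that succeeds on the third check
attempts = []
def page_ready():
    attempts.append(1)
    return len(attempts) >= 3

print(wait_until(page_ready, timeout=5.0, interval=0.01))
```

A `False` return plays the same role as the error log above: the requested date is recorded and the loop moves on.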

## Statistical Testing

Building models that predict prize amounts from the numbers drawn is beyond the scope of this project, but there are some statistical tests we can do that will provide evidence that such models are possible. The contention is that large prize amounts are associated with large drawn numbers and small prize amounts are associated with small drawn numbers. 

The testing framework is as follows. For each game we will select a prize level and examine the draws with prize amounts below the 25th percentile and the draws with prize amounts above the 75th percentile. For each of the draws with small prize amounts, we will calculate the sum of the numbers drawn and find the mean of these sums. Then we will calculate the probability that the same number of random draws would result in a mean as low or lower than what we observed. We will do the same for draws with large prizes, except that the probability calculation will be the probability that the same number of random draws would result in a mean as high or higher than what we observed.

We will need some functions to carry out these tests, starting with the mean and variance of the sum of $k$ integers drawn at random, without replacement, from the set $\{1,...,n\}$. The formula for the mean follows from linearity of expectation and the fact that each draw has an expected value of $\frac{n+1}{2}$. The formula for the variance is a slight modification of the formula for the variance of the test statistic in the Wilcoxon Rank Sum test. One source for more details is the <a href = 'https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Wilcoxon.html'>R documentation</a>.
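
For reference, the two formulas can be written out as follows. The symbols $S$ (the sum of the $k$ numbers drawn) and $X_i$ (a single drawn number) are introduced here only for illustration. The variance is $k$ times the single-draw variance, reduced by the finite-population correction for sampling without replacement:

$$E[S] = k\,\frac{n+1}{2}, \qquad \mathrm{Var}(X_i) = \frac{n^2-1}{12}$$

$$\mathrm{Var}(S) = k\,\frac{n^2-1}{12}\cdot\frac{n-k}{n-1} = \frac{k\,(n-k)\,(n+1)}{12}$$

For a game that draws 5 numbers from 1 to 39, this gives a mean of $100$ and a variance of $5 \cdot 34 \cdot 40 / 12 \approx 566.67$.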

In [2]:
def DrawMean(n,k):
    return (n+1)*k/2.0

def DrawVar(n,k):
    return (k*(n-k)*(n+1))/12.0
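
As a sanity check on these formulas, one can enumerate every possible draw for small $n$ and $k$ and compare. This is a Python 3 sketch; `draw_mean` and `draw_var` restate the two functions above so that the block is self-contained, and `brute_force_moments` is a hypothetical helper.

```python
from itertools import combinations

def draw_mean(n, k):
    return (n + 1) * k / 2.0

def draw_var(n, k):
    return k * (n - k) * (n + 1) / 12.0

def brute_force_moments(n, k):
    """Mean and variance of the sum over every k-subset of 1..n."""
    sums = [sum(c) for c in combinations(range(1, n + 1), k)]
    mean = sum(sums) / float(len(sums))
    var = sum((s - mean) ** 2 for s in sums) / len(sums)
    return mean, var

for n, k in [(6, 2), (10, 3), (12, 5)]:
    mean, var = brute_force_moments(n, k)
    assert abs(mean - draw_mean(n, k)) < 1e-9
    assert abs(var - draw_var(n, k)) < 1e-9
print('formulas match enumeration')
```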

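The key function is the following, which calculates the probabilities described above. Note that we are assuming that the mean of the sums from many draws is normally distributed, which the central limit theorem makes a reasonable approximation when the number of draws is large.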

In [3]:
import scipy
from scipy import stats

def FormulaTest(n, k, draws, muObserved, tail = 'upper'):
    if tail == 'upper':
        return 1 - stats.norm.cdf(muObserved, loc = DrawMean(n,k), scale = (DrawVar(n,k)/draws)**0.5)
    else:
        return stats.norm.cdf(muObserved, loc = DrawMean(n,k), scale = (DrawVar(n,k)/draws)**0.5)
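
The normality assumption can be checked by simulation. The sketch below (Python 3, standard library only) compares the normal approximation with the fraction of simulated experiments whose mean sum falls at or below a threshold. The inputs (30 draws of a 5-of-39 game with an observed mean of 96) are hypothetical and chosen so that the probability is large enough to estimate with a few thousand trials.

```python
import math
import random

def normal_lower_tail(n, k, draws, mu_observed):
    """Normal approximation to P(mean of sums <= mu_observed)."""
    mean = (n + 1) * k / 2.0
    sd = math.sqrt(k * (n - k) * (n + 1) / 12.0 / draws)
    z = (mu_observed - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2))

def simulated_lower_tail(n, k, draws, mu_observed, trials=2000, seed=0):
    """Fraction of simulated experiments whose mean sum is <= mu_observed."""
    rng = random.Random(seed)
    numbers = range(1, n + 1)
    hits = 0
    for _ in range(trials):
        total = sum(sum(rng.sample(numbers, k)) for _ in range(draws))
        if total / float(draws) <= mu_observed:
            hits += 1
    return hits / float(trials)

# Hypothetical inputs: 30 draws of a 5-of-39 game, observed mean 96
print(normal_lower_tail(39, 5, 30, 96.0))
print(simulated_lower_tail(39, 5, 30, 96.0))
```

The two printed values should be close. The tiny probabilities reported later in this section are far too small to estimate this way, so the formula remains the practical choice.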

### North Carolina Cash 5

Of the games I scraped for this project, North Carolina Cash 5 is the most straightforward. In it the lottery chooses 5 numbers from 1 to 39. I'll explain the analysis step by step first; later I'll define a function that streamlines the process.

First we need to read the data from a file to a data frame. 

In [4]:
import pandas as pd
ncCash5 = pd.read_csv("nc_cash_5.csv")

In order to easily pick out the 25th and 75th percentiles, let's use the ```describe``` method.

In [5]:
ncCash5Desc = ncCash5.describe()
print(ncCash5Desc)

                n1           n2           n3           n4           n5  \
count  3287.000000  3287.000000  3287.000000  3287.000000  3287.000000   
mean      6.672346    13.298144    19.967752    26.620018    33.360511   
std       5.223668     6.583094     6.907540     6.507134     5.133097   
min       1.000000     2.000000     3.000000     6.000000    10.000000   
25%       3.000000     8.000000    15.000000    22.000000    31.000000   
50%       5.000000    13.000000    20.000000    27.000000    35.000000   
75%       9.000000    18.000000    25.000000    32.000000    37.000000   
max      28.000000    35.000000    37.000000    38.000000    39.000000   

          winners5     winners4      winners3          prize5       prize4  \
count  3287.000000  3287.000000   3287.000000     3287.000000  3287.000000   
mean      0.280803    47.976270   1583.690295    39221.962580   270.958016   
std       0.593077    27.698062    809.249324   100219.929977    69.884372   
min       0.000000   

In this case, we'll apply the percentiles to the  ```prize4``` field and select draws that are below the 25th percentile or above the 75th percentile.

In [6]:
small = ncCash5[ncCash5['prize4'] < ncCash5Desc.loc['25%','prize4']].loc[:,'n1':'n5']
print small.head()
large = ncCash5[ncCash5['prize4'] > ncCash5Desc.loc['75%','prize4']].loc[:,'n1':'n5']
print large.head()

    n1  n2  n3  n4  n5
2   11  13  21  23  32
11   4   6   9  14  25
12   4  11  15  23  34
13   5   7   9  30  36
20   2   4   7  23  36
    n1  n2  n3  n4  n5
3    2  14  18  25  32
5    6   7  30  32  33
10   4  19  20  36  37
15   2   8  31  34  35
18   1  13  22  29  33


We can sum the numbers for each draw and take the average in one line.

In [7]:
print small.apply(sum,1).mean()
print large.apply(sum,1).mean()

82.9532019704
113.423357664


We now have all the inputs we need for the tests.

In [8]:
print FormulaTest(39,5,len(small),small.apply(sum,1).mean(),tail='lower')

7.40156748152e-93


In [9]:
print FormulaTest(39,5,len(large),large.apply(sum,1).mean(),tail='upper')

0.0


These tests show that it is extremely unlikely that the means calculated above would be generated by random draws.
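
The upper-tail result of exactly `0.0` is a floating-point artifact: the cumulative probability rounds to 1, so subtracting it from 1 leaves nothing. The sketch below (Python 3, standard library only) recomputes the z-score from the Cash 5 figures above and evaluates the tail directly with the complementary error function. SciPy exposes the same idea as `stats.norm.sf`; the helper names here are hypothetical.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal, computed without subtraction."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def cdf(z):
    """P(Z <= z) for a standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Inputs taken from the North Carolina Cash 5 cells above
n, k, draws, mu_observed = 39, 5, 822, 113.423357664
mean = (n + 1) * k / 2.0
sd = math.sqrt(k * (n - k) * (n + 1) / 12.0 / draws)
z = (mu_observed - mean) / sd

print(z)
print(1 - cdf(z))     # the subtraction loses the tail entirely
print(upper_tail(z))  # tiny but nonzero
```

The conclusion does not change, but the direct calculation shows how far into the tail the observed mean lies.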

The following function summarizes and generalizes the foregoing analysis.

In [23]:
def GameTest(df, n, k, PrizeField, LowField, HighField):
    desc = df.describe()
    small = df[df[PrizeField] < desc.loc['25%',PrizeField]].loc[:,LowField:HighField]
    large = df[df[PrizeField] > desc.loc['75%',PrizeField]].loc[:,LowField:HighField]
    LowTest = FormulaTest(n,k,len(small),small.apply(sum,1).mean(),tail='lower')
    HighTest = FormulaTest(n,k,len(large),large.apply(sum,1).mean(),tail='upper')
    return (len(small),small.apply(sum,1).mean(),LowTest,len(large),large.apply(sum,1).mean(), HighTest)

To illustrate this function using the detailed example:

In [24]:
NC5_results = GameTest(ncCash5, 39, 5, 'prize4', 'n1', 'n5')
print 'Small number of draws: %d' % NC5_results[0]
print 'Small mean: %f' % NC5_results[1]
print 'Small p-value: %f' % NC5_results[2]
print 'Large number of draws: %d' % NC5_results[3]
print 'Large mean: %f' % NC5_results[4]
print 'Large p-value: %f' % NC5_results[5]

Small number of draws: 812
Small mean: 82.953202
Small p-value: 0.000000
Large number of draws: 822
Large mean: 113.423358
Large p-value: 0.000000
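
Printing the p-values with `%f` hides their magnitude, since anything below about 0.0000005 shows as `0.000000`. A small formatter using `%g` keeps the scale visible. This is a Python 3 sketch with a hypothetical function name; the input tuple is copied from the outputs above.

```python
def format_results(results):
    """Render the six-element tuple returned by GameTest as text lines."""
    labels = ['Small', 'Large']
    lines = []
    for label, (count, mean, p) in zip(labels, (results[:3], results[3:])):
        lines.append('%s number of draws: %d' % (label, count))
        lines.append('%s mean: %f' % (label, mean))
        lines.append('%s p-value: %.3g' % (label, p))
    return '\n'.join(lines)

# Values copied from the North Carolina Cash 5 output above
print(format_results((812, 82.953202, 7.40156748152e-93, 822, 113.423358, 0.0)))
```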


### Oregon Megabucks

This game also has a simple structure, drawing 6 numbers from 1 to 48. Again we'll do the analysis using the prize for 4 matches.

In [25]:
ormb = pd.read_csv("or_megabucks.csv")
ormb_results = GameTest(ormb, 48, 6,'prize4', 'n1', 'n6')
print 'Small number of draws: %d' % ormb_results[0]
print 'Small mean: %f' % ormb_results[1]
print 'Small p-value: %f' % ormb_results[2]
print 'Large number of draws: %d' % ormb_results[3]
print 'Large mean: %f' % ormb_results[4]
print 'Large p-value: %f' % ormb_results[5]

Small number of draws: 93
Small mean: 113.043011
Small p-value: 0.000000
Large number of draws: 115
Large mean: 174.704348
Large p-value: 0.000000


In this case, too, it is highly unlikely that the means we observed would be generated at random.

### Tennessee Cash

Tennessee Cash is an example of a "double matrix" game. In this case the lottery draws 5 numbers from 1 to 35 and 1 number (the "Cash Ball") from 1 to 5. While the models I'd ultimately like to develop would include the Cash Ball data, for this analysis it's enough to consider the first 5 numbers, which fit the testing framework. The test here will define "large" and "small" in terms of the prize for matching 4 of the first 5 numbers but not the Cash Ball.

In [27]:
tc = pd.read_csv("tn_cash.csv")
tc_results = GameTest(tc, 35, 5, 'prize40', 'n1', 'n5')
print 'Small number of draws: %d' % tc_results[0]
print 'Small mean: %f' % tc_results[1]
print 'Small p-value: %f' % tc_results[2]
print 'Large number of draws: %d' % tc_results[3]
print 'Large mean: %f' % tc_results[4]
print 'Large p-value: %f' % tc_results[5]

Small number of draws: 145
Small mean: 80.579310
Small p-value: 0.000000
Large number of draws: 174
Large mean: 98.977011
Large p-value: 0.000000


While the means in this case are not as extreme as in the previous examples, the probability that they would be produced by chance is still far less than 1%: both p-values round to zero at six decimal places.

### Florida Lucky Money

Lucky Money is also a double matrix game, where the lottery selects 4 numbers from 1 to 47 and 1 number from 1 to 17 (the "Lucky Ball"). Again our analysis will define "large" and "small" in terms of matching 3 without the Lucky Ball, but we need to be careful. The rules of the game state that whenever the top prize reaches \$2,000,000 it is not funded further, and money that would have gone to the top prize gets distributed among the lower prize levels. Whenever this happens the lower prizes, including the one used here, will be unusually large. We will deal with that by filtering out any draws where the top prize is \$2,000,000.

In [28]:
lm = pd.read_csv("fl_lucky_money.csv")
lm = lm[lm['prize41'] != '2 Million']
lm_results = GameTest(lm, 47,4, 'prize30', 'n1', 'n4')
print 'Small number of draws: %d' % lm_results[0]
print 'Small mean: %f' % lm_results[1]
print 'Small p-value: %f' % lm_results[2]
print 'Large number of draws: %d' % lm_results[3]
print 'Large mean: %f' % lm_results[4]
print 'Large p-value: %f' % lm_results[5]

Small number of draws: 29
Small mean: 74.206897
Small p-value: 0.000004
Large number of draws: 29
Large mean: 121.379310
Large p-value: 0.000000


Once again, the means are not likely under the assumption of random selection.

### Florida Fantasy 5

Fantasy 5 is a single matrix game where the lottery selects 5 numbers from 1 to 36. The complication arises from the fact that all the prize money is awarded in every draw. If there is no top-prize winner, the money that would have been awarded is added to the second-prize pool and divided among the players who matched 4 numbers. Moreover, since mid-September of 2008 the second prize has been capped at \$555, and any money left over is added to the third-prize pool. So we will restrict the dataset to draws where the current rules apply and there was at least one top-prize winner. This is the vast majority of cases.

In [29]:
ff = pd.read_csv("fl_fant_5.csv")
ff = ff[ff['drawdate'] > '2008-09-15']
ff_top = ff[ff['winners5'] > 0]
print len(ff_top)
print len(ff)
ff_top_results = GameTest(ff_top, 36, 5, 'prize3', 'n1', 'n5')
print 'Small number of draws: %d' % ff_top_results[0]
print 'Small mean: %f' % ff_top_results[1]
print 'Small p-value: %f' % ff_top_results[2]
print 'Large number of draws: %d' % ff_top_results[3]
print 'Large mean: %f' % ff_top_results[4]
print 'Large p-value: %f' % ff_top_results[5]

2195
2596
Small number of draws: 399
Small mean: 64.258145
Small p-value: 0.000000
Large number of draws: 288
Large mean: 116.704861
Large p-value: 0.000000


Once again, the sums of the drawn numbers when the prizes are unusually large or small are themselves unusually large or small.
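
The date filter in the Fantasy 5 cell compares `drawdate` strings directly rather than parsing them. That works because zero-padded `YYYY-MM-DD` strings sort in the same order as the dates they represent, as this Python 3 sketch with made-up sample dates illustrates.

```python
from datetime import date

cutoff = '2008-09-15'
samples = ['2007-12-31', '2008-09-15', '2008-09-16', '2015-10-25']

# Zero-padded YYYY-MM-DD strings sort the same way as the dates they represent
kept_by_string = [s for s in samples if s > cutoff]
kept_by_date = [s for s in samples
                if date(*map(int, s.split('-'))) > date(2008, 9, 15)]

print(kept_by_string)
assert kept_by_string == kept_by_date
```

The same comparison would not be safe with the `MM/DD/YYYY` format that the lottery sites display, which is one reason the scrapers write dates in ISO form.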

## Conclusion

The tests presented here provide multiple examples of parimutuel lotteries where there is a relationship between the numbers drawn and the prize amounts. Therefore the project of predicting prize amounts from the drawn numbers is likely to produce some results, and using the sum of the drawn numbers appears to be a great starting point.