# Estimate the chances of winning a lottery

In this project, we contribute to the development of a mobile app that is meant to help lottery addicts better estimate their chances of winning.

We create the logical core of the app and calculate the probabilities. For the first version of the app, we focus on Canada's 6/49 lottery game and build functions that enable users to answer probability questions like:
        
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having exactly five (or four, or three, or two) winning numbers on a single ticket?


## Data

For the purpose of this project, we consider historical data coming from the national **6/49 lottery game** in Canada. The data set has data for 3,665 drawings, dating from *1982 to 2018*.


## Factorial and combinations functions

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement. 

To find the number of combinations when we're sampling without replacement and taking only k objects from a group of n objects, we can use the combinations formula.
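
Written out, with n the size of the group and k the number of objects taken, the formula and its value for the 6/49 game are as follows (the notation C(n, k) is our own labeling, not something taken from the data set):

```latex
C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!},
\qquad
C(49, 6) = \frac{49!}{6!\,43!} = 13{,}983{,}816
```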

Let's start by writing two functions that we'll use often:

- A function that calculates factorials; and
- A function that calculates combinations.

In [1]:
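# Factorial function: returns n! for a non-negative integer n
def factorial(n):
    f = 1
    for i in range(n, 0, -1):
        f = f * i
    return f

# Combinations function: number of ways to choose k objects from n
def combinations(n, k):
    # Integer division keeps the result an exact int (the quotient is always whole)
    return factorial(n) // (factorial(n - k) * factorial(k))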
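
As a quick sanity check, hand-written helpers like these can be compared against Python's standard library. The sketch below re-declares both functions so that it runs on its own, and assumes Python 3.8 or later, where `math.comb` is available.

```python
import math

def factorial(n):
    """Iterative factorial, re-declared so the snippet is self-contained."""
    f = 1
    for i in range(n, 0, -1):
        f *= i
    return f

def combinations(n, k):
    """Number of ways to choose k objects from n without replacement."""
    return factorial(n) // (factorial(n - k) * factorial(k))

# Cross-check against the standard library (math.comb needs Python 3.8+)
assert factorial(6) == math.factorial(6) == 720
assert combinations(49, 6) == math.comb(49, 6)
print(combinations(49, 6))  # 13983816
```

If both assertions pass, the helpers agree with the library for the values that matter in the 6/49 game.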


## Function that calculates the probability of winning the big prize

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn.

Let's go ahead and build a function that calculates the probability of winning the big prize for any given ticket (for each ticket a player chooses six numbers out of 49).


We need to be aware of the following details when we write the function:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- We print the probability value in a friendly way (as a percentage), so that people without any probability training are able to understand it.


In [16]:
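# Calculate the probability of winning the big prize with a single ticket
def one_ticket_probability(list_6):
    # Every six-number ticket is equally likely, so list_6 does not affect the result
    n_c = combinations(49, 6)
    p = 1 / n_c
    p_per = round(p * 100, 8)  # probability expressed as a percentage
    return "The probability of winning the big prize is {0}".format(p_per)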

In [17]:
# Testing with a sample ticket
one_ticket_probability([1,2,3,4,5,6])

'The probability of winning the big prize is 7.15e-06'
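
A percentage this small is hard to read at a glance. One possible alternative, sketched here under our own assumptions, is to state the odds as "1 in N". The helper name `one_in_n_odds` and its parameters are hypothetical and not part of the app.

```python
import math

def one_in_n_odds(pool_size=49, picks=6):
    """Hypothetical helper: describe single-ticket odds as '1 in N'."""
    n_outcomes = math.comb(pool_size, picks)
    # The ',' format option inserts thousands separators
    return "1 in {:,}".format(n_outcomes)

print(one_in_n_odds())  # 1 in 13,983,816
```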

So far, we have written a function that tells users the probability of winning the big prize with a single ticket. However, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would ever have won by now.

In the following steps, we'll focus on exploring the historical data coming from the Canada 6/49 lottery. The data set can be downloaded from [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset).

## Read in and explore the data

In [18]:
import pandas as pd
lottery_data = pd.read_csv("649.csv")
print(lottery_data.shape)
lottery_data.head()

(3665, 11)


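| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |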


Here, we're going to write a function that enables users to compare their ticket against the historical lottery data in Canada and determine whether they would ever have won by now.

We need to be aware of the following:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- We write a function that prints:
  * the number of times the combination selected occurred in the Canada data set; and
  * the probability of winning the big prize in the next drawing with that combination.


In [19]:
def extract_numbers(row):
    return set(row)

In [21]:
lottery_data.columns

Index(['PRODUCT', 'DRAW NUMBER', 'SEQUENCE NUMBER', 'DRAW DATE',
       'NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 'NUMBER DRAWN 4',
       'NUMBER DRAWN 5', 'NUMBER DRAWN 6', 'BONUS NUMBER'],
      dtype='object')

In [36]:
new_data = lottery_data.loc[:, 'NUMBER DRAWN 1':'NUMBER DRAWN 6'].apply(extract_numbers, axis = 1)
new_data.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
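
Storing each draw as a set means the order in which the numbers were drawn or entered does not matter when comparing. The idea can be sketched without pandas on the first five draws from the table above. The helper name `count_matches` is hypothetical and used only for illustration.

```python
# First five draws, copied from the table above (bonus number excluded)
past_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]

def count_matches(ticket, draws):
    """Hypothetical helper: count draws made up of exactly the ticket's numbers."""
    ticket_set = set(ticket)
    # Two sets are equal when they hold the same elements, regardless of order
    return sum(1 for draw in draws if set(draw) == ticket_set)

print(count_matches([43, 20, 13, 10, 9, 3], past_draws))  # 1
print(count_matches([1, 2, 3, 4, 5, 6], past_draws))      # 0
```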

In [39]:
def check_historical_occurence(input_6, data):
    input_set = set(input_6)
    # Element-wise comparison: True where a past draw equals the user's combination
    match_boolean = (data == input_set)
    n_occurence = sum(match_boolean)
    print("The number of times the combination inputted by the user occurred in the past is %d" % n_occurence)
    # Share of past drawings that produced this combination, as a percentage
    p = n_occurence / len(data)
    p_per = p * 100
    print("The probability of winning the big prize in the next drawing with that combination is %f percentage" % p_per)

In [42]:
check_historical_occurence([3, 9, 10, 43, 13, 20], new_data)

The number of times the combination inputted by the user occurred in the past is 1
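The probability of winning the big prize in the next drawing with that combination is 0.027285 percentage

Note that this figure is the historical relative frequency of the combination (1 occurrence in 3,665 drawings), not a true probability. Each drawing is independent of the previous ones, so every combination has the same 1 in 13,983,816 chance of winning the next drawing, whether or not it has come up before.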


## Estimating the probability of winning by playing multiple tickets

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly.

Now, we're going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.

The following needs to be kept in mind:

- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- The function shall print information about the probability of winning the big prize depending on the number of different tickets played.

In [51]:
def multi_ticket_probability(n_tickets):
    n_outcomes = combinations(49, 6)
    # Each different ticket covers exactly one of the possible outcomes
    n_success = n_tickets
    p = n_success / n_outcomes
    p_per = p * 100
    print("The probability of winning the big prize in percentage terms is {0}".format(p_per))
    

In [52]:
multi_ticket_probability(13983816)

The probability of winning the big prize in percentage terms is 100.0
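
To see how slowly the chances grow with the number of tickets, the same calculation can be run for several ticket counts. This is a minimal sketch: the helper name `multi_ticket_percentage`, its parameters, and the range check are our own assumptions, and it relies on `math.comb` (Python 3.8+).

```python
import math

def multi_ticket_percentage(n_tickets, pool_size=49, picks=6):
    """Hypothetical helper: chance (in percent) that one of n different tickets wins."""
    n_outcomes = math.comb(pool_size, picks)
    # More different tickets than possible outcomes cannot exist
    if not 0 <= n_tickets <= n_outcomes:
        raise ValueError("n_tickets must be between 0 and {}".format(n_outcomes))
    return n_tickets / n_outcomes * 100

for n in [1, 10, 100, 10000, 1000000, 6991908, 13983816]:
    print("{:>10,} tickets: {:.6f}%".format(n, multi_ticket_percentage(n)))
```

Because every different ticket covers exactly one outcome, the percentage grows linearly: half of all possible tickets (6,991,908) gives a 50% chance, and all 13,983,816 give 100%.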


## Estimating probabilities of 2, 3, 4, or 5 winning numbers

In most 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

In the following steps, let's write one more function that allows users to calculate probabilities for two, three, four, or five winning numbers.


In [65]:
def probability_less_6(n):
    # Choose n of the 6 winning numbers and the other 6 - n from the 43 non-winning numbers
    n_success = combinations(6, n) * combinations(43, 6 - n)
    print("The number of successful outcomes is %d" % n_success)
    n_outcomes = combinations(49, 6)
    p = n_success / n_outcomes
    p_per = p * 100
    print("The probability of", n, "winning numbers is", p_per, "percentage")
    

In [70]:
probability_less_6(5)

The number of successful outcomes is 258
The probability of 5 winning numbers is 0.0018449899512407771 percentage
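
A ticket with exactly n winning numbers can be counted by choosing n of the 6 drawn numbers and the remaining 6 - n from the 43 numbers that were not drawn. The sketch below applies that reasoning for every n and checks that the counts for 0 through 6 matches add up to all possible tickets. The helper name `ways_to_match` is hypothetical, and `math.comb` requires Python 3.8+.

```python
import math

def ways_to_match(n, pool_size=49, picks=6):
    """Hypothetical helper: number of tickets matching exactly n drawn numbers."""
    return math.comb(picks, n) * math.comb(pool_size - picks, picks - n)

total = math.comb(49, 6)

# The counts for 0..6 matches must cover every possible ticket exactly once
assert sum(ways_to_match(n) for n in range(7)) == total

for n in range(2, 6):
    count = ways_to_match(n)
    print(n, "winning numbers:", count, "tickets,", count / total * 100, "percent")
```

The assertion is a useful guard: a counting formula that is only right for one value of n would fail it.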
