# How likely is it that you'll win the lottery?

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to use [historical data (1982-2018)](https://www.kaggle.com/datascienceai/lottery-dataset) coming from the national 6/49 lottery game in Canada to build functions that will enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

## Calculating probabilities

In [1]:
import pandas as pd
import numpy as np

### Writing functions that we'll use to calculate probabilities

In [32]:
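def factorial(n):
    # Multiply n * (n-1) * ... * 1; returns 1 when n is 0
    base = 1
    for i in range(n, 0, -1):
        base *= i
    return base

def combinations(n, k):
    # Number of ways to choose k items from n when order doesn't matter.
    # Integer division is exact here and keeps the result an int.
    return factorial(n) // (factorial(k) * factorial(n - k))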
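As a sanity check on the factorial formula, the sketch below compares it with `math.comb` from the standard library (available from Python 3.8). The name `combinations_from_factorials` is hypothetical and only used for this comparison; the pairs tested are the ones this notebook relies on.

```python
import math

def combinations_from_factorials(n, k):
    """Hypothetical stand-in for the notebook's formula: n! / (k! * (n-k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# (n, k) pairs used in this notebook: the full draw, plus the pieces of the
# "exactly k winning numbers" counts.
pairs = [(49, 6), (6, 2), (6, 3), (43, 4), (43, 3)]
for n, k in pairs:
    assert combinations_from_factorials(n, k) == math.comb(n, k)

total_outcomes = combinations_from_factorials(49, 6)
print(total_outcomes)  # 13983816
```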


## Probability of winning the big prize with one ticket

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn.

For the first version of the app, we want players to be able to calculate the probability of winning the big prize, so we'll start by building a function that calculates the probability of winning the big prize for any given ticket. While doing so, we need to be aware of the following details:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.
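The first bullet describes the input the function can expect. If the core logic should also guard against malformed input, a check along these lines could run before any probability is calculated. This is only a sketch: `is_valid_ticket` is a hypothetical helper, and whether validation belongs in the app or in our functions is an assumption the engineering team would need to confirm.

```python
def is_valid_ticket(numbers):
    """Hypothetical helper: True if `numbers` holds six distinct integers from 1 to 49."""
    return (
        len(numbers) == 6
        and len(set(numbers)) == 6  # no repeated numbers
        and all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)
    )

print(is_valid_ticket([2, 36, 25, 17, 9, 48]))  # True
print(is_valid_ticket([2, 2, 25, 17, 9, 48]))   # False: repeated number
print(is_valid_ticket([0, 36, 25, 17, 9, 50]))  # False: out of range
```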

In [49]:
def one_ticket_probability(numbers):
    all_outcomes = combinations(49,6)
    probability = 1/all_outcomes
    return "Your chances of winning the big prize with the numbers {0} are {1:.7f}%. This means you have a 1 in {2} chance of winning".format(numbers,probability*100,int(all_outcomes))


In [62]:
# Testing the function
entry_1 = [2,36,25,17,9,48]
one_ticket_probability(entry_1)

'Your chances of winning the big prize with the numbers [2, 36, 25, 17, 9, 48] are 0.0000072%. This means you have a 1 in 13983816 chance of winning'

In [63]:
entry_2 = [4,16,20,38,31,5]
one_ticket_probability(entry_2)

'Your chances of winning the big prize with the numbers [4, 16, 20, 38, 31, 5] are 0.0000072%. This means you have a 1 in 13983816 chance of winning'

To build this function, I calculated the total number of possible outcomes for the lottery, i.e. the total number of combinations for a six-number ticket. Then I calculated the probability that a user with one ticket would win. To make the result easy to understand, I expressed it both as a percentage and in plain language that sets the single winning combination against the total number of possible combinations (13,983,816).

### Has your combination of numbers ever won before?

We also want users to be able to compare their ticket against the historical lottery data from Canada to see whether it would ever have won, so we'll build a function for that.

In [55]:
# Reading the data into a dataframe
hist_data = pd.read_csv('649.csv')
hist_data.shape

(3665, 11)

In [56]:
hist_data.head(3)

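| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |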


In [57]:
hist_data.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


We can see that the draws became more frequent over time: the first draws in 1982 are 7 days apart, while the last draws in 2018 are 3 to 4 days apart. The product number is the same in all the rows shown, and the numbers we are interested in are in the 'NUMBER DRAWN' columns.

For this function, the engineering team told us that we need to be aware of the following details:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The engineering team wants us to write a function that prints:
    - the number of times the combination selected occurred in the Canada data set; and
    - the probability of winning the big prize in the next drawing with that combination.

In [58]:
def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6
    entry = set(row.iloc[4:10].values)
    return entry

hist_data['sets'] = hist_data.apply(extract_numbers,axis=1)

In [67]:
def check_historical_occurrence(user_num,series):
    user_set = set(user_num)
    same = 0
    for x in series:
        if user_set == x:
            same += 1
    return "Your numbers {0} have won the big prize {1} time(s) before".format(user_num,same)

    

In [66]:
# Testing the function
check_historical_occurrence(entry_1,hist_data['sets'])

'Your numbers [2, 36, 25, 17, 9, 48] have won the big prize 0 time(s) before'

In [65]:
check_historical_occurrence(entry_2,hist_data['sets'])

'Your numbers [4, 16, 20, 38, 31, 5] have won the big prize 0 time(s) before'

To build this function, I counted the number of times a given combination had won the big prize between 1982-2018.
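The requirements for this function also mention the probability of winning the big prize in the next drawing, which the function above does not report. Assuming each draw is fair and independent of earlier ones, that probability stays at 1 in 13,983,816 no matter how often a combination has won before. The sketch below shows one way to report both pieces of information; `check_with_probability` is a hypothetical name, and the sample sets are the first three draws from the data preview rather than the full data set.

```python
from math import comb

def check_with_probability(user_numbers, historical_sets):
    """Hypothetical variant: count past wins and state the odds for the next draw."""
    user_set = set(user_numbers)
    wins = sum(1 for drawn in historical_sets if drawn == user_set)
    total_outcomes = comb(49, 6)  # 13,983,816 possible tickets
    return (
        "Your numbers {0} have won the big prize {1} time(s) before. "
        "Your chance of winning the next drawing is still 1 in {2:,} ({3:.7f}%)."
    ).format(user_numbers, wins, total_outcomes, 100 / total_outcomes)

# The first three draws from the data preview, used here as sample input.
sample_sets = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]
print(check_with_probability([2, 36, 25, 17, 9, 48], sample_sets))
print(check_with_probability([43, 41, 14, 12, 11, 3], sample_sets))
```

Comparing sets rather than lists means the order in which the user enters the numbers does not matter.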

## Do your chances of winning increase with more tickets?

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning — on this screen, we're going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.

We've talked with the engineering team and they gave us the following information:

- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will see an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

In [80]:
def multi_ticket_probability(n):
    outcomes = combinations(49,6)
    tickets = n
    p_chances = n/outcomes
    print ("Your chances of winning the big prize with {0} tickets are {1:.7f}%. This means you have {0} in {2} chances of winning".format(tickets,p_chances*100,int(outcomes)))


In [81]:
tickets = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for t in tickets:
    multi_ticket_probability(t)
    print('\n')
    

Your chances of winning the big prize with 1 tickets are 0.0000072%. This means you have 1 in 13983816 chances of winning


Your chances of winning the big prize with 10 tickets are 0.0000715%. This means you have 10 in 13983816 chances of winning


Your chances of winning the big prize with 100 tickets are 0.0007151%. This means you have 100 in 13983816 chances of winning


Your chances of winning the big prize with 10000 tickets are 0.0715112%. This means you have 10000 in 13983816 chances of winning


Your chances of winning the big prize with 1000000 tickets are 7.1511238%. This means you have 1000000 in 13983816 chances of winning


Your chances of winning the big prize with 6991908 tickets are 50.0000000%. This means you have 6991908 in 13983816 chances of winning


Your chances of winning the big prize with 13983816 tickets are 100.0000000%. This means you have 13983816 in 13983816 chances of winning




For this step, I calculated how a person's chances of winning change with the number of tickets bought, so that users can see that they would need to buy a very large number of tickets for their chances of winning to improve significantly.
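The same relationship can be turned around to ask how many different tickets it takes to reach a given chance of winning. The sketch below is not part of the engineering brief: `tickets_needed` is a hypothetical helper, and it assumes every ticket carries a different combination.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def tickets_needed(target_probability):
    """Hypothetical helper: fewest different tickets whose chance of winning
    is at least `target_probability` (a fraction between 0 and 1)."""
    if not 0 < target_probability <= 1:
        raise ValueError("target_probability must be in (0, 1]")
    return math.ceil(target_probability * TOTAL_OUTCOMES)

for target in (0.01, 0.5, 1.0):
    print("{0:.0%} chance needs {1:,} tickets".format(target, tickets_needed(target)))
```

Even a 1% chance of winning requires well over a hundred thousand different tickets.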

## Chances of getting a winning number

Now we're going to write one more function to allow the users to calculate probabilities for two, three, four, or five winning numbers.

For extra context, in most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, the users might be interested in knowing the probability of having two, three, four, or five winning numbers.

These are the engineering details we'll need to be aware of:

- Inside the app, the user inputs:
    - six different numbers from 1 to 49; and
    - an integer between 2 and 5 that represents the number of winning numbers expected
- Our function prints information about the probability of having the inputted number of winning numbers.

In [105]:
def probability_less_6(i):
    num_combi = combinations(6,i)
    combi_outcomes = combinations(43,6-i)
    success_outcomes = num_combi*combi_outcomes
    all_possible = combinations(49,6)
    p = success_outcomes / all_possible
    print ("Your chances of having {0} winning numbers are {1:.5f}%. This means you have a 1 in {2:.0f} chance of winning".format(i,p*100,all_possible/success_outcomes))
    print('\n')
    

In [106]:
# Testing the function
for a in [2,3,4,5]:
    probability_less_6(a)

Your chances of having 2 winning numbers are 13.23780%. This means you have a 1 in 8 chance of winning


Your chances of having 3 winning numbers are 1.76504%. This means you have a 1 in 57 chance of winning


Your chances of having 4 winning numbers are 0.09686%. This means you have a 1 in 1032 chance of winning


Your chances of having 5 winning numbers are 0.00184%. This means you have a 1 in 54201 chance of winning
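The function counts the tickets with exactly `i` winning numbers as the number of ways to pick `i` of the 6 drawn numbers, multiplied by the number of ways to pick the remaining `6 - i` numbers from the 43 that were not drawn. One way to check this counting is to confirm that the probabilities for 0 through 6 matches add up to exactly 1. The sketch below does that with exact fractions; `exact_match_probability` is a hypothetical name, and `math.comb` needs Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def exact_match_probability(k):
    """Hypothetical helper: exact probability of matching k of the 6 drawn numbers."""
    # Choose k of the 6 winning numbers and the other 6 - k from the 43 losing ones.
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

distribution = {k: exact_match_probability(k) for k in range(7)}
for k, p in distribution.items():
    print("{0} matches: {1:.5f}%".format(k, float(p) * 100))

# The seven cases are mutually exclusive and cover every possible ticket.
assert sum(distribution.values()) == 1
```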




## Chances of getting at least a certain number of winning numbers

Finally, we'll write a function that lets users calculate the probability of having at least two, three, four, or five winning numbers.

We will base this on the logic that the number of successful outcomes for having at least four winning numbers is the sum of these three numbers:

- The number of successful outcomes for having four winning numbers exactly
- The number of successful outcomes for having five winning numbers exactly
- The number of successful outcomes for having six winning numbers exactly
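The sum described above can be sketched directly in terms of outcome counts. `at_least_outcomes` is a hypothetical helper, separate from the functions in this notebook, and it uses `math.comb` (Python 3.8+) for the counting.

```python
from math import comb

def at_least_outcomes(k):
    """Hypothetical helper: number of tickets matching at least k of the 6 drawn numbers."""
    # range(k, 7) includes 6, so the big-prize case is part of the sum.
    return sum(comb(6, j) * comb(43, 6 - j) for j in range(k, 7))

total_outcomes = comb(49, 6)
successes = at_least_outcomes(4)
print(successes)  # 13804 = 13545 (four) + 258 (five) + 1 (six)
print("{0:.5f}%".format(100 * successes / total_outcomes))
```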

In [107]:
def probability_winning_num(i):
    num_combi = combinations(6,i)
    combi_outcomes = combinations(43,6-i)
    success_outcomes = num_combi*combi_outcomes
    all_possible = combinations(49,6)
    p = success_outcomes / all_possible
    return p
    

def probability_at_least(i):
    p = 0
    # Sum the exact-match probabilities for i, i+1, ..., 6 winning numbers
    for x in range(i, 7):
        p += probability_winning_num(x)
    print("Your chances of having at least {0} winning numbers are {1:.5f}%.".format(i, p*100))
    print('\n')
    

In [108]:
# Testing the function
for a in [2,3,4,5]:
    probability_at_least(a)

Your chances of having at least 2 winning numbers are 15.10156%.


Your chances of having at least 3 winning numbers are 1.86375%.


Your chances of having at least 4 winning numbers are 0.09871%.


Your chances of having at least 5 winning numbers are 0.00185%.




## Conclusion

For the first version of the app, we coded 5 main functions:

- one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
- check_historical_occurrence(): checks whether a certain combination has occurred in the Canada lottery data set
- multi_ticket_probability(): calculates the probability of winning the big prize for any number of tickets between 1 and 13,983,816
- probability_less_6(): calculates the probability of having exactly two, three, four, or five winning numbers
- probability_at_least(): calculates the probability of having at least two, three, four, or five winning numbers
