## Analyzing the Probability of Winning the Lottery

In an effort to fight lottery addiction, this analysis provides a quick way to answer common questions about the probability of winning the lottery.

The intent is to build an app for players of the Canadian lottery that lets them easily calculate the probability of winning with the numbers they choose.

In [1]:
# Import libraries that we will use
import pandas as pd
# from math import factorial

# Helper functions to calculate factorials and combinations

# Iterative factorial; math.factorial (see the commented-out import above) would give the same result
def factorial(n):
    product = 1
    for x in range(n):
        product *= (x + 1)
    return product

def combinations(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))
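
As a quick sanity check on the helpers above, the sketch below compares them against `math.factorial` and `math.comb` from the standard library. It assumes Python 3.8 or later (where `math.comb` is available) and redefines both helpers so that it runs on its own.

```python
import math

# Redefined here so the check is self-contained; same logic as the helpers above.
def factorial(n):
    product = 1
    for x in range(n):
        product *= (x + 1)
    return product

def combinations(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare against the standard library for the values used in this analysis.
for n, k in [(6, 4), (43, 2), (49, 6)]:
    assert factorial(n) == math.factorial(n)
    assert combinations(n, k) == math.comb(n, k)

total_outcomes = combinations(49, 6)
print(total_outcomes)  # 13983816
```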

### Probability of winning on one ticket

The first step is to create a function that the app developers can call. The function reports the probability that the user wins the lottery with any specific 6 numbers out of 49.

The function takes in a list of 6 numbers and prints a message for the user indicating their chances of winning the lottery with that specific combination of 6 numbers.

In [2]:
def one_ticket_probability(numbers):
    total_outcomes = combinations(49, 6)
    the_probability_of_one = 1 / total_outcomes
    print('The chances of you winning the lottery with the numbers {0!s} is {1:.7f}%'.format(numbers, the_probability_of_one *100))

In [3]:
one_ticket_probability([1,2,3,4,5,6])

The chances of you winning the lottery with the numbers [1, 2, 3, 4, 5, 6] is 0.0000072%
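
A percentage this small is hard to read, so the app might also express the same result as "1 in N" odds. The sketch below is one possible way to do that; the helper name `one_in_n_odds` is hypothetical, and it assumes Python 3.8+ for `math.comb`.

```python
from math import comb  # assumes Python 3.8+

def one_in_n_odds(pool_size=49, picks=6):
    """Return N such that a single ticket wins with probability 1/N."""
    return comb(pool_size, picks)

print('1 in {0:,}'.format(one_in_n_odds()))  # 1 in 13,983,816
```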


### Studying Historical Winnings

We will also allow users to find out whether the numbers they have chosen have ever been on a winning ticket in the past.

For this we will use the historical data from the Canadian Lottery.

In [4]:
lot_wins = pd.read_csv('649.csv')

NUMBERS = ['NUMBER DRAWN {0}'.format(x) for x in range(1, 7)]

print(lot_wins.shape)

lot_wins.head(3)


(3665, 11)


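| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |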


In [5]:
lot_wins.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


In [6]:
def extract_numbers(row):
    return set(row[NUMBERS])


def check_historical_occurence(numbers, history):
    set_numbers = set(numbers)
    return sum(history == set_numbers)

def print_advice(numbers, history):
    # Count how many past draws consisted of exactly these numbers
    past_wins = check_historical_occurence(numbers, history)
    if past_wins > 3:
        print('The numbers {0!s} have won the lottery {1} times in the past.'.format(numbers, past_wins))
    elif past_wins == 3:
        print('The numbers {0!s} have won the lottery thrice in the past.'.format(numbers))
    elif past_wins == 2:
        print('The numbers {0!s} have won the lottery twice in the past.'.format(numbers))
    elif past_wins == 1:
        print('The numbers {0!s} have won the lottery once in the past.'.format(numbers))
    else:
        print('The numbers {0!s} have never won the lottery in the past.'.format(numbers))

    # Past results do not change the odds, so always report the single-ticket probability
    one_ticket_probability(numbers)

win_nums = lot_wins.apply(extract_numbers, axis=1)

print_advice([14,24,31,35,48,37], win_nums)

The numbers [14, 24, 31, 35, 48, 37] have won the lottery once in the past.
The chances of you winning the lottery with the numbers [14, 24, 31, 35, 48, 37] is 0.0000072%
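
The historical check boils down to comparing sets, so that the order in which the user enters the numbers does not matter. The plain-Python sketch below illustrates the idea without pandas. The names `past_draws` and `count_past_wins` are hypothetical, and the three sample draws are copied from the `head` and `tail` output above rather than read from `649.csv`.

```python
# Sample of past draws, copied from the head/tail output shown above.
past_draws = [
    (3, 11, 12, 14, 41, 43),
    (8, 33, 36, 37, 39, 41),
    (14, 24, 31, 35, 37, 48),
]

# One frozenset per draw, so the order of the numbers does not matter.
draw_sets = [frozenset(draw) for draw in past_draws]

def count_past_wins(numbers, draws):
    """Count how many past draws consist of exactly the given numbers."""
    target = frozenset(numbers)
    return sum(1 for draw in draws if draw == target)

print(count_past_wins([14, 24, 31, 35, 48, 37], draw_sets))  # 1
print(count_past_wins([1, 2, 3, 4, 5, 6], draw_sets))        # 0
```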


We study the historical data from the Canadian lottery to provide some advice to the users on the numbers they have selected.

We provide data on how many times these numbers have actually won the lottery in the past, and also data regarding the chances they have of winning the lottery in the next draw.

The key consideration here is that the next draw is in no way influenced by previous draws. Hence, the probability of winning the next draw stays the same regardless of whether the numbers have won in the past.

### Probability of winning with multiple tickets

In [7]:
def multi_ticket_probability(number_of_tickets):
    print('The chances of you winning the lottery with {0} tickets is {1:.7f}%'.format(number_of_tickets, number_of_tickets * 100 / combinations(49, 6)))
    
for x in [1, 10, 100, 10000, 1000000, 6991908, 13983816]:
    multi_ticket_probability(x) 

The chances of you winning the lottery with 1 tickets is 0.0000072%
The chances of you winning the lottery with 10 tickets is 0.0000715%
The chances of you winning the lottery with 100 tickets is 0.0007151%
The chances of you winning the lottery with 10000 tickets is 0.0715112%
The chances of you winning the lottery with 1000000 tickets is 7.1511238%
The chances of you winning the lottery with 6991908 tickets is 50.0000000%
The chances of you winning the lottery with 13983816 tickets is 100.0000000%


As the numbers above show, the probability of winning rises in direct proportion to the number of tickets the user buys, provided that every ticket carries a different combination. Even so, the user would have to purchase on the order of a million tickets to have a non-trivial chance of winning, and even then the chance is only about 7%.
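
The same relationship can be turned around to ask how many tickets are needed to reach a given chance of winning. The sketch below is one way to do that, assuming every ticket carries a different combination and Python 3.8+ for `math.comb`. The helper name `tickets_needed` is hypothetical.

```python
from math import ceil, comb  # assumes Python 3.8+

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible draws

def tickets_needed(target_probability):
    """Smallest number of distinct tickets whose combined chance of winning
    is at least target_probability (a value between 0 and 1)."""
    return ceil(target_probability * TOTAL_OUTCOMES)

for target in [0.01, 0.10, 0.50]:
    print('{0:.0%} chance: {1:,} tickets'.format(target, tickets_needed(target)))
```

For a 50% chance this gives 6,991,908 tickets, which matches the output above.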

### Probability of less than an exact match

There is also a possibility of lottery users winning smaller prizes when only a few of the numbers on their tickets match.

For example, if the user selects the numbers { 1, 16, 24, 36, 29, 47 }, and the winning draw is { 4, **16**, **29**, **36**, 37, **47** }, then we see that the user has 4 out of 6 numbers matching, and hence receives a smaller prize.
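
Counting the matching numbers is a set intersection. A minimal sketch using the ticket and the draw from the example above:

```python
user_ticket = {1, 16, 24, 36, 29, 47}
winning_draw = {4, 16, 29, 36, 37, 47}

# Numbers that appear both on the ticket and in the draw
matched = user_ticket & winning_draw

print(sorted(matched))  # [16, 29, 36, 47]
print(len(matched))     # 4
```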

#### Calculating the Probability of a smaller match

We will need a mechanism to determine the probability of only *n* numbers matching, where *n* < 6. For this, we have to determine how many possible outcomes of 6-number draws would have *n* numbers matching.

###### Example: 4-number Match

Any 6-number ticket contains 15 possible combinations of 4 numbers. If the numbers drawn in the lottery include any of these 15 combinations, the user could win a smaller prize.

Going back to our example, where the user's numbers are { 1, 16, 24, 36, 29, 47 }, the possible 4-number combinations on this ticket are:
- { 1, 16, 24, 29 }
- { 1, 16, 24, 36 }
- { 1, 16, 24, 47 }
- { 1, 16, 29, 36 }
- { 1, 16, 29, 47 }
- { 1, 16, 36, 47 }
- { 1, 24, 29, 36 }
- { 1, 24, 29, 47 }
- { 1, 24, 36, 47 }
- { 1, 29, 36, 47 }
- { 16, 24, 29, 36 }
- { 16, 24, 29, 47 }
- { 16, 24, 36, 47 }
- { 16, 29, 36, 47 }
- { 24, 29, 36, 47 }
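
The list above can be reproduced with `itertools.combinations` from the standard library. In this sketch the function is imported under the alias `iter_combinations` (a name chosen here only to avoid clashing with the notebook's own `combinations` helper).

```python
from itertools import combinations as iter_combinations

user_ticket = [1, 16, 24, 29, 36, 47]

# Every way of choosing 4 of the 6 numbers on the ticket
four_number_combos = list(iter_combinations(user_ticket, 4))

print(len(four_number_combos))  # 15
for combo in four_number_combos:
    print(combo)
```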

Let us derive the possible outcomes of the lottery draw that match **any one** of the above fifteen 4-number combinations. For example, if we focus on one of the fifteen possibilities, say { 1, 16, 24, 36 }, the lottery draws that match exactly these four numbers would be:
- { 1, 16, 24, 36, 2, 3 }
- { 1, 16, 24, 36, 2, 4 }
- { 1, 16, 24, 36, 2, 5 }
- { 1, 16, 24, 36, 2, 6 }
- ...
- { 1, 16, 24, 36, 3, 4 }
- { 1, 16, 24, 36, 3, 5 }
- { 1, 16, 24, 36, 3, 6 }
- ...
- { 1, 16, 24, 36, 4, 5 }
- { 1, 16, 24, 36, 4, 6 }
- ...
and so on...

We observe that any combination of numbers, which would give us a match of our 4-number combination, should adhere to the following rules:
- It should contain all the 4 numbers in our 4-number combination (i.e. in our example the winning draw **must** contain the numbers 1, 16, 24, 36 from our user's ticket)
- It should NOT contain any of the other numbers in our user's ticket. This is because if it did contain any of the other numbers then it would not be a 4-number match, but rather a 5-number match or a winning ticket! (i.e. in our example the winning draw **must not** contain the numbers 29 and 47).

We must count how many 6-number combinations can be formed from the four matched numbers on our user's ticket plus any two numbers that the user has not selected.

This is the number of ways to choose the remaining 2 numbers from the rest of the lottery pool. Since the user has already selected 6 of the 49 numbers, only 43 numbers remain in the pool (i.e. "43 choose 2"):

$\begin{pmatrix}
43 \\
2
\end{pmatrix} = \dfrac{43!}{2!(43 - 2)!} = 903$

Hence, for any 4-number combination on our user's ticket, there are 903 possible lottery draws that can result in a 4-number match.
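
The count of 903 can be checked by enumerating the draws directly. The sketch below does this for the 4-number combination { 1, 16, 24, 36 } from the example. The variable names are chosen here for illustration only.

```python
from itertools import combinations as iter_combinations

user_ticket = {1, 16, 24, 29, 36, 47}
matched_four = {1, 16, 24, 36}

# Numbers that may fill the last two slots: the pool of 49 minus
# all six numbers on the user's ticket.
other_numbers = [n for n in range(1, 50) if n not in user_ticket]

# Each matching draw is the four matched numbers plus two other numbers.
matching_draws = [
    frozenset(matched_four) | frozenset(pair)
    for pair in iter_combinations(other_numbers, 2)
]

print(len(other_numbers))   # 43
print(len(matching_draws))  # 903
```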

Since there are a total of 15 possible 4-number matches, the total number of possible outcomes on the lottery draw that can yield a 4-number match would be

$15 \times 903 = 13545$

There would be **13,545 possible successful outcomes** for a 4-number match on the lottery.
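
Multiplying 15 by 903 is only valid if no draw is counted twice. That holds because each such draw contains exactly four of the ticket's numbers, so it belongs to exactly one 4-number combination. The sketch below checks this by collecting all the draws in a set (which would silently drop duplicates) and counting them.

```python
from itertools import combinations as iter_combinations

user_ticket = frozenset({1, 16, 24, 29, 36, 47})
other_numbers = [n for n in range(1, 50) if n not in user_ticket]

# Build every draw that shares exactly four numbers with the ticket.
draws = set()
for four in iter_combinations(sorted(user_ticket), 4):
    for pair in iter_combinations(other_numbers, 2):
        draws.add(frozenset(four) | frozenset(pair))

print(len(draws))  # 13545
```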

We need to build a generic mechanism following the above logic for any number of matches.

In [9]:
15 * 903 / combinations(49, 6)

0.000968619724401408

In [19]:
# A generic method to calculate the probability of
# a less than exact match
def probability_less_6(n):
    # First determine the number of possible n-number combinations
    # possible in the 6-number ticket
    n_number_combos = combinations(6, n)
    
    # How many numbers are left over that should not match?
    remaining_numbers = 6 - n
    
    # How many combinations of the other numbers are possible
    # from the lottery pool, leaving out the 6 numbers already
    # on the ticket?
    remaining_number_combos = combinations(43, remaining_numbers)
    
    # Lottery draws that will yield an n-number match
    successful_outcomes = n_number_combos * remaining_number_combos
    
    # Probability of getting an n-number match
    return successful_outcomes / combinations(49, 6)

for n in range(1, 7):
    print('Probability of a {0}-number match: {1:.7f}%'.format(n, probability_less_6(n) * 100))

Probability of a 1-number match: 41.3019450%
Probability of a 2-number match: 13.2378029%
Probability of a 3-number match: 1.7650404%
Probability of a 4-number match: 0.0968620%
Probability of a 5-number match: 0.0018450%
Probability of a 6-number match: 0.0000072%
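
One way to sanity-check these figures is to include the case where no number matches and confirm that all seven probabilities add up to exactly 1. The sketch below does this with exact fractions. It assumes Python 3.8+ for `math.comb`, and the helper name `exact_match_probability` is hypothetical; it follows the same counting logic as `probability_less_6`.

```python
from fractions import Fraction
from math import comb  # assumes Python 3.8+

def exact_match_probability(n, pool_size=49, picks=6):
    """Exact probability that exactly n of the ticket's numbers are drawn."""
    favourable = comb(picks, n) * comb(pool_size - picks, picks - n)
    return Fraction(favourable, comb(pool_size, picks))

probabilities = {n: exact_match_probability(n) for n in range(0, 7)}

print(float(probabilities[0]))      # chance that no number matches
print(sum(probabilities.values()))  # 1
```

The zero-match case accounts for roughly 43.6% of all draws, which is why the six probabilities listed above do not sum to 100% on their own.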
