# Lottery: what are the chances?

A medical institute aims to help treat gambling addiction through a new app. They asked us to create the logical core of the app and calculate probabilities. The questions we are asked to answer:

1. What is the probability of winning the big prize with a single ticket?
2. What is the probability of winning the big prize if we play 40 (or any other number) different tickets?
3. What is the probability of at least 5, 4, 3, or 2 winning numbers on a single ticket?

The institute asked us to consider historical data from the 6/49 lottery game in Canada. The dataset contains data from 1982 to 2018.

In [1]:
import pandas as pd
import numpy as np

In [2]:
def factorial(n):
    '''
    Calculates the factorial for any given integer.
    '''
    fact = 1
    for i in range(n):
        fact = fact * (n-i)
    return fact


In [3]:
def combinations(n,k):
    '''
    Calculates the number of possible combinations when taking k objects 
    from a group of n objects, without replacement.
    '''
    # integer division: the number of combinations is always a whole number
    c = factorial(n) // (factorial(k) * factorial(n-k))
    return c

In [4]:
def one_ticket_probability(my_list):
    '''
    Calculates the probability of winning the lottery with a list of
    6 integers, from 49 possible integers.
    '''
    total_combo = combinations(49,6)
    success_outcome = 1
    p = (100*(success_outcome/total_combo))
    print('There is a {prob:f}% chance of these numbers being the winning ticket'.format(prob=p))

We wrote three functions:

1. `factorial()` calculates the factorial of any non-negative integer
2. `combinations()` calculates the number of ways to choose k objects from a group of n objects, without replacement
3. `one_ticket_probability()` gives the user feedback on their chances of winning the lottery. The actual numbers picked don't matter, as there is only one successful outcome (one set of six correct numbers, in any order) among all possible combinations
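
As a cross-check on the hand-written helpers, the standard library's `math.comb` (available from Python 3.8) computes the same binomial coefficient. This is a minimal sketch and not part of the original notebook; `TOTAL_TICKETS` and `jackpot_probability` are hypothetical names chosen for illustration.

```python
import math

# Number of distinct 6-number tickets that can be formed from 49 numbers.
# Order does not matter, so this is the binomial coefficient C(49, 6).
TOTAL_TICKETS = math.comb(49, 6)

def jackpot_probability():
    """Probability (between 0 and 1) that a single ticket wins the jackpot."""
    return 1 / TOTAL_TICKETS

print(TOTAL_TICKETS)                        # 13983816
print(f"{100 * jackpot_probability():f}%")  # 0.000007%
```

The total of 13,983,816 agrees with the ticket count used later in the notebook for a 100% chance of winning.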

# Chances of Jackpot

## Historical comparisons

Would the user have won with their chosen numbers if those numbers were used in any of the previous lotteries?

We are asked to write a function that reports:

* The number of times the selected combination occurred in the dataset
* The probability of winning the big prize in the next drawing with that combination

In [5]:
df = pd.read_csv('649.csv')

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   PRODUCT          3665 non-null   int64 
 1   DRAW NUMBER      3665 non-null   int64 
 2   SEQUENCE NUMBER  3665 non-null   int64 
 3   DRAW DATE        3665 non-null   object
 4   NUMBER DRAWN 1   3665 non-null   int64 
 5   NUMBER DRAWN 2   3665 non-null   int64 
 6   NUMBER DRAWN 3   3665 non-null   int64 
 7   NUMBER DRAWN 4   3665 non-null   int64 
 8   NUMBER DRAWN 5   3665 non-null   int64 
 9   NUMBER DRAWN 6   3665 non-null   int64 
 10  BONUS NUMBER     3665 non-null   int64 
dtypes: int64(10), object(1)
memory usage: 315.1+ KB


In [7]:
df.head(3)

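| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |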


In [8]:
df.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


In [9]:
def extract_numbers(row):
    '''
    Returns the winning numbers for a given row
    '''
    lot_set = row.loc['NUMBER DRAWN 1':'NUMBER DRAWN 6'].values
    return set(lot_set)

In [10]:
winning_numbers = df.apply(extract_numbers,axis=1)

In [11]:
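def check_historical_occurence(user_nums, win_nums):
    '''
    Prints how many times the user's numbers won in the past and the
    probability of winning the next drawing with them.
    '''
    # reject any number outside the range printed on a 6/49 ticket
    for i in user_nums:
        if (i < 1) or (i > 49):
            return ("The lottery only accepts numbers between 1 and 49")
    user_nums = set(user_nums)
    # element-wise comparison: True for every past draw equal to the user's set
    num = win_nums == user_nums
    print("This set of numbers occurred {num} times in the past".format(num=num.sum()))
    one_ticket_probability(user_nums)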

In [12]:
check_historical_occurence([3, 41, 11, 12, 43, 14]
                           ,winning_numbers)

This set of numbers occurred 1 times in the past
There is a 0.000007% chance of these numbers being the winning ticket


Two new functions were written:

1. `extract_numbers()` returns the winning set of numbers for a given row in the dataset
2. `check_historical_occurence()` tells the user how many times a chosen set of numbers won in the past, and the chance that it will win in the next drawing.
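
The set comparison at the heart of the historical check can be sketched without pandas. The example below is an illustration only: `past_draws` holds just the three draws shown by `df.head(3)`, typed in by hand, and `count_matches` is a hypothetical helper that also assumes a ticket must contain six distinct numbers.

```python
# The first three draws from the dataset preview, written out by hand
# so the example does not need the CSV file.
past_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

def count_matches(user_nums, draws):
    """Count how many past draws consist of exactly the user's six numbers."""
    ticket = set(user_nums)
    # Assumption: a valid ticket has 6 distinct numbers between 1 and 49.
    if len(ticket) != 6 or not all(1 <= n <= 49 for n in ticket):
        raise ValueError("A ticket needs 6 distinct numbers between 1 and 49")
    # Sets compare equal regardless of the order the numbers were entered in.
    return sum(1 for drawn in draws if drawn == ticket)

print(count_matches([3, 41, 11, 12, 43, 14], past_draws))  # 1
print(count_matches([1, 2, 3, 4, 5, 6], past_draws))       # 0
```

Because sets ignore order, the user can enter the numbers in any sequence and still match a past draw.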

## Number of tickets bought

What are the chances of winning the lottery if the user buys more than one ticket? We are asked to write a function that:

1. Takes in the number of different tickets they want to play. The actual numbers on the ticket are not needed.
2. Returns the probability of winning

In [13]:
def multi_ticket_probability(num_list):
    '''
    Calculates the probability of winning with any number of tickets, 
    assuming each ticket has different numbers.
    '''
    poss_outcomes = combinations(49,6)
    for i in num_list:
        success = i
        p = 100*(success/poss_outcomes)
        print("Your chances of a winning ticket from {0} tickets is {1:f}%".format(i,p))

In [14]:
multi_ticket_probability([1, 10, 100, 10000, 1000000, 6991908, 13983816])

Your chances of a winning ticket from 1 tickets is 0.000007%
Your chances of a winning ticket from 10 tickets is 0.000072%
Your chances of a winning ticket from 100 tickets is 0.000715%
Your chances of a winning ticket from 10000 tickets is 0.071511%
Your chances of a winning ticket from 1000000 tickets is 7.151124%
Your chances of a winning ticket from 6991908 tickets is 50.000000%
Your chances of a winning ticket from 13983816 tickets is 100.000000%


Even when buying 10,000 different tickets, the chance of holding the winning ticket is still only about 0.07%.
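
The same relationship can be turned around to ask how many different tickets are needed to reach a given chance of winning. The sketch below is an assumption-laden illustration, not part of the original notebook: `tickets_needed` is a hypothetical helper, and it relies on floating-point multiplication, so targets that sit exactly on a ticket boundary may round up by one.

```python
import math

TOTAL_TICKETS = math.comb(49, 6)  # 13,983,816 possible tickets

def tickets_needed(target_probability):
    """Smallest number of distinct tickets whose win chance reaches the target."""
    if not 0 < target_probability <= 1:
        raise ValueError("target_probability must be in (0, 1]")
    # With n distinct tickets the chance of winning is n / TOTAL_TICKETS,
    # so we need n >= target_probability * TOTAL_TICKETS.
    return math.ceil(target_probability * TOTAL_TICKETS)

for target in (0.01, 0.5, 1.0):
    print(target, tickets_needed(target))
```

For a 50% chance this gives 6,991,908 tickets, which matches the output of `multi_ticket_probability()` above.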

# Chances of Smaller Prizes

The lottery also awards smaller prizes when a ticket matches some, but not all, of the winning numbers. We are asked to write a function that returns the probability of matching exactly a given number (five or fewer) of the winning numbers.

In [15]:
def probability_less_6(num):
    '''
    Takes an integer from 1 to 6 and prints the chance of having exactly
    that many winning numbers on a single ticket.
    '''
    if (num > 6) or (num < 1):
        print("The maximum number of numbers per ticket is 6, the minimum is 1")
    else:
        # ways to choose which (num) of the 6 winning numbers are on the ticket
        player_combinations = combinations(6, num)

        # The other (6-num) numbers on the ticket must all be wrong.
        # They cannot repeat the numbers already chosen and cannot be
        # any of the winning numbers, so they always come from the
        # 49-6 = 43 numbers that were not drawn.
        player_combinations_remaining = combinations(43, 6-num) # ways to fill (6-num) wrong choices

        n_correct_possible = player_combinations * player_combinations_remaining

        total_poss_outcomes = combinations(49,6)

        p = 100*(n_correct_possible/total_poss_outcomes)
        print("There is a {0:f}% chance that {1} of your 6 numbers are winning numbers".format(p,num))
        
        

In [16]:
probability_less_6(2)

There is a 13.237803% chance that 2 of your 6 numbers are winning numbers


In [17]:
probability_less_6(3)

There is a 1.765040% chance that 3 of your 6 numbers are winning numbers


In [18]:
probability_less_6(4)

There is a 0.096862% chance that 4 of your 6 numbers are winning numbers


In [19]:
probability_less_6(5)

There is a 0.001845% chance that 5 of your 6 numbers are winning numbers


In [20]:
probability_less_6(6)

There is a 0.000007% chance that 6 of your 6 numbers are winning numbers
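
One way to sanity-check the counting used in `probability_less_6()` is to confirm that the probabilities for 0 through 6 matches add up to exactly 1. The sketch below uses `math.comb` and exact fractions; `exact_match_probability` and its `picks` and `pool` parameters are hypothetical names introduced for this illustration.

```python
import math
from fractions import Fraction

def exact_match_probability(k, picks=6, pool=49):
    """Probability that exactly k of the ticket's numbers are drawn."""
    if not 0 <= k <= picks:
        raise ValueError("k must be between 0 and the number of picks")
    matching = math.comb(picks, k)                # which k winning numbers we hit
    missing = math.comb(pool - picks, picks - k)  # which non-winning numbers fill the rest
    return Fraction(matching * missing, math.comb(pool, picks))

probabilities = [exact_match_probability(k) for k in range(7)]

# Every ticket matches some number of winners from 0 to 6,
# so the seven cases must cover all possible tickets.
assert sum(probabilities) == 1

for k, p in enumerate(probabilities):
    print(f"exactly {k}: {100 * float(p):f}%")
```

Using `Fraction` keeps the arithmetic exact, so the sum-to-one check is not affected by floating-point rounding.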


# Logic

## Question

__Possible choices:__ (1,2,3,4,5,...,45,46,47,48,49)

__Winning numbers:__ (42,32,22,12,2,8)

Question: A player chooses 6 numbers from the set (1-49). Of these chosen numbers, what is the probability that *exactly five* are winners?

## Explanation

__Player's choice:__ (42,32,22,12,2,X), where X is an irrelevant number as we're only interested in the 5 numbers.

* Number of ways to choose which 5 of the 6 winning numbers appear on the player's ticket:

\begin{equation}
_6C_5 = {6 \choose 5} =  \frac{6!}{5!(6-5)!} =  6
\end{equation}

* The last number, X, cannot repeat any of the five numbers already on the ticket, so there are `49-5 = 44` numbers left to choose from. If the player has (42,32,22,12,2,X) on their ticket, X can be anything other than 42, 32, 22, 12, or 2.
* We are interested in *exactly five* correct choices, so the last number has to be wrong. This means `X != 8`, and the number of valid choices for X is `49-5-1 = 43`, where:
    * `49` represents the total number of possible choices in the set (1-49)
    * `5` represents the five numbers the player already chose, as they cannot be repeated
    * `1` represents the remaining winning number, which is 8. If the player selects 8 as their last number, they win the big prize, not the smaller prize. For this reason 8 is excluded from the valid choices.
* Since there are six possible combinations of five correct numbers, and each combination can be completed with 43 different wrong numbers, the *total* number of successful outcomes is `6*43 = 258`.
* Finally, to calculate the probability of matching exactly 5 numbers on a six-number ticket, we divide the number of successful outcomes by the total number of possible outcomes:

\begin{equation}
P(\text{5 winning numbers}) = \frac{258}{{49 \choose 6}} = \frac{258}{13983816} \approx 0.00001845
\end{equation}
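
The count of 258 can be checked by listing the tickets directly. The sketch below uses the example winning numbers from this section and builds every ticket with exactly five winners; the variable names are illustrative and not part of the original notebook.

```python
import itertools

winning = {42, 32, 22, 12, 2, 8}           # example winning numbers from the text
non_winning = set(range(1, 50)) - winning  # the numbers that were not drawn

# Build every ticket with exactly five winning numbers:
# choose which five winners are on the ticket, then add one non-winning number.
five_match_tickets = {
    frozenset(five) | {x}
    for five in itertools.combinations(sorted(winning), 5)
    for x in non_winning
}

print(len(non_winning))         # 43
print(len(five_match_tickets))  # 258
```

Storing the tickets as frozensets in a set confirms that all 258 are distinct, so none is counted twice.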

## Interpretation

The probability of choosing 6 numbers from the range `1-49` such that exactly 5 of the 6 match the winning numbers is `0.001845%`.

# At Least

In [21]:
def probability_at_least_6(num):
    '''
    Calculates the probability that at least (num) numbers are correct of our 6 choices.
    '''
    if (num > 6) or (num <= 1):
        print("The maximum number of numbers per ticket is 6, the minimum is 2")
    else:
        at_least_possible = 0
        # add up the outcomes with exactly num, num+1, ..., 6 correct numbers
        for i in range(num,7):
            player_combinations = combinations(6, i) # possible combinations with (i) correct choices
            player_combinations_remaining = combinations(43, 6-i) # possible failures with (6-i) choices
            n_correct_possible = player_combinations * player_combinations_remaining
            at_least_possible += n_correct_possible
        
        total_poss_outcomes = combinations(49,6)
                
        p = 100*(at_least_possible/total_poss_outcomes)
        print("There is a {0:f}% chance that at least {1} of your 6 numbers are winning numbers".format(p,num))
        
        

In [22]:
probability_at_least_6(2)

There is a 15.101557% chance that at least 2 of your 6 numbers are winning numbers


In [23]:
probability_at_least_6(3)

There is a 1.863755% chance that at least 3 of your 6 numbers are winning numbers


In [24]:
probability_at_least_6(4)

There is a 0.098714% chance that at least 4 of your 6 numbers are winning numbers


In [25]:
probability_at_least_6(5)

There is a 0.001852% chance that at least 5 of your 6 numbers are winning numbers


In [26]:
probability_at_least_6(6)

There is a 0.000007% chance that at least 6 of your 6 numbers are winning numbers
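
An "at least" probability is the sum of the "exactly" probabilities for every match count from `num` up to 6, and the complement rule gives an independent way to check it. The sketch below is an illustration with hypothetical helper names (`exactly`, `at_least`, `at_least_by_complement`), using `math.comb` and exact fractions.

```python
import math
from fractions import Fraction

def exactly(k):
    """Probability of exactly k matches on one 6/49 ticket."""
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), math.comb(49, 6))

def at_least(k):
    """Probability of k or more matches: sum the exact cases k, k+1, ..., 6."""
    return sum(exactly(j) for j in range(k, 7))

def at_least_by_complement(k):
    """Same quantity via the complement: 1 minus the chance of fewer than k matches."""
    return 1 - sum(exactly(j) for j in range(k))

for k in range(2, 7):
    # Both routes must agree exactly, since Fraction arithmetic has no rounding.
    assert at_least(k) == at_least_by_complement(k)
    print(f"at least {k}: {100 * float(at_least(k)):f}%")
```

Since the exact cases are mutually exclusive, an "at least" probability can never be smaller than the corresponding "exactly" probability, and it shrinks as `k` grows.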
