# Lottery probability calculations

The goal of this project is to write functions that calculate probabilities for the 6/49 lottery.

Here are examples of the scenarios we would like to calculate in this project:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?


### Common functions

Two formulas come up repeatedly: the factorial and the number of combinations. Let's create a function for each of them.

In [1]:
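#Function to calculate factorials: n! = n * (n-1) * ... * 1, with 0! = 1
def factorial(n):
    r = 1
    for i in range(n, 0, -1):
        r *= i
    return r

#Function to calculate combinations: C(n, k) = n! / (k! * (n-k)!)
def combinations(n, k):
    #Integer division keeps the count exact; the quotient is always a whole number
    return factorial(n) // (factorial(k) * factorial(n - k))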
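
As a sanity check, the two helpers can be compared against Python's standard library. The sketch below is not part of the original notebook; it assumes Python 3.8 or later, where `math.comb` is available, and the variable names are chosen here for illustration.

```python
import math

# Reference values from the standard library (math.comb needs Python 3.8+)
six_factorial = math.factorial(6)
n_tickets = math.comb(49, 6)  # number of possible 6/49 tickets

print(six_factorial)  # 720
print(n_tickets)      # 13983816
```

If the hand-written `factorial` and `combinations` return the same values, they can be trusted for the rest of the project.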

### One ticket probability

We now want to write a function that meets the following conditions:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The function prints the probability in a friendly way, so that people with no knowledge of probability can still understand it.

In [2]:
def one_ticket_probability(user_numbers):
    #user_numbers is the list of six numbers on the ticket
    n_outcomes = int(combinations(49, len(user_numbers)))  #number of possible tickets
    percent_form = 1 / n_outcomes * 100
    message = ("The probability of winning with your numbers {} is {:.7f}%. "
               "In other words you have 1 chance in {:,}.").format(
                   user_numbers, percent_form, n_outcomes)
    print(message)

#test the function (it prints the message itself, so no outer print is needed)
one_ticket_probability([23,24,25,35,46,54])
print()
one_ticket_probability([3,2,5,25,4,44])

The probability of winning with your numbers [23, 24, 25, 35, 46, 54] is 0.0000072%. In other words you have 1 chance in 13,983,816.

The probability of winning with your numbers [3, 2, 5, 25, 4, 44] is 0.0000072%. In other words you have 1 chance in 13,983,816.
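
The function above does not check its input, so a ticket containing 54 is accepted even though the numbers must lie between 1 and 49. A minimal sketch of a validation step follows; `validate_ticket` is a hypothetical helper introduced here, not part of the original notebook.

```python
def validate_ticket(numbers):
    """Return True if numbers is a valid 6/49 ticket (hypothetical helper)."""
    return (
        len(numbers) == 6
        and len(set(numbers)) == 6  # no repeated numbers
        and all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)
    )

print(validate_ticket([3, 2, 5, 25, 4, 44]))      # True
print(validate_ticket([23, 24, 25, 35, 46, 54]))  # False: 54 is outside 1..49
```

One option would be to call such a check first and print a friendly error message instead of a probability when it fails.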


### Historical Data Check for Canada Lottery

Let's compare these probabilities with historical draw data.

In [3]:
import pandas as pd

lottery_canada = pd.read_csv("649.csv")

In [4]:
print(lottery_canada.shape)
lottery_canada.head(3)

(3665, 11)


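| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |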


In [5]:
lottery_canada.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


We're going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

The function should meet the following conditions:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The function prints:
    - the number of times the selected combination occurred in the Canada data set; and
    - the probability of winning the big prize in the next drawing with that combination.

We first write a function that extracts the historical winning numbers.

In [6]:
#First attempt: it looks up one row label at a time and returns a list;
#the rewritten version below takes a whole row so it can be used with DataFrame.apply
def extract_numbers_wrong(x):
    seq = [lottery_canada.loc[x,'NUMBER DRAWN 1'], 
           lottery_canada.loc[x,'NUMBER DRAWN 2'], 
           lottery_canada.loc[x,'NUMBER DRAWN 3'], 
           lottery_canada.loc[x,'NUMBER DRAWN 4'],
           lottery_canada.loc[x,'NUMBER DRAWN 5'], 
           lottery_canada.loc[x,'NUMBER DRAWN 6']]
    return seq

#test function extract_numbers
print(extract_numbers_wrong(0))

[3, 11, 12, 14, 41, 43]


In [7]:
def extract_numbers(ser):
    #Positions 4 to 9 of each row hold NUMBER DRAWN 1 to NUMBER DRAWN 6
    vals = ser.iloc[4:10]
    #A set ignores the order in which the numbers were drawn
    return set(vals)

win_numbers = lottery_canada.apply(extract_numbers, axis = 1)

In [8]:
win_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

Now we write the function that compares the user's input with the historical winning sequences.

In [9]:
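#user is the list of numbers the user inputs. hist is a Series of historical winning sets.
def check_historical_occurence(user, hist):
    vals_u = set(user)
    #Compare the user's set with each historical draw; order does not matter
    bool_match = hist.apply(lambda draw: draw == vals_u)
    tot = bool_match.sum()
    #Chance of a single ticket in the next draw, computed rather than hard-coded
    percent = 100 / combinations(49, 6)
    print("In the past, your number won {} time/s. \
Your chances to win the big prize in the next drawing using \
your combination are {:.7f}%".format(tot, percent))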

#Here we test an input
user_input = [3,41,43,12,11,14]
check_historical_occurence(user_input, win_numbers)


In the past, your number won 1 time/s. Your chances to win the big prize in the next drawing using your combination are 0.0000072%
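
The same check can be tried without the CSV file or pandas. The sketch below is an illustration under stated assumptions, not the notebook's method: it uses only the three draws shown at the head of the data set, and `count_wins` is a hypothetical helper. Storing each draw as a `frozenset` makes the comparison independent of order and allows a dictionary lookup instead of a scan.

```python
from collections import Counter

# Three draws copied from the head of the data set shown earlier
draws = [
    (3, 11, 12, 14, 41, 43),
    (8, 33, 36, 37, 39, 41),
    (1, 6, 23, 24, 27, 39),
]

# frozenset is hashable, so each draw can be used as a dictionary key
draw_counts = Counter(frozenset(d) for d in draws)

def count_wins(user_numbers):
    """Number of past draws matching the ticket, ignoring order (hypothetical helper)."""
    return draw_counts[frozenset(user_numbers)]

print(count_wins([3, 41, 43, 12, 11, 14]))  # 1
print(count_wins([1, 2, 3, 4, 5, 6]))       # 0
```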


### Multi ticket chances

We're going to write a function that will allow the users to calculate the chances of winning for any number of different tickets.

- The user inputs the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- Our function receives an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function provides information about the probability of winning the big prize depending on the number of different tickets played.

In [10]:
def multi_ticket_probability(x):
    #x is the number of different tickets played
    possible_outcomes = combinations(49, 6)
    chances_win = x / possible_outcomes
    percent_format = chances_win * 100
    message = "Your chance of winning with {} ticket(s) is {:.7f}%.".format(x, percent_format)
    return message

#Testing the function

test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for i in test_inputs:
    print(multi_ticket_probability(i))

Your chance of winning with 1 ticket(s) is 0.0000072%.
Your chance of winning with 10 ticket(s) is 0.0000715%.
Your chance of winning with 100 ticket(s) is 0.0007151%.
Your chance of winning with 10000 ticket(s) is 0.0715112%.
Your chance of winning with 1000000 ticket(s) is 7.1511238%.
Your chance of winning with 6991908 ticket(s) is 50.0000000%.
Your chance of winning with 13983816 ticket(s) is 100.0000000%.
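
Because the probability grows linearly with the number of different tickets, the question can also be turned around: how many tickets are needed to reach a given chance of winning? The sketch below is an illustration, not part of the original notebook; `tickets_needed` is a hypothetical helper, and it uses `fractions.Fraction` so that the rounding is not affected by floating-point error.

```python
import math
from fractions import Fraction

TOTAL_TICKETS = math.comb(49, 6)  # 13,983,816 possible tickets

def tickets_needed(target):
    """Smallest number of different tickets whose win probability is at least target.

    target is a Fraction between 0 and 1 (hypothetical helper).
    """
    return math.ceil(target * TOTAL_TICKETS)

print(tickets_needed(Fraction(1, 2)))    # 6991908
print(tickets_needed(Fraction(1, 100)))  # 139839
```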


### n winning numbers

Now we're going to write one more function to allow the users to calculate probabilities for two, three, four, or five winning numbers.

Inside the app, the user inputs:

- six different numbers from 1 to 49; and
- an integer between 2 and 5 that represents the number of winning numbers expected.

Our function prints information about the probability of having exactly that number of winning numbers.
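
The probability of exactly $k$ winning numbers follows from a counting argument: choose which $k$ of the 6 drawn numbers appear on the ticket, then choose the remaining $6-k$ ticket numbers from the 43 numbers that were not drawn. Written out, with the symbol $k$ introduced here for illustration:

```latex
P(\text{exactly } k) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}},
\qquad
P(\text{exactly } 5) = \frac{6 \cdot 43}{13{,}983{,}816} \approx 0.0018450\%
```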

In [11]:
def probability_less_6(user_guess):
    #Ways to choose which of the 6 winning numbers are on the ticket
    a = combinations(6, user_guess)

    #The remaining numbers on the ticket must come from the 43 non-winning numbers
    diff = 6 - user_guess
    c = combinations(43, diff)

    final = a * c / combinations(49, 6)
    percent_final = final * 100
    return percent_final

user_input = [3,41,43,12,11,14]
user_number = 5

print("Chances of having {} correct numbers : {:.7f}%".format(user_number,probability_less_6(user_number)))

Chances of having 5 correct numbers : 0.0018450%


Let's test the function for 1, 2, 3, 4, and 5 winning numbers:

In [12]:
for i in range(1,6):
    print("Chances of having {} correct numbers : {:.7f}%".format(i,probability_less_6(i)))

Chances of having 1 correct numbers : 41.3019450%
Chances of having 2 correct numbers : 13.2378029%
Chances of having 3 correct numbers : 1.7650404%
Chances of having 4 correct numbers : 0.0968620%
Chances of having 5 correct numbers : 0.0018450%
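
The values above are probabilities of exactly that many winning numbers. The "at least" question from the introduction can be answered by summing the exact cases. The sketch below is one way to do it, not part of the original notebook; `prob_exactly` and `prob_at_least` are hypothetical helpers, and it relies on `math.comb` (Python 3.8+) and exact fractions.

```python
import math
from fractions import Fraction

def prob_exactly(k):
    """Exact probability of k winning numbers on one 6/49 ticket."""
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), math.comb(49, 6))

def prob_at_least(k):
    """Probability of k or more winning numbers (hypothetical helper)."""
    return sum(prob_exactly(j) for j in range(k, 7))

for k in range(2, 6):
    print("At least {}: {:.7f}%".format(k, float(prob_at_least(k)) * 100))
```

Summing over every case from 0 to 6 gives exactly 1, which is a quick way to confirm the counting is consistent.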
