Development of a mobile app that is meant to help lottery addicts better estimate their chances of winning.

Questions to answer:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

[The data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers without replacement. We will need two main functions:
- a function that calculates factorials
- a function that calculates combinations
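For reference, these are the standard counting formulas behind the functions below (the notebook also defines a permutation count). The last line gives the total number of distinct 6/49 tickets:

$$
n! = n \times (n-1) \times \cdots \times 2 \times 1, \qquad
{}_nP_k = \frac{n!}{(n-k)!}, \qquad
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
$$

$$
\binom{49}{6} = \frac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{6!} = \frac{10{,}068{,}347{,}520}{720} = 13{,}983{,}816
$$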

In [1]:
#imports
import pandas as pd

In [117]:
# create factorial function
def factorial(n):
    '''Return n! (n factorial), the number of ways to arrange n distinct
    items in order: n * (n - 1) * ... * 2 * 1.
    '''
    result = 1
    for i in range(n):
        num = n - i
        result = result * num
    return result
factorial(3)

6

In [116]:
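def permutations(n, k):
    '''Return the number of ordered arrangements of k objects taken
    from a group of n objects: n! / (n - k)!.
    '''
    numerator = factorial(n)
    denominator = factorial(n - k)
    return numerator / denominator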

permutations(3,3)

6.0
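As a sanity check, and assuming Python 3.8+ (where `math.perm` and `math.comb` were added), the hand-written counts can be compared with the standard library. This standalone sketch also shows why dividing by k! turns ordered draws into unordered tickets: each set of six balls can come out of the machine in 6! = 720 different orders.

```python
import math

def factorial(n):
    # n! computed iteratively; factorial(0) == 1
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def permutations(n, k):
    # Ordered selections of k items from n: n! / (n - k)!
    # Integer division is exact because (n - k)! always divides n!
    return factorial(n) // factorial(n - k)

# Number of ways six balls can come out of the machine, order included
ordered_draws = permutations(49, 6)
assert ordered_draws == math.perm(49, 6)

# Dividing out the 6! orderings of each set of six balls gives the ticket count
tickets = ordered_draws // factorial(6)
assert tickets == math.comb(49, 6)

print(ordered_draws, tickets)  # 10068347520 13983816
```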

In [120]:
def combinations(n, k):
    '''Return the number of ways to take k objects from a group of n objects.
    Combinations are like permutations, but the order does not matter, so the
    formula is the permutation formula divided by k!, the number of ways a
    single combination can be rearranged.
    '''
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    return numerator / denominator

print((1/combinations(3,3)))

1.0
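The float results (for example `13,983,816.0` further down) come from the `/` operator. If exact integers are preferred, one possible alternative is the multiplicative formula, sketched below; it also avoids computing 49! in full. `math.comb` (Python 3.8+) is used only as a cross-check.

```python
import math

def combinations(n, k):
    """Exact C(n, k) via the multiplicative formula.

    After step i, `result` equals C(n - k + i, i), which is always an integer,
    so the floor division never discards a remainder.
    """
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # C(n, k) == C(n, n - k); use the shorter loop
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result

# Cross-check against the standard library for every k in a 6/49 draw
for k in range(7):
    assert combinations(49, k) == math.comb(49, k)

print(combinations(49, 6))  # 13983816
```

Keeping the counts as integers means later messages can print `13,983,816` rather than `13,983,816.0`.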


In [4]:
# Write a function that takes the user's numbers and outputs the probability of winning
def one_ticket_probability(num):
    
    # Exactly one of the C(49, 6) possible tickets wins the big prize
    chance_to_win = (1/combinations(49,6))
    # The odds are the reciprocal of the probability: 1 in C(49, 6)
    odds_ratio = 1/(chance_to_win)
    
    # Show the same probability in scientific, fixed-point and percentage notation
    print(f"You have {chance_to_win:.8} chance to win with the numbers {num}")
    print(f"You have {chance_to_win:.8f} chance to win with the numbers {num}")
    print(f"You have {chance_to_win:.8%} chance to win with the numbers {num}") 
    print(f"You have 1 out of {odds_ratio:,} chances of winning")
    

In [5]:
one_ticket_probability([2, 43, 22, 23, 11, 5])

You have 7.1511238e-08 chance to win with the numbers [2, 43, 22, 23, 11, 5]
You have 0.00000007 chance to win with the numbers [2, 43, 22, 23, 11, 5]
You have 0.00000715% chance to win with the numbers [2, 43, 22, 23, 11, 5]
You have 1 out of 13,983,816.0 chances of winning
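The printed values round a very small float. If the app needs exact values (for instance, to show "1 in 13,983,816" without the trailing `.0`), one option is to keep the probability as a `fractions.Fraction` and return it rather than print it. This is a sketch under that assumption; the name `one_ticket_fraction` is hypothetical.

```python
import math
from fractions import Fraction

def one_ticket_fraction():
    # Exactly one of the C(49, 6) equally likely tickets wins the big prize,
    # so the specific numbers a user picks do not change the result
    return Fraction(1, math.comb(49, 6))

p = one_ticket_fraction()
print(f"Exact probability: {p}")                   # Exact probability: 1/13983816
print(f"In other words, 1 in {p.denominator:,}")   # In other words, 1 in 13,983,816
print(f"As a percentage: {float(p):.8%}")          # As a percentage: 0.00000715%
```

Returning a value instead of printing also makes the function easy to test and to reuse from the app's UI layer.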


# Examining Historical Data of Lottery Drawings

In [63]:
lottery_canada_import  = pd.read_csv("649.csv").rename(columns=str.lower)
lottery_canada = lottery_canada_import.copy()
lottery_canada

,product,draw number,sequence number,draw date,number drawn 1,number drawn 2,number drawn 3,number drawn 4,number drawn 5,number drawn 6,bonus number
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45
...,...,...,...,...,...,...,...,...,...,...,...
3660,649,3587,0,6/6/2018,10,15,23,38,40,41,35
3661,649,3588,0,6/9/2018,19,25,31,36,46,47,26
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8


In [64]:
lottery_canada.columns = lottery_canada.columns.str.replace(" ", "_")
lottery_canada

,product,draw_number,sequence_number,draw_date,number_drawn_1,number_drawn_2,number_drawn_3,number_drawn_4,number_drawn_5,number_drawn_6,bonus_number
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45
...,...,...,...,...,...,...,...,...,...,...,...
3660,649,3587,0,6/6/2018,10,15,23,38,40,41,35
3661,649,3588,0,6/9/2018,19,25,31,36,46,47,26
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8


In [65]:
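# Inspect 'product' (it should contain only the 6/49 code, 649), then drop
# the 'product' and 'sequence_number' columns, which the analysis does not use
lottery_canada["product"].unique()
lottery_canada = lottery_canada.drop(["product", "sequence_number"], axis = 1)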
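The header cleanup above (lower-casing, then replacing spaces) can also be done in a single `rename` call with a function. The sketch below stands in for `649.csv` with a two-row DataFrame copied from the first draws shown above; the upper-case raw headers are an assumption about how they appear in the file. It also asserts that `product` is single-valued before dropping it.

```python
import pandas as pd

# Two draws copied from the table above; upper-case headers are assumed
raw = pd.DataFrame(
    [[649, 1, 0, "6/12/1982", 3, 11, 12, 14, 41, 43, 13],
     [649, 2, 0, "6/19/1982", 8, 33, 36, 37, 39, 41, 9]],
    columns=["PRODUCT", "DRAW NUMBER", "SEQUENCE NUMBER", "DRAW DATE",
             "NUMBER DRAWN 1", "NUMBER DRAWN 2", "NUMBER DRAWN 3",
             "NUMBER DRAWN 4", "NUMBER DRAWN 5", "NUMBER DRAWN 6",
             "BONUS NUMBER"],
)

# Lower-case every header and replace spaces with underscores in one pass
clean = raw.rename(columns=lambda c: c.lower().replace(" ", "_"))

# Only drop 'product' if it really is a constant column
assert clean["product"].nunique() == 1
clean = clean.drop(columns=["product", "sequence_number"])

print(list(clean.columns))
```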

In [80]:
lottery_canada.columns

Index(['draw_number', 'draw_date', 'number_drawn_1', 'number_drawn_2',
       'number_drawn_3', 'number_drawn_4', 'number_drawn_5', 'number_drawn_6',
       'bonus_number', 'all_numbers'],
      dtype='object')

In [86]:
lottery_canada['all_numbers'] = lottery_canada[['number_drawn_1', 'number_drawn_2',
       'number_drawn_3', 'number_drawn_4', 'number_drawn_5', 'number_drawn_6',]].values.tolist()

In [95]:
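# This fails: set() iterates over the whole column and tries to store each
# row's list as a single element, but lists are unhashable
lottery_canada['all_numbers']= set(lottery_canada['all_numbers'])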

TypeError: unhashable type: 'list'
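To turn each row's list into its own set, the conversion has to happen row by row, for example with `Series.apply`. A minimal sketch on two rows of drawn numbers copied from the draws shown above:

```python
import pandas as pd

# Two rows of drawn numbers copied from the draws shown above
df = pd.DataFrame({"all_numbers": [[3, 11, 12, 14, 41, 43],
                                   [8, 33, 36, 37, 39, 41]]})

# apply() calls set() once per row, so each call receives a single list
# (whose elements are hashable ints) instead of the whole column
df["all_numbers"] = df["all_numbers"].apply(set)

print(df["all_numbers"].tolist())
```

If the sets are later used as dictionary or `Counter` keys, `frozenset` is the hashable alternative to `set`.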

In [92]:
lottery_canada.head()

,draw_number,draw_date,number_drawn_1,number_drawn_2,number_drawn_3,number_drawn_4,number_drawn_5,number_drawn_6,bonus_number,all_numbers
0,1,6/12/1982,3,11,12,14,41,43,13,"[3, 11, 12, 14, 41, 43]"
1,2,6/19/1982,8,33,36,37,39,41,9,"[8, 33, 36, 37, 39, 41]"
2,3,6/26/1982,1,6,23,24,27,39,34,"[1, 6, 23, 24, 27, 39]"
3,4,7/3/1982,3,9,10,13,20,43,34,"[3, 9, 10, 13, 20, 43]"
4,5,7/10/1982,5,14,21,31,34,47,45,"[5, 14, 21, 31, 34, 47]"


In [91]:
# Create a Series that holds each drawing's six winning numbers as a set
number_columns = ['number_drawn_1', 'number_drawn_2', 'number_drawn_3',
                  'number_drawn_4', 'number_drawn_5', 'number_drawn_6']

def extract_numbers(row):
    # Select the six drawn numbers by name; the bonus number is left out
    return set(row[number_columns].values)

winning_numbers = lottery_canada.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [100]:
# turning a list into a set
some_list = [1,2]
some_set  = set(some_list)

print(some_set)

{1, 2}


In [96]:
# Write a function that takes in the user's numbers and checks them against the historical numbers
def check_historical_occurrence(user_numbers, historical_numbers):   
    '''
    user_numbers: a Python list
    historical_numbers: a pandas Series of sets, one per drawing
    '''
    
    # Set equality ignores order, so the user may enter the numbers in any order
    user_numbers_set = set(user_numbers)
    check_occurrence = historical_numbers == user_numbers_set
    n_occurrences = check_occurrence.sum()
    
    if n_occurrences == 0:
        print('''The combination {} has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, user_numbers))
        
    else:
        print('''The number of times combination {} has occurred in the past is {}.
Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, n_occurrences,
                                                                            user_numbers))

In [97]:
# testing function
# test_input_3 holds the numbers of draw 2 (6/19/1982) in a different order
test_input_3 = [33, 36, 37, 39, 8, 41]
check_historical_occurrence(test_input_3, winning_numbers)

# test_input_4 repeats 44, so it cannot match any six-number drawing
test_input_4 = [3, 2, 44, 22, 1, 44]
check_historical_occurrence(test_input_4, winning_numbers)

The number of times combination [33, 36, 37, 39, 8, 41] has occurred in the past is 1.
Your chances to win the big prize in the next drawing using the combination [33, 36, 37, 39, 8, 41] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
The combination [3, 2, 44, 22, 1, 44] has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination [3, 2, 44, 22, 1, 44] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
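Comparing against the whole Series scans every drawing on each call. If the app checks many tickets against the same history, one alternative is to count each drawing once as a `frozenset` (hashable, unlike `set`) and look combinations up in a `collections.Counter`. The sketch also rejects tickets such as `[3, 2, 44, 22, 1, 44]` that repeat a number. It uses the first five drawings shown in the table above; `is_valid_ticket` and `count_past_occurrences` are hypothetical names.

```python
from collections import Counter

# First five drawings from the 6/49 data shown above
draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]

# frozenset ignores order and is hashable, so it can serve as a Counter key
draw_counts = Counter(frozenset(draw) for draw in draws)

def is_valid_ticket(numbers):
    # A 6/49 ticket needs six distinct numbers between 1 and 49
    return (len(numbers) == 6 and len(set(numbers)) == 6
            and all(1 <= n <= 49 for n in numbers))

def count_past_occurrences(numbers):
    if not is_valid_ticket(numbers):
        raise ValueError(f"{numbers} is not a valid 6/49 ticket")
    # Counter returns 0 for combinations that were never drawn
    return draw_counts[frozenset(numbers)]

print(count_past_occurrences([33, 36, 37, 39, 8, 41]))  # 1
print(is_valid_ticket([3, 2, 44, 22, 1, 44]))           # False
```

With the full dataset, the counter could be built from the existing Series with `Counter(frozenset(s) for s in winning_numbers)`.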


In [101]:
# Write a function that tells users the probability of winning with multiple tickets
def multi_ticket_probability(n_tickets):
    
    n_combinations = combinations(49, 6)
    
    probability = n_tickets / n_combinations
    percentage_form = probability * 100
    
    if n_tickets == 1:
        print('''Your chances to win the big prize with one ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(percentage_form, int(n_combinations)))
    
    else:
        combinations_simplified = round(n_combinations / n_tickets)   
        print('''Your chances to win the big prize with {:,} different tickets are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_tickets, percentage_form,
                                                               combinations_simplified))

In [102]:
# TESTING FUNCTION
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') # output delimiter

Your chances to win the big prize with one ticket are 0.000007%.
In other words, you have a 1 in 13,983,816 chances to win.
------------------------
Your chances to win the big prize with 10 different tickets are 0.000072%.
In other words, you have a 1 in 1,398,382 chances to win.
------------------------
Your chances to win the big prize with 100 different tickets are 0.000715%.
In other words, you have a 1 in 139,838 chances to win.
------------------------
Your chances to win the big prize with 10,000 different tickets are 0.071511%.
In other words, you have a 1 in 1,398 chances to win.
------------------------
Your chances to win the big prize with 1,000,000 different tickets are 7.151124%.
In other words, you have a 1 in 14 chances to win.
------------------------
Your chances to win the big prize with 6,991,908 different tickets are 50.000000%.
In other words, you have a 1 in 2 chances to win.
------------------------
Your chances to win the big prize with 13,983,816 different tickets are 100.000000%.
In other words, you have a 1 in 1 chances to win.
------------------------
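With n distinct tickets in a single drawing, the probability is n / C(49, 6): it grows linearly and reaches 1 at 13,983,816 tickets. A sketch that returns the exact value and rejects impossible inputs (zero tickets, or more distinct tickets than there are combinations); the name `multi_ticket_fraction` is hypothetical.

```python
import math
from fractions import Fraction

TOTAL_TICKETS = math.comb(49, 6)  # 13,983,816 distinct 6/49 tickets

def multi_ticket_fraction(n_tickets):
    # Each distinct ticket covers one of the equally likely outcomes
    if not 1 <= n_tickets <= TOTAL_TICKETS:
        raise ValueError(f"n_tickets must be between 1 and {TOTAL_TICKETS:,}")
    return Fraction(n_tickets, TOTAL_TICKETS)

for n in [1, 6991908, 13983816]:
    print(n, multi_ticket_fraction(n))
```

The tickets must be distinct: buying the same combination twice does not cover any additional outcome.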

In [112]:
combinations(6,2)

15.0

In [104]:
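# Write a function that calculates the probability that a player's ticket matches
# exactly n_winning_numbers of the six winning numbers (for n_winning_numbers < 6)
def probability_less_6(n_winning_numbers):
    
    # Ways to choose which of the six winning numbers appear on the ticket
    n_combinations_ticket = combinations(6, n_winning_numbers)
    # Ways to fill the remaining spots with the 43 numbers that were not drawn
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining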
    
    n_combinations_total = combinations(49, 6)    
    probability = successful_outcomes / n_combinations_total
    
    probability_percentage = probability * 100    
    combinations_simplified = round(n_combinations_total/successful_outcomes)    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers, probability_percentage,
                                                               int(combinations_simplified)))

In [121]:
#test
for test_input in [2, 3, 4, 5]:
    probability_less_6(test_input)
    print('--------------------------')

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chances to win.
--------------------------
Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chances to win.
--------------------------
Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chances to win.
--------------------------
Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chances to win.
--------------------------
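`probability_less_6()` computes the hypergeometric probability of exactly k matches. Written as a formula, with the five-match case from the output above worked through by hand:

$$
P(\text{exactly } k \text{ matches}) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}}, \qquad k = 0, 1, \ldots, 6
$$

$$
P(5) = \frac{\binom{6}{5}\binom{43}{1}}{\binom{49}{6}} = \frac{6 \times 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx 0.001845\% \approx \frac{1}{54{,}201}
$$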


# Conclusion & Next Steps

In this project we coded four main functions for our app:

- one_ticket_probability() to calculate the probability of winning the big prize with a single ticket
- check_historical_occurrence() to check whether a given combination has occurred before in the dataset
- multi_ticket_probability() to calculate the probability of winning the big prize with any number of distinct tickets in one drawing
- probability_less_6() to calculate the probability of matching exactly two, three, four, or five winning numbers

If we wanted to continue building more features into our app, some next steps could be:

- Make the outputs even easier for the user to understand by adding memorable analogies of unusual events that happen with similar probabilities.
- Combine one_ticket_probability() and check_historical_occurrence() so that they output probability and historical occurrence information at the same time.
- Create another function, similar to probability_less_6(), that calculates the probability of having at least two, three, four, or five winning numbers instead of exactly the number entered.
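The "at least" variant from the last next step can be sketched by summing the mutually exclusive exact-match probabilities from k up to 6. The names `probability_exactly` and `probability_at_least` are hypothetical, and `math.comb` (Python 3.8+) stands in for the notebook's `combinations()`.

```python
import math
from fractions import Fraction

TOTAL = math.comb(49, 6)  # number of distinct 6/49 tickets

def probability_exactly(k):
    # Hypergeometric: k of the 6 winning numbers and 6 - k of the 43 others
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), TOTAL)

def probability_at_least(k):
    # "Exactly j matches" events are mutually exclusive, so their
    # probabilities can be added for j = k..6
    return sum(probability_exactly(j) for j in range(k, 7))

for k in [2, 3, 4, 5]:
    p = probability_at_least(k)
    print(f"At least {k} matches: {float(p):.6%} (about 1 in {round(1 / p):,})")
```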

The idea for this project comes from the DATAQUEST Probability: Fundamentals course.