# Guided Project: Mobile App for Lottery Addiction

In this project, we estimate the probability of winning the lottery. Many people start playing the lottery for fun, but for some this activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, start to accumulate debts, and eventually engage in desperate behaviors like theft.

For the first version of the app, the institute wants us to focus on the 6/49 lottery and build functions that enable users to answer questions like:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).

Throughout the project, we'll need to calculate probabilities and combinations repeatedly.

In [1]:
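# Functions to calculate factorials and combinations
def factorial(n):
    # Multiply n * (n - 1) * ... * 1; returns 1 for n = 0
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # Number of ways to choose k items out of n when order does not matter
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    # The quotient is always a whole number, so integer division keeps it exact
    return numerator // denominator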
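
As a quick cross-check, the standard library offers `math.factorial` and `math.comb` (Python 3.8+), which should agree with the hand-written helpers above. The sketch below is only a sanity check, not a replacement for the project's own functions; the names `total_outcomes_649` and `from_factorials` are introduced here for illustration.

```python
import math

# Number of ways to choose 6 numbers out of 49 when order does not matter
total_outcomes_649 = math.comb(49, 6)

# The same value from the factorial definition n! / (k! * (n - k)!)
from_factorials = math.factorial(49) // (math.factorial(6) * math.factorial(43))

print(total_outcomes_649)                     # 13983816
print(total_outcomes_649 == from_factorials)  # True
```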

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the six numbers they play on a single ticket.

In [2]:
def one_ticket_probability(list_six_unique_num):
    # Every six-number ticket is equally likely, so the ticket itself is not used
    total_outcomes = combinations(49, 6)
    probability = (1 / total_outcomes) * 100
    return "The probability of your six-number lottery ticket is {:.7f}%".format(probability)

In [3]:
# Pick a sample six-number ticket and test the function
number_1 = [13, 22, 24, 27, 42, 44]
one_ticket_probability(number_1)

'The probability of your six-number lottery ticket is 0.0000072%'
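
Note that `one_ticket_probability()` never inspects its argument, since every valid ticket has the same chance. If the app should reject malformed tickets first, a check along these lines could run before the calculation. This is a minimal sketch; `is_valid_ticket` is a hypothetical helper, not part of the original project.

```python
def is_valid_ticket(numbers):
    """Return True if `numbers` holds six distinct integers from 1 to 49."""
    if len(numbers) != 6 or len(set(numbers)) != 6:
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)

print(is_valid_ticket([13, 22, 24, 27, 42, 44]))  # True
print(is_valid_ticket([13, 13, 24, 27, 42, 44]))  # False: repeated number
print(is_valid_ticket([0, 22, 24, 27, 42, 50]))   # False: out of range
```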

In [4]:
# Import library and dataset
import pandas as pd
canada_649 = pd.read_csv('649.csv')
print(canada_649.shape)
canada_649.head(3)

(3665, 11)


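| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |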


In [5]:
canada_649.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


We're going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

The function should print:
- the number of times the combination selected occurred in the Canada data set
- the probability of winning the big prize in the next drawing with that combination

In [6]:
# Extract the six winning numbers from a row of the data set as a set
def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6 (the bonus number is excluded)
    row = row.iloc[4:10]
    row = set(row.values)
    return row

In [7]:
winning_numbers = canada_649.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [8]:
def check_historical_occurrence(user_number, winning_numbers):
    # Compare as sets so the order of the user's numbers does not matter
    user_set = set(user_number)
    occurrence = 0
    for item in winning_numbers:
        if user_set == item:
            occurrence += 1
    # Alternative from the reference solution, kept for future reference:
    #   check_occurrence = historical_numbers == user_numbers_set
    #   n_occurrences = check_occurrence.sum()
    text_one_winning_probability = "And the probability of winning with your six-number lottery ticket is 0.0000072%"
    if occurrence == 0:
        print("Your six-number lottery ticket {} never occurred. ".format(user_number) + text_one_winning_probability)
    else:
        print("Your six-number lottery ticket {} occurred {} time(s). ".format(user_number, occurrence) + text_one_winning_probability)

In [9]:
# No occurrence
check_historical_occurrence(number_1, winning_numbers)

Your six-number lottery ticket [13, 22, 24, 27, 42, 44] never occurred. And the probability of winning with your six-number lottery ticket is 0.0000072%


In [10]:
# With occurrence
number_2 = [3, 9, 10, 43, 13, 20]
check_historical_occurrence(number_2, winning_numbers)

Your six-number lottery ticket [3, 9, 10, 43, 13, 20] occurred 1 time(s). And the probability of winning with your six-number lottery ticket is 0.0000072%
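
If many tickets need to be checked against the same history, one option is to count each past draw once and then look tickets up directly instead of scanning every row. The sketch below assumes the draws are available as an iterable of sets, like `winning_numbers`; the four sample draws are copied from the first rows of `winning_numbers.head()`, and `build_draw_counts` and `count_occurrences` are hypothetical names.

```python
from collections import Counter

def build_draw_counts(draws):
    """Count how often each six-number combination appears in `draws`."""
    # frozenset is hashable, so it can serve as a Counter key
    return Counter(frozenset(draw) for draw in draws)

def count_occurrences(ticket, draw_counts):
    """Return how many past draws match `ticket`, ignoring number order."""
    return draw_counts[frozenset(ticket)]

sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
    {3, 9, 10, 13, 20, 43},
]
draw_counts = build_draw_counts(sample_draws)

print(count_occurrences([3, 9, 10, 43, 13, 20], draw_counts))    # 1
print(count_occurrences([13, 22, 24, 27, 42, 44], draw_counts))  # 0
```

Building the counter costs one pass over the history; each later lookup is a single dictionary access.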


Lottery addicts usually play more than one ticket. We're going to calculate the chances of winning for any number of different tickets:
- the user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play)
- our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets)
- the function should print information about the probability of winning the big prize depending on the number of different tickets played

In [11]:
# Check that every ticket count is an integer between 1 and 13,983,816
def valid_number_tickets(list_tickets):
    set_tickets = set(list_tickets)
    for item in set_tickets:
        if (item < 1) or (item > 13983816):
            return False  # out of the valid range
    return True  # every value is within the valid range

In [12]:
# Create a function to print the probability based on the number of tickets played
def multi_ticket_probability(list_tickets):
    set_tickets = sorted(set(list_tickets))
    total_outcomes = combinations(49, 6)
    for item in set_tickets:
        probability = (item / total_outcomes) * 100
        print("The probability of {:,} different tickets is {:.7f}%".format(item, probability))

In [13]:
list_tickets_1 = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
multi_ticket_probability(list_tickets_1)

The probability of 1 different tickets is 0.0000072%
The probability of 10 different tickets is 0.0000715%
The probability of 100 different tickets is 0.0007151%
The probability of 10,000 different tickets is 0.0715112%
The probability of 1,000,000 different tickets is 7.1511238%
The probability of 6,991,908 different tickets is 50.0000000%
The probability of 13,983,816 different tickets is 100.0000000%
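
For use inside an app, it may be convenient to validate the ticket count and return a number rather than print a string. The following is a minimal sketch under that assumption; `multi_ticket_probability_pct` and `TOTAL_OUTCOMES` are hypothetical names, and `math.comb` (Python 3.8+) stands in for the project's `combinations()` helper.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_probability_pct(n_tickets):
    """Return the chance (in percent) of winning with `n_tickets` different tickets."""
    # Reject anything outside the range 1 to 13,983,816
    if not isinstance(n_tickets, int) or not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError(
            "n_tickets must be an integer between 1 and {:,}".format(TOTAL_OUTCOMES)
        )
    return n_tickets / TOTAL_OUTCOMES * 100

print("{:.7f}%".format(multi_ticket_probability_pct(6991908)))  # 50.0000000%
```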


Next, we write a function that calculates the probability of having fewer than six winning numbers (two, three, four, or five) on a single ticket.

In [14]:
def probability_less_6(number):
    # Ways to choose which of the 6 winning numbers appear on the ticket
    number_combination = combinations(6, number)
    # The remaining numbers on the ticket must come from the 43 non-winning numbers
    remaining_combination = combinations(43, 6 - number)
    successful_outcomes = number_combination * remaining_combination
    total_outcomes = combinations(49, 6)
    probability = (successful_outcomes / total_outcomes) * 100
    print("Your chances of having exactly {} winning numbers are {:.6f}%".format(number, probability))

In [15]:
for test_input in [2, 3, 4, 5]:
    probability_less_6(test_input)

Your chances of having exactly 2 winning numbers are 13.237803%
Your chances of having exactly 3 winning numbers are 1.765040%
Your chances of having exactly 4 winning numbers are 0.096862%
Your chances of having exactly 5 winning numbers are 0.001845%
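
The questions at the top of the project ask about having at least a given number of winning numbers. One way to answer that is to count the tickets that match exactly k numbers (k of the 6 winning numbers, and 6 - k of the 43 non-winning ones) and sum those counts from k up to 6. This is a sketch under that reading of the question; `exactly_k_outcomes` and `at_least_k_probability_pct` are hypothetical names, and `math.comb` (Python 3.8+) stands in for the project's `combinations()` helper.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)

def exactly_k_outcomes(k):
    """Count tickets sharing exactly k numbers with the winning draw."""
    # Choose k of the 6 winning numbers and 6 - k of the 43 non-winning numbers
    return math.comb(6, k) * math.comb(43, 6 - k)

def at_least_k_probability_pct(k):
    """Return the chance (in percent) of matching k or more winning numbers."""
    favourable = sum(exactly_k_outcomes(j) for j in range(k, 7))
    return favourable / TOTAL_OUTCOMES * 100

for k in [2, 3, 4, 5]:
    print("At least {} winning numbers: {:.6f}%".format(k, at_least_k_probability_pct(k)))
```

A useful consistency check is that the exact counts for k = 0 through 6 add up to the total number of possible tickets.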


That was all for the guided part of the project! We managed to write four main functions for our app:

- one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
- check_historical_occurrence(): checks whether a certain combination has occurred in the Canada lottery data set
- multi_ticket_probability(): calculates the probability for any number of tickets between 1 and 13,983,816
- probability_less_6(): calculates the probability of having two, three, four or five winning numbers