# Dataquest Guided Project: Mobile App for Lottery Addiction

This is a Dataquest guided project completed as part of the probability module. Our goal is to compute the probability of winning the Canadian 6/49 lottery for a variety of scenarios:

* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

We will start by defining functions for calculating factorials and combinations.

In [1]:
def factorial(n):
    x = 1
    for i in range(1,n+1):
        x *= i
    return x

def combinations(n,k):
    numerator = factorial(n)
    denominator = factorial(k)*factorial(n-k)
    # n! is always divisible by k!(n-k)!, so integer division keeps the count exact
    return numerator//denominator
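As a quick sanity check, Python's standard library (3.8 and later) already provides `math.factorial` and `math.comb`. The sketch below restates the two helpers (using integer division `//` so the count stays an exact integer) and compares them with the standard library for the values this project relies on. `math.comb` could also replace `combinations()` outright.

```python
import math

def factorial(n):
    x = 1
    for i in range(1, n + 1):
        x *= i
    return x

def combinations(n, k):
    # Integer division keeps the result exact: n! is divisible by k!(n-k)!
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare against the standard library for the (n, k) pairs used in this project
for n, k in [(49, 6), (6, 2), (43, 4), (6, 5), (43, 1)]:
    assert factorial(n) == math.factorial(n)
    assert combinations(n, k) == math.comb(n, k)

print(combinations(49, 6))  # 13983816
```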


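Next, we will use these functions to calculate the probability that a single ticket wins the big prize. Because this calculation will run inside an app, we will wrap it in a function, `one_ticket_probability()`, that takes the six numbers the user entered. Every valid ticket is exactly one of the `combinations(49,6)` = 13,983,816 equally likely combinations, so the user's numbers only appear in the printed message; they do not change the result.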

In [4]:
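def one_ticket_probability(user_input):
    # Every ticket is one of the C(49, 6) equally likely combinations,
    # and exactly one of them wins the big prize
    total_outcomes = combinations(49,6)
    successful_outcomes = 1
    probability_success = successful_outcomes/total_outcomes
    probability_success *= 100 #convert probability to percentage
    # user_input is only echoed back; it does not affect the probability
    print('''Your chances of winning with numbers {} are {:.7f}%.'''.format(
        user_input,probability_success))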
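Because `one_ticket_probability()` only echoes `user_input`, an app would probably want to reject malformed tickets before reporting odds. The helper below, `validate_ticket()`, is a hypothetical sketch and not part of the original project; it assumes a valid 6/49 ticket is six distinct plain Python integers from 1 to 49.

```python
def validate_ticket(numbers):
    """Hypothetical helper: return the ticket sorted, or raise ValueError if invalid.

    Assumes a valid 6/49 ticket is exactly six distinct integers from 1 to 49.
    """
    numbers = list(numbers)
    if len(numbers) != 6:
        raise ValueError('A 6/49 ticket needs exactly 6 numbers, got {}.'.format(len(numbers)))
    if not all(isinstance(x, int) and 1 <= x <= 49 for x in numbers):
        raise ValueError('Every number must be an integer from 1 to 49.')
    if len(set(numbers)) != 6:
        raise ValueError('Ticket numbers must not repeat.')
    return sorted(numbers)


print(validate_ticket([12, 10, 8, 6, 4, 2]))  # [2, 4, 6, 8, 10, 12]

try:
    validate_ticket([1, 1, 2, 3, 4, 5])
except ValueError as err:
    print(err)  # Ticket numbers must not repeat.
```

The app could call this first and pass the cleaned list on to `one_ticket_probability()`.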
    

We will now test our function with a couple of different inputs.

In [5]:
test_1 = [1,2,3,4,5,6]
one_ticket_probability(test_1)

Your chances of winning with numbers [1, 2, 3, 4, 5, 6] are 0.0000072%.


In [6]:
test_2 = [2,4,6,8,10,12]
one_ticket_probability(test_2)

Your chances of winning with numbers [2, 4, 6, 8, 10, 12] are 0.0000072%.


Our function works as expected: both tickets get the same probability, as every valid ticket of six distinct numbers should.
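Percentages with seven decimal places are hard to read. A minimal sketch using the standard library's `fractions.Fraction` and `math.comb` expresses the same single-ticket probability as exact odds, which may be easier for app users to grasp:

```python
from fractions import Fraction
from math import comb

# Exactly one of the C(49, 6) equally likely combinations wins the big prize
p_jackpot = Fraction(1, comb(49, 6))

print(p_jackpot)                                  # 1/13983816
print('1 in {:,}'.format(p_jackpot.denominator))  # 1 in 13,983,816
print('{:.7f}%'.format(float(p_jackpot) * 100))   # 0.0000072%
```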

We will now incorporate [historical lottery data](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) from Kaggle into our analysis. First, we will import and explore the dataset, which contains the winning numbers for the Canadian 6/49 lottery from 1982 to 2018. Each row represents a single drawing, and each of the six drawn numbers (plus the bonus number) has its own column.

In [8]:
import pandas as pd
df = pd.read_csv('649.csv')
df.shape #number of rows and columns

(3665, 11)

In [9]:
#first 3 rows
df.head(3)

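|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |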


In [10]:
#last 3 rows
df.tail(3)

|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


Next, we will convert each drawing into a Python set of its six winning numbers, producing a pandas Series of sets. We will do this by defining a function `extract_numbers()` and applying it to every row with `DataFrame.apply(..., axis=1)`.

In [16]:
def extract_numbers(row):
    # Positions 4-9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6 (the bonus number is excluded)
    num_list = row.iloc[4:10]
    num_set = set(num_list)
    return num_set
    
set_list = df.apply(extract_numbers, axis=1)
set_list

0        {3, 41, 11, 12, 43, 14}
1        {33, 36, 37, 39, 8, 41}
2         {1, 6, 39, 23, 24, 27}
3         {3, 9, 10, 43, 13, 20}
4        {34, 5, 14, 47, 21, 31}
5        {8, 41, 20, 21, 25, 31}
6       {33, 36, 42, 18, 25, 28}
7        {7, 40, 16, 17, 48, 31}
8        {5, 38, 37, 10, 23, 27}
9        {4, 37, 46, 15, 48, 30}
10        {33, 38, 7, 9, 42, 21}
11      {36, 11, 43, 17, 19, 20}
12       {37, 7, 14, 47, 17, 20}
13      {35, 44, 25, 28, 29, 30}
14       {36, 39, 8, 41, 47, 18}
15       {9, 12, 13, 14, 44, 48}
16       {4, 40, 43, 44, 14, 18}
17      {34, 35, 36, 13, 16, 18}
18      {36, 11, 23, 25, 28, 29}
19       {37, 7, 45, 18, 23, 25}
20      {37, 11, 45, 18, 19, 31}
21       {8, 14, 16, 48, 18, 31}
22       {4, 11, 45, 23, 24, 25}
23        {33, 34, 3, 4, 48, 19}
24       {5, 43, 17, 21, 28, 30}
25       {36, 6, 38, 46, 17, 24}
26        {4, 9, 10, 11, 43, 46}
27       {32, 33, 7, 13, 45, 23}
28      {35, 37, 11, 18, 22, 28}
29      {35, 45, 48, 25, 26, 31}
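Storing each drawing as a set is what makes the later comparison work: a ticket matches a drawing when it contains the same six numbers, regardless of the order in which they were picked or printed. A small illustration using the first drawing shown above:

```python
first_drawing = {3, 41, 11, 12, 43, 14}   # row 0 of the output above
ticket_as_list = [14, 43, 12, 11, 41, 3]  # same numbers, different order

print(ticket_as_list == [3, 11, 12, 14, 41, 43])  # False: list equality depends on order
print(set(ticket_as_list) == first_drawing)       # True: set equality ignores order
```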
          

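Now that we have extracted the historical winning numbers, we can check the user's ticket against this data. We will create a function `check_historical_occurence()` that takes the user's numbers and the Series of historical winning sets, prints how many times that exact combination has been drawn, and then reports the single-ticket probability.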

In [24]:
def check_historical_occurence(user_input,historical):
    user_set = set(user_input)
    # Count past drawings whose six numbers exactly match the ticket
    # (each True counts as 1 when summed)
    num_wins = sum(user_set == drawing for drawing in historical)
    print('''This number has won {} time(s) in the past.'''.format(num_wins))
    # Also report the theoretical probability for this ticket
    one_ticket_probability(user_input)
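`check_historical_occurence()` scans every drawing on each call. If the app checks many tickets, one alternative (a sketch, not the project's method) is to count each historical combination once with `collections.Counter`. Python sets are unhashable, so the sketch converts each drawing to a `frozenset`. The three drawings below are copied from the first rows of `set_list` shown earlier and stand in for the full Series; `count_past_wins()` is a hypothetical name.

```python
from collections import Counter

# A few drawings copied from the output above; in the notebook this would be set_list
historical = [
    {3, 41, 11, 12, 43, 14},
    {33, 36, 37, 39, 8, 41},
    {1, 6, 39, 23, 24, 27},
]

# frozenset is hashable, so it can be used as a Counter key (a plain set cannot)
win_counts = Counter(frozenset(drawing) for drawing in historical)

def count_past_wins(user_input, counts):
    """Hypothetical helper: how many past drawings exactly matched user_input."""
    return counts[frozenset(user_input)]  # Counter returns 0 for unseen keys

print(count_past_wins([3, 41, 11, 12, 43, 14], win_counts))  # 1
print(count_past_wins([1, 2, 3, 4, 5, 6], win_counts))       # 0
```

Building the Counter costs one pass over the history; after that, each lookup is a single hash instead of a scan over all 3,665 drawings.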

Now we will test this function with a few inputs.

In [25]:
check_historical_occurence(test_1,set_list)

This number has won 0 time(s) in the past.
Your chances of winning with numbers [1, 2, 3, 4, 5, 6] are 0.0000072%.


In [26]:
#numbers from the first drawing in the dataset (row 0), so we know they have won
test_3 = [3, 41, 11, 12, 43, 14]
check_historical_occurence(test_3,set_list)

This number has won 1 time(s) in the past.
Your chances of winning with numbers [3, 41, 11, 12, 43, 14] are 0.0000072%.


People who struggle with lottery addiction often play more than one ticket per drawing, hoping to increase their chances of winning. We want to write a function `multi_ticket_probability()` that calculates the probability of winning the big prize for a given number of different tickets played on a single drawing.

In [29]:
def multi_ticket_probability(tickets):
    total_outcomes = combinations(49,6)
    # Assumes every ticket is a different combination, so each one
    # covers a distinct possible winning outcome
    successful_outcomes = tickets
    probability_success = successful_outcomes/total_outcomes
    probability_success *= 100 #convert to percent
    print('''Your chances of winning with {} ticket(s) are {:.7f}%.'''.format(
        tickets,probability_success))
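`multi_ticket_probability()` does not enforce its assumptions: buying the same combination twice does not improve the odds, and values above 13,983,816 would print more than 100%. A hedged sketch of a variant that checks the input range and returns an exact fraction instead of printing (the name `multi_ticket_fraction()` is hypothetical; it still assumes the caller's tickets are all distinct):

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible 6/49 combinations

def multi_ticket_fraction(tickets):
    """Hypothetical variant: P(big prize) when playing `tickets` distinct
    combinations in a single drawing."""
    if not isinstance(tickets, int) or not 1 <= tickets <= TOTAL_OUTCOMES:
        raise ValueError('tickets must be an integer from 1 to {}'.format(TOTAL_OUTCOMES))
    return Fraction(tickets, TOTAL_OUTCOMES)

print(multi_ticket_fraction(6991908))   # 1/2
print(multi_ticket_fraction(13983816))  # 1
```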
    

Now we will test this function with several different numbers of tickets, from a single ticket up to every possible combination (13,983,816).

In [30]:
input_list = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for x in input_list:
    multi_ticket_probability(x)

Your chances of winning with 1 ticket(s) are 0.0000072%.
Your chances of winning with 10 ticket(s) are 0.0000715%.
Your chances of winning with 100 ticket(s) are 0.0007151%.
Your chances of winning with 10000 ticket(s) are 0.0715112%.
Your chances of winning with 1000000 ticket(s) are 7.1511238%.
Your chances of winning with 6991908 ticket(s) are 50.0000000%.
Your chances of winning with 13983816 ticket(s) are 100.0000000%.


The 6/49 lottery also has smaller prizes for matching two, three, four, or five of the winning numbers, so users may want to know the probability of these wins as well. We will define a function `probability_less_6()` that takes an integer n between 2 and 5 and prints the probability that a single ticket matches exactly n of the six winning numbers.
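The counting behind `probability_less_6()` is the hypergeometric formula: choose which n of the six winning numbers appear on the ticket, then fill the other 6 - n slots from the 43 numbers that were not drawn.

$$
P(\text{exactly } n \text{ winning numbers}) = \frac{\binom{6}{n}\binom{43}{6-n}}{\binom{49}{6}}
$$

For example, with n = 5 there are $\binom{6}{5}\binom{43}{1} = 6 \times 43 = 258$ successful tickets, and $258 / 13{,}983{,}816 \approx 0.0000184$, or about 0.0018450%.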

In [35]:
def probability_less_6(n):
    # ways to choose which n of the 6 winning numbers appear on the ticket
    combinations_per_ticket = combinations(6,n)
    # ways to fill the remaining 6-n slots from the 49-6=43 numbers that were not drawn
    combinations_remaining = combinations(43,6-n)
    successful_outcomes = combinations_per_ticket*combinations_remaining
    total_outcomes = combinations(49,6)
    probability = successful_outcomes/total_outcomes
    probability *= 100 #convert to percent
    print('''Your chances of winning {} numbers are {:.7f}%.'''.format(
        n,probability))
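One way to sanity-check the counting in `probability_less_6()` is to compute the same product for every possible number of matches, from 0 to 6. These cases are disjoint and together cover every ticket, so the counts must add up to C(49, 6); this is an instance of Vandermonde's identity. A sketch using `math.comb`:

```python
from math import comb

# Number of tickets that match exactly k of the 6 winning numbers
exact_counts = {k: comb(6, k) * comb(43, 6 - k) for k in range(7)}

for k, count in exact_counts.items():
    print(k, count)

# Every ticket matches some number of winning numbers from 0 to 6, exactly once
assert sum(exact_counts.values()) == comb(49, 6)
```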
    
    

Now we will test this function for all possible inputs `[2, 3, 4, 5]`.

In [36]:
possible_inputs = [2, 3, 4, 5]
for x in possible_inputs:
    probability_less_6(x)

Your chances of winning 2 numbers are 13.2378029%.
Your chances of winning 3 numbers are 1.7650404%.
Your chances of winning 4 numbers are 0.0968620%.
Your chances of winning 5 numbers are 0.0018450%.
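The introduction asks about having *at least* five (or four, or three, or two) winning numbers, while `probability_less_6()` reports the probability of matching *exactly* n numbers. Because the "exactly k" events are mutually exclusive, the "at least n" probability is their sum for k = n, ..., 6. A sketch with hypothetical helpers `probability_exactly()` and `probability_at_least()`:

```python
from fractions import Fraction
from math import comb

def probability_exactly(k):
    """P(a single ticket matches exactly k of the 6 winning numbers)."""
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

def probability_at_least(n):
    """Hypothetical helper: P(a single ticket matches n or more winning numbers)."""
    return sum(probability_exactly(k) for k in range(n, 7))

for n in [2, 3, 4, 5]:
    print('At least {} numbers: {:.7f}%'.format(n, float(probability_at_least(n)) * 100))
```

Using `Fraction` keeps the sums exact; conversion to a float happens only for display.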
