# Mobile App for Lottery Addiction

**In this project, we will apply probability to contribute to the development of a mobile app that is meant to help lottery 
addicts better estimate their chances of winning.**

> Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into 
> addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, start to 
> accumulate debts, and eventually engage in desperate behaviors like theft.

> A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery 
> addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they 
> need us to create the logical core of the app and calculate probabilities.

> For the first version of the app, they want us to focus on the [6/49 lottery](<https://en.wikipedia.org/wiki/Lotto_6/49>) and 
> build functions that enable users to answer questions like:

> * What is the probability of winning the big prize with a single ticket?
> * What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
> * What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

> The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set](<https://www.kaggle.com/datascienceai/lottery-dataset>) has data for 3,665 drawings, dating from 1982 to 2018.

> The scenario we're following throughout this project is fictional; the main purpose is to apply probability, permutation, 
> and combination concepts in a setting that simulates a real-world scenario.

> Thus **our goal in this project is to write code that can enable users to answer probability questions about playing the 
> lottery.** 

### Core Functions

> Throughout this project, we'll need to calculate probabilities and combinations repeatedly. As a consequence, we'll start by 
> writing two functions that we'll use often:

> * factorial() - A function that calculates factorials and
> * combinations() - A function that calculates combinations.

In [1]:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n*factorial(n-1)

> In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without 
> replacement, which means once a number is drawn, it's not put back in the set.

> To find the number of combinations when we're sampling without replacement and taking only k objects from a group of n 
> objects, we can use the formula:

> ```n! / (k! * (n-k)!)```

In [2]:
def combinations(n, k):
    return factorial(n) / (factorial(k) * factorial(n-k))
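
> As a quick sanity check, and assuming Python 3.8 or later, the result of these two helpers can be compared with the standard library's `math.comb`. The sketch below repeats both definitions so that it runs on its own.

```python
import math

def factorial(n):
    # Same recursive definition as in the cell above
    if n == 0:
        return 1
    return n * factorial(n - 1)

def combinations(n, k):
    # n! / (k! * (n-k)!); true division returns a float
    return factorial(n) / (factorial(k) * factorial(n - k))

total_outcomes = combinations(49, 6)
print(total_outcomes)       # 13983816.0
print(math.comb(49, 6))     # 13983816
```

> Both give 13,983,816 possible six-number combinations. The only difference is that our `combinations()` returns a float, while `math.comb` returns an integer.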

> Above we wrote factorial() and combinations(), two core functions that we're going to need repeatedly in this project. Now we 
> need to write a function that calculates the probability of winning the big prize.

> As discussed above, in the 6/49 lottery six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player 
> wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the 
> numbers {13, 22, 24, 27, 42, 44}, they win the big prize only if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even 
> one number differs, they don't win.

> For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the 
> various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we'll start by 
> building a function that calculates the probability of winning the big prize for any given single ticket only.

> We have discussed with the engineering team of the medical institute, and they told us we need to be aware of the following 
> details when we write the function:

> * Inside the app, the user inputs six different numbers from 1 to 49. 
> * Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function. 
> * The engineering team wants the function to print the probability value in a friendly way, so that lay people 
> without any probability training are able to understand it.
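
> The details above say the app supplies six different numbers from 1 to 49. If we wanted to guard against malformed input anyway, a check along these lines could run before any calculation. This is only a sketch: the name `validate_ticket` is hypothetical and is not one of the functions the engineering team asked for.

```python
def validate_ticket(lotterynum):
    """Hypothetical helper: True if lotterynum holds six different integers from 1 to 49."""
    if len(lotterynum) != 6:
        return False
    if len(set(lotterynum)) != 6:
        # at least one number is repeated
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in lotterynum)

print(validate_ticket([12, 5, 21, 10, 13, 18]))   # True
print(validate_ticket([12, 12, 21, 10, 13, 18]))  # False (repeated number)
print(validate_ticket([0, 5, 21, 10, 13, 50]))    # False (out of range)
```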

### Function that calculates the probability of winning the big prize for a single ticket

In [3]:
def one_ticket_probability(lotterynum):
    #all possible outcomes from the set of 49 numbers that range from 1 to 49.
    total_possible_outcomes = combinations(49 , 6)
    
    #the ticket passed in lotterynum is exactly one of the combinations counted in
    #total_possible_outcomes, so the number of successful outcomes is 1.
    #the probability of winning is the number of successful outcomes
    #divided by the total number of possible outcomes.
    probability_lotterynum = 1/total_possible_outcomes
    
    #calculate the percentage of probability
    percentage_probabilitylotterynum = probability_lotterynum * 100
    
    #user-friendly message for lay person to understand the probability of winning chance of the lotterynum
    print('Your chance to win the big prize with the lottery number {} is just {:.7f}% only.'.format(lotterynum, 
                                                                                       percentage_probabilitylotterynum))
    

In [4]:
testcases1 = [12, 5, 21, 10, 13, 18]
one_ticket_probability(testcases1)

Your chance to win the big prize with the lottery number [12, 5, 21, 10, 13, 18] is just 0.0000072% only.


In [5]:
testcases2 = [1, 33, 26, 7, 19, 2]
one_ticket_probability(testcases2)

Your chance to win the big prize with the lottery number [1, 33, 26, 7, 19, 2] is just 0.0000072% only.
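
> A percentage with seven decimal places can be hard for a lay person to picture. One possible alternative, shown here only as a sketch, is to state the same chance as "1 in N". The function name `one_ticket_odds` is hypothetical, and `math.comb` (Python 3.8+) stands in for our own `combinations()` so that the example runs on its own.

```python
import math

def one_ticket_odds(lotterynum):
    """Hypothetical variant: describe the chance as '1 in N' instead of a percentage."""
    total_possible_outcomes = math.comb(49, 6)
    return "Your chance to win the big prize with the numbers {} is 1 in {:,}.".format(
        lotterynum, total_possible_outcomes)

message = one_ticket_odds([12, 5, 21, 10, 13, 18])
print(message)
# Your chance to win the big prize with the numbers [12, 5, 21, 10, 13, 18] is 1 in 13,983,816.
```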


> Above we wrote a function that tells users the probability of winning the big prize with their lottery numbers. 

> For the first version of the app, however, users should also be able to compare their numbers against the historical 
> lottery data in Canada to find out whether that combination has ever won before.

> So let's explore the historical data coming from the Canada 6/49 lottery. The dataset can be downloaded from the [Kaggle](<https://www.kaggle.com/datascienceai/lottery-dataset>) website.

### Exploring the historical data coming from the Canada 6/49 lottery.

##### First let's get familiar with its structure.

In [6]:
#reading the dataset
import pandas as pd
lotterydraw_data = pd.read_csv('649.csv')

In [7]:
lotterydraw_data.shape

(3665, 11)

In [8]:
lotterydraw_data.head()

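| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |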


In [9]:
lotterydraw_data.tail()

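| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |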


> The dataset contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 
> 2018.

> For each drawing, we can find the six numbers drawn in the following six columns:
> * NUMBER DRAWN 1
> * NUMBER DRAWN 2
> * NUMBER DRAWN 3
> * NUMBER DRAWN 4
> * NUMBER DRAWN 5
> * NUMBER DRAWN 6

> We are now going to write a function that will enable users to compare their ticket against the historical lottery data in 
> Canada and determine whether that combination has ever won before.

> The engineering team told us about the following details, which need to be implemented in our function:

> * Inside the app, the user inputs six different numbers from 1 to 49.
> * Under the hood, the six numbers will come as a Python list and serve as an input to our function.
> * The engineering team wants us to write a function that prints:
>     * the number of times the combination selected occurred in 
>      the Canada dataset; and 
>     * the probability of winning the big prize in the next 
>      drawing with that combination.

##### Now let us find the occurrences of a ticket's numbers in the historical Canada dataset

In [10]:
#Extract the six winning numbers of each drawing from the historical dataset as Python sets
def extract_numbers(dfrow):
    #the columns at positions 4 to 9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6
    df_lotterynum = set(dfrow.iloc[4:10])
    return df_lotterynum

winning_lotterynum = lotterydraw_data.apply(extract_numbers, axis =1)
winning_lotterynum.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [11]:
def check_historical_occurrence(lotterynum, winning_lotterynum):
    '''
    lotterynum: a Python list
    winning_lotterynum: a pandas Series in which each data point is a set
    '''
    #converting lotterynum to a set, so the order of the numbers doesn't matter
    lotterynum = set(lotterynum)
    #Comparing the lotterynum set against the pandas Series winning_lotterynum, which contains the sets of winning numbers.
    #The comparison returns a Series of Boolean values (True wherever there is a match), so its sum is the number of matches.
    check_occurrence = winning_lotterynum == lotterynum
    findoccurrence = check_occurrence.sum()
    
    if findoccurrence == 0:
        print("The combination {} has never occurred."
              "\nThis doesn't guarantee that it is more likely to occur now."
              "\nYour chance to win the big prize in the next drawing using the combination {} is still 0.0000072% only."
              .format(lotterynum, lotterynum))
        
    else:
        print("The combination {} has occurred {} time(s) in the past."
              "\nBut your chance to win the big prize in the next drawing using the combination {} is still 0.0000072% only."
              .format(lotterynum, findoccurrence, lotterynum))

In [12]:
check_historical_occurrence(testcases1, winning_lotterynum)

The combination {5, 10, 12, 13, 18, 21} has never occurred.
This doesn't guarantee that it is more likely to occur now.
Your chance to win the big prize in the next drawing using the combination {5, 10, 12, 13, 18, 21} is still 0.0000072% only.


In [13]:
check_historical_occurrence(testcases2, winning_lotterynum)

The combination {1, 33, 2, 7, 19, 26} has never occurred.
This doesn't guarantee that it is more likely to occur now.
Your chance to win the big prize in the next drawing using the combination {1, 33, 2, 7, 19, 26} is still 0.0000072% only.
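
> Both test cases above report no match, so it may help to see the set comparison find one. The sketch below is plain Python with no pandas. It uses only the first five drawings, copied from the `head()` output above, and the helper name `count_matches` is hypothetical. Because sets ignore order, a ticket with the numbers of the first drawing in a different order still counts as a match.

```python
# First five drawings, copied from the head() output above
past_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]
winning_sets = [set(draw) for draw in past_draws]

def count_matches(lotterynum, winning_sets):
    """Hypothetical helper: count past drawings whose six numbers equal the ticket's."""
    ticket = set(lotterynum)
    return sum(ticket == draw for draw in winning_sets)

print(count_matches([43, 41, 14, 12, 11, 3], winning_sets))   # 1 (first drawing, shuffled)
print(count_matches([12, 5, 21, 10, 13, 18], winning_sets))   # 0
```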


> So far, we wrote two functions:

> * one_ticket_probability() — calculates the probability of winning the big prize with a single ticket 
> * check_historical_occurrence() — checks whether a certain combination has occurred as winning number in the Canada lottery 
> dataset.

> However, lottery addicts usually play more than one ticket in a single drawing, thinking that this might increase 
> their chances of winning significantly. Our purpose is to help them better estimate their chances of winning.

> Now we are going to write a function that will allow users to calculate the chances of winning for any number of 
> different tickets.

> We've talked with the engineering team and they gave us the following information:

> * The user will input the number of different tickets they want to play (without inputting the specific combinations they 
> intend to play).
> * Our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
> * The function should print information about the probability of winning the big prize depending on the number of different 
> tickets played.

> Let's now start writing this function.

### Function that prints the probability of winning the big prize depending on the number of different tickets played.

In [14]:
def multi_ticket_probability(num_tickets):
    #all possible outcomes from the set of 49 numbers 
    #that range from 1 to 49.
    total_possible_outcomes = combinations(49 , 6)
    
    #the number of successful outcomes is given by the number 
    #of tickets the user intends to play - num_tickets.
    #Use the number of successful outcomes and the total number 
    #of possible outcomes to calculate the probability for the 
    #number of tickets inputted.
    probability_numtickets = num_tickets / total_possible_outcomes
    
    #calculate the percentage of probability
    percentage_probabilitynumtickets = probability_numtickets * 100
    
    #user-friendly message for lay person to understand the
    #probability of winning chance based on the number of 
    #different tickets played.
    if num_tickets == 1:
        print("Your chances to win the big prize with one ticket is {:.6f}%."
        .format(percentage_probabilitynumtickets))
    else:   
        print("Your chances to win the big prize with {:,} different tickets are {:.6f}%."
              .format(num_tickets, percentage_probabilitynumtickets))

In [15]:
testcases = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for cases in testcases:
    multi_ticket_probability(cases)
    print()

Your chances to win the big prize with one ticket is 0.000007%.

Your chances to win the big prize with 10 different tickets are 0.000072%.

Your chances to win the big prize with 100 different tickets are 0.000715%.

Your chances to win the big prize with 10,000 different tickets are 0.071511%.

Your chances to win the big prize with 1,000,000 different tickets are 7.151124%.

Your chances to win the big prize with 6,991,908 different tickets are 50.000000%.

Your chances to win the big prize with 13,983,816 different tickets are 100.000000%.
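
> The same relationship can be read in reverse: how many different tickets would it take to reach a given chance of winning? The sketch below is one way to answer that. The helper name `tickets_needed` is hypothetical, and `math.comb` (Python 3.8+) stands in for our `combinations()` so that the example runs on its own.

```python
import math

def tickets_needed(target_probability):
    """Hypothetical helper: smallest number of different tickets whose chance of
    winning the big prize is at least target_probability (a value from 0 to 1)."""
    total_possible_outcomes = math.comb(49, 6)
    return math.ceil(target_probability * total_possible_outcomes)

print("{:,}".format(tickets_needed(0.5)))   # 6,991,908
print("{:,}".format(tickets_needed(1.0)))   # 13,983,816
```

> Reaching even a 50% chance takes 6,991,908 different tickets, which matches the test case above.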



> So far, we wrote three main functions:

> * one_ticket_probability() — calculates the probability of winning the big prize with a single ticket
> * check_historical_occurrence() — checks whether a certain combination has occurred as winning number in the Canada lottery 
> dataset
> * multi_ticket_probability() — calculates the probability for any number of tickets between 1 and 13,983,816

> Now we are going to write one more function to allow the users to calculate probabilities for two, three, four, or five 
> winning numbers.

> For extra context, in most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of 
> the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, 
> or five winning numbers.

> These are the engineering details we'll need to be aware of:

> * Inside the app, the user inputs:
>     * six different numbers from 1 to 49; and
>     * an integer between 2 and 5 that represents the number of winning numbers expected
> * Our function prints information about the probability of having the inputted number of winning numbers.

> How will we calculate the probability of having exactly five winning numbers?

> First, we need to differentiate between these two probability questions:

> * What is the probability of having exactly five winning numbers?
> * What is the probability of having at least five winning numbers?

> For our purposes here, we want to answer the first question.

> For the sake of example, let's say a player chose these six numbers on a ticket: (1, 2, 3, 4, 5, 6). Out of these six 
> numbers, we can form six five-number combinations:

> * (1, 2, 3, 4, 5)
> * (1, 2, 3, 4, 6)
> * (1, 2, 3, 5, 6)
> * (1, 2, 4, 5, 6)
> * (1, 3, 4, 5, 6)
> * (2, 3, 4, 5, 6)

> We can also find the total number of five-number combinations by calculating ("6 choose 5"):

> 6! / (5! * (6-5)!) = 6

> For each one of the six five-number combinations above, there are 44 possible successful outcomes in a lottery drawing. For 
> the combination (1, 2, 3, 4, 5), for instance, there are 44 lottery outcomes that would return a prize:

> * (1, 2, 3, 4, 5, 6)
> * (1, 2, 3, 4, 5, 7)
> * ...
> * (1, 2, 3, 4, 5, 30)
> * (1, 2, 3, 4, 5, 31)
> * ...
> * (1, 2, 3, 4, 5, 49)

> However, we need to leave out the outcome (1, 2, 3, 4, 5, 6) because we're only interested in outcomes that match exactly 
> five numbers, not at least five numbers. This means that for each of our six five-number combinations we have 43 possible 
> successful outcomes, not 44.

> Since there are six five-number combinations and each combination corresponds to 43 successful outcomes, we need to multiply 
> 6 by 43 to find the total number of successful outcomes:

> 6 x 43 = 258

> Since there are 258 successful outcomes and there are 13,983,816 total possible outcomes (the result of "49 choose 6"), the 
> probability of having exactly five winning numbers for a single lottery ticket is:

> 258 / (49 choose 6) = 0.00001845
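
> The counting argument above can be checked by listing the outcomes directly. The sketch below builds every drawing that contains exactly five numbers from the example ticket (1, 2, 3, 4, 5, 6) plus one number that is not on it, then counts them. The variable names are illustrative, and `itertools.combinations` is imported under the alias `iter_combinations` so that it doesn't clash with our own `combinations()` function.

```python
from itertools import combinations as iter_combinations
from math import comb

ticket = {1, 2, 3, 4, 5, 6}
other_numbers = set(range(1, 50)) - ticket   # the 43 numbers not on the ticket

# A drawing matches exactly five ticket numbers when it consists of
# five numbers from the ticket plus one number that is not on it.
exactly_five = {
    frozenset(five) | {extra}
    for five in iter_combinations(sorted(ticket), 5)
    for extra in other_numbers
}

successful_outcomes = len(exactly_five)
probability = successful_outcomes / comb(49, 6)

print(successful_outcomes)            # 258
print("{:.8f}".format(probability))   # 0.00001845
```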

> Now let's try to code the function. We've told the engineering team that, to calculate the probabilities, the specific 
> combination on the ticket is irrelevant behind the scenes, and we only need the integer between 2 and 5 representing the 
> number of winning numbers expected.

### Function to allow the users to calculate probabilities for two, three, four, or five winning numbers

In [16]:
def probability_less_6(n_winningnum):
    
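    #number of ways to choose which n_winningnum of the ticket's six numbers are drawn
    nwinningnum_combinations = combinations(6, n_winningnum)
    #number of ways to fill the remaining drawn numbers from the 43 numbers not on the ticket
    nwinningnum_combinations_remaining = combinations(43, 6 - n_winningnum)
    #each pairing of the two choices is one successful outcome
    successful_outcomes = nwinningnum_combinations * nwinningnum_combinations_remaining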
    
    #all possible outcomes from the set of 49 numbers 
    #that range from 1 to 49.
    total_possible_outcomes = combinations(49 , 6)
    
    #Calculate the probability using the number of 
    #successful outcomes and the number of total possible outcomes.
    probability_nwinningnum = successful_outcomes / total_possible_outcomes
    
    #calculate the percentage of probability
    nwinningnumprobability_percentage = probability_nwinningnum * 100
    
    
    #user-friendly message for lay person to understand the
    #probability of winning chance for two, three, four, 
    #or five winning numbers
    print("Your chances of having {} winning numbers with this ticket are {:.6f}%."
          .format(n_winningnum, nwinningnumprobability_percentage))

In [17]:
nwinningnum_testcases = [2, 3, 4, 5]

for cases in nwinningnum_testcases:
    probability_less_6(cases)
    print()

Your chances of having 2 winning numbers with this ticket are 13.237803%.

Your chances of having 3 winning numbers with this ticket are 1.765040%.

Your chances of having 4 winning numbers with this ticket are 0.096862%.

Your chances of having 5 winning numbers with this ticket are 0.001845%.
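
> The function above answers the "exactly" question. For the "at least" question we distinguished earlier, one possible approach is to add up the exact probabilities from the requested count up to six. This is only a sketch: the names `probability_exactly` and `probability_at_least` are hypothetical, and `math.comb` (Python 3.8+) stands in for our `combinations()` so that the example runs on its own.

```python
import math

def probability_exactly(k):
    """Probability (0 to 1) that exactly k of a ticket's six numbers are drawn."""
    return math.comb(6, k) * math.comb(43, 6 - k) / math.comb(49, 6)

def probability_at_least(k):
    """Hypothetical helper: probability of k or more winning numbers."""
    return sum(probability_exactly(i) for i in range(k, 7))

for k in [2, 3, 4, 5]:
    print("Your chances of having at least {} winning numbers are {:.6f}%."
          .format(k, probability_at_least(k) * 100))
```

> For five numbers the two answers differ by a single outcome: 258 drawings match exactly five numbers, and 259 match at least five, because the big-prize drawing is included.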



# Conclusion

> Our goal in this project was to write code that enables users to answer probability questions about playing 
> the lottery. Working on the first version of the app, we wrote four main functions:

> * one_ticket_probability() — calculates the probability of winning the big prize with a single ticket
> * check_historical_occurrence() — checks whether a certain combination has occurred as winning number in the Canada lottery 
> dataset
> * multi_ticket_probability() — calculates the probability for any number of tickets between 1 and 13,983,816
> * probability_less_6() — calculates the probability of having two, three, four or five winning numbers

> For the second version of the app, there are many other features we could add. For this first version, though, we 
> accomplished what we set out to do:

> *Use probability to help lottery addicts better estimate their chances of winning the lottery, so that they can think 
> logically before buying a ticket and gain some control over their addiction.*