# MOBILE APP FOR LOTTERY ADDICTION

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like:

-What is the probability of winning the big prize with a single ticket?
-What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
-What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The data set has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).


Since our goal is to write code that lets users answer probability questions about playing the lottery, we start by defining two helper functions: factorial(), and combinations(), which counts the ways to choose k items from n when order does not matter.

In [1]:
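def factorial(n):
    # Multiply n * (n-1) * ... * 1; returns 1 when n is 0
    product = 1
    for i in range(n, 0, -1):
        product *= i
    return product

def combinations(n, k):
    # Number of ways to choose k items from n when order does not matter
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # The quotient is always a whole number, so integer division keeps it exact
    return numerator // denominator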
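
As a quick sanity check on these helpers, the sketch below applies the same factorial-based formula and compares it with math.comb from Python's standard library (available from Python 3.8). The name combinations_check is hypothetical and used only for this illustration; it is not part of the app's code.

```python
import math

def combinations_check(n, k):
    # Hypothetical helper: same formula as combinations(), built on the standard library
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Number of possible 6/49 tickets
total_tickets = combinations_check(49, 6)
print(total_tickets)                      # 13983816
print(total_tickets == math.comb(49, 6))  # True
```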


In the 6/49 lottery, six numbers are drawn without replacement from a set of 49 numbers that range from 1 to 49. A player wins the big prize only if the six numbers on their ticket match all six numbers drawn. If even one number differs, the ticket does not win the big prize.

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket, so we will start by building a function that calculates the probability of winning the big prize for any given ticket.

We discussed this with the institute's engineering team, who told us to keep the following details in mind when writing the function:

-Inside the app, the user inputs six different numbers from 1 to 49.
-Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
-The engineering team wants the function to print the probability value in a friendly way, one that people without any probability training are able to understand.

## CHECK WHETHER TICKET IS VALID

Since no number can be repeated on a ticket, we first need a function that checks whether the user's input forms a valid ticket: six different whole numbers from 1 to 49. Accepting arbitrary input would make the app less reliable.

In [2]:
def valid_ticket():
    separator = "*********************************"
    print("Please enter your 6 ticket numbers: ")
    print(separator)
    numbers = []
    # Keep asking until we have six distinct numbers between 1 and 49
    while len(numbers) < 6:
        entry = input('Enter ticket number {}: '.format(len(numbers) + 1))
        print(separator)
        try:
            number = int(entry)
        except ValueError:
            # int() raises ValueError for anything that is not a whole number
            print("The input is not valid.")
            print(separator)
            continue
        if number not in range(1, 50):
            print("The number must be in the range from 1 to 49.")
        elif number in numbers:
            print("The number exists already.")
        else:
            numbers.append(number)
            continue
        print(separator)
    return numbers
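
The engineering notes above say the six numbers will eventually arrive as a Python list rather than through input(). A minimal sketch of a non-interactive check under that assumption follows. The name is_valid_ticket is hypothetical, not something the engineering team specified.

```python
def is_valid_ticket(numbers):
    # Hypothetical helper: True if `numbers` holds six distinct integers from 1 to 49
    if len(numbers) != 6:
        return False
    if any(type(n) is not int for n in numbers):
        return False
    if any(n < 1 or n > 49 for n in numbers):
        return False
    # Converting to a set drops duplicates, so the size shrinks if any number repeats
    return len(set(numbers)) == 6

print(is_valid_ticket([3, 11, 12, 14, 41, 43]))  # True
print(is_valid_ticket([3, 3, 12, 14, 41, 43]))   # False (repeated number)
print(is_valid_ticket([0, 11, 12, 14, 41, 50]))  # False (out of range)
```

Returning a boolean instead of printing lets the app decide how to report a problem to the user.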

In [3]:
# Six of the 49 possible numbers are drawn without replacement,
# so the same number cannot appear twice
def one_ticket_probability():
    ticket = valid_ticket()
    total_outcomes = combinations(49, 6)
    # Exactly one of the equally likely combinations matches the ticket
    probability = 1 / total_outcomes
    print('Your chance of winning the big prize is: {:.7f}%. In other words, you have a 1 in {:,} chance of winning.'.format(probability * 100, int(total_outcomes)))
    return ticket
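
Very small floats print in scientific notation by default in Python (for example 7e-06), which is hard to read for people without probability training. The sketch below shows one way to keep the output friendly with fixed-point formatting. The name one_ticket_message is hypothetical, and math.comb stands in for our combinations() helper so the example runs on its own.

```python
import math

def one_ticket_message():
    # Hypothetical helper: builds the message instead of printing it
    total_outcomes = math.comb(49, 6)  # 13,983,816 possible tickets
    probability_percent = 100 / total_outcomes
    return ('Your chance of winning the big prize is {:.7f}%. '
            'In other words, you have a 1 in {:,} chance of winning.'
            ).format(probability_percent, total_outcomes)

print(one_ticket_message())
```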

## USING THE CANADA DATA SET

In the previous cells, we created functions that show users their chance of winning the big prize with a single ticket. We also want users to be able to compare their ticket against the historical data from Canada and see whether that combination would ever have won.

This data set contains historical data for 3,665 drawings dating from 1982 to 2018 (the data set can be seen through the link: https://www.kaggle.com/datascienceai/lottery-dataset).

In [4]:
import pandas as pd
lottery_df = pd.read_csv('649.csv')
lottery_df.shape

(3665, 11)

In [5]:
print(lottery_df.head(3))
print(lottery_df.tail(3))

   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
0      649            1                0  6/12/1982               3   
1      649            2                0  6/19/1982               8   
2      649            3                0  6/26/1982               1   

   NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \
0              11              12              14              41   
1              33              36              37              39   
2               6              23              24              27   

   NUMBER DRAWN 6  BONUS NUMBER  
0              43            13  
1              41             9  
2              39            34  
      PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
3662      649         3589                0  6/13/2018               6   
3663      649         3590                0  6/16/2018               2   
3664      649         3591                0  6/20/2018              14   

     

Now that we have opened and explored the lottery data, we're going to write a function that lets users compare their ticket against the historical lottery data in Canada to determine whether their ticket would ever have won by now.

The engineering team told us that we need to be aware of the following details:
-Inside the app, the user inputs six different numbers from 1-49
-Under the hood, the six numbers will come as a Python list and serve as an input to our function
-The engineering team wants us to write a function that prints:
    -The number of times the combination selected occurred in the Canada data set
    -The probability of winning the big prize in the next drawing with that combination.

## EXTRACTING THE NUMBERS AND CREATING THE FUNCTION

We will need to first extract all the winning six numbers from the historical data set as Python sets.

In [None]:
def extract_numbers(row):
    # Columns 4 to 9 hold the six winning numbers (NUMBER DRAWN 1 to 6).
    # A set ignores order, so a ticket can be compared regardless of draw order.
    winning_numbers = set()
    for i in range(4, 10):
        winning_numbers.add(row.iloc[i])
    return winning_numbers

lottery_df['winning_numbers'] = lottery_df.apply(extract_numbers,axis=1)
lottery_df.winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
Name: winning_numbers, dtype: object

## CHECKING HISTORICAL OCCURRENCE

Now that we have extracted all the winning number sets and saved them to another column, we need to determine whether the user's ticket has ever won.

In [None]:
def check_historical_occurrence(winning_numbers=lottery_df.winning_numbers):
    ticket = one_ticket_probability()
    numbers = set(ticket)
    # Compare the ticket against each historical draw; set equality ignores order
    matches = winning_numbers.apply(lambda drawn: drawn == numbers)
    n_matches = matches.sum()
    print('The combination {} has occurred {} time(s) previously'.format(numbers, n_matches))
    if n_matches == 0:
        print('The combination has never occurred before. This does not make it more likely to win in the next drawing.')
    print('********************************')
    print('Your chance to win the big prize in the next drawing with these numbers is still 0.0000072%')

check_historical_occurrence()


Please enter your 6 ticket numbers: 
*********************************
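
Since the interactive cell above waits for user input, here is a minimal non-interactive sketch of the same comparison. It assumes the ticket arrives as a Python list, as the engineering team described. The name count_historical_matches is hypothetical, and the sample holds only the first three draws shown in the data preview, not the full data set.

```python
def count_historical_matches(ticket, past_draws):
    # Hypothetical helper: `ticket` is a list of six numbers,
    # `past_draws` is any iterable of sets of winning numbers
    numbers = set(ticket)
    return sum(1 for drawn in past_draws if drawn == numbers)

# The first three draws shown in the data set preview
sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

print(count_historical_matches([43, 41, 14, 12, 11, 3], sample_draws))  # 1
print(count_historical_matches([1, 2, 3, 4, 5, 6], sample_draws))       # 0
```

Because sets ignore order, the ticket matches the first draw even though its numbers are listed in reverse.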


## MULTIPLE TICKET PROBABILITY

Lottery addicts usually play more than one ticket on a single drawing, believing that this might significantly increase their chances of winning. We have already written functions that show the chance of winning with a single ticket and how often that ticket's combination has occurred historically. Since the main purpose of the app is to help users better understand their chances, we now need a function that calculates the chance of winning for any number of different tickets.

The engineering team has given us the following feedback:
-The user will input the number of different tickets they want to play
-Our function will see an integer between 1 and 13,983,816 (maximum number of different tickets)
-The function should print information about the probability of winning the big prize depending on the number of different tickets played


In [None]:
def multi_ticket_probability():
    total_outcomes = int(combinations(49, 6))
    entry = input('Enter the number of different tickets you are going to play: ')
    try:
        n = int(entry)
    except ValueError:
        n = 0  # non-numeric input falls through to the error message below
    if 1 <= n <= total_outcomes:
        # Each different ticket covers one of the equally likely outcomes
        chances = n * 100 / total_outcomes
        print('Your chances of winning by playing {:,} ticket(s) are {:.8f}%.'.format(n, chances))
    else:
        print('Please enter a valid and reasonable number of tickets')

multi_ticket_probability()
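
To see how slowly the chances grow with the number of tickets, the sketch below computes the probability for a few ticket counts without asking for input. The name multi_ticket_probability_value is hypothetical, and math.comb stands in for our combinations() helper so the example is self-contained.

```python
import math

def multi_ticket_probability_value(n_tickets):
    # Hypothetical helper: probability (0 to 1) of winning the big prize
    # when playing n_tickets different tickets in one drawing
    total_outcomes = math.comb(49, 6)
    if not 1 <= n_tickets <= total_outcomes:
        raise ValueError('n_tickets must be between 1 and {:,}'.format(total_outcomes))
    return n_tickets / total_outcomes

for n_tickets in [1, 40, 6991908, 13983816]:
    probability = multi_ticket_probability_value(n_tickets)
    print('{:,} ticket(s): {:.6f}%'.format(n_tickets, probability * 100))
```

Only at 6,991,908 different tickets, half of all possible combinations, does the chance of winning reach 50%.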


## SMALLER PRIZE PROBABILITY

So far, we have developed functions that let users determine their chances of winning the big prize by matching all six numbers. However, most lotteries also award smaller prizes when a ticket matches 2, 3, 4, or 5 of the six numbers drawn, so users might want to know the probability of having exactly 2, 3, 4, or 5 winning numbers.

The engineering details we need to be aware of:
-Inside the app, the user inputs:
    -six different numbers between 1 and 49; and
    -an integer between 2 and 5 that represents the number of winning numbers expected
-Our function prints information about the probability of having the inputted number of winning numbers.

To calculate the probabilities, we tell the engineering team that the specific combination on the ticket is irrelevant behind the scenes, and we only need the integer between 2 and 5 representing the number of winning numbers expected.
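
This reasoning can be written as a counting formula. To have exactly k winning numbers, a ticket must contain k of the 6 drawn numbers and 6 - k of the 43 numbers that were not drawn, out of all possible tickets:

$$
P(\text{exactly } k) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}, \qquad k \in \{2, 3, 4, 5\}
$$

For example, with k = 5 there are 6 × 43 = 258 successful outcomes out of 13,983,816, which is roughly a 1 in 54,201 chance.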

In [None]:
def probability_less_6(n):
    # Ways to pick n of the 6 drawn numbers ...
    n_combinations = combinations(6, n)
    # ... times the ways to pick the other 6 - n from the 43 numbers not drawn
    n_combinations_remaining = combinations(43, 6 - n)
    successful_outcomes = n_combinations * n_combinations_remaining

    n_combinations_total = combinations(49, 6)
    probability_percent = successful_outcomes / n_combinations_total * 100
    combinations_simplified = round(n_combinations_total / successful_outcomes)
    print('Your chances of having exactly {} winning numbers with this ticket are {:.6f}%.\nIn other words, you have a 1 in {:,} chance of this outcome.'.format(n, probability_percent, int(combinations_simplified)))

for number in [2,3,4,5]:
    probability_less_6(number)
    print('****************************')
          
    

## NEXT STEPS

For the first version of the app, we coded four main functions:

-one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
-check_historical_occurrence(): checks whether a certain combination has occurred in the Canada lottery data set
-multi_ticket_probability(): calculates the probability of winning the big prize for any number of different tickets between 1 and 13,983,816
-probability_less_6(): calculates the probability of having exactly two, three, four or five winning numbers

Possible features for a second version of the app include:

-Making the outputs even easier to understand by adding fun analogies, such as comparing the chance of winning the lottery with the probability of some other rare event (for instance, output along the lines of "You are 100 times more likely to be the victim of a shark attack than to win the lottery")
-Combining one_ticket_probability() and check_historical_occurrence() to output information on probability and historical occurrence at the same time
-Creating a function similar to probability_less_6() that calculates the probability of having at least two, three, four or five winning numbers. Hint: the number of successful outcomes for having at least four winning numbers is the sum of these three numbers:
    -The number of successful outcomes for having exactly four winning numbers
    -The number of successful outcomes for having exactly five winning numbers
    -The number of successful outcomes for having exactly six winning numbers
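
The hint for the "at least" feature can be sketched as follows. This is one possible approach, not a finished design: the names successful_outcomes_exactly and probability_at_least are hypothetical, and math.comb stands in for our combinations() helper so the example runs on its own.

```python
import math

def successful_outcomes_exactly(k):
    # Tickets that share exactly k numbers with the draw:
    # k of the 6 drawn numbers and 6 - k of the 43 numbers not drawn
    return math.comb(6, k) * math.comb(43, 6 - k)

def probability_at_least(k):
    # Hypothetical helper: probability (0 to 1) of at least k winning numbers,
    # summing the exact cases k, k + 1, ..., 6
    successful = sum(successful_outcomes_exactly(j) for j in range(k, 7))
    return successful / math.comb(49, 6)

for k in [2, 3, 4, 5]:
    print('At least {}: {:.6f}%'.format(k, probability_at_least(k) * 100))
```

For at least four winning numbers, the successful outcomes add up to 13,545 + 258 + 1 = 13,804.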