# Mobile App for Lottery Addiction

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the [6/49](https://en.wikipedia.org/wiki/Lotto_6/49) lottery and build functions that enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. [The data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).

Throughout this project, we'll need to calculate probabilities and combinations repeatedly, so we'll start by writing two functions that we'll use often:
- A function that calculates factorials
- A function that calculates combinations

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set.

In [1]:
def factorial(n):
    result = 1
    for i in range(n,0,-1):
        result *= i
    return result

def combinations(n, k):
    return factorial(n)/(factorial(k)*(factorial(n-k)))
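
Because `combinations` uses true division (`/`), it returns a float, which is why some later output prints the total as 13,983,816.0. As a minimal sketch, assuming Python 3.8+ so that `math.comb` is available, an integer-only variant can be cross-checked against the standard library. The name `combinations_exact` is hypothetical and is not part of the original notebook.

```python
import math

def factorial(n):
    """Return n! for a non-negative integer n."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def combinations_exact(n, k):
    """n choose k using floor division, so the result stays an int."""
    return factorial(n) // (factorial(k) * factorial(n - k))

total = combinations_exact(49, 6)
print(total)                      # 13983816
print(total == math.comb(49, 6))  # True
```

The floor division is exact here because k! * (n-k)! always divides n! evenly.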

We will now calculate the probability of winning the big prize in the 6/49 Lottery, in which the player wins if all six numbers on their ticket match the six numbers drawn, in any order.

In the app, the user will input six different numbers from 1 to 49. Under the hood, the numbers will come as a Python list, which will serve as the single input to our function. The engineering team wants the function to print the probability in a way that people without probability training can understand.

In [7]:
def one_ticket_probability(nums):
    num_combinations = combinations(49,6)
    prob_pct = 100/num_combinations
    
    print('''Your chances to win the big prize with the numbers {} are {:.7f}%.
In other words, you have a 1 in {:,} chance of winning.'''.format(nums,
                    prob_pct, int(num_combinations)))

In [8]:
# Test one_ticket_probability:

one_ticket_probability([32,25,20,3,42,1])

Your chances to win the big prize with the numbers [32, 25, 20, 3, 42, 1] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance of winning.


Here, the number of possible six-number combinations, ignoring order, is given by our combinations function. The probability that any single ticket wins is 1 divided by that number, which I multiplied by 100 to express it as a percentage.

### Comparing with historical data

For the first version of the app, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now. The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018.

First, I will load the data set and look at its shape and its first and last rows.

In [24]:
import pandas as pd
lottery = pd.read_csv('649.csv')
print(lottery.shape)
print(lottery.head(3))
print(lottery.tail(3))

(3665, 11)
   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
0      649            1                0  6/12/1982               3   
1      649            2                0  6/19/1982               8   
2      649            3                0  6/26/1982               1   

   NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \
0              11              12              14              41   
1              33              36              37              39   
2               6              23              24              27   

   NUMBER DRAWN 6  BONUS NUMBER  
0              43            13  
1              41             9  
2              39            34  
      PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
3662      649         3589                0  6/13/2018               6   
3663      649         3590                0  6/16/2018               2   
3664      649         3591                0  6/20/2018              1

We're going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now. The engineering team wants us to write a function that prints the number of times the combination selected occurred in the Canada data set, and the probability of winning the big prize in the next drawing with that combination.

In [32]:
def extract_numbers(row):
    row = row[4:10]
    row = set(row.values)
    return row


def check_historical_occurance(user_list, winning_nums):
    user_set = set(user_list)
    check_win = winning_nums == user_set
    num_wins = check_win.sum()
    
    num_combinations = combinations(49,6)
    prob_pct = 100/num_combinations
        
    if num_wins == 0:
        print('''The combination {} has never occured. Your chances to win the big prize in the next drawing using the combination {} are {:.7f}%, which is a 1 in {:,} chance of winning.'''.format(user_list, user_list, prob_pct, num_combinations))
        
    else:
        print('''The number of times the combination {} has occured in the past is {}.
Your chances to win the big prize in the next drawing using the combination {} are {:.7f}%, which is a 1 in {:,} chance of winning.'''.format(user_list, num_wins, user_list, prob_pct, num_combinations))
    

In [39]:
winning_nums = lottery.apply(extract_numbers, axis=1)

from random import randint
for i in range(6):
    print('Attempt ', i+1)
    nums = []
    for j in range(6):
        nums.append(randint(1,49))

    check_historical_occurance(nums, winning_nums)
    print('\n')

Attempt  1
The combination [13, 34, 9, 24, 4, 8] has never occured. Your chances to win the big prize in the next drawing using the combination [13, 34, 9, 24, 4, 8] are 0.0000072%, which is a 1 in 13,983,816.0 chance of winning.


Attempt  2
The combination [6, 1, 31, 25, 35, 21] has never occured. Your chances to win the big prize in the next drawing using the combination [6, 1, 31, 25, 35, 21] are 0.0000072%, which is a 1 in 13,983,816.0 chance of winning.


Attempt  3
The combination [30, 24, 30, 34, 18, 45] has never occured. Your chances to win the big prize in the next drawing using the combination [30, 24, 30, 34, 18, 45] are 0.0000072%, which is a 1 in 13,983,816.0 chance of winning.


Attempt  4
The combination [3, 47, 42, 25, 49, 3] has never occured. Your chances to win the big prize in the next drawing using the combination [3, 47, 42, 25, 49, 3] are 0.0000072%, which is a 1 in 13,983,816.0 chance of winning.


Attempt  5
The combination [12, 29, 36, 38, 33, 47] has never 

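In these last two steps, I first created a function to extract each drawing's winning numbers as a set, and another function to check whether a user-given list matches any past winning combination. I then tested it six times using random numbers from 1 to 49. Because randint can return the same number more than once, some of the test tickets (attempts 3 and 4) contain a repeated number and so are not valid 6/49 tickets.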
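
To test with valid tickets only, one option is `random.sample`, which draws without replacement. The sketch below is self-contained and does not use pandas. The helper names `random_ticket` and `count_past_wins` are hypothetical, and `sample_draws` is a small stand-in built from the first three rows of the data set shown earlier, not the full history.

```python
import random

def random_ticket():
    """Return six distinct numbers from 1 to 49 (a valid 6/49 ticket)."""
    return random.sample(range(1, 50), 6)

def count_past_wins(ticket, past_draws):
    """Count how many past draws (an iterable of sets) equal the ticket's numbers."""
    ticket_set = set(ticket)
    return sum(1 for draw in past_draws if draw == ticket_set)

# Stand-in for the historical data: the first three drawings in the data set.
sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

ticket = random_ticket()
print(ticket, count_past_wins(ticket, sample_draws))

# Order does not matter, so a shuffled copy of the first draw still counts as a match.
print(count_past_wins([43, 3, 41, 12, 11, 14], sample_draws))  # 1
```

Comparing sets rather than lists is what makes the check independent of the order in which the user enters the numbers.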

I will now calculate the probability of a user winning with multiple tickets.

In [44]:
def multi_ticket_probability(num_tickets):
    num_outcomes = combinations(49,6)
    probability = num_tickets / num_outcomes
    prob_pct = probability * 100
    print('''Using {} tickets, your probability of winning is {:.7f}%, which is a 1 in {} chance of winning'''.format(
    num_tickets, prob_pct, int(num_outcomes // num_tickets)))
    
test = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for i in test:
    multi_ticket_probability(i)

Using 1 tickets, your probability of winning is 0.0000072%, which is a 1 in 13983816 chance of winning
Using 10 tickets, your probability of winning is 0.0000715%, which is a 1 in 1398381 chance of winning
Using 100 tickets, your probability of winning is 0.0007151%, which is a 1 in 139838 chance of winning
Using 10000 tickets, your probability of winning is 0.0715112%, which is a 1 in 1398 chance of winning
Using 1000000 tickets, your probability of winning is 7.1511238%, which is a 1 in 13 chance of winning
Using 6991908 tickets, your probability of winning is 50.0000000%, which is a 1 in 2 chance of winning
Using 13983816 tickets, your probability of winning is 100.0000000%, which is a 1 in 1 chance of winning


In most 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers, so I am going to write one more function to calculate these probabilities.

To accomplish this, I will first find the number of ways to choose n winning numbers out of the 6 numbers drawn, using combinations(6,n). The remaining 6-n numbers on the ticket must come from the 49-6 = 43 numbers that were not drawn, which can happen in 43 choose (6-n) ways. The total number of successful outcomes is the product of these two values. Dividing that product by the total number of possible outcomes, 49 choose 6, gives the probability of matching exactly n numbers.
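
The counting argument can be restated as a single formula in standard binomial notation. This only rewrites the same reasoning and adds nothing to the method:

$$
P(\text{exactly } n \text{ matches}) = \frac{\binom{6}{n}\,\binom{43}{6-n}}{\binom{49}{6}}
$$

For example, with n = 5 there are 6 × 43 = 258 successful outcomes out of 13,983,816, so

$$
P(\text{exactly 5 matches}) = \frac{258}{13{,}983{,}816} \approx 0.0000184 = 0.00184\%
$$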

In [47]:
def probabilities_less_6(n):
    num_comb_tic = combinations(6,n)
    num_comb_remaining = combinations(43, 6 - n) #43 because 49 - 6 numbers picked = 43 remaining numbers
    num_succ_outcomes = num_comb_tic * num_comb_remaining
    
    total_comb = combinations(49,6)
    probability = num_succ_outcomes / total_comb
    prob_pct = probability * 100
    
    print('''Your chances of having {} winning numbers with this ticket are {:.5f}%, which is a 1 in {:,} chances to win.'''.format(
        n, prob_pct, int(total_comb // num_succ_outcomes)))
    
for i in range(2,6):
    probabilities_less_6(i)
    

Your chances of having 2 winning numbers with this ticket are 13.23780%, which is a 1 in 7 chances to win.
Your chances of having 3 winning numbers with this ticket are 1.76504%, which is a 1 in 56 chances to win.
Your chances of having 4 winning numbers with this ticket are 0.09686%, which is a 1 in 1,032 chances to win.
Your chances of having 5 winning numbers with this ticket are 0.00184%, which is a 1 in 54,200 chances to win.


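Matching exactly two numbers is fairly likely (about 13.2%, or roughly 1 in 7), and matching exactly three happens about 1.8% of the time (1 in 56). Beyond that the odds fall off sharply: about 0.097% (1 in 1,032) for four matches and 0.0018% (1 in 54,200) for five.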
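
The function above gives the probability of matching exactly n numbers, while the original question asked about having at least n winning numbers. One way to answer that, sketched here under the assumption that "at least n" means n or more matches including all six, is to sum the exact probabilities. The names `prob_exactly` and `prob_at_least` are hypothetical, and `math.comb` requires Python 3.8+.

```python
import math

def prob_exactly(n):
    """Probability that a single ticket matches exactly n of the 6 drawn numbers."""
    return math.comb(6, n) * math.comb(43, 6 - n) / math.comb(49, 6)

def prob_at_least(n):
    """Probability of matching n or more numbers: sum the exact cases n..6."""
    return sum(prob_exactly(k) for k in range(n, 7))

for n in range(2, 6):
    print('At least {} winning numbers: {:.5f}%'.format(n, prob_at_least(n) * 100))
```

The cases are mutually exclusive (a ticket cannot match exactly four and exactly five numbers at once), which is why the probabilities can simply be added.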