# Mobile App for Lottery Addiction

This project is built around a fictional case and aims to apply probability concepts to a simulated real-world scenario.

CASE: <br>
Many people start playing the lottery for fun and gradually develop a habit, which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, start to accumulate debts, and eventually engage in desperate behaviours such as theft and other criminal activities.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers who will build the app, but they need us to create the logical core of the app and calculate the probabilities. For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions such as:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least 5 (or 4, or 3, or 2) winning numbers on a single ticket?

The institute wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018. 

## Core Functions

We are going to write 2 functions that we'll be using frequently:
- *factorial()* - a function that calculates factorials
- *combinations()* - a function that calculates combinations

In [1]:
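def factorial(n):
    # Multiply n * (n-1) * ... * 1; returns 1 when n is 0.
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # Number of ways to choose k items out of n, ignoring order.
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # The quotient is always a whole number, so integer division is exact.
    return numerator // denominator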
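
As a quick sanity check, the two helpers can be compared against Python's built-in `math.comb` (available from Python 3.8). The sketch below re-declares the helpers only to stay self-contained, and it uses integer division so that the comparison is exact; treat it as an illustration rather than part of the app's core.

```python
import math

def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # Integer division keeps the result an exact int.
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare with the standard library on a few inputs, including 6/49.
for n, k in [(5, 2), (10, 3), (49, 6)]:
    assert combinations(n, k) == math.comb(n, k)

total_outcomes = combinations(49, 6)
print(f"{total_outcomes:,}")  # 13,983,816
```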

## One-ticket Probability

We need to build a function that calculates the probability of winning the big prize for any given ticket. For each drawing, six numbers are drawn from a set of 49, and a player wins the big prize if the six numbers on their ticket match all six numbers drawn.

In [11]:
def one_ticket_prob(user_numbers):
    n_combinations = combinations(49,6)
    probability_one_ticket = 1/n_combinations
    percentage_form =probability_one_ticket *100
    
    print('''Your chances to win the big prize with the numbers {} are {:.7f}%. 
In other words, you have a 1 in {:,} chances to win.'''.format(user_numbers, percentage_form, int(n_combinations)))

We now test the function on a sample input:

In [12]:
test_input_1 =[2, 43, 22, 23, 11, 5]
one_ticket_prob(test_input_1)

Your chances to win the big prize with the numbers [2, 43, 22, 23, 11, 5] are 0.0000072%. 
In other words, you have a 1 in 13,983,816 chances to win.
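
`one_ticket_prob()` prints the numbers it receives but does not check them. A minimal sketch of an input check is shown below. `is_valid_ticket` is a hypothetical helper that is not part of the original notebook, and it assumes a valid ticket is six distinct integers between 1 and 49.

```python
def is_valid_ticket(user_numbers):
    """Return True if user_numbers looks like a valid 6/49 ticket.

    Hypothetical helper: assumes a ticket is six distinct integers in 1..49.
    """
    if len(user_numbers) != 6:
        return False
    if len(set(user_numbers)) != 6:
        return False  # at least one number is repeated
    return all(isinstance(n, int) and 1 <= n <= 49 for n in user_numbers)

print(is_valid_ticket([2, 43, 22, 23, 11, 5]))   # True
print(is_valid_ticket([2, 2, 22, 23, 11, 5]))    # False (duplicate)
print(is_valid_ticket([0, 43, 22, 23, 11, 50]))  # False (out of range)
```

The app could call such a check before computing or displaying any probability.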


## Historical Data Check for Canada Lottery

In [20]:
import pandas as pd

lottery_canada = pd.read_csv('649.csv')
lottery_canada.info()
lottery_canada.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.1+ KB


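| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |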


### Function for Historical Data Check

We will begin by extracting all the winning numbers from the lottery data set. The extract_num() function goes over each row of the dataframe and extracts the 6 winning numbers as a Python set.

In [24]:
def extract_num(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1-6; the bonus number is excluded.
    row = row.iloc[4:10]
    row = set(row.values)
    return row

winning_num = lottery_canada.apply(extract_num, axis=1)
winning_num.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [32]:
def check_historical_occurrence(user_numbers, historical_numbers):
    '''
    user_numbers: a Python list
    historical_numbers: a pandas Series of sets
    '''
    user_numbers_set = set(user_numbers)
    check_occurrence = historical_numbers == user_numbers_set
    n_occurrences = check_occurrence.sum()
    
    if n_occurrences == 0:
        print('''The combination {} has never occurred.
This does not mean it is more likely to occur now. Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
        '''.format(user_numbers, user_numbers))
    else:
        print('''The number of times combination {} has occurred in the past is {}.
Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, n_occurrences, user_numbers))

In [33]:
test_input_2 = [31,32,33,34,35,36]
check_historical_occurrence(test_input_2, winning_num)

The combination [31, 32, 33, 34, 35, 36] has never occurred.
This does not mean it is more likely to occur now. Your chances to win the big prize in the next drawing using the combination [31, 32, 33, 34, 35, 36] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
        


In [34]:
test_input_3 =[3, 41, 11, 12, 43, 14]
check_historical_occurrence(test_input_3, winning_num)

The number of times combination [3, 41, 11, 12, 43, 14] has occurred in the past is 1.
Your chances to win the big prize in the next drawing using the combination [3, 41, 11, 12, 43, 14] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
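
If many combinations need to be checked, one option is to count every historical combination once and then look tickets up in the resulting table. The sketch below is a pandas-free illustration using `collections.Counter` and `frozenset`. The `draws` list contains only the first five drawings shown in the data preview above, and `count_occurrences` is a hypothetical name, not a function from the notebook.

```python
from collections import Counter

# First five drawings from the data set preview (bonus number excluded).
draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
    [3, 9, 10, 13, 20, 43],
    [5, 14, 21, 31, 34, 47],
]

# frozenset is hashable, so it can serve as a dictionary key, and it
# ignores the order in which the numbers are listed on the ticket.
occurrences = Counter(frozenset(draw) for draw in draws)

def count_occurrences(user_numbers):
    """Return how many times the combination appears in `draws`."""
    return occurrences[frozenset(user_numbers)]

print(count_occurrences([3, 41, 11, 12, 43, 14]))   # 1
print(count_occurrences([31, 32, 33, 34, 35, 36]))  # 0
```

Whatever the count, the probability for the next drawing stays the same, which is the point the printed message makes.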


## Multi-ticket Probability

For the 1st version of the application, users should also be able to find the probability of winning if they play multiple different tickets. For instance, someone might intend to play 15 different tickets and they want to know the probability of winning the big prize.

In [37]:
def multi_ticket_probability(n_tickets):
    n_combinations = combinations(49,6)
    probability = n_tickets/n_combinations
    percentage_form = probability * 100
    
    if n_tickets == 1:
        print('''Your chances to win the big prize with one ticket are {:.6f}%. 
In other words, you have a 1 in {:,} chances to win.'''.format(percentage_form, int(n_combinations)))
    else:
        combination_simplified = round(n_combinations/n_tickets)
        print('''Your chances to win the big prize with {:,} different tickets are {:.7f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_tickets,percentage_form, combination_simplified))

In [39]:
test_input_4 = [1, 20, 300, 4000, 50000, 600000]

for test in test_input_4:
    multi_ticket_probability(test)
    print('-'*30)

Your chances to win the big prize with one ticket are 0.000007%. 
In other words, you have a 1 in 13,983,816 chances to win.
------------------------------
Your chances to win the big prize with 20 different tickets are 0.0001430%.
In other words, you have a 1 in 699,191 chances to win.
------------------------------
Your chances to win the big prize with 300 different tickets are 0.0021453%.
In other words, you have a 1 in 46,613 chances to win.
------------------------------
Your chances to win the big prize with 4,000 different tickets are 0.0286045%.
In other words, you have a 1 in 3,496 chances to win.
------------------------------
Your chances to win the big prize with 50,000 different tickets are 0.3575562%.
In other words, you have a 1 in 280 chances to win.
------------------------------
Your chances to win the big prize with 600,000 different tickets are 4.2906743%.
In other words, you have a 1 in 23 chances to win.
------------------------------
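
Because the tickets are all different, at most one of them can match the drawing, so the winning events are mutually exclusive and the probabilities simply add up to `n_tickets / 13,983,816`. A minimal sketch of the same calculation with exact fractions and a range check follows. `multi_ticket_fraction` is a hypothetical name, and rejecting values outside 1 to 13,983,816 is an assumption about how the app should behave.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible combinations

def multi_ticket_fraction(n_tickets):
    """Exact probability of winning the big prize with n different tickets."""
    if not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError(
            "n_tickets must be between 1 and {:,}".format(TOTAL_OUTCOMES)
        )
    return Fraction(n_tickets, TOTAL_OUTCOMES)

p = multi_ticket_fraction(20)
print("{:.7f}%".format(float(p) * 100))  # 0.0001430%
```

Playing every possible combination gives `multi_ticket_fraction(TOTAL_OUTCOMES) == 1`, which is a useful boundary check.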


## Fewer Winning Numbers Function

In most 6/49 lotteries, there are smaller prizes if a player's ticket matches 2, 3, 4, or 5 of the 6 numbers drawn. This means that in the first version of the app, we should also find the probabilities of having 2, 3, 4, or 5 winning numbers.

Note that the specific combination on the ticket is irrelevant: the function only needs an integer between 2 and 5 representing the number of winning numbers expected.
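
The counting argument can be written compactly. A ticket with exactly $k$ winning numbers must contain $k$ of the 6 numbers drawn and $6-k$ of the 43 numbers not drawn, so:

```latex
P(\text{exactly } k \text{ winning numbers})
  = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}},
  \qquad k \in \{2, 3, 4, 5\}
```

For example, $k = 5$ gives $6 \cdot 43 / 13{,}983{,}816 = 258 / 13{,}983{,}816 \approx 0.001845\%$.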

In [44]:
def prob_less_winning_num(n_winning_num):
    n_combinations_ticket = combinations(6, n_winning_num)
    n_combinations_remaining = combinations(43, 6 - n_winning_num)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    n_combinations_total = combinations(49,6)
    probability = successful_outcomes/n_combinations_total
    probability_percentage = probability * 100
    combinations_simplified = round(n_combinations_total/successful_outcomes)
    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_num, probability_percentage, int(combinations_simplified)))

In [45]:
for test in [2, 3, 4, 5]:
    prob_less_winning_num(test)
    print('-'*30)

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chances to win.
------------------------------
Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chances to win.
------------------------------
Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chances to win.
------------------------------
Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chances to win.
------------------------------
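
The function above gives the probability of matching exactly `n_winning_num` numbers. To answer the "at least" form of the question from the case description, the exact-match probabilities can be summed from `k` up to 6. The sketch below is one way to do this with the standard library; `prob_exactly` and `prob_at_least` are hypothetical names that are not part of the original notebook.

```python
from fractions import Fraction
from math import comb

def prob_exactly(k):
    """Exact probability that a ticket has exactly k winning numbers."""
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

def prob_at_least(k):
    """Exact probability that a ticket has k or more winning numbers."""
    return sum(prob_exactly(j) for j in range(k, 7))

for k in [2, 3, 4, 5]:
    print("at least {}: {:.6f}%".format(k, float(prob_at_least(k)) * 100))
```

Using `Fraction` avoids rounding until the final formatting step, so the individual terms add up exactly.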


## Conclusion

We have coded 4 main functions:
- one_ticket_prob(): calculates the probability of winning the big prize with a single ticket.
- check_historical_occurrence(): checks whether a certain combination has occurred in the Canada lottery data set.
- multi_ticket_probability(): calculates the probability of winning the big prize with any number of different tickets between 1 and 13,983,816.
- prob_less_winning_num(): calculates the probability of having exactly 2, 3, 4, or 5 winning numbers.