# Introduction: Mobile App for Lottery Addiction

### Scenario

A fictional medical institute wants to build a mobile app to help lottery addicts, and I have been tasked with devising some of the logic and calculations that will underpin it. The lottery in this case follows a 6/49 model: six numbers are drawn from 49, and matching all six wins the big prize.

Many gamblers develop a habit which in turn escalates into an addiction, causing severe issues within their personal lives. They may not know they have an issue until it is too late, and so this app is designed to better educate lottery addicts on their chances of winning in order to deter them.

More gambling information can be found here: https://www.nhs.uk/live-well/addiction-support/gambling-addiction/

### Learning objectives

Putting into practice theoretical and empirical probability concepts such as combinations and permutations to understand the probability of different lottery scenarios, such as:

 - Probability of winning with a single ticket
 - Probability of winning if we play 40 different tickets
 - Probability of having at least five (or four, three, two) winning numbers on a single ticket
 
### Dataset

As part of the analysis stage, I'll use historical (Canadian) lottery data, located here: https://www.kaggle.com/datascienceai/lottery-dataset

## Building the core functions

Here I'll build two functions: one to calculate factorials and another to calculate combinations.


## <center>Factorial formula</center>
<center><img src="https://www.gstatic.com/education/formulas2/472522532/en/factorial.svg" alt="Factorial" /></center>


<br>

## <center>Combinations formula</center>
<center><img src="https://uploads-cdn.omnicalculator.com/images/binomial-coefficient-formula.PNG" alt="Combinations" /></center>

In [1]:
#function for factorials

def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

#function for combinations

def combinations(n, k):
    #n choose k = n! / (k! * (n - k)!)
    #integer division keeps the result an exact int rather than a float
    return factorial(n) // (factorial(k) * factorial(n - k))
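
As a quick sanity check, the hand-written helpers can be compared with Python's standard library, which provides `math.factorial` and `math.comb` (Python 3.8+). The sketch below redefines both functions so that it runs on its own; the `combinations` variant here uses integer division so the result stays an exact integer.

```python
import math

def factorial(n):
    # Multiply n * (n - 1) * ... * 1; an empty range leaves 0! = 1.
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # n choose k, using integer division so the result stays an exact int.
    return factorial(n) // (factorial(k) * factorial(n - k))

# Compare against the standard library for a few values.
for n in (0, 1, 6, 49):
    assert factorial(n) == math.factorial(n)

assert combinations(49, 6) == math.comb(49, 6)
print(combinations(49, 6))  # 13983816
```

If the two ever disagreed, the assertions would fail before anything is printed.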

## Probability of winning for any given ticket

In the first iteration of the app, I will build a function that calculates the probability of winning for any given ticket. The user experience will be:
 - the user inputs six different numbers from 1 to 49
 - these arrive as a Python list, which serves as the single input to the function
 - the function prints the probability in a user-friendly and accessible format

In [2]:
#number of successful outcomes is 1
#P(E) = successful outcomes / total possible outcomes

#creating the function
def one_ticket_probability(ticket_num):
    total_c = combinations(49, 6)
    ticket_p = 1/total_c
     
    #single format string: ticket, probability as a percentage, and the "1 in N" odds
    print('Your chance of success with your ticket {} is: {:,.10%}, which is a 1 in {:,} chance of winning!'
          .format(ticket_num, ticket_p, int(total_c)))
    

In [3]:
one_ticket_probability([1, 2, 25, 33, 16, 34])

Your chance of success with your ticket [1, 2, 25, 33, 16, 34] is: 0.0000071511%, which is a 1 in 13,983,816 chance of winning!
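
The function above trusts that the list really holds six different numbers from 1 to 49. A minimal sketch of a check that could run first is shown below; `validate_ticket` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
def validate_ticket(ticket_num):
    """Return True if ticket_num holds six different whole numbers from 1 to 49."""
    if len(ticket_num) != 6:
        return False
    # A set drops duplicates, so fewer than six items means a repeated number.
    if len(set(ticket_num)) != 6:
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in ticket_num)

print(validate_ticket([1, 2, 25, 33, 16, 34]))  # True
print(validate_ticket([1, 2, 25, 33, 16, 16]))  # False (repeated number)
print(validate_ticket([1, 2, 25, 33, 16, 50]))  # False (50 is out of range)
```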


## Analysis of historic data to determine past wins

Using the Canadian lottery dataset, I'll determine whether the numbers that the user inputs have been successful previously. This will allow the user to compare their ticket against historic data to see if their numbers have ever won before.

In [4]:
#import libraries
import pandas as pd

In [5]:
#load in data
lottery = pd.read_csv("649.csv")

In [6]:
#quick exploratory steps to find out date range and format of data
lottery.info()
lottery['DRAW DATE'].describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.0+ KB


count          3665
unique         3591
top       6/23/2012
freq              4
Name: DRAW DATE, dtype: object

In [7]:
lottery.head()

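| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |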
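
Because `DRAW DATE` is stored as text (dtype `object`), `describe()` reports the count and the most frequent value but not the earliest or latest draw. A minimal sketch of how the date range could be recovered is shown below. It uses only the standard library and the five dates shown above, and it assumes the strings are in month/day/year order, as the value `6/23/2012` suggests. In the notebook itself, `pd.to_datetime` with the same format string could play the same role.

```python
from datetime import datetime

# The first five draw dates shown above, as month/day/year strings.
draw_dates = ['6/12/1982', '6/19/1982', '6/26/1982', '7/3/1982', '7/10/1982']

# Parse each string into a real date so that min() and max() compare
# chronologically rather than alphabetically.
parsed = [datetime.strptime(d, '%m/%d/%Y').date() for d in draw_dates]

print(min(parsed), 'to', max(parsed))  # 1982-06-12 to 1982-07-10
```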


## Writing a function for checking the historical data against user's ticket

This function takes the user's numbers (six different numbers from 1 to 49) as a list, prints the number of times that combination has won in the past, and then prints the probability of winning the next drawing with the same combination.

In [8]:
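#function to take a row of the dataframe and return a set of the six winning numbers
#(columns at positions 4 to 9, i.e. NUMBER DRAWN 1 to NUMBER DRAWN 6)
def extract_numbers(row):
    row = row.iloc[4:10] #select by position, which leaves out the bonus number
    row = set(row.values) #create a set from the numbers
    return row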

In [9]:
#extract out winners
winners = lottery.apply(extract_numbers, axis=1) #apply to all rows    

In [10]:
winners.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
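
Sets are a natural fit here because the order in which the numbers are drawn does not matter. As an alternative sketch (not the approach used in this notebook), the same idea can be expressed with `frozenset` and `collections.Counter` from the standard library, using the five draws shown above as sample data.

```python
from collections import Counter

# First five draws shown above (six main numbers each, bonus excluded).
draws = [
    (3, 11, 12, 14, 41, 43),
    (8, 33, 36, 37, 39, 41),
    (1, 6, 23, 24, 27, 39),
    (3, 9, 10, 13, 20, 43),
    (5, 14, 21, 31, 34, 47),
]

# A frozenset ignores order and is hashable, so it can be a Counter key.
win_counts = Counter(frozenset(draw) for draw in draws)

ticket = frozenset([43, 41, 14, 12, 11, 3])  # same numbers, different order
print(win_counts[ticket])                             # 1
print(win_counts[frozenset([1, 2, 25, 33, 16, 34])])  # 0
```

A `Counter` returns 0 for combinations it has never seen, so no special case is needed for tickets that have never won.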

In [11]:
#function to take the user's numbers (as a list) and the sets of winning numbers (as a pd series)
def check_historical_occurence(user, win):
    user_set = set(user)
    count = win == user_set #True for each past draw that matches the user's numbers exactly
    count_sum = count.sum()
    
    if count_sum == 0:
        print('''The combination {} has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.'''.format(user_set, user))
        
    else:
        print('''The number of times combination {} has occurred in the past is {}.
Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.'''.format(user_set, count_sum,
                                                                            user))    

In [12]:
#testing
example = [1, 2, 25, 33, 16, 34]
example2 = [3, 11, 12, 14, 41, 43] #known winner from list

In [13]:
check_historical_occurence(example, winners)

The combination {1, 2, 33, 34, 16, 25} has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination [1, 2, 25, 33, 16, 34] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.


In [14]:
check_historical_occurence(example2, winners)

The number of times combination {3, 41, 11, 12, 43, 14} has occurred in the past is 1.
Your chances to win the big prize in the next drawing using the combination [3, 11, 12, 14, 41, 43] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chance to win.


## Multi-ticket probability
A known behavior of lottery addicts is to buy multiple tickets in the belief that they are significantly increasing their chances of winning. Here I will create a function that calculates the probability of winning based on the number of tickets a user has purchased, assuming every ticket is a different combination.

In [15]:
#function to calculate probability depending on number of user tickets
#based on the logic of number of successful outcomes = number of tickets played
#(this assumes every ticket is a different combination)
def multi_ticket_probability(x):
    total_c = combinations(49, 6)
    ticket_p = x/total_c
    
    if x == 1:
        print('Your chance of success with one ticket is: {:,.7%}, which is a {:,} in {:,} chance of winning!'
              .format(ticket_p, x, int(total_c)))
        
    else:    
        print('Your chance of success with {:,} tickets is: {:,.7%}, which is a {:,} in {:,} chance of winning!'
              .format(x, ticket_p, x, int(total_c)))

In [16]:
#testing new function
test = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for i in test:
    multi_ticket_probability(i)    

Your chance of success with one ticket is: 0.0000072%, which is a 1 in 13,983,816 chance of winning!
Your chance of success with 10 tickets is: 0.0000715%, which is a 10 in 13,983,816 chance of winning!
Your chance of success with 100 tickets is: 0.0007151%, which is a 100 in 13,983,816 chance of winning!
Your chance of success with 10,000 tickets is: 0.0715112%, which is a 10,000 in 13,983,816 chance of winning!
Your chance of success with 1,000,000 tickets is: 7.1511238%, which is a 1,000,000 in 13,983,816 chance of winning!
Your chance of success with 6,991,908 tickets is: 50.0000000%, which is a 6,991,908 in 13,983,816 chance of winning!
Your chance of success with 13,983,816 tickets is: 100.0000000%, which is a 13,983,816 in 13,983,816 chance of winning!
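
The proportional rule above only holds when every ticket is a different combination, and it cannot exceed 100%. The sketch below makes both points explicit and also answers the reverse question of how many different tickets are needed to reach a target probability. The names `distinct_tickets_probability` and `tickets_needed` are hypothetical helpers introduced for illustration; `fractions.Fraction` is used so the arithmetic stays exact.

```python
from fractions import Fraction
from math import ceil, comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible 6/49 tickets

def distinct_tickets_probability(n_tickets):
    """Chance of the big prize with n_tickets different combinations."""
    if not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError('n_tickets must be between 1 and {:,}'.format(TOTAL_OUTCOMES))
    return Fraction(n_tickets, TOTAL_OUTCOMES)

def tickets_needed(target_probability):
    """Smallest number of different tickets that reaches target_probability."""
    return ceil(Fraction(target_probability) * TOTAL_OUTCOMES)

p_40 = distinct_tickets_probability(40)
print('{:.7%}'.format(float(p_40)))      # 0.0002860%
print(tickets_needed(Fraction(1, 100)))  # 139839
```

Even a 1% chance of the big prize would require close to 140,000 different tickets for a single drawing.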


## Probability for fewer numbers (i.e. smaller prizes)
Many lotteries reward players with a smaller prize should they match two, three, four, or five winning numbers. Users of the app may also be interested in understanding the probability of winning one of these lesser prizes. As such, I'll need to consider a user input of their 6 different numbers and an integer between 2 and 5 representing the number of winning numbers.

### Background logic

Firstly, I'll need to understand and calculate the probability of having exactly 5 winning numbers.

As an example, I'll take the 6 numbers on a lottery ticket. Using the combinations formula, there are 6 choose 5 = 6 ways to pick which five of them are the matched numbers.

For each of those five-number combinations, the sixth number on the ticket still has to be filled. Once 5 numbers are matched, 44 of the 49 numbers remain, so there are 44 candidates for the sixth number.

For example, if my matched numbers were 1, 2, 3, 4, 5, there are 44 more numbers that could fill the last place, e.g. 1, 2, 3, 4, 5, **32**.

**However**, one of those 44 numbers is the sixth winning number, and picking it would give 6 matches rather than 5. The count of 44 therefore covers **at least** 5 matched numbers; for **only 5 matched numbers** there are actually 43 successful outcomes. (**note:** this would be 42 if you account for a bonus ball, which allows an extra chance at matching a number **if** you have 5 matched numbers already, e.g. 5/6 winning numbers and not the bonus).

*(the same logic applies for 4, 3 or 2 matched numbers: choose which of the 6 winning numbers are matched, then fill the remaining places on the ticket from the 43 non-winning numbers)*

Since there are 6 five-number combinations, each corresponding to 43 successful outcomes (of 5 matched numbers only), the total number of successful outcomes is **6 * 43 = 258**.

Therefore, **P(5 winning numbers)** = 258 / total possible outcomes = 258 / 13,983,816 ≈ 0.00001845
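
The counting argument above extends to any number of matches. As a sketch in standard binomial notation, where the symbol $k$ for the number of matched numbers is introduced here for illustration:

$$
P(\text{exactly } k \text{ matches}) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}}
$$

For $k = 5$ this reproduces the figure above:

$$
P(\text{exactly } 5 \text{ matches}) = \frac{\binom{6}{5}\,\binom{43}{1}}{\binom{49}{6}} = \frac{6 \cdot 43}{13{,}983{,}816} = \frac{258}{13{,}983{,}816} \approx 0.00001845
$$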

In [109]:
#function to calculate the probability of exactly k matched numbers
def probability_less_6(k):
    
    #number of ways to choose which k of the 6 winning numbers are matched
    ticket_c = combinations(6, k)
    
    total_c = combinations(49, 6)
    
    #possible winning tickets = (6 choose k) * (43 choose (6 - k))
    #the remaining 6 - k numbers come from the 43 non-winning numbers
    successful_c = ticket_c * combinations(43, 6 - k)
    
    ticket_p = successful_c / total_c
    
    #as a percentage
    pct = ticket_p * 100
    combos = round(total_c / successful_c)
    
    print('''You have a {:.6f}% chance to have {} matched numbers with your ticket.
This is a 1 in {:,} chance to win.'''.format(pct, k, combos)) 

In [110]:
for i in [2, 3, 4, 5]:
    probability_less_6(i)
    print (' ')

You have a 13.237803% chance to have 2 matched numbers with your ticket.
This is a 1 in 8 chance to win.
 
You have a 1.765040% chance to have 3 matched numbers with your ticket.
This is a 1 in 57 chance to win.
 
You have a 0.096862% chance to have 4 matched numbers with your ticket.
This is a 1 in 1,032 chance to win.
 
You have a 0.001845% chance to have 5 matched numbers with your ticket.
This is a 1 in 54,201 chance to win.
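
The function above gives the chance of matching exactly a given number of winning numbers, while the learning objectives ask about having at least that many. One possible way to get the "at least" figure is to add up the exact cases, as in the sketch below. It uses `math.comb` from the standard library; `exact_match_count` and `at_least_probability` are hypothetical names introduced here. The first assertion also checks that the seven exact cases (0 to 6 matches) account for every possible ticket.

```python
from math import comb

TOTAL = comb(49, 6)

def exact_match_count(k):
    # Tickets sharing exactly k numbers with the draw:
    # choose k of the 6 winning numbers and 6 - k of the 43 others.
    return comb(6, k) * comb(43, 6 - k)

def at_least_probability(k):
    # Add up the exact-match counts for k, k + 1, ..., 6.
    return sum(exact_match_count(j) for j in range(k, 7)) / TOTAL

# The seven exact-match cases (0 to 6) cover every possible ticket.
assert sum(exact_match_count(k) for k in range(7)) == TOTAL

for k in (5, 4, 3, 2):
    print('At least {}: {:.6f}%'.format(k, at_least_probability(k) * 100))
```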
 


### Modified to account for a bonus ball 5-match

In [111]:
#function to calculate the probability of exactly k matched numbers,
#excluding the bonus number when k is 5
def probability_less_6_bonus(k):
    
    #number of ways to choose which k of the 6 winning numbers are matched
    ticket_c = combinations(6, k)
    
    #if statement to determine how many to remove from total outcomes
    #to account for a bonus number
    if k == 5:
        remove = 1
    else:
        remove = 0 

            
    remaining = 43 - remove
    total_c = combinations(49, 6)
    
    #possible winning tickets = (6 choose k) * (remaining choose (6 - k))
    successful_c = ticket_c * combinations(remaining, 6 - k)
    
    ticket_p = successful_c / total_c
    
    #as a percentage
    pct = ticket_p * 100
    combos = round(total_c / successful_c)
    
    print('''You have a {:.6f}% chance to have {} matched numbers with your ticket.
This is a 1 in {:,} chance to win.'''.format(pct, k, combos)) 

In [112]:
for i in [2, 3, 4, 5]:
    probability_less_6_bonus(i)
    print (' ')

You have a 13.237803% chance to have 2 matched numbers with your ticket.
This is a 1 in 8 chance to win.
 
You have a 1.765040% chance to have 3 matched numbers with your ticket.
This is a 1 in 57 chance to win.
 
You have a 0.096862% chance to have 4 matched numbers with your ticket.
This is a 1 in 1,032 chance to win.
 
You have a 0.001802% chance to have 5 matched numbers with your ticket.
This is a 1 in 55,491 chance to win.
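
To see where the adjusted 5-match figure comes from, the 258 tickets that match exactly five main numbers can be split by what the sixth number on the ticket is. The sketch below assumes a single bonus number drawn from the 43 numbers left after the six main numbers; the exact bonus rules of the real lottery are not confirmed by the dataset, so this is an illustration only.

```python
from math import comb

TOTAL = comb(49, 6)

# Assumption: one bonus number is drawn from the 43 numbers that are
# not among the six main winning numbers.
five_total = comb(6, 5) * comb(43, 1)          # 258 tickets match exactly 5 main numbers
five_with_bonus = comb(6, 5) * 1               # sixth number is the bonus number
five_without_bonus = comb(6, 5) * comb(42, 1)  # sixth number is one of the other 42

# The two cases together make up all 5-match tickets.
assert five_with_bonus + five_without_bonus == five_total

print('5 + bonus: 1 in {:,}'.format(round(TOTAL / five_with_bonus)))       # 1 in 2,330,636
print('5, no bonus: 1 in {:,}'.format(round(TOTAL / five_without_bonus)))  # 1 in 55,491
```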
 
