# Guided Project: Mobile App for Lottery Addiction
In this project, we contribute to the development of a mobile app by writing a few functions that focus on calculating probabilities. The app aims to both prevent and treat lottery addiction by helping people better estimate their chances of winning.

We will take as a reference the 6/49 lottery, a typical format from Canada, for which we conveniently have a dataset available. The functions should answer questions such as:

**1.** *What is the probability of winning the big prize with a single ticket?*  
**2.** *What is the probability of winning the big prize if we play 40 different tickets (or any other number)?*  
**3.** *What is the probability of having at least five (or four, or three) winning numbers on a single ticket?*  

The scenario we're following throughout this project is fictional — the main purpose is to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.

## Core Functions  
Below, we're going to write two functions that we'll be using frequently:

- `factorial()`, a function that calculates factorials  
- `combination()`, a function that calculates combinations


In [1]:
'''Factorial function.
This is the basis for counting permutations, that is, arrangements:
- where ORDER OF OUTCOMES MATTERS
- where REPETITION IS NOT ALLOWED (sampling without replacement)
'''

def factorial(n):
    result = 1
    for i in range(n,0,-1):
        result *= i
    return result

'''
Combination function.
This will be used to calculate the number of combinations, that is, the number of unique selections:
- where ORDER OF OUTCOMES DOES NOT MATTER
- where REPETITION IS NOT ALLOWED (sampling without replacement)

It returns the number of ways to choose k items out of n.
'''

def combination(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    return numerator / denominator

In [2]:
#find total number of possible permutations of a 5-card draw, out of a 52 card deck
print("Total number of possible permutations of a 5 card draw, out of a 52 card deck is", factorial(52) / factorial(52-5))

#find total number of 5 card combinations in a 52 card deck
print("Total number of possible combinations (that's permutations where we don't care about order) is ", combination(52,5))

Total number of possible permutations of a 5 card draw, out of a 52 card deck is 311875200.0
Total number of possible combinations (that's permutations where we don't care about order) is  2598960.0
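
As a cross-check, the two hand-written functions can be compared with the standard library. The sketch below assumes Python 3.8 or later, where `math.perm` and `math.comb` are available; it redefines `factorial()` and `combination()` only so that the block runs on its own.

```python
import math

# Hand-written versions, repeated here so the block is self-contained
def factorial(n):
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result

def combination(n, k):
    return factorial(n) / (factorial(k) * factorial(n - k))

# Standard-library equivalents; both return exact integers
permutations_5_of_52 = math.perm(52, 5)
combinations_5_of_52 = math.comb(52, 5)

print(permutations_5_of_52)                            # 311875200
print(combinations_5_of_52)                            # 2598960
print(math.comb(49, 6) == round(combination(49, 6)))   # True
```

Because `math.comb` returns an integer, it avoids the `round()` call that the float-returning `combination()` needs later in the notebook.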


In [3]:
#calculate the probability of winning the big prize with one ticket: 6 numbers played out of 49
#sampling is WITHOUT replacement: a number that has been drawn cannot be drawn again

def one_ticket_probability(numbers):
    total_possible_outcomes = round(combination(49,6))
    favorable_outcome = 1
    win_probability = favorable_outcome / total_possible_outcomes
    p_win = win_probability
    print('''Your chance to win the the lottery with your numbers {0} is {1:.7%}.
In other words, you have a 1 in {2:,} chances to win.'''.format(numbers, p_win, total_possible_outcomes))

one_ticket_probability([1,2,3,4,5,6])
print('\n')
one_ticket_probability([7,8,9,10,11,12])


Your chance to win the the lottery with your numbers [1, 2, 3, 4, 5, 6] is 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


Your chance to win the the lottery with your numbers [7, 8, 9, 10, 11, 12] is 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


The aim here is to give an idea of how likely it is to win the 6/49 lottery. The calculation relies on the functions written above and on the rules of the game:
- To win, you need to match all 6 drawn numbers.
- The order in which the numbers are drawn is not relevant (so we count combinations, not permutations).
- The draw is done without replacement.

Also, it's worth noting that whatever your numbers are, you have **the same probability of winning by playing a single ticket.**
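
The three rules can also be checked by following the draw ball by ball. This is a minimal sketch that uses exact fractions from the standard library: the first ball must be one of our 6 numbers out of 49, the second one of our remaining 5 out of 48, and so on. The variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

# Multiply the chance that each successive ball is one of our unmatched numbers
p_sequential = Fraction(1)
for i in range(6):
    # after i matches, (6 - i) of our numbers remain among (49 - i) balls
    p_sequential *= Fraction(6 - i, 49 - i)

# Counting argument: one favorable combination out of "49 choose 6"
p_counting = Fraction(1, comb(49, 6))

print(p_sequential)                 # 1/13983816
print(p_sequential == p_counting)   # True
```

Both routes give the same result, which is why the specific numbers on the ticket never enter the calculation.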

## Revisiting 6/49 history
To make things more personal, we would like to check whether a given combination has ever won in the history of the 6/49 lottery. For this, we'll use a convenient dataset available on [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset/data).

In [4]:
import pandas as pd
df = pd.read_csv('649.csv')
print(df.shape)

df.head(10)

(3665, 11)


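| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |
| 5 | 649 | 6 | 0 | 7/17/1982 | 8 | 20 | 21 | 25 | 31 | 41 | 33 |
| 6 | 649 | 7 | 0 | 7/24/1982 | 18 | 25 | 28 | 33 | 36 | 42 | 7 |
| 7 | 649 | 8 | 0 | 7/31/1982 | 7 | 16 | 17 | 31 | 40 | 48 | 26 |
| 8 | 649 | 9 | 0 | 8/7/1982 | 5 | 10 | 23 | 27 | 37 | 38 | 33 |
| 9 | 649 | 10 | 0 | 8/14/1982 | 4 | 15 | 30 | 37 | 46 | 48 | 3 |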


We have a few requirements to take into account while writing the `check_historical_occurrence()` function.
- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.

The engineering team wants us to write a function that prints:
- the number of times the combination selected occurred in the Canada data set; and
- the probability of winning the big prize in the next drawing with that combination.

In [5]:
df.iloc[:,4:-1]

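| | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 |
|---|---|---|---|---|---|---|
| 0 | 3 | 11 | 12 | 14 | 41 | 43 |
| 1 | 8 | 33 | 36 | 37 | 39 | 41 |
| 2 | 1 | 6 | 23 | 24 | 27 | 39 |
| 3 | 3 | 9 | 10 | 13 | 20 | 43 |
| 4 | 5 | 14 | 21 | 31 | 34 | 47 |
| ... | ... | ... | ... | ... | ... | ... |
| 3660 | 10 | 15 | 23 | 38 | 40 | 41 |
| 3661 | 19 | 25 | 31 | 36 | 46 | 47 |
| 3662 | 6 | 22 | 24 | 31 | 32 | 34 |
| 3663 | 2 | 15 | 21 | 31 | 38 | 49 |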


In [6]:
#prepare a function to apply to the whole dataframe that retrieves, for each row, the six winning numbers
#sets are UNORDERED COLLECTIONS WITH NO DUPLICATES, so two draws with the same numbers compare equal whatever their order
def extract_numbers(row):
    row = row[4:-1]
    return set(row)

winning_numbers = df.apply(extract_numbers, axis = 1) #apply the function along columns, that is, to each row
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
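
To illustrate why sets are used here, the short sketch below compares the first historical draw with the same numbers typed in a different order. The variable names are illustrative only.

```python
# The first draw in the dataset, and the same numbers entered in another order
draw_as_list = [3, 11, 12, 14, 41, 43]
ticket_as_list = [43, 41, 14, 12, 11, 3]

# Lists compare element by element, so order matters
print(draw_as_list == ticket_as_list)             # False

# Sets compare by membership only, so order is ignored
print(set(draw_as_list) == set(ticket_as_list))   # True

# Sets also drop duplicates, so a length check can reveal a repeated number
print(len(set([3, 3, 11])))                       # 2
```

One consequence is that a ticket with a repeated number would silently shrink when converted to a set, so an app might want to check that `len(set(numbers)) == 6` before doing anything else.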

Let's now write the full function to check historical occurrences.

In [7]:
def check_historical_occurrence(user_input, historical_series):
    user_input = set(user_input)
    
    #number of times the combination has appeared in the past
    check_count = user_input == historical_series #compares the user input with ALL past draws and returns a boolean Series
    times_won = sum(check_count) #True counts as 1, so the sum is the number of matches
    
    if times_won == 0:
        print(f'Your combination has won {times_won} times in the past. This does not mean that you have more chances to win by playing it today.')
    else:
        print(f'Your combination has won {times_won} times in the past. Want to try your luck?')
    one_ticket_probability(user_input)

#--------------------------    
check_historical_occurrence([3, 41, 11, 12, 43, 14], winning_numbers)

print('\n')

check_historical_occurrence([1, 2, 3, 4, 5, 6], winning_numbers)

Your combination has won 1 times in the past. Want to try your luck?
Your chance to win the the lottery with your numbers {3, 41, 11, 12, 43, 14} is 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


Your combination has won 0 times in the past. This does not mean that you have more chances to win by playing it today.
Your chance to win the the lottery with your numbers {1, 2, 3, 4, 5, 6} is 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
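
The same lookup can be sketched without pandas, which may be useful for testing the logic in isolation. The example below is only an illustration: it uses the first three draws shown in the `df.head(10)` output above, and `times_drawn()` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter

# First three historical draws, copied from the df.head(10) output above
historical_draws = [
    (3, 11, 12, 14, 41, 43),
    (8, 33, 36, 37, 39, 41),
    (1, 6, 23, 24, 27, 39),
]

# frozenset is hashable, so each draw can serve as a dictionary key
draw_counts = Counter(frozenset(draw) for draw in historical_draws)

def times_drawn(user_numbers, counts=draw_counts):
    """Hypothetical helper: how many times this combination appears in counts."""
    return counts[frozenset(user_numbers)]

print(times_drawn([3, 41, 11, 12, 43, 14]))   # 1
print(times_drawn([1, 2, 3, 4, 5, 6]))        # 0
```

Building the counter once makes each later lookup a single dictionary access, at the cost of keeping all the draws in memory.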


# Addressing multiple tickets
Because lottery addicts tend to play more than one ticket, thinking that it will raise their chances of winning significantly, we want to write a function that addresses this specific case.  

The requirements for this function are the following:
- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

In [8]:
def multi_ticket_probability(number_of_tickets):
    #Calculate probability of winning according to the number of tickets
    possible_outcomes = round(combination(49,6))
    p_to_win = number_of_tickets / possible_outcomes     
    
    print('Your chance to win the big prize by playing {:,} tickets is equal to {:.7%}'.format(number_of_tickets, p_to_win))
    
    simplified_outcomes = round(possible_outcomes / number_of_tickets)
    print(f'In other words, you have 1 chance in {simplified_outcomes:,}.')
    

In [9]:
test = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for i in test:
    multi_ticket_probability(i)
    print('-----------')

Your chance to win the big prize by playing 1 tickets is equal to 0.0000072%
In other words, you have 1 chance in 13,983,816.
-----------
Your chance to win the big prize by playing 10 tickets is equal to 0.0000715%
In other words, you have 1 chance in 1,398,382.
-----------
Your chance to win the big prize by playing 100 tickets is equal to 0.0007151%
In other words, you have 1 chance in 139,838.
-----------
Your chance to win the big prize by playing 10,000 tickets is equal to 0.0715112%
In other words, you have 1 chance in 1,398.
-----------
Your chance to win the big prize by playing 1,000,000 tickets is equal to 7.1511238%
In other words, you have 1 chance in 14.
-----------
Your chance to win the big prize by playing 6,991,908 tickets is equal to 50.0000000%
In other words, you have 1 chance in 2.
-----------
Your chance to win the big prize by playing 13,983,816 tickets is equal to 100.0000000%
In other words, you have 1 chance in 1.
-----------


The code above estimates the probability of winning when multiple tickets are played. It assumes that every ticket is a different combination, so no combination is played more than once (it's rather obvious, but it's worth specifying).
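
A variant that returns the probability instead of printing it, and that enforces the input range stated in the requirements, could look like the sketch below. The name `multi_ticket_fraction()` and the use of `fractions.Fraction` are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(number_of_tickets):
    """Hypothetical helper: exact chance of the big prize with N distinct tickets."""
    if not 1 <= number_of_tickets <= TOTAL_OUTCOMES:
        raise ValueError(f"number_of_tickets must be between 1 and {TOTAL_OUTCOMES:,}")
    # Distinct tickets are mutually exclusive winners, so their chances simply add up
    return Fraction(number_of_tickets, TOTAL_OUTCOMES)

print(multi_ticket_fraction(6991908))   # 1/2
print(float(multi_ticket_fraction(10)))
```

Returning a value keeps the calculation separate from the message shown to the user, which makes the function easier to test.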


# Winning smaller prizes
The last case we explore is that of smaller prizes, which are awarded when a ticket matches fewer than the 6 numbers needed for the jackpot. We'll write a function to address this specific case.

Points to consider and address:
- Inside the app, the user will input:
    - six different numbers from 1 to 49, as the user combination;
    - an integer between 2 and 5 that represents the number of winning numbers expected.
- The function will print out information about the probability of having the inputted number of winning numbers. Because this probability does not depend on which six numbers are played, the function below takes only the integer as its argument.

## How to approach the problem? Combinatorics.
1. For a 6-number ticket, we need to calculate the probability of matching EXACTLY n numbers.  
**This means that the ticket holds n winning numbers and 6-n NON-winning numbers at the same time.**
  

2. *Counting the combinations of n winning numbers, where n = 5.*  
Assuming we play {1,2,3,4,5,6}, in how many ways can we choose 5 of these numbers to be the winners? In this case **there are 6 possible combinations, as per (6 choose 5)**:  
{1,2,3,4,5}, {1,2,3,4,6}, {1,2,3,5,6}, {1,2,4,5,6}, {1,3,4,5,6}, {2,3,4,5,6}.  

  
3. *Counting the combinations of (6-n) losing numbers.*  
In a 49-number lottery, our ticket leaves **43 numbers that we did NOT play**. The remaining 6-n drawn numbers must come from these 43. For n = 5, that is 43 choose 1 = 43 combinations. **In general, it's 43 choose (6-n).**
    
    
4. *Finding the total successful outcomes*  
To find the total number of successful outcomes, we multiply `winning_combinations` * `losing_combinations`.  


5. *Finding the probability of getting exactly 5 numbers out of 6*  
As in any probability problem with equally likely outcomes, it is `successful_outcomes` / `possible_outcomes`, where `possible_outcomes` is the number of combinations of 49 choose 6.
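
The five steps above can be condensed into a short counting sketch. It uses `math.comb` (Python 3.8+) rather than the notebook's own `combination()`, and `ways_to_match_exactly()` is a hypothetical name chosen for illustration.

```python
from math import comb

def ways_to_match_exactly(n, ticket_size=6, pool_size=49):
    """Hypothetical helper: number of draws that match exactly n ticket numbers."""
    # Step 2: choose which n of our numbers are drawn
    winning = comb(ticket_size, n)
    # Step 3: the other drawn numbers come from the numbers we did not play
    losing = comb(pool_size - ticket_size, ticket_size - n)
    # Step 4: both choices happen together, so the counts multiply
    return winning * losing

total = comb(49, 6)

print(ways_to_match_exactly(5))   # 258, i.e. 6 * 43
# Sanity check: the cases n = 0..6 cover every possible draw exactly once
print(sum(ways_to_match_exactly(n) for n in range(7)) == total)   # True
```

Dividing any of these counts by `total` gives the probability from step 5.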

In [10]:
def probability_less_6(n_winning_numbers):
    '''Print the probability of matching EXACTLY n_winning_numbers on a single 6/49 ticket.'''
    # In how many ways can n of the 6 numbers played on the ticket be winners?
    # This is "6 choose n".
    n_winning_combinations = combination(6,n_winning_numbers)
    
    # The ticket uses 6 of the 49 numbers, which leaves 43 numbers that were NOT played.
    # The other (6-n) drawn numbers must come from those 43, in any order,
    # so the number of ways to pick the LOSING numbers is combination(43, 6-n).
    n_losing_combinations = combination(43, 6 - n_winning_numbers)
    
    # Both events must happen together, so we multiply the two counts.
    
    successful_outcomes = round(n_winning_combinations * n_losing_combinations)
    total_possible_outcomes = combination(49,6) 
    
    p_to_win = successful_outcomes / total_possible_outcomes
    p_to_win_base_1 = round(1/p_to_win)
    

    print('''The probability of drawing a ticket with EXACTLY {0} winning numbers is {1:.4%}.
    In other words, you have one chance in {2:,}.'''.format(n_winning_numbers,p_to_win,p_to_win_base_1))
    

In [11]:
for num in [2,3,4,5]:
    probability_less_6(num)
    print('\n')

The probability of drawing a ticket with EXACTLY 2 winning numbers is 13.2378%.
    In other words, you have one chance in 8.


The probability of drawing a ticket with EXACTLY 3 winning numbers is 1.7650%.
    In other words, you have one chance in 57.


The probability of drawing a ticket with EXACTLY 4 winning numbers is 0.0969%.
    In other words, you have one chance in 1,032.


The probability of drawing a ticket with EXACTLY 5 winning numbers is 0.0018%.
    In other words, you have one chance in 54,201.
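
The opening questions ask about *at least* five (or four, or three) winning numbers, while the function above covers the *exactly n* case. One way to get from one to the other is to add up the exact cases from n up to 6. The sketch below assumes `math.comb` (Python 3.8+), and both function names are hypothetical.

```python
from math import comb

TOTAL_OUTCOMES = comb(49, 6)

def ways_at_least(n):
    """Hypothetical helper: number of draws matching n or more ticket numbers."""
    # "exactly k" cases are mutually exclusive, so their counts can be summed
    return sum(comb(6, k) * comb(43, 6 - k) for k in range(n, 7))

def probability_at_least(n):
    return ways_at_least(n) / TOTAL_OUTCOMES

for n in [3, 4, 5]:
    print(f"At least {n} winning numbers: {probability_at_least(n):.4%}")
```

For n = 5 this counts 259 draws: the 258 that match exactly five numbers plus the single jackpot draw.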




# Key takeaway
Don't play the lottery.