# Creating a Mobile App for Lottery Addiction

In this project, we're going to contribute to the making of a mobile app to combat lottery addiction. For the first version of the app, a medical institute wants us to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that provide insight on the probability of winning in various scenarios such as:
- Winning the big prize with a single ticket.
- Winning the big prize with 40 different tickets (or any number of tickets).
- Having a certain number of winning numbers (e.g. five) on a single ticket.

The institute also wants us to consider past data from the 6/49 lottery. We'll use a data set covering 3,665 drawings, dating from 1982 to 2018. Its documentation is available [on Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset).

## Creating Core Functions

In the 6/49 lottery, six numbers are drawn from 1 to 49 without replacement, so no number can appear more than once in a drawing. We'll start off by coding two functions:
- A function that calculates factorials.
- A function that calculates combinations.
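
For reference, these two functions correspond to the standard definitions of the factorial and of the number of ways to choose $k$ items out of $n$ when order does not matter:

$$
n! = n \times (n-1) \times \cdots \times 2 \times 1, \qquad 0! = 1
$$

$$
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
$$

For the 6/49 lottery, $n = 49$ and $k = 6$.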

In [1]:
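# Creating our functions
def factorial(n):
    # Start at 1 so that factorial(0) correctly returns 1
    product = 1
    for num in range(2, n + 1):
        product *= num
    return product

def combinations(n, k):
    # The result is always a whole number, so integer division keeps it exact
    return factorial(n) // (factorial(k) * factorial(n - k))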

These functions will help us create the next function we'll build: one that calculates the probability of winning the big prize. Winning the big prize requires all six numbers on a ticket to match all six numbers drawn. If even one number differs from the winning set, the ticket does not win.
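
As a sanity check on the two helpers, the total number of 6/49 combinations can be cross-checked against Python's standard library. The sketch below assumes Python 3.8 or later (for `math.comb`). `combinations_exact` is a hypothetical name used only for this illustration and is not part of the app.

```python
import math

def combinations_exact(n, k):
    """Hypothetical cross-check: n choose k using integer arithmetic only."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Total number of possible 6/49 tickets
total_tickets = combinations_exact(49, 6)

# math.comb computes the same quantity directly
assert total_tickets == math.comb(49, 6)

print('{:,}'.format(total_tickets))  # 13,983,816
```

Working with integers throughout avoids any floating-point rounding when the factorials get large.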

## One-ticket Probability

We discussed details with the engineering team of the medical institute, and they told us several things we need to take into account:
- Inside the app, the user inputs six different numbers from 1 to 49.
- The six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team wants the function to print the probability value in a user-friendly way, one that people with no probability training could understand. 

In [2]:
# Writing the first function for our app
def one_ticket_probability(six_num):
    num_outcomes = combinations(49, 6) # Total number of outcomes
    chance = 1 / num_outcomes * 100 # The user inputs just one combination
    print('''The chances of winning the big prize with the numbers {} are {:.7f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(six_num,
                                                               chance, int(num_outcomes)))

# Testing it out
one_ticket_probability([1, 2, 3, 4, 5, 6])

The chances of winning the big prize with the numbers [1, 2, 3, 4, 5, 6] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
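
The function above trusts that the list it receives already follows the rules stated by the engineering team (six different numbers from 1 to 49). If that guarantee were ever in doubt, a check along the following lines could run first. This is only a sketch: `validate_ticket` is a hypothetical helper, not something the engineering team specified.

```python
def validate_ticket(numbers):
    """Return True if `numbers` is a valid 6/49 ticket (hypothetical helper)."""
    return (
        len(numbers) == 6                  # exactly six entries
        and len(set(numbers)) == 6         # all of them different
        and all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)
    )

print(validate_ticket([1, 2, 3, 4, 5, 6]))   # True
print(validate_ticket([1, 2, 3, 4, 5, 5]))   # False: repeated number
print(validate_ticket([0, 2, 3, 4, 5, 50]))  # False: numbers out of range
```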


## Exploring the Data Set 

We created a function that tells the user their chances of winning the big prize with a single ticket. For the first version of the app, however, users should be able to compare their ticket against the historical lottery data in Canada and determine whether they would have won by now. Let's import the data set to get familiar with what we're working with.

In [3]:
# Reading in the data set
import pandas as pd
lottery = pd.read_csv('649.csv')

# Getting an initial view
lottery.head(3)

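| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |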


In [4]:
lottery.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


## Creating the Function for Historical Data Check

Keeping in mind the details the engineering team gave us earlier, this next function needs to print:
- The number of times the inputted combination occurred in the Canadian data set.
- The probability of winning the big prize in the next drawing with that combination. 

We'll start with a function to extract all the winning numbers from the lottery data set. The `extract_numbers()` function will run over each row in the data set to extract the six winning numbers as a Python set.

In [18]:
# Writing a function to extract the winning numbers as a set.
def extract_numbers(row):
    row = row[4:10]
    row = set(row.values)
    return row

past_winning = lottery.apply(extract_numbers, axis = 1)
past_winning.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

Below, we write the `check_historical_occurrences()` function, which takes the user's inputted numbers and the historical winning numbers. It prints the number of historical occurrences and the probability of winning the next drawing.

In [15]:
def check_historical_occurrences(u_input, past_winning_nums):
    '''
    u_input: a Python list
    past_winning_nums: a pandas Series
    '''
    
    user_input = set(u_input)
    check_occurrence = (user_input == past_winning_nums) # Element-wise: True where a past drawing equals the user's set
    matches = sum(check_occurrence)
    
    if matches == 0:
        print('''The combination {} has never occurred. 
This doesn't make it more likely to occur now. Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances of winning.'''.format(user_input, user_input))
        
    else:
        print('''The number of times {} has occurred in past drawings is {}.
Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_input, matches,
                                                                     user_input))

In [19]:
# Using a known match to test the case of inputting a previous winning set.
test_set_1 = [3, 11, 12, 14, 41, 43]
check_historical_occurrences(test_set_1, past_winning)

The number of times {3, 41, 11, 12, 43, 14} has occurred in past drawings is 1.
Your chances to win the big prize in the next drawing using the combination {3, 41, 11, 12, 43, 14} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


In [23]:
# Using a combination that has never been drawn to test the no-match case.
test_set_2 = [6, 7, 21, 28, 41, 49]
check_historical_occurrences(test_set_2, past_winning)

The combination {6, 7, 41, 49, 21, 28} has never occurred. 
This doesn't make it more likely to occur now. Your chances to win the big prize in the next drawing using the combination {6, 7, 41, 49, 21, 28} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances of winning.
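
The comparison in `check_historical_occurrences()` works because sets ignore order, so a ticket matches a drawing whenever both contain the same six numbers. The same idea can be sketched without pandas, using only the three drawings shown in the `head(3)` output earlier. `sample_drawings` and `count_occurrences` are hypothetical names used for illustration.

```python
# The first three drawings from the data set preview
sample_drawings = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

def count_occurrences(ticket, drawings):
    """Count the drawings that contain exactly the same six numbers as the ticket."""
    ticket_set = set(ticket)
    return sum(ticket_set == set(drawing) for drawing in drawings)

# Order does not matter: this ticket is the first drawing, shuffled
print(count_occurrences([43, 3, 41, 11, 14, 12], sample_drawings))  # 1
print(count_occurrences([6, 7, 21, 28, 41, 49], sample_drawings))   # 0
```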


## Multi-ticket Probability

Lottery addicts usually play more than one ticket on a single drawing, with the idea that doing this will significantly increase their chances of winning. To help them better estimate their chances of winning, we'll write a function that allows users to calculate the chances of winning with any number of different tickets.

We talked with the engineering team and they gave us this information:
- The user will input the number of *different* tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will receive an integer ranging from 1 to 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize based on the number of different tickets played.
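
Because every ticket is different and every combination is equally likely, playing $n$ tickets gives $n$ successful outcomes out of all possible ones:

$$
P(\text{big prize with } n \text{ tickets}) = \frac{n}{\binom{49}{6}} = \frac{n}{13{,}983{,}816}
$$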

In [49]:
def multi_ticket_probability(num_tickets):
    total_outcomes = combinations(49, 6)
    successful_outcomes = num_tickets
    chances = successful_outcomes / total_outcomes * 100
    
    if num_tickets == 1:
        print('''Your chances of winning the big prize with the one ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances of winning.'''.format(chances, int(total_outcomes)))
        
    else:
        combinations_simplified = round(total_outcomes / num_tickets)
        print('''Your chances of winning the big prize with {:,} different tickets are {:.6f}%.
In other words, you have a 1 in {:,} chances of winning.'''.format(num_tickets, chances, 
                                                               combinations_simplified))

Below, we test the function on a range of inputs.

In [50]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('--------------------') # This will separate the outputs

Your chances of winning the big prize with the one ticket are 0.000007%.
In other words, you have a 1 in 13,983,816 chances of winning.
--------------------
Your chances of winning the big prize with 10 different tickets are 0.000072%.
In other words, you have a 1 in 1,398,382 chances of winning.
--------------------
Your chances of winning the big prize with 100 different tickets are 0.000715%.
In other words, you have a 1 in 139,838 chances of winning.
--------------------
Your chances of winning the big prize with 10,000 different tickets are 0.071511%.
In other words, you have a 1 in 1,398 chances of winning.
--------------------
Your chances of winning the big prize with 1,000,000 different tickets are 7.151124%.
In other words, you have a 1 in 14 chances of winning.
--------------------
Your chances of winning the big prize with 6,991,908 different tickets are 50.000000%.
In other words, you have a 1 in 2 chances of winning.
--------------------
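Your chances of winning the big prize with 13,983,816 different tickets are 100.000000%.
In other words, you have a 1 in 1 chances of winning.
--------------------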

## Fewer Winning Numbers

Most 6/49 lotteries have smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. Users might want to know the probability of having two, three, four, or five winning numbers. The engineering team told us to take the following into account:
- Inside the app, the user inputs:
    - six different numbers from 1 to 49.
    - an integer between 2 and 5 that represents the number of winning numbers expected.
- Our function prints information about the probability of having the inputted number of winning numbers.
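
One way to read the calculation: for a ticket to have exactly $k$ winning numbers, the drawing must contain $k$ of the six numbers on the ticket and $6 - k$ of the 43 numbers not on it. Dividing the number of such drawings by the total number of possible drawings gives:

$$
P(\text{exactly } k \text{ winning numbers}) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}
$$

For example, with $k = 5$ the numerator is $6 \times 43 = 258$, so the probability is $258 / 13{,}983{,}816$.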

In [51]:
def probability_below_6(num_winning):
    n_combinations_ticket = combinations(6, num_winning)
    n_combinations_remaining = combinations(43, 6 - num_winning)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    
    total_outcomes = combinations(49, 6)
    chances = successful_outcomes / total_outcomes * 100
    combinations_simplified = round(total_outcomes / successful_outcomes)
    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances of winning.'''.format(num_winning, chances, 
                                                               int(combinations_simplified)))

Now, let's test the function on all four possible inputs.

In [52]:
for test_input in [2, 3, 4, 5]:
    probability_below_6(test_input)
    print('--------------------') # This will separate the outputs

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chances of winning.
--------------------
Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chances of winning.
--------------------
Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chances of winning.
--------------------
Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chances of winning.
--------------------


## Next Steps

With that, all the function requirements for the app are met. If the medical institute wanted to pursue a second version of the app, possible features could include:
- Adding fun analogies to make the outputs easier to understand (for instance, we can find probabilities for strange events and compare with the chances of winning in the lottery; e.g. 'You are 10 times more likely to be struck by lightning than win the lottery.')
- Creating a function similar to `probability_below_6()` that would calculate the probability of having *at least* two, three, four, or five winning numbers.
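
As a rough sketch of the second idea, the "at least" probability can be obtained by adding up the "exactly $k$" cases from the requested number of winning numbers up to six. The code below assumes Python 3.8 or later (for `math.comb`). `probability_at_least` is a hypothetical name, and the function returns the percentage instead of printing it, which is a design choice made here for illustration only.

```python
import math

def probability_at_least(num_winning):
    """Probability (in %) that a ticket has at least `num_winning` of the
    six drawn numbers. Hypothetical sketch, not part of the first version."""
    total_outcomes = math.comb(49, 6)
    successful_outcomes = sum(
        math.comb(6, k) * math.comb(43, 6 - k)  # drawings with exactly k matches
        for k in range(num_winning, 7)          # k = num_winning, ..., 6
    )
    return successful_outcomes / total_outcomes * 100

for n in [2, 3, 4, 5]:
    print('At least {} winning numbers: {:.6f}%'.format(n, probability_at_least(n)))
```

Returning the value leaves the wording of the user-facing message to the caller, so it could reuse the same format as `probability_below_6()`.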