# Mobile App for Lottery Addiction

Many people start playing the lottery or gambling for fun, but it can turn into a habit and eventually an addiction, leading to financial losses and, in some cases, desperate behavior such as theft.

A medical institute wants to build a mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need help to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. [The data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.

## Implementing two useful functions

Throughout the project, I'll need to calculate probabilities and combinations repeatedly, so I'll start by writing two functions that I'll use often:

- A function that calculates **factorials**; and
- A function that calculates **combinations**.

In [1]:
# n! = n * (n-1) * (n-2) * ... * 2 * 1
def factorial(n):
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result

# C(n, k) = n! / (k! * (n-k)!)
# Integer division keeps the result exact; the quotient is always a whole number.
def combinations(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))
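
As a quick cross-check, Python's standard library already provides `math.factorial` and, from Python 3.8 on, `math.comb`. The sketch below assumes Python 3.8+ and only confirms that the formula above agrees with the built-in for the 6/49 case; `combinations_by_formula` is a name introduced here for illustration.

```python
import math

def combinations_by_formula(n, k):
    # Same formula as above: C(n, k) = n! / (k! * (n-k)!)
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

total_tickets = combinations_by_formula(49, 6)
print(total_tickets)                      # 13983816
print(total_tickets == math.comb(49, 6))  # True
```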

## The probability of winning the big prize

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn.

For the first version of the app, we need a function that calculates the probability of winning the big prize. The institute's engineering team asked us to write a function with the following specifications:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The function is requested to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

In [2]:
def one_ticket_probability(ticket):
    prob = (1/combinations(49, 6)) * 100
    message = "The chance to win the big prize with {} numbers is {:.7f}%."
    print(message.format(ticket, prob))
    return prob

p = one_ticket_probability([13, 22, 24, 27, 42, 44])

The chance to win the big prize with [13, 22, 24, 27, 42, 44] numbers is 0.0000072%.


The probability of winning the big prize with a single ticket is the same for every combination of six numbers, and it is very small: about 0.0000072%, or 1 chance in 13,983,816.
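
A percentage with six leading zeros can be hard to read for people without probability training, so one option is to phrase the same value as "1 in N". The helper below is a minimal sketch of that idea; `describe_odds` is a hypothetical name, and the sketch assumes Python 3.8+ for `math.comb`.

```python
import math

def describe_odds(probability_percent):
    # Convert a percentage into a "1 in N" phrase, rounding N to the nearest integer.
    one_in_n = round(100 / probability_percent)
    return "about 1 chance in {:,}".format(one_in_n)

prob = 100 / math.comb(49, 6)  # the same percentage one_ticket_probability() computes
print(describe_odds(prob))     # about 1 chance in 13,983,816
```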

## Exploring the historical data

I am going to explore the historical data coming from the Canada 6/49 lottery. The data set can be downloaded from [Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset).

Let's open this data set and get familiar with its structure.

In [3]:
import pandas as pd
data = pd.read_csv('649.csv')
data.shape

(3665, 11)

In [4]:
data.head(3)

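|  | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |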


In [5]:
data.tail(3)

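|  | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |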


The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

- NUMBER DRAWN 1
- NUMBER DRAWN 2
- NUMBER DRAWN 3
- NUMBER DRAWN 4
- NUMBER DRAWN 5
- NUMBER DRAWN 6

The engineering team wants us to write a function that prints:

- the number of times the combination selected occurred in the Canada data set; and
- the probability of winning the big prize in the next drawing with that combination.

In [6]:
# This function takes a row of the data set
# and returns the six drawn numbers as a set.
def extract_numbers(row):
    result = set()
    for i in range(1, 7):
        column_name = 'NUMBER DRAWN {}'.format(i)
        result.add(row[column_name])
    return result

winning_numbers = data.apply(extract_numbers, axis = 1)
winning_numbers

0        {3, 41, 11, 12, 43, 14}
1        {33, 36, 37, 39, 8, 41}
2         {1, 6, 39, 23, 24, 27}
3         {3, 9, 10, 43, 13, 20}
4        {34, 5, 14, 47, 21, 31}
                  ...           
3660    {38, 40, 41, 10, 15, 23}
3661    {36, 46, 47, 19, 25, 31}
3662     {32, 34, 6, 22, 24, 31}
3663     {2, 38, 15, 49, 21, 31}
3664    {35, 37, 14, 48, 24, 31}
Length: 3665, dtype: object
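
The same extraction can also be sketched without pandas, using only the standard library `csv` module. This is an alternative technique, not what the notebook does: the sample text inlines the first three drawings from `data.head(3)`, and `extract_numbers_from_record` is a name introduced here for illustration.

```python
import csv
import io

# First three drawings, copied from the output of data.head(3).
SAMPLE_CSV = """PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
649,1,0,6/12/1982,3,11,12,14,41,43,13
649,2,0,6/19/1982,8,33,36,37,39,41,9
649,3,0,6/26/1982,1,6,23,24,27,39,34
"""

def extract_numbers_from_record(record):
    # record is a dict mapping column names to string values
    return {int(record['NUMBER DRAWN {}'.format(i)]) for i in range(1, 7)}

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
sample_winning_numbers = [extract_numbers_from_record(r) for r in reader]
print(sample_winning_numbers[0] == {3, 11, 12, 14, 41, 43})  # True
```

Storing each drawing as a set means the order in which the numbers were drawn does not matter when comparing it with a user's ticket.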

In [7]:
# number of drawings:
data.shape[0]

3665

In [8]:
# This function takes a combination of numbers chosen by a user
# and a series of winning combinations (one set per drawing).
# It prints how many times the combination occurred in the past
# and its historical frequency as a percentage.
def check_historical_occurrence(user_numbers, winning_numbers):
    user_numbers_set = set(user_numbers)
    number_of_occurrences = sum(winning_numbers == user_numbers_set)
    number_of_outcomes = len(winning_numbers)
    print(
        'The number of times that this combination of numbers {} occurred in the past is: {}'.
        format(user_numbers_set, number_of_occurrences))
    print(
        'The chance of the big prize in the next drawing for this combination of numbers {} will be {:.2f}%.'.
        format(user_numbers_set, (number_of_occurrences/number_of_outcomes)*100))

# Test the above function
check_historical_occurrence([13, 22, 24, 27, 42, 44], winning_numbers)
print('**********************')
check_historical_occurrence([2, 38, 15, 49, 21, 31], winning_numbers)

The number of times that this combination of numbers {42, 44, 13, 22, 24, 27} occurred in the past is: 0
The chance of the big prize in the next drawing for this combination of numbers {42, 44, 13, 22, 24, 27} will be 0.00%.
**********************
The number of times that this combination of numbers {2, 38, 15, 49, 21, 31} occurred in the past is: 1
The chance of the big prize in the next drawing for this combination of numbers {2, 38, 15, 49, 21, 31} will be 0.03%.


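With this function, users can see how often their combination has come up in the historical data. Note that the percentage it prints is a historical frequency: past drawings do not affect future ones, so the probability of winning the next drawing with any given combination remains 1 in 13,983,816.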
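
A short calculation helps explain why the historical count is 0 for almost every combination. The sketch below assumes the drawings are independent and uses the 3,665 drawings in the data set; the variable names are introduced here for illustration.

```python
import math

total_combinations = math.comb(49, 6)  # 13,983,816 possible tickets
number_of_drawings = 3665              # rows in the Canada data set
p_single = 1 / total_combinations

# Expected number of past drawings that match one fixed combination.
expected_matches = number_of_drawings * p_single

# Probability that the combination came up at least once in those drawings,
# assuming independent drawings: 1 - (1 - p)^n.
p_seen_before = -math.expm1(number_of_drawings * math.log1p(-p_single))

print('{:.6f}'.format(expected_matches))
print('{:.4f}%'.format(p_seen_before * 100))
```

Both values are tiny (the expected count is roughly 0.00026), so a count of 0 or 1 in the history says very little about a combination's real chance.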

## Multiple tickets probability

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning, so in this section we're going to write a function that lets users calculate the chance of winning for any number of different tickets.

In [9]:
# This function calculates and prints the probability of winning
# the big prize with multiple different tickets.
def multi_ticket_probability(number_of_tickets):
    number_of_outcomes = combinations(49, 6)
    prob = (number_of_tickets/number_of_outcomes)*100
    print('The chance of winning the big prize with {} tickets is {:.6f}%.'.
         format(number_of_tickets, prob))

# Test the above function.
test = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for t in test:
    multi_ticket_probability(t)

The chance of winning the big prize with 1 tickets is 0.000007%.
The chance of winning the big prize with 10 tickets is 0.000072%.
The chance of winning the big prize with 100 tickets is 0.000715%.
The chance of winning the big prize with 10000 tickets is 0.071511%.
The chance of winning the big prize with 1000000 tickets is 7.151124%.
The chance of winning the big prize with 6991908 tickets is 50.000000%.
The chance of winning the big prize with 13983816 tickets is 100.000000%.


The total number of possible combinations is large (13,983,816), so the chance of winning with fewer than 100 tickets is still almost 0. Even 1 million tickets give only about a 7% chance of winning. Buying a few extra tickets therefore does not increase the chance of winning significantly compared with a single ticket.
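
The same relationship can be turned around to ask how many different tickets are needed to reach a given chance of winning. The function below is a minimal sketch of that inverse calculation; `tickets_needed` is a hypothetical helper, not part of the requested specification, and it assumes Python 3.8+ for `math.comb`.

```python
import math

TOTAL_COMBINATIONS = math.comb(49, 6)  # 13,983,816

def tickets_needed(target_probability):
    # Smallest number of different tickets whose combined chance of the
    # big prize is at least target_probability (a value between 0 and 1).
    if not 0 < target_probability <= 1:
        raise ValueError('target_probability must be in (0, 1]')
    return math.ceil(target_probability * TOTAL_COMBINATIONS)

for target in (0.01, 0.5, 1.0):
    print('{:.0%} chance needs {:,} tickets'.format(target, tickets_needed(target)))
```

Even a 1% chance requires well over a hundred thousand different tickets, which makes the cost of "playing more" concrete.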

## Matching 2-5 numbers

In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

The engineering team has asked us to write a function that lets users calculate the probability of having exactly two, three, four, or five winning numbers (not at least two, three, ...).

These are the engineering details we'll need to be aware of:

- Inside the app, the user inputs:
    - six different numbers from 1 to 49; and
    - an integer between 2 and 5 that represents the number of winning numbers expected
- Our function prints information about the probability of having the inputted number of winning numbers.

The function below calculates the probability that a player's ticket matches **exactly** the given number of winning numbers (no more and no fewer). It does not return the probability of having at least that many winning numbers. Because this probability is the same for every ticket, the function only needs the number of winning numbers as input.

The second parameter, print_result, controls whether the function prints the result.

In [10]:
# This function calculates and prints the probability of having two, 
# three, four or five winning numbers exactly.
def probability_less_6(number_of_winning, print_result):
    winning_combinations = combinations(6, number_of_winning)
    remaining_combinations = combinations(43, 6-number_of_winning)
    successful_outcomes = winning_combinations * remaining_combinations
    total_outcomes = combinations(49, 6)
    prob = (successful_outcomes/total_outcomes)*100
    if print_result:
        print(
            'The chance of winning {} numbers with this ticket is {:.6f}%.'.
             format(number_of_winning, prob))
    return prob

# Test the above function
for t in range(2, 6):
    probability_less_6(t, True)

The chance of winning 2 numbers with this ticket is 13.237803%.
The chance of winning 3 numbers with this ticket is 1.765040%.
The chance of winning 4 numbers with this ticket is 0.096862%.
The chance of winning 5 numbers with this ticket is 0.001845%.
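
One way to sanity-check these values is to compute the whole distribution, from zero to six matches, with exact fractions and confirm that it sums to 1. The sketch below assumes Python 3.8+ for `math.comb`; `exact_match_probability` is a name introduced here for illustration.

```python
from fractions import Fraction
import math

def exact_match_probability(k):
    # P(exactly k of the 6 ticket numbers are among the 6 drawn numbers):
    # choose k of the 6 winning numbers and 6-k of the 43 losing numbers.
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), math.comb(49, 6))

distribution = {k: exact_match_probability(k) for k in range(0, 7)}
print(sum(distribution.values()) == 1)                 # True
print('{:.6f}%'.format(float(distribution[5]) * 100))  # 0.001845%
```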


### Matching at least 2-5 numbers

It can also be helpful to implement a function that tells users the probability of matching at least two, three, four, or five numbers.

In [11]:
# calculates the probability of having at least two, 
# three, four or five winning numbers
def probability_at_least(number_of_winning):
    total_prob = (1/combinations(49, 6))*100
    for i in range(number_of_winning, 6):
        total_prob += probability_less_6(i, False)
    print(
        'The chance of winning at least {} numbers with this ticket is {:.6f}%.'.
         format(number_of_winning, total_prob))
    return total_prob

# Test the above function
for t in range(2, 6):
    probability_at_least(t)

The chance of winning at least 2 numbers with this ticket is 15.101557%.
The chance of winning at least 3 numbers with this ticket is 1.863755%.
The chance of winning at least 4 numbers with this ticket is 0.098714%.
The chance of winning at least 5 numbers with this ticket is 0.001852%.
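
The function above adds the six-match case separately and then loops over the remaining cases. Since the counting formula also holds for six matches (C(6, 6) * C(43, 0) = 1), the sum can run from the requested number up to 6 in one pass. This is a sketch of that variant, assuming Python 3.8+ for `math.comb`; `at_least_probability` is a hypothetical name.

```python
import math

def at_least_probability(k):
    # Percentage chance of matching k or more of the six drawn numbers.
    favourable = sum(math.comb(6, m) * math.comb(43, 6 - m) for m in range(k, 7))
    return favourable / math.comb(49, 6) * 100

for k in range(2, 6):
    print('at least {}: {:.6f}%'.format(k, at_least_probability(k)))
```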


## Summary and next steps

For the first version of the app, the institute requested functions that answer the following questions:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

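I coded the following functions. **one_ticket_probability()**, **multi_ticket_probability()**, and **probability_at_least()** answer the three questions above. The other two cover the institute's request about historical data and the probability of matching an exact number of winning numbers.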

- **one_ticket_probability()**: calculates the probability of winning the big prize with a single ticket
- **check_historical_occurrence()**: checks whether a certain combination has occurred in the Canada lottery data set
- **multi_ticket_probability()**: calculates the probability of winning the big prize for any number of different tickets between 1 and 13,983,816
- **probability_less_6()**: calculates the probability of having two, three, four or five winning numbers exactly
- **probability_at_least()**:  calculates the probability of having at least two, three, four or five winning numbers

Possible features for a second version of the app include:

- Making the outputs even easier to understand by adding fun analogies. For example, we could find the probabilities of unusual events and compare them with the chance of winning the lottery, with output along the lines of "You are 100 times more likely to be the victim of a shark attack than to win the lottery"
- Combining the one_ticket_probability() and check_historical_occurrence() to output information on probability and historical occurrence at the same time
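
As a rough illustration of the last idea, a combined function could validate the ticket, count past occurrences, and report the chance for the next drawing in one message. Everything here is an assumption for a possible second version: `ticket_report` is a hypothetical name, the past drawings are passed in as plain sets rather than a pandas series, and the message wording is only a placeholder.

```python
import math

def ticket_report(ticket, past_draws):
    # ticket: six different integers from 1 to 49 (validated below)
    # past_draws: iterable of sets, one per historical drawing
    numbers = set(ticket)
    if len(ticket) != 6 or len(numbers) != 6 or not all(1 <= n <= 49 for n in numbers):
        raise ValueError('a ticket needs six different numbers from 1 to 49')
    occurrences = sum(1 for draw in past_draws if set(draw) == numbers)
    probability = 100 / math.comb(49, 6)
    return ('Your combination has been drawn {} time(s) before. '
            'Your chance in the next drawing is {:.7f}%.').format(occurrences, probability)

# Two drawings taken from the start of the data set, for demonstration only.
sample_draws = [{3, 11, 12, 14, 41, 43}, {8, 33, 36, 37, 39, 41}]
print(ticket_report([3, 11, 12, 14, 41, 43], sample_draws))
```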