# Mobile App for Lottery Addiction

In this project we will be contributing to the development of a mobile app by writing some functions that are mainly focused on probability calculation. By helping people to better estimate their chances of winning, the app aims to both prevent and treat lottery addiction.

A medical institute that specializes in treating gambling addiction came up with the idea for the app. The institute already has an engineering team to build the app; however, it needs us to create the logical core of the app and do the probability calculations. For the first version of the app, the institute wants us to focus on the 6/49 lotto and create functions that can answer the following questions for users:

* Given a single ticket, what is the probability of winning the big prize?
* If we play 40 different tickets (or any other number), what is the probability of winning the big prize?
* How likely are we to get at least five (or four or three) winning numbers on one ticket?

The scenario we'll follow throughout this project is fictional. The main purpose is to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.

## Core Functions

Below we'll write two functions that we'll use frequently:

* `factorial()` - a function that calculates the factorial of a number

* `combinations()` - a function for the calculation of combinations

In [4]:
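def factorial(n):
    # Base case: 0! and 1! are both 1
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)


def combinations(n, k):
    # Integer division keeps the count exact; float division can lose precision for large n
    return factorial(n) // (factorial(k) * factorial(n - k))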
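
As a quick sanity check, the two helpers can be compared against Python's standard library, which provides `math.factorial()` and, from Python 3.8 onward, `math.comb()`. The sketch below redefines the helpers so that it runs on its own; the integer-division form of `combinations()` is an assumption of this sketch, chosen to keep the count exact.

```python
import math


def factorial(n):
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)


def combinations(n, k):
    # Integer division (an assumption of this sketch) keeps the count exact
    return factorial(n) // (factorial(k) * factorial(n - k))


# Compare the hand-written helpers with the standard library
assert factorial(6) == math.factorial(6) == 720
assert combinations(49, 6) == math.comb(49, 6) == 13983816
print(combinations(49, 6))
```

If both assertions pass, the helpers agree with the library versions for these inputs, including the 13,983,816 possible tickets of a 6/49 lottery.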

## One-ticket Probability

We need to build a function that computes the probability of winning the grand prize for any given ticket. For each drawing, six numbers are drawn from a set of 49, and a person will win the grand prize if the six numbers on their ticket match all six numbers.

The engineering team told us that we needed to be aware of the following details when writing the function:

* Inside the application, the user will type in six different numbers ranging from 1 to 49.
* Under the hood, the six numbers arrive as a Python list and serve as the input to our function.
* The engineering team wants the function to print out the probability value in a friendly way - in a way that people with no probability training can comprehend.

Here's how we write the `one_ticket_probability()` function, which takes a list of six unique numbers and prints out the probability of winning in an easy-to-understand way.

In [5]:
def one_ticket_probability(user_numbers):
    
    n_combinations = combinations(49, 6)
    probability_one_ticket = 1 / n_combinations
    percentage_form = probability_one_ticket * 100
    
    print('''Your chances to win the big prize with the numbers {} are {:.8f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(user_numbers,
                    percentage_form, int(n_combinations)))

Now, let's test the function.

In [6]:
test_input_1 = [2, 15, 30, 23, 3, 5]
one_ticket_probability(test_input_1)

Your chances to win the big prize with the numbers [2, 15, 30, 23, 3, 5] are 0.00000715%.
In other words, you have a 1 in 13,983,816 chances to win.
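
The function above trusts that its input is a valid ticket. If the app does not already enforce the rules listed earlier (six different numbers, each from 1 to 49), a check along the following lines could run first. This is only a sketch: `validate_ticket()` is a hypothetical helper and is not part of the institute's requirements.

```python
def validate_ticket(user_numbers):
    """Hypothetical helper: True if the ticket has six distinct integers from 1 to 49."""
    if len(user_numbers) != 6:
        return False
    if len(set(user_numbers)) != 6:
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in user_numbers)


print(validate_ticket([2, 15, 30, 23, 3, 5]))   # True
print(validate_ticket([1, 2, 3, 4, 5, 50]))     # False: 50 is outside 1 to 49
print(validate_ticket([2, 2, 30, 23, 3, 5]))    # False: 2 is repeated
```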


## Historical Data Check for Canada Lottery

The institute would also like us to consider historical data from the national 6/49 lottery game in Canada. The dataset (which can be downloaded [here](https://www.kaggle.com/datascienceai/lottery-dataset)) contains historical data for 3,665 drawings from 1982 to 2018.

In [7]:
import pandas as pd

lottery_canada = pd.read_csv('D:/649.csv')
lottery_canada.shape

(3665, 11)

In [8]:
lottery_canada.head(10)

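| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |
| 5 | 649 | 6 | 0 | 7/17/1982 | 8 | 20 | 21 | 25 | 31 | 41 | 33 |
| 6 | 649 | 7 | 0 | 7/24/1982 | 18 | 25 | 28 | 33 | 36 | 42 | 7 |
| 7 | 649 | 8 | 0 | 7/31/1982 | 7 | 16 | 17 | 31 | 40 | 48 | 26 |
| 8 | 649 | 9 | 0 | 8/7/1982 | 5 | 10 | 23 | 27 | 37 | 38 | 33 |
| 9 | 649 | 10 | 0 | 8/14/1982 | 4 | 15 | 30 | 37 | 46 | 48 | 3 |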


## Function for Historical Data Check

The engineering team tells us that we need to write a function that will help users determine whether or not they would have ever won if they had used a particular combination of six numbers in the past. Here's what we need to know:

* In our application, users enter six different numbers from 1 to 49.
* Behind the scenes, the six numbers arrive as a Python list and act as input to our function.
* We need to write a function to print:
    * the number of occurrences of the selected combination; and
    * the probability that you will win the big prize in the next drawing with that combination of numbers.

We'll start by extracting all the winning numbers from the lottery dataset. `extract_numbers()` takes one row of the dataframe and returns its six winning numbers as a Python set; applying it to every row with `DataFrame.apply()` gives a Series of sets.

In [10]:
def extract_numbers(row):
    # Positions 4 to 9 hold the columns NUMBER DRAWN 1 through NUMBER DRAWN 6
    row = row.iloc[4:10]
    row = set(row.values)
    return row

winning_numbers = lottery_canada.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In the following, we write the `check_historical_occurrence()` function, which takes the user numbers and the historical numbers and prints out information about the number of occurrences and the probability of winning in the next drawing.

In [11]:
def check_historical_occurrence(user_numbers, historical_numbers):   

    '''
    user_numbers: a Python list
    historical_numbers: a pandas Series of sets
    '''
    
    user_numbers_set = set(user_numbers)
    check_occurrence = historical_numbers == user_numbers_set
    n_occurrences = check_occurrence.sum()
    
    if n_occurrences == 0:
        print('''The combination {} has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, user_numbers))
        
    else:
        print('''The number of times combination {} has occurred in the past is {}.
Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, n_occurrences,
                                                                            user_numbers))

Let's test the function with two inputs. The first contains numbers above 49, so it cannot match any historical drawing; the second matches the fifth drawing in the dataset preview above.

In [12]:
test_input_3 = [2, 56, 23, 76, 1, 51]
check_historical_occurrence(test_input_3, winning_numbers)

The combination [2, 56, 23, 76, 1, 51] has never occurred.
This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination [2, 56, 23, 76, 1, 51] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


In [13]:
test_input_4 = [34, 5, 14, 47, 21, 31]
check_historical_occurrence(test_input_4, winning_numbers)

The number of times combination [34, 5, 14, 47, 21, 31] has occurred in the past is 1.
Your chances to win the big prize in the next drawing using the combination [34, 5, 14, 47, 21, 31] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
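
The check relies on set equality, which ignores the order in which the numbers were entered. A minimal pure-Python sketch of the same idea, without pandas, is shown below. `count_occurrences()` is a hypothetical helper, and the three draws are the first three rows of the dataset preview.

```python
def count_occurrences(user_numbers, historical_draws):
    """Hypothetical helper: count draws whose six numbers equal the user's, ignoring order."""
    user_set = set(user_numbers)
    return sum(1 for draw in historical_draws if set(draw) == user_set)


# The first three drawings from the dataset preview
draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

print(count_occurrences([43, 3, 41, 11, 14, 12], draws))  # 1 (same numbers, different order)
print(count_occurrences([1, 2, 3, 4, 5, 6], draws))       # 0
```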


## Multi-ticket Probability

For the first version of the app, users should also be able to find the probability of winning when playing several different tickets. For example, someone could play 15 different tickets and want to know the odds of winning the grand prize.

When writing the function, the engineering team wants us to be aware of the following details:

* The user enters the number of different tickets they want to play (without entering the specific combinations).
* The function receives a whole number from 1 to 13,983,816 (the maximum number of different tickets).
* Depending on the number of different tickets played, the function should print information about the probability of winning the big prize.

The `multi_ticket_probability()` function below takes the number of tickets and prints out information about the probability of winning depending on the input.

In [14]:
def multi_ticket_probability(n_tickets):
    n_combinations = combinations(49, 6)
    
    probability = n_tickets / n_combinations
    percentage_form = probability * 100
    
    if n_tickets == 1:
        print('''Your chances to win the big prize with one ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(percentage_form, int(n_combinations)))
    
    else:
        combinations_simplified = round(n_combinations / n_tickets)   
        print('''Your chances to win the big prize with {:,} different tickets are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_tickets, percentage_form,
                                                               combinations_simplified))

Let's run some tests for the function.

In [15]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') # output delimiter

Your chances to win the big prize with one ticket are 0.000007%.
In other words, you have a 1 in 13,983,816 chances to win.
------------------------
Your chances to win the big prize with 10 different tickets are 0.000072%.
In other words, you have a 1 in 1,398,382 chances to win.
------------------------
Your chances to win the big prize with 100 different tickets are 0.000715%.
In other words, you have a 1 in 139,838 chances to win.
------------------------
Your chances to win the big prize with 10,000 different tickets are 0.071511%.
In other words, you have a 1 in 1,398 chances to win.
------------------------
Your chances to win the big prize with 1,000,000 different tickets are 7.151124%.
In other words, you have a 1 in 14 chances to win.
------------------------
Your chances to win the big prize with 6,991,908 different tickets are 50.000000%.
In other words, you have a 1 in 2 chances to win.
------------------------
Your chances to win the big prize with 13,983,816 different tickets are 100.000000%.
In other words, you have a 1 in 1 chances to win.
------------------------
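
Because every different ticket covers exactly one of the 13,983,816 equally likely outcomes, the probability is simply the number of tickets divided by that total. If the engineering team ever needs the value itself rather than a printed message, one option is to return an exact fraction. The sketch below assumes a hypothetical helper named `multi_ticket_odds()` and uses `math.comb()` (Python 3.8+) in place of our own `combinations()`.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible tickets


def multi_ticket_odds(n_tickets):
    """Hypothetical helper: exact probability of the big prize with n different tickets."""
    if not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError("n_tickets must be between 1 and 13,983,816")
    return Fraction(n_tickets, TOTAL_OUTCOMES)


odds = multi_ticket_odds(6991908)
print(odds)         # 1/2
print(float(odds))  # 0.5
```

Returning a `Fraction` avoids rounding in the "1 in N" phrasing, at the cost of leaving the formatting to the caller.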

## Function for Fewer Winning Numbers

In most of the 6/49 lotteries, there are smaller prizes available if a player's ticket matches two, three, four, or five of the six numbers that are drawn. That means that players may want to know the probability of matching 2, 3, 4, or 5, and the first version of the app should allow users to find that probability.

When we write a function to calculate these probabilities, we need to be aware of these details:

* Inside the app, the user enters
    * six different numbers, from 1 to 49; and
    * an integer between 2 and 5, representing the number of winning numbers expected.
* Our function will print out information about the probability of having a certain number of winning numbers in the game.

In order to calculate the probabilities, we tell the engineering team that the specific combination on the ticket is irrelevant, and that all we need is the integer between 2 and 5 that represents the number of expected winning numbers. As a result, we are going to write a function called `probability_less_6()` that takes an integer and prints out information about the chances of winning as a function of the value of that integer.

The following function computes the probability that a player's ticket matches exactly the given number of winning numbers. For an input of five, for example, it prints the probability of exactly five winning numbers (no more and no fewer), not the probability of at least five.

In [45]:
def probability_less_6(n_winning_numbers):
    
    n_combinations_ticket = combinations(6, n_winning_numbers)
    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    
    n_combinations_total = combinations(49, 6)    
    probability = successful_outcomes / n_combinations_total
    
    probability_percentage = probability * 100    
    combinations_simplified = round(n_combinations_total/successful_outcomes)    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.
In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers, probability_percentage,
                                                               int(combinations_simplified)))
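
The counting behind the function can be written as a single formula. A ticket with exactly $k$ winning numbers corresponds to a draw that contains $k$ of the 6 numbers on the ticket and $6 - k$ of the 43 numbers not on it; the symbol $k$ is introduced here only for illustration.

```latex
P(\text{exactly } k \text{ matches})
  = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}},
  \qquad k \in \{2, 3, 4, 5\}
```

For $k = 5$ this gives $6 \cdot 43 / 13{,}983{,}816 = 258 / 13{,}983{,}816$, or roughly 1 in 54,201.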


Now, let's test the function on all four possible inputs.

In [46]:
for test_input in [2, 3, 4, 5]:
    probability_less_6(test_input)
    print('--------------------------') # output delimiter

Your chances of having 2 winning numbers with this ticket are 13.237803%.
In other words, you have a 1 in 8 chances to win.
--------------------------
Your chances of having 3 winning numbers with this ticket are 1.765040%.
In other words, you have a 1 in 57 chances to win.
--------------------------
Your chances of having 4 winning numbers with this ticket are 0.096862%.
In other words, you have a 1 in 1,032 chances to win.
--------------------------
Your chances of having 5 winning numbers with this ticket are 0.001845%.
In other words, you have a 1 in 54,201 chances to win.
--------------------------
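
The results above are for exactly that many winning numbers. If users instead want the probability of at least a given number of matches, the exact probabilities can be summed up to six matches. The sketch below is one way to do that; `probability_exactly()` and `probability_at_least()` are hypothetical names, and `math.comb()` (Python 3.8+) stands in for our own `combinations()`.

```python
from math import comb


def probability_exactly(k):
    """Probability that a ticket matches exactly k of the six drawn numbers."""
    return comb(6, k) * comb(43, 6 - k) / comb(49, 6)


def probability_at_least(k):
    """Hypothetical helper: probability of matching k or more numbers (up to all six)."""
    return sum(probability_exactly(i) for i in range(k, 7))


for k in [2, 3, 4, 5]:
    print('At least {}: {:.6f}%'.format(k, probability_at_least(k) * 100))
```

The "at least" values are only slightly larger than the "exactly" values, because each additional match is far less likely than the one before.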


## Conclusion

We have coded four main functions for the first version of the app:

* `one_ticket_probability()` - computes the probability of winning the big prize on a single ticket
* `check_historical_occurrence()` - checks whether a given combination has occurred in the Canadian lottery dataset
* `multi_ticket_probability()` - prints the probability of winning the big prize for any number of different tickets between 1 and 13,983,816
* `probability_less_6()` - computes the probability to have exactly two, three, four or five winning numbers