# Developing a Mobile App for Lottery Addiction


Many people start playing the lottery for fun, but for some the activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, accumulate debts, and eventually resort to desperate behaviors like theft.

Scenario: A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The data set has data for 3,665 drawings, dating from 1982 to 2018.

## Core Functions

To start, we'll define two functions that we'll use often in calculating probabilities and combinations.

In [4]:
# Function to find the factorial of value n
def factorial(n):
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result

# Function to find the total number of possible combinations of k objects taken from a total set of n objects
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(n - k) * factorial(k)
    return numerator // denominator # integer division keeps the count exact; the denominator always divides the numerator evenly
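
As a sanity check, the hand-written helpers can be compared against the standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available. The names `factorial_loop` and `combinations_loop` are stand-ins for the two functions above, rewritten here only so the block runs on its own.

```python
import math

def factorial_loop(n):
    # Same idea as factorial(): multiply the integers from n down to 1
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result

def combinations_loop(n, k):
    # n! / (k! * (n - k)!), using integer division so the result stays an exact int
    return factorial_loop(n) // (factorial_loop(k) * factorial_loop(n - k))

# Both helpers should agree with the standard library
print(factorial_loop(6) == math.factorial(6))          # True
print(combinations_loop(49, 6) == math.comb(49, 6))    # True
print(combinations_loop(49, 6))                        # 13983816
```

In production code, `math.factorial` and `math.comb` would likely be the simpler choice, since they are already tested and return exact integers.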

## One-Ticket Probability

Now we focus on writing a function that calculates the probability of winning the big prize.

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set. A player wins the big prize if the six numbers on his ticket match all the six numbers drawn. If a player has a ticket with the numbers `{13, 22, 24, 27, 42, 44}`, he only wins the big prize if the numbers drawn are `{13, 22, 24, 27, 42, 44}`. If only one number differs, he doesn't win.
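
Because the numbers are drawn without replacement, the same 1-in-13,983,816 figure can be reached by multiplying the chance that each successive draw hits one of the ticket's remaining numbers. The sketch below only illustrates that reasoning; `sequential_jackpot_probability` is a hypothetical helper name, not part of the app.

```python
from fractions import Fraction
from math import comb

def sequential_jackpot_probability(pool_size=49, picks=6):
    # First draw: 6 of the 49 numbers are on the ticket; second draw: 5 of the 48 left; and so on
    probability = Fraction(1)
    for drawn in range(picks):
        probability *= Fraction(picks - drawn, pool_size - drawn)
    return probability

p = sequential_jackpot_probability()
print(p)                              # 1/13983816
print(p == Fraction(1, comb(49, 6)))  # True
```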

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket. So, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.

We talked with the engineering team of the medical institute, and they told us to keep the following details in mind when writing the function:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

In [18]:
def one_ticket_probability(nums):
    total_possible_outcomes = combinations(49, 6) # total possible outcomes given selecting 6 out of 49 numbers
    successful_outcomes = 1 # only one combination will ultimately be successful out of all the possibilities
    probability_percentage = successful_outcomes / total_possible_outcomes * 100
    
    print(("The percentage chance of hitting all winning numbers with your selection of {} is {:.7f}%.  That's a 1 in {:,} chance.").format(nums, probability_percentage, int(total_possible_outcomes)))

Now we test our function on a couple of inputs:

In [19]:
one_ticket_probability([3, 15, 29, 33, 18, 7])

The percentage chance of hitting all winning numbers with your selection of [3, 15, 29, 33, 18, 7] is 0.0000072%.  That's a 1 in 13,983,816 chance.


In [20]:
one_ticket_probability([45, 49, 3, 11, 33, 19])

The percentage chance of hitting all winning numbers with your selection of [45, 49, 3, 11, 33, 19] is 0.0000072%.  That's a 1 in 13,983,816 chance.


Ultimately, regardless of what six numbers a user selects, the probability of success is the same.
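
The engineering team's input contract (six different numbers from 1 to 49, passed as a list) could be checked before any probability is reported. The helper below is a hypothetical sketch of such a check, not something the team asked for.

```python
def is_valid_ticket(nums, pool_size=49, picks=6):
    # A valid ticket is a list of exactly `picks` distinct integers between 1 and pool_size
    if len(nums) != picks or len(set(nums)) != picks:
        return False
    return all(isinstance(n, int) and 1 <= n <= pool_size for n in nums)

print(is_valid_ticket([3, 15, 29, 33, 18, 7]))   # True
print(is_valid_ticket([3, 3, 29, 33, 18, 7]))    # False (duplicate number)
print(is_valid_ticket([0, 15, 29, 33, 18, 50]))  # False (numbers outside 1-49)
```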

## Historical Data Check for Canada Lottery

For the first version of the app, however, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.  We'll use the dataset found [here](https://www.kaggle.com/datascienceai/lottery-dataset).

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

- `NUMBER DRAWN 1`
- `NUMBER DRAWN 2`
- `NUMBER DRAWN 3`
- `NUMBER DRAWN 4`
- `NUMBER DRAWN 5`
- `NUMBER DRAWN 6`

Let's open the data set (we've saved a local copy in the file `649.csv`) and get familiar with its structure.

In [35]:
import pandas as pd

lottery_canada = pd.read_csv("649.csv")
print(lottery_canada.shape)

(3665, 11)


In [36]:
lottery_canada.head(3)

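| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |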


In [37]:
lottery_canada.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


## Function for Historical Data Check

Now, we're going to write a function that will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have won by now.

The engineering team told us that we need to be aware of the following details:

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- The engineering team wants us to write a function that prints:
  - the number of times the combination selected occurred in the Canada data set; and
  - the probability of winning the big prize in the next drawing with that combination.
  
We'll begin by extracting the winning numbers from each past drawing entry by using the function below:

In [32]:
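def extract_numbers(row):
    # Pull the six drawn numbers (columns at positions 4 through 9, NUMBER DRAWN 1 to 6) from a row of the Canada data and return them as a Python set
    numbers = set(row.iloc[4:10].values) # .iloc makes the positional slice explicit
    return numbers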

In [48]:
canada_winning_nums = lottery_canada.apply(extract_numbers, axis=1) # apply our number extraction function to each row in the Canada lottery data

In [70]:
canada_winning_nums.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

Now, we'll define a function that takes in a user-selected combination of numbers and a full set of historical winning numbers, and tells the user both how many times that combination occurred historically and what the odds are, going forward, of that combination winning the big prize.

In [69]:
def check_historical_occurrence(nums, winning_nums):
    user_nums = set(nums) # user selected numbers, converted to Python set form
    matches = (winning_nums == user_nums).sum() # compares each past drawing's set of winning numbers to the provided combination and counts the matches
    
    if matches == 0:
        print("The combination {} never occurred in Canada's 6/49 lottery history.  That doesn't make it any more likely to be drawn going forward, however.  The percentage odds of your numbers being selected is 0.0000072%.  That's a 1 in 13,983,816 chance.".format(nums))
    else:
        print("The number of times the combination {} was ever drawn in Canada's 6/49 lottery is {}.  Going forward, the percentage odds of these numbers being selected is 0.0000072%.  That's a 1 in 13,983,816 chance.".format(nums, matches))

Now let's try this historical numbers check function with a couple of example inputs:

In [65]:
check_historical_occurrence([3, 17, 41, 5, 12, 8], canada_winning_nums)

The combination [3, 17, 41, 5, 12, 8] never occurred in Canada's 6/49 lottery history.  That doesn't make it any more likely to be drawn going forward, however.  The percentage odds of your numbers being selected is 0.0000072%.  That's a 1 in 13,983,816 chance.


In [67]:
check_historical_occurrence([5, 14, 31, 21, 34, 47], canada_winning_nums)

The number of times the combination [5, 14, 31, 21, 34, 47] was ever drawn in Canada's 6/49 lottery is 1.  Going forward, the percentage odds of these numbers being selected is 0.0000072%.  That's a 1 in 13,983,816 chance.
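
The comparison relies on Python sets ignoring order, so a ticket matches a drawing whenever both contain the same six numbers. Below is a pandas-free sketch of the same counting logic, using the first two drawings shown in the `head(3)` output earlier and a hypothetical helper name, `count_exact_matches`.

```python
def count_exact_matches(ticket, past_drawings):
    # Sets compare equal when they hold the same elements, whatever the order
    ticket_set = set(ticket)
    return sum(1 for drawing in past_drawings if set(drawing) == ticket_set)

# First two drawings from the data set (6/12/1982 and 6/19/1982)
sample_drawings = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
]

print(count_exact_matches([43, 41, 14, 12, 11, 3], sample_drawings))  # 1
print(count_exact_matches([3, 17, 41, 5, 12, 8], sample_drawings))    # 0
```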


## Multi-Ticket Probability

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate those chances, so in this section we'll write a function that lets users calculate the chance of winning for any number of different tickets.

We've talked with the engineering team and they gave us the following information:

- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- Our function will see an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

In [71]:
def multi_ticket_probability(num_tickets):
    total_possible_outcomes = combinations(49, 6)
    probability = num_tickets / total_possible_outcomes # num_tickets gives us number of successful outcomes possible
    
    print("With {} different tickets, the percentage chance you'll win the big prize is {:.7f}%.".format(num_tickets, probability * 100))

Now we'll test this function with a few example inputs:

In [72]:
examples = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for x in examples:
    multi_ticket_probability(x)

With 1 different tickets, the percentage chance you'll win the big prize is 0.0000072%.
With 10 different tickets, the percentage chance you'll win the big prize is 0.0000715%.
With 100 different tickets, the percentage chance you'll win the big prize is 0.0007151%.
With 10000 different tickets, the percentage chance you'll win the big prize is 0.0715112%.
With 1000000 different tickets, the percentage chance you'll win the big prize is 7.1511238%.
With 6991908 different tickets, the percentage chance you'll win the big prize is 50.0000000%.
With 13983816 different tickets, the percentage chance you'll win the big prize is 100.0000000%.


The number of tickets the user specifies equals the number of successful outcomes, since each ticket carries a different combination. Dividing this value by the total number of possible outcomes (computed with our `combinations` function) gives the probability of winning with that many tickets.
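
The same relationship can be turned around to ask how many different tickets are needed to reach a given chance of winning. This is a hedged sketch: `tickets_needed` is a hypothetical helper, and it assumes every ticket carries a distinct combination.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible combinations

def tickets_needed(target_probability):
    # probability = tickets / TOTAL_OUTCOMES, so tickets = probability * TOTAL_OUTCOMES, rounded up
    if not 0 < target_probability <= 1:
        raise ValueError("target_probability must be in (0, 1]")
    return math.ceil(target_probability * TOTAL_OUTCOMES)

print(tickets_needed(0.5))  # 6991908
print(tickets_needed(1))    # 13983816
```

Even a 1% chance of winning the big prize takes roughly 140,000 different tickets.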

## Fewer Winning Numbers -- Function

We're going to write one more function to allow the users to calculate probabilities for two, three, four, or five winning numbers.

For extra context, in most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, the users might be interested in knowing the probability of having two, three, four, or five winning numbers.

These are the engineering details we'll need to be aware of:

- Inside the app, the user inputs:
  - six different numbers from 1 to 49; and
  - an integer between 2 and 5 that represents the number of winning numbers expected
- Our function prints information about the probability of having the inputted number of winning numbers.

To calculate the probabilities, we tell the engineering team that the specific combination on the ticket is irrelevant behind the scenes, and we only need the integer between 2 and 5 representing the number of winning numbers expected.

In [80]:
def probability_less_6(n):
    num_combos_ticket = combinations(6, n) # number of ways to choose which n of the 6 ticket numbers are winners
    num_combos_remaining = combinations(43, 6 - n) # the other 6 - n drawn numbers must come from the 43 numbers that are not on the ticket
    
    total_successful_combos = num_combos_ticket * num_combos_remaining
    total_possible_combos = combinations(49, 6) # total possible combinations of 6 numbers from set of 1-49
    
    probability_pct = total_successful_combos / total_possible_combos * 100 # multiply by 100 for a percentage
    
    combos_simplified = round(total_possible_combos / total_successful_combos)
    
    print("The percent chance of having exactly {} winning numbers out of 6 is {:.7f}%.  That is roughly a 1 in {:,} chance of winning.".format(n, probability_pct, combos_simplified))

In [81]:
for i in range(2, 6):
    probability_less_6(i)

The percent chance of having exactly 2 winning numbers out of 6 is 13.2378029%.  That is roughly a 1 in 8 chance of winning.
The percent chance of having exactly 3 winning numbers out of 6 is 1.7650404%.  That is roughly a 1 in 57 chance of winning.
The percent chance of having exactly 4 winning numbers out of 6 is 0.0968620%.  That is roughly a 1 in 1,032 chance of winning.
The percent chance of having exactly 5 winning numbers out of 6 is 0.0018450%.  That is roughly a 1 in 54,201 chance of winning.
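
For reference, the number of outcomes with exactly k winning numbers follows the hypergeometric count: choose k of the 6 drawn numbers, then the remaining 6 - k from the 43 numbers that were not drawn. Summing those counts from k up to 6 gives the "at least k" probability that the list of questions at the top asks about. The helper names below are hypothetical, and the block is a sketch rather than the app's implementation.

```python
from fractions import Fraction
from math import comb

TOTAL = comb(49, 6)  # 13,983,816 possible drawings

def ways_exactly(k):
    # k numbers from the 6 drawn, the other 6 - k from the 43 not drawn
    return comb(6, k) * comb(43, 6 - k)

def probability_at_least(k):
    # Add up the exact-match counts for k, k + 1, ..., 6
    return Fraction(sum(ways_exactly(j) for j in range(k, 7)), TOTAL)

# The counts for 0 through 6 matches cover every possible drawing exactly once
print(sum(ways_exactly(j) for j in range(7)) == TOTAL)  # True

for k in range(2, 6):
    print(k, ways_exactly(k), "{:.7f}%".format(float(probability_at_least(k)) * 100))
```

Using `Fraction` keeps the arithmetic exact until the final conversion to a percentage.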


## Conclusion

We managed to write four main functions for our app:

- `one_ticket_probability()` — calculates the probability of winning the big prize with a single ticket
- `check_historical_occurrence()` — checks whether a certain combination has occurred in the Canada 6/49 lottery data set
- `multi_ticket_probability()` — calculates the probability for any number of tickets between 1 and 13,983,816
- `probability_less_6()` — calculates the probability of having two, three, four or five winning numbers

Possible features for a second version of the app include:

- Making the outputs even easier to understand by adding analogies. For example, we could look up the probabilities of unusual events and compare them with the chance of winning the lottery, printing something along the lines of "You are 100 times more likely to be the victim of a shark attack than to win the lottery".
- Combining the `one_ticket_probability()` and `check_historical_occurrence()` functions to output information on probability and historical occurrence at the same time.