# 6/49 Lottery Mobile App

The purpose of this project is to contribute to the development of a mobile app that aims to **help users better estimate their chances of winning the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49)**.

This project will create the logical core of the app and calculate probabilities. Users of the app will be able to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

This project will consider historical data from the [national 6/49 lottery game in Canada](https://www.kaggle.com/datascienceai/lottery-dataset), which covers 3,665 drawings dating from 1982 to 2018.

## Imports

In [1]:
import numpy as np
import pandas as pd

## Go-To Functions

Two functions that will be repeatedly used in this project are factorials and combinations.

### Factorials

The formula for factorial is given by:

$$n! = n \times (n - 1) \times (n - 2) \times \cdots \times 2 \times 1$$

In [5]:
def factorials(n):
    """
    Return the factorial of n.
    Example: 3! = 3 x 2 x 1 = 6
    """
    res = 1
    # Multiply 2 * 3 * ... * n without mutating the argument.
    for i in range(2, n + 1):
        res *= i
    return res

### Combinations

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, and order does not matter.

The formula for unordered sampling without replacement is given by:

$$_{n}C_{k} = {n \choose k} = \frac{n!}{k!(n-k)!}$$

In [6]:
def combinations(n,k):
    """
    Return the number of ways to choose k objects from a group of n objects.
    """
    return factorials(n)/(factorials(k)*factorials(n-k))
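
As a sanity check on the formula above, the count for the 6/49 game can be compared against the standard library. This is only a cross-check, assuming Python 3.8 or later (where `math.comb` is available); it does not replace the project's own `factorials()` and `combinations()` helpers.

```python
import math

# Number of ways to choose 6 numbers out of 49 (order does not matter).
total_outcomes = math.comb(49, 6)

# The same value, written out with the factorial formula n! / (k! (n - k)!).
by_formula = math.factorial(49) // (math.factorial(6) * math.factorial(43))

print(f"{total_outcomes:,}")         # 13,983,816
print(total_outcomes == by_formula)  # True
```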

## Probability of Winning The Big Prize With a Single Ticket

A player wins the big prize if **the six numbers on their ticket match all six numbers drawn**. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they only win the big prize if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, they do not win.

This section will build a function that calculates the probability of winning the big prize for any given ticket.

**Function specifications:**

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
- The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.
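
The specification above assumes six different numbers from 1 to 49, while the function that follows only checks how many numbers were entered. A minimal sketch of a stricter check is shown here; the helper name `is_valid_ticket` and its parameters are hypothetical and not part of the original project.

```python
def is_valid_ticket(nums, ticket_len=6, max_number=49):
    """
    Return True if nums holds ticket_len distinct integers between
    1 and max_number inclusive (hypothetical helper, for illustration).
    """
    # Reject the wrong count and any repeated number.
    if len(nums) != ticket_len or len(set(nums)) != ticket_len:
        return False
    # Every entry must be an integer within the allowed range.
    return all(isinstance(n, int) and 1 <= n <= max_number for n in nums)


print(is_valid_ticket([13, 22, 24, 27, 42, 44]))  # True
print(is_valid_ticket([13, 13, 24, 27, 42, 44]))  # False (repeated number)
print(is_valid_ticket([0, 22, 24, 27, 42, 44]))   # False (out of range)
```

A check like this could run before the probability is printed, so that invalid tickets produce a clear message instead of a misleading result.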

In [7]:
LEN_TICKET = 6  # Each ticket must have exactly 6 numbers.
LEN_POSSIBLE_DIGITS = 49

def one_ticket_probability(nums):
    """
    Calculate the probability of winning the big prize with one 
    ticket for a given list of numbers.
    
    Arguments:
    ----------
    nums : list of 6 different integers from 1 to 49 inclusive.
    """
    if len(nums) == LEN_TICKET:
        comb = combinations(LEN_POSSIBLE_DIGITS, LEN_TICKET)
        p_win_percentage = (1/comb) * 100
        print("Your chance of winning the big prize using one "
              "ticket with numbers {} is {:.7f}%.\nIn other words, "
              "you have a one in {:,.0f} chance of "
              "winning.".format(nums, p_win_percentage, comb)
             )
    else:
        print("You entered {} numbers. Please enter exactly {} "
              "numbers.".format(len(nums), LEN_TICKET))

In [8]:
one_ticket_probability([13, 22, 24, 27, 42, 44])

Your chance of winning the big prize using one ticket with numbers [13, 22, 24, 27, 42, 44] is 0.0000072%.
In other words, you have a one in 13,983,816 chance of winning.


## Explore Historical Data

In [9]:
historical_df = pd.read_csv("data/649.csv")
print(historical_df.head(3))
print(historical_df.tail(3))

   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
0      649            1                0  6/12/1982               3   
1      649            2                0  6/19/1982               8   
2      649            3                0  6/26/1982               1   

   NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \
0              11              12              14              41   
1              33              36              37              39   
2               6              23              24              27   

   NUMBER DRAWN 6  BONUS NUMBER  
0              43            13  
1              41             9  
2              39            34  
      PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \
3662      649         3589                0  6/13/2018               6   
3663      649         3590                0  6/16/2018               2   
3664      649         3591                0  6/20/2018              14   

     

In [10]:
historical_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   PRODUCT          3665 non-null   int64 
 1   DRAW NUMBER      3665 non-null   int64 
 2   SEQUENCE NUMBER  3665 non-null   int64 
 3   DRAW DATE        3665 non-null   object
 4   NUMBER DRAWN 1   3665 non-null   int64 
 5   NUMBER DRAWN 2   3665 non-null   int64 
 6   NUMBER DRAWN 3   3665 non-null   int64 
 7   NUMBER DRAWN 4   3665 non-null   int64 
 8   NUMBER DRAWN 5   3665 non-null   int64 
 9   NUMBER DRAWN 6   3665 non-null   int64 
 10  BONUS NUMBER     3665 non-null   int64 
dtypes: int64(10), object(1)
memory usage: 315.1+ KB


The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, the six numbers drawn are stored in the columns `NUMBER DRAWN 1` to `NUMBER DRAWN 6` (positions 4 to 9 in the `info()` output above).

## Probability of Winning The Big Prize (Historical Data)

This function will enable users to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

**Function Specifications:**

- Inside the app, the user inputs six different numbers from 1 to 49.
- Under the hood, the six numbers will come as a Python list and serve as an input to our function.
- Function should print:
    - the number of times the combination selected occurred in the Canada data set; and
    - the probability of winning the big prize in the next drawing with that combination.

### Extract All Winning Numbers From Historical Data

In [11]:
def extract_numbers(row):
    """
    Return the six winning numbers of a single drawing as a set.
    
    Arguments:
    ----------
    row: row in the historical dataframe.
    """
    # Positions 4 to 9 hold NUMBER DRAWN 1 to NUMBER DRAWN 6.
    numbers = row.iloc[4:10]
    return set(numbers.values)

In [13]:
winning_numbers = historical_df.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [14]:
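def check_historical_occurence(ticket_nums, past_winning_nums):
    """
    Print how many times a combination has won the big prize in the
    historical data, and the probability of winning the next drawing.
    
    Arguments:
    ----------
    ticket_nums : list of 6 different integers from 1 to 49 inclusive.
    past_winning_nums : pandas Series holding one set of six winning
        numbers per past drawing.
    """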
    comb = combinations(LEN_POSSIBLE_DIGITS, LEN_TICKET)
    p_win_percentage = (1/comb) * 100
    ticket_nums = set(ticket_nums)
    occurred = past_winning_nums == ticket_nums
    total_occurrences = sum(occurred)
    
    if total_occurrences == 0:
        print("This combination has never won the big prize.")
    else:
        print("The number of times this combination has won the "
              "big prize is {}.".format(total_occurrences))
        
    print("Your chance of winning the big prize with this combination "
          "is {:.7f}%. In other words, you have a 1 in {:,.0f} chance "
          "of winning.".format(p_win_percentage, comb)
         )

In [15]:
check_historical_occurence([3, 41, 11, 12, 43, 14], winning_numbers)

The number of times this combination has won the big prize is 1.
Your chance of winning the big prize with this combination is 0.0000072%. In other words, you have a 1 in 13,983,816 chance of winning.
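
The historical lookup relies on comparing sets, so the order in which a user enters the numbers does not matter. The idea can be sketched without pandas by counting `frozenset` objects with `collections.Counter`. The five drawings below are copied from the `winning_numbers.head()` output above; the names `past_draws`, `draw_counts`, and `count_occurrences` are hypothetical and used only for this illustration.

```python
from collections import Counter

# The first five drawings, taken from the head() output shown earlier.
past_draws = [
    (3, 11, 12, 14, 41, 43),
    (8, 33, 36, 37, 39, 41),
    (1, 6, 23, 24, 27, 39),
    (3, 9, 10, 13, 20, 43),
    (5, 14, 21, 31, 34, 47),
]

# frozenset is hashable, so each drawing can serve as a Counter key.
draw_counts = Counter(frozenset(draw) for draw in past_draws)


def count_occurrences(ticket_nums):
    """Return how many past drawings match the ticket, ignoring order."""
    return draw_counts[frozenset(ticket_nums)]


print(count_occurrences([43, 3, 41, 11, 12, 14]))  # 1
print(count_occurrences([1, 2, 3, 4, 5, 6]))       # 0
```

Building the counter once makes each lookup a single dictionary access, which may be useful if the app checks many tickets against the same history.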


## Probability of Winning The Big Prize With Multiple Tickets

Users often play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. This section will create a function that allows users to calculate the chances of winning for any number of different tickets.

**Function Specifications:**

- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
- The function will receive an integer between 1 and 13,983,816 (the maximum number of different tickets).
- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

**Logic**

The number of successful outcomes is given by the number of tickets the user intends to play, and the total number of outcomes is given by $49 \choose 6$.

Ultimately, the probability of winning the big prize with n tickets is given by:

$$P(\text{winning with } n \text{ tickets}) = \frac{n}{49 \choose 6}$$

In [18]:
def multi_ticket_probability(n_tickets):
    """
    Calculate the chances of winning for any number of different tickets.
    
    Argument:
    ---------
    n_tickets (int): number of tickets.
    """
    n_combinations = combinations(LEN_POSSIBLE_DIGITS, LEN_TICKET)
    p_win_percentage = (n_tickets/n_combinations) * 100
    chances_simplified = round(n_combinations/n_tickets)
    print("You have a {:.7f}% chance of winning the big prize with {:,}"
          " tickets. In other words you have a 1 in {:,} chance.\n"
          .format(p_win_percentage, n_tickets, chances_simplified)
         )

**Testing Function**


In [19]:
for i in [1, 10, 100, 10000, 1000000, 6991908, 13983816]:
    multi_ticket_probability(i)

You have a 0.0000072% chance of winning the big prize with 1 tickets. In other words you have a 1 in 13,983,816 chance.

You have a 0.0000715% chance of winning the big prize with 10 tickets. In other words you have a 1 in 1,398,382 chance.

You have a 0.0007151% chance of winning the big prize with 100 tickets. In other words you have a 1 in 139,838 chance.

You have a 0.0715112% chance of winning the big prize with 10,000 tickets. In other words you have a 1 in 1,398 chance.

You have a 7.1511238% chance of winning the big prize with 1,000,000 tickets. In other words you have a 1 in 14 chance.

You have a 50.0000000% chance of winning the big prize with 6,991,908 tickets. In other words you have a 1 in 2 chance.

You have a 100.0000000% chance of winning the big prize with 13,983,816 tickets. In other words you have a 1 in 1 chance.



## Probability of Fewer Than Six Winning Numbers

In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

This section will create a function to calculate probabilities for two, three, four, or five winning numbers.

**Function Specification:**

- Inside the app, the user inputs:
    - six different numbers from 1 to 49; and
    - an integer between 2 and 5 that represents the number of winning numbers expected
- The function prints information about the probability of having the inputted number of winning numbers.

**Logic**

Suppose a player chose these six numbers on a ticket: (1, 2, 3, 4, 5, 6). The number of five-number combinations that can be formed from these six numbers is ${6 \choose 5} = 6$.

For each of these six five-number combinations, there are 44 drawings that contain it, because the sixth number drawn can be any of the $49 - 5 = 44$ remaining numbers. However, the outcomes of interest here are those that match exactly five numbers, not at least five. One of those 44 drawings completes the ticket's full six-number combination, so each five-number combination has 43 successful outcomes, not 44.

By the rule of product, six five-number combinations with 43 successful outcomes each give a total of $6 \times 43 = 258$ successful outcomes.

Ultimately, the probability of getting exactly 5 winning numbers is given by:

$$P(\text{five winning numbers}) = \frac{6 \times 43}{49 \choose 6}$$
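
The same counting argument extends to any number of matches: choose which k of the ticket's six numbers are drawn, then fill the remaining 6 - k positions from the 43 numbers that are not on the ticket. The sketch below checks this with the standard library, assuming Python 3.8 or later for `math.comb`; the names `exact_match_outcomes`, `TICKET_LEN`, and `POOL_SIZE` are hypothetical and used only for illustration.

```python
import math

TICKET_LEN = 6   # numbers on a ticket
POOL_SIZE = 49   # numbers in the drawing pool


def exact_match_outcomes(k):
    """Number of drawings matching exactly k of the ticket's six numbers."""
    matched = math.comb(TICKET_LEN, k)
    unmatched = math.comb(POOL_SIZE - TICKET_LEN, TICKET_LEN - k)
    return matched * unmatched


print(exact_match_outcomes(5))  # 258, the same as 6 * 43

# The counts for k = 0..6 cover every possible drawing exactly once.
total = sum(exact_match_outcomes(k) for k in range(TICKET_LEN + 1))
print(total == math.comb(POOL_SIZE, TICKET_LEN))  # True
```

The final check confirms that the seven cases are mutually exclusive and together account for all 13,983,816 possible drawings.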

In [20]:
def probability_less_than_6(n_winning):
    """
    Print the probability of matching exactly n_winning of the six
    numbers drawn with a single ticket.
    """
    # Ways to pick which n_winning of the ticket's six numbers match.
    n_ticket_combos = combinations(LEN_TICKET, n_winning)
    # Ways to fill the remaining slots from the 43 numbers not on the ticket.
    n_other_combos = combinations(43, LEN_TICKET - n_winning)
    n_successful = n_ticket_combos * n_other_combos
    total_possible_outcomes = combinations(LEN_POSSIBLE_DIGITS, LEN_TICKET)
    outcomes_simplified = round(total_possible_outcomes / n_successful)
    
    # Multiply by 100 to express the probability as a percentage.
    p_win_percentage = (n_successful / total_possible_outcomes) * 100
    print("Your chances of having {} winning numbers are {:.7f}%. In other words,"
          " you have a 1 in {} chance.\n".format(n_winning, p_win_percentage, outcomes_simplified))

**Testing Function**

In [21]:
for i in range(1,6):
    probability_less_than_6(i)

Your chances of having 1 winning numbers are 41.3019450%. In other words, you have a 1 in 2 chance.

Your chances of having 2 winning numbers are 13.2378029%. In other words, you have a 1 in 8 chance.

Your chances of having 3 winning numbers are 1.7650404%. In other words, you have a 1 in 57 chance.

Your chances of having 4 winning numbers are 0.0968620%. In other words, you have a 1 in 1032 chance.

Your chances of having 5 winning numbers are 0.0018450%. In other words, you have a 1 in 54201 chance.



## Conclusion and Future Work

Possible features for a second version of the app include:

- Making the outputs even easier to understand by comparing the chances of winning the lottery with those of unusual events. For example, "You are 100 times more likely to be the victim of a shark attack than to win the lottery".
- Combining one_ticket_probability() and check_historical_occurence() to output information on probability and historical occurrence at the same time.
- Creating a function similar to probability_less_than_6() which calculates the probability of having at least two, three, four or five winning numbers.

    - For example, the number of successful outcomes for having at least four winning numbers is the sum of these three numbers:
        - The number of successful outcomes for having four winning numbers exactly
        - The number of successful outcomes for having five winning numbers exactly
        - The number of successful outcomes for having six winning numbers exactly
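
The "at least" calculation described in the last item can be sketched by summing the exact-match counts from k up to six. This is a minimal illustration, assuming Python 3.8 or later for `math.comb`; the function name `probability_at_least` and its parameters are hypothetical and not part of the current app.

```python
import math


def probability_at_least(k, ticket_len=6, pool_size=49):
    """
    Return the probability (between 0 and 1) that a single ticket
    matches at least k of the numbers drawn.
    """
    # Sum the outcomes for exactly k, k + 1, ..., ticket_len matches.
    successful = sum(
        math.comb(ticket_len, j)
        * math.comb(pool_size - ticket_len, ticket_len - j)
        for j in range(k, ticket_len + 1)
    )
    return successful / math.comb(pool_size, ticket_len)


for k in range(2, 6):
    print("At least {} winning numbers: {:.7f}%".format(
        k, probability_at_least(k) * 100))
```

For example, at least five winning numbers corresponds to 258 + 1 = 259 successful outcomes out of 13,983,816.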