# Mobile App for Lottery Addiction

Playing the lottery is an activity that many people take up for fun. For a significant share of them, however, this form of gambling becomes a habit that can escalate into addiction. As with other forms of compulsive gambling, lottery addicts risk draining their savings and are especially susceptible to predatory loans that lead to mounting debt. Most lottery addicts have difficulty estimating the true odds of winning (which lottery designs often obscure intentionally) and fall for the classic "gambler's fallacy", the belief that a gambler's luck is bound to turn around.

In this project, we'll imagine that a medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, so our role is to create the logical core of the app and calculate probabilities. 

For our first efforts we're going to focus on accurately capturing the behavior of the national [Canadian Lotto 6/49](https://en.wikipedia.org/wiki/Lotto_6/49). Lotto 6/49 is a lottery in which six numbers are drawn without replacement from a pool of 49 numbers (1 to 49). If a ticket matches all six numbers, regardless of order, a jackpot prize of at least $5,000,000 CAD is won. As reference data, we'll use a historical dataset of 3,665 drawings dating from 1982 to 2018 [hosted by Kaggle](https://www.kaggle.com/datascienceai/lottery-dataset).

Throughout this project we will seek to answer the following questions:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?


In [1]:
# Basic combinatorial functions
def factorial(n: int) -> int:
    """
    Returns the value of n! = n*(n-1)*(n-2)*...*1
    Inputs:
        n | int | value used to compute n!
    Returns:
        n! | int 
    """
    fac = 1
    for k in range(n):
        fac *= (k+1)
    return fac

def combinations(n: int, k: int) -> int:
    """
    Returns the number of combinations from a set of n items chosen k at a time.
    Inputs:
        n | int | number of items in set to pull from
        k | int | number of items chosen for each combination
    Returns
        C(n,k) | int | number of combinations of k items chosen from a set of n items.
    """
    assert 0 <= k <= n, "k must satisfy 0 <= k <= n."
    # Integer division keeps the result an exact int (the quotient is always whole).
    return factorial(n)//(factorial(n-k)*factorial(k))
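
As a quick sanity check, the factorial formula can be compared against Python's built-in `math.comb` (available from Python 3.8). The sketch below is self-contained, so it restates the formula under the hypothetical name `combinations_check`; that name is an assumption for illustration and not part of the notebook.

```python
import math

def combinations_check(n: int, k: int) -> int:
    """Compact restatement of C(n, k) = n! / ((n-k)! * k!), for comparison only."""
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

# The factorial formula and the standard-library implementation should agree.
for n, k in [(49, 6), (6, 2), (43, 4), (10, 0)]:
    assert combinations_check(n, k) == math.comb(n, k)

print(f"{math.comb(49, 6):,}")  # 13,983,816
```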

Next, we'll calculate the probability of winning the jackpot with a single ticket. The ticket will be represented by a Python list of six numbers. We'll present the probability of winning in two ways: as a percentage and as "1 in N" odds.

In [2]:
def one_ticket_prob(lst: list) -> str:
    """
    Computes the probability of winning for a given ticket represented by a list of 6 integers.
    Inputs:
        lst | list(int) | list representing the given ticket of lottery numbers
    Returns:
        out | str | string to display the ticket's probability of winning the jackpot prize
    """
    prob = 1./combinations(49,6)
    s = "The probability of the ticket {ticket} winning the jackpot is {prob:%}\n" \
        "This is equivalent to winning 1 time in {odds:,} tries."
    return s.format(ticket=lst, prob=prob, odds=combinations(49,6))

In [3]:
# testing function with a random ticket
import random
# random.sample draws without replacement, so the six numbers are always distinct
ticket = random.sample(range(1, 50), 6)
print(one_ticket_prob(ticket))

The probability of the ticket [13, 20, 36, 10, 41, 3] winning the jackpot is 0.000007%
This is equivalent to winning 1 time in 13,983,816.0 tries.
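
For reference, the figure above follows directly from counting. Assuming a fair draw in which every six-number combination is equally likely, a single ticket wins with probability

$$
P(\text{jackpot}) = \frac{1}{\binom{49}{6}} = \frac{6!\,43!}{49!} = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8} \approx 0.000007\%.
$$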


## Comparing ticket to historical data

Our next objective is to write a function that is able to compare a given ticket with historical data of prize drawings to check if those specific numbers have ever won.

In [4]:
import pandas as pd

data = pd.read_csv('649.csv')
data.head()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.1+ KB


From this summary, we can see that the dataset is already fairly clean: no column has missing values. Of particular interest to us are the `NUMBER DRAWN [K]` columns, where `1<=K<=6`, which hold the six numbers drawn in each lottery.

In [6]:
def extract_draws(row: pd.core.series.Series) -> set:
    """
    Extracts the winning numbers from a single lottery drawing.
    Inputs:
        row | pd.Series | pandas series containing data indexed by "NUMBER DRAWN [K]" where 1<=K<=6
    Returns:
        out | set | set containing winning draws as integers
    """
    return {row["NUMBER DRAWN {val}".format(val=k+1)] for k in range(6)}

winning_draws = data.apply(extract_draws, axis=1)
winning_draws.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [7]:
def check_ticket_historical(ticket: list, winning_draws: pd.core.series.Series) -> str:
    """
    Checks given ticket against historical data to see if ticket numbers have ever won before.
    Inputs:
        ticket | list | list of integers representing desired ticket
        winning_draws | pd.core.series.Series | pandas series containing winning historical ticket data stored as sets
    Returns:
        out | str | string to display information about ticket's historical winnings
    """
    ticket_set = set(ticket)
    cond = winning_draws == ticket_set
    count_of_wins = cond.sum()
    s = "The ticket {ticket} has won a total of {wins} times in the past.".format(ticket=ticket, wins=count_of_wins)
    if not count_of_wins:
        s += "\nDespite never winning before, " + one_ticket_prob(ticket)
    return s

In [8]:
ticket = random.sample(range(1, 50), 6)  # six distinct numbers from 1 to 49
print(check_ticket_historical(ticket, winning_draws))

The ticket [28, 26, 13, 37, 44, 18] has won a total of 0 times in the past.
Despite never winning before, The probability of the ticket [28, 26, 13, 37, 44, 18] winning the jackpot is 0.000007%
This is equivalent to winning 1 time in 13,983,816.0 tries.
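
The comparison above relies on set equality, which ignores the order in which numbers are listed. A minimal pure-Python sketch of the same idea is shown below; it uses the first two draws from the data preview, and `count_wins` is a hypothetical helper name introduced only for illustration.

```python
# First two historical draws, copied from the data preview above.
historical = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
]

def count_wins(ticket, draws):
    """Hypothetical helper: count draws whose six numbers equal the ticket's."""
    ticket_set = set(ticket)
    return sum(draw == ticket_set for draw in draws)

print(count_wins([43, 3, 14, 11, 41, 12], historical))  # 1 (same numbers, different order)
print(count_wins([1, 2, 3, 4, 5, 6], historical))       # 0
```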


## Multi-ticket probabilities

Lottery players usually purchase more than one ticket for a single drawing to help improve their chances of winning. Our next objective is to write a function that calculates the probability of winning the draw for any number of distinct tickets.

Note that the maximum number of distinct tickets allowable for a single draw is `combinations(49,6) = 13,983,816`.

In [9]:
def multi_ticket_prob(n_tickets: int) -> str:
    """
    Calculates the probability of winning the draw given a number n_tickets of distinct tickets.
    Inputs:
        n_tickets | int | number of distinct tickets played in this drawing
    Returns:
        out | str | string to display information about probability of winning
    """
    # at least one ticket is required; 0 tickets would cause a division by zero in 1/prob below
    assert n_tickets>=1, "n_tickets must be at least 1"
    
    combos = combinations(49,6)
    assert n_tickets<=combos, "n_tickets cannot be larger than {}".format(combos)
    
    prob = n_tickets/combos
    s = "The probability of the {n_tickets} ticket(s) winning the jackpot is {prob:%}\n" \
        "This is equivalent to winning 1 time in {odds:,} tries.".format(n_tickets=n_tickets, prob=prob, odds=round(1/prob))
    return s

In [11]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for val in test_inputs:
    print(multi_ticket_prob(val)+"\n")

The probability of the 1 ticket(s) winning the jackpot is 0.000007%
This is equivalent to winning 1 time in 13,983,816 tries.

The probability of the 10 ticket(s) winning the jackpot is 0.000072%
This is equivalent to winning 1 time in 1,398,382 tries.

The probability of the 100 ticket(s) winning the jackpot is 0.000715%
This is equivalent to winning 1 time in 139,838 tries.

The probability of the 10000 ticket(s) winning the jackpot is 0.071511%
This is equivalent to winning 1 time in 1,398 tries.

The probability of the 1000000 ticket(s) winning the jackpot is 7.151124%
This is equivalent to winning 1 time in 14 tries.

The probability of the 6991908 ticket(s) winning the jackpot is 50.000000%
This is equivalent to winning 1 time in 2 tries.

The probability of the 13983816 ticket(s) winning the jackpot is 100.000000%
This is equivalent to winning 1 time in 1 tries.
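
The 40-ticket case from the opening questions can also be worked exactly. The sketch below is one possible approach, assuming Python 3.8+ for `math.comb`; it uses `fractions.Fraction` to avoid floating-point rounding, and `exact_multi_ticket_prob` is a hypothetical name, not part of the notebook.

```python
from fractions import Fraction
import math

TOTAL = math.comb(49, 6)  # 13,983,816 possible tickets

def exact_multi_ticket_prob(n_tickets: int) -> Fraction:
    """Hypothetical helper: exact jackpot probability for n distinct tickets."""
    if not 1 <= n_tickets <= TOTAL:
        raise ValueError(f"n_tickets must be between 1 and {TOTAL}")
    return Fraction(n_tickets, TOTAL)

p = exact_multi_ticket_prob(40)
print(p)                # 5/1747977
print(f"{float(p):%}")  # 0.000286%
```

Even with 40 distinct tickets, the chance of a jackpot stays below three in a million.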



## Minor prize winning probabilities

In addition to the jackpot awarded for matching all 6 drawn numbers, there are smaller prizes for matching 2, 3, 4, or 5 of them. These minor prizes soften the "all or nothing" nature of the game as a potential gambler might perceive it. As a consequence, users might be interested in knowing the probability of having two, three, four, or five winning numbers.

To count the draws that match exactly `m` numbers on a ticket, note that there are `C(6,m)` ways to choose which `m` of the ticket's six numbers are matched. For each of these choices, there are `C(49-6, 6-m)` ways to fill the remaining `6-m` slots of the draw. We choose from only `49-6 = 43` numbers because if any of the remaining numbers were also on our ticket, the draw would match more than `m` numbers rather than the desired `m`.
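
Written as a formula, the count described above gives the probability of exactly $m$ matches (this is the hypergeometric distribution):

$$
P(\text{exactly } m \text{ matches}) = \frac{\binom{6}{m}\binom{43}{6-m}}{\binom{49}{6}}
$$

For example, with $m = 5$ the numerator is $\binom{6}{5}\binom{43}{1} = 6 \cdot 43 = 258$, so the probability is $258 / 13{,}983{,}816 \approx 0.001845\%$.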

In [15]:
def minor_prize_prob(ticket: list, m: int) -> str:
    """
    Calculates the probability of winning a minor prize by matching 2<=m<=5 numbers to the winning draw, rather than the full 6.
    Inputs:
        ticket | list (int) | specific ticket to check against
        m | int | number of matches required to win, note this calculates the probability of matching exactly m numbers from the winning draw
    Returns:
        out | str | string to display information about probability of winning
    """
    assert 2<=m<=6, "m must be between 2 and 6 matches"
    if m==len(ticket): return one_ticket_prob(ticket)
    
    combos = combinations(49,6)
    successes = combinations(6, m)*combinations(49-6, 6-m)
    prob = successes/combos
    s = "The probability of the {ticket} winning a minor prize by matching {m} numbers is {prob:%}\n" \
        "This is equivalent to winning 1 time in {odds:,} tries.".format(ticket=ticket, m=m, prob=prob, odds=round(1/prob))
    
    return s

In [16]:
test_matches = [2,3,4,5]
ticket = random.sample(range(1, 50), 6)  # six distinct numbers from 1 to 49
for val in test_matches:
    print(minor_prize_prob(ticket, val)+"\n")

The probability of the [32, 20, 16, 21, 10, 43] winning a minor prize by matching 2 numbers is 13.237803%
This is equivalent to winning 1 time in 8 tries.

The probability of the [32, 20, 16, 21, 10, 43] winning a minor prize by matching 3 numbers is 1.765040%
This is equivalent to winning 1 time in 57 tries.

The probability of the [32, 20, 16, 21, 10, 43] winning a minor prize by matching 4 numbers is 0.096862%
This is equivalent to winning 1 time in 1,032 tries.

The probability of the [32, 20, 16, 21, 10, 43] winning a minor prize by matching 5 numbers is 0.001845%
This is equivalent to winning 1 time in 54,201 tries.
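
The figures above are for exactly `m` matches, whereas the opening questions ask about at least `m`. One way to bridge the two is to sum the exact-match counts from `m` up to 6. The sketch below assumes Python 3.8+ for `math.comb`; `exact_matches` and `at_least_prob` are hypothetical helper names, not part of the notebook.

```python
import math

TOTAL = math.comb(49, 6)

def exact_matches(m: int) -> int:
    """Number of draws matching exactly m of a ticket's six numbers."""
    return math.comb(6, m) * math.comb(43, 6 - m)

def at_least_prob(m: int) -> float:
    """Hypothetical helper: probability of matching m or more numbers."""
    return sum(exact_matches(j) for j in range(m, 7)) / TOTAL

# The exact-match counts for m = 0..6 partition all possible draws.
assert sum(exact_matches(m) for m in range(7)) == TOTAL

for m in (5, 4, 3, 2):
    print(f"at least {m}: {at_least_prob(m):%}")
```

Because the higher match counts are so rare, each "at least" probability is only slightly larger than the corresponding "exactly" probability.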

