# Mobile App for Lottery Addiction

## Introduction

### Scenario

>Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, accumulate debts, and eventually engage in desperate behaviors like theft.
>
>A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

\- From Dataquest's guided project [introduction](https://app.dataquest.io/c/65/m/382/guided-project%3A-mobile-app-for-lottery-addiction/1/introduction?path=2&slug=data-scientist&version=1)

### Goal

>For the first version of the app, they want us to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) and build functions that enable users to answer questions like:
>
>- What is the probability of winning the big prize with a single ticket?
>- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
>- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?
>
>The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set from Kaggle](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.

\- From Dataquest's guided project [introduction](https://app.dataquest.io/c/65/m/382/guided-project%3A-mobile-app-for-lottery-addiction/1/introduction?path=2&slug=data-scientist&version=1)

### Result

The 6/49 lottery pays a fixed jackpot of \\$5 million for matching all six numbers. Because 6/49 is a Canadian lottery, winnings are not taxable, so the winner takes home the full amount. Each ticket costs \\$3. The odds of winning the jackpot with a single ticket are 1 in 13,983,816, or 0.0000071511%. Playing historical winning numbers neither improves nor worsens the odds, and neither does using Quick Pick instead of choosing numbers manually.

The odds improve drastically to about 1 in 13 (7.15%) if one purchases one million different tickets, which costs \\$3 million and leaves a net profit of \\$2 million if one of them wins and there is only one winner. Compare this to a more reasonable purchase of 40 tickets, which costs \\$120 and gives odds of 1 in 349,595. Depending on individual risk tolerance and disposable income, one can "game" the system by purchasing more tickets to increase the odds. There is also the risk of sharing the prize with other winners, which may remove the net profit altogether.

Prizes are also available if 2, 3, 4, or 5 numbers match the winning numbers. The odds of matching the winning numbers are listed below.

| Match | Odds |
| :--: | :-- |
| 2 | 1 in 7 |
| 3 | 1 in 56 |
| 4 | 1 in 1,032 |
| 5 | 1 in 54,200 |
| 6 | 1 in 13,983,816 |

## One-ticket Probability

>We discussed with the engineering team of the medical institute, and they told us we need to be aware of the following details when we write the function:
>
>- Inside the app, the user inputs six different numbers from 1 to 49.
>- Under the hood, the six numbers will come as a Python list, which will serve as the single input to our function.
>- The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.

\- From the guided project's ['One-ticket Probability' section](https://app.dataquest.io/c/65/m/382/guided-project%3A-mobile-app-for-lottery-addiction/3/one-ticket-probability?path=2&slug=data-scientist&version=1)

### Total number of combinations for a six-number lottery ticket

In [1]:
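def combinations(n, k):
    """Return the number of ways to choose k items from n, ignoring order."""
    def factorial(x):
        n_fact = 1
        for i in range(1, x+1):
            n_fact *= i
        return n_fact
    # The quotient is always a whole number, so integer division keeps it an exact int.
    return factorial(n)//(factorial(k)*factorial(n-k))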

In [2]:
num_combos = combinations(49,6)
print(f"{num_combos:,.0f}")

13,983,816


In [3]:
print(f"{1/num_combos*100:.10f}")

0.0000071511


There are 49 possible numbers, and six numbers are sampled without replacement. The chance of winning the lottery with one ticket is 1 in 13,983,816 or 0.0000071511%.
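
As a cross-check on the hand-written `combinations` function, the same count can be obtained from the standard library. This is a minimal sketch that assumes Python 3.8 or later, where `math.comb` is available. The variable names are illustrative and not part of the original notebook.

```python
import math

# Number of ways to choose 6 numbers out of 49 when order does not matter.
total_outcomes = math.comb(49, 6)

# Probability that one ticket matches the single winning combination.
single_ticket_probability = 1 / total_outcomes

print(f"{total_outcomes:,}")
print(f"{single_ticket_probability * 100:.10f}%")
```

Both printed values should agree with the outputs of cells 2 and 3 above.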

To give you an idea of how slim these odds are, the [chance of getting struck by lightning](https://historydaily.org/lightning-doesnt-strike-twiceit-strikes-seven-times) is 1 in 960,000. In other words, you are almost 15 times more likely to be struck by lightning than to win the jackpot with a single ticket.

### Quick Pick numbers

The user can choose their own six numbers on the selection slip or draw six Quick Pick random numbers. Below we simulate drawing random numbers using Quick Pick.

In [4]:
import random

def quick_pick():
    quick_pick_numbers = []
    # Keep drawing until the ticket holds six distinct numbers.
    while len(quick_pick_numbers)<6:
        # randint includes both endpoints, so the upper bound must be 49, not 50.
        rand_num = random.randint(1,49)
        if rand_num not in quick_pick_numbers:
            quick_pick_numbers.append(rand_num)
    return sorted(quick_pick_numbers)

In [5]:
quick_pick_nums = quick_pick()
print(quick_pick_nums)

[9, 11, 21, 26, 43, 45]
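
The same draw can also be sketched with `random.sample`, which picks distinct values in a single call and so needs no duplicate check. The function name `quick_pick_sample` is hypothetical and only used here for illustration.

```python
import random

def quick_pick_sample():
    # range(1, 50) covers 1 through 49; sample() never returns a value twice.
    return sorted(random.sample(range(1, 50), 6))

ticket = quick_pick_sample()
print(ticket)
```

Because the draw is random, the printed ticket differs from run to run.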


### Odds of winning with Quick Pick versus selecting numbers

The odds of winning by using Quick Pick to generate random numbers versus manually selecting numbers are the same:

In [6]:
def one_ticket_probability(player_numbers):
    combos = combinations(49, len(player_numbers))
    proportion = 1/combos
    print(f"Your chance to win the jackpot with {player_numbers} is {proportion*100:.10f}%, or 1 in {int(combos):,}.")

In [7]:
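# The function prints its result and returns None, so nothing is assigned
# (assigning to quick_pick would also overwrite the quick_pick function).
one_ticket_probability(quick_pick_nums)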

Your chance to win the jackpot with [9, 11, 21, 26, 43, 45] is 0.0000071511%, or 1 in 13,983,816.


In [8]:
one_ticket_probability([2, 5, 7, 8, 32, 41])

Your chance to win the jackpot with [2, 5, 7, 8, 32, 41] is 0.0000071511%, or 1 in 13,983,816.


## Historical Data Check for Canada Lottery

In [9]:
import pandas as pd

dataset = pd.read_csv("649.csv")

In [10]:
dataset.head(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34


In [11]:
dataset.tail(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


As noted in the Introduction, the dataset spans draws between 1982 and 2018. There are 3,665 rows. To compare a player's ticket against past draws, we store the six numbers drawn in each row as a set, which ignores the order of the numbers.

In [12]:
def extract_numbers(row):
    # Positions 4 through 9 hold the six numbers drawn (the bonus number is excluded).
    row = row.iloc[4:10]
    return set(row.values)

In [13]:
winning_numbers = dataset.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
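
Selecting the columns by position works as long as the file layout never changes. A sketch of selecting them by name instead is shown below. It assumes the column headers seen in the preview above (`NUMBER DRAWN 1` through `NUMBER DRAWN 6`) and that the leading `Unnamed: 0` column is just the row index. To stay self-contained it uses the standard-library `csv` module on the first three rows from that preview rather than the full file. The helper name `extract_numbers_by_name` is illustrative.

```python
import csv
import io

# First three draws, copied from the dataset preview above.
sample_csv = """PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
649,1,0,6/12/1982,3,11,12,14,41,43,13
649,2,0,6/19/1982,8,33,36,37,39,41,9
649,3,0,6/26/1982,1,6,23,24,27,39,34
"""

NUMBER_COLUMNS = [f"NUMBER DRAWN {i}" for i in range(1, 7)]

def extract_numbers_by_name(row):
    # Build the set of six main numbers; the bonus number is left out.
    return {int(row[column]) for column in NUMBER_COLUMNS}

reader = csv.DictReader(io.StringIO(sample_csv))
draws = [extract_numbers_by_name(row) for row in reader]
print(draws)
```

With pandas, the equivalent selection would start from `dataset[NUMBER_COLUMNS]` instead of a positional slice.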

In [14]:
def check_historical_occurrence(player_numbers, winning_numbers):
    player_numbers_set = set(player_numbers)
    check_occurrence = player_numbers_set == winning_numbers
    n_occurrences = check_occurrence.sum()
    string_time = "time" if n_occurrences == 1 else "times"
    
    if n_occurrences == 0:
        print(f"Your numbers {player_numbers} have never won the lottery in the past.")
    else:
        print(f"Your numbers {player_numbers} have won {n_occurrences} {string_time} in the past.")

In [15]:
player_test1_numbers = [3, 6, 19, 24, 46, 48]
check_historical_occurrence(player_test1_numbers, winning_numbers)

Your numbers [3, 6, 19, 24, 46, 48] have never won the lottery in the past.


In [16]:
player_test2_numbers = [2, 3, 15, 23, 41, 46]
check_historical_occurrence(player_test2_numbers, winning_numbers)

Your numbers [2, 3, 15, 23, 41, 46] have won 1 time in the past.


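Whether a combination has won before has no effect on its chance of winning a future draw, because each draw is independent of the previous ones. Using historical winning numbers therefore neither increases nor decreases the probability of winning. The odds are still 1 in 13,983,816: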

In [17]:
one_ticket_probability(player_test1_numbers)
one_ticket_probability(player_test2_numbers)

Your chance to win the jackpot with [3, 6, 19, 24, 46, 48] is 0.0000071511%, or 1 in 13,983,816.
Your chance to win the jackpot with [2, 3, 15, 23, 41, 46] is 0.0000071511%, or 1 in 13,983,816.


## Multi-ticket Probability

>Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning.
>
>We've talked with the engineering team and they gave us the following information:
>
>- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).
>- Our function will see an integer between 1 and 13,983,816 (the maximum number of different tickets).
>- The function should print information about the probability of winning the big prize depending on the number of different tickets played.

\- From the guided project's ['Multi-ticket Probability' section](https://app.dataquest.io/c/65/m/382/guided-project%3A-mobile-app-for-lottery-addiction/6/multi-ticket-probability)

In [18]:
def multi_ticket_probability(number_of_tickets):
    combos = combinations(49, 6)
    proportion = number_of_tickets/combos
    string_ticket = "ticket" if number_of_tickets == 1 else "tickets"
    print(f"Your chance to win the jackpot with {number_of_tickets:,} {string_ticket} is {proportion*100:.10f}%, or 1 in {int(combos/number_of_tickets):,}.")

In [19]:
test_ticket_counts = [1, 2, 3, 4, 5, 10, 40, 100, 10_000, 1_000_000, 6_991_908, 13_983_816]

for test_ticket_count in test_ticket_counts:
    multi_ticket_probability(test_ticket_count)

Your chance to win the jackpot with 1 ticket is 0.0000071511%, or 1 in 13,983,816.
Your chance to win the jackpot with 2 tickets is 0.0000143022%, or 1 in 6,991,908.
Your chance to win the jackpot with 3 tickets is 0.0000214534%, or 1 in 4,661,272.
Your chance to win the jackpot with 4 tickets is 0.0000286045%, or 1 in 3,495,954.
Your chance to win the jackpot with 5 tickets is 0.0000357556%, or 1 in 2,796,763.
Your chance to win the jackpot with 10 tickets is 0.0000715112%, or 1 in 1,398,381.
Your chance to win the jackpot with 40 tickets is 0.0002860450%, or 1 in 349,595.
Your chance to win the jackpot with 100 tickets is 0.0007151124%, or 1 in 139,838.
Your chance to win the jackpot with 10,000 tickets is 0.0715112384%, or 1 in 1,398.
Your chance to win the jackpot with 1,000,000 tickets is 7.1511238420%, or 1 in 13.
Your chance to win the jackpot with 6,991,908 tickets is 50.0000000000%, or 1 in 2.
Your chance to win the jackpot with 13,983,816 tickets is 100.0000000000%, or 1 in 1.
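
The "1 in N" figures above are cut down to a whole number by `int()`, so a ratio of 13.98 is displayed as 13. Below is a small sketch of the same ratio shown with two decimals, truncated, and rounded. It assumes nearest-integer odds would be acceptable for display. `odds_one_in` is a hypothetical helper, not part of the original function.

```python
import math

TOTAL_COMBOS = math.comb(49, 6)  # number of possible 6/49 tickets

def odds_one_in(number_of_tickets):
    # Exact ratio of total combinations to tickets held, before any rounding.
    return TOTAL_COMBOS / number_of_tickets

for tickets in (40, 1_000_000):
    exact = odds_one_in(tickets)
    print(f"{tickets:,} tickets: 1 in {exact:,.2f} "
          f"(truncated: {int(exact):,}, rounded: {round(exact):,})")
```

For one million tickets the rounded figure is closer to 1 in 14 than to 1 in 13, which matters when the ratio is small.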

## Cost to Win

Lottery winnings are not taxable in Canada. The jackpot for the 6/49 lottery (matching all six winning numbers) is a fixed prize of \\$5 million. With one ticket costing \\$3, we determine the maximum number of tickets we could purchase and still keep a desired net profit if we win, assuming we are the sole winner. There is no limit on how many tickets can be purchased, but the prize has to be shared if there are other winners.

In [20]:
def profit(desired_net_amount):
    ticket_cost = 3
    prize_amount = 5000000
    # Whatever remains of the prize after the desired profit is the ticket budget.
    net_cost = prize_amount-desired_net_amount
    number_of_tickets = net_cost // ticket_cost
    # A budget below the price of one ticket (or a negative one) buys nothing.
    if number_of_tickets < 1:
        print("Your desired profit is not achievable.")
    else:
        string_ticket = "ticket" if number_of_tickets == 1 else "tickets"
        print(f"The cost to win the jackpot with a net profit of ${desired_net_amount:,} is ${(ticket_cost * number_of_tickets):,} ({number_of_tickets:,} {string_ticket}).")
        multi_ticket_probability(number_of_tickets)

In [21]:
test_desired_profit_amounts = [1, 500_000, 1_000_000, 2_000_000, 3_000_000, 4_000_000, 4_500_000, 4_999_997]

for test_desired_profit in test_desired_profit_amounts:
    profit(test_desired_profit)
    print(f"{'-'*100}")

The cost to win the jackpot with a net profit of $1 is $4,999,998 (1,666,666 tickets).
Your chance to win the jackpot with 1,666,666 tickets is 11.9185349693%, or 1 in 8.
----------------------------------------------------------------------------------------------------
The cost to win the jackpot with a net profit of $500,000 is $4,500,000 (1,500,000 tickets).
Your chance to win the jackpot with 1,500,000 tickets is 10.7266857630%, or 1 in 9.
----------------------------------------------------------------------------------------------------
The cost to win the jackpot with a net profit of $1,000,000 is $3,999,999 (1,333,333 tickets).
Your chance to win the jackpot with 1,333,333 tickets is 9.5348294057%, or 1 in 10.
----------------------------------------------------------------------------------------------------
The cost to win the jackpot with a net profit of $2,000,000 is $3,000,000 (1,000,000 tickets).
Your chance to win the jackpot with 1,000,000 tickets is 7.1511238420%, or 1 in 13.
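
The cost figures above describe what happens if one of the tickets wins. Weighting that outcome by its probability gives the expected result of a purchase. The sketch below is a rough illustration under the document's own assumptions (\\$3 per ticket, a fixed \\$5 million jackpot, a sole winner, all tickets different), and it ignores the smaller prizes. `expected_net` is a hypothetical helper.

```python
import math

TICKET_COST = 3
PRIZE_AMOUNT = 5_000_000
TOTAL_COMBOS = math.comb(49, 6)  # number of possible 6/49 tickets

def expected_net(number_of_tickets):
    # Probability that one of the distinct tickets hits the jackpot.
    win_probability = number_of_tickets / TOTAL_COMBOS
    # Expected prize money minus the certain cost of the tickets.
    return win_probability * PRIZE_AMOUNT - number_of_tickets * TICKET_COST

for tickets in (1, 40, 1_000_000):
    print(f"{tickets:,} tickets: expected net ${expected_net(tickets):,.2f}")
```

Under these assumptions the expected net is negative for any number of tickets, since each ticket returns about 36 cents of expected jackpot value for \\$3 spent.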

## Less Winning Numbers

>In most 6/49 lotteries there are smaller prizes if a player's ticket match two, three, four, or five of the six numbers drawn. As a consequence, the users might be interested in knowing the probability of having two, three, four, or five winning numbers.
>
>These are the engineering details we'll need to be aware of:
>
>- Inside the app, the user inputs:
>   - six different numbers from 1 to 49; and
>   - an integer between 2 and 5 that represents the number of winning numbers expected
>- Our function prints information about the probability of having the inputted number of winning numbers.

\- From the guided project's ['Less Winning Numbers' section](https://app.dataquest.io/c/65/m/382/guided-project%3A-mobile-app-for-lottery-addiction/7/less-winning-numbers-function?path=2&slug=data-scientist&version=1)

In [22]:
def probability_less_6(count_winning_numbers):
    successful_outcomes = combinations(6, count_winning_numbers) * combinations(43, 6 - count_winning_numbers)
    combos = combinations(49, 6)
    proportion = successful_outcomes/combos
    print(f"The chance of your ticket having {count_winning_numbers} winning numbers is {proportion*100:.5f}%, or 1 in {int(combos/successful_outcomes):,}.")

In [23]:
test_count_winning_numbers = [2, 3, 4, 5]
for test_count in test_count_winning_numbers:
    probability_less_6(test_count)

The chance of your ticket having 2 winning numbers is 13.23780%, or 1 in 7.
The chance of your ticket having 3 winning numbers is 1.76504%, or 1 in 56.
The chance of your ticket having 4 winning numbers is 0.09686%, or 1 in 1,032.
The chance of your ticket having 5 winning numbers is 0.00184%, or 1 in 54,200.
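
The project goal asks about having *at least* a given number of winning numbers, whereas `probability_less_6` reports the chance of *exactly* that many. A sketch of the cumulative version follows, summing the exact-match probabilities from the requested number up to six. It uses `math.comb` in place of the notebook's `combinations`, and the names `probability_exactly` and `probability_at_least` are illustrative.

```python
import math

TOTAL_COMBOS = math.comb(49, 6)  # number of possible 6/49 tickets

def probability_exactly(k):
    # Choose k of the 6 winning numbers and 6-k of the 43 non-winning numbers.
    return math.comb(6, k) * math.comb(43, 6 - k) / TOTAL_COMBOS

def probability_at_least(k):
    # Add up the exact-match probabilities for k, k+1, ..., 6 matches.
    return sum(probability_exactly(j) for j in range(k, 7))

for k in (2, 3, 4, 5):
    print(f"At least {k} winning numbers: {probability_at_least(k) * 100:.5f}%")
```

As a sanity check, the exact-match probabilities for zero through six matches should add up to 1.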
