# **Guided Project: Mobile app for lottery addiction**

*Made by Jaime Bibiloni in November 2022*

# **Tackling lottery addiction with hard cold numbers**

---

## **Introduction**

A medical institute has asked us to create the core logic for a new app aimed at improving users' understanding of the real probabilities involved in the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49), a lottery popular in Canada that lets players pick their own numbers. By improving users' understanding of the underlying probabilities, the institute hopes to reduce lottery addiction. Our task involves answering questions such as:

- What is the probability of winning the big prize with a single ticket?

- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?

- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

We will be analyzing a [data set](https://www.kaggle.com/datasets/datascienceai/lottery-dataset?resource=download) hosted on Kaggle that stores data on more than 3,500 drawings between 1982 and 2018.

In the **6/49 lottery, six different numbers are drawn without replacement from a set of 49 numbers** that ranges from 1 to 49.
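To make "drawn without replacement" concrete, here is a minimal sketch of a simulated draw. The function name `draw_649` is hypothetical and not part of the project; it only illustrates that no number can appear twice in the same draw.

```python
import random


def draw_649(rng=random):
    """Return one simulated 6/49 draw as a sorted list of six distinct numbers."""
    # random.sample picks without replacement, so no number can repeat
    return sorted(rng.sample(range(1, 50), 6))


draw = draw_649()
print(draw)
```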

In [1]:
import pandas as pd

# Automatically format all code in the notebook
%load_ext lab_black

# Show every output in the notebook
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

## **Core functions**

Let's start by creating two useful functions we will reuse during the project:
- `factorial()`: A function to calculate factorials.
- `combinations()`: A function to calculate combinations:

In [2]:
def factorial(n):
    # n must be a non-negative integer; factorial(0) is 1 by definition
    fact = 1
    for i in range(2, n + 1):
        fact *= i
    return fact

In [3]:
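def combinations(n, k):
    # Integer division keeps the result exact; true division would go
    # through a float and could lose precision for large n
    return factorial(n) // (factorial(k) * factorial(n - k))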
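As a quick sanity check, the two helpers can be compared against the standard library. The helpers are restated below in an integer-only form so the block runs on its own; `math.factorial` and `math.comb` (Python 3.8+) are used only as a reference, not as part of the project.

```python
import math


def factorial(n):
    # Product of 1..n; returns 1 for n == 0
    fact = 1
    for i in range(2, n + 1):
        fact *= i
    return fact


def combinations(n, k):
    # Number of ways to choose k items out of n, ignoring order
    return factorial(n) // (factorial(k) * factorial(n - k))


# Compare against the standard library for a few inputs
for n in (0, 1, 5, 10):
    assert factorial(n) == math.factorial(n)

for n, k in [(49, 6), (6, 3), (43, 0), (5, 5)]:
    assert combinations(n, k) == math.comb(n, k)

print(combinations(49, 6))  # 13983816
```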

## **One-ticket probability of winning**

In [4]:
# The ticket input will be a list
def one_ticket_probability(ticket):
    possible_outcomes = combinations(49, 6)
    return (
        "The probability of winning the 6/49 lottery with the ticket "
        "{0} is 1 out of {1} million ({2} %)".format(
            ticket,
            round((possible_outcomes / 1000000), 2),
            "{:f}".format((1 / possible_outcomes) * 100),
        )
    )

In [5]:
# Testing the function with different tickets
one_ticket_probability([1, 2, 3, 4, 5, 6])

one_ticket_probability([11, 22, 33, 44, 55, 66])

one_ticket_probability([8, 42, 25, 39, 36, 31])

'The probability of winning the 6/49 lottery with the ticket [1, 2, 3, 4, 5, 6] is 1 out of 13.98 million (0.000007 %)'

'The probability of winning the 6/49 lottery with the ticket [11, 22, 33, 44, 55, 66] is 1 out of 13.98 million (0.000007 %)'

'The probability of winning the 6/49 lottery with the ticket [8, 42, 25, 39, 36, 31] is 1 out of 13.98 million (0.000007 %)'

As shown above, **the probability of winning the 6/49 lottery with any ticket is** extremely low: roughly **1 out of 14 million**.
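Note that `one_ticket_probability()` accepts any list, including the second test ticket, which contains 55 and 66 even though the lottery only uses numbers from 1 to 49. A minimal sketch of an input check follows; `validate_ticket` is a hypothetical helper, not part of the original functions.

```python
def validate_ticket(ticket):
    """Return True if ticket holds six distinct integers between 1 and 49."""
    return (
        len(ticket) == 6
        and len(set(ticket)) == 6
        and all(isinstance(n, int) and 1 <= n <= 49 for n in ticket)
    )


print(validate_ticket([8, 42, 25, 39, 36, 31]))  # True
print(validate_ticket([11, 22, 33, 44, 55, 66]))  # False: 55 and 66 exceed 49
print(validate_ticket([1, 1, 2, 3, 4, 5]))  # False: repeated number
```

A check like this could run before the probability is reported, so the app never quotes odds for a ticket that cannot be played.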

## **Historical data check**

In [6]:
lottery_649 = pd.read_csv("649.csv")

lottery_649.shape

lottery_649.head(3)

lottery_649.tail(3)

(3665, 11)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 06/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


The data set contains 11 columns showing the date and each drawn number for each of the 3665 draws (rows) included. All these draws took place in the Canadian 6/49 lottery between 1982 and 2018.

Let's create a function so users can compare their tickets against the historical lottery data and determine whether they would have ever won.

In [14]:
# Function to extract the set of drawn numbers from a row of the data set.
# Positions 4 to 10 hold the six main numbers plus the bonus number.
def extract_numbers(lottery_row):
    numbers_set = set(lottery_row.iloc[4:11])
    return numbers_set


# Try function with first row
extract_numbers(lottery_649.iloc[0])

# Extract all sets of historic drawn numbers recorded in the data set
all_sets = lottery_649.apply(extract_numbers, axis=1)

{3, 11, 12, 13, 14, 41, 43}

In [16]:
# Function to compare a single ticket against the full list of historical winners
def check_historical_occurrence(ticket, historical_winners):
    ticket_set = set(ticket)
    past_winner_times = 0

    for e in historical_winners:
        if e == ticket_set:
            past_winner_times += 1

    winner_probability = past_winner_times / len(historical_winners)

    print(past_winner_times)

    print(
        "According to results from past draws, the probability of winning "
        "the 6/49 lottery with the ticket "
        "{0} is {1}%".format(ticket, "{:.3f}".format(winner_probability * 100))
    )


# Checking function with the last draw in the data set (six numbers plus bonus)
check_historical_occurrence([14, 24, 31, 35, 37, 48, 17], all_sets)

# Checking if function works when changing number order of a past winning ticket
check_historical_occurrence([24, 14, 31, 35, 37, 48, 17], all_sets)

# Checking function with a ticket that never came up
check_historical_occurrence([1, 2, 3, 4, 5, 6, 7], all_sets)

1
According to results from past draws, the probability of winning the 6/49 lottery with the ticket [14, 24, 31, 35, 37, 48, 17] is 0.027%
1
According to results from past draws, the probability of winning the 6/49 lottery with the ticket [24, 14, 31, 35, 37, 48, 17] is 0.027%
0
According to results from past draws, the probability of winning the 6/49 lottery with the ticket [1, 2, 3, 4, 5, 6, 7] is 0.000%


As the examples above show, **comparing a ticket against past draws illustrates how incredibly unlikely it is to win the lottery**: even a combination that did come up once in more than 35 years matches only 1 of the 3665 recorded draws (about 0.027%).
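In Lotto 6/49 the jackpot is decided by the six main numbers, while the sets compared above also include the bonus number. A minimal sketch of a six-number comparison follows. The names `sample_draws` and `count_jackpot_matches` are hypothetical, and the three sample draws are copied from the first rows of the table shown earlier, with the bonus numbers left out.

```python
# Six main numbers of the first three draws in the data set
# (bonus numbers deliberately left out)
sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]


def count_jackpot_matches(ticket, draws):
    """Count draws whose six main numbers equal the ticket (order ignored)."""
    ticket_set = set(ticket)
    if len(ticket_set) != 6:
        raise ValueError("a 6/49 ticket needs six distinct numbers")
    return sum(1 for draw in draws if draw == ticket_set)


print(count_jackpot_matches([41, 3, 43, 11, 14, 12], sample_draws))  # 1
print(count_jackpot_matches([1, 2, 3, 4, 5, 6], sample_draws))  # 0
```

With the full data set, the same idea would presumably amount to building each set from positions 4 to 9 of a row instead of 4 to 10.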

## **Multi-ticket probability**

To help players who buy many different tickets to increase their chances of winning, we are going to build a function that shows the probability of winning for a given number of tickets in play:

In [9]:
def multi_ticket_probability(number_of_tickets):
    # Assumes every ticket in play is a different combination
    total_possible_tickets = combinations(49, 6)
    multi_probability = number_of_tickets / total_possible_tickets

    # Use the singular noun only when a single ticket is played
    noun = "ticket" if number_of_tickets == 1 else "tickets"

    print(
        "The probability of winning the 6/49 lottery when playing "
        f"{number_of_tickets} {noun} is {multi_probability * 100:.5f} %"
    )


# Testing function on different scenarios
multi_ticket_probability(1)
multi_ticket_probability(10)
multi_ticket_probability(40)
multi_ticket_probability(100)
multi_ticket_probability(10000)
multi_ticket_probability(100000)
multi_ticket_probability(1000000)
multi_ticket_probability(6991908)
multi_ticket_probability(13983816)

The probability of winning the 6/49 lottery when playing 1 ticket is 0.00001 %
The probability of winning the 6/49 lottery when playing 10 tickets is 0.00007 %
The probability of winning the 6/49 lottery when playing 40 tickets is 0.00029 %
The probability of winning the 6/49 lottery when playing 100 tickets is 0.00072 %
The probability of winning the 6/49 lottery when playing 10000 tickets is 0.07151 %
The probability of winning the 6/49 lottery when playing 100000 tickets is 0.71511 %
The probability of winning the 6/49 lottery when playing 1000000 tickets is 7.15112 %
The probability of winning the 6/49 lottery when playing 6991908 tickets is 50.00000 %
The probability of winning the 6/49 lottery when playing 13983816 tickets is 100.00000 %


The multi-ticket probability function shows, again, how incredibly unlikely winning the 6/49 lottery is. For example, even 100,000 different tickets give only about a 0.72% chance, and **to reach a 1% winning chance, you would need to buy roughly 140,000 different tickets**.
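The question can also be turned around: how many different tickets does a given winning chance require? The sketch below is one possible way to answer it. `tickets_needed` and `TOTAL_OUTCOMES` are hypothetical names, and `math.comb` (Python 3.8+) stands in for the project's own `combinations()` so that the block runs on its own.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets


def tickets_needed(target_probability):
    """Smallest number of different tickets whose combined winning
    chance reaches target_probability (a value in the range (0, 1])."""
    if not 0 < target_probability <= 1:
        raise ValueError("target_probability must be in the range (0, 1]")
    # Each different ticket adds 1 / TOTAL_OUTCOMES to the winning chance
    return math.ceil(target_probability * TOTAL_OUTCOMES)


print(tickets_needed(0.01))  # 139839
print(tickets_needed(0.5))  # 6991908
```

Because the target is a float, results for targets that fall exactly on a ticket boundary may be off by one; exact arithmetic with `fractions.Fraction` would avoid that.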

## **Lesser prizes**

Most 6/49 lotteries award smaller prizes to tickets that get only 2, 3, 4 or 5 numbers right. Some users might be interested in the odds of those prizes, so we are going to write a function to calculate them:

In [30]:
# correct_numbers must be an integer between 2 and 5
def probability_less_6(correct_numbers):
    successful_outcomes = combinations(6, correct_numbers) * combinations(
        43, 6 - correct_numbers
    )

    possible_outcomes = combinations(49, 6)

    probability = successful_outcomes / possible_outcomes

    print(
        "The probability of getting right exactly "
        f"{correct_numbers}"
        " numbers when playing the 6/49 lottery"
        f" is {probability * 100:.5f} %"
    )


# Testing function on different scenarios
probability_less_6(5)
probability_less_6(4)
probability_less_6(3)
probability_less_6(2)

The probability of getting right exactly 5 numbers when playing the 6/49 lottery is 0.00184 %
The probability of getting right exactly 4 numbers when playing the 6/49 lottery is 0.09686 %
The probability of getting right exactly 3 numbers when playing the 6/49 lottery is 1.76504 %
The probability of getting right exactly 2 numbers when playing the 6/49 lottery is 13.23780 %
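The introduction asks for the probability of having at least a given number of winning numbers, whereas `probability_less_6()` reports the probability of matching exactly that many. One way to obtain the "at least" figure is to sum the exact-match counts from the chosen number up to six, as in the sketch below. `probability_at_least` is a hypothetical helper, and `math.comb` (Python 3.8+) stands in for the project's `combinations()` so that the block is self-contained.

```python
import math


def probability_at_least(correct_numbers):
    """Probability that a single ticket matches at least
    correct_numbers of the six drawn numbers."""
    if not 0 <= correct_numbers <= 6:
        raise ValueError("correct_numbers must be between 0 and 6")

    possible_outcomes = math.comb(49, 6)

    # For each exact match count m: choose m of the 6 drawn numbers
    # and the remaining 6 - m from the 43 numbers that were not drawn
    successful_outcomes = sum(
        math.comb(6, m) * math.comb(43, 6 - m)
        for m in range(correct_numbers, 7)
    )

    return successful_outcomes / possible_outcomes


for k in (5, 4, 3, 2):
    print(
        f"The probability of getting at least {k} numbers right "
        f"is {probability_at_least(k) * 100:.5f} %"
    )
```

The "at least" values are only slightly larger than the "exactly" values because each additional correct number is far less likely than the one before it.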
