# Investigation into reducing lottery addiction by understanding probabilities

## Introduction
In this project, we contribute to the development of a mobile app by writing a few functions that are meant to help lottery addicts better estimate their chances of winning and, hopefully, steer them away from this dangerous habit.

We'll focus on the [6/49 lottery](https://www.wikiwand.com/en/Lotto_6/49), where six numbers are drawn without replacement from a set of 49 (numbered 1 to 49), and a player wins the big prize if the six numbers on their ticket match all six numbers drawn. Our goal is to create the logical core of the app and build functions that enable users to answer these questions:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The historical data used in this project comes from the national 6/49 lottery game in Canada. The [dataset](https://www.kaggle.com/datascienceai/lottery-dataset) counts 3,665 drawings, dating from 1982 to 2018.


# Core Functions
We're going to write two functions that we'll be using frequently:
- `factorial()` - a function that calculates factorials
- `combinations()` - a function that calculates combinations

In [1]:
# Function to find factorial
def factorial(n):
    product = 1
    for i in range(n):
        product *= i+1
    return product

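# Function to find the number of combinations (n choose k)
def combinations(n, k):
    # Integer division is exact here and keeps the count a whole number
    return factorial(n) // (factorial(k) * factorial(n - k))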
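As a sanity check, the two helpers can be compared against Python's standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available. `factorial_loop` and `combinations_loop` are illustrative re-implementations of the loop-based helpers (using integer division so results stay whole numbers), included only so the block runs on its own.

```python
import math

def factorial_loop(n):
    # Multiply 1 * 2 * ... * n; an empty range leaves the product at 1 for n = 0
    product = 1
    for i in range(n):
        product *= i + 1
    return product

def combinations_loop(n, k):
    # n choose k, kept as an exact integer
    return factorial_loop(n) // (factorial_loop(k) * factorial_loop(n - k))

# Cross-check every value the 6/49 game could need against the standard library
for n in range(50):
    assert factorial_loop(n) == math.factorial(n)
    for k in range(n + 1):
        assert combinations_loop(n, k) == math.comb(n, k)

total_outcomes = combinations_loop(49, 6)
print(total_outcomes)  # 13983816
```

The value 13,983,816 is the number of possible tickets, so it is the denominator in every probability that follows.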

Now we can use these functions to calculate the probability of winning the 6/49 lottery with just one ticket. The user can input a list of 6 numbers and see how likely they are to win.

# One-Ticket Probability

In [2]:
# Function to find probability of winning lottery with one ticket
def one_ticket_probability(list_of_6):
    # Only accept lists of 6
    if len(list_of_6) != 6:
        return "Please enter 6 numbers"
    for n in list_of_6:
        # Only accept whole numbers
        if type(n) != int:
            return "Must be whole numbers"
        # Only accept numbers between 1 and 49 (this also rejects negatives)
        if n < 1 or n > 49:
            return "Numbers must be between 1 and 49"
    # A ticket holds six distinct numbers
    if len(set(list_of_6)) != 6:
        return "Numbers must be unique"
    # Total outcomes
    outcomes = combinations(49, 6)
    # Total successful outcomes
    successful_outcomes = 1
    percentage = (successful_outcomes / outcomes) * 100
    return "You have a {:8f}% chance of winning the lottery with these numbers.".format(percentage)

test = one_ticket_probability([1, 2, 3, 4, 5, 7])
test

'You have a 0.000007% chance of winning the lottery with these numbers.'
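The figure above follows directly from the combinations formula. As a worked check, there is exactly one winning combination among all possible tickets:

$$
C(49, 6) = \frac{49!}{6!\,43!} = \frac{49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44}{720} = 13{,}983{,}816
$$

$$
P(\text{big prize}) = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8} \approx 0.000007\%
$$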

# Historical Data Check for Canada Lottery
We wrote a function that tells users the probability of winning the big prize with a single ticket. Another feature of our application should let users compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now. We're going to write a function to implement this feature.

Before we do that, we'll inspect the dataset and get familiar with its structure.

In [3]:
import pandas as pd
lottery = pd.read_csv('649.csv')

print('Number of rows: ', lottery.shape[0], '\nNumber of columns: ', lottery.shape[1])
lottery.head(3)

Number of rows:  3665 
Number of columns:  11


Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34


In [4]:
lottery.tail(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


The dataframe contains 11 columns with self-explanatory names, including one column for each of the six drawn numbers plus a bonus number. There are no missing values in the dataframe.

## Function for Historical Data Check
Now, let's write a function for comparing any ticket with the historical lottery data in Canada. The output will be:
- the number of times the selected combination occurred in the Canadian dataset; and
- the probability of winning the big prize in the next drawing with that combination.

In [6]:
# Extract the six drawn numbers from one row of the dataset
def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6 (bonus excluded)
    return set(row.iloc[4:10])

# Apply function to Dataset
winning_numbers = lottery.apply(extract_numbers, axis=1)

# Function to find if numbers have ever won before
def check_historical_occurence(user_nums):
    user_nums = set(user_nums)
    matches = 0
    for sets in winning_numbers:
        if user_nums == sets:
            matches += 1
    return 'Your numbers have won the lottery the following number of times: {} .'.format(matches)

# Test
test = check_historical_occurence([4, 41, 11, 12, 43, 14])
test

'Your numbers have won the lottery the following number of times: 0 .'
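The loop above scans every past draw on each call. If the app needs to check many tickets, one option is to count the draws once and then look each ticket up by key. The sketch below is only an illustration of that idea: `sample_draws` hard-codes the three earliest draws shown in the `head(3)` output rather than reading `649.csv`, and `draw_counts` and `count_past_wins` are hypothetical names.

```python
from collections import Counter

# The three earliest draws from the head(3) output (bonus number excluded)
sample_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

# frozenset is hashable and ignores order, so each draw can serve as a lookup key
draw_counts = Counter(frozenset(draw) for draw in sample_draws)

def count_past_wins(user_nums):
    """Return how many past draws match the ticket exactly."""
    # Counter returns 0 for keys it has never seen
    return draw_counts[frozenset(user_nums)]

print(count_past_wins([43, 41, 14, 12, 11, 3]))  # 1
print(count_past_wins([4, 41, 11, 12, 43, 14]))  # 0
```

The second ticket differs from the first draw in a single number (4 instead of 3), which is why it counts as zero wins.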

The next function will take in a list of 6 numbers and return both the probability of winning and the number of times those numbers have won in past draws.

In [8]:
# Function to show probability and historic occurrences
def probability_and_historic(user_nums):
    a = check_historical_occurence(user_nums)
    b = one_ticket_probability(user_nums)
    return """
    {}
    
    {}
    """.format(a, b)
# Test
print(probability_and_historic([4, 41, 11, 12, 43, 14]))


    Your numbers have won the lottery the following number of times: 0 .
    
    You have a 0.000007% chance of winning the lottery with these numbers.
    


The odds are small. However, some people buy multiple tickets. Let's see if this improves their odds:

## Multi-ticket Probability


In [9]:
# Function to check probability with multiple tickets
def multi_ticket_probability(n):
    outcomes = combinations(49, 6)
    success_outcomes = n
    percentage = (success_outcomes / outcomes) * 100
    return "You have a {:8f}% chance of winning the lottery with {} tickets".format(percentage, n)

# Test
for i in range(10000, 100000, 20000):
    print(multi_ticket_probability(i))
    print("-----------------------------------------------------")

You have a 0.071511% chance of winning the lottery with 10000 tickets
-----------------------------------------------------
You have a 0.214534% chance of winning the lottery with 30000 tickets
-----------------------------------------------------
You have a 0.357556% chance of winning the lottery with 50000 tickets
-----------------------------------------------------
You have a 0.500579% chance of winning the lottery with 70000 tickets
-----------------------------------------------------
You have a 0.643601% chance of winning the lottery with 90000 tickets
-----------------------------------------------------
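The formula used above assumes that all `n` tickets are different, so each extra ticket adds exactly one successful outcome. Under that assumption, a minimal sketch with exact fractions shows how many distinct tickets a 50% chance would take. The names `multi_ticket_fraction` and `tickets_for_half` are illustrative and not part of the app, and `math.comb` needs Python 3.8 or later.

```python
from fractions import Fraction
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_fraction(n):
    """Exact probability of winning the big prize with n distinct tickets."""
    if type(n) != int or n < 1 or n > TOTAL_OUTCOMES:
        raise ValueError("n must be a whole number between 1 and 13,983,816")
    return Fraction(n, TOTAL_OUTCOMES)

# Smallest number of distinct tickets that gives at least a 50% chance
tickets_for_half = (TOTAL_OUTCOMES + 1) // 2

print(tickets_for_half)                         # 6991908
print(multi_ticket_fraction(tickets_for_half))  # 1/2
```

Even the 90,000 tickets tested above cover well under 1% of the possible combinations.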


## Fewer Winning Numbers - Function
Now we're going to create a function that calculates the probability of having exactly two, three, four or five winning numbers.

In [10]:
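# Function to check probabilities of matching exactly 5, 4, 3 and 2 numbers
def probability_less_6(n):
    # Check the type first so non-numeric input never reaches the comparisons
    if type(n) != int or n < 2 or n > 6:
        return "Invalid number"
    
    # Ways to choose which n of the 6 ticket numbers are drawn
    number_combinations = combinations(6, n)
    # Ways to fill the other 6-n drawn numbers from the 43 not on the ticket
    success_per_combination = combinations(43, 6-n)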
    
    successful_outcomes = number_combinations * success_per_combination
    outcomes = combinations(49, 6)
    
    percentage = (successful_outcomes / outcomes) * 100
    return "You have a {:8f}% chance of matching {} numbers".format(percentage, n)

# Test
for i in range(5, 1, -1):
    print(probability_less_6(i))

You have a 0.001845% chance of matching 5 numbers
You have a 0.096862% chance of matching 4 numbers
You have a 1.765040% chance of matching 3 numbers
You have a 13.237803% chance of matching 2 numbers
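The counts behind these percentages follow the hypergeometric pattern: choose which `k` of the six ticket numbers are drawn, then fill the remaining `6 - k` drawn numbers from the 43 that are not on the ticket. As a quick consistency check, the sketch below (using `math.comb`, Python 3.8+; `exact_match_count` is an illustrative name) confirms that the counts for every `k` from 0 to 6 add up to all possible draws.

```python
import math

def exact_match_count(k):
    """Number of draws that share exactly k numbers with a given ticket."""
    return math.comb(6, k) * math.comb(43, 6 - k)

total = math.comb(49, 6)
counts = {k: exact_match_count(k) for k in range(7)}

# Every draw matches the ticket in exactly one of 0..6 numbers,
# so the seven counts must cover all possible draws
assert sum(counts.values()) == total

print(counts[5])  # 258
print(counts[2])  # 1851150
```

Dividing any of these counts by 13,983,816 gives the corresponding probability, for example 258 / 13,983,816 for exactly five matches.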


In [11]:
# Function to check probabilities of matching at least 5, 4, 3 and 2 numbers
def probability_at_least(n):
    # Check the type first so non-numeric input never reaches the comparisons
    if type(n) != int or n < 2 or n > 6:
        return "Invalid number"
    
    # "At least n" means exactly n, or n+1, ..., or all 6 matches,
    # so add up the successful outcomes for each exact count
    successful_outcomes = 0
    for k in range(n, 7):
        successful_outcomes += combinations(6, k) * combinations(43, 6-k)
    outcomes = combinations(49, 6)
    
    percentage = (successful_outcomes / outcomes) * 100
    return "You have a {:8f}% chance of matching at least {} numbers".format(percentage, n)

# Test
for i in range(5, 1, -1):
    print(probability_at_least(i))
    print("-----------------------------------------------------")

You have a 0.001852% chance of matching at least 5 numbers
-----------------------------------------------------
You have a 0.098714% chance of matching at least 4 numbers
-----------------------------------------------------
You have a 1.863755% chance of matching at least 3 numbers
-----------------------------------------------------
You have a 15.101557% chance of matching at least 2 numbers
-----------------------------------------------------


## Compare Odds to Other Events

In [12]:
# Import random module
import random

# Total number of possible 6/49 outcomes
# (named lottery_outcomes so it does not overwrite the lottery dataframe)
lottery_outcomes = 13983816

# Each entry pairs an event with the figure N, read as 1-in-N odds
events = [
    ("be killed by a shark", 3700000),
    ("successfully apply to be a NASA Astronaut", 1525),
    ("be injured by a toilet", 10000),
    ("be struck by lightning", 114195),
    ("meet the love of your life tomorrow", 562),
    ("win an Olympic Gold", 662000),
    ("win an Oscar", 3700000),
]

# Comparison function: pick one event at random and compare it with the lottery
def compare_odds():
    event, odds = random.choice(events)
    return "You are {} times more likely to {} than win the 6/49 lottery".format(
        round(lottery_outcomes / odds, 2), event)

# Test
compare_odds()

'You are 3.78 times more likely to be killed by a shark than win the 6/49 lottery'

# Conclusion
The odds of winning the 6/49 are very poor, even if you buy many tickets or are happy to match only a few numbers. The medical institute can use these odds to build the app and help those who are addicted.
