# Lottery Addiction Project

### Introduction

It comes as no surprise that gambling addiction, while not necessarily detrimental to an individual's physical health, is among the most damaging forms of addiction. With sports betting in particular being legalized in more and more places, it appears that more people will unfortunately become addicted to gambling.

However, as Data Scientists, we have probability and statistics as two of our most powerful tools for generating meaningful insight from data. We will use these tools to build a makeshift app that helps gambling addicts by answering the following questions:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 tickets?
- What is the probability of having at least 5 winning numbers on a single lottery ticket?

The dataset we will be using comes from Canada's 6/49 lottery game and contains 3,665 drawings from 1982 to 2018.

### Preprocessing

Some of the most useful formulas for these types of problems are the combination and permutation formulas. In this step, we create helper functions for the factorial and for the combination formula, C(n, k) = n! / (k! (n-k)!):

In [1]:
# First we create a factorial function
def factorial(n):
    product = 1
    for i in range(1, n+1):  # range() already advances i, so no manual increment is needed
        product = product * i
    return product

In [2]:
# Let's test our function to make sure it works: 5! and 9! equal 120 and 362,880
print(factorial(5))
print(factorial(9))

120
362880


In [3]:
# Now we create a function that computes our combination formula
def combination(n, k):
    # Integer division keeps the count exact; the numerator is always divisible
    formula = factorial(n) // (factorial(k) * factorial(n-k))
    return formula

# Again, let's check that it works C(5,3) = 10
print(combination(5, 3))

10
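As an additional sanity check, the hand-written helpers can be compared against Python's standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available. `factorial_check` and `combination_check` are illustrative names that re-implement the same formulas so the block runs on its own; they are not cells from the notebook.

```python
import math

def factorial_check(n):
    """Compute n! with a simple loop (illustrative re-implementation)."""
    product = 1
    for i in range(1, n + 1):
        product *= i
    return product

def combination_check(n, k):
    """Compute C(n, k) = n! / (k! * (n - k)!) using exact integer division."""
    return factorial_check(n) // (factorial_check(k) * factorial_check(n - k))

# Compare against the standard library for every input up to n = 49.
for n in range(0, 50):
    assert factorial_check(n) == math.factorial(n)
    for k in range(0, n + 1):
        assert combination_check(n, k) == math.comb(n, k)

print(combination_check(49, 6))  # 13983816
```

If the loop finishes without an `AssertionError`, the loop-based formulas agree with the built-in ones for every case the lottery calculations need.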


### Problem Setup

Recall that the dataset we are working with is from the 6/49 lottery game in Canada. The first thing to explain is the set of rules for the game.

A player selects 6 numbers ranging from 1 to 49. Six numbers are then drawn later in the evening, and if all six numbers on a player's ticket match the numbers drawn, the player wins the jackpot.

### Probability Calculations

Recall that our first goal is to answer the question, "What is the probability of getting the jackpot on a single ticket?". We answer that question now.

In [4]:
def one_ticket_probability(six_number_list):
    total_possible_outcomes = combination(49, 6)  # Out of 49 numbers, we choose 6 hoping to win
    probability = (1 / total_possible_outcomes) * 100  # We are only right a single time
    return "The probability that {} is the right answer is {}%.".format(six_number_list, probability)

# We now test our function
print(one_ticket_probability([1, 2, 3, 4, 5, 6]))

The probability that [1, 2, 3, 4, 5, 6] is the right answer is 7.151123842018516e-06%.
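Written out, the arithmetic behind this output is as follows. This is a worked version of the same formula, with the final value rounded to three significant figures:

$$
C(49, 6) = \frac{49!}{6!\,43!} = 13{,}983{,}816,
\qquad
P(\text{jackpot}) = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8} = 7.15 \times 10^{-6}\,\%
$$

In other words, a single ticket wins the jackpot roughly once in every 14 million draws.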


In [5]:
# We now wish to view the historical data for the competition
import pandas as pd
lottery_history = pd.read_csv('649.csv')
print(lottery_history.shape)
lottery_history.head(3)

(3665, 11)


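| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |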


In [6]:
lottery_history.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


Thanks to the dataset imported above, we can now create a function that lets users compare their ticket against Canada's historical lottery data and determine whether it would ever have won.

In [7]:
# First, we create a function that returns the six winning numbers of one draw
def extract_numbers(index):
    row = lottery_history.iloc[index, 4:10]  # Columns 4-9 hold NUMBER DRAWN 1 to 6
    return set(row)  # Return the numbers as a set, so their order does not matter

# This should correspond to the first row of the dataframe.
extract_numbers(0)

{3, 11, 12, 14, 41, 43}

In [8]:
# We now wish to use this function to extract all winning numbers
winning_numbers = []
for i in range(len(lottery_history)):
    winning_set = extract_numbers(i)
    winning_numbers.append(winning_set)

In [9]:
# We now have a list containing all the winning numbers. We convert it to a series.
winning_series = pd.Series(data=winning_numbers, index=lottery_history['DRAW DATE'])

In [10]:
# We are now ready to write a function that checks historical occurrences
def check_historical_occurence(number_list, winning_numbers):
    number_set = set(number_list)  # Convert our list into a set
    times_numbers_won = []
    for wins in winning_numbers:  # Iterate through historical wins
        if number_set == wins:  # If there is a match between the choice and win
            times_numbers_won.append(True)  # Append win as a True boolean to list
        else:
            times_numbers_won.append(False)  # Otherwise, append a value of False
    total_wins = sum(times_numbers_won)  # True counts as 1, False as 0
    # Share of historical draws matched, in percent (3,665 draws in this dataset)
    probability = (total_wins / len(winning_numbers)) * 100
    return 'Your choice of {} has won {} times. Your chance of winning with these numbers is {}'.format(number_set, total_wins, probability)

In [11]:
# We now test our function. Let's use the first row so we get a result
check_historical_occurence([3, 11, 12, 14, 41, 43], winning_series)

'Your choice of {3, 41, 11, 12, 43, 14} has won 1 times. Your chance of winning with these numbers is 0.027285129604365622'
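The core of this check is a set comparison, which ignores the order in which the numbers are given. The sketch below shows that idea on its own, without pandas, using only the first three draws from the preview of the dataset. `sample_draws` and `count_historical_wins` are hypothetical names introduced for illustration.

```python
# The first three draws shown in the dataset preview, as sets of six numbers.
sample_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
]

def count_historical_wins(ticket, draws):
    """Return how many past draws match the ticket exactly (hypothetical helper)."""
    ticket_set = set(ticket)
    # True counts as 1 when summed, so this counts the exact matches.
    return sum(ticket_set == draw for draw in draws)

# The same numbers in a different order still match the first draw.
print(count_historical_wins([43, 41, 14, 12, 11, 3], sample_draws))  # 1
print(count_historical_wins([1, 2, 3, 4, 5, 6], sample_draws))       # 0
```

Note that the number returned is a historical frequency. The chance of winning a future draw is still the same for every combination, whether or not it has come up before.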

While this is a good start, addicts will most likely play several tickets at once in an attempt to improve their chances of winning. We need to take this situation into account. We'll write a function that lets the user enter how many different tickets they want to play.

In [12]:
def multi_ticket_probability(number_of_tickets):
    total_outcomes = combination(49, 6)  # The total number of combinations
    successful_outcomes = number_of_tickets  # The potential times user wins
    probability = (successful_outcomes / total_outcomes) * 100  # The probability of winning, assuming all tickets are different
    return "Your chances of winning with {} tickets are {}% ".format(number_of_tickets, probability)

In [13]:
# We also wish to test our function
tickets = [1, 10, 100, 1000, 100000, 6991908, 13983816]
for i in tickets:
    print(multi_ticket_probability(i))

Your chances of winning with 1 tickets are 7.151123842018516e-06% 
Your chances of winning with 10 tickets are 7.151123842018517e-05% 
Your chances of winning with 100 tickets are 0.0007151123842018516% 
Your chances of winning with 1000 tickets are 0.007151123842018516% 
Your chances of winning with 100000 tickets are 0.7151123842018516% 
Your chances of winning with 6991908 tickets are 50.0% 
Your chances of winning with 13983816 tickets are 100.0% 
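The same relationship can be read in reverse: how many different tickets would it take to reach a given chance of winning? The sketch below is one way to answer that, assuming every ticket carries a distinct combination and that the target is given as a whole-number percentage. `tickets_needed` is a hypothetical helper, and `math.comb` requires Python 3.8 or later.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def tickets_needed(target_percent):
    """Smallest number of different tickets giving at least target_percent chance.

    target_percent is assumed to be a whole number between 0 and 100.
    """
    # Ceiling division on integers avoids floating-point rounding.
    return -(-TOTAL_OUTCOMES * target_percent // 100)

print(tickets_needed(1))    # 139839
print(tickets_needed(50))   # 6991908
print(tickets_needed(100))  # 13983816
```

Even a 1% chance requires almost 140,000 different tickets for a single draw.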


It should be noted that these are extremely low odds: even 100,000 different tickets give less than a 1% chance of winning the jackpot.

Thus far, we have answered two out of our three questions in the introduction. We now wish to write a function that calculates the probability of having 2, 3, 4, or 5 winning numbers. The comments in the function below explain how we accomplish this task. 

This is relevant because the game also has smaller prizes: you can still earn winnings if 2 to 5 of your numbers match the evening drawing.

In [14]:
def probability_less_6(integer):
    # Ways to pick exactly `integer` of the 6 winning numbers
    winning_picks = combination(6, integer)
    # Ways to fill the rest of the ticket from the 43 non-winning numbers
    losing_picks = combination(43, 6 - integer)
    # Tickets that have exactly `integer` numbers right
    total_wins = winning_picks * losing_picks
    probability = (total_wins / combination(49, 6)) * 100  # Expressed as a percentage
    return "The chances of winning with {} numbers right are {:.6f}%".format(integer, probability)

In [15]:
# Let's test out our new function
integers = [2, 3, 4, 5]
for i in integers:
    print(probability_less_6(i))

The chances of winning with 2 numbers right are 13.237803%
The chances of winning with 3 numbers right are 1.765040%
The chances of winning with 4 numbers right are 0.096862%
The chances of winning with 5 numbers right are 0.001845%
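The third question from the introduction asks about having at least 5 winning numbers, which means either exactly 5 or all 6. A minimal sketch of that calculation follows, using the standard counting argument: choose k of the 6 drawn numbers and the remaining 6 - k from the 43 numbers that were not drawn. `ways_to_match` is a hypothetical helper, and `math.comb` assumes Python 3.8 or later.

```python
import math

def ways_to_match(k):
    """Number of tickets matching exactly k of the 6 drawn numbers."""
    # Choose k of the 6 drawn numbers and the other 6 - k from the 43 not drawn.
    return math.comb(6, k) * math.comb(43, 6 - k)

total = math.comb(49, 6)

# Every ticket matches between 0 and 6 drawn numbers, so the counts add up.
assert sum(ways_to_match(k) for k in range(7)) == total

at_least_5 = (ways_to_match(5) + ways_to_match(6)) / total
print(ways_to_match(5), ways_to_match(6))  # 258 1
print("{:.6f}%".format(at_least_5 * 100))  # 0.001852%
```

The assertion is a useful guard: if the per-k counts did not sum to the total number of tickets, the counting formula would be wrong.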


### Conclusion

For this project, we created functions that show gambling addicts just how low their chances of winning are. We covered multiple scenarios: whether a gambler buys a single ticket or hundreds of thousands, the probability of winning the 6/49 jackpot remains very low.

Unfortunately, I still have a feeling that even an addict who knows how low the probability of winning is will play anyway, with the rationale that "there is always a chance".

As Data Scientists, there is nothing we can do about that. 