## Lottery Project ##

For this project, we're going to analyze a [Canadian Lottery Dataset](https://www.kaggle.com/datascienceai/lottery-dataset) from Kaggle so that players of the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) can answer:

* What is the probability of winning the big prize with a single ticket?
* What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
* What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

The dataset covers 3,665 drawings, dating from 1982 to 2018.

In [1]:
# Create factorial function 
def factorial(n):
    final_product = 1
    for integer in range(n, 0, -1):
        final_product *= integer
    return final_product

# Test factorial function with 3 (3 * 2 * 1) = 6
print(factorial(3))

# Create combinations function
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    return numerator / denominator

# Test combinations function with 52 and 5, like a game of poker
# with 5 card draw and 52 total cards. We expect a total combination
# of 2,598,960

print(combinations(52,5))

6
2598960.0


## Writing the function to calculate the probability of winning

To properly calculate the probability of a lottery player winning, we need to create some functions keeping the following restrictions in mind:

* Inside the app, the user inputs six different numbers from 1 to 49.
* The six numbers will come as a Python list, which will serve as the single input to our function.
* We want the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.
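
The restrictions above could be enforced before any probability is computed. The following is a minimal sketch, not part of the original notebook: the helper name `validate_ticket` is hypothetical, and it assumes a valid ticket is six distinct plain Python integers from 1 to 49.

```python
def validate_ticket(input_list):
    """Return True if input_list holds six distinct integers from 1 to 49.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    if len(input_list) != 6:
        return False
    # A set drops duplicates, so a smaller set means repeated numbers.
    if len(set(input_list)) != 6:
        return False
    # bool is a subclass of int, so exclude it explicitly.
    return all(
        isinstance(n, int) and not isinstance(n, bool) and 1 <= n <= 49
        for n in input_list
    )
```

A function such as `one_ticket_probability` could call this check first and return a friendly message when it fails.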


In [2]:
def one_ticket_probability(input_list):
    # A ticket must contain exactly six numbers
    if len(input_list) != 6:
        return "Please enter a list of six numbers"

    # Every ticket is one of C(49, 6) equally likely combinations
    total_possible_outcomes = int(combinations(49, 6))
    successful_outcomes = 1
    winning_ticket_probability = successful_outcomes / total_possible_outcomes
    return "The probability of you winning the 6/49 lottery is {:.6%}, or 1 in {} possible combinations.".format(winning_ticket_probability, total_possible_outcomes)

one_ticket_probability([1,2,3,4,5,6])



'The probability of you winning the 6/49 lottery is 0.000007%, or 1 in 13983816 possible combinations.'

## one_ticket_probability function explained

In the above step, we created a function that uses our combinations function to determine the total number of possible combinations in the 6/49 lottery.

Because exactly one of those combinations wins, the probability of a single ticket winning is 1 divided by that total. The function reports this to users both as a percentage and as "1 in N" possible combinations.
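
As a cross-check on the figure above, the same count can be obtained from the standard library. This sketch assumes Python 3.8 or later, where `math.comb` is available; the variable names are illustrative and not part of the original notebook.

```python
from fractions import Fraction
from math import comb

# Exact integer count of 6-number combinations drawn from 49 numbers.
total_outcomes = comb(49, 6)

# Exact probability of a single ticket winning, kept as a fraction.
probability = Fraction(1, total_outcomes)

print(total_outcomes)
print("{:.6%}".format(float(probability)))
```

This prints 13983816 and 0.000007%, matching the values returned by `one_ticket_probability`.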

## Data analysis

Now we're going to explore the historical data from the 6/49 lottery so we can tell users whether their numbers have ever won.

In [3]:
import pandas as pd

data = pd.read_csv("649.csv")

data.shape

(3665, 11)

In [4]:
data.head(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34


In [5]:
data.tail(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


In [6]:
data.columns

Index(['PRODUCT', 'DRAW NUMBER', 'SEQUENCE NUMBER', 'DRAW DATE',
       'NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 'NUMBER DRAWN 4',
       'NUMBER DRAWN 5', 'NUMBER DRAWN 6', 'BONUS NUMBER'],
      dtype='object')

In [7]:
columns_to_rename = {
    "NUMBER DRAWN 1": "number_drawn_1",
    "NUMBER DRAWN 2": "number_drawn_2",
    "NUMBER DRAWN 3": "number_drawn_3",
    "NUMBER DRAWN 4": "number_drawn_4",
    "NUMBER DRAWN 5": "number_drawn_5",
    "NUMBER DRAWN 6": "number_drawn_6",
    "BONUS NUMBER": "bonus_number",
    "DRAW DATE": "draw_date",
    "SEQUENCE NUMBER": "sequence_number",
    "DRAW NUMBER": "draw_number",
}
data = data.rename(columns=columns_to_rename)
                     
data.head()                     
                     

Unnamed: 0,PRODUCT,draw_number,sequence_number,draw_date,number_drawn_1,number_drawn_2,number_drawn_3,number_drawn_4,number_drawn_5,number_drawn_6,bonus_number
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45


In [8]:
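def extract_numbers(row):
    # Positions 4 through 9 hold the six drawn numbers (bonus number excluded)
    numbers = row.iloc[4:10]
    # A set ignores order, so combinations can be compared directly
    return set(numbers.values)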

In [9]:
lottery_numbers = data.apply(extract_numbers, axis=1)

In [10]:
lottery_numbers[0]

{3, 11, 12, 14, 41, 43}

In [11]:
def check_historical_occurence(user_list, historical_winning_numbers):
    user_list = set(user_list)

    # Validate the input before comparing it to the historical draws
    if len(user_list) != 6:
        return "Please enter your guess as a list of six numbers"

    # True for every past draw whose six numbers equal the user's numbers
    historical_lottery_check = user_list == historical_winning_numbers
    number_of_wins = historical_lottery_check.sum()

    if number_of_wins == 0:
        return "This combination has never won the 6/49 lottery. However, your chances of winning are the same as with any other combination: 1 in 13983816."

    else:
        return "This combination has won the 6/49 lottery {} time(s). However, your chances of winning are the same as with any other combination: 1 in 13983816.".format(number_of_wins)


In [12]:
check_historical_occurence([1,2,3,4,5,6],lottery_numbers)

'This combination has never won the 6/49 lottery. However, your chances of winning are the same as with any other combination: 1 in 13983816.'

In [13]:
check_historical_occurence([3, 11, 12, 14, 41, 43],lottery_numbers)

'This combination has won the 6/49 lottery 1 time(s). However, your chances of winning are the same as with any other combination: 1 in 13983816.'

In [14]:
check_historical_occurence([3, 11, 12, 14, 41],lottery_numbers)

'Please enter your guess as a list of six numbers'

## check_historical_occurence function explained

In the above cells, we:

1. Extracted all the historical winning lottery combinations from our dataset.
1. Compared the user-submitted numbers to every historical winning combination.
1. Returned a message telling users whether their combination has won before, while making clear that the probability of winning remains extremely small: 1 in 13,983,816.
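
One alternative way to do the lookup described above is to count each historical combination once and then query the counts. The sketch below is not the notebook's method: it uses only the first three draws shown earlier rather than the full dataset, and `sample_draws`, `win_counts`, and `count_historical_wins` are hypothetical names.

```python
from collections import Counter

# First three draws from the table shown earlier (a sample, not the full dataset).
sample_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

# frozenset is hashable and ignores order, so it can serve as a Counter key.
win_counts = Counter(frozenset(draw) for draw in sample_draws)


def count_historical_wins(user_list, counts):
    """Return how many times the user's combination appears in counts."""
    # A Counter returns 0 for keys it has never seen.
    return counts[frozenset(user_list)]


print(count_historical_wins([43, 41, 14, 12, 11, 3], win_counts))
print(count_historical_wins([1, 2, 3, 4, 5, 6], win_counts))
```

This prints 1 and then 0. Building the counts once makes each later lookup a single dictionary access, which may matter if many tickets are checked.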

In [15]:
def multi_ticket_probability(number_of_submissions):
    # Total number of possible 6/49 combinations
    total_possible_outcomes = int(combinations(49, 6))
    # Each different ticket covers exactly one combination
    possible_successful_outcomes = int(number_of_submissions)
    winning_percentage = possible_successful_outcomes / total_possible_outcomes

    return "The probability of your submissions winning the lottery is {} out of {} or {:.6%}.".format(possible_successful_outcomes, total_possible_outcomes, winning_percentage)



In [16]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for number_of_guesses in test_inputs:
    print(multi_ticket_probability(number_of_guesses)+"\n")

The probability of your submissions winning the lottery is 1 out of 13983816 or 0.000007%.

The probability of your submissions winning the lottery is 10 out of 13983816 or 0.000072%.

The probability of your submissions winning the lottery is 100 out of 13983816 or 0.000715%.

The probability of your submissions winning the lottery is 10000 out of 13983816 or 0.071511%.

The probability of your submissions winning the lottery is 1000000 out of 13983816 or 7.151124%.

The probability of your submissions winning the lottery is 6991908 out of 13983816 or 50.000000%.

The probability of your submissions winning the lottery is 13983816 out of 13983816 or 100.000000%.



## multi_ticket_probability function explained

Above, we wrote a function that takes the number of different tickets a user plans to play and returns their chances of winning the lottery.

We used a range of test inputs to show how many tickets it takes to meaningfully increase the probability of winning, and how minuscule the chances are with only a few tickets.
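
The same relationship can be read in reverse: given a target probability, how many different tickets would be needed? This is a minimal sketch, assuming Python 3.8+ for `math.comb`; the helper `tickets_needed` is hypothetical and takes the target as a `Fraction` or a decimal string to avoid float rounding.

```python
from fractions import Fraction
from math import ceil, comb

TOTAL_OUTCOMES = comb(49, 6)


def tickets_needed(target_probability):
    """Smallest number of different tickets whose win probability reaches the target.

    Hypothetical helper; target_probability is a Fraction or a string such as "0.01".
    """
    target = Fraction(target_probability)
    if not 0 <= target <= 1:
        raise ValueError("target_probability must be between 0 and 1")
    # n different tickets win with probability n / TOTAL_OUTCOMES.
    return ceil(target * TOTAL_OUTCOMES)


print(tickets_needed("0.5"))
print(tickets_needed("0.01"))
```

This prints 6991908 and 139839: even a 1% chance of winning requires nearly 140,000 different tickets.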

## Probability of winning a smaller prize with 2-5 winning numbers

The lottery also awards a smaller prize if you match two or more numbers. We're going to write a function that:

* Takes six different numbers from 1 to 49.
* Takes a second integer between 2 and 5 that represents the number of winning numbers expected.
* Prints information about the probability of having that number of winning numbers.

We'll need to consider the following probability questions:
* What is the probability of having ***exactly*** five winning numbers?
* What is the probability of having ***at least*** five winning numbers?
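
The two questions above differ only in which match counts are included: "exactly five" counts draws with five matches, while "at least five" also counts the jackpot case of six matches. The following is a minimal sketch, assuming Python 3.8+ for `math.comb`; the function names are hypothetical.

```python
from math import comb

TOTAL_OUTCOMES = comb(49, 6)


def outcomes_exactly(k):
    """Number of draws that match exactly k of the six numbers on a ticket."""
    # Choose k of the 6 ticket numbers, then 6 - k of the 43 other numbers.
    return comb(6, k) * comb(43, 6 - k)


def outcomes_at_least(k):
    """Number of draws that match k or more of the ticket's numbers."""
    return sum(outcomes_exactly(i) for i in range(k, 7))


print(outcomes_exactly(5), outcomes_at_least(5))
```

This prints `258 259`. Dividing either count by `TOTAL_OUTCOMES` gives the corresponding probability.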

In [54]:
def probability_less_6(number_winning_numbers):
    # Total number of possible 6/49 combinations
    total_possible_outcomes = int(combinations(49, 6))

    # Ways to choose which of the six ticket numbers are winners
    number_of_combinations_ticket = combinations(6, number_winning_numbers)
    # Ways to fill the rest of the draw from the 43 numbers not on the ticket
    remaining_outcomes = combinations(43, 6-number_winning_numbers)

    number_of_successful_outcomes = int(number_of_combinations_ticket * remaining_outcomes)

    # Probability of matching exactly number_winning_numbers numbers
    winning_probability = number_of_successful_outcomes / total_possible_outcomes
    return '''You have {} potential successful outcomes, out of {} total possible outcomes, or a probability of {:.6%}\n'''.format(number_of_successful_outcomes, total_possible_outcomes, winning_probability)



In [55]:
possible_guesses = [2,3,4,5]

for number in possible_guesses:
    print(probability_less_6(number))

You have 1851150 potential successful outcomes, out of 13983816 total possible outcomes, or a probability of 13.237803%

You have 246820 potential successful outcomes, out of 13983816 total possible outcomes, or a probability of 1.765040%

You have 13545 potential successful outcomes, out of 13983816 total possible outcomes, or a probability of 0.096862%

You have 258 potential successful outcomes, out of 13983816 total possible outcomes, or a probability of 0.001845%



## probability_less_6 function explained

In the above step, we calculated the probability of matching exactly two, three, four, or five winning numbers, since the lottery gives smaller prizes for those outcomes.

For each number of matches k, we determined:

1. The total number of possible outcomes of the lottery
1. The number of ways to pick k winning numbers from the six numbers on the ticket
1. The number of ways to pick the remaining 6 - k drawn numbers from the 43 numbers not on the ticket
1. The winning probability: the product of the previous two counts divided by the total number of possible outcomes
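
One way to sanity-check the counting above is to confirm that the outcomes for every possible number of matches, from zero to six, add up to the total number of draws. This is a sketch under the assumption of Python 3.8+ (`math.comb`); the variable names are illustrative.

```python
from math import comb

total_outcomes = comb(49, 6)

# Number of draws matching exactly k ticket numbers, for k = 0..6.
counts = {k: comb(6, k) * comb(43, 6 - k) for k in range(7)}

# Every draw matches some number of ticket numbers between 0 and 6,
# so the counts must add up to the total number of possible draws.
assert sum(counts.values()) == total_outcomes

for k, count in counts.items():
    print("{} matches: {:.6%}".format(k, count / total_outcomes))
```

If the assertion passes, the per-match probabilities sum to 100%, which is what we expect when each draw falls into exactly one of the seven cases.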
    
    