# Probability of Winning the Lottery

In this notebook I will explore the probability of winning the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49).

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. For example, a player holding a ticket with the numbers {13, 22, 24, 27, 42, 44} wins the big prize only if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, the player doesn't win the big prize.

The [data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.

## Core Functions

Below, I'll write two functions that I'll use frequently:
- `factorial()` — a function that calculates factorials
- `combinations()` — a function that calculates combinations

In [1]:
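def factorial(n):
    # Multiply n * (n - 1) * ... * 1; returns 1 when n is 0
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def combinations(n, k):
    # Number of ways to choose k items from n when order doesn't matter
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n - k)
    # The quotient is always a whole number, so integer division is exact
    return numerator // denominator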
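
As a quick sanity check, the combinations formula can be compared against Python's standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available; the helper name `combinations_check` is hypothetical and not part of the notebook.

```python
import math

def combinations_check(n, k):
    # Same formula as combinations(): n! / (k! * (n - k)!),
    # written with math.factorial and exact integer division
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# math.comb computes the same binomial coefficient directly
total_outcomes = combinations_check(49, 6)
print(total_outcomes == math.comb(49, 6))
print('{:,}'.format(total_outcomes))
```

If the two values agree, the hand-written formula is behaving as expected for the 6/49 case.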

## Win the Big Prize with One Ticket

Below I'll build a function that calculates the probability of winning the big prize for any given ticket. For each drawing, six numbers are drawn from a set of 49, and a player wins the big prize if the six numbers on their ticket match all six numbers drawn.

The `one_ticket_probability()` function takes in a list of six unique numbers and prints the probability of winning in a way that's easy to understand. Since every combination of six numbers is equally likely, the probability is 1 divided by the number of possible combinations, regardless of which numbers are on the ticket.

In [2]:
def one_ticket_probability(list_of_6):
    # Every six-number ticket is one of C(49, 6) equally likely outcomes
    num_combinations = combinations(49, 6)
    probability_correct = 1 / num_combinations
    prob_corr_percent = probability_correct * 100
    print('You have a {:.7f}% chance of winning the big prize, or a 1 in {:,.0f} chance, with your selection of {}'.format(prob_corr_percent, num_combinations, list_of_6))

In [3]:
test_input_1 = [30, 29, 10, 8, 3, 2]
one_ticket_probability(test_input_1)

You have a 0.0000072% chance of winning the big prize, or a 1 in 13,983,816 chance, with your selection of [30, 29, 10, 8, 3, 2]


In [4]:
test_input_2 = [1, 2, 3, 4, 5, 6]
one_ticket_probability(test_input_2)

You have a 0.0000072% chance of winning the big prize, or a 1 in 13,983,816 chance, with your selection of [1, 2, 3, 4, 5, 6]
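
The function above trusts its input. A minimal sketch of a validation step is shown below, assuming a ticket is valid only when it holds six distinct integers between 1 and 49; the helper name `is_valid_ticket` is hypothetical and not part of the original project.

```python
def is_valid_ticket(list_of_6):
    # A valid 6/49 ticket has exactly six distinct integers in the range 1..49
    if len(list_of_6) != 6 or len(set(list_of_6)) != 6:
        return False
    return all(isinstance(n, int) and 1 <= n <= 49 for n in list_of_6)

print(is_valid_ticket([30, 29, 10, 8, 3, 2]))  # True
print(is_valid_ticket([1, 2, 3, 4, 5, 5]))     # False: repeated number
print(is_valid_ticket([1, 2, 3, 4, 5, 50]))    # False: 50 is out of range
```

A check like this could run before the probability is printed, so that an impossible ticket is not reported as having a chance of winning.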


## Has the Ticket Won Before?

Needless to say, the chances of winning the big prize are slim. Now I'll build a function that checks whether a ticket is a past winner.

The [data set](https://www.kaggle.com/datascienceai/lottery-dataset) contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018.

### Explore Data

In [5]:
import pandas as pd
history = pd.read_csv('649.csv')

In [6]:
print(history.shape)
history.head(3)

(3665, 11)


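| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |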


In [7]:
history.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


### Build Function

First I'll extract all the winning numbers from the lottery data set. The `extract_numbers()` function will go over each row of the dataframe and extract the six winning numbers as a Python set.

In [8]:
def extract_numbers(row):
    # Positions 4 through 9 hold the columns NUMBER DRAWN 1 to NUMBER DRAWN 6
    row = row.iloc[4:10]
    numbers = set(row.values)
    return numbers

past_winners = history.apply(extract_numbers, axis=1)
past_winners.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object
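
Selecting the six columns by name is an alternative to slicing by position, and it keeps working if the column order changes. The sketch below is only an illustration: it rebuilds the first three drawings shown in the `history.head(3)` output rather than reading `649.csv`, and the names `sample`, `number_columns`, and `sample_winners` are hypothetical.

```python
import pandas as pd

# The first three drawings from the data set, typed in by hand for illustration
sample = pd.DataFrame({
    'DRAW DATE': ['6/12/1982', '6/19/1982', '6/26/1982'],
    'NUMBER DRAWN 1': [3, 8, 1],
    'NUMBER DRAWN 2': [11, 33, 6],
    'NUMBER DRAWN 3': [12, 36, 23],
    'NUMBER DRAWN 4': [14, 37, 24],
    'NUMBER DRAWN 5': [41, 39, 27],
    'NUMBER DRAWN 6': [43, 41, 39],
    'BONUS NUMBER': [13, 9, 34],
})

number_columns = ['NUMBER DRAWN {}'.format(i) for i in range(1, 7)]

# Build one set per row from the six named columns only
sample_winners = sample[number_columns].apply(lambda row: set(row.values), axis=1)
print(sample_winners)
```

The bonus number is left out on purpose, since it plays no part in matching the six main numbers.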

Now I'll write the `check_historical_occurrence()` function, which takes in the user's numbers and the historical winning numbers and prints how many times the combination has occurred, along with the probability of winning in the next drawing.

In [9]:
def check_historical_occurrence(list_of_6, past_winners):
    """
    list_of_6: a Python list of six numbers
    past_winners: a pandas Series of sets, one per past drawing
    """
    set_of_6 = set(list_of_6)
    # Element-wise comparison: True where a past drawing matches the ticket
    occurrence = past_winners == set_of_6
    num_occurrences = occurrence.sum()
    if num_occurrences == 0:
        print('''Your combination of {} has never won.
This doesn't mean your combination is more likely to win now.
Your chance of winning the big prize in the next drawing with this combination is 0.0000072%, or 1 in 13,983,816.
        '''.format(list_of_6))
    else:
        print('''Your combination of {} has won {} time(s).
This doesn't mean your combination is more likely to win now.
Your chance of winning the big prize in the next drawing with this combination is 0.0000072%, or 1 in 13,983,816.
        '''.format(list_of_6, num_occurrences))

In [10]:
test_input_3 = [33, 36, 37, 39, 8, 41]
check_historical_occurrence(test_input_3, past_winners)

Your combination of [33, 36, 37, 39, 8, 41] has won 1 time(s).
This doesn't mean your combination is more likely to win now.
Your chance of winning the big prize in the next drawing with this combination is 0.0000072%, or 1 in 13,983,816.



In [11]:
test_input_4 = [13, 26, 17, 9, 28, 49]
check_historical_occurrence(test_input_4, past_winners)

Your combination of [13, 26, 17, 9, 28, 49] has never won.
This doesn't mean your combination is more likely to win now.
Your chance of winning the big prize in the next drawing with this combination is 0.0000072%, or 1 in 13,983,816.
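
For use in other code, it can help to return the count instead of printing it. The following is a minimal sketch under that assumption; `count_occurrences` is a hypothetical helper, and the two sample drawings are copied from the first rows of the `past_winners` output.

```python
import pandas as pd

def count_occurrences(list_of_6, past_winners):
    # Compare the ticket, as a set, against every past drawing
    set_of_6 = set(list_of_6)
    matches = past_winners.apply(lambda drawn: drawn == set_of_6)
    return int(matches.sum())

# Two past drawings taken from the notebook output
sample_winners = pd.Series([{3, 41, 11, 12, 43, 14}, {33, 36, 37, 39, 8, 41}])

print(count_occurrences([33, 36, 37, 39, 8, 41], sample_winners))  # 1
print(count_occurrences([13, 26, 17, 9, 28, 49], sample_winners))  # 0
```

Because sets ignore order, the ticket matches a past drawing no matter how its numbers are arranged in the list.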
        


## Win with Multiple Tickets

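Now I'll explore the probability of winning with multiple different tickets. For instance, what is the chance of winning the big prize if the player plays 15 different tickets?

The `multi_ticket_probability()` function below takes in the number of tickets and prints the corresponding probability. Because the tickets are all different, each one covers a distinct outcome, so the probability of winning is the number of tickets divided by the total number of possible combinations.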

In [12]:
def multi_ticket_probability(num_tickets):
    '''
    num_tickets: number of different tickets played, as an int
    '''
    num_combinations = combinations(49, 6)
    # Different tickets cover distinct outcomes, so their probabilities add up
    probability = num_tickets / num_combinations
    prob_perc = probability * 100
    combos = num_combinations / num_tickets
    print(''' If you play {} ticket(s) your chance of winning the big prize is {:.7f}%, or 1 in {:,.0f} 
    '''.format(num_tickets, prob_perc, combos))

Now I'll run a few tests on the function.

In [13]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') # output delimiter


 If you play 1 ticket(s) your chance of winning the big prize is 0.0000072%, or 1 in 13,983,816 
    
------------------------
 If you play 10 ticket(s) your chance of winning the big prize is 0.0000715%, or 1 in 1,398,382 
    
------------------------
 If you play 100 ticket(s) your chance of winning the big prize is 0.0007151%, or 1 in 139,838 
    
------------------------
 If you play 10000 ticket(s) your chance of winning the big prize is 0.0715112%, or 1 in 1,398 
    
------------------------
 If you play 1000000 ticket(s) your chance of winning the big prize is 7.1511238%, or 1 in 14 
    
------------------------
 If you play 6991908 ticket(s) your chance of winning the big prize is 50.0000000%, or 1 in 2 
    
------------------------
 If you play 13983816 ticket(s) your chance of winning the big prize is 100.0000000%, or 1 in 1 
    
------------------------
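
The same relationship can be turned around to ask how many different tickets are needed to reach a given chance of winning. The sketch below assumes Python 3.8 or later for `math.comb`; `tickets_needed` and `TOTAL_OUTCOMES` are hypothetical names, and because the calculation uses floating-point arithmetic the result could be off by one for some targets.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # number of possible six-number tickets

def tickets_needed(target_probability):
    # Smallest number of different tickets whose combined chance
    # of winning the big prize reaches the target probability
    return math.ceil(target_probability * TOTAL_OUTCOMES)

print(tickets_needed(0.5))  # 6991908
print(tickets_needed(1.0))  # 13983816
```

These two results line up with the last two test inputs above, which gave 50% and 100%.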


## Winning Smaller Prizes
In most 6/49 lotteries, there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. 

The function below, `probability_less_6()`, calculates the probability that a player's ticket matches exactly the given number of winning numbers. For example, for an input of five, the function reports the probability of matching exactly five winning numbers (no more and no less), not the probability of matching at least five. To count the successful outcomes, the function multiplies the number of ways to choose the matching numbers from the six on the ticket by the number of ways to choose the remaining numbers from the 43 that are not on the ticket.

In [14]:
def probability_less_6(num_of_winners):
    """
    num_of_winners: an integer between 2 and 5 representing the number of winning numbers expected.
    """
    n_combinations_ticket = combinations(6, num_of_winners)
    n_combinations_remaining = combinations(43, 6 - num_of_winners)
    successful_outcomes = n_combinations_ticket * n_combinations_remaining
    
    n_combinations_total = combinations(49, 6)    
    probability = successful_outcomes / n_combinations_total
    
    probability_percentage = probability * 100    
    combos = round(n_combinations_total/successful_outcomes)
    
    print('''Your chances of having {} winning numbers with this ticket are {:.6f}% or 1 in {:,}.
    '''.format(num_of_winners, probability_percentage, int(combos)))

Now, I'll test the function on all the possible inputs.

In [15]:
for test_input in [2, 3, 4, 5]:
    probability_less_6(test_input)
    print('--------------------------') # output delimiter

Your chances of having 2 winning numbers with this ticket are 13.237803% or 1 in 8.
    
--------------------------
Your chances of having 3 winning numbers with this ticket are 1.765040% or 1 in 57.
    
--------------------------
Your chances of having 4 winning numbers with this ticket are 0.096862% or 1 in 1,032.
    
--------------------------
Your chances of having 5 winning numbers with this ticket are 0.001845% or 1 in 54,201.
    
--------------------------
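
One way to check these figures is to confirm that the exact-match probabilities for zero through six matches add up to 1, and to sum them when an "at least" probability is wanted. This is a sketch assuming Python 3.8 or later for `math.comb`; the function names `matches_probability` and `at_least_probability` are hypothetical.

```python
import math

def matches_probability(k):
    # Ways to pick k of the 6 ticket numbers times ways to pick
    # the other 6 - k numbers from the 43 not on the ticket
    return math.comb(6, k) * math.comb(43, 6 - k) / math.comb(49, 6)

def at_least_probability(k):
    # Sum the exact-match probabilities for k, k + 1, ..., 6
    return sum(matches_probability(j) for j in range(k, 7))

# The seven exact-match cases (0 through 6) cover every possible outcome
print(round(at_least_probability(0), 10))
print('{:.6f}%'.format(at_least_probability(5) * 100))
```

The "at least five" probability is only slightly larger than the "exactly five" one, because matching all six numbers is so rare.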


## Conclusions
In this notebook I explored the probability of winning the 6/49 lottery. To do this, I created the four main functions described below:
- `one_ticket_probability()` — calculates the probability of winning the big prize with a single ticket
- `check_historical_occurrence()` — checks whether a certain combination has occurred in the Canada lottery data set
- `multi_ticket_probability()` — calculates the probability for any number of tickets between 1 and 13,983,816
- `probability_less_6()` — calculates the probability of having two, three, four or five winning numbers exactly

The notebook is based on a guided project from Dataquest, an online Data Science bootcamp. The learning goal of the project was to test understanding of theoretical and empirical probabilities, probability rules, and combinations and permutations.