# How many chances do you have to win a lottery?

In this guided project we are going to estimate the chances of winning a 6/49 lottery.

The three main questions we are going to answer are:
1. What is the probability of winning the big prize with a single ticket?
2. What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
3. What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

As a data source we are going to use historical data from the national 6/49 lottery game in Canada. It covers 3665 drawings from 1982 to 2018. The data set can be found [here](https://www.kaggle.com/datascienceai/lottery-dataset).

## Basic functions

We are going to create 2 basic functions:
1. **Factorial** - returns the number of possible orderings (**permutations**, where the order of the numbers matters) of **n** numbers when sampling without replacement
2. **Combinations** - returns the number of possible combinations (where the order of the numbers doesn't matter) of **k** numbers taken from a set of **n** numbers

In [1]:
def factorial(n):
    result = 1
    for i in range(n,0,-1):
        result *= i
    return result

def combinations(n,k):
    return factorial(n) // (factorial(k) * factorial(n-k))  # integer division keeps the count exact
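
As a quick sanity check, the two helpers can be compared against Python's standard library. This is only a sketch: the helpers are repeated here so the block runs on its own, it uses integer division (an assumption of this sketch, so that the results stay exact integers), and `math.comb` requires Python 3.8 or newer.

```python
import math

def factorial(n):
    """Product of the integers from 1 to n (0! is 1)."""
    result = 1
    for i in range(n, 0, -1):
        result *= i
    return result

def combinations(n, k):
    """Number of ways to choose k items from n when order doesn't matter."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Cross-check against the standard library implementations.
assert factorial(6) == math.factorial(6) == 720
assert combinations(49, 6) == math.comb(49, 6) == 13983816
print(combinations(49, 6))
```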

## 1. The probability of winning the big prize
It doesn't matter which combination we choose: the probability of winning is calculated by dividing the number of successful outcomes (always **1**) by the number of all possible outcomes.

In [2]:
num_outcomes = combinations(49,6)
p_win = 1 / num_outcomes
p_win_percentage = p_win * 100
print('''The probability of winning in "6 from 49" is {:.6f} %
In other words, you will win in 1 of {} cases'''.format(p_win_percentage,
                                                        int(num_outcomes)
                                                       )
     )

The probability of winning in "6 from 49" is 0.000007 %
In other words, you will win in 1 of 13983816 cases


## Additional task - checking how many times a given combination has won in the past

### Exploring the data

In [3]:
import pandas as pd
hist_results = pd.read_csv('649.csv')
print(hist_results.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.0+ KB
None


In [4]:
hist_results.head(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34


In [5]:
hist_results.tail(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


### Extracting winning sets

In [6]:
def extract_numbers(row):
    row = row[4:10]
    row = set(row.values)
    return row

winning_sets = hist_results.apply(extract_numbers, axis=1)
winning_sets[:3]

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
dtype: object

### Checking historical occurrence

In [7]:
def check_historical_occurence(users_numbers):
    users_numbers = set(users_numbers)
    matches = 0
    for result in winning_sets:
        if users_numbers == result:
            matches += 1
    print('The combination "{}" occurred {} time(s) in the past'.format(users_numbers, matches))

check_historical_occurence([11, 41, 14, 12, 43, 3])

The combination "{3, 41, 11, 12, 43, 14}" occurred 1 time(s) in the past
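
The same lookup can also be done without looping over every draw on each call. The sketch below counts each winning set once and then answers queries from that count. It is an illustration rather than part of the original notebook: `sample_draws` holds only the first three draws shown in the `head(3)` output, and `draw_counts` and `count_historical_occurrence` are hypothetical names.

```python
from collections import Counter

# Sample data: the first three draws shown in the head(3) output.
sample_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

# frozenset is hashable (unlike set), so it can serve as a Counter key.
draw_counts = Counter(frozenset(draw) for draw in sample_draws)

def count_historical_occurrence(users_numbers, counts=draw_counts):
    """Return how many times the combination appears in `counts`."""
    # Counter returns 0 for keys it has never seen.
    return counts[frozenset(users_numbers)]

print(count_historical_occurrence([11, 41, 14, 12, 43, 3]))  # 1
print(count_historical_occurrence([1, 2, 3, 4, 5, 6]))       # 0
```

Returning the count instead of printing it makes the result easier to reuse in later cells.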


## 2. Multi-ticket winning probability
Our second goal is to calculate the probability of winning when buying more than one ticket. As long as all tickets carry different combinations, it is calculated by simply multiplying the probability of winning with one ticket by the number of tickets bought.

In [8]:
def multi_ticket_probability(num_tickets):
    num_outcomes = combinations(49,6)
    winning_probability = num_tickets / num_outcomes
    winning_probability_percentage = winning_probability * 100
    print('The probability of winning with {} tickets is {:.6f} %'.format(num_tickets, 
                                                                          winning_probability_percentage
                                                                         ))
multi_ticket_probability(1)
multi_ticket_probability(2)     
multi_ticket_probability(10)     
multi_ticket_probability(100)  
multi_ticket_probability(1000)     
multi_ticket_probability(139839)     

The probability of winning with 1 tickets is 0.000007 %
The probability of winning with 2 tickets is 0.000014 %
The probability of winning with 10 tickets is 0.000072 %
The probability of winning with 100 tickets is 0.000715 %
The probability of winning with 1000 tickets is 0.007151 %
The probability of winning with 139839 tickets is 1.000006 %
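
The second question mentions 40 different tickets, which is not among the calls above. A minimal variant that returns the probability instead of printing it could look like the sketch below. The function name `multi_ticket_probability_value` and the range check are assumptions added for illustration, and `math.comb` requires Python 3.8 or newer.

```python
import math

def multi_ticket_probability_value(num_tickets, n=49, k=6):
    """Probability that one of `num_tickets` distinct tickets wins the big prize."""
    num_outcomes = math.comb(n, k)
    # More distinct tickets than possible combinations makes no sense.
    if not 0 <= num_tickets <= num_outcomes:
        raise ValueError("num_tickets must be between 0 and the number of outcomes")
    return num_tickets / num_outcomes

p_40 = multi_ticket_probability_value(40)
print('The probability of winning with 40 tickets is {:.6f} %'.format(p_40 * 100))
```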


# 3. Probability of having at least five (or four, or three, or two) winning numbers on a single ticket

## 3.1 Probability of having *exactly* five (or four, or three, or two) winning numbers on a single ticket

In [9]:
def probability_exactly_less_6(num):
    num_comb_int = combinations(6,num)
    num_comb_out = combinations(43, 6-num)
    tot_successful_combinations = num_comb_int * num_comb_out
    
    tot_combinations = combinations(49,6)
    
    p_win = tot_successful_combinations / tot_combinations
    p_win_percentage = p_win * 100
    
    num_comb_simplified = round(tot_combinations / tot_successful_combinations)
    print('''The probability of matching exactly {} out of 6 is {:.4f} %
Or 1 chance in {} \n'''.format(num,
                               p_win_percentage,
                               num_comb_simplified
                              )
        )

for i in range(2, 6):
    probability_exactly_less_6(i)

The probability of matching exactly 2 out of 6 is 13.2378 %
Or 1 chance in 8 

The probability of matching exactly 3 out of 6 is 1.7650 %
Or 1 chance in 57 

The probability of matching exactly 4 out of 6 is 0.0969 %
Or 1 chance in 1032 

The probability of matching exactly 5 out of 6 is 0.0018 %
Or 1 chance in 54201 



## 3.2 Probability of having *at least* five (or four, or three, or two) winning numbers on a single ticket

*At least* 4 numbers = *exactly* 4 numbers or *exactly* 5 numbers or *exactly* 6 numbers
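
These events are mutually exclusive (a ticket cannot match exactly 4 and exactly 5 numbers at the same time), so their probabilities can simply be added. Writing $X$ for the number of matching numbers on a ticket (a symbol introduced here for illustration), the sum can be sketched as:

$$
P(X \ge k) = \sum_{j=k}^{6} \frac{\binom{6}{j}\binom{43}{6-j}}{\binom{49}{6}}
$$

For $k = 4$ this gives

$$
P(X \ge 4) = \frac{13545 + 258 + 1}{13983816} = \frac{13804}{13983816} \approx 0.0987\,\%
$$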

In [10]:
def probability_at_least_less_6(num):
    
    tot_successful_combinations = 0
    for n in range(num,7):
        num_comb_int = combinations(6,n)
        num_comb_out = combinations(43, 6-n)
        tot_successful_combinations += num_comb_int * num_comb_out

    tot_combinations = combinations(49,6)
    
    p_win = tot_successful_combinations / tot_combinations
    p_win_percentage = p_win * 100
    num_comb_simplified = round(tot_combinations / tot_successful_combinations)
    print('''The probability of matching at least {} out of 6 is {:.4f} %
Or 1 chance in {} \n'''.format(num,
                               p_win_percentage,
                               num_comb_simplified
                              )
         )

for i in range(2, 6):
    probability_at_least_less_6(i)

The probability of matching at least 2 out of 6 is 15.1016 %
Or 1 chance in 7 

The probability of matching at least 3 out of 6 is 1.8638 %
Or 1 chance in 54 

The probability of matching at least 4 out of 6 is 0.0987 %
Or 1 chance in 1013 

The probability of matching at least 5 out of 6 is 0.0019 %
Or 1 chance in 53992 
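
One way to double-check these results is to confirm that the probabilities of matching exactly 0, 1, ..., 6 numbers add up to 1, and that "at least k" equals the complement of "fewer than k". The sketch below does this with exact fractions. The names `p_exactly` and `p_at_least` are hypothetical and not part of the notebook above, and `math.comb` requires Python 3.8 or newer.

```python
import math
from fractions import Fraction

TOTAL = math.comb(49, 6)  # all possible 6-number tickets

def p_exactly(k):
    """Exact probability of matching exactly k of the 6 drawn numbers."""
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), TOTAL)

def p_at_least(k):
    """Exact probability of matching k or more numbers."""
    return sum(p_exactly(j) for j in range(k, 7))

# The seven "exactly k" events cover every possible ticket exactly once.
assert sum(p_exactly(k) for k in range(7)) == 1
# "At least k" is the complement of "fewer than k".
assert p_at_least(4) == 1 - sum(p_exactly(j) for j in range(4))

print('{:.4f} %'.format(float(p_at_least(4)) * 100))
```

Using `Fraction` avoids floating-point rounding, so the equality checks are exact.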

