## Project Scope: Probability of winning the lottery

In this project, we are going to set up functions that calculate the probability of winning the lottery. The dataset is from Kaggle and covers the 6/49 lottery in Canada.

The following functions will be created:
* factorial() - calculate the factorial of a number
* combination() - calculate the number of ways to choose k items from a set of n items

These functions will be used to address the following questions:
* What is the probability of winning the lottery with a single ticket?
* What is the probability of winning the lottery if we buy 100 tickets?
* What is the probability of having at least five (or four, or three) winning numbers on a single ticket? 

### Probability Explained in Math

Each lottery draw is a discrete event with a finite number of equally likely outcomes, so we can compute probabilities by counting outcomes. The two counting tools we need are factorials and combinations.

1. Factorial
* What it is: The factorial of a non-negative integer $n$, denoted by $n!$, is the product of all positive integers less than or equal to $n$. $$n! = n \times (n-1) \times \dots \times 2 \times 1$$
* Why it's used (Permutations): It calculates the total number of unique ways to order $n$ distinct items. This concept is called a permutation. For example, the number of ways to arrange 4 books on a shelf is $4! = 24$.

2. Combinations  ($\binom{n}{k}$ or $C(n, k)$)
* What it is: Combinations calculate the number of ways to choose a subset of $k$ items from a larger set of $n$ distinct items, where the order of selection is irrelevant. $$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
* Why it's used (Counting Subsets): It addresses questions like: "If I have 10 people, how many different groups of 3 can I form?" The group {A, B, C} is the same as {C, B, A}, which is why we divide out the permutations (using the factorials in the denominator).

3. Calculating Probability
$$P(E) = \frac{\text{Number of Favorable Outcomes}}{\text{Total Number of Possible Outcomes}}$$
* Total Possible Outcomes: Your combination function is used to calculate the size of the sample space—the denominator in the probability equation. For example, the total number of ways to choose 5 cards from a 52-card deck is $\binom{52}{5}$.
* Favorable Outcomes: Your combination function (and sometimes factorials, depending on the problem) is used to calculate the size of the event space—the numerator. For example, the number of ways to choose 5 red cards from the 26 available red cards is $\binom{26}{5}$.
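
As a quick illustration of the probability formula, the card example above can be sketched with Python's standard library. This is only a sketch: it assumes Python 3.8+ (for `math.comb`), and the variable names are made up for the example rather than taken from the project.

```python
from fractions import Fraction
import math

# Denominator: every possible 5-card hand from a 52-card deck.
total_outcomes = math.comb(52, 5)      # 2,598,960

# Numerator: hands drawn only from the 26 red cards.
favorable_outcomes = math.comb(26, 5)  # 65,780

# P(E) = favorable / total, kept as an exact fraction until display time.
probability = Fraction(favorable_outcomes, total_outcomes)
print(f"P(all five cards are red) = {float(probability):.4f}")
```

Keeping the ratio as a `Fraction` avoids rounding until the value is printed.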



### Core Functions

In [4]:
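def factorial(n):
    # Base case: 0! is defined as 1.
    if n == 0:
        return 1
    # Recursive case: n! = n * (n-1)!
    return n * factorial(n - 1)

def combination(n, k):
    # C(n, k) = n! / (k! * (n-k)!); integer division keeps the count an exact int.
    return factorial(n) // (factorial(k) * factorial(n - k))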
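
One way to gain confidence in these two functions is to compare them with the standard library. The sketch below repeats compact copies of the definitions so it runs on its own, uses integer division so the counts stay exact, and assumes Python 3.8+ for `math.comb`. The test pairs are taken from the examples in the math section.

```python
import math

def factorial(n):
    # Compact copy of the recursive definition, repeated so this block is self-contained.
    return 1 if n == 0 else n * factorial(n - 1)

def combination(n, k):
    # Integer division: the result of C(n, k) is always a whole number.
    return factorial(n) // (factorial(k) * factorial(n - k))

# Cross-check against the standard library on a few (n, k) pairs.
for n, k in [(4, 2), (10, 3), (49, 6), (52, 5)]:
    assert factorial(n) == math.factorial(n)
    assert combination(n, k) == math.comb(n, k)

print(factorial(4))        # 24, the bookshelf example
print(combination(10, 3))  # 120, groups of 3 from 10 people
```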

### Take a peek at the data

In [5]:
import pandas as pd
df = pd.read_csv("649.csv")
df.head(1)

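| Unnamed: 0 | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |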


There are 49 numbers in total. Each time a number is drawn it leaves the pool, so 49 numbers are available for the first pick, 48 for the second, and so on. Each draw picks 6 numbers this way, followed by 1 bonus number.

### One-ticket probability

By this logic, there are 49 choices for the first number drawn, 48 for the second, 47 for the third, 46 for the fourth, 45 for the fifth, and 44 for the sixth.

That can be expressed as 49 * 48 * 47 * 46 * 45 * 44. In other words,
$$\mathbf{P(49, 6)} = \frac{49!}{(49-6)!} = \frac{49!}{43!} = \mathbf{10,068,347,520}$$

However, the order of the numbers doesn't matter, so we need to divide by the number of ways to arrange 6 numbers, which is 6!. 
$$\mathbf{C(49, 6)} = \frac{49!}{6!(49-6)!} = \mathbf{13,983,816}$$
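
The step from ordered to unordered outcomes can be checked numerically. This is a minimal sketch that assumes Python 3.8+ (for `math.perm` and `math.comb`); the variable names are illustrative only.

```python
import math

# Ordered draws: 49 * 48 * 47 * 46 * 45 * 44
ordered_outcomes = math.perm(49, 6)

# Each set of six numbers can be arranged in 6! different orders.
orderings_per_set = math.factorial(6)

# Dividing out the orderings leaves the number of distinct tickets.
unordered_outcomes = ordered_outcomes // orderings_per_set

print(ordered_outcomes)    # 10068347520
print(orderings_per_set)   # 720
print(unordered_outcomes)  # 13983816
```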

In [None]:
def one_ticket_probability():
    return (1/combination(49,6))*100 # here 1 means 1 ticket


Regardless of which numbers a player picks, a single ticket always has the same probability of winning.

### Multiple tickets probability

A player who wants better odds of winning the big prize may wonder how much buying more tickets helps. The following function calculates the probability of winning the big prize for a given number of tickets.

In [None]:
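def multi_ticket_probability(number_of_tickets):
    # Assumes every ticket carries a different combination, so each ticket
    # covers one distinct outcome out of the C(49, 6) possible draws.
    # The same formula covers the single-ticket case, so no special branch is needed.
    prob = (number_of_tickets/combination(49,6))*100
    print(f"The probability of winning the big prize with {number_of_tickets} tickets is {prob}%")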

In [19]:
test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for test_input in test_inputs:
    multi_ticket_probability(test_input)
    print('------------------------') # output delimiter

The probability of winning the big prize with 1 tickets is 7.151123842018516e-06%
------------------------
The probability of winning the big prize with 10 tickets is 7.151123842018517e-05%
------------------------
The probability of winning the big prize with 100 tickets is 0.0007151123842018516%
------------------------
The probability of winning the big prize with 10000 tickets is 0.07151123842018516%
------------------------
The probability of winning the big prize with 1000000 tickets is 7.151123842018517%
------------------------
The probability of winning the big prize with 6991908 tickets is 50.0%
------------------------
The probability of winning the big prize with 13983816 tickets is 100.0%
------------------------
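
The function above prints its result, which makes it awkward to reuse or test. A possible variant, sketched below, returns the percentage instead and rejects ticket counts that cannot occur. The name `multi_ticket_percentage` and the constant `TOTAL_OUTCOMES` are hypothetical, and the sketch assumes that every ticket carries a different combination and that Python 3.8+ is available for `math.comb`.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_percentage(number_of_tickets):
    """Return the chance (in percent) of winning with distinct tickets.

    Hypothetical helper: assumes no two tickets share the same combination.
    """
    if not 0 <= number_of_tickets <= TOTAL_OUTCOMES:
        raise ValueError(
            f"number_of_tickets must be between 0 and {TOTAL_OUTCOMES}"
        )
    return number_of_tickets / TOTAL_OUTCOMES * 100

for tickets in [1, 100, 6991908, 13983816]:
    print(f"{tickets} tickets: {multi_ticket_percentage(tickets)}%")
```

Returning the value also makes the upper bound explicit: buying more than 13,983,816 distinct tickets is impossible, so the percentage can never exceed 100.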


### Fewer winning numbers on a ticket

A player will often match some of the drawn numbers without matching all six, which is what the big prize requires. A ticket can match 1, 2, 3, 4, or 5 of the drawn numbers.

Let's dissect one case, where exactly one of the player's numbers was drawn:
* You need to select exactly 1 number from the 6 winning numbers that were drawn. Number of ways to choose 1 matching number from the 6 winning numbers: $$\mathbf{C(6, 1)} = \frac{6!}{1!(6-1)!} = 6$$
* You need to select the remaining 5 numbers from the pool of numbers that did not win. Since there are 49 total numbers and 6 winning numbers, there are $49 - 6 = 43$ non-winning numbers. Number of ways to choose 5 non-matching numbers from the 43 non-winning numbers: $$\mathbf{C(43, 5)} = \frac{43!}{5!(43-5)!} = 962,598$$
* Favorable outcomes: $$\mathbf{C(6, 1) \times C(43, 5)} = 6 \times 962,598 = 5,775,588$$
* Probability of matching exactly one number with a single ticket: $$P(\text{Exactly 1 Match}) = \frac{\mathbf{C(6, 1)} \times \mathbf{C(43, 5)}}{\mathbf{C(49, 6)}} = \frac{5,775,588}{13,983,816} \approx \mathbf{0.4130}$$
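
The arithmetic in this case can be verified with exact fractions. The sketch below uses a hypothetical helper, `exact_match_probability`, and assumes Python 3.8+ for `math.comb`. It also checks that the probabilities for 0 through 6 matches add up to exactly 1, which is a useful sanity test for the counting argument.

```python
from fractions import Fraction
import math

def exact_match_probability(k):
    # Choose k of the 6 winning numbers and the other 6 - k from the 43 losers.
    favorable = math.comb(6, k) * math.comb(43, 6 - k)
    return Fraction(favorable, math.comb(49, 6))

p_one = exact_match_probability(1)
print(p_one.numerator * (math.comb(49, 6) // p_one.denominator))  # favorable outcomes
print(round(float(p_one), 4))                                     # rounded probability

# The seven cases (0 to 6 matches) are exhaustive and mutually exclusive.
total = sum(exact_match_probability(k) for k in range(7))
assert total == 1
```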

In [20]:
def less_numbers_probability(numbers):
    return (combination(6, numbers)*combination(43, 6-numbers)/combination(49,6))*100


In [None]:
for test_input in [2, 3, 4, 5]:
    print(f"Probability of matching exactly {test_input} numbers: {less_numbers_probability(test_input):.4f}%")
    print('--------------------------') # output delimiter

Probability of matching exactly 2 numbers: 13.2378%
--------------------------
Probability of matching exactly 3 numbers: 1.7650%
--------------------------
Probability of matching exactly 4 numbers: 0.0969%
--------------------------
Probability of matching exactly 5 numbers: 0.0018%
--------------------------
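
The results above are for matching exactly a given number of winning numbers, while the question in the project scope asks about at least five (or four, or three). One way to answer it is to add up the exact cases from that count up to six. The sketch below is self-contained, so it uses `math.comb` (Python 3.8+) rather than the notebook's `combination()`; both function names are hypothetical.

```python
import math

def exactly_k_percentage(k):
    # Same formula as less_numbers_probability, restated for a standalone block.
    return math.comb(6, k) * math.comb(43, 6 - k) / math.comb(49, 6) * 100

def at_least_k_percentage(k):
    # "At least k" means exactly k, or k + 1, ..., up to all 6 numbers.
    return sum(exactly_k_percentage(j) for j in range(k, 7))

for k in [5, 4, 3]:
    print(f"Probability of matching at least {k} numbers: {at_least_k_percentage(k):.4f}%")
```

Because the exact cases are mutually exclusive, their probabilities can simply be added.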
