<a href="https://github.com/lexx010?tab=repositories">
    <img src="img/github_sm.png">
</a>

## Project Overview: 6/49 Lottery Probability Core

A medical institute focused on preventing and treating gambling addiction is developing a mobile app to help lottery players better understand their chances of winning. They’ve asked us to build the **logical core** of the app for the 6/49 lottery.

The app should answer key probability questions such as:

* What is the probability of winning the jackpot with a **single ticket**?
* What are the chances of winning if a user plays **multiple tickets** (e.g., 40)?
* What is the probability of getting **at least 2, 3, 4, or 5 matching numbers** on a single ticket?

Our goal is to implement clean, reusable functions that calculate and clearly explain these probabilities.


### Create Foundational Functions

The first step in building the core logic for the lottery probability calculator is to implement two essential mathematical functions: one for calculating **factorials** and another for calculating **combinations**.

The **factorial** function should take a single non-negative integer `n` and return the product of all positive integers from 1 up to `n`, with `0! = 1` by convention (this case arises in the combinations formula when `k = n`). This is a basic mathematical operation used extensively in probability and combinatorics.

Next, implement a **combinations** function to calculate how many ways `k` items can be selected from a group of `n` items, regardless of order. This is commonly referred to as "n choose k" and is calculated using the factorial formula:

$$
C(n, k) = \frac{n!}{k!(n - k)!}
$$

These two functions will serve as the foundation for all further calculations related to lottery probabilities, including single-ticket odds, multiple-ticket scenarios, and partial matches.
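
As a quick sanity check of the formula, the sketch below evaluates C(49, 6) with Python's standard `math` module (`math.factorial`, plus `math.comb`, which is available from Python 3.8). The helper name `n_choose_k` is only illustrative and is not part of the app's code.

```python
import math

def n_choose_k(n, k):
    """Evaluate C(n, k) = n! / (k! (n - k)!) with integer arithmetic."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Number of distinct 6-number tickets that can be drawn from 49 numbers
total_tickets = n_choose_k(49, 6)

# math.comb computes the same quantity and serves as a cross-check
assert total_tickets == math.comb(49, 6)
print(f"{total_tickets:,}")  # 13,983,816
```

This count of 13,983,816 is the denominator in the jackpot and partial-match calculations that follow.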


In [3]:
import pandas as pd

In [4]:
# A function that calculates factorials
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

# A function that calculates combinations
def combinations(n, k):
    if k > n:
        raise ValueError("k cannot be greater than n")
    return factorial(n) // (factorial(k) * factorial(n - k))


### One-Ticket Probability

The next step is to calculate the probability of winning the big prize using a **single lottery ticket**, where each ticket contains **six unique numbers** chosen from a pool of numbers ranging from 1 to 49.

To do this, we need to write a function that takes the user’s selected numbers as input. These numbers will be provided as a **Python list** of six distinct integers between 1 and 49. The function should then compute the probability of matching all six numbers correctly in a 6/49 lottery draw, where only **one exact combination** results in a win.

The function must also **display the probability in a clear and user-friendly format**, so that even users without any background in statistics or probability can easily understand their chances of winning. This means presenting the result both as a percentage and as a simplified ratio (e.g., "1 in X").


In [5]:
def one_ticket_probability(user_numbers):
    # calculating the total number of possible outcomes
    total_outcomes = combinations(49, 6)
    # the number of successful outcomes is 1
    successful_outcomes = 1
    # divide the successful outcomes by the total outcomes to get the single-ticket probability
    probability = successful_outcomes / total_outcomes
    percentage = probability * 100

    #output
    print(f"Your numbers: {sorted(user_numbers)}")
    print("Probability of winning the 6/49 lottery with this ticket:")
    print(f"{percentage:.8f}% chance - or about 1 in {total_outcomes:,}.")

one_ticket_probability([1, 2, 3, 4, 5, 6])

Your numbers: [1, 2, 3, 4, 5, 6]
Probability of winning the 6/49 lottery with this ticket:
0.00000715% chance - or about 1 in 13,983,816.
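
The function above trusts its input. One way to guard it, assuming the app wants to reject malformed tickets before computing anything, is a small validation helper. The name `validate_ticket` and its error messages are hypothetical and not part of the original design.

```python
import numbers

def validate_ticket(user_numbers):
    """Return the ticket as a sorted list, or raise ValueError if it is not
    six distinct integers between 1 and 49."""
    ticket = list(user_numbers)
    if len(ticket) != 6:
        raise ValueError("a ticket must contain exactly six numbers")
    if len(set(ticket)) != 6:
        raise ValueError("ticket numbers must be distinct")
    for number in ticket:
        # bool counts as an integer type in Python, so exclude it explicitly
        if isinstance(number, bool) or not isinstance(number, numbers.Integral):
            raise ValueError("ticket numbers must be integers")
        if not 1 <= number <= 49:
            raise ValueError("ticket numbers must be between 1 and 49")
    return sorted(ticket)

print(validate_ticket([6, 5, 4, 3, 2, 1]))  # [1, 2, 3, 4, 5, 6]
```

A call such as `validate_ticket(user_numbers)` could run at the start of each ticket-based function so that every probability is reported for a valid ticket only.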


### Historical Data Check for Canada Lottery

In addition to calculating probabilities, the first version of the app should allow users to compare their chosen ticket against **historical lottery data** from Canada. This feature helps users see whether their specific combination of numbers would have ever won in the past.

The dataset we'll use comes from [Kaggle](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) and contains results from **3,665 official 6/49 lottery drawings**, dating from **1982 to 2018**. Each row in the dataset represents a single drawing and includes the six winning numbers in the following columns:

* NUMBER DRAWN 1
* NUMBER DRAWN 2
* NUMBER DRAWN 3
* NUMBER DRAWN 4
* NUMBER DRAWN 5
* NUMBER DRAWN 6

The app should be able to check if the user’s chosen combination has **ever appeared** in these past drawings and display how many times (if any) it occurred. This adds context to the probability calculations and enhances user awareness about historical outcomes.


In [6]:
# Load the CSV file and print its shape
df = pd.read_csv("649.csv")
print(f"Number of rows: {df.shape[0]}")
print(f"Number of columns: {df.shape[1]}")

Number of rows: 3665
Number of columns: 11


In [7]:
df.head(3)

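| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |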


In [8]:
df.tail(3)

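| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |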



### Function for Historical Data Check

To help users better understand how their chosen numbers have performed in the past, we will create a function that checks the **historical lottery data** for exact matches.

This function will take two inputs:

* A list of six numbers selected by the user
* A Series containing sets of winning numbers from historical draws

The function will perform two main tasks:

1. **Check how many times** the user’s exact combination has appeared in the historical Canada lottery data
2. **Display the probability** of winning the big prize in the next 6/49 drawing using that same combination

The results should be printed in a **clear and easy-to-understand format**, so users can see both their historical “success rate” and the odds of winning going forward. This provides a balance between data-driven insight and probability awareness.


In [9]:
# Extract the six winning numbers from each historical draw as a set
def extract_numbers(row):
    # positions 4 to 9 hold the columns NUMBER DRAWN 1 to NUMBER DRAWN 6
    winning_numbers = row.iloc[4:10]
    return set(winning_numbers.values)

winning_numbers = df.apply(extract_numbers, axis=1)
winning_numbers.head()


0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [10]:
def check_historical_occurrence(user_numbers, historical_winning_numbers):
    # Convert the user's numbers to a set so that order does not matter
    user_numbers_set = set(user_numbers)
    # Compare the set against every set in the pandas Series
    matches = historical_winning_numbers == user_numbers_set
    match_count = matches.sum()
    total_draws = len(historical_winning_numbers)

    print(f"Your numbers: {sorted(user_numbers)}")
    print(f"The combination occurred {match_count} time(s) in {total_draws} draws.")

    total_combinations = combinations(49, 6)
    probability = 1 / total_combinations

    print(f"Probability of winning with this exact combination: {probability * 100:.8f}%.")
    print(f"That’s about 1 in {total_combinations:,}.")


In [11]:
user_numbers = [3, 41, 11, 12, 43, 14]
check_historical_occurrence(user_numbers, winning_numbers)

Your numbers: [3, 11, 12, 14, 41, 43]
The combination occurred 1 time(s) in 3665 draws.
Probability of winning with this exact combination: 0.00000715%.
That’s about 1 in 13,983,816.
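
The check relies on sets comparing equal regardless of the order of their elements. The sketch below illustrates that idea without pandas, using only the first three draws shown in `df.head(3)`; the names `sample_draws` and `count_exact_matches` are hypothetical.

```python
# First three draws from the dataset, as shown in df.head(3)
sample_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]

def count_exact_matches(user_numbers, draws):
    """Count the draws whose six numbers equal the user's, ignoring order."""
    user_set = set(user_numbers)
    return sum(1 for draw in draws if set(draw) == user_set)

# The same numbers in a different order still count as a match
print(count_exact_matches([43, 41, 14, 12, 11, 3], sample_draws))  # 1
# One number differs, so there is no exact match
print(count_exact_matches([3, 11, 12, 14, 41, 42], sample_draws))  # 0
```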


### Multi-ticket Probability

To help users understand how playing multiple tickets affects their chances of winning, we'll create a function that estimates the **probability of winning the jackpot** based on the number of different tickets played.

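* Each input will be an integer between **1 and 13,983,816** (the total number of possible 6/49 combinations). The function below accepts a list of such ticket counts so that several scenarios can be compared in one call.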
* The function will calculate and **display the probability of winning** with that many tickets, shown as both a **percentage** and a simplified **"1 in X"** format.

This allows users to clearly see how much (or how little) their odds improve with more tickets.




In [12]:
def multi_ticket_probability(tickets_list):
    # total number of possible outcomes in a 6/49 draw
    total_outcomes = combinations(49, 6)

    for tickets in tickets_list:
        # each distinct ticket is one successful outcome; cap the result at certainty
        if tickets > total_outcomes:
            probability = 1.0
        else:
            probability = tickets / total_outcomes

        percentage = probability * 100
        # round rather than truncate so that "1 in X" is the nearest whole number
        odds = f"1 in {round(1 / probability):,}" if probability != 0 else "infinity"

        print(f"Playing {tickets:,} ticket(s) gives you a {percentage:.8f}% chance of winning ({odds}).")

test_tickets = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
multi_ticket_probability(test_tickets)

Playing 1 ticket(s) gives you a 0.00000715% chance of winning (1 in 13,983,816).
Playing 10 ticket(s) gives you a 0.00007151% chance of winning (1 in 1,398,382).
Playing 100 ticket(s) gives you a 0.00071511% chance of winning (1 in 139,838).
Playing 10,000 ticket(s) gives you a 0.07151124% chance of winning (1 in 1,398).
Playing 1,000,000 ticket(s) gives you a 7.15112384% chance of winning (1 in 14).
Playing 6,991,908 ticket(s) gives you a 50.00000000% chance of winning (1 in 2).
Playing 13,983,816 ticket(s) gives you a 100.00000000% chance of winning (1 in 1).
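
For the 40-ticket example mentioned in the overview, the same calculation can be sketched with exact fractions. This version uses `fractions.Fraction` and `math.comb` from the standard library instead of the helpers above, and the function name `jackpot_probability` is hypothetical. It assumes that all tickets are different combinations.

```python
from fractions import Fraction
from math import comb

TOTAL_OUTCOMES = comb(49, 6)  # 13,983,816 possible tickets

def jackpot_probability(tickets):
    """Exact probability that one of `tickets` distinct tickets wins the jackpot."""
    if not 0 <= tickets <= TOTAL_OUTCOMES:
        raise ValueError("tickets must be between 0 and the number of combinations")
    return Fraction(tickets, TOTAL_OUTCOMES)

p = jackpot_probability(40)
print(p)                          # 5/1747977
print(f"{float(p) * 100:.6f}%")   # the same probability as a percentage
```

Working with exact fractions avoids floating-point rounding until the final formatting step, which can matter when the probabilities are this small.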


### Probability for Fewer Winning Numbers

Users may want to know their chances of matching **two, three, four, or five** numbers out of the six drawn, as many 6/49 lotteries offer smaller prizes for these partial matches.

To support this, we will create a function that:

* Accepts an integer between **2 and 5** indicating how many matching numbers to calculate the probability for
* Does not need the user’s **six unique numbers** (from 1 to 49) as input, because every valid ticket has the same probability of matching a given count

The function will then compute and display the probability of having exactly that many winning numbers on a ticket, helping users understand their odds for smaller prizes.
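
The count of favorable tickets follows from choosing `k` of the 6 winning numbers and the remaining `6 - k` from the 43 non-winning numbers. Written with the same notation as the combinations formula, this gives:

$$
P(\text{exactly } k \text{ matches}) = \frac{C(6, k) \, C(43, 6 - k)}{C(49, 6)}
$$

As a worked case, `k = 5` gives 6 × 43 = 258 favorable tickets out of 13,983,816.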


In [13]:
# A function that takes an integer between 2 and 5 and prints the chances
# of matching exactly that many numbers on a single ticket.

def probability_less_6(k):
    if k < 2 or k > 5:
        print("This function only works for k values between 2 and 5.")
        return

    # successful outcomes: choose k of the 6 winning numbers
    # and the remaining 6 - k from the 43 non-winning numbers
    favorable_outcomes = combinations(6, k) * combinations(43, 6 - k)

    # total number of possible outcomes
    total_outcomes = combinations(49, 6)

    probability = favorable_outcomes / total_outcomes
    percentage = probability * 100

    print(f"Probability of matching exactly {k} numbers in the 6/49 lottery:")
    # four decimals keep small probabilities visible; round gives the nearest "1 in X"
    print(f"{percentage:.4f}% - that's about 1 in {round(1 / probability):,}")

for k in [2, 3, 4, 5]:
    probability_less_6(k)
    print("-" * 60)

Probability of matching exactly 2 numbers in the 6/49 lottery:
13.2378% - that's about 1 in 8
------------------------------------------------------------
Probability of matching exactly 3 numbers in the 6/49 lottery:
1.7650% - that's about 1 in 57
------------------------------------------------------------
Probability of matching exactly 4 numbers in the 6/49 lottery:
0.0969% - that's about 1 in 1,032
------------------------------------------------------------
Probability of matching exactly 5 numbers in the 6/49 lottery:
0.0018% - that's about 1 in 54,201
------------------------------------------------------------
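
The function above reports the probability of matching exactly `k` numbers. If a cumulative "at least `k`" figure is wanted, as in the overview's question, one possible approach is to add up the exact probabilities from `k` to 6. The sketch uses `math.comb` and `fractions.Fraction` from the standard library; `exact_match_probability` and `at_least_probability` are hypothetical names.

```python
from fractions import Fraction
from math import comb

def exact_match_probability(k):
    """Probability that a single 6/49 ticket matches exactly k drawn numbers."""
    return Fraction(comb(6, k) * comb(43, 6 - k), comb(49, 6))

def at_least_probability(k):
    """Probability of matching k or more numbers: sum of the exact cases k..6."""
    return sum(exact_match_probability(j) for j in range(k, 7))

for k in [2, 3, 4, 5]:
    p = at_least_probability(k)
    print(f"At least {k} matches: {float(p) * 100:.4f}%")
```

Because the exact cases for 0 through 6 matches are mutually exclusive and cover every ticket, their probabilities sum to 1, which is a convenient check on the formula.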


### Combining Probability and Historical Occurrence

To give users a complete understanding of their lottery ticket’s prospects, we will create a combined function that:

* Takes six user-selected numbers as input
* Calculates and prints the **probability of winning the jackpot** with those numbers
* Checks and prints the **number of times that exact combination has appeared** in the historical lottery data

The function will present both pieces of information in a concise, easy-to-understand format, showing users their odds of winning in the future along with the historical success of their chosen numbers. This combined output makes the app more informative and user-friendly.


In [14]:
def display_ticket_info(user_numbers, historical_winning_numbers):
    # calculate total combinations and probability
    total_combinations = combinations(49, 6)
    probability = 1 / total_combinations
    percentage = probability * 100

    # check if user's combination occurred in the past
    user_set = set(user_numbers)
    matches = historical_winning_numbers == user_set
    match_count = matches.sum()
    total_draws = len(historical_winning_numbers)

    # display results
    print(f"Your numbers: {sorted(user_numbers)}")
    print(f"This exact combination has occurred {match_count} time(s) in {total_draws} draws.")
    print(f"Probability of winning the 6/49 lottery with these numbers: {percentage:.8f}%")
    print(f"That’s about 1 in {total_combinations:,}")

In [None]:
user_numbers = [3, 41, 11, 12, 43, 14]
display_ticket_info(user_numbers, winning_numbers)

Your numbers: [3, 11, 12, 14, 41, 43]
This exact combination has occurred 1 time(s) in 3665 draws.
Probability of winning the 6/49 lottery with these numbers: 0.00000715%
That’s about 1 in 13,983,816
