# Odds of Winning the TOTO Prize
---

**Introduction:**

This project aims to help TOTO lottery addicts better estimate their chances of winning. 

The project will build functions that enable users to answer questions like:
- *What is the probability of winning the Jackpot prize with a single ticket?*
- *What is the probability of winning the Jackpot prize if we play 50 different tickets (or any other number)?*
- *What is the probability of having at least five (or four, or three) winning numbers on a single ticket?*

The scenario we're following throughout this project is fictional; the main purpose is to practice applying the concepts learned in a setting that simulates a real-world scenario. Throughout the project, we'll repeatedly need to calculate probabilities and combinations.


## Core Functions

In [1]:
### Function to calculate factorials (to find the total number of permutations)

def factorial(n):
    product = 1
    for i in range(n, 0, -1):
        product = product * i
    return product      

In [2]:
### Function to return the number of combinations when we're taking only k objects from a group of n objects.

def combinations(n, k):
    numerator = factorial(n) 
    denominator = factorial(k) * factorial(n-k)
    return numerator/denominator
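
As a cross-check on the two helpers above, the standard library offers `math.factorial` and `math.comb` (the latter assumes Python 3.8 or newer). The sketch below is not part of the original project; it only illustrates that the built-ins return exact integers, whereas `combinations()` above returns a float because of the `/` division.

```python
import math

# Exact integer count of six-number tickets that can be formed from 49 numbers.
n_tickets_possible = math.comb(49, 6)
print(n_tickets_possible)  # 13983816

# The same value from the textbook formula n! / (k! * (n - k)!),
# using integer division so the result stays an int.
by_formula = math.factorial(49) // (math.factorial(6) * math.factorial(43))
print(by_formula == n_tickets_possible)  # True
```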

## One-Ticket Probability of Winning the Jackpot Prize

We will write a function that calculates the probability of winning the TOTO Jackpot prize.

In the TOTO lottery, six Winning Numbers and one Additional Number are drawn from 49 numbers that range from 1 to 49. A player wins the Jackpot prize if the six numbers on their ticket match all six Winning Numbers drawn. For example, a player holding a ticket with the numbers {13, 22, 24, 27, 42, 44} wins the Jackpot prize only if the Winning Numbers drawn are {13, 22, 24, 27, 42, 44}.

We want players to be able to calculate the probability of winning the Jackpot prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we'll start by building a function that calculates the probability of winning the Jackpot prize for any given ticket.
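
The reasoning above can be written out as a short worked calculation. Since the order of the six numbers does not matter and the numbers are drawn without replacement, the number of possible outcomes is a combination count:

$$
\binom{49}{6} = \frac{49!}{6!\,(49-6)!} = 13{,}983{,}816
$$

Only one of these outcomes matches a given ticket, so

$$
P(\text{Jackpot}) = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8} \approx 0.0000072\%
$$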


In [3]:
### The function takes in a list of six unique numbers and prints the probability of winning.

def one_ticket_probability(list_of_6_numbers):
    
    ### Start by calculating the total number of possible outcomes, i.e. the total number of combinations 
    ###    for a six-number lottery ticket. 
    ### There are 49 possible numbers, and six numbers are sampled without replacement.
    
    n_combinations = combinations(49, 6)
    
    
    ### Probability of one successful outcome.
    
    p_one_ticket = 1 / n_combinations
    
    
    ### Convert the probability to percentage.
    
    percent = p_one_ticket * 100
    
    
    ### Print the result in a friendly way.
    
    template = """Your chances to win the Jackpot prize with the numbers {input} are {num1:.7f}%.
In other words, you have a 1 in {num2:,} chances to win."""
    
    output = template.format(input=list_of_6_numbers, num1=percent, num2=int(n_combinations))
    
    print(output)
    

In [4]:
### Testing the function with sample input.

input_1 = [3, 49, 7, 2, 6, 9]
one_ticket_probability(input_1)


Your chances to win the Jackpot prize with the numbers [3, 49, 7, 2, 6, 9] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


In [5]:
### Testing the function with another sample input.

input_2 = [3, 8, 7, 45, 6, 35]
one_ticket_probability(input_2)


Your chances to win the Jackpot prize with the numbers [3, 8, 7, 45, 6, 35] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


## Historical Data Check for TOTO Lottery

We will allow users to compare their ticket against the historical TOTO lottery data and determine whether it would ever have won.

Below we explore the historical TOTO lottery data for 1,242 drawings (each row shows data for a single drawing), dating from 3 Jul 2008 to 13 Aug 2020. For each drawing, the six Winning Numbers drawn are in the following six columns:
- Winning Number 1
- 2
- 3
- 4
- 5
- 6


The data set can be downloaded from [Lottolyzer](https://en.lottolyzer.com/history/singapore/toto/) and has the structure shown below.

In [6]:
import pandas as pd
pd.options.display.max_columns = 50 ### to avoid truncated output. 
pd.options.display.max_rows = 20

data = pd.read_csv("ToTo.csv")

### Get the number of rows and columns.

data.shape


(1242, 32)

In [7]:
### Explore the first 3 rows of data.

data.head(3)

Unnamed: 0,Draw,Date,Winning Number 1,2,3,4,5,6,Additional Number,From Last,Same As Day,Odd,Even,1-10,11-20,21-30,31-40,41-50,Division 1 Winners,Division 1 Prize,Division 2 Winners,Division 2 Prize,Division 3 Winners,Division 3 Prize,Division 4 Winners,Division 4 Prize,Division 5 Winners,Division 5 Prize,Division 6 Winners,Division 6 Prize,Division 7 Winners,Division 7 Prize
0,3582,2020-08-13,5,9,29,41,42,43,28,92943,,5,1,2,0,1,0,3,0.0,0.0,3.0,68719.0,59.0,2403.0,262.0,296.0,4121.0,50.0,5646.0,25.0,77445.0,10.0
1,3581,2020-08-10,9,19,20,29,36,43,38,43,,4,2,1,2,1,1,1,1.0,2283630.0,3.0,160255.0,122.0,1548.0,272.0,379.0,6740.0,50.0,8191.0,25.0,122188.0,10.0
2,3580,2020-08-06,10,28,35,39,41,43,44,10,,4,2,1,0,1,2,2,0.0,0.0,0.0,0.0,50.0,2835.0,136.0,569.0,3379.0,50.0,4392.0,25.0,63855.0,10.0


In [8]:
### Explore the last 3 rows of data.

data.tail(3)

Unnamed: 0,Draw,Date,Winning Number 1,2,3,4,5,6,Additional Number,From Last,Same As Day,Odd,Even,1-10,11-20,21-30,31-40,41-50,Division 1 Winners,Division 1 Prize,Division 2 Winners,Division 2 Prize,Division 3 Winners,Division 3 Prize,Division 4 Winners,Division 4 Prize,Division 5 Winners,Division 5 Prize,Division 6 Winners,Division 6 Prize,Division 7 Winners,Division 7 Prize
1239,2343,2008-07-10,22,32,41,42,43,44,31,,,2,4,0,0,1,1,4,,,,,,,,,,,,,,
1240,2342,2008-07-07,2,18,19,21,39,45,36,45.0,,4,2,1,2,1,1,1,,,,,,,,,,,,,,
1241,2341,2008-07-03,6,11,14,15,28,45,35,,,3,3,1,3,1,0,1,,,,,,,,,,,,,,


## Function for Historical Data Check

The functions below implement this check: they take a user's ticket and report how many times that exact combination has been drawn as the Winning Numbers in the historical data.


In [9]:
### Function to extract the six winning numbers of each drawing from the historical data set as a Python set.

def extract_numbers(row):
    ### Get data from column positions 2 to 7 (i.e. columns for 'Winning Number 1', '2', ... to '6').
    ### .iloc makes it explicit that the slice is positional rather than label-based.
    row = row.iloc[2:8] 
    
    ### Convert the values to 'set' datatype, i.e. a collection of unique items which is unordered and unindexed.
    ### The major advantage of using a set, as opposed to a list, is that it has a highly optimized method 
    ###    for checking whether a specific element is contained in the set.
    row = set(row.values)
    return row

### Apply the function to every row of the dataset (axis=1) to extract all the Winning Numbers.

winning_numbers = data.apply(extract_numbers, axis=1) 

### Preview of the first 5 rows of the Winning Numbers.

winning_numbers.head()


0      {5, 9, 42, 41, 43, 29}
1     {36, 9, 43, 19, 20, 29}
2    {35, 39, 41, 10, 43, 28}
3      {32, 1, 3, 40, 10, 15}
4    {34, 38, 43, 48, 18, 24}
dtype: object
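
The positional slice in `extract_numbers()` depends on the column order of the CSV. As an alternative sketch (not the approach used in this project), the same sets can be built by looking the six columns up by name. The example below avoids pandas and uses the standard `csv` module on a small stand-in copied from the first two drawings previewed earlier; `WINNING_COLUMNS` and `extract_numbers_by_name` are hypothetical names introduced only for this illustration.

```python
import csv
import io

# Stand-in for ToTo.csv: a few columns from the first two drawings in the preview above.
sample_csv = """Draw,Date,Winning Number 1,2,3,4,5,6,Additional Number
3582,2020-08-13,5,9,29,41,42,43,28
3581,2020-08-10,9,19,20,29,36,43,38
"""

# Hypothetical constant: the six column labels that hold the Winning Numbers.
WINNING_COLUMNS = ["Winning Number 1", "2", "3", "4", "5", "6"]

def extract_numbers_by_name(record):
    # Look each column up by label, so the column order does not matter.
    return {int(record[column]) for column in WINNING_COLUMNS}

reader = csv.DictReader(io.StringIO(sample_csv))
sample_winning_numbers = [extract_numbers_by_name(record) for record in reader]
print(sample_winning_numbers[0] == {5, 9, 29, 41, 42, 43})  # True
```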

In [10]:
### Function that takes in two inputs: 
### - a Python list containing the user numbers 
### - and a pandas Series containing sets with the Winning Numbers 

def check_historical_occurence(list_of_6_numbers, series_of_winning_numbers):
    
    ### Convert the list of user numbers to a set.
    input_numbers = set(list_of_6_numbers)
    
    ### Compare the set against the pandas Series that contains the sets with the Winning Numbers to find the number of matches.
    ### A Series of Boolean values will be returned as a result of the comparison.
    matches = series_of_winning_numbers == input_numbers
    
    ### Sum the Boolean matches to get the number of times the combination has occurred.
    total_matches = matches.sum()
    
    ### Print information about the number of times the combination inputted by the user occurred in the past.    
    template = "This combination of numbers {input} has occurred {num1} time(s) as Winning Numbers in the past."
    output = template.format(input=list_of_6_numbers, num1=int(total_matches))    
    print(output)
    
    if total_matches == 0:
        print("This doesn't mean it's more likely to occur now.")    
    
    print('\nIn the next drawing:')
    
    ### Call the one_ticket_probability() function again to print the probability of winning the Jackpot prize 
    ### in the next drawing with that combination.
    one_ticket_probability(list_of_6_numbers) 
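
The `==` comparison above relies on pandas comparing the user's set against each set in the Series, one element at a time. The same check can be spelled out in plain Python, which may make the logic easier to follow. This is only a sketch: `count_exact_matches` is a hypothetical helper, and the two past drawings are copied from the data preview above.

```python
def count_exact_matches(ticket_numbers, past_draws):
    # Sets compare equal when they hold the same numbers, regardless of order.
    ticket = set(ticket_numbers)
    return sum(1 for draw in past_draws if draw == ticket)

# Two past drawings taken from the preview of the data set.
past_draws = [{5, 9, 29, 41, 42, 43}, {9, 19, 20, 29, 36, 43}]

print(count_exact_matches([43, 36, 29, 20, 19, 9], past_draws))  # 1
print(count_exact_matches([2, 41, 11, 12, 43, 14], past_draws))  # 0
```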
    

In [11]:
### Test the check_historical_occurence() function.

check_historical_occurence([2, 41, 11, 12, 43, 14], winning_numbers)


This combination of numbers [2, 41, 11, 12, 43, 14] has occurred 0 time(s) as Winning Numbers in the past.
This doesn't mean it's more likely to occur now.

In the next drawing:
Your chances to win the Jackpot prize with the numbers [2, 41, 11, 12, 43, 14] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


In [12]:
### Test the check_historical_occurence() function again with a known winning combination.

check_historical_occurence([9, 19, 20, 29, 36, 43], winning_numbers)


This combination of numbers [9, 19, 20, 29, 36, 43] has occurred 1 time(s) as Winning Numbers in the past.

In the next drawing:
Your chances to win the Jackpot prize with the numbers [9, 19, 20, 29, 36, 43] are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.


## Multi-Ticket Probability of Winning the Jackpot Prize

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning. In this section, we're going to write a function that lets users calculate the chances of winning for any number of different tickets.


In [13]:
### A function that prints the probability of winning the Jackpot prize depending on the number of different tickets played.

def multi_ticket_probability(n_tickets):
    
    ### Start by calculating the total number of possible outcomes, i.e. the total number of combinations 
    ###    for a six-number lottery ticket. 
    ### There are 49 possible numbers, and six numbers are sampled without replacement.
    
    n_combinations = combinations(49, 6)
    
    
    ### Calculate the probability of winning the Jackpot prize based on the number of tickets.
    
    p_n_tickets = n_tickets / n_combinations
    
    
    ### Convert the probability to percentage.
    
    percent = p_n_tickets * 100
    
    
    ### Print the result in a friendly way.
    
    if n_tickets == 1:
        template = """Your chances to win the Jackpot prize with {input:,} ticket are {num1:.7f}%.
In other words, you have a 1 in {num2:,} chances to win."""
    else:
        template = """Your chances to win the Jackpot prize with {input:,} different tickets are {num1:.7f}%.
In other words, you have a 1 in {num2:,} chances to win."""
                
    output = template.format(input=n_tickets, num1=percent, num2=round(n_combinations/n_tickets))
    
    print(output)
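
The calculation in `multi_ticket_probability()` relies on every ticket being different: each ticket covers one distinct outcome, so `n` tickets cover `n` of the 13,983,816 equally likely outcomes. Below is a small sketch of that arithmetic using `fractions.Fraction` for exact results. `jackpot_probability` and `TOTAL_OUTCOMES` are hypothetical names, and `math.comb` assumes Python 3.8 or newer.

```python
from fractions import Fraction
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible six-number draws

def jackpot_probability(n_tickets):
    # n distinct tickets cover n distinct outcomes out of TOTAL_OUTCOMES.
    if not 0 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError("n_tickets must be between 0 and the number of possible outcomes")
    return Fraction(n_tickets, TOTAL_OUTCOMES)

print(jackpot_probability(6991908))          # 1/2
print(float(jackpot_probability(10)) * 100)  # percentage for 10 different tickets
```

Buying half of all possible tickets gives exactly a one-in-two chance, and the probability only reaches 1 when every possible combination is played.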
    

In [14]:
### Test the above function using the following test inputs.

number_of_tickets = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for n in number_of_tickets:
    multi_ticket_probability(n)
    print('-----------------------------------------------------------------------------------------')
    

Your chances to win the Jackpot prize with 1 ticket are 0.0000072%.
In other words, you have a 1 in 13,983,816 chances to win.
-----------------------------------------------------------------------------------------
Your chances to win the Jackpot prize with 10 different tickets are 0.0000715%.
In other words, you have a 1 in 1,398,382 chances to win.
-----------------------------------------------------------------------------------------
Your chances to win the Jackpot prize with 100 different tickets are 0.0007151%.
In other words, you have a 1 in 139,838 chances to win.
-----------------------------------------------------------------------------------------
Your chances to win the Jackpot prize with 10,000 different tickets are 0.0715112%.
In other words, you have a 1 in 1,398 chances to win.
-----------------------------------------------------------------------------------------
Your chances to win the Jackpot prize with 1,000,000 different tickets are 7.1511238%.
In other word

## Fewer Winning Numbers - Probability of Winning Smaller Prizes

Below we will write two more functions that let users calculate the probabilities of matching three, four, or five Winning Numbers, with or without the Additional Number.

For extra context, the [TOTO](https://online.singaporepools.com/en/lottery/toto-statistics-history) lottery awards smaller prizes if a player's ticket matches three, four, or five of the six Winning Numbers drawn, with a higher prize group if it also matches the Additional Number. As a consequence, users might be interested in knowing the probability of having three, four, or five Winning Numbers.

- Group 2 Prize: Matching 5 Winning Numbers + Additional Number
- Group 3 Prize: Matching 5 Winning Numbers
- Group 4 Prize: Matching 4 Winning Numbers + Additional Number
- Group 5 Prize: Matching 4 Winning Numbers 
- Group 6 Prize: Matching 3 Winning Numbers + Additional Number
- Group 7 Prize: Matching 3 Winning Numbers 
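
The counting behind these prize groups can be sketched as follows, treating the draw as fixed and the ticket as one of the $\binom{49}{6}$ equally likely six-number selections. Of the 49 numbers, 6 are Winning Numbers, 1 is the Additional Number, and 42 are neither. For $n \in \{3, 4, 5\}$:

$$
P(n \text{ Winning Numbers, no Additional Number}) = \frac{\binom{6}{n}\binom{42}{6-n}}{\binom{49}{6}}
$$

$$
P(n \text{ Winning Numbers} + \text{Additional Number}) = \frac{\binom{6}{n}\binom{1}{1}\binom{42}{5-n}}{\binom{49}{6}}
$$

For example, with $n = 3$ the first expression gives $\frac{20 \times 11{,}480}{13{,}983{,}816} \approx 1.64\%$.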


In [15]:
### A function which takes in an integer n (value between 3 and 5) and prints information about the chances of winning 
### the Group 3, Group 5, or Group 7 prize, depending on the value of that integer.

def probability_win_group357(n):
    
    ### Calculate the number of successful outcomes given the value of the input. 
    
    ### Find the total number of n-number combinations from the set of 6 numbers (i.e. 6 choose n)
    combinations_n = combinations(6, n)
    
    ### Find the remaining number of combinations (i.e. 42 choose 6-n).
    ### Note: 42 is used instead of 43 because there is one number drawn for Additional Number.
    combinations_remaining = combinations(49-6-1, 6-n)
 
    ### Calculate the total number of successful outcomes.
    ### (Imagine if n=3, we need to find combination from the ticket [6 choose 3] and the remaining combination [42 choose 3]).
    ### (Imagine if n=4, we need to find combination from the ticket [6 choose 4] and the remaining combination [42 choose 2]).
    ### (Imagine if n=5, we need to find combination from the ticket [6 choose 5] and the remaining combination [42 choose 1]).
    total_successful_outcome = (combinations_n * combinations_remaining)
    
    ### Get the total possible outcomes from [49 choose 6] for Winning Numbers.
    total_possible_outcome = combinations(49, 6)
    
    ### Calculate the probability
    probability = total_successful_outcome / total_possible_outcome
    
    percent = probability * 100
    
    template = """Your chances of having {input} Winning Numbers with this ticket are {num1:.7f}%.
In other words, you have a 1 in {num2:,} chances to win."""
    
    output = template.format(input=n, num1=percent, num2=int(round(total_possible_outcome / total_successful_outcome)))
    
    print(output)    
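
As a quick cross-check of the function above, the same probabilities can be recomputed with `math.comb` (assuming Python 3.8 or newer). This is only a sketch; `exact_match_probability` is a hypothetical helper, not part of the project.

```python
import math

def exact_match_probability(n):
    # Ways to pick n of the 6 Winning Numbers and the other 6 - n numbers
    # from the 42 numbers that are neither winning nor the Additional Number.
    successful = math.comb(6, n) * math.comb(42, 6 - n)
    return successful / math.comb(49, 6)

for n in (3, 4, 5):
    print(n, f"{exact_match_probability(n) * 100:.7f}%")
```

The printed percentages should agree with the output of `probability_win_group357()` for the same inputs.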


In [16]:
### A function which takes in an integer n (value between 3 and 5) and prints information about the chances of winning 
### the Group 2, Group 4, or Group 6 prize, depending on the value of that integer.

def probability_win_group246(n):
    
    ### Calculate the number of successful outcomes given the value of the input. 
    
    ### Find the total number of n-number combinations from the set of 6 numbers (i.e. 6 choose n)
    combinations_n = combinations(6, n)
    
    ### Find the remaining number of combinations (i.e. 42 choose 6-n-1).
    ### Note: 42 is used instead of 43 because there is one number drawn for Additional Number.
    combinations_remaining1 = combinations(49-6-1, 6-n-1)
    
    ### Combination [1 choose 1] refers to the Additional Number.
    combinations_remaining2 = combinations(1, 1)
 
    ### Calculate the total number of successful outcomes.
    ### (Imagine if n=3, find combination from the ticket [6 choose 3] and remaining combination [42 choose 2] and [1 choose 1]).
    ### (Imagine if n=4, find combination from the ticket [6 choose 4] and remaining combination [42 choose 1] and [1 choose 1]).
    ### (Imagine if n=5, find combination from the ticket [6 choose 5] and remaining combination [42 choose 0] and [1 choose 1]).
    total_successful_outcome = (combinations_n * combinations_remaining1 * combinations_remaining2)
    
    ### Get the total possible outcomes: the ticket is one of the [49 choose 6] possible six-number tickets.
    ### The Additional Number is already accounted for above (42 instead of 43, and [1 choose 1]),
    ###    so nothing is added to the denominator for it.
    total_possible_outcome = combinations(49, 6)
    
    ### Calculate the probability
    probability = total_successful_outcome / total_possible_outcome
    
    percent = probability * 100
    
    template = """Your chances of having {input} Winning Numbers + Additional Number with this ticket are {num1:.7f}%.
In other words, you have a 1 in {num2:,} chances to win."""
    
    output = template.format(input=n, num1=percent, num2=int(round(total_possible_outcome / total_successful_outcome)))
    
    print(output)    


In [17]:
### Test the above functions using the following test inputs.

test_inputs = [3,4,5]

for n in test_inputs:
    probability_win_group357(n)
    print('------------------------------------------------------------------------------------------')
    probability_win_group246(n)
    print('------------------------------------------------------------------------------------------')
    

Your chances of having 3 Winning Numbers with this ticket are 1.6418980%.
In other words, you have a 1 in 61 chances to win.
------------------------------------------------------------------------------------------
Your chances of having 3 Winning Numbers + Additional Number with this ticket are 0.1231424%.
In other words, you have a 1 in 812 chances to win.
------------------------------------------------------------------------------------------
Your chances of having 4 Winning Numbers with this ticket are 0.0923568%.
In other words, you have a 1 in 1,083 chances to win.
------------------------------------------------------------------------------------------
Your chances of having 4 Winning Numbers + Additional Number with this ticket are 0.0045052%.
In other words, you have a 1 in 22,197 chances to win.
------------------------------------------------------------------------------------------
Your chances of having 5 Winning Numbers with this ticket are 0.0018021%.
In other words