# Exploring Lottery Logic
**by Gerard Tieng**

For this project, we will be demonstrating our skills in statistics and probability with the exploration of potential outcomes in a lottery. We will assume the role of a data specialist who will build the main logical core of a mobile app, commissioned by a medical institute that aims to prevent and treat gambling addictions. 

Our objective is to explain the vanishingly small probabilities behind lottery participation in order to deter gambling addiction. We will calculate the probability of winning with a single ticket, of winning with multiple tickets, and of matching only a subset of the winning numbers.

The lottery rules in this case are based on a 6-number draw without replacement from the range 1-49 inclusive. Additionally, [lottery records](https://www.kaggle.com/datascienceai/lottery-dataset) from Canada's national drawings from 1982-2018 will be used in this project to complement theoretical probabilities with real data.

## Essential Formulas

When dealing with probabilities, there are two formulas that prove essential for counting total outcomes: factorials and combinations.

Factorials, expressed as `n!`, give the number of unique orderings (permutations) of a set of `n` items when the items are selected one by one without repetition. For example:

`5! = 5*4*3*2*1 = 120 permutations`

In [1]:
def factorial(n):
    final_product = 1
    for i in range(n):
        final_product *= i+1
    return final_product
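
As a quick sanity check, the loop above can be compared with Python's built-in `math.factorial`. This is only a sketch: `factorial_loop` is a renamed copy of the function above, included so that the block runs on its own.

```python
import math

def factorial_loop(n):
    # Same logic as factorial() above, copied so the block is self-contained.
    final_product = 1
    for i in range(n):
        final_product *= i + 1
    return final_product

# Compare against the standard library for a handful of inputs.
for n in [0, 1, 5, 10, 49]:
    assert factorial_loop(n) == math.factorial(n)

print(factorial_loop(5))  # 120
```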

The factorial function is the main building block in the formula for combinations, which counts the number of ways to choose `k` items from a set of `n` when the order of selection does not matter.

In [2]:
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k) * factorial(n-k)
    return numerator / denominator
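
Because the function above uses `/`, it returns a float (for example `13983816.0`). The sketch below shows one possible integer-only variant and checks it against `math.comb`, which is available from Python 3.8. The name `combinations_int` is hypothetical and not part of the original notebook.

```python
import math

def combinations_int(n, k):
    # Hypothetical integer variant: floor division is exact here because
    # k! * (n-k)! always divides n! evenly.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

total = combinations_int(49, 6)
assert total == math.comb(49, 6)
print(total)  # 13983816
```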

## Winning with One Ticket

The first function we will design accepts the largest number available in the lottery as its first argument and the list of numbers chosen by the player as its second. It prints the probability that every number on the ticket matches the draw.

In [3]:
def one_ticket_probability(max_number, lotto_picks):
    successful_outcomes = 1 
    total_potential_outcomes = combinations(max_number, len(lotto_picks))
    probability_percent = (successful_outcomes / total_potential_outcomes)*100
    probability_fraction = str(successful_outcomes) + "/" + str(total_potential_outcomes)
    print("Probability of {} number(s) correct: {} or {}".format(len(lotto_picks), probability_fraction, "{:.7f}%".format(probability_percent)))

Here is an example of a ticket submission for the 49-range, 6-number lottery:

In [4]:
one_ticket_probability(49,[1,2,3,4,5,6])

Probability of 6 number(s) correct: 1/13983816.0 or 0.0000072%


Here is another example, for a lottery with different parameters:
- max number: 60
- total numbers picked: 10

In [5]:
one_ticket_probability(60,[1,2,3,4,5,6,7,8,9,10])

Probability of 10 number(s) correct: 1/75394027566.0 or 0.0000000%
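
The 10-number result prints as `0.0000000%` only because seven decimal places are too few to show it, not because the probability is zero. One way to keep such small values visible is scientific notation alongside a "1 in N" phrasing. The helper `describe_odds` below is a hypothetical sketch, not part of the original notebook.

```python
import math

def describe_odds(max_number, pick_count):
    # Hypothetical helper: total number of possible tickets for this lottery.
    total = math.comb(max_number, pick_count)
    percent = 100 / total
    # Scientific notation keeps very small percentages visible.
    return "1 in {:,} ({:.3e}%)".format(total, percent)

print(describe_odds(49, 6))
print(describe_odds(60, 10))
```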


## Comparing Against Historical Data

Some may doubt theoretical probabilities for their own reasons. So in the next section, we will write a function that compares a player's personal set of 6 numbers to 35 years of actual drawings from the Canadian national lottery. Let's start by exploring [the dataset](https://www.kaggle.com/datascienceai/lottery-dataset) with pandas.

In [6]:
import pandas as pd

lottery = pd.read_csv("649.csv")
lottery.shape

(3665, 11)

In [7]:
lottery.head()

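| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |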


In [8]:
lottery.tail()

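| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |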


Upon inspection of the dataset, we see that there were more than 3600 drawings conducted between 1982 and 2018. The following code will extract the winning 6 numbers from the dataframe and collect them as a set.

In [9]:
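def extract_numbers(row):
    # Positions 4 through 9 hold the six "NUMBER DRAWN" columns.
    row = row.iloc[4:10]
    # A set makes the later comparison independent of draw order.
    row = set(row.values)
    return row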

winning_numbers = lottery.apply(extract_numbers, axis = 1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In the following function, we take the player's numbers and compare them to every historical set of winning numbers to count exact six-number matches.

In [10]:
def check_historical_occurence(player_numbers, winning_numbers):
    player_numbers = set(player_numbers)
    jackpot = player_numbers == winning_numbers
    big_winners = jackpot.sum()
    
    return "In 35 years of collected lottery data, your set of numbers won {} time(s).".format(big_winners)
    

In [11]:
check_historical_occurence([3,41,11,12,43,14], winning_numbers)

'In 35 years of collected lottery data, your set of numbers won 1 time(s).'
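
The check above depends on set equality, which ignores the order in which the numbers are listed. The sketch below illustrates the same idea in plain Python, without pandas. The draws are made up for illustration and are not rows from `649.csv`, and `count_exact_matches` is a hypothetical helper name.

```python
# Made-up draws for illustration only; these are not rows from 649.csv.
toy_draws = [{1, 2, 3, 4, 5, 6}, {7, 8, 9, 10, 11, 12}, {1, 2, 3, 4, 5, 6}]

def count_exact_matches(player_numbers, draws):
    # Convert to a set so the order of the player's picks does not matter.
    player_set = set(player_numbers)
    # Each comparison yields True or False; summing counts the True values.
    return sum(draw == player_set for draw in draws)

print(count_exact_matches([6, 5, 4, 3, 2, 1], toy_draws))  # 2
print(count_exact_matches([1, 2, 3, 4, 5, 7], toy_draws))  # 0
```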

## More Tickets, More Chances?

The chances of winning improve as more tickets are purchased. However, in a lottery structured like the one in our case study, the investment required is considerable. The following function calculates the probability of winning the jackpot for a given number of tickets purchased, assuming each ticket carries a different combination.

In [12]:
def multi_ticket_probability(ticket_count):
    total_outcomes = combinations(49, 6)
    probability_percent = ticket_count/total_outcomes*100
    probability_fraction = str(ticket_count) + "/" + str(int(total_outcomes))
    return "Your chances of winning with {} ticket(s) is {} or {}".format(ticket_count, probability_fraction, "{:.7f}%".format(probability_percent))

Now let's see how good our chances of winning are at different investment levels.

In [13]:
tickets_bought = [1,10,100,10000,1000000]

for tickets in tickets_bought:
    print(multi_ticket_probability(tickets))

Your chances of winning with 1 ticket(s) is 1/13983816 or 0.0000072%
Your chances of winning with 10 ticket(s) is 10/13983816 or 0.0000715%
Your chances of winning with 100 ticket(s) is 100/13983816 or 0.0007151%
Your chances of winning with 10000 ticket(s) is 10000/13983816 or 0.0715112%
Your chances of winning with 1000000 ticket(s) is 1000000/13983816 or 7.1511238%


Even with enough cash to purchase 1 million tickets, the chances of winning are only about 7.15%. Talk about a longshot!
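
The same relationship can be turned around to ask how many tickets a given chance of winning would require. The sketch below assumes every ticket bought is a different combination. `tickets_needed` is a hypothetical helper, not part of the original notebook.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def tickets_needed(target_probability):
    # Hypothetical helper: assumes every ticket is a different combination,
    # so the probability is simply tickets / TOTAL_OUTCOMES.
    return math.ceil(target_probability * TOTAL_OUTCOMES)

for target in [0.10, 0.50, 0.90]:
    print("{:.0%} chance: {:,} tickets".format(target, tickets_needed(target)))
```

Under that assumption, even a coin-flip chance at the jackpot takes just under 7 million distinct tickets.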

## Probability of Partial Matches

We've focused mainly on the big six-number jackpot, but it is actually fairly common for a lottery ticket to match exactly 1 number (41%). Modifying the logic we've demonstrated above, we can also calculate the probability of tickets with partial matches.

In [14]:
def probability_less_6(picks):
    total_possible_combinations = combinations(49,6)
    winning_combinations = combinations(6, picks)
    remainder_combinations = combinations(43, 6-picks)
    final_probability = ((winning_combinations*remainder_combinations) / total_possible_combinations)*100
    
    return "Your chances of matching {} number(s) is {}%.".format(picks, final_probability)
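
Written out as a formula, the function above computes the probability of matching exactly $k$ of the 6 winning numbers: choose $k$ of the 6 winners, fill the remaining $6-k$ picks from the 43 non-winning numbers, and divide by all possible tickets. The symbol $k$ stands for the `picks` argument.

$$
P(\text{exactly } k \text{ matches}) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}}
$$

For example, with $k = 3$:

$$
\frac{\binom{6}{3}\,\binom{43}{3}}{\binom{49}{6}} = \frac{20 \times 12341}{13983816} \approx 0.01765 \approx 1.765\%
$$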

In [15]:
for i in range(5):
    print(probability_less_6(i+1))

Your chances of matching 1 number(s) is 41.30194504847604%.
Your chances of matching 2 number(s) is 13.237802900152577%.
Your chances of matching 3 number(s) is 1.7650403866870101%.
Your chances of matching 4 number(s) is 0.0968619724401408%.
Your chances of matching 5 number(s) is 0.0018449899512407771%.
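
Each figure above is the chance of matching exactly that many numbers. The chance of matching at least one number follows from the complement of matching none. The sketch below also checks that the seven cases (0 through 6 matches) sum to 1. `exact_match_probability` is a hypothetical name that restates the same formula with `math.comb`.

```python
import math

TOTAL = math.comb(49, 6)

def exact_match_probability(k):
    # Probability of matching exactly k of the 6 winning numbers.
    return math.comb(6, k) * math.comb(43, 6 - k) / TOTAL

probabilities = [exact_match_probability(k) for k in range(7)]

# The seven cases (0 through 6 matches) cover every outcome, so they sum to 1.
assert abs(sum(probabilities) - 1) < 1e-12

at_least_one = 1 - probabilities[0]
print("No matches: {:.2%}".format(probabilities[0]))
print("At least one match: {:.2%}".format(at_least_one))
```

By this calculation, a ticket matches at least one number roughly 56% of the time.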
