# Guided Project: Mobile App for Lottery Addiction 
This guided project lets us practice what we have learned about probability. Below is the imaginary scenario provided by Dataquest: 

"A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities."

The medical team wants the first version of the app to focus on the [6/49 lottery](https://en.wikipedia.org/wiki/Lotto_6/49) in Canada. In this lottery, players choose six numbers from 1 to 49 without replacement, and they win the jackpot if all six of their numbers match the six numbers drawn. 
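Because the order of the chosen numbers does not matter, the number of possible tickets is the binomial coefficient $\binom{49}{6}$, and one ticket's chance of matching all six numbers is its reciprocal:

$$
\binom{49}{6} = \frac{49!}{6!\,43!} = \frac{49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44}{6!} = 13{,}983{,}816,
\qquad
P(\text{jackpot}) = \frac{1}{13{,}983{,}816} \approx 7.15 \times 10^{-8}.
$$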


This project requires us to calculate many probabilities, so we begin by writing a few helper functions: 

```python
def factorial(n):
    """Return n! = 1 * 2 * ... * n (0! = 1 because the loop body never runs)."""
    product = 1
    for i in range(1, n + 1):
        product *= i
    return product
```
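As a quick sanity check, a sketch like the following compares our `factorial` with the standard library's `math.factorial` for small inputs. It repeats the definition so the snippet runs on its own.

```python
import math


def factorial(n):
    """Iterative n!, identical to the helper above."""
    product = 1
    for i in range(1, n + 1):
        product *= i
    return product


# Compare against the standard library for n = 0..20.
mismatches = [n for n in range(21) if factorial(n) != math.factorial(n)]
print("mismatches:", mismatches)  # prints "mismatches: []"
```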

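```python
def combination(n, k):
    """Return C(n, k) = n(n-1)...(n-k+1) / k!, the number of k-item subsets of n items."""
    product = 1
    for i in range(n - k + 1, n + 1):  # multiply only the top k factors of n!
        product *= i
    # k! always divides a product of k consecutive integers, so // is exact
    # and keeps the result an int rather than a float.
    return product // factorial(k)
```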
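The loop multiplies only the top k factors of n!, because the remaining (n - k)! factors cancel against the denominator. A sketch like this one checks the helper against `math.comb` (available since Python 3.8) for every k from 0 to 49; the definitions are repeated so it runs on its own.

```python
import math


def factorial(n):
    product = 1
    for i in range(1, n + 1):
        product *= i
    return product


def combination(n, k):
    # Falling factorial n(n-1)...(n-k+1), divided by k!
    product = 1
    for i in range(n - k + 1, n + 1):
        product *= i
    return product // factorial(k)


assert all(combination(49, k) == math.comb(49, k) for k in range(50))
print(combination(49, 6))  # 13983816
```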

```python
def one_ticket_probability(numbers):
    """Return, as text, the probability that a single 6/49 ticket wins the jackpot."""
    outcomes = combination(49, 6)  # 13,983,816 possible tickets
    prob = 1 / outcomes * 100      # convert to a percentage
    return "The probability of winning is {:.8f} percent when you choose {}".format(prob, numbers)
```

```python
one_ticket_probability([1, 2, 3, 4, 5, 6])  # testing the function
```

```
'The probability of winning is 0.00000715 percent when you choose [1, 2, 3, 4, 5, 6]'
```
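The result does not depend on which numbers the user picks, since every ticket has the same chance. If the app's engineers need the probability as a number rather than a sentence, one option (an assumption about what they might want, not part of the project brief) is to return an exact `fractions.Fraction` and format it only when displaying it. `jackpot_probability` is a hypothetical name.

```python
import math
from fractions import Fraction


def jackpot_probability():
    """Hypothetical helper: exact probability that one 6/49 ticket wins the jackpot."""
    return Fraction(1, math.comb(49, 6))


p = jackpot_probability()
print(p)                                        # 1/13983816
print("{:.8f} percent".format(float(p) * 100))  # 0.00000715 percent
```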

## Comparing Data on Past Drawings 
With these functions in place, we can move on to the next step. The medical team wants the app to incorporate historical data from the 6/49 lottery so that users can check whether their numbers have ever won. The data is available [here](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) and covers drawings from 1982 to 2018. 

```python
import pandas as pd

draws = pd.read_csv('Datasets/lottery_data.csv')
```

```python
draws.head(3)
```

|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |


```python
draws.shape
```

```
(3665, 11)
```

```python
draws.tail(3)
```

|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


```python
def extract_numbers(row):
    # Positions 4-10 are NUMBER DRAWN 1-6 plus BONUS NUMBER, so each set has seven numbers.
    return set(row.iloc[4:11])
```

```python
all_sets = draws.apply(extract_numbers, axis=1)
```

```python
all_sets
```

```
0        {3, 41, 11, 12, 43, 14, 13}
1         {33, 36, 37, 39, 8, 41, 9}
2         {1, 34, 6, 39, 23, 24, 27}
3         {34, 3, 9, 10, 43, 13, 20}
4        {34, 5, 45, 14, 47, 21, 31}
                    ...             
3660    {35, 38, 40, 41, 10, 15, 23}
3661    {36, 46, 47, 19, 25, 26, 31}
3662     {32, 34, 6, 16, 22, 24, 31}
3663      {2, 38, 8, 15, 49, 21, 31}
3664    {35, 37, 14, 48, 17, 24, 31}
Length: 3665, dtype: object
```
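Because positions 4 to 10 include the `BONUS NUMBER` column, each set above contains seven numbers, while a 6/49 ticket has six. If the app should compare tickets against the six main numbers only, a variant could slice positions 4 to 9 instead. The sketch below uses the hypothetical name `extract_main_numbers` and tests it on a small DataFrame rebuilt from the three rows shown by `draws.head(3)`.

```python
import pandas as pd

# Rows copied from draws.head(3); column names match the dataset.
columns = ["PRODUCT", "DRAW NUMBER", "SEQUENCE NUMBER", "DRAW DATE",
           "NUMBER DRAWN 1", "NUMBER DRAWN 2", "NUMBER DRAWN 3",
           "NUMBER DRAWN 4", "NUMBER DRAWN 5", "NUMBER DRAWN 6", "BONUS NUMBER"]
sample = pd.DataFrame([
    [649, 1, 0, "6/12/1982", 3, 11, 12, 14, 41, 43, 13],
    [649, 2, 0, "6/19/1982", 8, 33, 36, 37, 39, 41, 9],
    [649, 3, 0, "6/26/1982", 1, 6, 23, 24, 27, 39, 34],
], columns=columns)


def extract_main_numbers(row):
    """Hypothetical variant: the six drawn numbers, without the bonus number."""
    return set(row.iloc[4:10])


main_sets = sample.apply(extract_main_numbers, axis=1)
print(main_sets[0] == {3, 11, 12, 14, 41, 43})  # True
```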

```python
def check_historical_occurence(values, all_sets):
    """Report how often `values` appeared in past draws, together with the win probability."""
    entry = set(values)
    matches = sum(entry == s for s in all_sets)  # number of past draws equal to the ticket
    if matches == 0:
        history = "This selection of numbers has never occurred in the lottery before."
    elif matches == 1:
        history = "This selection of numbers has occurred once in the past."
    else:
        history = "This selection of numbers has occurred {} times in the past.".format(matches)
    return "{} The probability of winning with {} is 0.00000715 percent.".format(history, values)
```

```python
check_historical_occurence([2, 38, 10, 15, 49, 21, 31], all_sets)
```

```
'This selection of numbers has never occurred in the lottery before. The probability of winning with [2, 38, 10, 15, 49, 21, 31] is 0.00000715 percent.'
```
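`check_historical_occurence` scans all 3,665 draws every time it is called. If the app needs to answer many such queries, one alternative (a sketch, not the project's required approach) is to count each drawn combination once as a `frozenset` in a `collections.Counter`; after that, each lookup is a single dictionary access. `build_draw_counter` and `count_occurrences` are hypothetical names, and the sample sets are the last three entries of `all_sets` shown above.

```python
from collections import Counter


def build_draw_counter(sets):
    """Hypothetical helper: map each drawn combination (as a frozenset) to its count."""
    return Counter(frozenset(s) for s in sets)


def count_occurrences(values, counter):
    """Return how many past draws exactly match `values` (order ignored)."""
    return counter[frozenset(values)]  # a missing key counts as 0


# Last three entries of all_sets shown above.
sample_sets = [
    {32, 34, 6, 16, 22, 24, 31},
    {2, 38, 8, 15, 49, 21, 31},
    {35, 37, 14, 48, 17, 24, 31},
]
counter = build_draw_counter(sample_sets)
print(count_occurrences([2, 38, 8, 15, 49, 21, 31], counter))   # 1
print(count_occurrences([2, 38, 10, 15, 49, 21, 31], counter))  # 0
```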

```python
def multi_ticket_probability(num_tickets):
    """Return the chance of winning the jackpot with `num_tickets` distinct tickets."""
    outcomes = combination(49, 6)
    prob_win = num_tickets / outcomes * 100  # each distinct ticket covers one outcome
    if num_tickets == 1:
        return "The chance of winning with one ticket is {:.8f} percent.".format(prob_win)
    else:
        return "The chance of winning with {:,} tickets is {:.8f} percent.".format(num_tickets, prob_win)
```


```python
inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for i in inputs:
    print(multi_ticket_probability(i))
```

```
The chance of winning with one ticket is 0.00000715 percent.
The chance of winning with 10 tickets is 0.00007151 percent.
The chance of winning with 100 tickets is 0.00071511 percent.
The chance of winning with 10,000 tickets is 0.07151124 percent.
The chance of winning with 1,000,000 tickets is 7.15112384 percent.
The chance of winning with 6,991,908 tickets is 50.00000000 percent.
The chance of winning with 13,983,816 tickets is 100.00000000 percent.
```


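The output shows how poor these odds are: even 1,000,000 different tickets give only about a 7% chance of winning the jackpot, and a 50% chance requires 6,991,908 different tickets, half of all 13,983,816 possible combinations. 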
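Running the calculation in the other direction may also help users: given a target chance, how many tickets would they need? A minimal sketch, assuming every ticket is a different combination (the same assumption behind `multi_ticket_probability`); `tickets_needed` is a hypothetical name.

```python
import math
from fractions import Fraction

TOTAL_TICKETS = math.comb(49, 6)  # 13,983,816 possible combinations


def tickets_needed(target_percent):
    """Hypothetical helper: fewest distinct tickets giving at least `target_percent` chance."""
    # Fraction keeps the arithmetic exact, so 50% gives exactly half of all tickets.
    return math.ceil(Fraction(target_percent) / 100 * TOTAL_TICKETS)


for pct in [1, 7, 50, 100]:
    print("{}% chance: {:,} tickets".format(pct, tickets_needed(pct)))
```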

## Finding the Probability of Having Few Matches
Most 6/49 lotteries also pay smaller prizes when a ticket matches two, three, four, or five of the six numbers drawn. Users of the app may therefore want to know how likely a partial match is. Below, we write a function that gives the probability of matching exactly a given number of the six winning numbers. 
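To match exactly $k$ numbers, a ticket must contain $k$ of the 6 winning numbers and $6 - k$ of the 43 numbers that were not drawn. Counting those tickets and dividing by the total number of tickets gives the hypergeometric probability that the function below computes:

$$
P(\text{exactly } k \text{ matches}) = \frac{\binom{6}{k}\binom{43}{6-k}}{\binom{49}{6}}, \qquad k = 0, 1, \dots, 6.
$$

For example, with $k = 2$ this is $\frac{15 \cdot 123{,}410}{13{,}983{,}816} = \frac{1{,}851{,}150}{13{,}983{,}816} \approx 0.1324$.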

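```python
def probability_less_6(val):
    """Return, as text, the chance that a ticket matches exactly `val` of the six winning numbers."""
    partial_matches = combination(6, val)     # ways to pick `val` of the 6 winning numbers
    fulls_with_pm = combination(43, 6 - val)  # ways to fill the rest from the 43 other numbers
    good_outcomes = partial_matches * fulls_with_pm
    prob_good = good_outcomes / combination(49, 6) * 100
    return "The chance of having {} numbers match is {:.6f} percent.".format(val, prob_good)
```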

```python
in_vals = [2, 3, 4, 5]
for n in in_vals:
    print(probability_less_6(n))
```

```
The chance of having 2 numbers match is 13.237803 percent.
The chance of having 3 numbers match is 1.765040 percent.
The chance of having 4 numbers match is 0.096862 percent.
The chance of having 5 numbers match is 0.001845 percent.
```
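As a consistency check, the probabilities for every possible match count, from 0 to 6, should add up to exactly 1. A sketch using exact fractions (the name `exact_match_probability` is hypothetical) confirms this:

```python
import math
from fractions import Fraction


def exact_match_probability(k):
    """Hypothetical helper: exact probability that a ticket matches exactly k winning numbers."""
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), math.comb(49, 6))


distribution = {k: exact_match_probability(k) for k in range(7)}
print(sum(distribution.values()))  # 1
for k, p in distribution.items():
    print("{} matches: {:.6f} percent".format(k, float(p) * 100))
```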


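We can also add these probabilities together to find the chance of matching at least a given number of the winning numbers while still missing the jackpot:

```python
def tot_probability_less_6(val):
    """Return the chance that a ticket matches at least `val` but fewer than six winning numbers."""
    tot_prob = 0
    for n in range(val, 6):  # sum exact-match probabilities for val, ..., 5 (jackpot excluded)
        partial_matches = combination(6, n)
        fulls_with_pm = combination(43, 6 - n)
        good_outcomes = partial_matches * fulls_with_pm
        prob_good = good_outcomes / combination(49, 6) * 100
        tot_prob += prob_good
    return "The chance of having at least {} but fewer than 6 numbers match is {:.6f} percent.".format(val, tot_prob)
```

```python
for n in in_vals:
    print(tot_probability_less_6(n))
```

```
The chance of having at least 2 but fewer than 6 numbers match is 15.101550 percent.
The chance of having at least 3 but fewer than 6 numbers match is 1.863747 percent.
The chance of having at least 4 but fewer than 6 numbers match is 0.098707 percent.
The chance of having at least 5 but fewer than 6 numbers match is 0.001845 percent.
```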
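`tot_probability_less_6` stops summing at five matches. If the app should also report the chance of matching at least k numbers with the jackpot included, the loop only needs to run through 6. A sketch with exact fractions, using the hypothetical names `exact_match_probability` and `at_least_probability`:

```python
import math
from fractions import Fraction


def exact_match_probability(k):
    """Exact probability of matching exactly k of the six winning numbers."""
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), math.comb(49, 6))


def at_least_probability(k):
    """Hypothetical helper: probability of matching k or more numbers, jackpot included."""
    return sum(exact_match_probability(n) for n in range(k, 7))


for k in [2, 3, 4, 5]:
    print("At least {} matches: {:.6f} percent".format(k, float(at_least_probability(k)) * 100))
```

The only difference from the partial-match total is the single jackpot combination, which adds 1/13,983,816 to each value.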
