# Treating Gambling Addiction - Building a Probability App for Lottery Players 

# Introduction

The purpose of this project is to build a mobile app to help lottery addicts better estimate their chances of winning. Many people start playing the lottery for fun, but for some this activity turns into a habit which eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending from their savings and loans, start to accumulate debts, and eventually engage in desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to mitigate the issue. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities. 

More specifically, our goal is to answer the following questions:
- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

Thanks to our analysis, we found out that the probabilities of winning the lottery are quite low in all three scenarios. Let's now start creating the core functions to discover how we got to such results.

# Core Functions

Throughout the project, we'll need to calculate probabilities and combinations. As a consequence, we'll start by writing two functions that we'll use often:

- A function that calculates factorials; and
- A function that calculates combinations.

It's worth mentioning that in the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. The drawing is done without replacement, which means once a number is drawn, it's not put back in the set.

Let's now try to create the two functions.

In [1]:
# Creating a function that calculates factorials
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

# Creating a function that calculates combinations
def combinations(n, k):
    numerator = factorial(n)
    denominator = factorial(k)*factorial(n-k)
    return numerator / denominator 
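
As a sanity check on the two functions above, the same count can be cross-checked against Python's standard library. This is only a sketch, assuming Python 3.8 or later, where `math.comb` is available. Note that `combinations()` above uses true division (`/`), so it returns a float, which is why the printed totals later in this project end in `.0`. `math.comb` and floor division (`//`) return integers instead.

```python
import math

n, k = 49, 6

# The same formula as combinations(), but with integer (floor) division
via_factorials = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Standard-library equivalent (Python 3.8+)
via_comb = math.comb(n, k)

print(via_factorials, via_comb)  # 13983816 13983816
```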

# One-ticket Probability

Now that we've created the two core functions, we can focus on building the next ones. We'll now aim at writing a function that calculates the probability of winning the big prize. More specifically, we'll answer the first of the three goal questions: 

*"What is the probability of winning the big prize with a single ticket?"* 

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (remember that for each ticket a player chooses six numbers out of 49). So, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.

In [2]:
# Creating a function that calculates the probability of winning the big prize 
def one_ticket_probability(ticket):
    c = combinations(49, 6)
    outcome = 1 
    probability = outcome / c
    probability_pct = probability*100 # calculating the percentage of the probability 
    print (
    "You have a 1 in {:,} or {:7f}% chance of winning with {}".format(c, probability_pct, ticket)) # printing values in a friendly way for non-technical audiences 

In [3]:
# Testing the function with a few inputs 
ticket_test_1 = [1, 2, 5, 3, 7, 11]
ticket_test_2 = [12, 31, 22, 8, 16, 49]
one_ticket_probability(ticket_test_1)
print('\n')
one_ticket_probability(ticket_test_2)

You have a 1 in 13,983,816.0 or 0.000007% chance of winning with [1, 2, 5, 3, 7, 11]


You have a 1 in 13,983,816.0 or 0.000007% chance of winning with [12, 31, 22, 8, 16, 49]


In the above steps, we created a function to calculate the probability of a user winning the big prize. We calculated the total number of possible outcomes by using the `combinations()` function. Then, we divided 1 by that total, since there is only one successful outcome. This gave us the probability of winning, which we then converted into a percentage. Finally, we used the `str.format()` method to make the result more friendly for users.
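
Note that `one_ticket_probability()` accepts any list without inspecting it, because every valid ticket has the same chance of winning. If the app needs to reject malformed tickets before reporting a probability, a check along the following lines might help. This is a minimal sketch: `is_valid_ticket()` and its parameters are hypothetical names, not part of the original project.

```python
def is_valid_ticket(ticket, pool_size=49, picks=6):
    """Return True if `ticket` holds `picks` distinct integers in 1..pool_size.

    Hypothetical helper for illustration only.
    """
    # A ticket must contain exactly six numbers with no repeats
    if len(ticket) != picks or len(set(ticket)) != picks:
        return False
    # Every number must be an integer within the lottery's range
    return all(isinstance(n, int) and 1 <= n <= pool_size for n in ticket)


print(is_valid_ticket([1, 2, 5, 3, 7, 11]))   # True
print(is_valid_ticket([1, 1, 5, 3, 7, 11]))   # False (repeated number)
print(is_valid_ticket([0, 2, 5, 3, 7, 11]))   # False (0 is out of range)
```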

# Historical Data Check for Canada Lottery

So far, we've managed to find the probability of winning the big prize with a single ticket. For the first version of the app, however, users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

So, now we'll focus on exploring the historical data coming from the Canada 6/49 lottery. Let's read in the dataset and explore it first.

In [4]:
# Reading in the dataset 
import pandas as pd 
lot_649 = pd.read_csv("~/Desktop/my_projects/data/{0}".format('649.csv'))

In [5]:
lot_649.shape

(3665, 11)

In [6]:
lot_649.head()

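| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |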


In [7]:
lot_649.tail()

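| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3660 | 649 | 3587 | 0 | 6/6/2018 | 10 | 15 | 23 | 38 | 40 | 41 | 35 |
| 3661 | 649 | 3588 | 0 | 6/9/2018 | 19 | 25 | 31 | 36 | 46 | 47 | 26 |
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |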


As shown above, the data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

- `NUMBER DRAWN 1`
- `NUMBER DRAWN 2`
- `NUMBER DRAWN 3`
- `NUMBER DRAWN 4`
- `NUMBER DRAWN 5`
- `NUMBER DRAWN 6`

## Function for Historical Data Check

Now that we've briefly explored the Canada lottery data set, we're going to analyse this historical data. We'll write a function that compares a user's ticket against the historical lottery data in Canada and determines whether they would have ever won by now.

First, we'll extract the six winning numbers of every drawing in the historical data set as Python sets with a function named `extract_numbers()`. Next, we'll write a function named `check_historical_occurrence()` that takes in two inputs: a Python list containing the user's numbers and a pandas Series containing sets with the winning numbers (this is the Series we'll extract using the `extract_numbers()` function). Finally, we'll test our functions with a few inputs.

In [8]:
def extract_numbers(row):
    # Converting values in selected columns to a set
    row = row[['NUMBER DRAWN 1', 'NUMBER DRAWN 2', 'NUMBER DRAWN 3', 
               'NUMBER DRAWN 4', 'NUMBER DRAWN 5', 'NUMBER DRAWN 6']] 
    row = set(row.values)
    return row 

# Extracting all the winning numbers through the apply() method
winning_numbers = lot_649.apply(extract_numbers, axis=1)
winning_numbers.head()

0    {3, 41, 11, 12, 43, 14}
1    {33, 36, 37, 39, 8, 41}
2     {1, 6, 39, 23, 24, 27}
3     {3, 9, 10, 43, 13, 20}
4    {34, 5, 14, 47, 21, 31}
dtype: object

In [9]:
# Comparing the user input (python list) with the winning_numbers (pandas series)
# Calculating the number of times the users input has occurred before using winning_numbers as a reference 
def check_historical_occurrence(user_no, winning_no):
    user_no = set(user_no)
    result = user_no == winning_no
    historical_occurrence = result.sum()
        
    if historical_occurrence == 0:
        print("""
The combination {} has never occurred in the past. 
However, your chances to win the next draw with {} are 1 in 13,983,816 or 0.0000072%. 
""".format(user_no, user_no) 
             ) # printing values in a friendly way for non-technical audiences 
    else:
        print("""
The combination {} has occurred {} times in the past. 
However, this doesn't guarantee that you'll win the next draw.
You have a 1 in 13,983,816 or 0.0000072% chance of winning with {}.
""".format(user_no, historical_occurrence, user_no)
             )

In [10]:
# Testing the function above
user_no_test_1 = [1, 2, 11, 3, 7, 48]
user_no_test_2 = [19, 31, 38, 8, 16, 49]
test_1 = check_historical_occurrence(user_no_test_1, winning_numbers)
print('\n')
test_2 = check_historical_occurrence(user_no_test_2, winning_numbers)


The combination {1, 2, 3, 7, 11, 48} has never occurred in the past. 
However, your chances to win the next draw with {1, 2, 3, 7, 11, 48} are 1 in 13,983,816 or 0.0000072%. 




The combination {38, 8, 16, 49, 19, 31} has never occurred in the past. 
However, your chances to win the next draw with {38, 8, 16, 49, 19, 31} are 1 in 13,983,816 or 0.0000072%. 



To compare the user's numbers to the historical data, we created two functions. The first, `extract_numbers()`, extracts the six winning numbers of a drawing as a Python set; applying it to every row gives a pandas Series of sets, isolated from the rest of the data frame. The second, `check_historical_occurrence()`, takes in the user's numbers as a Python list and the pandas Series of past winning numbers. It turns the list into a set so that it can be compared with each set in the Series, which tells us whether the combination matches any past draw. The function then prints the number of matches, if any, along with the probability of winning the big prize in the next draw. This will give lottery addicts a more realistic perspective on what to expect.
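
The set comparison at the heart of `check_historical_occurrence()` can also be illustrated without pandas. The sketch below is a simplified stand-in, not the function used above: `count_occurrences()` is a hypothetical name, and `past_draws` contains only the first five drawings shown in the `lot_649.head()` output. Because sets ignore order, a ticket matches a draw regardless of the order in which the numbers were picked.

```python
# First five drawings from the data set preview (numbers only, bonus excluded)
past_draws = [
    {3, 11, 12, 14, 41, 43},
    {8, 33, 36, 37, 39, 41},
    {1, 6, 23, 24, 27, 39},
    {3, 9, 10, 13, 20, 43},
    {5, 14, 21, 31, 34, 47},
]


def count_occurrences(user_numbers, draws):
    """Count how many past draws contain exactly the user's six numbers."""
    user_set = set(user_numbers)
    # Each comparison is True (1) or False (0), so sum() counts the matches
    return sum(user_set == draw for draw in draws)


print(count_occurrences([43, 41, 14, 12, 11, 3], past_draws))  # 1
print(count_occurrences([1, 2, 11, 3, 7, 48], past_draws))     # 0
```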

# Multi-ticket Probability

Lottery addicts often play more than one ticket on a single drawing, thinking that this might increase their chances of winning. Let's help them better estimate those chances. We will write a function to calculate the chances of winning for any number of different tickets. In this way, we'll be able to answer the second of the three goal questions.

In [11]:
# Writing a function that prints the probability of winning the big prize depending on the number of different tickets played
def multi_ticket_probability(n_tickets):
    tot_outcomes = combinations(49, 6)
    winning_outcomes = n_tickets
    probability = winning_outcomes / tot_outcomes
    probability_pct = probability * 100 # Calculating probability in percentage 
    if n_tickets == 1:
        print(
'''You have a 1 in {:,} or a {:.7f}% chance of winning
if you play with {:,} ticket. 
'''.format(tot_outcomes, probability_pct, n_tickets)
        ) # Printing values in a friendly way for non-technical audiences 

    else:
        new_tot_outcomes = int(tot_outcomes/winning_outcomes)
        print(
'''You have a 1 in {:,} or a {:.7f}% chance of winning
if you play with {:,} tickets. 
'''.format(new_tot_outcomes, probability_pct, n_tickets)
        )

In [12]:
# Testing function using different inputs 
test_1 = [1, 10, 100, 10000, 1000000, 6991908, 13983816]

for i in test_1:
    multi_ticket_probability(i)

You have a 1 in 13,983,816.0 or a 0.0000072% chance of winning
if you play with 1 ticket. 

You have a 1 in 1,398,381 or a 0.0000715% chance of winning
if you play with 10 tickets. 

You have a 1 in 139,838 or a 0.0007151% chance of winning
if you play with 100 tickets. 

You have a 1 in 1,398 or a 0.0715112% chance of winning
if you play with 10,000 tickets. 

You have a 1 in 13 or a 7.1511238% chance of winning
if you play with 1,000,000 tickets. 

You have a 1 in 2 or a 50.0000000% chance of winning
if you play with 6,991,908 tickets. 

You have a 1 in 1 or a 100.0000000% chance of winning
if you play with 13,983,816 tickets. 



We created a function, `multi_ticket_probability()`, to help users better estimate their chances of winning with any number of different tickets. The function takes in the number of tickets that the user wants to play. It calculates the total number of possible outcomes through the `combinations()` function, then divides the number of tickets by that total and converts the result into a percentage. Finally, it prints the probability of winning with that number of tickets.

As shown above, the probability of winning increases with the number of tickets played. However, users would need to play a ridiculously high number of tickets to get any significant chance of winning. For instance, it takes 1,000,000 tickets to have only a 7% chance of winning.
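
The same relationship can be turned around to ask how many different tickets are needed to reach a given chance of winning. The following is a minimal sketch under the same assumptions as above (every ticket is different, and there are 13,983,816 possible outcomes); `tickets_needed()` is a hypothetical helper that is not part of the original project. It uses `fractions.Fraction` to avoid floating-point rounding.

```python
import math
from fractions import Fraction

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets (Python 3.8+)


def tickets_needed(target_probability):
    """Smallest number of different tickets whose chance of winning
    the big prize is at least `target_probability` (between 0 and 1)."""
    target = Fraction(str(target_probability))
    if not 0 < target <= 1:
        raise ValueError("target_probability must be in the range (0, 1]")
    # n tickets win with probability n / TOTAL_OUTCOMES, so round up
    return math.ceil(target * TOTAL_OUTCOMES)


print("{:,}".format(tickets_needed(0.01)))  # 139,839
print("{:,}".format(tickets_needed(0.5)))   # 6,991,908
```

Even a 1% chance of winning would require close to 140,000 different tickets.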

# Function for Fewer Winning Numbers

Now we're going to write a function to allow the users to calculate probabilities for two, three, four, or five winning numbers. In this way, we'll answer the third and last one of the three goal questions: 

*"What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?"*

For more context, most 6/49 lotteries offer smaller prizes if a player's ticket matches two to five of the six numbers drawn, so users want to know the probability of having two, three, four, or five winning numbers.

We'll first write a function named `probability_less_6()` which takes in an integer between `2` and `5` and prints the chances of having exactly that many winning numbers on a ticket. The probability of having at least that many winning numbers is the sum of the exact probabilities for that count and every higher count up to six. We'll then test the function with each valid input.

In [13]:
def probability_less_6(n):
    # Finding the probability of having exactly n (2 to 5) winning numbers on a ticket
    if n not in range(2, 6):
        raise ValueError("n must be an integer between 2 and 5")

    tot_outcomes = combinations(49, 6)

    # Number of ways to choose n of the 6 winning numbers
    c = combinations(6, n)
    # Number of ways to choose the other 6 - n numbers from the 43 non-winning numbers
    remainder = combinations(43, 6 - n)
    winning_outcome = c * remainder

    # Calculating the probability of having exactly n winning numbers on a ticket
    probability = winning_outcome / tot_outcomes
    probability_pct = probability * 100 # Multiplying by 100 converts to a percentage
    new_tot_outcomes = int(tot_outcomes / winning_outcome)

    print('''
You have a 1 in {:,} chance, 
or a {:.4f}% chance of having {} winning numbers on a ticket.
'''.format(new_tot_outcomes, probability_pct, n)
         )

In [14]:
# testing the probability_less_6 function
n_winning = [2, 3, 4, 5]
for i in n_winning:
    probability_less_6(i)
    print('\n') # prints a newline after each iteration


You have a 1 in 7 chance, 
or a 13.2378% chance of having 2 winning numbers on a ticket.




You have a 1 in 56 chance, 
or a 1.7650% chance of having 3 winning numbers on a ticket.




You have a 1 in 1,032 chance, 
or a 0.0969% chance of having 4 winning numbers on a ticket.




You have a 1 in 54,200 chance, 
or a 0.0018% chance of having 5 winning numbers on a ticket.





We created a function to calculate probabilities for two to five winning numbers. The `probability_less_6()` function follows these steps:
- It counts the outcomes with exactly `n` winning numbers (`n` between 2 and 5). Using the `combinations()` function, it multiplies the number of ways to choose `n` of the 6 winning numbers by the number of ways to choose the remaining `6 - n` numbers from the 43 non-winning numbers. For instance, for 5 winning numbers it computes `combinations(6, 5) * combinations(43, 1)`, which is 6 * 43 = 258.
- It divides the number of winning outcomes by the total number of possible outcomes, then multiplies the result by 100 to get the percentage.
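
The third goal question asks about having at least a given number of winning numbers, whereas `probability_less_6()` reports the chance of having exactly that many. The two are related by a sum over the exact probabilities. The sketch below illustrates this with `math.comb` (Python 3.8+); `exact_match_probability()` and `at_least_probability()` are hypothetical names introduced only for this example.

```python
import math


def exact_match_probability(n):
    """Probability of exactly n winning numbers on one 6/49 ticket."""
    # Choose n of the 6 winning numbers and 6 - n of the 43 others
    winning_outcomes = math.comb(6, n) * math.comb(43, 6 - n)
    return winning_outcomes / math.comb(49, 6)


def at_least_probability(n):
    """Probability of n or more winning numbers on one 6/49 ticket."""
    return sum(exact_match_probability(k) for k in range(n, 7))


for n in range(2, 6):
    print("exactly {}: {:.4f}%   at least {}: {:.4f}%".format(
        n, exact_match_probability(n) * 100, n, at_least_probability(n) * 100))
```

For two winning numbers, the exact probability is about 13.2%, while the "at least" probability is about 15.1%.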

Lottery addicts can now see the realistic probability of winning the big prize. Yet they may keep on playing, because most 6/49 lotteries offer smaller prizes. From what we can see above, however, there is only about a 13% chance of having exactly 2 winning numbers on a ticket, and the chances drop sharply for 3 or more.

# Conclusion

In this project, our purpose was to help build a mobile app that lets lottery addicts better estimate their chances of winning. A medical institute that aims to prevent and treat gambling addictions wanted to build a dedicated mobile app, and we've helped their engineering team by creating the logical core of the app and calculating probabilities. Eventually, we managed to answer the following questions:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having at least five (or four, or three, or two) winning numbers on a single ticket?

We then created functions that gave us the following answers:
- There is a 1 in 13,983,816 or a 0.0000072% chance of winning the big prize with a single ticket.
- There is a 1 in 13 or a 7.2% chance of winning the big prize with 1,000,000 tickets.
- There is a 15.1% chance of having at least 2 winning numbers on a single ticket.

Although we've laid out the logic behind the probabilities of winning the lottery, we realistically expect players to keep on playing. However, reasoning with numbers might help persuade the less addicted players to avoid being drawn into full addiction. The app might be less effective for truly addicted players, who might not be persuaded that easily.