# Mobile app for lottery addiction
## Dataquest project

This notebook works through a scenario from the Dataquest.io curriculum.

#### Scenario
In this fictional scenario, we want to build an app that combats lottery addiction by showing users their actual chances of winning money in the 6/49 lottery. This project covers the probability calculations behind such an app.

In the 6/49 lottery, every ticket has a combination of 6 distinct numbers. To win money, a ticket needs at least 2 correct numbers, and the winnings increase with the number of correct numbers.

#### Dataset
Historical data of the 6/49 lottery drawings from 1982 until 2018.

To calculate probabilities, we'll repeat the same calculations several times, so we first define the two helper functions below.

In [2]:
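def factorial(n):
    # n! = n * (n-1) * ... * 1; the loop is skipped for n = 0, so 0! = 1
    total = 1
    for i in range(n, 0, -1):
        total *= i
    return total

def combinations(n, k):
    # Number of ways to choose k items out of n when order doesn't matter:
    # n! / (k! * (n-k)!)
    # True division means the result is returned as a float (e.g. 10.0)
    return factorial(n) / (factorial(k) * factorial(n - k))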

In [3]:
# Checking to see whether the functions are correct
print(factorial(5) == 5 * 4 * 3 * 2 * 1)
print(factorial(7) == 7 * 6 * 5 * 4 * 3 * 2 * 1)
print(combinations(5,2) == 10)
print(combinations(9,5) == 126)

True
True
True
True
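
As an additional sanity check, the hand-written helper can be compared against the standard library. The sketch below assumes Python 3.8 or later, where `math.comb` is available; `combinations_int` is a hypothetical integer-only variant and not part of the original notebook.

```python
import math

def combinations_int(n, k):
    """Hypothetical integer-only variant of combinations(n, k)."""
    # Floor division keeps the result an exact int instead of a float.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Compare against math.comb for every valid k in a 49-number lottery.
for k in range(50):
    assert combinations_int(49, k) == math.comb(49, k)

print(combinations_int(49, 6))  # 13983816
```

If the assertions pass silently, the formula agrees with the library for every possible selection size.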


We'll first focus on the big prize. To win it, all 6 numbers on the ticket must match the 6 numbers drawn, without replacement, from the numbers 1 through 49.

In [3]:
def one_ticket_probability():
    outcomes = combinations(49,6)
    return "The chance of winning the big prize if you have one ticket is 1 in {:,.0f}.".format(outcomes)

In [4]:
one_ticket_probability()

'The chance of winning the big prize if you have one ticket is 1 in 13,983,816.'

We also want users to be able to compare their ticket against the historical lottery drawings, to see whether it would ever have won the big prize.

In [5]:
import pandas as pd
drawings = pd.read_csv("649.csv")

In [6]:
drawings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3665 entries, 0 to 3664
Data columns (total 11 columns):
PRODUCT            3665 non-null int64
DRAW NUMBER        3665 non-null int64
SEQUENCE NUMBER    3665 non-null int64
DRAW DATE          3665 non-null object
NUMBER DRAWN 1     3665 non-null int64
NUMBER DRAWN 2     3665 non-null int64
NUMBER DRAWN 3     3665 non-null int64
NUMBER DRAWN 4     3665 non-null int64
NUMBER DRAWN 5     3665 non-null int64
NUMBER DRAWN 6     3665 non-null int64
BONUS NUMBER       3665 non-null int64
dtypes: int64(10), object(1)
memory usage: 315.0+ KB


None of the 11 columns has missing values across the 3,665 rows, so the dataset appears to be complete and not in need of any cleaning.

In [7]:
drawings.head()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
0,649,1,0,6/12/1982,3,11,12,14,41,43,13
1,649,2,0,6/19/1982,8,33,36,37,39,41,9
2,649,3,0,6/26/1982,1,6,23,24,27,39,34
3,649,4,0,7/3/1982,3,9,10,13,20,43,34
4,649,5,0,7/10/1982,5,14,21,31,34,47,45


In [8]:
drawings.tail()

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER
3662,649,3589,0,6/13/2018,6,22,24,31,32,34,16
3663,649,3590,0,6/16/2018,2,15,21,31,38,49,8
3664,649,3591,0,6/20/2018,14,24,31,35,37,48,17


We're going to create a function that will take the lottery numbers of the user and compare them against the full dataset. We will then inform the user how many times they would have won throughout the years using these numbers.

In [26]:
# Storing individual drawn numbers together in one set in one column
drawings['DRAWN NUMBERS'] = drawings.apply(lambda row: set(row[4:10]), axis=1)

In [30]:
drawings.head(3)

Unnamed: 0,PRODUCT,DRAW NUMBER,SEQUENCE NUMBER,DRAW DATE,NUMBER DRAWN 1,NUMBER DRAWN 2,NUMBER DRAWN 3,NUMBER DRAWN 4,NUMBER DRAWN 5,NUMBER DRAWN 6,BONUS NUMBER,DRAWN NUMBERS
0,649,1,0,6/12/1982,3,11,12,14,41,43,13,"{3, 41, 11, 12, 43, 14}"
1,649,2,0,6/19/1982,8,33,36,37,39,41,9,"{33, 36, 37, 39, 8, 41}"
2,649,3,0,6/26/1982,1,6,23,24,27,39,34,"{1, 6, 39, 23, 24, 27}"
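
Storing each draw as a set makes the later comparison independent of the order in which the numbers are listed. A minimal illustration, using the first draw shown above (`first_draw`, `user_ticket` and `other_ticket` are illustrative names, not part of the notebook):

```python
# First historical draw, as stored in the DRAWN NUMBERS column.
first_draw = {3, 41, 11, 12, 43, 14}

# A ticket entered in a different order still compares equal once it is a set.
user_ticket = set([43, 3, 14, 11, 41, 12])
print(user_ticket == first_draw)  # True

# The intersection gives the numbers a ticket shares with a draw.
other_ticket = {3, 11, 20, 30, 40, 49}
print(len(other_ticket & first_draw))  # 2
```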


In [73]:
def check_historical_occurrence(ticket):
    wins = 0
    ticket = set(ticket)
    games = len(drawings)
    
    for t in drawings['DRAWN NUMBERS']:
        if t == ticket:
            wins += 1
            
    outcomes = combinations(49,6)

    print("""There have been {} drawings since 1982.
    With your ticket numbers you would have won {} times.""".format(games, wins))
    print('\n')
    print("""The probability of you winning the big prize in the next game using one ticket is 1 in {:,.0f}.""".format(outcomes))

In [74]:
check_historical_occurrence([1,2,3,4,5,6])

There have been 3665 drawings since 1982.
    With your ticket numbers you would have won 0 times.


The probability of you winning the big prize in the next game using one ticket is 1 in 13,983,816.


The functions so far inform users of their chances of winning with a single ticket. Since the target audience of our app is (potential) gambling addicts, they are likely to have more than one ticket. We'll create a function that calculates their theoretical probability of winning the big prize with N tickets.

In [67]:
def multi_ticket_probability(n):
    outcomes = combinations(49,6)
    
    print("Using {0} tickets, you have a chance of {0} in {1:,.0f} of winning the big prize".format(n, outcomes))

In [68]:
multi_ticket_probability(2)

Using 2 tickets, you have a chance of 2 in 13,983,816 of winning the big prize
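
The raw ratio can be hard to compare across ticket counts. One possible refinement, sketched here with the standard library's `fractions.Fraction`, reduces it to lowest terms and also reports a percentage. The name `multi_ticket_odds` is hypothetical, `math.comb` requires Python 3.8 or later, and the sketch assumes that all `n` tickets carry different combinations.

```python
from fractions import Fraction
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_odds(n):
    """Return the big-prize probability for n distinct tickets as a Fraction."""
    if not 0 <= n <= TOTAL_OUTCOMES:
        raise ValueError("n must be between 0 and the number of possible tickets")
    # Fraction reduces the ratio to lowest terms automatically.
    return Fraction(n, TOTAL_OUTCOMES)

odds = multi_ticket_odds(2)
print("{} in {:,}".format(odds.numerator, odds.denominator))  # 1 in 6,991,908
print("{:.6%}".format(float(odds)))
```

Showing the reduced form alongside the percentage may make it clearer to users how little a second ticket changes their chances.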


Since users can also win smaller prizes for having between 2 and 5 numbers correct, we will also present them with their chances of winning those. We will use a function that takes the number of correct numbers, n, as its argument.

In [100]:
def probability_less_6(n):
    winning_numbers = combinations(6,n)
    remainder = combinations(43,6-n)
    combos = winning_numbers * remainder
    outcomes = round(combinations(49,6) / combos)
    return "With one ticket, you have a chance of 1 in {:,} of getting {} numbers right".format(outcomes,n)
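
The calculation rests on counting tickets: there are C(6, n) ways to pick n of the six winning numbers and C(43, 6 - n) ways to fill the remaining places from the 43 non-winning numbers. As a consistency check (a sketch assuming Python 3.8 or later for `math.comb`; `matching_tickets` is a hypothetical helper), the counts for n = 0 to 6 should add up to all 13,983,816 possible tickets:

```python
import math

def matching_tickets(n):
    """Number of tickets that match exactly n of the six winning numbers."""
    return math.comb(6, n) * math.comb(43, 6 - n)

total = math.comb(49, 6)

# Every ticket matches exactly 0, 1, ..., or 6 winning numbers,
# so the seven counts must cover all possible tickets.
assert sum(matching_tickets(n) for n in range(7)) == total

for n in range(2, 6):
    print("{} correct: about 1 in {:,}".format(n, round(total / matching_tickets(n))))
# 2 correct: about 1 in 8
# 3 correct: about 1 in 57
# 4 correct: about 1 in 1,032
# 5 correct: about 1 in 54,201
```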