# Mobile App for Lottery Addiction

## Introduction

This project is the final step of the first Probability course on Dataquest.
The main goal is to contribute to the development of a mobile app that helps lottery addicts better estimate their chances of winning.

Many people start playing the lottery for fun, but for some this activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, start to accumulate debts, and may eventually resort to desperate behaviors like theft.

Suppose that a medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

For the first version of the app, they want us to focus on the 6/49 lottery and build functions that enable users to answer questions like:

- What is the probability of winning the big prize with a single ticket?
- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?
- What is the probability of having exactly five (or four, or three, or two) winning numbers on a single ticket?



## Dataset

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set](https://www.kaggle.com/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018.

We'll start by writing two functions that we'll use often:

- A function that calculates factorials
- A function that calculates combinations

In [1]:
import pandas as pd


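def factorial(n):
    # Multiply n * (n - 1) * ... * 1; returns 1 when n is 0
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product


def combinations(n, k):
    # Number of ways to choose k items from n when order doesn't matter
    num = factorial(n)
    den = factorial(k) * factorial(n-k)
    # The quotient is always a whole number, so integer division keeps it exact
    return num // den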
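
As a quick sanity check, the same formula can be compared against Python's built-in `math.comb` (available from Python 3.8). The sketch below re-implements the calculation under a hypothetical name, `n_choose_k`, and is only meant as a cross-check, not as a replacement for the functions above.

```python
import math


def n_choose_k(n, k):
    """Number of ways to choose k items from n, order ignored (hypothetical helper)."""
    # n! / (k! * (n - k)!) is always a whole number, so integer division is exact
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# Total number of possible 6/49 tickets
total_tickets = n_choose_k(49, 6)
print(total_tickets)                      # 13983816
print(total_tickets == math.comb(49, 6))  # True
```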

## Lottery system

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. For example, if a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they win the big prize only if the numbers drawn are exactly those. If even one number differs, they don't win.

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with the various numbers they play on a single ticket (for each ticket a player chooses six numbers out of 49). So, we'll start by building a function that calculates the probability of winning the big prize for any given ticket.

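We can build a function that takes the list of numbers on a ticket and prints the probability of winning the big prize with it. Because every combination of six numbers is equally likely to be drawn, the probability is 1 divided by the total number of possible combinations, whatever numbers are chosen.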

In [2]:
def one_ticket_probability(l):
    outcomes = combinations(49, 6)
    p = 1 / outcomes
    ppercentage = p * 100
    print('''The chances to win the big prize with the set {} are {:.8f}%.
It means 1 in {:,} chances.'''.format(l, ppercentage, int(outcomes)))

Let's test the function with two (pseudo)random lists of numbers.

In [3]:
from random import seed, randint

seed(1)
list1 = []
for _ in range(6):
    list1.append(randint(0, 49))
seed(2)
list2 = []
for _ in range(6):
    list2.append(randint(0, 49))

In [4]:
one_ticket_probability(list1)

The chances to win the big prize with the set [8, 36, 48, 4, 16, 7] are 0.00000715%.
It means 1 in 13,983,816 chances.


In [5]:
one_ticket_probability(list2)

The chances to win the big prize with the set [3, 5, 5, 23, 10, 47] are 0.00000715%.
It means 1 in 13,983,816 chances.
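
Note that `randint(0, 49)` can return 0 and can repeat a value (the second list above contains 5 twice), so these lists are not guaranteed to be valid 6/49 tickets. Below is a minimal sketch of one way to generate and validate tickets, assuming `random.sample` is acceptable for the app's purposes. The names `random_ticket` and `is_valid_ticket` are hypothetical.

```python
import random


def random_ticket(rng=random):
    """Draw six distinct numbers from 1 to 49 (hypothetical helper)."""
    # sample() picks without replacement, so no number can repeat
    return sorted(rng.sample(range(1, 50), 6))


def is_valid_ticket(numbers):
    """Check that a ticket holds six distinct integers between 1 and 49."""
    return (
        len(numbers) == 6
        and len(set(numbers)) == 6
        and all(isinstance(n, int) and 1 <= n <= 49 for n in numbers)
    )


ticket = random_ticket(random.Random(1))  # seeded generator for reproducibility
print(is_valid_ticket(ticket))                 # True
print(is_valid_ticket([3, 5, 5, 23, 10, 47]))  # False: 5 appears twice
```

Validating the input does not change the probability shown to the user, but it avoids reporting chances for a ticket that could never be played.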


## Comparing to previous tickets

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018. For each drawing, we can find the six numbers drawn in the following six columns:

- NUMBER DRAWN 1
- NUMBER DRAWN 2
- NUMBER DRAWN 3
- NUMBER DRAWN 4
- NUMBER DRAWN 5
- NUMBER DRAWN 6

Let's read the dataset and explore it!

In [6]:
lot_canada = pd.read_csv('649.csv')
lot_canada.shape

(3665, 11)

In [7]:
lot_canada.head(3)

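| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |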


In [8]:
lot_canada.tail(3)

| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


We will write a function that compares a given set of numbers (the user's input) with the historical data in this dataset.

In [9]:
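def extract_numbers(row):
    # Positions 4 to 9 hold NUMBER DRAWN 1 through NUMBER DRAWN 6
    row = row.iloc[4:10]
    # Store the six numbers as a set so that order doesn't matter
    row = set(row.values)
    return row

# One set of winning numbers per drawing
winers = lot_canada.apply(extract_numbers, axis=1)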

In [10]:
def check_historical_occurence(l, s):
    # Compare as a set so the order of the user's numbers doesn't matter
    user_numbers = set(l)
    # Boolean Series: True for each past drawing that matches exactly
    occurrence = s == user_numbers
    n_occurrences = occurrence.sum()
    if n_occurrences == 0:
        print('''The combination {} has never occurred.
It does not modify your chances to win the big prize using the set {}, that are 0.00000715%.
You have a 1 in 13,983,816 chance to win.'''.format(l, l))

    else:
        print('''The combination {} has won {} times in the past.
It does not modify your chances to win the big prize using the set {}, that are 0.00000715%.
You have a 1 in 13,983,816 chance to win.'''.format(l, n_occurrences, l))

Let's try our function with the previously created sets.

In [11]:
check_historical_occurence(list1, winers)

The combination [8, 36, 48, 4, 16, 7] has never occurred.
It does not modify your chances to win the big prize using the set [8, 36, 48, 4, 16, 7], that are 0.00000715%.
You have a 1 in 13,983,816 chance to win.


In [12]:
check_historical_occurence(list2, winers)

The combination [3, 5, 5, 23, 10, 47] has never occurred.
It does not modify your chances to win the big prize using the set [3, 5, 5, 23, 10, 47], that are 0.00000715%.
You have a 1 in 13,983,816 chance to win.
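
The same comparison can be sketched without pandas, which may be convenient for unit-testing the logic in isolation. The example below uses the three drawings shown in the `head(3)` output above as a tiny stand-in for the full data set. `count_past_wins` is a hypothetical name, and the function returns the count instead of printing a message.

```python
# First three drawings from the data set, used here as a small sample
past_draws = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [1, 6, 23, 24, 27, 39],
]


def count_past_wins(ticket, draws):
    """Count how many past drawings match the ticket exactly, ignoring order."""
    ticket_set = set(ticket)
    return sum(1 for draw in draws if set(draw) == ticket_set)


print(count_past_wins([43, 41, 14, 12, 11, 3], past_draws))  # 1
print(count_past_wins([8, 36, 48, 4, 16, 7], past_draws))    # 0
```

Whatever the count, the probability of winning the next drawing is unchanged, because each drawing is independent of the previous ones.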


## Multi-ticket

Lottery addicts usually play more than one ticket on a single drawing, thinking that this might increase their chances of winning significantly. Our purpose is to help them better estimate their chances of winning.

We can write a new function that takes the number of different tickets the user wants to play, without requiring the specific combinations they intend to play. The input is an integer between 1 and 13,983,816 (the maximum number of different tickets), and the output is the total probability of winning the big prize.

In [16]:
def multi_ticket_probability(tickets):
    comb = combinations(49, 6)
    p = tickets / comb
    ppercentage = p * 100
    print('''The chances to win the big prize with {} tickets are {:.8f}%.'''.format(tickets, ppercentage))

Let's try this function with some inputs!

In [17]:
test = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for i in test:
    multi_ticket_probability(i)
    print('\n')

The chances to win the big prize with 1 tickets are 0.00000715%.


The chances to win the big prize with 10 tickets are 0.00007151%.


The chances to win the big prize with 100 tickets are 0.00071511%.


The chances to win the big prize with 10000 tickets are 0.07151124%.


The chances to win the big prize with 1000000 tickets are 7.15112384%.


The chances to win the big prize with 6991908 tickets are 50.00000000%.


The chances to win the big prize with 13983816 tickets are 100.00000000%.
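
The probability grows linearly with the number of tickets because different tickets correspond to mutually exclusive outcomes: at most one of them can match the drawing. Below is a small sketch that uses `fractions.Fraction` to keep the arithmetic exact. The name `multi_ticket_fraction` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

TOTAL_TICKETS = comb(49, 6)  # 13,983,816 possible tickets


def multi_ticket_fraction(tickets):
    """Exact probability of winning the big prize with `tickets` different tickets."""
    if not 1 <= tickets <= TOTAL_TICKETS:
        raise ValueError("tickets must be between 1 and 13,983,816")
    # Successful outcomes divided by total outcomes
    return Fraction(tickets, TOTAL_TICKETS)


print(multi_ticket_fraction(6991908))         # 1/2
print(float(multi_ticket_fraction(1000000)))  # about 0.0715
```

Returning a value rather than printing it leaves the formatting to the caller, which may suit an app where the engineering team controls the display.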




## Probabilities for given numbers

Next, we're going to write one more function that lets users calculate the probability of having two, three, four, or five winning numbers. In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn, so users might be interested in knowing these probabilities as well.

The function takes an integer between 2 and 5 that represents the number of winning numbers on the ticket.

In [20]:
def probability_less_6(number):
    # Ways to pick which `number` of the six drawn numbers are on the ticket
    comb_ticket = combinations(6, number)
    # Ways to fill the remaining slots from the 43 numbers that were not drawn
    comb_remaining = combinations(43, 6 - number)
    outcomes = comb_ticket * comb_remaining
    comb_total = combinations(49, 6)
    p = outcomes / comb_total
    ppercentage = p * 100
    # Express the same probability as "1 in N"
    comb = round(comb_total/outcomes)
    print('''You have {:.6f}% chances of winning with {} numbers in this ticket.
It means 1 in {:,} chances to win.'''.format(ppercentage, number, int(comb)))

Let's test this function!

In [21]:
test2 = [2, 3, 4, 5]
for i in test2:
    probability_less_6(i)
    print('\n')

You have 13.237803% chances of winning with 2 numbers in this ticket.
It means 1 in 8 chances to win.


You have 1.765040% chances of winning with 3 numbers in this ticket.
It means 1 in 57 chances to win.


You have 0.096862% chances of winning with 4 numbers in this ticket.
It means 1 in 1,032 chances to win.


You have 0.001845% chances of winning with 5 numbers in this ticket.
It means 1 in 54,201 chances to win.
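
The counts behind these figures follow one pattern: choose k of the six drawn numbers, then fill the remaining 6 - k slots from the 43 numbers that were not drawn. The sketch below, with the hypothetical names `exactly_k` and `at_least_k`, checks that the probabilities for k = 0 through 6 sum to 1 and derives a cumulative "at least k" variant from the exact ones.

```python
from fractions import Fraction
from math import comb

TOTAL_TICKETS = comb(49, 6)


def exactly_k(k):
    """Exact probability that a ticket matches exactly k of the six drawn numbers."""
    return Fraction(comb(6, k) * comb(43, 6 - k), TOTAL_TICKETS)


def at_least_k(k):
    """Exact probability that a ticket matches k or more drawn numbers."""
    return sum(exactly_k(i) for i in range(k, 7))


print(sum(exactly_k(k) for k in range(7)) == 1)  # True: all cases are covered
print(round(float(exactly_k(5)) * 100, 6))       # 0.001845
print(round(float(at_least_k(5)) * 100, 6))      # 0.001852
```

The "at least five" figure is only slightly larger than the "exactly five" one, because the extra case (all six numbers matching) has a single favorable outcome.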


