## Lottery Addiction

Many people start playing the lottery for fun, but for some the activity turns into a habit that eventually escalates into addiction. Like other compulsive gamblers, lottery addicts soon begin spending their savings and taking out loans, accumulate debts, and eventually resort to desperate behaviors like theft.

A medical institute that aims to prevent and treat gambling addictions wants to build a dedicated mobile app to help lottery addicts better estimate their chances of winning. The institute has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities.

The institute also wants us to consider historical data coming from the national 6/49 lottery game in Canada. The [data set](https://www.kaggle.com/datasets/datascienceai/lottery-dataset) has data for 3,665 drawings, dating from 1982 to 2018 (we'll come back to this).

In [8]:
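def factorial(n):
    # Multiply n * (n-1) * ... * 1; returns 1 when n is 0.
    fact = 1
    while n >= 1:
        fact *= n
        n -= 1
    return fact

def combinations(n, k):
    # Number of ways to choose k items from n when order does not matter.
    # Integer division is exact here and keeps the result an integer.
    return factorial(n) // (factorial(n-k) * factorial(k))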

In the 6/49 lottery, six numbers are drawn from a set of 49 numbers that range from 1 to 49. A player wins the big prize if the six numbers on their ticket match all six numbers drawn. If a player has a ticket with the numbers {13, 22, 24, 27, 42, 44}, they only win the big prize if the numbers drawn are {13, 22, 24, 27, 42, 44}. If even one number differs, they don't win.
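As a quick sanity check on the rule above, the number of possible six-number outcomes can be counted directly. This is a minimal sketch that uses the standard library's `math.comb` (Python 3.8+) instead of the hand-written `combinations()` helper; the variable name `total_outcomes` is only illustrative.

```python
import math

# Number of ways to choose 6 numbers out of 49 when order does not matter.
total_outcomes = math.comb(49, 6)
print(total_outcomes)  # 13983816

# Exactly one of those outcomes matches any given ticket.
print(f"1 in {total_outcomes:,}")  # 1 in 13,983,816
```

Because only one outcome matches a given ticket, the chance of winning the big prize with a single ticket is 1 in 13,983,816.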

In [17]:
def one_ticket_probability(numbers):
    total_combinations = combinations(49, len(numbers))
    success = 1
    # Multiply by 100 so the printed value is a true percentage.
    probability = success / total_combinations * 100
    print("There is a {:.10f}% chance of winning the lottery with the numbers we have".format(probability))

In [18]:
one_ticket_probability([13, 22, 24, 27, 42, 44])

There is a 0.0000071511% chance of winning the lottery with the numbers we have
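The function above accepts a list of any length, so a malformed ticket would silently produce a different figure. One possible guard, sketched here under the assumption that the app should reject invalid tickets, is to validate the input first and return the probability as a plain fraction. The names `validate_ticket` and `jackpot_probability` are hypothetical and not part of the original notebook.

```python
import math

def validate_ticket(numbers, pool_size=49, picks=6):
    """Return the ticket as a set, or raise ValueError if it is not a valid ticket."""
    ticket = set(numbers)
    # A duplicate shrinks the set, so both length checks are needed.
    if len(numbers) != picks or len(ticket) != picks:
        raise ValueError(f"a ticket needs {picks} distinct numbers")
    if not all(1 <= n <= pool_size for n in ticket):
        raise ValueError(f"numbers must range from 1 to {pool_size}")
    return ticket

def jackpot_probability(numbers):
    """Probability (a fraction, not a percentage) that the ticket wins the big prize."""
    validate_ticket(numbers)
    return 1 / math.comb(49, 6)

print(validate_ticket([13, 22, 24, 27, 42, 44]) == {13, 22, 24, 27, 42, 44})  # True
```

Returning the value instead of printing it lets the calling code decide how to format the message.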


Users should also be able to compare their ticket against the historical lottery data in Canada and determine whether they would have ever won by now.

The data set contains historical data for 3,665 drawings (each row shows data for a single drawing), dating from 1982 to 2018.

In [19]:
import pandas as pd
lottery = pd.read_csv("649.csv")
lottery.head()

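| | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |
| 3 | 649 | 4 | 0 | 7/3/1982 | 3 | 9 | 10 | 13 | 20 | 43 | 34 |
| 4 | 649 | 5 | 0 | 7/10/1982 | 5 | 14 | 21 | 31 | 34 | 47 | 45 |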


In [20]:
lottery.shape

(3665, 11)

In [21]:
def extract_numbers(row):
    # Collect the six numbers of one drawing (one row) into a set.
    return set(row)

In [22]:
lot_history = lottery.loc[:,'NUMBER DRAWN 1':'NUMBER DRAWN 6'].apply(extract_numbers, axis = 1)

In [25]:
def check_historical_occurrence(current_num, old_nums):
    num = set(current_num)
    # Compare the ticket with each past drawing one at a time; sets ignore order.
    occur = sum(drawn == num for drawn in old_nums)
    print("The combination we have has occurred {} time(s) in the past".format(occur))
    # Past drawings do not change the odds: every combination is equally likely next time.
    total_combinations = combinations(49, 6)
    print("The chance of winning the next drawing with it is {:.10f}%".format(1/total_combinations*100))

In [26]:
check_historical_occurrence([3,11,12,14,41,43], lot_history)

The combination we have has occurred 1 time(s) in the past
The chance of winning the next drawing with it is 0.0000071511%
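The historical check comes down to comparing sets, which can be tried without the CSV file. The sketch below uses three made-up drawings stored as plain lists, standing in for the six NUMBER DRAWN columns; the name `count_occurrences` and the sample rows are illustrative assumptions, not the real data.

```python
# Three made-up drawings; the third deliberately repeats the first.
toy_rows = [
    [3, 11, 12, 14, 41, 43],
    [8, 33, 36, 37, 39, 41],
    [3, 11, 12, 14, 41, 43],
]

# One frozenset per drawing, so the order of numbers within a row does not matter.
toy_history = [frozenset(row) for row in toy_rows]

def count_occurrences(ticket, history):
    """Count how many past drawings match the ticket exactly."""
    target = frozenset(ticket)
    return sum(drawn == target for drawn in history)

print(count_occurrences([43, 41, 14, 12, 11, 3], toy_history))  # 2
print(count_occurrences([1, 2, 3, 4, 5, 6], toy_history))       # 0
```

The first call lists the numbers in reverse order and still matches, which is the reason for converting each drawing to a set.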


So far we have checked whether a given combination has occurred in past drawings and calculated the chance of winning the big prize with a single ticket. Next, we look at how that chance changes when a player buys more than one ticket.

In [28]:
def multi_ticket_probability(n_tickets):
    total_combinations = combinations(49, 6)
    # Each ticket is assumed to carry a different combination; multiply by 100 for a percentage.
    probability = n_tickets / total_combinations * 100
    print("There is a {:.10f}% chance of winning the lottery with {} ticket(s)".format(probability, n_tickets))

In [29]:
tickets = [1, 10, 100, 10000, 1000000, 6991908, 13983816]
for ticket in tickets:
    multi_ticket_probability(ticket)

There is a 0.0000071511% chance of winning the lottery with 1 ticket(s)
There is a 0.0000715112% chance of winning the lottery with 10 ticket(s)
There is a 0.0007151124% chance of winning the lottery with 100 ticket(s)
There is a 0.0715112384% chance of winning the lottery with 10000 ticket(s)
There is a 7.1511238420% chance of winning the lottery with 1000000 ticket(s)
There is a 50.0000000000% chance of winning the lottery with 6991908 ticket(s)
There is a 100.0000000000% chance of winning the lottery with 13983816 ticket(s)
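The ticket count only makes sense between 1 and 13,983,816, and only if every ticket carries a different combination. A minimal sketch of a version that enforces the range and returns a percentage is shown below; `multi_ticket_percentage` and `TOTAL_OUTCOMES` are hypothetical names introduced for illustration.

```python
import math

TOTAL_OUTCOMES = math.comb(49, 6)  # 13,983,816 possible tickets

def multi_ticket_percentage(n_tickets):
    """Percentage chance of winning the big prize with n_tickets distinct tickets."""
    if not 1 <= n_tickets <= TOTAL_OUTCOMES:
        raise ValueError(f"n_tickets must be between 1 and {TOTAL_OUTCOMES:,}")
    return n_tickets / TOTAL_OUTCOMES * 100

print(multi_ticket_percentage(6991908))   # 50.0
print(multi_ticket_percentage(13983816))  # 100.0
```

Buying half of all possible combinations gives a 50% chance, and buying every combination guarantees the big prize.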


In most 6/49 lotteries there are smaller prizes if a player's ticket matches two, three, four, or five of the six numbers drawn. As a consequence, users might be interested in knowing the probability of having exactly two, three, four, or five winning numbers.
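The counting behind these probabilities can be written out. To match exactly k of the six drawn numbers, a ticket must contain k of the 6 winning numbers and 6 − k of the 43 non-winning numbers (the symbol k is introduced here only for illustration):

$$
P(\text{exactly } k \text{ matches}) = \frac{\binom{6}{k}\,\binom{43}{6-k}}{\binom{49}{6}}, \qquad k = 2, 3, 4, 5
$$

For k = 5, for example, there are 6 × 43 = 258 such tickets out of 13,983,816, or about 0.0018%.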

In [39]:
def probability_less(n):
    # Ways to pick n of the 6 winning numbers.
    n_combinations = combinations(6, n)
    # Ways to pick the other 6-n numbers from the 43 non-winning ones.
    remaining_combinations = combinations(43, 6-n)
    total = n_combinations * remaining_combinations

    total_combinations = combinations(49, 6)
    proba = (total/total_combinations)*100
    print("There is a {:.10f}% chance of having exactly {} winning numbers on a ticket".format(proba, n))

In [40]:
for number in [2,3,4,5]:
    probability_less(number)

There is a 13.2378029002% chance of having exactly 2 winning numbers on a ticket
There is a 1.7650403867% chance of having exactly 3 winning numbers on a ticket
There is a 0.0968619724% chance of having exactly 4 winning numbers on a ticket
There is a 0.0018449900% chance of having exactly 5 winning numbers on a ticket
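One way to check these figures is to confirm that the probabilities for every possible number of matches, from zero to six, add up to exactly 1. The sketch below uses exact fractions from the standard library so no rounding is involved; `match_probability` is a hypothetical helper, not part of the original notebook.

```python
import math
from fractions import Fraction

def match_probability(k):
    """Exact probability that a ticket matches exactly k of the six drawn numbers."""
    return Fraction(math.comb(6, k) * math.comb(43, 6 - k), math.comb(49, 6))

# The seven cases (0 through 6 matches) are mutually exclusive and cover everything.
probabilities = {k: match_probability(k) for k in range(7)}
assert sum(probabilities.values()) == 1

for k in (2, 3, 4, 5):
    print(f"{k} matches: {float(probabilities[k]) * 100:.10f}%")
```

The case k = 6 reduces to 1 in 13,983,816, the big-prize probability computed earlier.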


The logical core of the app consists of four functions:

- one_ticket_probability(): calculates the probability of winning the big prize with a single ticket
- check_historical_occurrence(): checks whether a certain combination has occurred in the Canada lottery data set
- multi_ticket_probability(): calculates the probability for any number of tickets between 1 and 13,983,816
- probability_less(): calculates the probability of having exactly two, three, four, or five winning numbers