# Lottery Addiction App

In this project, we will work with a fictitious medical institute that wants to help gambling addicts by building a dedicated mobile app that estimates their chances of winning. Our task is to create the logical core of the app and calculate the probabilities.

For the first version, we will focus on the 6/49 lottery in Canada and answer questions like: 
* What's the probability of winning the big prize with one ticket?
* What's the probability of winning the big prize with 40 different tickets? 
* What's the probability of having at least five (or four, or three, or two) winning numbers on a single ticket? 

We'll be using historical data from the 6/49 lottery in Canada, sourced from a [Kaggle dataset](https://www.kaggle.com/datascienceai/lottery-dataset). It contains 3,665 drawings from 1982 to 2018.

To calculate probabilities and combinations repeatedly, we'll first create two helper functions that we can reuse:
* A factorial function
* A combination function

The lottery is played **without** replacement, so no number can be drawn twice. The order in which the numbers are drawn doesn't matter either, so we count combinations rather than permutations.
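As a worked example of the counting rule behind the combination helper, the number of ways to choose $k$ numbers out of $n$, without replacement and ignoring order, is:

$$
C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}
$$

For a 6/49 draw this gives:

$$
\binom{49}{6} = \frac{49!}{6!\,43!} = 13{,}983{,}816
$$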

In [1]:
# factorial helper function
def factorial(n):
    final_product = 1
    for i in range(n,0,-1):
        final_product *= i
    return final_product

# combination helper function
def combinations(n, k):
    # integer division keeps the count exact (no float rounding)
    return factorial(n) // (factorial(k) * factorial(n - k))
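As a sanity check on helpers like these, Python's standard library offers `math.factorial` and, assuming Python 3.8 or later, `math.comb`. The short sketch below is not part of the original notebook; the names `total_outcomes` and `by_formula` are illustrative only.

```python
import math

# Number of ways to choose 6 numbers out of 49 when order is ignored.
total_outcomes = math.comb(49, 6)

# The same count from the factorial formula n! / (k! * (n - k)!).
by_formula = math.factorial(49) // (math.factorial(6) * math.factorial(43))

print(f'{total_outcomes:,}')
print(total_outcomes == by_formula)
```

Both values should agree with the result of `combinations(49, 6)`.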

## One-ticket probability

For the first version of the app, we want players to be able to calculate the probability of winning the big prize with a single ticket. In the 6/49 lottery, 6 numbers are drawn from a set of 49 numbers, ranging from 1 to 49. A player wins the big prize only if all six numbers on the ticket match the numbers drawn.

The engineers want us to keep in mind that:
* The user will input six different numbers from 1 to 49
* The numbers will come as a Python list, which will serve as a single input to our function
* We must print the probability value in a friendly way so that anyone can understand it. 

Our function `one_ticket_probability()` will take in a single list of numbers and calculate the probability of winning the 6/49 lottery. The list should be six numbers long. 
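The input rules listed above (six different numbers, each from 1 to 49) could be enforced before any probability is calculated. The following is a minimal sketch under that assumption; `validate_ticket` is a hypothetical helper, not something the engineers specified.

```python
def validate_ticket(nums):
    """Hypothetical helper: check that nums holds six distinct integers from 1 to 49."""
    if len(nums) != 6:
        raise ValueError(f'expected 6 numbers, got {len(nums)}')
    if len(set(nums)) != 6:
        raise ValueError('numbers must all be different')
    if any(not isinstance(n, int) or not 1 <= n <= 49 for n in nums):
        raise ValueError('numbers must be integers from 1 to 49')
    return True


# A valid ticket passes; an invalid one raises ValueError.
print(validate_ticket([1, 5, 3, 23, 2, 41]))
```

Raising an exception is one design choice; an app could just as well return a friendly message for the user to correct the ticket.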

In [63]:
# calculating one ticket probability
def one_ticket_probability(nums):
    outcomes = combinations(49, len(nums))
    probability = 1/outcomes
    percentage = probability*100
    
    print(f'A ticket with the following numbers: {nums} \nhas a {percentage:.10f}% chance of winning.\n\n\
This means you have a 1 in {1/probability:,.0f} chance of winning.')

In [64]:
# test one
one_ticket_probability([1,5,3,23,2,41])

A ticket with the following numbers: [1, 5, 3, 23, 2, 41] 
has a 0.0000071511% chance of winning.

This means you have a 1 in 13,983,816 chance of winning.


In [65]:
# test two
one_ticket_probability([10,12,2,1,43,32])

A ticket with the following numbers: [10, 12, 2, 1, 43, 32] 
has a 0.0000071511% chance of winning.

This means you have a 1 in 13,983,816 chance of winning.


## Exploring the lottery's historical data

We will also want to be able to compare a user's ticket against the historical lottery data in Canada and see if they would have won by now. 

We'll start with reading in the dataset and getting familiar with it. 

In [78]:
import pandas as pd

# reading in historical lottery data
lottery_data = pd.read_csv('649_lottery_data.csv')
print(f'Rows: {lottery_data.shape[0]}\nColumns: {lottery_data.shape[1]}')

Rows: 3665
Columns: 11


### First and last three rows of data

In [77]:
# first three rows
lottery_data.head(3)

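|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 649 | 1 | 0 | 6/12/1982 | 3 | 11 | 12 | 14 | 41 | 43 | 13 |
| 1 | 649 | 2 | 0 | 6/19/1982 | 8 | 33 | 36 | 37 | 39 | 41 | 9 |
| 2 | 649 | 3 | 0 | 6/26/1982 | 1 | 6 | 23 | 24 | 27 | 39 | 34 |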


In [76]:
# last three rows
lottery_data.tail(3)

|   | PRODUCT | DRAW NUMBER | SEQUENCE NUMBER | DRAW DATE | NUMBER DRAWN 1 | NUMBER DRAWN 2 | NUMBER DRAWN 3 | NUMBER DRAWN 4 | NUMBER DRAWN 5 | NUMBER DRAWN 6 | BONUS NUMBER |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3662 | 649 | 3589 | 0 | 6/13/2018 | 6 | 22 | 24 | 31 | 32 | 34 | 16 |
| 3663 | 649 | 3590 | 0 | 6/16/2018 | 2 | 15 | 21 | 31 | 38 | 49 | 8 |
| 3664 | 649 | 3591 | 0 | 6/20/2018 | 14 | 24 | 31 | 35 | 37 | 48 | 17 |


We can see that the drawn numbers are in separate columns:
* NUMBER DRAWN 1
* NUMBER DRAWN 2
* NUMBER DRAWN 3
* NUMBER DRAWN 4
* NUMBER DRAWN 5
* NUMBER DRAWN 6
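
One way to work with these six columns is to collect each drawing's numbers into a set, so a ticket can be compared against past drawings regardless of draw order. The sketch below is an assumption about how this could be done, not the notebook's own implementation: `sample` is a stand-in built from the first three rows shown above, and `number_columns`, `winning_sets`, and `ticket` are hypothetical names.

```python
import pandas as pd

# Stand-in for lottery_data: the first three drawings shown above.
sample = pd.DataFrame({
    'NUMBER DRAWN 1': [3, 8, 1],
    'NUMBER DRAWN 2': [11, 33, 6],
    'NUMBER DRAWN 3': [12, 36, 23],
    'NUMBER DRAWN 4': [14, 37, 24],
    'NUMBER DRAWN 5': [41, 39, 27],
    'NUMBER DRAWN 6': [43, 41, 39],
})

# The six columns that hold the drawn numbers.
number_columns = [f'NUMBER DRAWN {i}' for i in range(1, 7)]

# One set of six numbers per drawing; sets ignore draw order.
winning_sets = [set(row) for row in sample[number_columns].to_numpy().tolist()]

# Hypothetical ticket: count how many drawings in the sample it matches exactly.
ticket = {3, 11, 12, 14, 41, 43}
matches = sum(ticket == drawn for drawn in winning_sets)
print(f'Exact matches in sample: {matches}')
```

Running the same steps on the full `lottery_data` DataFrame would compare a ticket against all 3,665 drawings.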

## Creating a function to check historical data

