# Birthday Problem
____

### Imports

In [67]:
import math
import numpy as np
import random
import plotly.graph_objs as go
import chart_studio.plotly as py    # plotly.plotly moved to the separate chart_studio package in plotly 4
import pandas as pd

### Simulation

Below we define a function that simulates the problem. Days of the year are numbered 1-366. To account for leap years, the pool of possible birthdays joins one list of days 1-366 (a leap year) with three lists of days 1-365 (regular years), so day 366 is drawn about a quarter as often as any other day. For each trial we draw x random days from this pool into a list called birthdays, one per person, and then check whether any value in that list is repeated. The process is repeated n = 1000000 times so that the estimate is stable.

In [68]:
def birthday_problem(people, n=1000000):
    # Pool of possible birthdays: one leap year (days 1-366) plus three regular years (days 1-365)
    days = list(range(1, 367)) + 3 * list(range(1, 366))
    success = 0                             # Trials with at least one shared birthday
    fail = 0                                # Trials with no shared birthday
    for _ in range(n):                      # n is the number of trials we observe
        # One random birthday per person
        birthdays = [random.choice(days) for _ in range(people)]
        if len(set(birthdays)) < people:    # A repeat exists if the set is smaller than the list
            success += 1
        else:
            fail += 1

    prob_yes = round(success / n * 100, 2)
    if prob_yes == 100 and people < 367:    # Avoid falsely rounding to 100%; a match is only certain with 367 or more people
        prob_yes = 99.99

    # Note: the message below assumes the default n of one million trials
    print(f'Based on one million trials, the probability that there is a shared birthday in a room of {people} people is {prob_yes}%\nSuccesses: {success}\nFailures: {fail}\n')
    return prob_yes
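
The loop above makes one `random.choice` call per person per trial, which is slow at a million trials. As a rough sketch only, the same experiment can be vectorized with NumPy. The function name `birthday_problem_fast`, the `seed` parameter, and the smaller default `n` are assumptions introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

def birthday_problem_fast(people, n=100_000, seed=0):
    """Hypothetical vectorized variant: returns the share of trials (in %) with a shared birthday."""
    rng = np.random.default_rng(seed)
    # Same pool as the original: one leap year (1-366) plus three regular years (1-365)
    days = np.concatenate([np.arange(1, 367)] + [np.arange(1, 366)] * 3)
    # One row per trial, one column per person
    draws = rng.choice(days, size=(n, people))
    # After sorting each row, any repeated birthday shows up as two equal neighbours
    draws.sort(axis=1)
    has_match = (np.diff(draws, axis=1) == 0).any(axis=1)
    return has_match.mean() * 100

print(birthday_problem_fast(23))
```

Sorting each row and comparing neighbours avoids a Python-level loop over trials. The whole `(n, people)` array is held in memory at once, so very large `n` may need to be processed in batches.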

With the simulation defined, we can test the initial statement: "If there are 23 people in a room, the probability that there is at least one shared birthday is around 50%."

In [69]:
birthday_problem(23)

Based on one million trials, the probability that there is a shared birthday in a room of 23 people is 50.69%
Successes: 506872
Failures: 493128



50.69

As shown above, simulating a room of 23 people produces a birthday match about 50.69% of the time. This is slightly below the theoretical 50.73%, but some variability is expected from a random simulation, and the theoretical figure assumes a 365-day year while the simulation also includes leap days. The result is close enough to accept.
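
To put a number on how much variability to expect, one option is the standard error of a proportion, sqrt(p(1 - p)/n). The snippet below is a sketch under the assumption that each trial is an independent draw with the theoretical success probability. The variable names are illustrative, and the two percentages are the ones reported above.

```python
import math

p_theory = 0.5073      # theoretical probability for 23 people (365-day year)
p_sim = 0.5069         # simulated share of trials with a match
n = 1_000_000          # number of trials

# Standard error of a proportion estimated from n independent trials
se = math.sqrt(p_theory * (1 - p_theory) / n)
# Gap between simulation and theory, measured in standard errors
z = (p_sim - p_theory) / se

print(f"standard error: {se * 100:.3f} percentage points")  # 0.050
print(f"gap in standard errors: {z:.2f}")                   # -0.80
```

A gap of less than one standard error is well within what chance alone would produce at this number of trials.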

Next, we simulate the birthday problem for 5 to 70 people in increments of 5.

In [70]:
people = [5,10,15,20,25,30,35,40,45,50,55,60,65,70]
percentages = []
for i in people:
    a = birthday_problem(i)
    percentages.append(a)

Based on one million trials, the probability that there is a shared birthday in a room of 5 people is 2.7%
Successes: 27020
Failures: 972980

Based on one million trials, the probability that there is a shared birthday in a room of 10 people is 11.74%
Successes: 117356
Failures: 882644

Based on one million trials, the probability that there is a shared birthday in a room of 15 people is 25.2%
Successes: 252010
Failures: 747990

Based on one million trials, the probability that there is a shared birthday in a room of 20 people is 41.22%
Successes: 412173
Failures: 587827

Based on one million trials, the probability that there is a shared birthday in a room of 25 people is 56.81%
Successes: 568132
Failures: 431868

Based on one million trials, the probability that there is a shared birthday in a room of 30 people is 70.62%
Successes: 706152
Failures: 293848

Based on one million trials, the probability that there is a shared birthday in a room of 35 people is 81.41%
Successes: 814110
F

### Visualizing Results

In [121]:
bins = [0,25,50,75,100]
labels = ['Very Unlikely', 'Unlikely','Likely','Very Likely']
colors = {
    'Very Unlikely':'orangered',
    'Unlikely': 'orange',
    'Likely':'lightgreen',
    'Very Likely':'darkgreen'
}

df = pd.DataFrame({'y':percentages,
                   'x': people,
                   'label':pd.cut(percentages, bins=bins, labels=labels)})

data = []
for label, label_df in df.groupby('label'):
    data.append(go.Bar(x = label_df.x,
                       y = label_df.y,
                       name = label,
                       marker = {'color': colors[label]}))
    
    

layout = go.Layout(
    title='Probability Of Shared Birthday',
    xaxis=dict(
        title='Number of People',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    ),
    yaxis=dict(
        title='Probability (%)',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    )
)  
fig = go.FigureWidget(data=data, layout=layout)
#plot_url = py.plot(fig, filename='styling-names')

In [122]:
go.FigureWidget(data=data, layout=layout)

FigureWidget({
    'data': [{'marker': {'color': 'orangered'},
              'name': 'Very Unlikely',
        …

### Comparing Simulation vs Theory

In [103]:
theoretical_results = []
for i in people:
    # P(shared birthday) = 1 - 365! / ((365 - i)! * 365**i), assuming 365 equally likely days
    theoretical_results.append(round((1 - (math.factorial(365)/(math.factorial(365-i)*365**i)))*100,2))
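
The factorial expression above is equivalent to multiplying, person by person, the chance that each new birthday misses all earlier ones. The sketch below writes it in that product form, which avoids the very large intermediate integers. The function name `shared_birthday_probability` and the `days` parameter are assumptions added for illustration. Like the formula above, it assumes equally likely days and no leap day.

```python
def shared_birthday_probability(people, days=365):
    """Probability (in %) that at least two of `people` share a birthday."""
    p_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken
        p_distinct *= (days - k) / days
    return (1 - p_distinct) * 100

print(round(shared_birthday_probability(23), 2))  # 50.73
```

For the room sizes used here the rounded values match the factorial version, and the product form makes it easy to see why the probability climbs so quickly: each additional person has more existing birthdays to collide with.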

In [120]:
comparison = pd.DataFrame()
comparison['Number of people in a room'] = people
comparison['Theoretical results'] = theoretical_results
comparison['Simulated results'] = percentages
comparison['Difference'] = abs(comparison['Theoretical results'] - comparison['Simulated results'])

In [118]:
comparison

| | Number of people in a room | Theoretical results | Simulated results | Difference |
|---|---|---|---|---|
| 0 | 5 | 2.71 | 2.7 | 0.01 |
| 1 | 10 | 11.69 | 11.74 | 0.05 |
| 2 | 15 | 25.29 | 25.2 | 0.09 |
| 3 | 20 | 41.14 | 41.22 | 0.08 |
| 4 | 25 | 56.87 | 56.81 | 0.06 |
| 5 | 30 | 70.63 | 70.62 | 0.01 |
| 6 | 35 | 81.44 | 81.41 | 0.03 |
| 7 | 40 | 89.12 | 89.07 | 0.05 |
| 8 | 45 | 94.1 | 94.08 | 0.02 |
| 9 | 50 | 97.04 | 97.03 | 0.01 |
