# Welcome to STAT 515!

In STAT 515, we will be covering the fundamentals of probability with a view towards statistical inference. Probability is a collection of tools for helping us to reason quantitatively about uncertain events. This type of reasoning is becoming increasingly important in many disciplines.

Let's consider two questions:

+ What are the chances that at least two people in this class have the same birthday? 

+ What are the chances that someone in this class has the same birthday as you? 

Are these two questions asking about the same probability or are they different?

### [Same or different?](https://www.menti.com/d2a755) [menti poll]

Let's take a minute to talk about *why*.
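
One way to see the difference is to compute both probabilities for the same group size. The sketch below assumes 365 equally likely birthdays and independence between people. The function names `p_any_shared` and `p_shares_with_you` and the group size of 43 are illustrative choices, not part of the course code.

```python
import numpy as np

def p_any_shared(n, days=365):
    """P(at least two of n people share a birthday), assuming uniform birthdays."""
    k = np.arange(n)
    # Complement of "all n birthdays are distinct".
    return 1.0 - np.prod((days - k) / days)

def p_shares_with_you(n, days=365):
    """P(at least one of the other n - 1 people has *your* birthday)."""
    others = max(n - 1, 0)
    # Complement of "each of the others misses your particular day".
    return 1.0 - ((days - 1) / days) ** others

n = 43  # illustrative group size
print(f"some pair shares a birthday: {p_any_shared(n):.4f}")
print(f"someone shares your birthday: {p_shares_with_you(n):.4f}")
```

The first probability counts every possible pair in the room, while the second only counts pairs that include you, which is why it comes out much smaller.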

## How does group size affect the chances of having at least one match?

Now let's try to answer the following question:

+ How large of a group do we need to have a better than 50% chance of having a match? 

Here, by a match we mean that at least two people in the room have the same birthday (MM/DD). We are ignoring leap years, twins, etc. We want the smallest number of people for which a match happens with probability greater than or equal to 0.50. Let's not worry too much about how to compute the probability of having a match. That will come later in the course! Right now, let's just try to think about what a *sensible* answer might look like.

If there are 365 days in a year, then will the total number of people need to be close to 365?

Perhaps 365 people is far too many... Would 182 people--about half of 365--be sufficient to have a birthday match?

How about 50 people? Would that be enough to have a better than 50% chance of having a match? What is the *smallest number of people* that would be enough to fulfill the condition (having a better than 50% chance of having a match)?

### [What is the smallest number of people that would be enough to fulfill the condition?](https://www.menti.com/d2a755) [menti poll]

Let's take a minute to talk about whether our answers are reasonable estimates.

Below, we write some code that computes the probability of having at least one match (at least two people with the same birthday). The function `pmatches` takes a group size (a non-negative integer) and returns the desired probability (a real number between zero and one).

In [1]:
%matplotlib inline
import numpy as np

# pmatches takes a number of people N and returns the probability of at least one match
# (at least two people with the same birthday)
def pmatches(N):
    # 1 minus the probability of no matches: 365 * 364 * ... * (365 - N + 1) / 365**N
    return 1 - np.prod(np.arange(365, 365 - N, -1, dtype=float)) / 365**N
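
Written out, the quantity returned by `pmatches` is the complement of the no-match probability. The formula below is a restatement of the code, assuming 365 equally likely birthdays that are independent from person to person:

$$
P(\text{at least one match among } N \text{ people}) = 1 - \prod_{k=0}^{N-1} \frac{365-k}{365} = 1 - \frac{365 \cdot 364 \cdots (365-N+1)}{365^{N}}
$$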

At present there are 42 students (plus an Eric) in this section of STAT 515. 

In [2]:
pmatches(42+1)

0.9239228556561199

The line above gives the probability of having at least one match (at least two people with the same birthday) in this section of STAT 515!!

Remember that the probability is telling us something about the frequency with which the event 'at least one match in a group of size 43' occurs. Although the probability is high (close to 1), *it is not a guarantee*. That is, if Eric teaches STAT 515 from now until the end of time (with the same sized class), there will be at least one match about 92% of the time. While it is highly likely that two people in your STAT 515 class today have the same birthday, we won't know until we take a little poll (and look at a statistic).
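
The frequency interpretation can be checked with a small simulation. This is a minimal sketch, assuming birthdays are uniform over 365 days and independent; the helper name `simulate_match_frequency`, the seed, and the number of simulated classes are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(515)  # arbitrary seed, only for reproducibility

def simulate_match_frequency(group_size, n_classes=20_000):
    """Fraction of simulated classes in which at least two people share a birthday."""
    # Each row is one simulated class; each entry is a birthday in {0, ..., 364}.
    birthdays = rng.integers(0, 365, size=(n_classes, group_size))
    # After sorting a row, a shared birthday shows up as two equal neighbours.
    sorted_days = np.sort(birthdays, axis=1)
    has_match = (np.diff(sorted_days, axis=1) == 0).any(axis=1)
    return has_match.mean()

# Same group size as the call to pmatches above (42 students plus Eric).
print(simulate_match_frequency(43))
```

The simulated fraction should land close to the value returned by `pmatches(42+1)`, but it will differ slightly from run to run when the seed changes, which is exactly the point: a probability describes long-run frequency, not any single class.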

### [Please enter your birthday as MM/DD.](https://www.menti.com/d2a755) [menti poll]

Below is a plot of the probability of having at least one match (i.e. two people with the same birthday) versus the number of people in the group. I've also plotted the probability of having no matches for comparison.

In [3]:
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, offline
init_notebook_mode(connected=True)

In [4]:
N=75
numpeople = np.arange(0,N)
prob = np.array([pmatches(i) for i in range(0,N)]) 
prob_none = np.array([(1-pmatches(i)) for i in range(0,N)])

# Create a trace
trace = go.Scatter(
    x = numpeople,
    y = prob,
    name='≥ 1 match'
)
trace2 = go.Scatter(
    x = numpeople,
    y = prob_none,
    name='no matches'
)

data = [trace, trace2]
layout = go.Layout(
    title='Birthday problem',
    xaxis=dict(title='group size, in number of people'),
    yaxis=dict(title='probability'),
    showlegend=True
)

fig = go.Figure(data=data, layout=layout)
offline.iplot(fig, filename='birthday')

+ Use the plot above to find the smallest number of people for which the probability of having a match is at least 50%. 
+ Use the plot above to find the smallest number of people for which the probability of having a match is at least 75%, and then the smallest number for which it is at least 25%.

*Hints*: `P(match in group of size 23) = 0.5073`, `P(match in group of size 32) = 0.7533`, `P(match in group of size 15) = 0.2529`.

*Answers*: `23`, `32`, `15`.
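
These answers can also be found without reading them off the plot. The sketch below restates the `pmatches` calculation under the name `p_match` so that the block is self-contained, and adds a hypothetical helper `smallest_group` that searches for the first group size reaching a given threshold. It assumes 365 equally likely birthdays.

```python
import numpy as np

def p_match(n, days=365):
    """P(at least two of n people share a birthday), assuming uniform birthdays."""
    return 1.0 - np.prod((days - np.arange(n)) / days)

def smallest_group(threshold, max_size=366):
    """Smallest group size whose match probability is at least `threshold`."""
    for n in range(max_size + 1):
        if p_match(n) >= threshold:
            return n
    return None  # not reached for thresholds <= 1, since p_match(366) == 1

for threshold in (0.50, 0.75, 0.25):
    print(f"at least {threshold:.0%}: {smallest_group(threshold)} people")
```

A linear search is enough here because the match probability only increases as the group grows, so the first size that reaches the threshold is the smallest one.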

Hopefully this simple example has illustrated that we need to learn some new mathematical tools if we want to reason quantitatively about uncertain events. Hopefully it has also illustrated that our intuitions can fool us.