# Welcome to STAT 515

In STAT 515, we will be covering the fundamentals of probability with a view towards statistical inference. Probability is a collection of tools for helping us to reason quantitatively about uncertain events. This type of reasoning is becoming increasingly important in many disciplines.

Let's consider two questions:

+ What are the chances that at least two people in this class have the same birthday? 

+ What are the chances that someone in this class has the same birthday as you? 

Are these two questions asking about the same probability or are they different?

### [Same or different?](https://www.menti.com/d2a755)


## How large of a group do we need to have a better than 50% chance of having at least one match?

That is (ignoring leap years, twins, etc.), how many people would we need in our class to have a better than 50% chance of at least one birthday match (same month and day, mm-dd)?

Let's not worry too much about how to compute the probability of having a match. That will come later in the course! Right now, let's just try to think about what a *sensible* answer might look like. 

If there are 365 days in a year, then will the total number of people need to be close to 365?

Perhaps 365 people is far too many, since we only want a better than 50% chance of having a match. Would 182 people (about half of 365) be sufficient for a better than 50% chance of a birthday match?

How about 50 people? Would that be sufficient? What is the *smallest number of people* that would be sufficient?

### [What is the smallest number of people that would be sufficient to have a match?](https://www.menti.com/d2a755)

If you've never thought of this problem before, the answer might surprise you. 

Below, we write some code that computes the probability of having a match (i.e. at least two people with the same birthday, mm-dd) for a group of a given size. The function `pmatches` takes a group size (a non-negative integer) and returns the desired probability (a real number between zero and one).

In [129]:
%matplotlib inline
import numpy as np

# pmatches takes a number of people N and returns the probability that at least two of them share a birthday
def pmatches(N):
    return 1-np.prod(np.arange(365,365-N,-1,dtype=float))/365**N # this is 1 minus the probability of having no matches
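
For reference, the quantity that `pmatches` computes can be written out as a formula. This is a sketch under the usual simplifying assumptions, which the code leaves implicit: there are 365 equally likely birthdays, and different people's birthdays are independent.

$$
P(\text{at least one match among } N \text{ people})
= 1 - \frac{365 \cdot 364 \cdots (365 - N + 1)}{365^{N}}
= 1 - \prod_{k=0}^{N-1} \frac{365 - k}{365}
$$

The numerator counts the ways to give $N$ people distinct birthdays, and the denominator counts all possible birthday assignments, so the fraction is the probability of *no* match.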

At present there are 40 students (plus an Eric) in this section of STAT 515. 

In [131]:
pmatches(41)

0.9031516114817354

The value above is the probability of having at least one match (at least two people who share a birthday) in this section of STAT 515!

Remember that the probability is telling us something about the frequency with which the event 'at least one match in a group of size 41' occurs. Although the probability is high (close to 1), *it is not a guarantee*. That is, if Eric teaches STAT 515 from now until the end of time (with the same sized class), there will be at least one match about 90% of the time. While it is highly likely that two people in your STAT 515 class today have the same birthday, we won't know until we take a little poll (and look at a statistic). 
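
To make the "from now until the end of time" idea concrete, here is a minimal simulation sketch. It assumes 365 equally likely and independent birthdays; the function name `simulate_match_frequency`, the number of simulated classes, and the random seed are illustrative choices, not part of the course code.

```python
import numpy as np

def simulate_match_frequency(group_size, num_classes=20_000, seed=0):
    """Estimate P(at least one shared birthday) by simulating many classes.

    Assumption: 365 equally likely birthdays, independent across people.
    """
    rng = np.random.default_rng(seed)
    # One row per simulated class, one column per person; days are labelled 0..364.
    birthdays = rng.integers(0, 365, size=(num_classes, group_size))
    # After sorting each row, a shared birthday appears as two equal neighbours.
    sorted_days = np.sort(birthdays, axis=1)
    has_match = (np.diff(sorted_days, axis=1) == 0).any(axis=1)
    # Fraction of simulated classes that contain at least one match.
    return has_match.mean()

estimate = simulate_match_frequency(41)
print(f"Fraction of simulated classes with a match: {estimate:.3f}")
```

The estimated fraction should land close to the value returned by `pmatches(41)`, but it will not equal it exactly, because a simulation only ever sees finitely many classes.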

### [Please enter your birthday as MM/DD.](https://www.menti.com/d2a755)

Below is a plot of the probability of having a match versus the group size.

In [147]:
from bqplot import Axis, Figure, LinearScale, Lines, Scatter
from bqplot.interacts import IndexSelector
from ipywidgets import HTML, widgets

numpeople = range(80)
prob = list(map(pmatches, numpeople))

x_data = numpeople
y_data = prob

x_sc = LinearScale(min=0, dtype=int)
y_sc = LinearScale(min=0, max=1)

ax_x = Axis(label='group size', scale=x_sc, tick_format='0.0f')
ax_y = Axis(scale=y_sc, orientation='vertical', tick_format='0.2f')

line = Lines(x=x_data,
             y=y_data,
             scales={'x': x_sc, 'y': y_sc},
             colors=['blue'])
pt = Scatter(x=x_data,
             y=y_data,
             scales={'x': x_sc, 'y': y_sc},
             colors=['red'])

db_index = HTML(value='<h3>P(match in group of size 0) = 0.0</h3>')
## Now we try a selector made to display the y-value associated with a single x-value
index_sel = IndexSelector(scale=x_sc, marks=[line])
## Now we define a function that will be called when the selectors are interacted with
def index_change_callback(change):
    db_index.value = '<h3>P(match in group of size '+str(int(change.new))+') = '+str(pmatches(int(change.new)))+'</h3>'
index_sel.observe(index_change_callback, names=['selected'])
## Now we plot the figure
fig = Figure(axes=[ax_x, ax_y], marks=[line], interaction=index_sel)
widgets.VBox([fig, db_index])

+ Use the plot above to find the smallest number of people for which the probability of having a match is at least 50%. 
+ Use the plot above to find the smallest number of people for which the probability of having a match is at least 25%, and the smallest for which it is at least 75%.

*Hints*: `P(match in group of size 23) = 0.5073`, `P(match in group of size 15) = 0.2529`, `P(match in group of size 32) = 0.7533`.

*Answers*: `23`, `15`, `32`.
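
The same answers can be checked without reading them off the plot. The sketch below is self-contained, so it restates the match probability under a new, illustrative name (`match_probability`) rather than relying on the notebook's `pmatches`. The helper `smallest_group` is also hypothetical. Both assume 365 equally likely birthdays.

```python
import numpy as np

def match_probability(n):
    """P(at least one shared birthday among n people), 365 equally likely days."""
    # Multiply the ratios (365 - k) / 365 one by one; each factor lies in [0, 1].
    return 1.0 - np.prod(np.arange(365, 365 - n, -1) / 365.0)

def smallest_group(threshold):
    """Smallest group size whose match probability is at least `threshold`."""
    # With 366 people a match is certain, so any threshold <= 1 is reached by then.
    for n in range(366 + 1):
        if match_probability(n) >= threshold:
            return n
    return None  # only reached for thresholds above 1

for threshold in (0.25, 0.50, 0.75):
    n = smallest_group(threshold)
    print(f"{threshold:.0%}: {n} people, P = {match_probability(n):.4f}")
```

The loop simply walks up through the group sizes and stops at the first one that reaches the threshold, which is what we did by eye with the plot.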

Hopefully this simple example has illustrated that we need to learn some new mathematical tools if we want to reason quantitatively about uncertain events. Hopefully it has also illustrated that our intuitions can fool us. 
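
As one more check on intuition, we can return to the two questions from the start of these notes and compare them numerically. This is a rough sketch, again assuming 365 equally likely and independent birthdays. The function names are illustrative, and the class size of 41 is the one used earlier.

```python
def prob_any_shared_birthday(n):
    """P(at least two of the n people share a birthday)."""
    p_no_match = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_no_match *= (365 - k) / 365
    return 1.0 - p_no_match

def prob_someone_shares_yours(n):
    """P(at least one of the other n - 1 people has *your* birthday), for n >= 1."""
    # Each of the other n - 1 people misses your birthday with probability 364/365.
    return 1.0 - (364 / 365) ** (n - 1)

class_size = 41
print(f"At least two people share a birthday: {prob_any_shared_birthday(class_size):.4f}")
print(f"Someone shares your birthday:         {prob_someone_shares_yours(class_size):.4f}")
```

The two numbers come out very different: a match *somewhere* in the class is likely, while a match with *your particular* birthday is not. The first question counts every pair of people, and the second only counts pairs that include you.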