In many real-world applications, we don't know the probability of an event, but we would like to estimate it from our hypotheses and the evidence we collect. Seems rational, right?

This is the basis for Bayes' Rule, where we:
1. Identify possible models and construct prior probabilities (based on our knowledge or beliefs)
2. Collect data and create likelihoods, i.e. the chance of getting this data given each model
3. Use Bayes' rule to find posterior probabilities and update our knowledge
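
Written out for a finite set of candidate models $M_1, \dots, M_k$ and observed data $D$ (notation introduced here for illustration, not taken from the original text), step 3 is:

$$
P(M_i \mid D) = \frac{P(D \mid M_i)\, P(M_i)}{\sum_{j=1}^{k} P(D \mid M_j)\, P(M_j)}
$$

Here $P(M_i)$ is the prior from step 1, $P(D \mid M_i)$ is the likelihood from step 2, and $P(M_i \mid D)$ is the posterior.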


Let's work through an example and describe some of the particulars as we go.
Say you're the city manager of Austin planning a free concert. Whoa, big city! And a heck of a music scene! This seems like a big task, and you want to know roughly how many people will come so you can plan ahead and order the right amount of food. You don't need an exact headcount because ordering is done in bulk, so you would like to know the approximate proportion of the city that will show up, in 10% groups (10%, 20%, 30%, etc.). Because you've been the city manager for a few years, you have some prior knowledge of these concerts and believe at least 20% of the town will show up, but not more than 90%. You also believe the 60% and 70% groups are three times as likely as the others. What is the estimated proportion of the city that will show up?

You're pretty savvy, and send out an email asking if people will attend the concert. You send it out to 200 people and 123 respond 'yes' and 73 respond 'no'. You know this raw proportion (123/200 = 0.615) is not necessarily the true attendance rate, because surveys are not always accurate and people do change their minds. So, you want to employ some statistics to help you estimate the most likely proportion of people that will come.

Let's define this problem in Bayes terms.

1. Identify possible models
The models are the candidate proportions of the town that will show up. We want to know the probability that each 'model' is true so we can determine the most likely scenario. The models are: p=0.3, p=0.4, p=0.5, p=0.6, p=0.7, p=0.8 and p=0.9
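
The models and prior beliefs described above can be written down directly. The following is a minimal sketch in R, assuming seven models (p = 0.3 through 0.9 in steps of 0.1) and giving the 60% and 70% groups three times the weight of the others. The variable names `models`, `weights`, and `prior` are illustrative, not part of the original notebook.

```r
# Candidate models: proportion of the city that attends (assumed 0.3 to 0.9)
models <- seq(0.3, 0.9, by = 0.1)

# Relative prior weights: the 60% and 70% groups are three times as likely
weights <- c(1, 1, 1, 3, 3, 1, 1)

# Normalize so the prior probabilities sum to 1
prior <- weights / sum(weights)

data.frame(model = models, prior = round(prior, 3))
```

Working with relative weights first and normalizing afterwards avoids having to work out each prior probability by hand.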

2. Collect data and create likelihoods

In [1]:
123/200
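
Steps 2 and 3 can be sketched the same way. The sketch below assumes each survey response is an independent yes/no draw, so the likelihood of 123 'yes' answers out of 200 under each model is binomial (`dbinom`), and it uses the 123/200 figure from the text. The models and prior weights are repeated so the block runs on its own, and all variable names are illustrative.

```r
# Candidate models and prior (60% and 70% groups weighted three times as much)
models <- seq(0.3, 0.9, by = 0.1)
weights <- c(1, 1, 1, 3, 3, 1, 1)
prior <- weights / sum(weights)

# Likelihood: chance of 123 'yes' out of 200 under each model
# (assumes independent responses, i.e. a binomial model)
likelihood <- dbinom(123, size = 200, prob = models)

# Bayes' rule: posterior is proportional to prior * likelihood
unnormalized <- prior * likelihood
posterior <- unnormalized / sum(unnormalized)

data.frame(model = models,
           prior = round(prior, 3),
           likelihood = signif(likelihood, 3),
           posterior = round(posterior, 3))

# Most probable model after seeing the survey
best <- models[which.max(posterior)]
best
```

Under these assumptions most of the posterior weight lands on p = 0.6, the model closest to the survey proportion.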

Often we have some knowledge or beliefs about something, and then we collect evidence and update our beliefs. This is the basic scientific method, and it also sums up Bayes' theorem.

Instead of knowing the absolute probability of an event, we use the information we have at hand and our prior belief about the situation to update that belief (the updated belief is called a posterior).

Let's explore how Bayes makes use of these prior beliefs to help us better estimate the things we are interested in.

For example, we want to know the average height of women in the U.S. We have no prior knowledge in the area, so we assume that heights can be any positive value. We sample 10 women and compute the mean and standard deviation of their heights (in decimal feet, so 5.5 ft = 5'6"):


In [4]:
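# Point R at a CRAN mirror, then install the packages used in this notebook
options(repos = c(CRAN = "https://cloud.r-project.org/"))
install.packages(c("ggplot2", "dplyr", "sciplot", "repr", "VennDiagram"))

# Heights of 10 sampled women, in decimal feet (5'6" = 5.5 ft)
heights <- c(5, 5.2, 5.7, 5.2, 5.8, 5.10, 5.1, 5.3, 6.0, 5.5)
mean.height <- mean(heights)  # sample mean
sd.height <- sd(heights)      # sample standard deviation

mean.height
round(sd.height, 2)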



Now, what's the probability of a certain height given these measurements? We can create a distribution for these heights.
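
One way to create such a distribution is to assume heights are normally distributed with the sample mean and standard deviation. That is an assumption made here for illustration; with only 10 measurements it is a rough approximation. The sketch recomputes the summary statistics so it runs on its own, and the 5.0 to 5.5 ft range is an arbitrary example.

```r
# Heights of 10 sampled women, in decimal feet
heights <- c(5, 5.2, 5.7, 5.2, 5.8, 5.10, 5.1, 5.3, 6.0, 5.5)
mean.height <- mean(heights)
sd.height <- sd(heights)

# For a continuous variable, probabilities are defined over ranges.
# P(5.0 ft <= height <= 5.5 ft) under the assumed normal distribution:
p.range <- pnorm(5.5, mean = mean.height, sd = sd.height) -
  pnorm(5.0, mean = mean.height, sd = sd.height)
p.range

# Density of the assumed distribution on a grid of heights (useful for plotting)
grid <- seq(4, 7, by = 0.1)
density <- dnorm(grid, mean = mean.height, sd = sd.height)
head(data.frame(height = grid, density = round(density, 3)))
```

Because height is continuous, the probability of any single exact value is zero, so the question is better posed as the probability of a range.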


For example, we believe that the average height of women is 5'5" and that these heights are normally distributed. We collect some data from 10 women, and are now going to check our prior hypothesis and update it to a posterior estimate. We do this by creating a posterior distribution based on our belief that the average is 5'5" and the likelihood of that being true given our collected data.
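
A minimal sketch of that update, assuming a normal prior on the mean and treating the sample standard deviation as the known spread of individual heights (the standard conjugate normal model). The prior mean is 5'5" converted to decimal feet. The prior standard deviation `prior.sd = 0.25` is a hypothetical value chosen only for illustration, since the text does not say how confident the prior belief is.

```r
# Heights of 10 sampled women, in decimal feet
heights <- c(5, 5.2, 5.7, 5.2, 5.8, 5.10, 5.1, 5.3, 6.0, 5.5)
n <- length(heights)
xbar <- mean(heights)
sigma <- sd(heights)      # treated as known (simplifying assumption)

prior.mean <- 5 + 5 / 12  # 5'5" in decimal feet
prior.sd <- 0.25          # hypothetical: strength of the prior belief

# Precision (1 / variance) of the prior and of the sample mean
prior.prec <- 1 / prior.sd^2
data.prec <- n / sigma^2

# Posterior for the mean: precision-weighted average of prior and data
post.mean <- (prior.prec * prior.mean + data.prec * xbar) / (prior.prec + data.prec)
post.sd <- sqrt(1 / (prior.prec + data.prec))

c(posterior.mean = post.mean, posterior.sd = post.sd)
```

The posterior mean falls between the prior mean and the sample mean, pulled toward whichever is more precise. A smaller `prior.sd` would keep the estimate closer to 5'5".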

The measured mean height = 5.39 ft

The measured standard deviation of height = 0.34 ft