### Bayes' Theorem

The cigar example in the lab illustrates Bayes' theorem by substituting values directly into the formula.
That calculation is complicated, and it is easy to get confused or to substitute the wrong probability values.
A more intuitive and easier approach is the following:

Assume a convenient value for the total population,
then construct a two-way table (rows and columns) whose cell frequencies are computed from the known probabilities.
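A minimal sketch of this procedure in R, assuming two groups and one yes/no attribute. The helper `build_freq_table` and its argument names are hypothetical, introduced only for illustration; the usage line plugs in the cigar figures used below.

```r
# Hypothetical helper: build a two-way frequency table from an assumed total,
# the proportion of the population in each group, and P(attribute | group).
build_freq_table <- function(total, p_group, p_attr_given_group,
                             groups = c("group1", "group2"),
                             attr_levels = c("yes", "no")) {
  group_counts <- total * p_group                   # e.g. number of males, females
  yes_counts   <- group_counts * p_attr_given_group # e.g. cigar smokers in each group
  no_counts    <- group_counts - yes_counts         # the rest of each group
  tab <- cbind(yes_counts, no_counts)
  dimnames(tab) <- list(groups, attr_levels)
  as.table(tab)
}

# Usage: 100,000 adults, 51% male and 49% female,
# 9.5% of males and 1.7% of females smoke cigars.
cigar_sketch <- build_freq_table(100000, c(0.51, 0.49), c(0.095, 0.017),
                                 groups = c("male", "female"),
                                 attr_levels = c("smoker", "nonsmoker"))
addmargins(cigar_sketch)
```

Computing each row's second cell by subtraction mirrors the hand calculation, so each row always adds up to its group total.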

For example, let's assume that the adult population in Boone County, Missouri is 100,000. 
Now we can use the given information to create a table.

*Number of males who smoke cigars:*
51% of adults are male, so there are 51,000 males.
Since 9.5% of them smoke cigars, $0.095 \times 51{,}000 = 4845$ males smoke cigars,
and $51{,}000 - 4845 = 46{,}155$ males do not.
These values form the male row of the table below.


*Number of females who smoke cigars:* 49% of adults are female, so there are 49,000 females.
Since 1.7% of them smoke cigars, $0.017 \times 49{,}000 = 833$ females smoke cigars,
and $49{,}000 - 833 = 48{,}167$ females do not.
These values form the female row of the table below.

```r
cigar <- matrix(c(4845, 833, 46155, 48167), ncol = 2)
colnames(cigar) <- c('smoker', 'nonsmoker')
rownames(cigar) <- c('male', 'female')
cigar.table <- as.table(cigar)

addmargins(cigar)
```

| | smoker | nonsmoker | Sum |
|---|---:|---:|---:|
| male | 4845 | 46155 | 51000 |
| female | 833 | 48167 | 49000 |
| Sum | 5678 | 94322 | 100000 |


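```r
# Same table built step by step; named cigar_matrix to avoid masking base c()
cigar_matrix <- matrix(c(4845, 833, 46155, 48167), ncol = 2)
cigar_matrix
```

| [,1] | [,2] |
|---:|---:|
| 4845 | 46155 |
| 833 | 48167 |

```r
colnames(cigar_matrix) <- c("smoker", "nonsmoker")
cigar_matrix
```

| smoker | nonsmoker |
|---:|---:|
| 4845 | 46155 |
| 833 | 48167 |

```r
rownames(cigar_matrix) <- c("male", "female")
cigar_matrix
```

| | smoker | nonsmoker |
|---|---:|---:|
| male | 4845 | 46155 |
| female | 833 | 48167 |

```r
addmargins(cigar_matrix)
```

| | smoker | nonsmoker | Sum |
|---|---:|---:|---:|
| male | 4845 | 46155 | 51000 |
| female | 833 | 48167 | 49000 |
| Sum | 5678 | 94322 | 100000 |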


The table above requires only simple arithmetic:
partition the assumed population into the cell categories by applying the given percentages.

Now we can easily address the key question: what is the probability that a subject is male, given that the subject smokes cigars?
Let $M$ be the event that the subject is male and $C$ the event that the subject smokes cigars.
Use the same conditional-probability reasoning described before:
restrict the table to the column of cigar smokers,
then find the proportion of males in that column.
Among the 5678 cigar smokers, there are 4845 males, so
$P(M \mid C) = 4845/5678 = 0.85329341 \approx 0.853$.
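As a cross-check, the table approach and a direct substitution into Bayes' formula give the same answer. This sketch hard-codes the counts and percentages from this section so it runs on its own; the variable names are illustrative.

```r
# Counts from the cigar table (assumed population of 100,000 adults)
cigar_counts <- matrix(c(4845, 833, 46155, 48167), ncol = 2,
                       dimnames = list(c("male", "female"),
                                       c("smoker", "nonsmoker")))

# Table approach: restrict to the smoker column, then take the male share
p_m_given_c_table <- cigar_counts["male", "smoker"] / sum(cigar_counts[, "smoker"])

# Formula approach: P(M|C) = P(M)P(C|M) / [P(M)P(C|M) + P(F)P(C|F)]
p_m <- 0.51
p_f <- 0.49
p_c_given_m <- 0.095
p_c_given_f <- 0.017
p_m_given_c_bayes <- (p_m * p_c_given_m) /
  (p_m * p_c_given_m + p_f * p_c_given_f)

round(c(table = p_m_given_c_table, bayes = p_m_given_c_bayes), 3)
# both round to 0.853
```

The denominator of the formula, $0.51 \times 0.095 + 0.49 \times 0.017 = 0.05678$, is just the smoker column total divided by 100,000.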

The actual population of Boone County, Missouri, was 170,733 (as of 2013).
Rebuild the table above using this population and the same percentages, and find the actual $P(M \mid C)$.

```r
boone <- 170733

male_boone_pct <- 0.51
female_boone_pct <- 0.49

male_boone <- male_boone_pct * boone
male_boone              # 87073.83

female_boone <- female_boone_pct * boone
female_boone            # 83659.17

male_smoke_pct <- 0.095
female_smoke_pct <- 0.017

male_smoke_boone <- male_smoke_pct * male_boone
male_smoke_boone        # 8272.014

male_nonsmoke_boone <- (1 - male_smoke_pct) * male_boone
male_nonsmoke_boone     # 78801.82

female_smoke_boone <- female_smoke_pct * female_boone
female_smoke_boone      # 1422.206

female_nonsmoke_boone <- (1 - female_smoke_pct) * female_boone
female_nonsmoke_boone   # 82236.96
```

```r
boone_smoke_table <- matrix(c(male_smoke_boone, female_smoke_boone,
                              male_nonsmoke_boone, female_nonsmoke_boone),
                            ncol = 2)
boone_smoke_table
```

| [,1] | [,2] |
|---:|---:|
| 8272.014 | 78801.82 |
| 1422.206 | 82236.96 |

```r
colnames(boone_smoke_table) <- c("Smoker", "Nonsmoker")
rownames(boone_smoke_table) <- c("Male", "Female")
boone_smoke_table.table <- as.table(boone_smoke_table)

addmargins(boone_smoke_table)   # total = 170,733
```

| | Smoker | Nonsmoker | Sum |
|---|---:|---:|---:|
| Male | 8272.014 | 78801.82 | 87073.83 |
| Female | 1422.206 | 82236.96 | 83659.17 |
| Sum | 9694.22 | 161038.78 | 170733.0 |


```r
boone_smoke_table
```

| | Smoker | Nonsmoker |
|---|---:|---:|
| Male | 8272.014 | 78801.82 |
| Female | 1422.206 | 82236.96 |


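```r
# Joint proportions: each cell divided by the grand total
addmargins(prop.table(boone_smoke_table))
```

| | Smoker | Nonsmoker | Sum |
|---|---:|---:|---:|
| Male | 0.04845 | 0.46155 | 0.51 |
| Female | 0.00833 | 0.48167 | 0.49 |
| Sum | 0.05678 | 0.94322 | 1.0 |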


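```r
# Row proportions P(smoking status | sex); only the row sums (each 1) are meaningful
p1 <- prop.table(boone_smoke_table, 1)
addmargins(p1, 2)
```

| | Smoker | Nonsmoker | Sum |
|---|---:|---:|---:|
| Male | 0.095 | 0.905 | 1 |
| Female | 0.017 | 0.983 | 1 |

```r
# Column proportions P(sex | smoking status); only the column sums (each 1) are meaningful
p2 <- prop.table(boone_smoke_table, 2)
addmargins(p2, 1)
```

| | Smoker | Nonsmoker |
|---|---:|---:|
| Male | 0.8532934 | 0.4893344 |
| Female | 0.1467066 | 0.5106656 |
| Sum | 1.0 | 1.0 |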



```r
# P(M | C): probability of male given cigar smoker, about 0.853 (85.3%)
boone_smoke_table["Male", "Smoker"] / sum(boone_smoke_table[, "Smoker"])
```
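This is the same 0.853 obtained with the assumed population of 100,000. That is expected: every cell is the total multiplied by fixed proportions, so the total cancels in any conditional probability. A short sketch that checks this for several totals (the helper `p_male_given_smoker` is hypothetical, and 1,000,000 is an arbitrary extra total):

```r
# Hypothetical helper: P(male | cigar smoker) for a given population total
p_male_given_smoker <- function(total,
                                p_male = 0.51, p_female = 0.49,
                                p_smoke_male = 0.095, p_smoke_female = 0.017) {
  male_smokers   <- total * p_male * p_smoke_male
  female_smokers <- total * p_female * p_smoke_female
  male_smokers / (male_smokers + female_smokers)
}

totals <- c(100000, 170733, 1e6)
round(sapply(totals, p_male_given_smoker), 3)
# [1] 0.853 0.853 0.853
```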

a) Using the same table, suppose an individual is selected at random. What is the prior probability that the selected person is female?

b) You later learn that the randomly selected person smokes cigars.
Use this additional information to find the posterior probability that the selected person is female.

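```r
# (a) Prior P(F): females / total population = 0.49
sum(boone_smoke_table["Female", ]) / sum(boone_smoke_table)
```

```r
# (b) Posterior P(F | C): female smokers / all smokers, about 0.147
boone_smoke_table["Female", "Smoker"] / sum(boone_smoke_table[, "Smoker"])
```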
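The posterior in (b) can also be obtained without the table by updating the prior from (a) with Bayes' theorem, using $P(C \mid F) = 0.017$ and $P(C \mid M) = 0.095$ from the problem statement. A minimal sketch with illustrative variable names:

```r
prior_f <- 0.49          # P(F): prior probability of selecting a female
prior_m <- 1 - prior_f   # P(M)
lik_f   <- 0.017         # P(C | F): cigar-smoking rate among females
lik_m   <- 0.095         # P(C | M): cigar-smoking rate among males

# Posterior P(F | C) = P(F) P(C | F) / [P(F) P(C | F) + P(M) P(C | M)]
posterior_f <- (prior_f * lik_f) / (prior_f * lik_f + prior_m * lik_m)
round(posterior_f, 3)
# [1] 0.147
```

Learning that the person smokes cigars lowers the probability of female from 0.49 to about 0.147, because cigar smoking is much more common among males.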


Load the Framingham data from the directory '/datasets/framingham'.

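```r
framingham_data <- read.csv("datasets/framingham/framingham.csv")
head(framingham_data)
```

| male | age | education | currentSmoker | cigsPerDay | BPMeds | prevalentStroke | prevalentHyp | diabetes | totChol | sysBP | diaBP | BMI | heartRate | glucose | TenYearCHD |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 39 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 195 | 106.0 | 70 | 26.97 | 80 | 77 | 0 |
| 0 | 46 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 250 | 121.0 | 81 | 28.73 | 95 | 76 | 0 |
| 1 | 48 | 1 | 1 | 20 | 0 | 0 | 0 | 0 | 245 | 127.5 | 80 | 25.34 | 75 | 70 | 0 |
| 0 | 61 | 3 | 1 | 30 | 0 | 0 | 1 | 0 | 225 | 150.0 | 95 | 28.58 | 65 | 103 | 1 |
| 0 | 46 | 3 | 1 | 23 | 0 | 0 | 0 | 0 | 285 | 130.0 | 84 | 23.1 | 85 | 85 | 0 |
| 0 | 43 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 228 | 180.0 | 110 | 30.3 | 77 | 99 | 0 |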


Create a two-way table from this data set with diabetes status in the columns and gender in the rows, and use `addmargins` to add totals.


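```r
# Levels 0/1 sort as 0 first: male = 0 is female, diabetes = 0 is nondiabetes
dia <- with(framingham_data, table(male, diabetes))
colnames(dia) <- c('nondiabetes', 'diabetes')
rownames(dia) <- c('female', 'male')
addmargins(dia)
```

| | nondiabetes | diabetes | Sum |
|---|---:|---:|---:|
| female | 2363 | 57 | 2420 |
| male | 1768 | 52 | 1820 |
| Sum | 4131 | 109 | 4240 |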


What is the probability that a female has diabetes?
Let $d$ be the event of diabetes and $d'$ the event of no diabetes.
Similarly, let $g$ be the event of being male and $g'$ the event of being female.
Find $P(d \mid g')$ using Bayes' formula:

$$P(d \mid g') = \frac{P(d)\,P(g' \mid d)}{P(d)\,P(g' \mid d) + P(d')\,P(g' \mid d')}$$

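```r
# Joint proportions: each cell divided by the grand total
addmargins(prop.table(dia))
```

| | nondiabetes | diabetes | Sum |
|---|---:|---:|---:|
| female | 0.5573113 | 0.0134434 | 0.5707547 |
| male | 0.4169811 | 0.01226415 | 0.4292453 |
| Sum | 0.9742925 | 0.02570755 | 1.0 |

```r
# Row proportions P(diabetes status | gender); add only the row sums (each 1)
addmargins(prop.table(dia, 1), 2)
```

| | nondiabetes | diabetes | Sum |
|---|---:|---:|---:|
| female | 0.9764463 | 0.02355372 | 1 |
| male | 0.9714286 | 0.02857143 | 1 |

```r
# Column proportions P(gender | diabetes status); add only the column sums (each 1)
addmargins(prop.table(dia, 2), 1)
```

| | nondiabetes | diabetes |
|---|---:|---:|
| female | 0.5720165 | 0.5229358 |
| male | 0.4279835 | 0.4770642 |
| Sum | 1.0 | 1.0 |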


```r
# P(g' | d): probability of female given diabetes, 0.5229358
57 / 109
```

```r
# P(d | g'): probability of diabetes given female, 0.02355372
57 / 2420
```
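The question asks for $P(d \mid g')$ via Bayes' formula, while the cells above read it straight off the table. The sketch below substitutes the marginal and conditional probabilities into the formula instead. The counts are hard-coded from the two-way table so the block runs without the data file; the variable names are illustrative.

```r
# Counts from the Framingham two-way table: rows = gender, columns = diabetes status
dia_counts <- matrix(c(2363, 1768, 57, 52), ncol = 2,
                     dimnames = list(c("female", "male"),
                                     c("nondiabetes", "diabetes")))
n <- sum(dia_counts)

p_d      <- sum(dia_counts[, "diabetes"]) / n     # P(d)
p_not_d  <- sum(dia_counts[, "nondiabetes"]) / n  # P(d')
p_gf_d   <- dia_counts["female", "diabetes"] /
  sum(dia_counts[, "diabetes"])                   # P(g' | d)
p_gf_nd  <- dia_counts["female", "nondiabetes"] /
  sum(dia_counts[, "nondiabetes"])                # P(g' | d')

# Bayes' formula for P(d | g')
p_d_given_f <- (p_d * p_gf_d) / (p_d * p_gf_d + p_not_d * p_gf_nd)
p_d_given_f
# 0.02355372, the same as 57/2420
```

Algebraically the numerator reduces to 57/4240 and the denominator to 2420/4240, which is why the formula and the direct table lookup agree.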