# Frequentist vs Bayesian Approaches
QNC Assignment 1

The learning objective is to gain insights into thinking about inference from a "Frequentist" versus a "Bayesian" perspective. In brief, because a Frequentist does not consider the probability of an event or state of the world or hypothesis, only its frequency of occurrence, it is not possible to ask questions of the form "what is the probability that hypothesis x is true?" Instead, one can only consider questions of the form, "what is the probability that I would have obtained my data, given that hypothesis x is true?" In contrast, Bayesians consider the probabilities of such things (often called the strength of belief), but doing so can require making assumptions that can be difficult to prove.

Let's start with a simple example, taken from:
__[Wikipedia](https://en.wikipedia.org/wiki/Base_rate_fallacy#Example_1:_HIV)__

"Imagine running an HIV test on A SAMPLE of 1000 persons ..."

"The test has a false positive rate of 5% (0.05)..." i.e., there is a 5% probability that someone who does NOT have HIV gets a POSITIVE result.

"...and no false negative rate." i.e., the probability that someone who DOES have HIV gets a NEGATIVE result is zero.

In [41]:
#import libraries
import numpy as np
import pandas as pd
#import decimal as dec

## Exercise 1
*If someone gets a positive test, is it "statistically significant" at the p<0.05 level? Why or why not?*

Frequentist statistics define a result as "statistically significant" if the probability of observing data at least as extreme as ours under the null hypothesis (the p-value) falls below a chosen threshold, such as 0.05. If this probability is very small, we conclude that the data are unlikely to have come from the distribution defined by the null hypothesis, and we reject it in favor of the alternative hypothesis.

In this example, the null hypothesis is that the person does not have HIV, but we observe a positive test. The probability of a positive test given the null is equal to the test's false positive rate, which we know to be 0.05. Because 0.05 is not strictly less than 0.05, this result is technically not significant at the p<0.05 level.
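
The comparison can be sketched in a few lines. This is only an illustration: the names `alpha`, `false_positive_rate`, `p_value`, and `is_significant` are introduced here and are not part of the assignment.

```python
# Under the null hypothesis (the person is NOT infected), the only way to
# observe a positive test is a false positive.
false_positive_rate = 0.05   # given in the problem statement
alpha = 0.05                 # significance threshold (assumed name)

# p-value for a single positive test under the null hypothesis
p_value = false_positive_rate

# "Significant" requires the p-value to be strictly below the threshold
is_significant = p_value < alpha
print(f"p-value = {p_value}, significant at p<{alpha}: {is_significant}")
```

The strict inequality is what makes the result fall just short of significance.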

## Exercise 2
*What is the probability that if someone gets a positive test, that person is infected?*

This requires the Bayesian posterior probability, or probability that a hypothesis (someone has HIV) is true given the data (a positive test). 

$$
p(Infected|Positive) = \frac{p(Positive|Infected) \times p(Infected)}{p(Positive)}
$$

To calculate this, we need to estimate a prior $p(Infected)$, the proportion of people who are actually infected. __[About 0.7% of adults globally were living with HIV in 2021](https://www.kff.org/global-health-policy/fact-sheet/the-global-hivaids-epidemic/)__; here we'll use a larger prior of 0.07 as an illustrative value.
Because the false negative rate is zero, we know $p(Positive|Infected)=1$, i.e. everyone who's infected will have a positive test.
The probability of having a positive test, $p(Positive)$, is the rate of true and false positives combined:

In [33]:
#setting known variables
N=1000 #sample size
p_infect=0.07 #prior

truep=N*p_infect #ppl infected based on prior (all test positive, since there are no false negatives)
falsep=(N-truep)*.05 #5% of people who don't have HIV will test positive
p_pos=(truep+falsep)/N #convert from ppl to probability
#rate of positive tests
print(p_pos)

0.1165
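
The same value follows from the law of total probability, written out here with the 0.07 prior and the 5% false positive rate used above:

$$
p(Positive) = p(Positive|Infected) \times p(Infected) + p(Positive|\text{not } Infected) \times p(\text{not } Infected) = 1 \times 0.07 + 0.05 \times 0.93 = 0.1165
$$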


In [35]:
#Now we plug into Bayes Rule:
p_hypothesis=(1*p_infect)/p_pos
print("The probability of someone who has a positive test being infected is " + str(round(p_hypothesis,3)))

The probability of someone who has a positive test being infected is 0.601
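
As a sanity check on the Bayes' rule calculation, the same quantity can be estimated by simulation. This is a sketch under stated assumptions: the population size `n_people`, the seed, and the variable names are chosen here for illustration, while the infection rate and false positive rate are the 0.07 and 0.05 used above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

n_people = 1_000_000          # large simulated population (assumed)
p_infect = 0.07               # prior used above
false_positive_rate = 0.05    # from the problem statement

# Who is actually infected?
infected = rng.random(n_people) < p_infect

# No false negatives: every infected person tests positive.
# Uninfected people test positive with probability false_positive_rate.
false_alarm = rng.random(n_people) < false_positive_rate
positive = infected | (~infected & false_alarm)

# Fraction of positive tests that belong to infected people
p_infected_given_positive = infected[positive].mean()
print(round(p_infected_given_positive, 3))
```

With a population this large, the estimate should land close to the 0.601 computed analytically, with small differences due to sampling noise.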


## Exercise 3
*Following on Exercise 2, let's do the same thing, but this time we will try different values for the proportion of the population that is actually infected. What you should notice is that the PROPORTION INFECTED GIVEN A POSITIVE TEST depends (a lot!) on the OVERALL RATE OF INFECTION. Put another way, to determine the probability of a hypothesis, given your data (e.g., proportion infected given a positive test), you have to know the probability that the hypothesis was true without any data.*

*Why is this the case? It is a simple consequence of the definition of a conditional probability, formulated as Bayes' Rule. Specifically, the joint probability of two events, call them A and B, is defined as:*

$$
p(A \text{ and } B) = p(A) \times p(B|A)
$$
$$
p(B \text{ and } A) = p(B) \times p(A|B)
$$

*These two joint probabilities are the same quantity, so their right-hand sides are equal. Now, calling A the Hypothesis and B the Data, then rearranging, we get:*

$$
p(Hypothesis|Data) = \frac{p(Data|Hypothesis) \times p(Hypothesis)}{p(Data)}
$$

*So you cannot calculate the probability of the hypothesis, given the data (i.e., the Bayesian posterior), without knowing the probability of the hypothesis independent of any data (i.e., the prior).*
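
The identity can be checked numerically on a joint probability table. The sketch below builds a 2x2 table for the HIV example with a prior of 0.07; the table layout and all variable names are assumptions made here for illustration.

```python
import numpy as np

# Joint probabilities p(Hypothesis and Data) for the HIV example, prior = 0.07.
# Rows: hypothesis (infected, not infected); columns: data (positive, negative)
joint = np.array([
    [0.07,   0.0],     # infected: always positive (no false negatives)
    [0.0465, 0.8835],  # not infected: 0.93 * 0.05 positive, 0.93 * 0.95 negative
])

p_hypothesis = joint.sum(axis=1)   # marginal over data: p(Hypothesis)
p_data = joint.sum(axis=0)         # marginal over hypotheses: p(Data)

# Conditional probabilities from the definition p(X|Y) = p(X and Y) / p(Y)
p_data_given_hyp = joint / p_hypothesis[:, None]
p_hyp_given_data = joint / p_data[None, :]

# Bayes' rule: p(H|D) = p(D|H) * p(H) / p(D)
bayes = p_data_given_hyp * p_hypothesis[:, None] / p_data[None, :]

print(np.allclose(bayes, p_hyp_given_data))   # do both routes agree?
print(round(p_hyp_given_data[0, 0], 3))       # p(infected | positive)
```

Changing the first row and rescaling the second (that is, changing the prior) changes `p_hyp_given_data` even though `p_data_given_hyp` stays the same, which is the point of the exercise.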

*For this exercise, assume a range of priors (infection rates) from 0 to 1 in steps of 0.1.*

In [45]:
#Now we'll use the same procedure as above, stepping the prior from 0 to 1 in increments of 0.1
prior=0

# Iterating using while loop
while prior <= 1:
    truep_new=N*prior #ppl infected under this prior
    falsep_new=(N-truep_new)*.05 #5% of uninfected ppl test positive
    p_pos_new=(truep_new+falsep_new)/N #positivity rate=true positives + false positives
    p_hypothesis_new=(1*prior)/p_pos_new #Bayes' rule with p(Positive|Infected)=1
    print("The probability that someone who tests positive is infected, given an infection rate of " + str(prior) + " is " + str(round(p_hypothesis_new,3)))
    prior = round(prior + 0.1,2) #move to next prior; rounding avoids floating-point drift

The probability that someone who tests positive is infected, given an infection rate of 0 is 0.0
The probability that someone who tests positive is infected, given an infection rate of 0.1 is 0.69
The probability that someone who tests positive is infected, given an infection rate of 0.2 is 0.833
The probability that someone who tests positive is infected, given an infection rate of 0.3 is 0.896
The probability that someone who tests positive is infected, given an infection rate of 0.4 is 0.93
The probability that someone who tests positive is infected, given an infection rate of 0.5 is 0.952
The probability that someone who tests positive is infected, given an infection rate of 0.6 is 0.968
The probability that someone who tests positive is infected, given an infection rate of 0.7 is 0.979
The probability that someone who tests positive is infected, given an infection rate of 0.8 is 0.988
The probability that someone who tests positive is infected, given an infection rate of 0.9 is 0.