# Probability Theory
Probability theory is a branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms.

In the classical interpretation, probability is the ratio of the number of outcomes that satisfy a given condition to the total number of equally likely outcomes (e.g. P(heads on a coin toss) = 1 way to get heads / 2 possible outcomes (heads or tails) = ½). In probability theory, an event is a set of outcomes of an experiment to which a probability is assigned. If E represents an event, then P(E) represents the probability that E will occur. A situation in which E might happen (success) or might not happen (failure) is called a trial. A trial can be anything, such as tossing a coin, rolling a die, or pulling a colored ball out of a bag. In these examples the outcome is random, so the variable that represents the outcome is called a random variable.
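The classical ratio above can be computed by listing the equally likely outcomes and counting the favorable ones. A minimal sketch, assuming a fair coin and a fair six-sided die; `classical_probability` is a hypothetical helper name, not from the source:

```python
from fractions import Fraction


def classical_probability(outcomes, event):
    """Hypothetical helper: favorable outcomes / total equally likely outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


coin = ['heads', 'tails']
die = [1, 2, 3, 4, 5, 6]

p_heads = classical_probability(coin, lambda o: o == 'heads')
p_above_4 = classical_probability(die, lambda o: o > 4)  # event E = {5, 6}

print(p_heads)    # 1/2
print(p_above_4)  # 1/3
```

Using `Fraction` keeps the result exact, which matches the "count and divide" definition better than a rounded float.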
- The **empirical probability** of an event is the number of times the event occurs divided by the total number of trials observed. If for n trials we observe s successes, the empirical probability of success is s/n. In the coin-toss example above, any actual sequence of tosses may produce more or fewer than exactly 50% heads.
- **Theoretical probability**, on the other hand, is the number of ways the particular event can occur divided by the total number of possible outcomes. For a fair coin, a head can occur in one way out of two possible outcomes (head, tail), so the true (theoretical) probability of a head is 1/2.
- **Joint Probability**, denoted P(A and B) or P(A ∩ B), is the probability that events A and B both occur. If A and B are independent, meaning that the occurrence of A does not change the probability of B (and vice versa), then P(A ∩ B) = P(A) · P(B). *Example: What is the probability that a drawn card is a red four? There are two red fours in a 52-card deck, the 4 of hearts and the 4 of diamonds, therefore P(four and red) = 2/52 = 1/26.*
- **Conditional Probability** P(A|B) is the probability of A given that B occurred. It matters when A and B are not independent, i.e. when knowing that one occurred changes (raises or lowers) the probability of the other; for independent events P(A|B) = P(A). It is computed as follows:
    - P(A|B) = P(A ∩ B) / P(B) and, similarly, P(B|A) = P(A ∩ B) / P(A), provided the denominators are nonzero. Rearranging the second formula gives the joint probability of A and B as P(A ∩ B) = P(A) · P(B|A), which means: *“The chance of both things happening is the chance that the first one happens, multiplied by the chance that the second one happens given that the first happened.”*
    - https://youtu.be/bgCMjHzXTXs
    - https://youtu.be/ES9HFNDu4Bs
    - https://mithunmanohar.medium.com/machine-learning-101-what-the-is-a-conditional-probability-f0f9a9ec6cda
    - https://seeing-theory.brown.edu/basic-probability/index.html#section1
- **Marginal Probability** is the probability P(A) of a single event occurring, regardless of any other event. It can be thought of as an unconditional probability, since it is not conditioned on another event. *Example: The probability that a drawn card is red is P(red) = 26/52 = 0.5.*
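The definitions above can be checked on a standard 52-card deck by enumeration. A minimal sketch, assuming every card is equally likely to be drawn; the helper names (`is_red`, `is_four`, `prob`) and the random seed are illustrative, not from the source:

```python
import random
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs; every card equally likely
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))


def is_red(card):
    return card[1] in ('hearts', 'diamonds')


def is_four(card):
    return card[0] == '4'


def prob(event):
    """Theoretical probability: favorable cards / total cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


p_red = prob(is_red)                                       # marginal P(red)
p_four = prob(is_four)                                     # marginal P(four)
p_red_and_four = prob(lambda c: is_red(c) and is_four(c))  # joint P(red ∩ four)
p_four_given_red = p_red_and_four / p_red                  # conditional P(four | red)
print(p_red, p_four, p_red_and_four, p_four_given_red)     # 1/2 1/13 1/26 1/13

# Multiplication rule: P(red ∩ four) = P(red) · P(four | red)
assert p_red_and_four == p_red * p_four_given_red
# P(four | red) equals P(four), so these two events are independent
assert p_red_and_four == p_red * p_four

# Empirical probability: n draws with replacement, s = number of red fours
random.seed(1)  # arbitrary seed, for reproducibility
n = 100_000
draws = [random.choice(deck) for _ in range(n)]
s = sum(1 for card in draws if is_red(card) and is_four(card))
empirical = s / n
print(empirical)  # close to, but usually not exactly, 1/26 ≈ 0.0385
```

In this deck "red" and "four" turn out to be independent (P(four | red) = P(four) = 1/13), which is why the simple product rule P(A ∩ B) = P(A) · P(B) gives the correct 1/26 for the red-four example.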

## Probability Distribution
A probability distribution describes how probability is spread over the possible values of a random variable. The kind of variable (discrete or continuous) therefore determines the type of probability distribution. For a single random variable, statisticians divide distributions into the following two types:
- Probability mass functions (PMF) for discrete variables
- Probability density functions (PDF) for continuous variables
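To make the PMF/PDF distinction concrete, here is a minimal sketch using only the Python standard library. The die is assumed fair, and the height distribution (normal, mean 170 cm, standard deviation 10 cm) is an illustrative assumption, not real data:

```python
from fractions import Fraction
from statistics import NormalDist

# PMF (discrete): a fair six-sided die assigns probability 1/6 to each face
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
print(sum(die_pmf.values()))  # 1    (a PMF sums to 1)
print(die_pmf[3])             # 1/6  (P(X = 3) is a genuine probability)

# PDF (continuous): adult height in cm, ASSUMED normal with mean 170 and sd 10
# (illustrative parameters, not real data)
height = NormalDist(mu=170, sigma=10)

# A density value is not a probability; probabilities are areas under the
# curve, computed here with the cumulative distribution function (CDF)
print(height.pdf(170))                         # density at 170 cm (about 0.0399)
p_160_180 = height.cdf(180) - height.cdf(160)  # P(160 < X < 180)
print(round(p_160_180, 4))                     # 0.6827 (within one sd of the mean)

# For a continuous variable, any single exact value has probability 0
print(height.cdf(170) - height.cdf(170))       # 0.0
```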

## Data Types
- Discrete data can take only specific, separate values, e.g. the roll of a die is 1, 2, 3, 4, 5, or 6, never 1.5.
- Continuous data can take any value within a given range, finite or infinite, e.g. a person’s height or weight.

Online Resources:
* https://towardsdatascience.com/machine-learning-probability-statistics-f830f8c09326
* https://youtu.be/uzkc-qNVoOk

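## Bayes' Theorem
Bayes' theorem describes the relationship between the conditional probabilities of two events, H and E, and is used to update a prior probability in light of new evidence. For example, to estimate the probability of selling ice cream on a hot, sunny day, Bayes' theorem uses prior knowledge of how likely ice cream sales are on all kinds of days (sunny, rainy, windy, snowy, etc.).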

<img src='images/bayes.png'>

* where H and E are events, and P(H|E) is the conditional (posterior) probability that event H occurs given that E occurred.
* P(H) is the prior probability of H: based on prior data (e.g. frequency analysis), how likely H is before the evidence is taken into account.
* P(E|H) is called the likelihood: the probability of observing the evidence E given that H is true.
* P(E) is the marginal probability of the evidence: how likely E is overall, regardless of whether H occurs.
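Since the formula itself appears only in the image, here it is written out in the same H/E notation (assuming the figure shows the standard statement), with the denominator P(E) expanded by the law of total probability over H and its complement ¬H:

$$
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},
\qquad
P(E) = P(E \mid H)\,P(H) + P(E \mid \lnot H)\,P(\lnot H)
$$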

In the ice cream example, H represents the event of selling ice cream and E represents the observed weather (e.g. a hot, sunny day). P(H) is the marginal (prior) probability of selling ice cream, based on past sales regardless of the weather.
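The ice cream example can be worked through numerically. A minimal sketch in which every number is hypothetical (the source gives no sales or weather data); only the structure of the calculation matters:

```python
# All numbers below are HYPOTHETICAL, chosen only to illustrate the calculation
p_H = 0.30              # P(H): prior probability of selling ice cream on a given day
p_E_given_H = 0.60      # P(E|H): probability the day is hot and sunny, given a sale
p_E_given_not_H = 0.20  # P(E|not H): probability the day is hot and sunny, given no sale

# Marginal probability of the evidence (law of total probability)
p_E = p_E_given_H * p_H + p_E_given_not_H * (1 - p_H)

# Bayes' theorem: posterior probability of a sale, given a hot sunny day
p_H_given_E = p_E_given_H * p_H / p_E
print(f'P(E) = {p_E:.2f}, P(H|E) = {p_H_given_E:.4f}')  # P(E) = 0.32, P(H|E) = 0.5625
```

With these made-up numbers, observing a hot, sunny day raises the probability of a sale from the prior 0.30 to the posterior 0.5625.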

Online Resources:
* https://www.mathsisfun.com/data/bayes-theorem.html
* https://youtu.be/HZGCoVF3YvM

## Exercise 1
In Orange County, 51% of the adults are males. (It doesn't take too much advanced
mathematics to deduce that the other 49% are females.) One adult is randomly selected
for a survey involving credit card usage.

- **(a)** Find the probability that the selected person is a male.

- **(b)** It is later learned that the selected survey subject was smoking a cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of females smoke cigars (based on data from the Substance Abuse and Mental Health Services Administration). Use this additional information to find the probability that the cigar-smoking respondent is a male.

Use the following notation:
M = male <br>
F = female <br>
C = cigar smoker<br>
NC = not a cigar smoker<br>

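```python
import numpy as np

# number of simulated adult residents of Orange County
OC = 100000

# given probabilities
p_M = 0.51      # P(M)
p_F = 0.49      # P(F)
p_M_C = 0.095   # P(C|M): a male smokes cigars
p_M_NC = 0.905  # P(NC|M)
p_F_C = 0.017   # P(C|F): a female smokes cigars
p_F_NC = 0.983  # P(NC|F)

# simulation counters
M = 0
F = 0
M_C = 0
M_NC = 0
```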

```python
# problem (a): estimate P(M) by simulating the sex of each resident
# (the exact answer is simply P(M) = 0.51)
for _ in range(OC):
    if np.random.random() < p_M:
        M += 1
    else:
        F += 1

sel_M = M / OC * 100
print(f'Out of {OC} residents, probability that selected adult is male: {sel_M}%')
```

```
Out of 100000 residents, probability that selected adult is male: 51.054%
```


```python
# problem (b), first attempt: among the M simulated males from (a), count cigar smokers.
# Note: this estimates P(C|M), the share of males who smoke cigars (about 9.5%),
# not P(M|C), the probability that a cigar smoker is male, which is what (b) asks for.
# Additional solve option: simulate sex and smoking status for both genders together,
# then compute the share of cigar smokers who are male.

for _ in range(M):
    if np.random.random() < p_M_C:
        M_C += 1
    else:
        M_NC += 1

sel_M_C = M_C / M * 100
print(f'Probability that selected male is also cigar smoker: {sel_M_C}%')
```

```
Probability that selected male is also cigar smoker: 9.621185411525051%
```
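The simulation above estimates P(C|M), the share of males who smoke cigars, whereas part (b) asks for P(M|C), the probability that the cigar smoker is male. Bayes' theorem gives it directly. A minimal sketch computing it exactly and then checking it by simulation; the variable names and the seed are illustrative choices, not from the original:

```python
import numpy as np

# Exercise 1(b): P(M|C), the probability that a cigar smoker is male
p_M = 0.51           # P(M)
p_F = 0.49           # P(F)
p_C_given_M = 0.095  # P(C|M)
p_C_given_F = 0.017  # P(C|F)

# Exact answer via Bayes' theorem, with P(C) from the law of total probability
p_C = p_M * p_C_given_M + p_F * p_C_given_F
p_M_given_C = p_M * p_C_given_M / p_C
print(f'Exact P(M|C): {p_M_given_C:.4f}')  # Exact P(M|C): 0.8533

# Simulation: draw sex and smoking status for every resident,
# then keep only the cigar smokers and measure the share who are male
rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
n = 100_000
is_male = rng.random(n) < p_M
smoke_prob = np.where(is_male, p_C_given_M, p_C_given_F)
is_smoker = rng.random(n) < smoke_prob
sim_p_M_given_C = is_male[is_smoker].mean()
print(f'Simulated P(M|C): {sim_p_M_given_C:.4f}')
```

About 85% of cigar smokers in this population are male, even though only 51% of adults are; the much higher smoking rate among males (9.5% vs 1.7%) drives the update.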


## Exercise 2

A diagnostic test has a probability 0.95 of giving a positive result when applied to a person suffering
from a certain disease, and a probability 0.10 of giving a (false) positive when applied to a non-sufferer. It is
estimated that 0.5 % of the population are sufferers. Suppose that the test is now administered to a person about
whom we have no relevant information relating to the disease (apart from the fact that he/she comes from this
population). 

Calculate the following probabilities:
- **(a)** that the test result will be positive;
- **(b)** that, given a positive result, the person is a sufferer;
- **(c)** that, given a negative result, the person is a non-sufferer;
- **(d)** that the person will be misclassified.

Use the following notation:

T = test positive <br>
NT = test negative<br>
S = sufferer<br>
NS = non-sufferer<br>
M = misclassified<br>

Solve it by two approaches:
1. Arithmetically
2. By simulation
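The arithmetic approach reduces to the law of total probability and Bayes' theorem. Written out with the values from the problem statement, P(T|S) = 0.95, P(T|NS) = 0.10, and P(S) = 0.005 (so P(NT|S) = 0.05, P(NT|NS) = 0.90, P(NS) = 0.995), and with M the event of misclassification:

$$
\begin{aligned}
P(T) &= P(T \mid S)\,P(S) + P(T \mid \text{NS})\,P(\text{NS}) = 0.95 \times 0.005 + 0.10 \times 0.995 = 0.10425 \\
P(S \mid T) &= \frac{P(T \mid S)\,P(S)}{P(T)} = \frac{0.00475}{0.10425} \approx 0.0456 \\
P(\text{NS} \mid \text{NT}) &= \frac{P(\text{NT} \mid \text{NS})\,P(\text{NS})}{1 - P(T)} = \frac{0.90 \times 0.995}{0.89575} = \frac{0.8955}{0.89575} \approx 0.9997 \\
P(M) &= P(\text{NT} \mid S)\,P(S) + P(T \mid \text{NS})\,P(\text{NS}) = 0.05 \times 0.005 + 0.10 \times 0.995 = 0.09975
\end{aligned}
$$

So roughly 10% of people tested are misclassified, and fewer than 5% of positive results come from actual sufferers.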

```python
# arithmetically
print('Results from arithmetic calculations:')
pT = 0.95    # P(T|S): test is positive for a sufferer
pNT = 0.05   # P(NT|S): test is negative for a sufferer
pM = 0.10    # P(T|NS): false positive for a non-sufferer
pS = 0.005   # P(S)
pNS = 0.995  # P(NS)

# Bayes' theorem: P(S|T) = P(S) * P(T|S) / P(T)
# P(S) = 0.5% = 0.005
# P(T|S) = 95% = 0.95, so P(S and T) = 0.5% * 95% = 0.475% = 0.00475
# P(NS) = 99.5% = 0.995
# P(T|NS) = 10% = 0.10 (stored as pM)

# (a) test is positive: P(T) = P(S)P(T|S) + P(NS)P(T|NS)
a1 = ((pS*pT) + (pNS*pM)) * 100
print(f'Test result is positive: {a1}%')

# (b) given positive, person is sufferer: P(S|T)
b1 = (pS*pT) / ((pS*pT) + (pNS*pM)) * 100
print(f'Of positive test result, person is sufferer: {b1}%')

# (c) given negative, person is non-sufferer: P(NS|NT)
c1 = (pNS * (1-pM)) / ((pNS*(1-pM)) + (pS*pNT)) * 100
print(f'Of negative test result, person is non-sufferer: {c1}%')

# (d) person misclassified: P(NS and T) + P(S and NT), converted to a percentage
d1 = ((pNS*pM) + (pS*pNT)) * 100
print(f'Chance person is misclassified: {d1:.3f}%')
```

```
Results from arithmetic calculations:
Test result is positive: 10.425%
Of positive test result, person is sufferer: 4.5563549160671455%
Of negative test result, person is non-sufferer: 99.97209042701647%
Chance person is misclassified: 9.975%
```

```python
# simulation
print('Results from simulation calculations:')
T = 0      # positive tests
NT = 0     # negative tests
S = 0      # sufferers
NS = 0     # non-sufferers
S_T = 0    # sufferers who test positive
S_NT = 0   # sufferers who test negative
NS_T = 0   # non-sufferers who test positive
NS_NT = 0  # non-sufferers who test negative
pop = 100000  # simulation test runs

for _ in range(pop):

    # person is sufferer
    if np.random.random() < pS:
        S += 1

        # result of testing
        if np.random.random() < pT:
            S_T += 1
            T += 1
        else:
            S_NT += 1
            NT += 1

    # person not sufferer
    else:
        NS += 1

        # result of testing
        if np.random.random() < pM:
            NS_T += 1
            T += 1
        else:
            NS_NT += 1
            NT += 1

# all p_ values below are percentages of the whole simulated population
p_S = S / pop * 100          # P(S)
p_T_S = S_T / pop * 100      # P(S and T)
p_NT_S = S_NT / pop * 100    # P(S and NT)

p_NS = NS / pop * 100        # P(NS)
p_T_NS = NS_T / pop * 100    # P(NS and T)
p_NT_NS = NS_NT / pop * 100  # P(NS and NT)

p_T = T / pop * 100          # P(T)
p_NT = NT / pop * 100        # P(NT)

# (a) test is positive
a2 = p_T
print(f'Test result is positive: {a2}%')

# (b) given positive, person is sufferer: P(S|T)
b2 = p_T_S / (p_T_S + p_T_NS) * 100
print(f'Of positive test result, person is sufferer: {b2}%')

# (c) given negative, person is non-sufferer: P(NS|NT)
c2 = p_NT_NS / (p_NT_NS + p_NT_S) * 100
print(f'Of negative test result, person is non-sufferer: {c2}%')

# (d) person misclassified: P(NS and T) + P(S and NT), already a percentage
d2 = p_T_NS + p_NT_S
print(f'Chance person is misclassified: {d2:.3f}%')
```

```
Results from simulation calculations:
Test result is positive: 10.397%
Of positive test result, person is sufferer: 4.424353178801578%
Of negative test result, person is non-sufferer: 99.97321518252737%
Chance person is misclassified: 9.961%
```

```python
print('TOTAL SUMMARY COMPARISON\n')
print('Results from arithmetic calculations:')
print(f'Test result is positive: {a1}%')
print(f'Of positive test result, person is sufferer: {b1}%')
print(f'Of negative test result, person is non-sufferer: {c1}%')
print(f'Chance person is misclassified: {d1:.3f}%')
print('\nResults from simulation calculations:')
print(f'Test result is positive: {a2}%')
print(f'Of positive test result, person is sufferer: {b2}%')
print(f'Of negative test result, person is non-sufferer: {c2}%')
print(f'Chance person is misclassified: {d2:.3f}%')
```

```
TOTAL SUMMARY COMPARISON

Results from arithmetic calculations:
Test result is positive: 10.425%
Of positive test result, person is sufferer: 4.5563549160671455%
Of negative test result, person is non-sufferer: 99.97209042701647%
Chance person is misclassified: 9.975%

Results from simulation calculations:
Test result is positive: 10.397%
Of positive test result, person is sufferer: 4.424353178801578%
Of negative test result, person is non-sufferer: 99.97321518252737%
Chance person is misclassified: 9.961%
```
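As a cross-check on both approaches, the same simulation can be vectorized with NumPy, drawing all people and test results at once instead of looping. A minimal sketch; the sample size, seed, and variable names are illustrative choices, and the misclassification rate is computed as the share of people whose test result disagrees with their true status:

```python
import numpy as np

# Exercise 2 by vectorized simulation (an alternative to the Python loop)
p_S = 0.005          # P(S): share of sufferers in the population
p_T_given_S = 0.95   # P(T|S): positive result for a sufferer
p_T_given_NS = 0.10  # P(T|NS): false positive for a non-sufferer

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility
n = 1_000_000
sufferer = rng.random(n) < p_S
positive = rng.random(n) < np.where(sufferer, p_T_given_S, p_T_given_NS)

p_T = positive.mean()                            # (a) P(T)
p_S_given_T = sufferer[positive].mean()          # (b) P(S|T)
p_NS_given_NT = (~sufferer[~positive]).mean()    # (c) P(NS|NT)
p_misclassified = (sufferer != positive).mean()  # (d) P(S and NT) + P(NS and T)

for label, value in [('(a) P(T)', p_T), ('(b) P(S|T)', p_S_given_T),
                     ('(c) P(NS|NT)', p_NS_given_NT), ('(d) P(M)', p_misclassified)]:
    print(f'{label}: {value:.2%}')
```

With a million simulated people the estimates should land close to the arithmetic values (P(T) = 0.10425, P(S|T) ≈ 0.0456, P(NS|NT) ≈ 0.9997, P(M) = 0.09975), and the vectorized form runs much faster than the Python loop.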
