## Exercise 1
In Orange County, 51% of the adults are males. (It doesn't take too much advanced
mathematics to deduce that the other 49% are females.) One adult is randomly selected
for a survey involving credit card usage.

- **(a)** Find the probability that the selected person is a male.

- **(b)** It is later learned that the selected survey subject was smoking a cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of females smoke cigars (based on data from the Substance Abuse and Mental Health Services Administration). Use this additional information to find the probability that the cigar-smoking respondent is a male.

Use the following notation:
M = male <br>
F = female <br>
C = cigar smoker<br>
NC = not a cigar smoker<br>


In [1]:
p_m = 0.51 # P(M): probability that the selected person is a male

In [2]:
# Find the probability that the cigar-smoking respondent is a male:
# P(M|C) = P(M)P(C|M) / P(C), where P(C) = P(M)P(C|M) + P(F)P(C|F)

p_cm = 0.095                    # P(C|M)
p_cf = 0.017                    # P(C|F)
p_f = 1 - p_m                   # P(F)
p_c = p_m * p_cm + p_f * p_cf   # P(C), by the law of total probability
p_mc = p_m * p_cm / p_c         # P(M|C), by Bayes' theorem
p_mc

0.8532934131736527
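
As a sanity check on the Bayes calculation, the same answer can be read off from whole-number counts. The sketch below assumes a hypothetical population of 100,000 adults (a size chosen only so that every count is an integer; it is not part of the exercise) and asks what share of all cigar smokers are male.

```python
# Hypothetical population size, chosen only so the counts come out whole.
population = 100_000

# Integer arithmetic keeps the counts exact.
males = population * 51 // 100           # 51% of adults are male
females = population - males             # the remaining 49%

male_smokers = males * 95 // 1000        # 9.5% of males smoke cigars
female_smokers = females * 17 // 1000    # 1.7% of females smoke cigars

# P(M|C): share of cigar smokers who are male
p_male_given_cigar = male_smokers / (male_smokers + female_smokers)
print(male_smokers, female_smokers, round(p_male_given_cigar, 4))
```

Out of 5,678 cigar smokers in this hypothetical population, 4,845 are male, which is the ratio that Bayes' theorem produces.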

## Exercise 2

A diagnostic test has a probability of 0.95 of giving a positive result when applied to a person suffering
from a certain disease, and a probability of 0.10 of giving a (false) positive when applied to a non-sufferer. It is
estimated that 0.5% of the population are sufferers. Suppose that the test is now administered to a person about
whom we have no relevant information relating to the disease (apart from the fact that he/she comes from this
population).

Calculate the following probabilities:
- **(a)** that the test result will be positive;
- **(b)** that, given a positive result, the person is a sufferer;
- **(c)** that, given a negative result, the person is a non-sufferer;
- **(d)** that the person will be misclassified.

Use the following notation:

T = test positive <br>
NT = test negative<br>
S = sufferer<br>
NS = non-sufferer<br>
M = misclassified<br>

Solve it by two approaches:
1. Arithmetically
2. By simulation

In [8]:
# Arithmetic solution

# P(T|S) = 0.95
# P(T|NS) = 0.1
# P(S) = 0.005

p_ts = 0.95
p_tns = 0.1
p_s = 0.005
p_ns = 1 - p_s

# (a) Probability that the test result will be positive
# P(T) = P(S)P(T|S) + P(NS)P(T|NS)
p_t = p_s * p_ts + p_ns * p_tns

# (b) Probability that, given a positive result, the person is a sufferer
# P(S|T) = P(S)P(T|S)/P(T)
p_st = p_s * p_ts / p_t

# (c) Probability that, given a negative result, the person is a non-sufferer
# P(NS|NT) = P(NS)P(NT|NS)/P(NT)
p_nt = 1 - p_t          # P(NT)
p_ntns = 1 - p_tns      # P(NT|NS)
p_nsnt = p_ns * p_ntns / p_nt

# (d) Probability that the person will be misclassified
# P(M) = 1 - P(S and T) - P(NS and NT)
#      = 1 - P(S)P(T|S) - P(NS)P(NT|NS)
# Named p_mis so that it does not overwrite p_m (probability of a male) from Exercise 1
p_mis = 1 - p_s * p_ts - p_ns * p_ntns


print(f'Probability that the test result will be positive: {p_t:.6f}.')
print(f'Probability that, given a positive result, the person is a sufferer: {p_st:.6f}.')
print(f'Probability that, given a negative result, the person is a non-sufferer: {p_nsnt:.6f}.')
print(f'Probability that the person will be misclassified: {p_mis:.6f}.')

Probability that the test result will be positive: 0.104250.
Probability that, given a positive result, the person is a sufferer: 0.045564.
Probability that, given a negative result, the person is a non-sufferer: 0.999721.
Probability that the person will be misclassified: 0.099750.
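
As a cross-check on the arithmetic, the four quantities can also be computed with exact rational arithmetic, which sidesteps floating-point rounding. This is a minimal sketch using Python's standard `fractions` module. The function name `exact_probabilities` and the dictionary keys are illustrative choices, not part of the exercise. Misclassification is computed as a false negative or a false positive, P(S)P(NT|S) + P(NS)P(T|NS).

```python
from fractions import Fraction

def exact_probabilities():
    """Return the four Exercise 2 probabilities as exact fractions."""
    p_s = Fraction(5, 1000)      # P(S) = 0.5%
    p_ts = Fraction(95, 100)     # P(T|S)
    p_tns = Fraction(10, 100)    # P(T|NS)
    p_ns = 1 - p_s               # P(NS)

    p_t = p_s * p_ts + p_ns * p_tns                  # law of total probability
    return {
        'P(T)': p_t,
        'P(S|T)': p_s * p_ts / p_t,                  # Bayes' theorem
        'P(NS|NT)': p_ns * (1 - p_tns) / (1 - p_t),  # Bayes' theorem
        'P(M)': p_s * (1 - p_ts) + p_ns * p_tns,     # false negative or false positive
    }

exact = exact_probabilities()
for label, value in exact.items():
    print(f'{label} = {value} (about {float(value):.6f})')
```

The exact values are 417/4000, 19/417, 3582/3583 and 399/4000, so a positive result still leaves less than a 5% chance that the person is a sufferer.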


In [19]:
# Simulation solution

import numpy as np

# Number of times to run the simulation
n_runs = 10000000

In [20]:
# Initialize variables
POS = 0
NEG = 0
SUF = 0
NSUF = 0
SUF_POS = 0
NSUF_NEG = 0
MIS = 0

In [21]:
# Run simulation
for _ in range(n_runs):
    
    # The selected person is a sufferer with probability P(S)
    if np.random.random() < p_s:
        SUF += 1
        
        # Tested positive
        if np.random.random() < p_ts: # P(T|S)
            SUF_POS += 1
            POS += 1
        
        # Tested negative
        else:
            MIS += 1
            NEG += 1
        
    else:
        NSUF += 1
        
        # Tested positive
        if np.random.random() < p_tns: # P(T|NS)
            MIS += 1
            POS += 1
        
        # Tested negative
        else:
            NSUF_NEG += 1
            NEG += 1

In [22]:
# Probabilities
P_T = POS / n_runs          # P(T)
P_ST = SUF_POS / POS        # P(S|T): condition on the positive tests only
P_NSNT = NSUF_NEG / NEG     # P(NS|NT): condition on the negative tests only
P_M = MIS / n_runs          # P(M)

In [23]:
print(f'Probability that the test result will be positive: {P_T:.6f}.')
print(f'Probability that, given a positive result, the person is a sufferer: {P_ST:.6f}.')
print(f'Probability that, given a negative result, the person is a non-sufferer: {P_NSNT:.6f}.')
print(f'Probability that the person will be misclassified: {P_M:.6f}.')

Probability that the test result will be positive: 0.104278.
Probability that, given a positive result, the person is a sufferer: 0.045651.
Probability that, given a negative result, the person is a non-sufferer: 0.999730.
Probability that the person will be misclassified: 0.099759.
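
The loop above draws one person at a time, which is slow in pure Python for ten million runs. A vectorized variant is sketched below. It assumes NumPy's `default_rng` generator in place of `np.random.random`. The function name `simulate_vectorized`, the default sample size and the seed are illustrative choices. The two conditional probabilities are estimated by dividing by the number of positive (or negative) tests, not by the total number of runs.

```python
import numpy as np

def simulate_vectorized(n, p_s=0.005, p_ts=0.95, p_tns=0.10, seed=0):
    """Estimate P(T), P(S|T), P(NS|NT) and P(M) from n simulated people."""
    rng = np.random.default_rng(seed)

    # True where the simulated person is a sufferer
    sufferer = rng.random(n) < p_s

    # Each person's chance of a positive test depends on their status
    p_positive = np.where(sufferer, p_ts, p_tns)
    positive = rng.random(n) < p_positive

    p_t = positive.mean()
    # Condition on the test outcome by restricting the denominator
    p_s_given_t = (sufferer & positive).sum() / positive.sum()
    p_ns_given_nt = (~sufferer & ~positive).sum() / (~positive).sum()
    # Misclassified: the test result disagrees with the true status
    p_mis = (sufferer != positive).mean()
    return p_t, p_s_given_t, p_ns_given_nt, p_mis

estimates = simulate_vectorized(1_000_000)
for label, value in zip(['P(T)', 'P(S|T)', 'P(NS|NT)', 'P(M)'], estimates):
    print(f'{label}: {value:.4f}')
```

Because the estimates are random, they vary with the seed and sample size and should agree with the arithmetic solution only to within sampling error.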
