In [1]:
import numpy as np

## Exercise 1
In Orange County, 51% of the adults are males. (It doesn't take too much advanced
mathematics to deduce that the other 49% are females.) One adult is randomly selected
for a survey involving credit card usage.

- **(a)** Find the probability that the selected person is a male.

- **(b)** It is later learned that the selected survey subject was smoking a cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of females smoke cigars (based on data from the Substance Abuse and Mental Health Services Administration). Use this additional information to find the probability that the cigar-smoking respondent is a male.

Use the following notation:

M = male <br>
F = female <br>
C = cigar smoker<br>
NC = not a cigar smoker<br>


In [2]:
n_trials = 1000000

def survey_results(trials: int) -> str:
    """Returns the solutions to a) and b)."""

    M = F = C = NC = M_C = M_NC = F_C = F_NC = 0

    for _ in range(trials):
        if np.random.random() < 0.51:  # selected person is male
            M += 1
            if np.random.random() < 0.095:  # male cigar smoker
                C += 1
                M_C += 1
            else:
                NC += 1
                M_NC += 1
        else:  # selected person is female
            F += 1
            if np.random.random() < 0.017:  # female cigar smoker
                C += 1
                F_C += 1
            else:
                NC += 1
                F_NC += 1

    # P(M | C) is estimated as (male cigar smokers) / (all cigar smokers).
    return (
        f'a) There is a {M/trials*100:.2f}% chance that the selected person is male.\n'
        f'b) There is a {M_C/C*100:.2f}% chance the cigar-smoking respondent is male.'
    )

In [3]:
print(survey_results(n_trials))

a) There is a 51.03% chance that the selected person is male.
b) There is a 85.28% chance the cigar-smoking respondent is male.
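
The simulated estimates can be compared with the exact values given by the law of total probability and Bayes' theorem. The following is a minimal sketch of that check; the variable names (`p_m`, `p_c_given_m`, and so on) are illustrative and are not part of the original notebook.

```python
# Exact answers for Exercise 1 (illustrative variable names, not from the notebook).
p_m = 0.51            # P(M), given in the exercise
p_f = 1 - p_m         # P(F)
p_c_given_m = 0.095   # P(C | M), given in the exercise
p_c_given_f = 0.017   # P(C | F), given in the exercise

# Law of total probability: P(C) = P(C | M) P(M) + P(C | F) P(F)
p_c = p_c_given_m * p_m + p_c_given_f * p_f

# Bayes' theorem: P(M | C) = P(C | M) P(M) / P(C)
p_m_given_c = p_c_given_m * p_m / p_c

print(f"a) P(M)     = {p_m:.4f}")
print(f"b) P(M | C) = {p_m_given_c:.4f}")
```

The exact value for b) is 0.04845 / 0.05678, roughly 0.853, so a simulated estimate near 85% is what one would expect.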


## Exercise 2

A diagnostic test has a probability 0.95 of giving a positive result when applied to a person suffering
from a certain disease, and a probability 0.10 of giving a (false) positive when applied to a non-sufferer. It is
estimated that 0.5% of the population are sufferers. Suppose that the test is now administered to a person about
whom we have no relevant information relating to the disease (apart from the fact that he/she comes from this
population). 

Calculate the following probabilities:
- **(a)** that the test result will be positive;
- **(b)** that, given a positive result, the person is a sufferer;
- **(c)** that, given a negative result, the person is a non-sufferer;
- **(d)** that the person will be misclassified.

Use the following notation:

T = test positive <br>
NT = test negative<br>
S = sufferer<br>
NS = non-sufferer<br>
M = misclassified<br>

Solve it by two approaches:
1. Arithmetically
2. By simulation

1) Arithmetically

$P(T \mid S) = 0.95$  

$P(T \mid NS) = 0.10$  

$P(S) = 0.005$  

$P(NS) = 1 - P(S) = 0.995$  

$P(T) = 0.10425$ (derived in a) below)  

$P(NT) = 1 - P(T) = 0.89575$

a)  
$$P(T) = P(T \mid S) P(S) + P(T \mid NS) P(NS)$$
$$= P(T \cap S) + P(T \cap NS)$$ 
$$= (0.95 \times 0.005) + (0.10 \times 0.995) = 0.00475 + 0.0995 = 0.10425$$

b)  
$$P(S \mid T) = \frac{P(T \cap S)}{P(T)} = \frac{0.00475}{0.10425} \approx 0.04556$$  

c)  
$$P(NS \mid NT) = \frac{P(NT \cap NS)}{P(NT)} = \frac{P(NS) - P(T \cap NS)}{P(NT)} = \frac{0.995 - 0.0995}{0.89575} = \frac{0.8955}{0.89575} \approx 0.9997$$  

d)  
$$P(M) = P(T \cap NS) + P(NT \cap S) = 0.0995 + (P(NT) - P(NT \cap NS)) = 0.0995 + (0.89575 - 0.8955) = 0.0995 + 0.00025 = 0.09975$$
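
The arithmetic above can be double-checked with a few lines of Python. This is only a sketch that mirrors the derivation; the variable names are illustrative and not taken from the original notebook.

```python
# Inputs from the exercise statement.
p_s = 0.005            # P(S)
p_ns = 1 - p_s         # P(NS)
p_t_given_s = 0.95     # P(T | S)
p_t_given_ns = 0.10    # P(T | NS)

# Joint probabilities of the four (test result, disease status) outcomes.
p_t_and_s = p_t_given_s * p_s        # true positive
p_t_and_ns = p_t_given_ns * p_ns     # false positive
p_nt_and_s = p_s - p_t_and_s         # false negative
p_nt_and_ns = p_ns - p_t_and_ns      # true negative

p_t = p_t_and_s + p_t_and_ns                 # a) P(T)
p_nt = 1 - p_t                               # P(NT)
p_s_given_t = p_t_and_s / p_t                # b) P(S | T)
p_ns_given_nt = p_nt_and_ns / p_nt           # c) P(NS | NT)
p_misclassified = p_t_and_ns + p_nt_and_s    # d) P(M)

print(f"a) P(T)       = {p_t:.5f}")
print(f"b) P(S | T)   = {p_s_given_t:.5f}")
print(f"c) P(NS | NT) = {p_ns_given_nt:.5f}")
print(f"d) P(M)       = {p_misclassified:.5f}")
```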

In [8]:
def test_results(trials: int) -> str:
    """Returns the solutions to a), b), c), and d)."""

    T = NT = S = NS = M = T_NS = T_S = NT_S = NT_NS = 0

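    for _ in range(trials):
        if np.random.random() < 0.005:  # person is a sufferer
            S += 1
            if np.random.random() < 0.95:  # true positive
                T += 1
                T_S += 1
            else:  # false negative: misclassified
                NT += 1
                NT_S += 1
                M += 1
        else:  # person is a non-sufferer
            NS += 1
            if np.random.random() < 0.10:  # false positive: misclassified
                T += 1
                T_NS += 1
                M += 1
            else:  # true negative
                NT += 1
                NT_NS += 1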

    return (
        f'a) There is a {T/trials*100:.2f}% chance that the test result is positive.\n'
        f'b) There is a {T_S/T*100:.2f}% chance the person suffers from the disease given a positive test result.\n'
        f'c) There is a {NT_NS/NT*100:.2f}% chance the person does not suffer from the disease given a negative test result.\n'
        f'd) There is a {M/trials*100:.2f}% chance that a person will be misclassified.'
    )

In [9]:
print(test_results(n_trials))

a) There is a 10.45% chance that the test result is positive.
b) There is a 4.57% chance the person suffers from the disease given a positive test result.
c) There is a 99.97% chance the person does not suffer from the disease given a negative test result.
d) There is a 10.00% chance that a person will be misclassified.
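
These estimates agree with the arithmetic solution (10.425%, 4.556%, 99.97%, 9.975%) up to sampling noise. As an alternative to the Python loop, the same experiment can be sketched in vectorized form with a seeded generator, which makes runs reproducible. The function name `simulate_test` and its `seed` parameter are assumptions introduced here for illustration; they are not part of the original notebook.

```python
import numpy as np

def simulate_test(trials: int, seed: int = 0) -> dict:
    """Vectorized sketch of the Exercise 2 simulation (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # True for each simulated person who is a sufferer (S).
    sufferer = rng.random(trials) < 0.005

    # One uniform draw per person decides the test outcome; the threshold
    # depends on disease status: P(T | S) = 0.95, P(T | NS) = 0.10.
    u = rng.random(trials)
    positive = np.where(sufferer, u < 0.95, u < 0.10)

    # Misclassified means the test result disagrees with the true status.
    misclassified = positive != sufferer

    return {
        "P(T)": positive.mean(),
        "P(S|T)": (positive & sufferer).sum() / positive.sum(),
        "P(NS|NT)": (~positive & ~sufferer).sum() / (~positive).sum(),
        "P(M)": misclassified.mean(),
    }

for name, value in simulate_test(1_000_000).items():
    print(f"{name}: {value:.4f}")
```

Drawing all random numbers at once avoids the per-trial Python overhead, and fixing the seed means repeated runs give identical estimates.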
