## Exercise 1
In Orange County, 51% of the adults are males. (It doesn't take too much advanced
mathematics to deduce that the other 49% are females.) One adult is randomly selected
for a survey involving credit card usage.

- **(a)** Find the probability that the selected person is a male.

- **(b)** It is later learned that the selected survey subject was smoking a cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of females smoke cigars (based on data from the Substance Abuse and Mental Health Services Administration). Use this additional information to find the probability that the cigar-smoking respondent is a male.

Use the following notation:
M = male <br>
F = female <br>
C = cigar smoker<br>
NC = not a cigar smoker<br>
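
Part (b) is a direct application of Bayes' theorem over the two-way partition {M, F}. As a worked equation in the notation above (the conditional symbols $P(C \mid M)$ and $P(C \mid F)$ are introduced here for the two given smoking rates, 9.5% and 1.7%):

$$
P(M \mid C) = \frac{P(C \mid M)\,P(M)}{P(C \mid M)\,P(M) + P(C \mid F)\,P(F)}
= \frac{0.095 \times 0.51}{0.095 \times 0.51 + 0.017 \times 0.49}
= \frac{0.04845}{0.05678} \approx 0.853
$$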


In [127]:
# Given:

# P(M) - 51% of the adults are males
M = 0.51

# P(F) - 49% of the adults are females
F = 0.49

# P(C|M) - 9.5% of males smoke cigars (stored in C)
C = 0.095

# P(C|F) - 1.7% of females smoke cigars (stored in NC; this is not P(NC))
NC = 0.017

print(f'The probability that the selected person is a male is just {M}.') 



The probability that the selected person is a male is just 0.51.


In [128]:
# Solution to b

In [129]:
# Based on Bayes' Theorem:
# P(M|C) = P(C|M)P(M) / (P(C|M)P(M) + P(C|F)P(F))
# Here C holds P(C|M) and NC holds P(C|F).
MC = round((M * C) / ((M * C) + (F * NC)), 2)
print(f'The probability that the cigar-smoking respondent is a male is {MC}.')

The probability that the cigar-smoking respondent is a male is 0.85.
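
The same calculation can be wrapped in a small reusable helper. This is only a sketch: the function name `bayes_posterior` and its parameter names are hypothetical and not part of the original notebook, and it assumes the hypothesis and its complement are the only two cases.

```python
def bayes_posterior(prior, likelihood, likelihood_alt):
    """Return P(H|E) for a hypothesis H and its complement.

    prior          -- P(H)
    likelihood     -- P(E|H)
    likelihood_alt -- P(E|not H)
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = prior * likelihood + (1 - prior) * likelihood_alt
    return prior * likelihood / evidence


# Exercise 1(b): H = male, E = cigar smoker
p_male_given_cigar = bayes_posterior(prior=0.51, likelihood=0.095, likelihood_alt=0.017)
print(f'P(M|C) = {p_male_given_cigar:.4f}')
```

Passing the three inputs by keyword keeps the conditional rates from being confused with the unconditional ones.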


## Exercise 2

A diagnostic test has a probability 0.95 of giving a positive result when applied to a person suffering
from a certain disease, and a probability 0.10 of giving a (false) positive when applied to a non-sufferer. It is
estimated that 0.5% of the population are sufferers. Suppose that the test is now administered to a person about
whom we have no relevant information relating to the disease (apart from the fact that he/she comes from this
population). 

Calculate the following probabilities:
- **(a)** that the test result will be positive;
- **(b)** that, given a positive result, the person is a sufferer;
- **(c)** that, given a negative result, the person is a non-sufferer;
- **(d)** that the person will be misclassified.

Use the following notation:

T = test positive <br>
NT = test negative<br>
S = sufferer<br>
NS = non-sufferer<br>
M = misclassified<br>

Solve it by two approaches:
1. Arithmetically
2. By simulation

### Arithmetically

In [11]:
# Given Data
print('Given:')

# P(S) - sufferer
S = 0.005

# P(NS) - non-sufferer
NS = 0.995

# P(M) - misclassified: computed in part (d) below

# P(T|S) - probability of a positive result for a sufferer
T_S = 0.95
print(f'P(T|S) = {T_S} --> the probability of the test giving a positive result when applied to a person suffering from a certain disease')

T_NS = 0.10
print(f'P(T|NS) = {T_NS} --> the probability of giving a positive result when applied to non-sufferers')

NT_NS = 1 - T_NS
# NT_NS = 1 - T_NS = 1 - 0.10 = 0.90
print(f'P(NT|NS) = {NT_NS} --> the probability of giving a negative result when applied to non-sufferers')

print(f'P(S) = {S} --> the percentage of the population that are sufferers')
print(f'P(NS) = {NS} --> the percentage of the population that are non-sufferers')


'''Based on Bayes' Theorem and Conditional Probability:

Bayes' Theorem
P(A|B) = P(A) * P(B|A) / P(B)
or, in the variable naming used here (A_B stands for A|B):
A_B = A * B_A / B

Multiplication rule
P(A and B) = P(A) * P(B|A)
'''


# (a) that the test result will be positive;
# P(T) = P(T|S)P(S) + P(T|NS)P(NS) 
T = (T_S * S) + (T_NS * NS)
print('\n')
print(f'a) The probability that the test result will be positive is: P(T) = {T}')

# Given that we now have the result for P(T), P(NT) = 1 - P(T)
NT = 1 - T
print(f'It would also follow that P(NT) = {NT} since P(NT) = 1 - P(T)')
# (b) that, given a positive result, the person is a sufferer;
# P(S|T) = P(T|S)P(S) / P(T)
S_T = T_S * S / T
print(f'b) The probability that, given a positive result, the person is a sufferer is: P(S|T) = {S_T}')

# (c) that, given a negative result, the person is a non-sufferer;
# P(NS|NT) = P(NT|NS)P(NS) / P(NT)
NS_NT = NT_NS * NS / NT
print(f'c) The probability that, given a negative result, the person is a non-sufferer is: P(NS|NT) = {NS_NT}')

# (d) that the person will be misclassified
# P(M) = P(T and NS) + P(NT and S) = P(T|NS)P(NS) + P(NT|S)P(S)
NT_S = 1 - T_S
M = (T_NS * NS) + (NT_S * S) 
print(f'd) The probability that a person will be misclassified is: P(M) = {M}')


Given:
P(T|S) = 0.95 --> the probability of the test giving a positive result when applied to a person suffering from a certain disease
P(T|NS) = 0.1 --> the probability of giving a positive result when applied to non-sufferers
P(NT|NS) = 0.9 --> the probability of giving a negative result when applied to non-sufferers
P(S) = 0.005 --> the percentage of the population that are sufferers
P(NS) = 0.995 --> the percentage of the population that are non-sufferers


a) The probability that the test result will be positive is: P(T) = 0.10425000000000001
It would also follow that P(NT) = 0.89575 since P(NT) = 1 - P(T)
b) The probability that, given a positive result, the person is a sufferer is: P(S|T) = 0.04556354916067146
c) The probability that, given a negative result, the person is a non-sufferer is: P(NS|NT) = 0.9997209042701646
d) The probability that a person will be misclassified is: P(M) = 0.09975
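
The floating-point noise in results such as 0.10425000000000001 can be avoided by repeating the arithmetic with exact rationals. The following is a minimal sketch using Python's standard `fractions` module; the lowercase variable names are chosen here for illustration and are not from the original cells.

```python
from fractions import Fraction

# Given quantities as exact fractions
p_s = Fraction(5, 1000)            # P(S)
p_ns = 1 - p_s                     # P(NS)
p_t_given_s = Fraction(95, 100)    # P(T|S)
p_t_given_ns = Fraction(10, 100)   # P(T|NS)

# (a) law of total probability
p_t = p_t_given_s * p_s + p_t_given_ns * p_ns

# (b) Bayes' theorem
p_s_given_t = p_t_given_s * p_s / p_t

# (c) Bayes' theorem on the negative result
p_ns_given_nt = (1 - p_t_given_ns) * p_ns / (1 - p_t)

# (d) false positives plus false negatives
p_misclassified = p_t_given_ns * p_ns + (1 - p_t_given_s) * p_s

for label, value in [('P(T)', p_t), ('P(S|T)', p_s_given_t),
                     ('P(NS|NT)', p_ns_given_nt), ('P(M)', p_misclassified)]:
    print(f'{label} = {value} = {float(value):.5f}')
```

The exact forms make the structure of the answers visible: for example, P(S|T) reduces to 19/417, which shows directly how the low prevalence keeps the posterior small despite the high hit rate of the test.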


### By Simulation

In [1]:
# number of simulated people (one test per run)
n_runs = 100000

# Initialize variables
TEST_POSITIVE = 0
TEST_NEGATIVE = 0
SUFFERER = 0
NON_SUFFERER = 0
SUFFERER_POSITIVE = 0
SUFFERER_NEGATIVE = 0
NON_SUFFERER_POSITIVE = 0
NON_SUFFERER_NEGATIVE = 0
MISCLASSIFIED = 0

In [2]:
# import numpy
import numpy as np

# process simulation
for _ in range(n_runs):
    # the person is a sufferer with probability 0.005
    if np.random.random() < 0.005:
        SUFFERER += 1

        # testing positive (true positive)
        if np.random.random() < 0.95:
            SUFFERER_POSITIVE += 1
            TEST_POSITIVE += 1

        # testing negative (false negative, so misclassified)
        else:
            SUFFERER_NEGATIVE += 1
            TEST_NEGATIVE += 1
            MISCLASSIFIED += 1

    # otherwise the person is a non-sufferer
    else:
        NON_SUFFERER += 1

        # testing positive (false positive, so misclassified)
        if np.random.random() < 0.1:
            NON_SUFFERER_POSITIVE += 1
            TEST_POSITIVE += 1
            MISCLASSIFIED += 1

        # testing negative (true negative)
        else:
            NON_SUFFERER_NEGATIVE += 1
            TEST_NEGATIVE += 1


In [3]:
# convert counts to percentages of all runs
# (these are joint frequencies, not conditional probabilities)
P_SUFFERER = SUFFERER / n_runs * 100
P_SUFFERER_POSITIVE = SUFFERER_POSITIVE / n_runs * 100
P_SUFFERER_NEGATIVE = SUFFERER_NEGATIVE / n_runs * 100
P_NON_SUFFERER = NON_SUFFERER / n_runs * 100
P_NON_SUFFERER_POSITIVE = NON_SUFFERER_POSITIVE / n_runs * 100
P_NON_SUFFERER_NEGATIVE = NON_SUFFERER_NEGATIVE / n_runs * 100
P_TEST_POSITIVE = TEST_POSITIVE / n_runs * 100
P_TEST_NEGATIVE = TEST_NEGATIVE / n_runs * 100

In [7]:
print(f'The percentage of the population that are sufferers, P(S): {P_SUFFERER}%')
print(f'The person is a sufferer and tests positive, P(S and T): {P_SUFFERER_POSITIVE}%')
print(f'The person is a sufferer and tests negative, P(S and NT): {P_SUFFERER_NEGATIVE}%')
print(f'The percentage of the population that are non-sufferers, P(NS): {P_NON_SUFFERER}%')
print(f'The person is a non-sufferer and tests positive, P(NS and T): {P_NON_SUFFERER_POSITIVE}%')
print(f'The person is a non-sufferer and tests negative, P(NS and NT): {P_NON_SUFFERER_NEGATIVE}%')
print(f'The test result will be positive, P(T): {P_TEST_POSITIVE}%')
print(f'The test result will be negative, P(NT): {P_TEST_NEGATIVE}%')


The percentage of the population that are sufferers, P(S): 0.47200000000000003%
The person is a sufferer and tests positive, P(S and T): 0.449%
The person is a sufferer and tests negative, P(S and NT): 0.023%
The percentage of the population that are non-sufferers, P(NS): 99.528%
The person is a non-sufferer and tests positive, P(NS and T): 9.836%
The person is a non-sufferer and tests negative, P(NS and NT): 89.69200000000001%
The test result will be positive, P(T): 10.285%
The test result will be negative, P(NT): 89.715%
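
The tallies in this simulation are all divided by `n_runs`, so each one is a share of all runs. A conditional probability such as P(S|T) is instead a ratio of two tallies, for example 0.449 / 10.285 ≈ 0.0437 from the output above, which is close to the arithmetic answer for (b). The sketch below estimates all four quantities (a) to (d) in vectorized form. It is an alternative formulation, not the notebook's original loop: the seed, the sample size, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
n = 1_000_000                    # assumed sample size, larger than n_runs above

# Draw disease status, then a test result conditional on that status
is_sufferer = rng.random(n) < 0.005
u = rng.random(n)
tests_positive = np.where(is_sufferer, u < 0.95, u < 0.10)

# (a) P(T): share of all runs with a positive test
p_t = tests_positive.mean()

# (b) P(S|T): share of sufferers among the positive tests only
p_s_given_t = is_sufferer[tests_positive].mean()

# (c) P(NS|NT): share of non-sufferers among the negative tests only
p_ns_given_nt = (~is_sufferer[~tests_positive]).mean()

# (d) P(M): the test result disagrees with the true status
p_misclassified = (is_sufferer != tests_positive).mean()

print(f'a) P(T)     ~ {p_t:.4f}')
print(f'b) P(S|T)   ~ {p_s_given_t:.4f}')
print(f'c) P(NS|NT) ~ {p_ns_given_nt:.4f}')
print(f'd) P(M)     ~ {p_misclassified:.4f}')
```

Restricting the array to the conditioning event before averaging (for example `is_sufferer[tests_positive]`) is what turns a joint frequency into a conditional one. The estimates should land near the arithmetic results, with some sampling noise.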
