### Opioid addiction and Bayes' Theorem (Several assumptions are made in this exercise)

    A healthcare company needs to decide whether to provide OxyContin to patients for the treatment of severe pain. Based on a very limited amount of historical data (1,000 samples), the company estimates that 25% of previous patients became addicted to the drug. However, the company is aware that the overall addiction rate in the USA is closer to 5% (1).
    
(1) [NCBI Article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2785002/)

    The company employs a logistic regression model to flag potential addiction cases. The model has a precision (i.e., positive predictive value, PPV) of 51%, with the following preliminary metrics on a 200-claim test set:
    
    - False Positive: 42
    - False Negative: 15
    - True Positive: 44
    - True Negative: 99
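
As a quick check, the quoted precision can be recomputed from the four counts above. This is a minimal sketch; the helper name `confusion_metrics` is hypothetical and not part of the original notebook.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return common rates derived from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp),            # PPV on the test set
        "sensitivity": tp / (tp + fn),          # recall / true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
        "test_set_prevalence": (tp + fn) / total,
    }

metrics = confusion_metrics(tp=44, fp=42, fn=15, tn=99)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The precision comes out at roughly 51%, but note that 29.5% of the test-set cases are addicted. Precision measured on this set therefore reflects a prevalence well above the 5% national figure, which is why it cannot be read directly as the answer for the general population.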


In [2]:
FP = 42
FN = 15
TP = 44
TN = 99

In [3]:
import pandas as pd

data = {
    'addicted': {'flagged_addict': TP, 'flagged_not_addict': FN},
    'not_addicted': {'flagged_addict': FP, 'flagged_not_addict': TN}
}

df = pd.DataFrame(data)
df

| | addicted | not_addicted |
|---|---|---|
| flagged_addict | 44 | 42 |
| flagged_not_addict | 15 | 99 |


### Bayes Exercise *"If the model predicts potential addiction, what is the probability that the patient may actually become addicted?"*

Bayes' Theorem:

$$\text{Posterior Probability} = \frac{\text{Sensitivity (Recall)} \times \text{Prior Probability}}{\text{Predicted Probability}}$$

Total Probability:

$$\text{Predicted Probability} = \text{Sensitivity (Recall)} \times \text{Prior Probability} + \text{False Positive Rate} \times (1 - \text{Prior Probability})$$
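
The two formulas can also be applied directly, without counting patients. The sketch below assumes the sensitivity and false positive rate are taken from the 200-claim test set and the prior is the 5% national estimate; the function name `posterior_probability` is illustrative only.

```python
def posterior_probability(sensitivity, false_positive_rate, prior):
    """P(addicted | flagged) via Bayes' theorem."""
    # Total probability of being flagged, addicted or not
    predicted = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / predicted

sensitivity = 44 / (44 + 15)          # TP / (TP + FN)
false_positive_rate = 42 / (42 + 99)  # FP / (FP + TN)

posterior = posterior_probability(sensitivity, false_positive_rate, prior=0.05)
print(f"Posterior probability: {posterior:.2%}")
```

Working with rates rather than a hypothetical population gives the same result, since the population size cancels out of the ratio.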

In [4]:
total_patients = 100000

In [5]:
prevalence_rate = 0.05

addicted_patients = total_patients * prevalence_rate
addicted_patients

5000.0

In [9]:
non_addicted_patients = total_patients - addicted_patients
non_addicted_patients

95000.0

In [20]:
# Test's sensitivity (true positive rate)
flagged_addict_when_addicted = TP / (TP + FN)  # sensitivity = TP / (TP + FN)

addicted_patients_flagged_as_addict = addicted_patients * flagged_addict_when_addicted
addicted_patients_flagged_as_addict

3728.813559322034

In [24]:
# Test's miss rate (false negative rate | Type II error)
flagged_non_addict_when_addicted = FN / (TP + FN)  # miss rate = FN / (TP + FN) = 1 - sensitivity

addicted_patients_not_flagged = addicted_patients * flagged_non_addict_when_addicted
addicted_patients_not_flagged

1271.186440677966

In [26]:
# Test's false alarm rate (alpha) (false positive rate | Type I error)
flagged_addict_when_not_addicted = FP / (FP + TN)  # false positive rate = FP / (FP + TN)

non_addicted_flagged_addict = non_addicted_patients * flagged_addict_when_not_addicted
non_addicted_flagged_addict

28297.87234042553

In [29]:
# Test's specificity (true negative rate)
flagged_non_addict_when_not_addicted = TN / (TN + FP)  # specificity = TN / (TN + FP) = 1 - false positive rate

non_addicted_not_flagged_addict = non_addicted_patients * flagged_non_addict_when_not_addicted
non_addicted_not_flagged_addict

66702.12765957447
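
A simple sanity check on the four groups computed above is that they partition the whole hypothetical population. The following sketch recomputes them in one place, under the same assumptions (100,000 patients, 5% prevalence, rates from the test set), and confirms that they add up; the `groups` dictionary is an illustrative name.

```python
import math

total_patients = 100_000
prevalence_rate = 0.05
TP, FN, FP, TN = 44, 15, 42, 99

addicted = total_patients * prevalence_rate
not_addicted = total_patients - addicted

# Split each branch of the tree by the model's test-set rates
groups = {
    "addicted, flagged": addicted * TP / (TP + FN),
    "addicted, not flagged": addicted * FN / (TP + FN),
    "not addicted, flagged": not_addicted * FP / (FP + TN),
    "not addicted, not flagged": not_addicted * TN / (TN + FP),
}

for label, count in groups.items():
    print(f"{label}: {count:,.0f}")

# The four groups must cover every patient exactly once
assert math.isclose(sum(groups.values()), total_patients)
```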

### *"If the model predicts potential addiction, what is the probability that the patient may actually become addicted?"*

In [38]:
from IPython.display import display, Markdown
display(
    Markdown(
    f'''
    What's the chance I'm one of the {round(addicted_patients_flagged_as_addict)} who could become an addict,
    or am I just one of {round(non_addicted_flagged_addict)} patients who the model classified as addicted but wouldn't become addicted?
    '''
    )
    )


    What's the chance I'm one of the 3729 who could become an addict,
    or am I just one of 28298 patients who the model classified as addicted but wouldn't become addicted?
    

In [39]:
Addicted_flagged_addict = addicted_patients_flagged_as_addict / (addicted_patients_flagged_as_addict + non_addicted_flagged_addict)

In [50]:
display(Markdown(f"# The probability of being addicted given a positive test is {Addicted_flagged_addict:.2%}"))

# The probability of being addicted given a positive test is 11.64%
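
The result depends heavily on the prior. As a rough illustration, the sketch below repeats the calculation for the three prevalence figures that appear in this exercise: the 5% national estimate, the company's 25% historical estimate, and the 29.5% share of addicted cases in the test set. The function name `ppv` and the `priors` labels are illustrative, and the sketch assumes the model's sensitivity and false positive rate stay the same across populations.

```python
sensitivity = 44 / (44 + 15)          # TP / (TP + FN)
false_positive_rate = 42 / (42 + 99)  # FP / (FP + TN)

def ppv(prior):
    """Probability of addiction given a flag, for a given prevalence."""
    flagged = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / flagged

priors = {
    "national estimate": 0.05,
    "company history": 0.25,
    "test set": 59 / 200,
}

results = {label: ppv(prior) for label, prior in priors.items()}
for label, value in results.items():
    print(f"{label} (prior {priors[label]:.1%}): {value:.2%}")
```

At the test-set prevalence the posterior equals the model's 51% precision, while at the 5% national rate it drops to under 12%. The same model is far less informative when the condition is rare.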