# Medical consultant case study

- National average complication rate: 10%
- Consultant complication rate: p' = 3/62 = 4.84% (used to estimate p)

- The parameter is p: the true probability of a complication for a client of the medical consultant. 

## Setup

In [1]:
import pandas as pd
import numpy as np
import altair as alt

alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

## Variability of the statistic

### Sampling with replacement

By sampling with replacement from the dataset (a process called bootstrapping), the variability of the possible p' values can be approximated.

In [2]:
# number of clients in the sample
n = 62
# observed complication rate (point estimate p')
p = 3/n

# set the seed of the random number generator (we use 0)
np.random.seed(0)

# generate 10000 bootstrap simulations
complications = np.random.binomial(n, p, 10000)

# create dataframe
df = pd.DataFrame({"comp": complications})
# calculate proportions
df['comp_rate'] = df['comp'] / n

df.head()

| | comp | comp_rate |
|---|---|---|
| 0 | 3 | 0.048387 |
| 1 | 4 | 0.064516 |
| 2 | 3 | 0.048387 |
| 3 | 3 | 0.048387 |
| 4 | 3 | 0.048387 |


*Note: Since there are only two possible outcomes (complication or no complication), we use the [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution) to generate the data.*
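
The binomial shortcut can be checked against a literal resample. The sketch below assumes the original data can be encoded as 62 zero/one outcomes with three complications. The array name `observed`, the use of `np.random.default_rng`, and the smaller number of 2,000 resamples are illustrative choices, not part of the notebook above. Each row is one bootstrap sample drawn with replacement, and its mean is one bootstrapped proportion.

```python
import numpy as np

n = 62
# Hypothetical 0/1 encoding of the sample: 3 complications, 59 without
observed = np.array([1] * 3 + [0] * (n - 3))

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Draw 2,000 bootstrap samples of size n, sampling with replacement
resamples = rng.choice(observed, size=(2000, n), replace=True)

# The proportion of complications in each bootstrap sample
boot_rates = resamples.mean(axis=1)

print(boot_rates.mean())  # should be close to 3/62
```

Each resampled client is a complication with probability 3/62, so the number of complications per bootstrap sample follows a binomial distribution with parameters n and 3/n. That is why a single call to `np.random.binomial` can stand in for the explicit resampling.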

### Descriptive statistics

In [3]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| comp | 10000.0 | 2.9745 | 1.689359 | 0.0 | 2.0 | 3.0 | 4.0 | 12.0 |
| comp_rate | 10000.0 | 0.047976 | 0.027248 | 0.0 | 0.032258 | 0.048387 | 0.064516 | 0.193548 |
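
As a sanity check, the simulated standard deviation of `comp_rate` (0.027248) can be compared with the usual standard error formula for a sample proportion, sqrt(p(1 - p)/n). The snippet below is a minimal sketch; the variable names `p_hat` and `se` are illustrative and do not appear in the notebook.

```python
import numpy as np

n = 62
p_hat = 3 / n  # observed complication rate

# Standard error of a sample proportion: sqrt(p * (1 - p) / n)
se = np.sqrt(p_hat * (1 - p_hat) / n)

print(f"{se:.4f}")
```

The formula gives roughly 0.0273, which agrees with the spread of the bootstrapped proportions to about three decimal places.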


### Visualization

In [4]:
chart = alt.Chart(df, title="10,000 bootstrapped proportions").mark_bar().encode(
    alt.X('comp_rate:Q', 
           title="Bootstrapped proportion of surgical complications",
           bin=alt.BinParams(maxbins=20)),
    alt.Y('count()',
           title="Count")
)

chart

### Bootstrap percentile confidence interval

#### Calculate percentiles

In [5]:
# bootstrap 2.5 percentile proportion 
lower = df['comp_rate'].quantile(0.025)
print(f"The bootstrap 2.5 percentile proportion is {lower}")

# bootstrap 97.5 percentile proportion 
upper = df['comp_rate'].quantile(0.975)
print(f"The bootstrap 97.5 percentile proportion is {upper:.3}")

The bootstrap 2.5 percentile proportion is 0.0
The bootstrap 97.5 percentile proportion is 0.113


> Result: we are 95% confident that the true probability of a complication for the consultant's clients is between 0% and 11.3%.
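
The two quantile calls can be wrapped in a small helper so that other confidence levels are easy to try. This is only a sketch: the function name `percentile_interval`, its `level` parameter, and the regenerated array `rates` are hypothetical and not part of the notebook.

```python
import numpy as np

def percentile_interval(samples, level=0.95):
    """Return the (lower, upper) bootstrap percentile interval for `level`."""
    alpha = 1 - level
    # Cut off alpha/2 of the bootstrap distribution in each tail
    lower, upper = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

# Illustrative bootstrapped proportions, generated like the notebook's data
rates = np.random.default_rng(0).binomial(62, 3 / 62, 10_000) / 62

lo95, hi95 = percentile_interval(rates)              # 2.5 and 97.5 percentiles
lo90, hi90 = percentile_interval(rates, level=0.90)  # 5 and 95 percentiles

print(lo95, hi95)
print(lo90, hi90)
```

A lower confidence level gives an interval that is no wider than the 95% interval, because it cuts off more of each tail.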

#### Create visualization

Option 1: use precalculated values:

In [6]:
percentiles = alt.Chart(pd.DataFrame({
  'values': [lower, upper],
  'color': ['orange', 'red']
})
).mark_rule(
  strokeDash=[5, 5], 
  strokeWidth=3
).encode(
  x='values:Q',
  color=alt.Color('color:N', scale=None)
)

chart + percentiles

Option 2: calculate bootstrap percentile proportions with Altair:

In [7]:
p_lower = (
    alt.Chart(df)
    .transform_quantile('comp_rate', probs=[0.025], as_=['prob', 'value'])
    .mark_rule(color='orange', strokeDash=[5, 5], strokeWidth=3)
    .encode(
        x = "value:Q"
    )
)

p_upper = (
    alt.Chart(df)
    .transform_quantile('comp_rate', probs=[0.975], as_=['prob', 'value'])
    .mark_rule(color='red', strokeDash=[5, 5], strokeWidth=3)
    .encode(
        x = "value:Q"
    )
)

chart + p_lower + p_upper


## Interpretation

- The original claim was that the consultant’s true rate of complication was under the national rate of 10%. 

- Does the interval estimate of 0% to 11.3% for the true probability of complication indicate that the medical consultant has a lower rate of complications than the national average?

- No. Because the interval overlaps 10%, it might be that the consultant’s work is associated with a lower risk of complications, or it might be that the consultant’s work is associated with a higher risk (i.e., greater than 10%) of complications.
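
The reasoning in the last point amounts to checking whether the national rate falls inside the interval. A minimal sketch, using the rounded interval bounds printed earlier and a hypothetical variable name `national_rate`:

```python
national_rate = 0.10

# Rounded bootstrap percentile interval from the output above
lower, upper = 0.0, 0.113

# If the national rate lies inside the interval, the data cannot
# rule out that the consultant's true rate equals the national rate
interval_contains_national_rate = lower <= national_rate <= upper

print(interval_contains_national_rate)  # True
```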