# Medical consultant case study

- Average complication rate: ___ %
- Consultant complication rate: p' = __/__ = ___ % (the sample proportion p' is used to estimate the true proportion p)

- The parameter is p: the true probability of a complication for a client of the medical consultant (note: p is not the same as the p-value)

## Setup

In [None]:
import pandas as pd
import numpy as np
import altair as alt

alt.data_transformers.disable_max_rows()

## Variability of the statistic

### Sampling with replacement

By sampling with replacement from the dataset (a process called bootstrapping), the variability of the possible p' values can be approximated.

In [None]:
# number of people
n = ___
# observed proportion of complications (p')
p = ___


In [None]:

# set the seed of the random number generator (we use 0) so the results are reproducible
np.random.seed(0)

# simulate the number of complications in each of 10,000 bootstrap samples of size n
complications = np.random.binomial(n, p, 10000)

# create dataframe
df = pd.DataFrame({"comp": complications})



In [None]:

# calculate proportions
df['comp_rate'] = df['___'] / ___

df.head()

*Note: Since there are only two possible outcomes (complication or no complication), we use the [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution) to generate our data.*
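The note above can be illustrated with a minimal sketch: drawing counts from a binomial distribution should behave like explicitly resampling a vector of 0/1 outcomes with replacement. The values `n_demo` and `x_demo` are made-up illustration numbers (not the case-study data), and all names ending in `_demo` are hypothetical.

```python
import numpy as np

# Hypothetical illustration values, NOT the case-study numbers
n_demo = 40                 # number of clients in the made-up sample
x_demo = 10                 # number of complications in the made-up sample
p_demo = x_demo / n_demo    # observed proportion in the made-up sample

np.random.seed(0)

# The made-up sample as a vector of outcomes: 1 = complication, 0 = no complication
sample_demo = np.array([1] * x_demo + [0] * (n_demo - x_demo))

# Explicit bootstrap: resample n_demo outcomes with replacement, 10,000 times,
# and count the complications in each resample
resampled_counts = np.random.choice(
    sample_demo, size=(10000, n_demo), replace=True
).sum(axis=1)

# Shortcut: draw the complication counts directly from Binomial(n_demo, p_demo)
binomial_counts = np.random.binomial(n_demo, p_demo, 10000)

print("mean proportion (resampling):", resampled_counts.mean() / n_demo)
print("mean proportion (binomial):  ", binomial_counts.mean() / n_demo)
```

Both approaches should give proportions with a very similar centre and spread, which is why the shorter binomial call is a reasonable stand-in here.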

### Descriptive statistics

In [None]:
df.describe().T
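As a rough cross-check, the `std` column reported for the simulated proportions can be compared with the textbook standard error of a proportion, sqrt(p(1 - p)/n). The sketch below uses made-up values (`n_demo`, `p_demo`), not the case-study numbers, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical illustration values, NOT the case-study numbers
n_demo = 40
p_demo = 0.25

np.random.seed(0)

# Simulated proportions from 10,000 bootstrap samples
rates_demo = np.random.binomial(n_demo, p_demo, 10000) / n_demo

# Standard error from the formula sqrt(p * (1 - p) / n)
se_formula = np.sqrt(p_demo * (1 - p_demo) / n_demo)

# Standard deviation of the simulated proportions
# (ddof=1 matches the default used by pandas describe())
se_simulated = rates_demo.std(ddof=1)

print(f"formula:   {se_formula:.4f}")
print(f"simulated: {se_simulated:.4f}")
```

The two numbers should agree closely; a large gap would suggest a mistake in how the proportions were computed.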

### Visualization

In [None]:
chart = alt.Chart(df, title="10,000 bootstrapped proportions").mark_bar().encode(
    alt.X('comp_rate:Q', 
           title="Bootstrapped proportion of surgical complications",
           bin=alt.BinParams(maxbins=20)),
    alt.Y('count()',
           title="Count")
)

chart

### Bootstrap percentile confidence interval

#### Calculate percentiles

In [None]:
# 2.5th percentile of the bootstrapped proportions (lower bound)
lower = df['comp_rate'].quantile(0.025)
print(f"The bootstrap 2.5th percentile proportion is {lower:.3f}")

# 97.5th percentile of the bootstrapped proportions (upper bound)
upper = df['comp_rate'].quantile(0.975)
print(f"The bootstrap 97.5th percentile proportion is {upper:.3f}")

> The result is: we are 95% confident that, in the population, the true probability of a complication is between ___ % and ___ %.
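The 2.5th and 97.5th percentiles correspond to a 95% confidence level, because 2.5% of the bootstrapped proportions are cut off in each tail. A small helper can make that link explicit for other levels. This is only a sketch: `percentile_interval` is a hypothetical function name, and the demo data use made-up values rather than the case-study numbers.

```python
import numpy as np
import pandas as pd

def percentile_interval(values, level=0.95):
    """Return the bootstrap percentile interval for the given confidence level."""
    alpha = 1 - level                      # total probability left in both tails
    series = pd.Series(values)
    return series.quantile(alpha / 2), series.quantile(1 - alpha / 2)

# Hypothetical illustration values, NOT the case-study numbers
np.random.seed(0)
demo_rates = np.random.binomial(40, 0.25, 10000) / 40

lower_demo, upper_demo = percentile_interval(demo_rates, level=0.95)
print(f"95% interval: {lower_demo:.3f} to {upper_demo:.3f}")

lower_90, upper_90 = percentile_interval(demo_rates, level=0.90)
print(f"90% interval: {lower_90:.3f} to {upper_90:.3f}")
```

A lower confidence level cuts off more of each tail, so the 90% interval is never wider than the 95% interval.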

#### Create visualization

In [None]:
percentiles = alt.Chart(pd.DataFrame({
  'values': [lower, upper],
  'color': ['orange', 'red']
})
).mark_rule(
  strokeDash=[5, 5], 
  strokeWidth=3
).encode(
  x='values:Q',
  color=alt.Color('color:N', scale=None)
)

chart + percentiles

## Interpretation

- The original claim was that the consultant’s true complication rate was below the national rate of 10%.

- Does the interval estimate of 0 to 11.3% for the true probability of complication indicate that the medical consultant has a lower rate of complications than the national average?
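One way to approach the question is to check whether the national rate falls inside the interval. The sketch below uses the figures quoted above (0 to 11.3%, and 10%); the variable names are hypothetical.

```python
# Figures quoted in the text above
national_rate = 0.10
lower_bound = 0.0
upper_bound = 0.113

# Does the interval contain the national rate?
contains_national = lower_bound <= national_rate <= upper_bound
print(f"Interval contains the national rate: {contains_national}")
```

If the national rate lies inside the interval, the interval alone does not show that the consultant's true rate is lower than the national average.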

...