# **Data Analysis and Study Design**

### Author
- Patrick Payne, MPH
- Share the research office with Alex
- Background in computational astrophysics and public health
- Role of supporting research, QI, and all other scholarly work in the department

### Learning Objectives
- Explain numerical intuition and its value in studies
- Build an understanding of common statistics and their interpretations
- Differentiate experimental study designs
- Differentiate observational study designs
- Introduce Data Science, Machine Learning, and Artificial Intelligence in Anesthesia

---
---

## Numerical Intuition
- the ability to see data, describe the data, and provide a meaningful interpretation -> telling a story
- (This can be as simple as interpreting a p-value or combining multiple, complex data streams to give a single picture)
- supports making a decision from the data instead of having the data make your decision
- built through experience, trial/error, and critical analysis of data in multiple settings

![SedLine brain function monitor display from Masimo](https://th.bing.com/th/id/R.a4338dcc7cd3bd57c2f605f498bf993d?rik=wtaeX9iYdEWCqg&riu=http%3a%2f%2fwww.masimo.co.uk%2fsiteassets%2fus%2fimages%2fproducts%2fcontinuous-monitors%2froot-platform%2fnext-generation-sedline%2froot_sed_digital_1-up_us_6.29.17_web.jpg&ehk=hMn359jM%2befjAqgwU3ZiQYUiUYttA%2bQ5P3004yt7q40%3d&risl=&pid=ImgRaw&r=0)
- EEG Waveforms
- Patient State Index
- EMG
- Suppression Ratio
- Spectral Edge Frequencies
- Density Spectral Array: power spectrum, frequency range, etc.

## Using Statistics - Sensitivity and Specificity

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-1jsq{border-color:inherit;font-size:medium;text-align:center;vertical-align:middle}
.tg .tg-oesp{border-color:inherit;font-size:medium;font-weight:bold;text-align:center;vertical-align:top}
.tg .tg-gom2{border-color:inherit;font-size:medium;text-align:center;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-f8tv{border-color:inherit;font-style:italic;text-align:left;vertical-align:top}
.tg .tg-c6of{background-color:#ffffff;border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-oesp" colspan="2">Confusion Matrix</th>
    <th class="tg-gom2" colspan="2">Predicted Condition</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-1jsq" rowspan="3">Actual Condition</td>
    <td class="tg-0pky"></td>
    <td class="tg-f8tv">Predicted Positive </td>
    <td class="tg-f8tv">Predicted Negative</td>
  </tr>
  <tr>
    <td class="tg-f8tv">Actual Positive</td>
    <td class="tg-c6of">True Positive (TP)</td>
    <td class="tg-0pky">False Negative (FN)</td>
  </tr>
  <tr>
    <td class="tg-f8tv">Actual Negative</td>
    <td class="tg-0pky">False Positive (FP)</td>
    <td class="tg-c6of">True Negative (TN)</td>
  </tr>
</tbody>
</table>

### How good is this test?

Can the test predict an actual positive case?
- How many of the actual cases are testing positive?
    - Look at the number of times the patient had (a positive test and the condition) considering that they (actually had the condition)
    - $\huge \frac{\text{Actual Positive and Predicted Positive}}{\text{Actual Positive}} = \frac{TP}{TP + \color{blue}{FN}} $
    <br><br>
- How many of the positive tests are actual cases?
    - Look at the number of times the patient had (a positive test and the condition) considering that they (tested positive)
    - $\huge \frac{\text{Actual Positive and Predicted Positive}}{\text{Predicted Positive}} = \frac{TP}{TP + \color{blue}{FP}} $
---    
    
Can the test predict an actual negative case?
- How many of the patients without the condition are testing negative?
    - Look at the number of times the patient had (a negative test and did **not** have the condition) considering that they (did **not** have the condition)
    - $\huge \frac{\text{Actual Negative and Predicted Negative}}{\text{Actual Negative}} = \frac{TN}{TN + \color{red}{FP}} $  <br><br>
- How many of the negative test are patients without the condition?
    - Look at the number of times the patient had (a negative test and did **not** have the condition) considering that they (tested negative)
    - $\huge \frac{\text{Actual Negative and Predicted Negative}}{\text{Predicted Negative}} = \frac{TN}{TN + \color{red}{FN}} $

### Lets name these statistics

<details>
  <summary markdown="span">How many of the actual cases are testing positive?</summary>

  &emsp; **Sensitivity**
</details>
<details>
  <summary markdown="span">How many of the positive tests are actual case?</summary>

  &emsp; **Positive predicitive value**
</details>
<details>
  <summary markdown="span">How many of the patients without the condition are testing negative?</summary>

  &emsp; **Specificity**
</details>
<details>
  <summary markdown="span">How many of the negative tests are patients without the condition?</summary>

  &emsp; **Negative predictive value**
</details>

In [32]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from ipywidgets import interactive

In [45]:
def plot_cases(Specificity=0.8, Sensitivity=0.8):
    N = 1000
    result_specificity = np.random.choice(['red','black'],int(N/2), p = [1.0 - Specificity, Specificity])
    result_sensitivity = np.random.choice(['blue','black'],int(N/2), p = [1.0 - Sensitivity, Sensitivity])

    random_specificity_x = np.random.uniform(0,4.9, int(N/2))
    random_specificity_y = np.random.uniform(0,10, int(N/2))
    random_sensitivity_x = np.random.uniform(5.1,10, int(N/2))
    random_sensitivity_y = np.random.uniform(0,10, int(N/2))

    fig= plt.figure(figsize=(8,8))

    plt.scatter(random_sensitivity_x,random_sensitivity_y, c = result_sensitivity)
    plt.scatter(random_specificity_x,random_specificity_y, c = result_specificity)
    x1, y1 = [5, 5], [0, 10]
    plt.plot(x1,y1, c='grey')

    results = []
    result_counts={}
    TP_count = 0
    FP_count = 0
    TN_count = 0
    FN_count = 0
    for elem in result_specificity:
        if elem == 'red':
            results.append("FN")
            FN_count += 1
        else:
            results.append("TN")
            TN_count += 1
    for elem in result_sensitivity:
        if elem == 'blue':
            results.append("FP")
            FP_count += 1
        else:
            results.append("TP")
            TP_count += 1
    result_counts['TP'] = TP_count
    result_counts['FP'] = FP_count
    result_counts['TN'] = TN_count
    result_counts['FN'] = FN_count

    PPV = TP_count / (TP_count + FP_count)
    NPV = TN_count / (TN_count + FN_count)
    print(f"Positive predictive value: {PPV}")
    print(f"Negative predictive value: {NPV}")

    plt.show()
    return result_counts

def plot_bayes(results):
    pretest_probability = np.linspace(0,1.0, num=100)
    false_positive_rate = results['FP']/ (results['FP'] + results['TP'])
    print(false_positive_rate)

In [46]:
print("Impact of Sensitivity and Specificity")
print("Colored dots represent a false result, left are positive results and the right are negative results")
figure = interactive(plot_cases, Specificity=(0.0,0.99, 0.01), Sensitivity=(0.0,0.99, 0.01))
display(figure)
print(figure.result)
plot_bayes(figure.result)

Impact of Sensitivity and Specificity
Colored dots represent a false result, left are positive results and the right are negative results


interactive(children=(FloatSlider(value=0.8, description='Specificity', max=0.99, step=0.01), FloatSlider(valu…

{'TP': 410, 'FP': 90, 'TN': 409, 'FN': 91}
0.18


## How are these values used?
- Often quoted to represent the 'goodness' of a test
- Can be utilized with Bayes Rule to evaluate the probability of disease for a patient

![Application of Bayes Rule](https://ebm.bmj.com/content/ebmed/16/6/163/F1.medium.gif)

Medow MA, Lucey CRA qualitative approach to Bayes' theorem BMJ Evidence-Based Medicine 2011;16:163-167.


## Bayes Rule?
- Way of using conditional probabilities to characterize the probability of a hypothesis using observations

$\huge P(\text{Hypothesis} | \text{Data}) = \frac{P(\text{Hypothesis}) \cdot P(\text{Data} | \text{Hypothesis})}{P(\text{Data})}$
- Applying this to medical testing:

$\begin{align}
\huge P(\text{True Positive} | \text{Predicted Positive})& = \frac{P(\text{True Positive}) \cdot P(\text{Predicted Positive} | \text{True Positive})}{P(\text{Predicted Positive})}
\end{align}$




## Do you really trust that P-value?
- We all know the standard p-value, level of significance, is 0.05
- This means that there is a 5% chance of a Type I error
    - An error that we incorrectly reject the null hypothesis
    - So in five percent of cases where no association exists, we would find a signficant association

![Hazard of p-values shown in an XKCD comic where an association is found after running numerous statistical tests on the same sample](https://imgs.xkcd.com/comics/significant.png)

## Machine Learning in Anesthesia

- Edwards Lifesciences Hypertension Prediction Index
    - [Kouz K, Monge García MI, Cerutti E, et al. Intraoperative hypotension when using hypotension prediction index software during major noncardiac surgery: a European multicentre prospective observational registry (EU HYPROTECT). BJA Open. 2023;6:100140. Published 2023 May 4. doi:10.1016/j.bjao.2023.100140](https://pubmed.ncbi.nlm.nih.gov/37588176/)

![Edwards Lifesciences Hypertension Prediction Index monitor screen](https://assets-us-01.kc-usercontent.com/6239a81e-8f0f-0040-a1df-b4932a10f6ae/048c09ab-50a1-46cf-82ad-4d561c74b5d5/hpi-slide-3%402x.png?w=1920&q=100&auto=format)


- Difficult airway prediction
    - [Wang G, Li C, Tang F, Wang Y, Wu S, Zhi H, Zhang F, Wang M, Zhang J. A fully-automatic semi-supervised deep learning model for difficult airway assessment. Heliyon. 2023 Apr 22;9(5):e15629. doi: 10.1016/j.heliyon.2023.e15629. PMID: 37159696; PMCID: PMC10163620.](https://pubmed.ncbi.nlm.nih.gov/37159696/)
![Difficult airway classification interface to build research dataset](https://ars.els-cdn.com/content/image/1-s2.0-S2405844023028360-gr8.jpg)
- Anatomy Identification
    - [Gungor I, Gunaydin B, Oktar SO, et al. A real-time anatomy ıdentification via tool based on artificial ıntelligence for ultrasound-guided peripheral nerve block procedures: an accuracy study. J Anesth. 2021;35(4):591-594. doi:10.1007/s00540-021-02947-3](https://pubmed.ncbi.nlm.nih.gov/34008072/)
![Anatomy identification results in ultrasound-guided peripheral nerve block artificial intelligence study](https://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs00540-021-02947-3/MediaObjects/540_2021_2947_Fig1_HTML.jpg?as=webp)

# Caveat Emptor - Let the buyer beware
- Intepreting a machine learning publication comes with a new set of biases and concerns
- Think about how the data sets were trained
    - Garbage in garbage out
    - Only healthy patients?
    - Only members of specific risk strata, demographics etc. ?
    - Data used to train the model?
- Similar concerns of generalizability as a typical study, but this could be hidden in a model once that tool comes to market
- If a your colleage received the same training as the machine, would you be confident in their ability?