# **Data Analysis and Study Design**

### Author
- Patrick Payne, MPH
- Share the research office with Alex
- Background in computational astrophysics and public health
- Supports research, QI, and all other scholarly work in the department

### Learning Objectives
- Explain numerical intuition and its value in studies
- Build an understanding of common statistics and their interpretations
- Differentiate experimental study designs
- Differentiate observational study designs
- Introduce Data Science, Machine Learning, and Artificial Intelligence in Anesthesia

---

## Numerical Intuition
- The ability to look at data, describe it, and provide a meaningful interpretation: telling a story
- This can be as simple as interpreting a p-value or as involved as combining multiple complex data streams into a single picture
- Supports making a decision from the data, rather than letting the data make the decision for you
- Built through experience, trial and error, and critical analysis of data in many different settings

![SedLine brain function monitor display from Masimo](https://th.bing.com/th/id/R.a4338dcc7cd3bd57c2f605f498bf993d?rik=wtaeX9iYdEWCqg&riu=http%3a%2f%2fwww.masimo.co.uk%2fsiteassets%2fus%2fimages%2fproducts%2fcontinuous-monitors%2froot-platform%2fnext-generation-sedline%2froot_sed_digital_1-up_us_6.29.17_web.jpg&ehk=hMn359jM%2befjAqgwU3ZiQYUiUYttA%2bQ5P3004yt7q40%3d&risl=&pid=ImgRaw&r=0)
- EEG Waveforms
- Patient State Index
- EMG
- Suppression Ratio
- Spectral Edge Frequencies
- Density Spectral Array: power spectrum, frequency range, etc. 

## Using Statistics - Sensitivity and Specificity

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-1jsq{border-color:inherit;font-size:medium;text-align:center;vertical-align:middle}
.tg .tg-oesp{border-color:inherit;font-size:medium;font-weight:bold;text-align:center;vertical-align:top}
.tg .tg-gom2{border-color:inherit;font-size:medium;text-align:center;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-f8tv{border-color:inherit;font-style:italic;text-align:left;vertical-align:top}
.tg .tg-c6of{background-color:#ffffff;border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-oesp" colspan="2">Confusion Matrix</th>
    <th class="tg-gom2" colspan="2">Predicted Condition</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-1jsq" rowspan="3">Actual Condition</td>
    <td class="tg-0pky"></td>
    <td class="tg-f8tv">Predicted Positive </td>
    <td class="tg-f8tv">Predicted Negative</td>
  </tr>
  <tr>
    <td class="tg-f8tv">Actual Positive</td>
    <td class="tg-c6of">True Positive (TP)</td>
    <td class="tg-0pky">False Negative (FN)</td>
  </tr>
  <tr>
    <td class="tg-f8tv">Actual Negative</td>
    <td class="tg-0pky">False Positive (FP)</td>
    <td class="tg-c6of">True Negative (TN)</td>
  </tr>
</tbody>
</table>
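
As a minimal sketch in plain Python, the four cells of the matrix can be tallied from paired lists of actual and predicted results, where `True` means the condition is present (actual) or the test is positive (predicted). The function name `confusion_matrix_counts` and the example labels are hypothetical, chosen only for illustration.

```python
from collections import Counter

def confusion_matrix_counts(actual, predicted):
    """Tally TP, FN, FP, TN from paired booleans (True = has condition / tested positive)."""
    counts = Counter()
    for has_condition, tested_positive in zip(actual, predicted):
        if has_condition and tested_positive:
            counts["TP"] += 1   # actual positive, predicted positive
        elif has_condition:
            counts["FN"] += 1   # actual positive, predicted negative
        elif tested_positive:
            counts["FP"] += 1   # actual negative, predicted positive
        else:
            counts["TN"] += 1   # actual negative, predicted negative
    return {key: counts[key] for key in ("TP", "FN", "FP", "TN")}

# Hypothetical results for eight patients
actual    = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]
print(confusion_matrix_counts(actual, predicted))
# {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```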

### How good is this test?

Can the test predict an actual positive case? 
- How many of the actual cases are testing positive?
    - Count the patients who have (a positive test and the condition), out of all patients who (actually have the condition)
    - $\huge \frac{\text{Actual Positive and Predicted Positive}}{\text{Actual Positive}} = \frac{TP}{TP + \color{blue}{FN}} $ 
    <br><br>
- How many of the positive tests are actual cases?
    - Count the patients who have (a positive test and the condition), out of all patients who (tested positive)
    - $\huge \frac{\text{Actual Positive and Predicted Positive}}{\text{Predicted Positive}} = \frac{TP}{TP + \color{blue}{FP}} $

---

Can the test predict an actual negative case?
- How many of the patients without the condition are testing negative?
    - Count the patients who have (a negative test and do **not** have the condition), out of all patients who (do **not** have the condition)
    - $\huge \frac{\text{Actual Negative and Predicted Negative}}{\text{Actual Negative}} = \frac{TN}{TN + \color{red}{FP}} $  <br><br>
- How many of the negative tests are patients without the condition?
    - Count the patients who have (a negative test and do **not** have the condition), out of all patients who (tested negative)
    - $\huge \frac{\text{Actual Negative and Predicted Negative}}{\text{Predicted Negative}} = \frac{TN}{TN + \color{red}{FN}} $

### Let's name these statistics

<details>
  <summary markdown="span">How many of the actual cases are testing positive?</summary>

  &emsp; **Sensitivity**
</details>
<details>
  <summary markdown="span">How many of the positive tests are actual cases?</summary>

  &emsp; **Positive predictive value**
</details>
<details>
  <summary markdown="span">How many of the patients without the condition are testing negative?</summary>

  &emsp; **Specificity**
</details>
<details>
  <summary markdown="span">How many of the negative tests are patients without the condition?</summary>

  &emsp; **Negative predictive value**
</details>
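
With the names in hand, the four formulas above translate directly into code. This is a minimal sketch: the function name `diagnostic_stats` and the counts are hypothetical, and returning `NaN` for a zero denominator is one reasonable convention, not the only one.

```python
def diagnostic_stats(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # A ratio is undefined (NaN here) when its denominator is zero
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "ppv": ratio(tp, tp + fp),          # TP / (TP + FP)
        "npv": ratio(tn, tn + fn),          # TN / (TN + FN)
    }

# Hypothetical counts for 1,000 patients (100 with the condition, 900 without)
stats = diagnostic_stats(tp=90, fn=10, fp=30, tn=870)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
# sensitivity: 0.900
# specificity: 0.967
# ppv: 0.750
# npv: 0.989
```

Even with 90% sensitivity and roughly 97% specificity, the PPV in this made-up example is only 0.75: the 900 patients without the condition produce 30 false positives alongside the 90 true positives.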

```python
# Requires matplotlib, numpy, and ipywidgets (e.g., pip install matplotlib numpy ipywidgets)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from ipywidgets import interact
```

```python
def plot_cases(specificity=0.8, sensitivity=0.8):
    """Simulate 1,000 patients (500 with the condition, 500 without) and plot their test results.

    Left half:  patients WITHOUT the condition (black = true negative, red = false positive)
    Right half: patients WITH the condition    (black = true positive, blue = false negative)
    """
    N = 1000
    n_half = N // 2

    # Actual negatives: the test is correctly negative with probability = specificity
    result_negatives = np.random.choice(['red', 'black'], n_half, p=[1.0 - specificity, specificity])
    # Actual positives: the test is correctly positive with probability = sensitivity
    result_positives = np.random.choice(['blue', 'black'], n_half, p=[1.0 - sensitivity, sensitivity])

    # Random positions: actual negatives on the left, actual positives on the right
    negatives_x = np.random.uniform(0, 4.9, n_half)
    negatives_y = np.random.uniform(0, 10, n_half)
    positives_x = np.random.uniform(5.1, 10, n_half)
    positives_y = np.random.uniform(0, 10, n_half)

    plt.figure(figsize=(8, 8))
    plt.scatter(positives_x, positives_y, c=result_positives)
    plt.scatter(negatives_x, negatives_y, c=result_negatives)
    plt.plot([5, 5], [0, 10], c='grey')  # divider between the two groups
    plt.title("Left: without condition (red = FP)    Right: with condition (blue = FN)")

    # Tally the confusion-matrix cells
    TN_count = int(np.sum(result_negatives == 'black'))
    FP_count = int(np.sum(result_negatives == 'red'))
    TP_count = int(np.sum(result_positives == 'black'))
    FN_count = int(np.sum(result_positives == 'blue'))

    # Guard against 0/0 when nobody tests positive (or negative)
    PPV = TP_count / (TP_count + FP_count) if (TP_count + FP_count) else float('nan')
    NPV = TN_count / (TN_count + FN_count) if (TN_count + FN_count) else float('nan')
    print(f"Positive predictive value: {PPV:.3f}")
    print(f"Negative predictive value: {NPV:.3f}")

    plt.show()
```

```python
print("Impact of Sensitivity and Specificity")
print("Colored dots are false results: left = patients without the condition (red = false positive), "
      "right = patients with the condition (blue = false negative)")
interact(plot_cases, specificity=(0.0, 0.99, 0.01), sensitivity=(0.0, 0.99, 0.01))
print('-' * 80)
```
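
Because the simulation draws random patients, the printed PPV and NPV jitter from run to run. A minimal deterministic sketch of the values they should hover around is below. It assumes the same 50/50 split of patients with and without the condition that `plot_cases` uses; the `prevalence` parameter and the function name `expected_predictive_values` are hypothetical additions, not part of the original notebook.

```python
def expected_predictive_values(sensitivity, specificity, prevalence=0.5):
    """Expected (PPV, NPV) for a test in a population with the given prevalence.

    prevalence = fraction of patients who actually have the condition
    (0.5 matches the 500/500 split simulated in plot_cases).
    """
    # Each cell is a fraction of all patients rather than a count
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv

for prevalence in (0.5, 0.1):
    ppv, npv = expected_predictive_values(0.8, 0.8, prevalence)
    print(f"prevalence={prevalence}: PPV={ppv:.3f}, NPV={npv:.3f}")
# prevalence=0.5: PPV=0.800, NPV=0.800
# prevalence=0.1: PPV=0.308, NPV=0.973
```

With the same sensitivity and specificity, lowering the share of patients who have the condition from 50% to 10% drops the PPV to about 0.31 while the NPV rises to about 0.97.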