### Week 1

- calculate performance metrics for the model output against the human-labeled truth values

In [1]:
import pandas as pd
import seaborn as sns


sns.set_style("darkgrid")

In [2]:
df = pd.read_csv("helena_results.csv")
df.head()

| | file | x1 | x2 | y1 | y2 | mask | date | start_time | frame no | quadrant | true_mask |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | park-tests/converted/coverted_FRAMES/20210526_... | 720 | 763 | 185 | 246 | 0 | 20210526 | 182313 | 1248 | 3 | 0 |
| 1 | park-tests/converted/coverted_FRAMES/20210526_... | 13 | 40 | 3 | 40 | 1 | 20210526 | 182313 | 4876 | 3 | 1 |
| 2 | park-tests/converted/coverted_FRAMES/20210526_... | 695 | 736 | 201 | 254 | 0 | 20210526 | 182313 | 4876 | 3 | 0 |
| 3 | park-tests/converted/coverted_FRAMES/20210526_... | 573 | 663 | 326 | 430 | 1 | 20210526 | 182313 | 4397 | 1 | 1 |
| 4 | park-tests/converted/coverted_FRAMES/20210526_... | 893 | 973 | 346 | 452 | 1 | 20210526 | 182313 | 4397 | 1 | 1 |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1038 entries, 0 to 1037
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   file        1038 non-null   object
 1   x1          1038 non-null   int64 
 2   x2          1038 non-null   int64 
 3   y1          1038 non-null   int64 
 4   y2          1038 non-null   int64 
 5   mask        1038 non-null   int64 
 6   date        1038 non-null   int64 
 7   start_time  1038 non-null   int64 
 8   frame no    1038 non-null   int64 
 9   quadrant    1038 non-null   int64 
 10  true_mask   1038 non-null   int64 
dtypes: int64(10), object(1)
memory usage: 89.3+ KB


#### Definitions
mask: model output

true_mask: true label (human labeled)

- 0: mask
- 1: no mask
- 2: false detection

A false detection is a case where the model detected the ground, or some other object, as a face.

In [4]:
df['mask'].value_counts()

0    707
1    331
Name: mask, dtype: int64

In [5]:
df['true_mask'].value_counts()

1    530
0    447
2     61
Name: true_mask, dtype: int64
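
Since `mask` only takes the values 0 and 1 while `true_mask` also contains 2, a cross-tabulation is one way to see how the model labeled the false detections before they are dropped. The sketch below uses a small hypothetical DataFrame (`toy`, not the real `helena_results.csv` rows) purely to show the shape of the output.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real data (illustration only).
toy = pd.DataFrame({
    "mask":      [0, 0, 1, 1, 0, 1],   # model output
    "true_mask": [0, 1, 1, 0, 2, 2],   # human label, 2 = false detection
})

# Rows: model output; columns: human label; "All" holds row/column totals.
table = pd.crosstab(toy["mask"], toy["true_mask"], margins=True)
print(table)
```

On the real data, the same call would be `pd.crosstab(df['mask'], df['true_mask'], margins=True)`.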

#### Metrics and Scoring

In [6]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

# exclude false detections from this analysis
df2 = df[df['true_mask'] < 2]

df2.shape

(977, 11)

In [7]:
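mask_true = df2['true_mask']
mask_pred = df2['mask']

# for binary labels, ravel() returns the counts in the order tn, fp, fn, tp;
# label 1 (no mask) is treated as the positive class
tn, fp, fn, tp = confusion_matrix(y_true=mask_true, y_pred=mask_pred).ravel()

sensitivity = tp/(tp+fn)  # true positive rate (recall)
specificity = tn/(tn+fp)  # true negative rate
ppv = tp/(tp+fp)          # positive predictive value (precision)
npv = tn/(fn+tn)          # negative predictive value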

print(f'sensitivity: {round(sensitivity, 4)}')
print(f'specificity: {round(specificity, 4)}')
print(f'ppv: {round(ppv, 4)}')
print(f'npv: {round(npv, 4)}')

sensitivity: 0.4302
specificity: 0.7987
ppv: 0.717
npv: 0.5417


In [8]:
metrics = {'accuracy': accuracy_score, 
           'precision': precision_score,
           'recall': recall_score,
           'f1': f1_score}

for name, scorer in metrics.items():
    score = scorer(y_true=mask_true, y_pred=mask_pred)
    print(f'{name} score: {round(score, 4)}')

accuracy score: 0.5988
precision score: 0.717
recall score: 0.4302
f1 score: 0.5377
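
With the coding used here (0 = mask, 1 = no mask), scikit-learn's default `pos_label=1` means the precision, recall, and f1 scores above describe how well the model finds *no mask* cases. If the mask class is the one of interest, `pos_label=0` can be passed instead. The sketch below uses hypothetical toy labels rather than the real data; recall with `pos_label=0` is the same quantity as the specificity computed earlier, and precision with `pos_label=0` is the same as the NPV.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical toy labels (0 = mask, 1 = no mask); not taken from the data.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 0, 1]

# Default: label 1 (no mask) is the positive class.
recall_no_mask = recall_score(y_true, y_pred)

# Treat label 0 (mask) as the positive class instead.
recall_mask = recall_score(y_true, y_pred, pos_label=0)
precision_mask = precision_score(y_true, y_pred, pos_label=0)

print(f'recall (no mask positive): {round(recall_no_mask, 4)}')
print(f'recall (mask positive): {round(recall_mask, 4)}')
print(f'precision (mask positive): {round(precision_mask, 4)}')
```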


### Week 2

- calculating confidence intervals around these numbers
- two-sample testing of accuracy metrics, e.g. comparing specificity between different groups and reporting p-values
- looking at different groups where the metrics are better/worse
    - look for big spreads
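
A minimal sketch of the first two items, using only the standard library. It assumes a Wilson score interval for the confidence intervals and a pooled two-proportion z-test for the group comparison; other choices (bootstrap, Fisher's exact test) would also be reasonable. The function names `wilson_interval` and `two_proportion_ztest` are made up for illustration. The counts 228 of 530 are back-calculated from the printed sensitivity (0.4302) and the 530 rows with `true_mask == 1`, so treat them as an assumption; the two-group counts are purely hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided pooled z-test of H0: p1 == p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all successes or all failures: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Sensitivity = tp / (tp + fn); counts back-calculated from the Week 1 output (assumption).
sens_lo, sens_hi = wilson_interval(228, 530)
print(f'sensitivity 95% CI: ({sens_lo:.4f}, {sens_hi:.4f})')

# Hypothetical counts: true negatives out of all negatives in two groups.
z_stat, p_val = two_proportion_ztest(80, 100, 60, 100)
print(f'z: {z_stat:.4f}, p-value: {p_val:.4f}')
```

For the group comparison, the same two functions could be applied per value of a grouping column such as `quadrant` or `date`, after computing the confusion-matrix counts within each group.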
    
- Nirmal - worked with Ted
- Ted Morris (camera / robotics, etc)
    - Camera has object detection capabilities
- Helena - worked with Catherine Zhao
    - Graduated and developed NN
    
- Catherine Zhao
    - Reach out to Catherine for training data / model code
    - Hoping to get access to the code / training data
    - Willing to put them on GitHub with Prof. Wolfson

#### Exploration

- explore an RNN for image classification, since its memory component lets it use information from multiple consecutive frames rather than a single frame