# 04_operatingpoint_trainval_25D_5fold.ipynb

This notebook calculates the operating point for the 3D CT scans from the 2.5D image predictions. The operating point is obtained ONLY from the training and validation sets; the independent test set remains unseen in this process.

Requires:
- `trainval_5fold_df.csv` (generated from `03_model_25D_5fold.ipynb`).

Generates:
- the value of the operating point (saved in `operating_point.out`).

In [1]:
import os  # needed for os.path.splitext when parsing slice numbers
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
import math
import tqdm

pd.set_option("display.max_columns", 20)
pd.set_option("display.max_colwidth", 20)

# utility functions located in the src folder
from src.util_analysis import cal_best_op

In [2]:
# load the model predictions of the training-validation set
trainval_df = pd.read_csv("trainval_5fold_df.csv") 
# get the 2.5D slice number of each image within its CT scan
# (the second underscore-separated field of the file stem, parsed as an integer)
trainval_df["slicenumber"] = [
    int(os.path.splitext(name)[0].split("_")[1]) for name in trainval_df.jpeg_name
]
df = trainval_df.sort_values(["PatientName","image_name","slicenumber"], ascending=[True, True, True]).reset_index(drop=True)
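The slice number is parsed to an integer before sorting because file names sort lexicographically, which would place slice 10 before slice 2. The following is a minimal sketch of that step on made-up data; the file names and the helper `extract_slice_number` are hypothetical and only illustrate the assumed `<prefix>_<slicenumber>.<ext>` naming.

```python
import os
import pandas as pd


def extract_slice_number(jpeg_name: str) -> int:
    """Return the second underscore-separated field of the file stem as an int."""
    stem = os.path.splitext(jpeg_name)[0]
    return int(stem.split("_")[1])


# Hypothetical file names, used only to illustrate the ordering issue.
demo_df = pd.DataFrame(
    {
        "image_name": ["scanA", "scanA", "scanB"],
        "jpeg_name": ["scanA_10.jpeg", "scanA_2.jpeg", "scanB_1.jpeg"],
    }
)
demo_df["slicenumber"] = demo_df["jpeg_name"].map(extract_slice_number)

# Sorting on the integer column keeps slice 2 ahead of slice 10 within a scan.
demo_sorted = demo_df.sort_values(["image_name", "slicenumber"]).reset_index(drop=True)
```

Sorting on `jpeg_name` directly would instead order `scanA_10.jpeg` before `scanA_2.jpeg`.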

### Combine the predictions of the 5 models (from the 5-fold cross-validation)

In [3]:
# 5-fold cross validation combined prediction
df["test_prediction_5fold"] = df.apply(lambda x: np.average([x.test_prediction_model_fold1, x.test_prediction_model_fold2, x.test_prediction_model_fold3, x.test_prediction_model_fold4, x.test_prediction_model_fold5]), axis=1)
df["predicted_classes_avg_5fold"] = df.apply(lambda x: np.average([x.predicted_classes_model_fold1, x.predicted_classes_model_fold2, x.predicted_classes_model_fold3, x.predicted_classes_model_fold4, x.predicted_classes_model_fold5]), axis=1)
df["predicted_classes_5fold"] = np.where(df["predicted_classes_avg_5fold"]<0.5,0,1) # abnormal if the majority of the folds predicted abnormal.
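The same ensemble columns can be computed without a row-wise `apply`, which is usually faster on large tables. This is a sketch on made-up values, assuming the fold columns follow the naming used above; `demo_df` and its numbers are illustrative only.

```python
import numpy as np
import pandas as pd

folds = range(1, 6)
prob_cols = [f"test_prediction_model_fold{k}" for k in folds]
class_cols = [f"predicted_classes_model_fold{k}" for k in folds]

# Made-up predictions for two slices: five probabilities, then five class votes.
demo_df = pd.DataFrame(
    [
        [0.9, 0.8, 0.7, 0.2, 0.1, 1, 1, 1, 0, 0],
        [0.1, 0.2, 0.3, 0.6, 0.7, 0, 0, 0, 1, 1],
    ],
    columns=prob_cols + class_cols,
)

# Mean probability across the five fold models.
demo_df["test_prediction_5fold"] = demo_df[prob_cols].mean(axis=1)
# Fraction of fold models voting for class 1.
demo_df["predicted_classes_avg_5fold"] = demo_df[class_cols].mean(axis=1)
# Majority vote: with five voters the fraction is never exactly 0.5.
demo_df["predicted_classes_5fold"] = np.where(
    demo_df["predicted_classes_avg_5fold"] < 0.5, 0, 1
)
```

In this example the first slice receives three of five votes for class 1 and is labelled 1, while the second receives two and is labelled 0.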

In [None]:
# set the grid search range for the operating point
search_range = np.arange(0.0125,1.0,0.0125)
# obtain the best operating point from the training and validation set.
op_for_best_acc = cal_best_op(df, search_range)
# save the best operating point in operating_point.out
np.savetxt('operating_point.out', [op_for_best_acc], fmt='%.4f')

Expected output:
- plot of operating point ($x$-axis) vs accuracy score on CT scan ($y$-axis)
- value of the operating point that gives the best accuracy score on CT scan
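The implementation of `cal_best_op` lives in `src/util_analysis.py` and is not shown here. As a rough illustration of what a threshold grid search for scan-level accuracy might look like, the sketch below makes several assumptions that are not taken from the source: the function name `sketch_best_op`, a ground-truth column called `label`, aggregation of slice scores to a scan score by their mean, and a scan being called abnormal when its score is at or above the threshold. The actual helper may differ on all of these points.

```python
import numpy as np
import pandas as pd


def sketch_best_op(df, search_range, score_col="test_prediction_5fold",
                   label_col="label", scan_col="image_name"):
    """Hypothetical grid search: return (threshold, accuracy) with the best scan-level accuracy."""
    # Assumption: one score per scan is the mean of its slice scores,
    # and the scan label is the max of its slice labels.
    scans = df.groupby(scan_col).agg(
        score=(score_col, "mean"), label=(label_col, "max")
    )
    accuracies = [
        float(((scans["score"] >= op).astype(int) == scans["label"]).mean())
        for op in search_range
    ]
    best = int(np.argmax(accuracies))  # first threshold reaching the maximum
    return float(search_range[best]), accuracies[best]


# Made-up data: four scans with two slices each.
demo_df = pd.DataFrame(
    {
        "image_name": ["a", "a", "b", "b", "c", "c", "d", "d"],
        "test_prediction_5fold": [0.10, 0.30, 0.30, 0.36, 0.40, 0.50, 0.70, 0.90],
        "label": [0, 0, 0, 0, 1, 1, 1, 1],
    }
)
best_op, best_acc = sketch_best_op(demo_df, np.arange(0.0125, 1.0, 0.0125))
```

Because `np.argmax` returns the first maximum, ties between thresholds resolve to the smallest one; a different tie-breaking rule would shift the reported operating point.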