<center><h1>Clinical Workflow Integration</h1></center>

In [1]:
import numpy as np
import pandas as pd
import pydicom
%matplotlib inline
import matplotlib.pyplot as plt
import keras 
from keras.models import model_from_json
from skimage.transform import resize

Using TensorFlow backend.


In [2]:
# Read one sample DICOM file and inspect its header attributes

dcm1 = pydicom.dcmread('test3.dcm')
dcm1

(0008, 0016) SOP Class UID                       UI: Secondary Capture Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.3.6.1.4.1.11129.5.5.179222148351666120521423991179194552820263
(0008, 0060) Modality                            CS: 'DX'
(0008, 1030) Study Description                   LO: 'Effusion'
(0010, 0020) Patient ID                          LO: '61'
(0010, 0040) Patient's Sex                       CS: 'M'
(0010, 1010) Patient's Age                       AS: '77'
(0018, 0015) Body Part Examined                  CS: 'CHEST'
(0018, 5100) Patient Position                    CS: 'AP'
(0020, 000d) Study Instance UID                  UI: 1.3.6.1.4.1.11129.5.5.189886800072183603320722059194952488628637
(0020, 000e) Series Instance UID                 UI: 1.3.6.1.4.1.11129.5.5.110145974268321300517474523922373370343198
(0028, 0002) Samples per Pixel                   US: 1
(0028, 0004) Photometric Interpretation          CS: 'MONOCHROME2'
(0028, 0010) Rows        

In [3]:
# This function reads in a .dcm file, checks the important fields for our device, and 
# returns a numpy array of just the imaging data
def check_dicom(filename): 
    print('Load file {} ...'.format(filename))     
    dcm = pydicom.dcmread(filename)
    if dcm.BodyPartExamined != 'CHEST' or dcm.Modality != 'DX' or dcm.PatientPosition not in ['PA', 'AP']:
        print("Unable to process ", filename)
        return None
    else:        
        print("Able to process ", filename)
        img = dcm.pixel_array
        return img
        
# This function takes the numpy array output by check_dicom and
# runs the pre-processing needed for our model input
def preprocess_image(img, img_mean, img_std, img_size):
    img = img / 255.0  # Scale pixel values by 255 (assumes 8-bit data)
    proc_img = (img - img_mean) / img_std  # Standardize with the supplied mean and std
    proc_img = resize(proc_img, img_size, anti_aliasing=True)  # Resize to the model input shape
    return proc_img

# This function loads our trained model architecture (JSON) and its weights.
# It does not compile the model, which is not required for prediction.
def load_model(model_path, weight_path):
    with open(model_path, "r") as file:
        loaded_model = file.read()
    model = model_from_json(loaded_model)
    model.load_weights(weight_path)
    return model

# This function uses our device's threshold parameter to predict whether or not
# the image shows the presence of pneumonia using our trained model
def predict_image(model, img, thresh):
    probability = model.predict(img)
    print('Prediction: ', probability)
    # model.predict returns an array of shape (1, 1); take the scalar probability
    predict = probability[0][0]
    if predict > thresh:
        return 'Pneumonia'
    return 'No pneumonia'
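
One detail of `preprocess_image` may be worth checking. `img_size` is the 4-D shape `(1, 224, 224, 3)`, while `dcm.pixel_array` for a monochrome image is 2-D. As far as we understand `skimage.transform.resize`, when the requested shape has more dimensions than the input, it appends singleton axes to the input and interpolates along every axis, so the image rows would be squeezed into the batch axis rather than onto a 224 x 224 grid. Below is a minimal sketch of an alternative: resize in 2-D first, then add the batch and channel axes explicitly. It assumes the model expects a grayscale image replicated across three channels, and that per-image standardization matches what was used in training; neither is confirmed by this notebook. The helper names `standardize` and `to_model_input` are hypothetical.

```python
import numpy as np

def standardize(img):
    """Scale by 255 (assumes 8-bit pixel data), then zero-mean / unit-variance.

    Hypothetical helper. A constant image has zero std and would need
    special handling before the division.
    """
    scaled = img.astype(np.float64) / 255.0
    # Mean and std are taken from the scaled image so both use the same units.
    return (scaled - scaled.mean()) / scaled.std()

def to_model_input(img_2d, n_channels=3):
    """Turn an (H, W) image into a batch of shape (1, H, W, n_channels).

    Hypothetical helper. Replicating the grayscale image across channels
    is an assumption about how the model was trained.
    """
    stacked = np.repeat(img_2d[..., np.newaxis], n_channels, axis=-1)
    return stacked[np.newaxis, ...]
```

With these helpers, `to_model_input(resize(standardize(img), (224, 224), anti_aliasing=True))` would give an array of shape `(1, 224, 224, 3)`. Predictions from such a pipeline would differ from the outputs recorded in this notebook, so the threshold would need to be checked again.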

Before running the DICOM files through the model, we tabulate their key header fields so that we can compare them with the test results.

In [4]:
def create_dicom_df(filenames):
    print('Load file {} ...'.format(filenames))
    column_names = ["Filename", "Body Part Examined", "Modality", "Findings", "Patient Position"]
    all_data = []
    for i in filenames:
        dcm = pydicom.dcmread(i)
        fields = [i, dcm.BodyPartExamined, dcm.Modality, dcm.StudyDescription, dcm.PatientPosition]
        all_data.append(fields)
    mydata = pd.DataFrame(all_data, columns = column_names)
    return mydata

In [5]:
test_dicoms = ['test1.dcm','test2.dcm','test3.dcm','test4.dcm','test5.dcm','test6.dcm']
create_dicom_df(test_dicoms)

Load file ['test1.dcm', 'test2.dcm', 'test3.dcm', 'test4.dcm', 'test5.dcm', 'test6.dcm'] ...


    Filename Body Part Examined Modality      Findings Patient Position
0  test1.dcm              CHEST       DX    No Finding               PA
1  test2.dcm              CHEST       DX  Cardiomegaly               AP
2  test3.dcm              CHEST       DX      Effusion               AP
3  test4.dcm            RIBCAGE       DX    No Finding               PA
4  test5.dcm              CHEST       CT    No Finding               PA
5  test6.dcm              CHEST       DX    No Finding               XX


From the dataframe above we can see that test4.dcm, test5.dcm, and test6.dcm are not appropriate for our model: it requires Body Part Examined to be 'CHEST', Modality to be 'DX', and Patient Position to be 'AP' or 'PA'. test4.dcm fails on the body part (RIBCAGE), test5.dcm on the modality (CT), and test6.dcm on the patient position (XX).
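
The eligibility rule can be sketched without reading any files, using the header values from the table above. The function name `header_problems` and the dictionary layout are hypothetical; the accepted values mirror the condition in `check_dicom`. A field missing from the header is treated as a problem too.

```python
# Accepted header values, mirroring the condition in check_dicom.
ACCEPTED = {
    "Body Part Examined": {"CHEST"},
    "Modality": {"DX"},
    "Patient Position": {"PA", "AP"},
}

def header_problems(header):
    """Return the names of the header fields that rule a file out."""
    return [field for field, allowed in ACCEPTED.items()
            if header.get(field) not in allowed]

# Header values copied from the table above.
headers = {
    "test1.dcm": {"Body Part Examined": "CHEST", "Modality": "DX", "Patient Position": "PA"},
    "test2.dcm": {"Body Part Examined": "CHEST", "Modality": "DX", "Patient Position": "AP"},
    "test3.dcm": {"Body Part Examined": "CHEST", "Modality": "DX", "Patient Position": "AP"},
    "test4.dcm": {"Body Part Examined": "RIBCAGE", "Modality": "DX", "Patient Position": "PA"},
    "test5.dcm": {"Body Part Examined": "CHEST", "Modality": "CT", "Patient Position": "PA"},
    "test6.dcm": {"Body Part Examined": "CHEST", "Modality": "DX", "Patient Position": "XX"},
}

for name, header in headers.items():
    problems = header_problems(header)
    print(name, "OK" if not problems else "rejected: " + ", ".join(problems))
```

Reporting which field failed, rather than a bare "Unable to process", makes it easier to tell a wrong body part from a wrong modality when a file is rejected.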

### Use the threshold chosen in Build and Train Model

In [7]:
test_dicoms = ['test1.dcm','test2.dcm','test3.dcm','test4.dcm','test5.dcm','test6.dcm']

weight_path="{}_my_model3.best.hdf5".format('xray_class')
model_path = 'my_model3.json'

IMG_SIZE = (1, 224, 224, 3)  # Model input shape; this might differ if you did not use VGG16
# The training-set mean and std are not loaded here; each image is instead
# standardized with its own mean and std inside the loop below.

my_model = load_model(model_path, weight_path)
thresh3 = 0.5226132  # Threshold chosen in Build and Train Model 3

# Run each .dcm file through the header check, preprocessing, and prediction
for i in test_dicoms:
    img = check_dicom(i)
    if img is None:
        continue

    img_mean = np.mean(img)
    img_std = np.std(img)
    img_proc = preprocess_image(img, img_mean, img_std, IMG_SIZE)
    pred = predict_image(my_model, img_proc, thresh3)
    print(pred)

Load file test1.dcm ...
Able to process  test1.dcm
Prediction:  [[0.30717582]]
No pneumonia
Load file test2.dcm ...
Able to process  test2.dcm
Prediction:  [[0.40139523]]
No pneumonia
Load file test3.dcm ...
Able to process  test3.dcm
Prediction:  [[0.41943836]]
No pneumonia
Load file test4.dcm ...
Unable to process  test4.dcm
Load file test5.dcm ...
Unable to process  test5.dcm
Load file test6.dcm ...
Unable to process  test6.dcm


The output above shows that the threshold chosen in **Build and Train Model 3** yields "No pneumonia" for all three valid files (test1.dcm, test2.dcm, test3.dcm), which is consistent with their study descriptions (No Finding, Cardiomegaly, Effusion). The predicted probabilities range from about 0.31 to 0.42, all below the threshold of 0.5226132.

Note that only three valid test images are available here, so this result does not represent the overall performance of the model. To get a better idea of the model's performance, we need to test it on many more images, on the order of hundreds or thousands.
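
Once a larger labelled test set is available, the same threshold could be evaluated with confusion-matrix counts. The sketch below is one way to do that with NumPy. The function name `evaluate_threshold` is hypothetical, and labels are assumed to be 1 for pneumonia and 0 otherwise.

```python
import numpy as np

def evaluate_threshold(y_true, probabilities, thresh):
    """Return (sensitivity, specificity) for a given decision threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(probabilities) > thresh  # same rule as predict_image

    tp = np.sum(y_pred & y_true)    # pneumonia correctly flagged
    tn = np.sum(~y_pred & ~y_true)  # non-pneumonia correctly cleared
    fp = np.sum(y_pred & ~y_true)   # non-pneumonia wrongly flagged
    fn = np.sum(~y_pred & y_true)   # pneumonia missed

    # Guard against empty classes, e.g. a test set with no pneumonia cases.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Applied to the three valid files here, `evaluate_threshold([0, 0, 0], [0.30717582, 0.40139523, 0.41943836], 0.5226132)` gives a specificity of 1.0 but an undefined (NaN) sensitivity, because none of the three images is a pneumonia case. This is one concrete reason the sample is too small to judge the model.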