<div class='alert alert-info' style='text-align: center'><h1>DICOM Full Range Pixels as CNN Input</h1>
    - yet another chest x-ray processing notebook -
</div>

**This notebook uses the full range of DICOM pixel values as input for a CNN and a Sequential model, rather than exporting 8 bit JPGs.**

- It does not handle MONOCHROME1 (inverted pixel intensities). That would have to be added.
- It resizes the images to a manageable size (256, 256) by default, which is lossy of course.
- It ignores JPEG-compressed Transfer Syntaxes and only reads images in Explicit VR Little Endian (to avoid installing pylibjpeg).
- It only uses a few images and a few epochs to demonstrate the thought process. Clearly, it won't be accurate at all.
- I didn't bother normalizing or otherwise applying any kind of processing to the data.
- I'm sure the pros already do this, but I'm still trying to wrap my brain around it.
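
As a rough idea of what the missing MONOCHROME1 handling and normalization could look like, here is a minimal NumPy sketch. The function names `invert_monochrome1` and `min_max_normalize` are hypothetical, and the sketch assumes the photometric interpretation has already been read from the DICOM header (the `PhotometricInterpretation` attribute) and is passed in as a string.

```python
import numpy as np

def invert_monochrome1(pixels, photometric_interpretation):
    """Flip intensities when the image is stored as MONOCHROME1.

    Hypothetical helper: the caller is assumed to pass the value of the
    DICOM PhotometricInterpretation attribute as a string.
    """
    if photometric_interpretation == "MONOCHROME1":
        # In MONOCHROME1 the lowest value is displayed as white, so flip
        # around the observed range to match MONOCHROME2 images.
        # The order of operations avoids overflow for unsigned integer input.
        return pixels.max() - pixels + pixels.min()
    return pixels

def min_max_normalize(pixels):
    """Scale pixel values to [0, 1] as floats without reducing to 8 bits first."""
    pixels = pixels.astype(np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        # Constant image: avoid dividing by zero
        return np.zeros_like(pixels)
    return (pixels - lo) / (hi - lo)

# Tiny synthetic "image" standing in for img.pixel_array
sample = np.array([[0, 1000], [3000, 4095]], dtype=np.uint16)
flipped = invert_monochrome1(sample, "MONOCHROME1")
scaled = min_max_normalize(flipped)
```

Min-max scaling is only one possible choice here; it keeps the relative spacing of the full-range values while putting every image on the same scale.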

**It seems that adding a VOI LUT, or bit plane slicing on the full range images would produce more usable results for the model.**
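
To make those two ideas a bit more concrete, here is a minimal NumPy sketch of a linear window (the basic idea behind a VOI LUT) and of extracting a single bit plane. The names `apply_linear_window` and `bit_plane` are hypothetical, the window formula is a simplified version rather than the exact DICOM definition, and the center and width values are made up. On real files, pydicom's own `apply_voi_lut` helper is probably the better choice.

```python
import numpy as np

def apply_linear_window(pixels, center, width):
    """Map values inside [center - width/2, center + width/2] to [0, 1].

    Simplified linear window; values outside the window are clipped.
    """
    pixels = pixels.astype(np.float64)
    low = center - width / 2.0
    high = center + width / 2.0
    windowed = (pixels - low) / (high - low)
    return np.clip(windowed, 0.0, 1.0)

def bit_plane(pixels, bit):
    """Return the 0/1 image formed by one bit of each integer pixel."""
    return (pixels >> bit) & 1

# Synthetic 12-bit values standing in for img.pixel_array
sample = np.array([0, 1000, 2000, 3000, 4095], dtype=np.uint16)

windowed = apply_linear_window(sample, center=2000, width=2000)
top_plane = bit_plane(sample, 11)  # most significant bit of a 12-bit image
```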

In [None]:
import warnings
warnings.filterwarnings('ignore')
import os
import cv2
import numpy as np
import pandas as pd
import pydicom
import matplotlib.pyplot as plt
%matplotlib inline
from skimage.transform import resize
from sklearn.model_selection import train_test_split
from skimage.color import gray2rgb
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, Conv2D, MaxPooling2D, GlobalMaxPooling2D, Activation, Flatten
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.applications.inception_v3 import InceptionV3

In [None]:
# Path to the SIIM Covid19 dataset
base_path = "/kaggle/input/siim-covid19-detection/"

# Final image size for model input
img_size = 256

# number of images in each of the two classes to get into the train set. Keep it real small for this demo.
num_images = 100

# Number of epochs
epochs = 10

# Batch size (number of samples per gradient update)
batch_size = 10

In [None]:
# Load the data
studies_df = pd.read_csv(os.path.join(base_path,"train_study_level.csv"))
images_df = pd.read_csv(os.path.join(base_path,"train_image_level.csv"))

In [None]:
images_df.head(10)

In [None]:
# Strip the "_study" and "_image" suffixes from the ids so we can use them as keys to join on later.
# rstrip() removes a set of characters, not a suffix, so it could also eat trailing hex characters like 'a', 'd' or 'e'.
studies_df['id'] = studies_df['id'].str.replace(r'_study$', '', regex=True)
images_df['id'] = images_df['id'].str.replace(r'_image$', '', regex=True)
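
One detail worth keeping in mind when trimming these suffixes: `str.rstrip` treats its argument as a set of characters, not as a suffix, so it can remove more than intended on hex-like ids. A small sketch with a made-up id (not taken from the dataset) shows the difference; `strip_suffix` is a hypothetical helper.

```python
def strip_suffix(value, suffix):
    """Remove suffix from value only if value actually ends with it."""
    if suffix and value.endswith(suffix):
        return value[:-len(suffix)]
    return value

# Made-up id whose last hex characters are also in the character set "_study"
example_id = "00c7427dd_study"

stripped_with_rstrip = example_id.rstrip("_study")        # also strips the trailing 'd's
stripped_with_helper = strip_suffix(example_id, "_study")  # keeps the id intact

print(stripped_with_rstrip)
print(stripped_with_helper)
```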

In [None]:
# Merge the study and image df's on StudyInstanceUID
data_df = pd.merge(images_df, studies_df, how='inner', left_on='StudyInstanceUID', right_on='id')
data_df.drop(['id_y'], axis=1, inplace=True)

In [None]:
data_df.head()

In [None]:
# Check whether any rows are labeled both Atypical and Indeterminate
new_df = data_df[data_df['Atypical Appearance'] == 1]

new_df2 = new_df[new_df['Indeterminate Appearance'] == 1]
new_df2.head()

In [None]:
# Split the merged dataframe into positive and negative dataframes
positive_images_df = data_df[data_df['Negative for Pneumonia'] == 0]
negative_images_df = data_df[data_df['Negative for Pneumonia'] == 1]

## Make some helper functions

In [None]:
# This function finds the first image in a StudyInstanceUID directory and returns its path
def get_image_by_study_id(study_id):
    study_path = base_path + "train/" + study_id + "/"
    for subdir, dirs, files in os.walk(study_path):
        for file in files:     
            image = os.path.join(subdir, file)
            if os.path.isfile(image):
                return image
    return "none"

In [None]:
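# Function to resize pixels.
# Note: with the default preserve_range=False, skimage converts integer input using its
# img_as_float conventions, so the output is rescaled floats rather than the raw DICOM values.
def resize_image(pixels_in):
    return resize(pixels_in, (img_size, img_size), anti_aliasing=True).astype(float)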

## Visualize an image

In [None]:
# Take a look at a sample image
img_file = get_image_by_study_id("00c74279c5b7")
img = pydicom.dcmread(img_file)
pixels = img.pixel_array

print("Pixel range: " + str(np.amin(pixels)) + " - " + str(np.amax(pixels)))
plt.imshow(pixels,cmap="gray");

## Get the pixels

In [None]:
# Iterate through images in the positive set and extract pixels. This takes a while with a large set.
# Get only Explicit VR Little Endian DICOMs and ignore other transfer syntaxes so we don't have to deal with pylibjpeg

X_train_data = []
y_train_data = []

# Iterate through the rows of the 'positive' DF
count = 0
for index, row in positive_images_df.iterrows():
    img_file = get_image_by_study_id(row['StudyInstanceUID'])
    img = pydicom.dcmread(img_file)

    # Get only Explicit VR LE Transfer Syntax
    if img.file_meta.TransferSyntaxUID == "1.2.840.10008.1.2.1":
        pixels = resize_image(img.pixel_array)
        X_train_data.append(pixels)
        y_train_data.append(row['Negative for Pneumonia'])
        # Count after appending so we stop at exactly num_images
        count += 1
        if count >= num_images:
            break

print("Done getting " + str(count) + " positive images")

In [None]:
# Extract pixels from images in the negative set
count = 0
for index, row in negative_images_df.iterrows():
    img_file = get_image_by_study_id(row['StudyInstanceUID'])
    img = pydicom.dcmread(img_file)
    if img.file_meta.TransferSyntaxUID == "1.2.840.10008.1.2.1":
        pixels = resize_image(img.pixel_array)
        X_train_data.append(pixels)
        y_train_data.append(row['Negative for Pneumonia'])
        # Count after appending so we stop at exactly num_images
        count += 1
        if count >= num_images:
            break
print("Done getting " + str(count) + " negative images")
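
Both loops above follow the same pattern: walk the rows, keep those that pass a filter, and stop once enough have been collected. A generic sketch of that pattern, with the hypothetical helper `take_matching` and plain integers standing in for DICOM rows:

```python
from itertools import islice

def take_matching(items, predicate, limit):
    """Return at most `limit` items for which predicate(item) is true."""
    matching = (item for item in items if predicate(item))
    return list(islice(matching, limit))

# Stand-in data: pretend the even numbers are the files with the right transfer syntax
rows = range(20)
picked = take_matching(rows, lambda n: n % 2 == 0, limit=3)
```

Because `islice` stops pulling from the generator once the limit is reached, the remaining rows are never examined, which matters when each check means reading a file from disk.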

In [None]:
# Split the train/test data
X_train, X_test, y_train, y_test = train_test_split(X_train_data, y_train_data, test_size = 0.3, random_state = 82, shuffle=True)
print("X_train len: " + str(len(X_train)))
print("y_train len: " + str(len(y_train)))
print("X_test len: " + str(len(X_test)))
print("y_test len: " + str(len(y_test)))

In [None]:
# Convert the lists to arrays
X_train = np.asarray(X_train)
X_test = np.asarray(X_test)
y_train = np.asarray(y_train)
y_test = np.asarray(y_test)

# Replicate the grayscale channel 3 times, since InceptionV3 expects 3-channel input
X_train = gray2rgb(X_train)
X_test = gray2rgb(X_test)
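
For intuition about what the channel replication does to the array shape, here is a NumPy-only sketch using random data in place of the resized images. It is meant to illustrate the shape change, not to replace `gray2rgb`.

```python
import numpy as np

# Stand-in batch: 4 grayscale images of 8x8 pixels
batch = np.random.default_rng(0).random((4, 8, 8))

# Add a trailing channel axis and repeat it 3 times
batch_rgb = np.repeat(batch[..., np.newaxis], 3, axis=-1)

print(batch.shape)      # (4, 8, 8)
print(batch_rgb.shape)  # (4, 8, 8, 3)
```

All three channels hold identical values, so no information is added; the extra channels only satisfy the input shape that the ImageNet weights expect.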

## Build the models

In [None]:
# Use InceptionV3 for transfer learning and add our own classifier head on top.
pre_trained_model = InceptionV3(input_shape = (img_size, img_size, 3), include_top = False, weights = "imagenet")

# Freeze all of the pre-trained layers
for layer in pre_trained_model.layers:
    layer.trainable = False

# Take the output of the 'mixed7' block rather than the final layer
last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output

# Add our layers
layer = Flatten()(last_output)
layer = Dense(1024, activation='relu')(layer)
layer = Dropout(0.2)(layer)
layer = Dense(1, activation='sigmoid')(layer)

model = Model(pre_trained_model.input, layer)
# 'lr' is a deprecated alias; 'learning_rate' is the current argument name
model.compile(optimizer = RMSprop(learning_rate=0.0001), loss = 'binary_crossentropy', metrics = ['acc'])

# Fit the data
history=model.fit(X_train,y_train,epochs=epochs,verbose=1,validation_data=(X_test,y_test))

In [None]:
# plot training history
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()
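
Besides plotting, the per-epoch lists in `history.history` can be inspected directly. A small sketch that finds the epoch with the lowest validation loss, using a hand-written dictionary with invented numbers in place of a real Keras history; `best_epoch` is a hypothetical helper.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch number, value) for the minimum of history_dict[key]."""
    values = history_dict[key]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Invented numbers for illustration only, not results from this notebook
fake_history = {
    "loss": [0.90, 0.60, 0.40, 0.30],
    "val_loss": [0.95, 0.70, 0.75, 0.80],
}
epoch, value = best_epoch(fake_history)
print("Lowest val_loss at epoch " + str(epoch) + ": " + str(value))
```

A training loss that keeps falling while the validation loss bottoms out early, as in the invented numbers above, is the usual sign of overfitting on a small train set.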

## Build a simple Sequential model with a couple layers

In [None]:
model2 = Sequential()
model2.add(Conv2D(32, (3, 3), activation="relu", input_shape = X_train.shape[1:]))
model2.add(BatchNormalization())
model2.add(MaxPooling2D((2, 2)))
model2.add(Dropout(0.2))

model2.add(Conv2D(64, (3, 3), activation="relu"))
model2.add(BatchNormalization())
model2.add(MaxPooling2D((2, 2)))
model2.add(Dropout(0.3))

model2.add(GlobalMaxPooling2D())
model2.add(Dense(256, activation="relu"))
model2.add(Dropout(0.5))
model2.add(Dense(1, activation="sigmoid"))

opt = Adam(learning_rate=0.001)
model2.compile(loss='binary_crossentropy',optimizer=opt,metrics=['accuracy'])
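# Note: with patience equal to the number of epochs, early stopping will not actually trigger in this demo
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0.001, patience=10)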

history = model2.fit(X_train,y_train,batch_size=batch_size,epochs=epochs,verbose=1,validation_data=(X_test, y_test),callbacks=[early_stopping])

In [None]:
# plot training history
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()

#### Here are some other processing notebooks I made:
- Lung Segmentation Without CNN -> https://www.kaggle.com/davidbroberts/lung-segmentation-without-cnn
- Applying filters to x-rays -> https://www.kaggle.com/davidbroberts/applying-filters-to-chest-x-rays
- Rib suppression on Chest X-Rays -> https://www.kaggle.com/davidbroberts/rib-suppression-poc
- Manual DICOM VOI LUT -> https://www.kaggle.com/davidbroberts/manual-dicom-voi-lut
- Apply Unsharp Mask to Chest X-Rays -> https://www.kaggle.com/davidbroberts/unsharp-masking-chest-x-rays
- Cropping Chest X-Rays -> https://www.kaggle.com/davidbroberts/cropping-chest-x-rays
- Bounding Boxes on Cropped Images -> https://www.kaggle.com/davidbroberts/bounding-boxes-on-cropped-images
- Visualizing Chest X-Ray bit planes -> https://www.kaggle.com/davidbroberts/visualizing-chest-x-ray-bitplanes
- Standardizing Chest X-Ray Dataset Exports -> https://www.kaggle.com/davidbroberts/standardizing-cxr-datasets