

Introduction
------------


The challenge is that each patient has a different number of CT scan slices, whereas a convolutional neural network needs inputs of the same depth, even though every slice is 512 x 512 pixels. The advantage we have is that the slices are ordered sequentially along the axial direction, so we can average the pixel values across slices and merge them, irrespective of the number of slices, into one image. This gives a single image per patient, which we can feed to a 2D convolutional neural network. We can also resize the images, since 512 x 512 may be larger than the network needs.

I am not an expert Python developer, but I have gathered parts of the code from various places. 
So I would like to thank:

 1. Guido Zuidhof (https://www.kaggle.com/gzuidhof/data-science-bowl-2017/full-preprocessing-tutorial) here on Kaggle Kernels, for reading the DICOM files.
 2. CnrL (http://stackoverflow.com/questions/17291455/how-to-get-an-average-picture-from-100-pictures-using-pil) for averaging the pixels.

The intuition is that averaging the pixel values may bring out a pattern that helps distinguish benign from malignant tumors.
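The averaging idea described above can be sketched on synthetic data. This is only a minimal illustration, not the code used later in this notebook: `average_slices` and `out_size` are hypothetical names, the slices are random numbers rather than real scans, and the resize is a simple block average (chosen here to keep the example NumPy-only).

```python
import numpy as np

def average_slices(volume, out_size=128):
    """Collapse a (num_slices, H, W) volume into one (out_size, out_size) image."""
    volume = np.asarray(volume, dtype=np.float64)
    # Average over the slice axis, so the slice count no longer matters
    mean_image = volume.mean(axis=0)
    height, width = mean_image.shape
    if height % out_size or width % out_size:
        raise ValueError("out_size must divide the slice height and width")
    # Downsample by averaging non-overlapping blocks of pixels
    fh, fw = height // out_size, width // out_size
    blocks = mean_image.reshape(out_size, fh, out_size, fw)
    return blocks.mean(axis=(1, 3))

# Two synthetic "patients" with different numbers of slices (small for speed)
rng = np.random.default_rng(0)
patient_a = rng.integers(-1000, 400, size=(12, 64, 64))
patient_b = rng.integers(-1000, 400, size=(9, 64, 64))

# Both end up with the same 2D shape, whatever the slice count
print(average_slices(patient_a, out_size=32).shape)
print(average_slices(patient_b, out_size=32).shape)
```

With real 512 x 512 slices, `out_size=128` would average 4 x 4 blocks of pixels.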

In [None]:
%matplotlib inline

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import dicom # pydicom before 1.0; newer releases are imported as pydicom
import os
import scipy.ndimage
import matplotlib.pyplot as plt
from PIL import Image

from skimage import measure, morphology
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

# Some constants 
INPUT_FOLDER = '../input/sample_images/'
patients = os.listdir(INPUT_FOLDER)
patients.sort()

In [None]:
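# Load the scans in the given folder path, sorted by axial (z) position
def load_scan(path):
    slices = [dicom.read_file(os.path.join(path, s)) for s in os.listdir(path)]
    # Sort on the float z coordinate; int() would truncate it and could
    # mis-order slices that are less than 1 mm apart
    slices.sort(key=lambda x: float(x.ImagePositionPatient[2]))
    try:
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except Exception:
        # Fall back to SliceLocation if ImagePositionPatient cannot be used
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)

    # Store the computed spacing on every slice
    for s in slices:
        s.SliceThickness = slice_thickness
    return slices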
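Slice order depends on the sort key. The sketch below uses stand-in objects (not real DICOM datasets, and with made-up z positions) to show how an integer key and a float key behave when slices are less than 1 mm apart.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for DICOM slices; only the z coordinate matters here
fake_slices = [
    SimpleNamespace(ImagePositionPatient=[0.0, 0.0, z])
    for z in (-10.2, -10.8, -9.6, -10.5)
]

# int() truncates toward zero, so -10.2, -10.8 and -10.5 all get the key -10
by_int = sorted(fake_slices, key=lambda s: int(s.ImagePositionPatient[2]))
# float() keeps the full position, so the slices come out in true axial order
by_float = sorted(fake_slices, key=lambda s: float(s.ImagePositionPatient[2]))

z_int = [s.ImagePositionPatient[2] for s in by_int]
z_float = [s.ImagePositionPatient[2] for s in by_float]
print(z_int)    # [-10.2, -10.8, -10.5, -9.6]
print(z_float)  # [-10.8, -10.5, -10.2, -9.6]
```

Because Python's sort is stable, slices that share an integer key simply keep the order in which `os.listdir` returned them, which is not guaranteed to be the axial order.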

In [None]:
def get_pixels_hu(slices):
    image = np.stack([s.pixel_array for s in slices])
    # Convert to int16 (from sometimes uint16), 
    # should be possible as values should always be low enough (<32k)
    image = image.astype(np.int16)

    # Set outside-of-scan pixels to 0
    # The intercept is usually -1024, so air is approximately 0
    image[image == -2000] = 0
    
    # Convert to Hounsfield units (HU)
    for slice_number in range(len(slices)):
        
        intercept = slices[slice_number].RescaleIntercept
        slope = slices[slice_number].RescaleSlope
        
        if slope != 1:
            image[slice_number] = slope * image[slice_number].astype(np.float64)
            image[slice_number] = image[slice_number].astype(np.int16)
            
        image[slice_number] += np.int16(intercept)
    
    return np.array(image, dtype=np.int16)
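As a small worked example of the conversion above, the Hounsfield value is `slope * raw + intercept`. The raw values below are made up, and the slope of 1 and intercept of -1024 are assumed typical values rather than values read from a real scan.

```python
import numpy as np

# Made-up raw pixel values; -2000 marks pixels outside the scanned area
raw = np.array([[-2000, 0], [1024, 2048]], dtype=np.int16)
slope, intercept = 1, -1024  # assumed typical rescale values

# Outside-of-scan pixels are set to 0, the same raw value as air
raw[raw == -2000] = 0

# HU = slope * raw + intercept
hu = (slope * raw.astype(np.float64) + intercept).astype(np.int16)
print(hu)
# [[-1024 -1024]
#  [    0  1024]]
```

On the Hounsfield scale air is about -1000 HU and water is 0 HU, so the padded pixels end up at roughly the value of air.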

In [None]:
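# If the folder lists 21 entries, drop the first one (presumably a stray
# non-patient entry such as a hidden file)
if len(patients) == 21:
    del patients[0]
print(len(patients))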

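for patient in patients:
    path = os.path.join(INPUT_FOLDER, patient)
    patient_slices = load_scan(path)
    stacked_slices = get_pixels_hu(patient_slices)

    N = len(stacked_slices)
    print("number of slices: ", N)
    # Accumulate in float so the weighted sum is neither truncated nor wrapped
    arr = np.zeros(stacked_slices.shape[1:], np.float64)
    for count, im in enumerate(stacked_slices, start=3):
        im = im.astype(np.float64)
        smallest = np.amin(im)
        biggest = np.amax(im)
        if biggest == smallest:
            continue  # constant slice: nothing to add, and avoids dividing by zero

        # Invert each slice, scale it by its own value range, and weight it
        # by log(count), so later slices contribute more
        arr += (1 - im) * np.log(count) / (biggest - smallest)

    # Rescale to 0..255 once, after the loop; casting to uint8 inside the
    # loop would round and wrap the running sum on every iteration
    arr -= arr.min()
    if arr.max() > 0:
        arr *= 255.0 / arr.max()
    arr = np.round(arr).astype(np.uint8)

    imName = patient + ".jpeg"
    print(imName)
    plt.imshow(arr, cmap=plt.cm.gray)
    plt.show()
    # To save the image instead of only displaying it:
    # Image.fromarray(arr, mode='L').save(imName)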
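When an accumulated float image is converted to 8-bit for display or saving, one option is to rescale it to the 0..255 range in a single step at the end. The helper below is a minimal sketch of that; `to_uint8` is a hypothetical name and the input is synthetic.

```python
import numpy as np

def to_uint8(image):
    """Min-max rescale a float image to 0..255 and return it as uint8."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image has no contrast; return all zeros
        return np.zeros(image.shape, dtype=np.uint8)
    scaled = (image - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

# Synthetic 4 x 4 image spanning a Hounsfield-like range
demo = np.linspace(-1000.0, 400.0, 16).reshape(4, 4)
img8 = to_uint8(demo)
print(img8.dtype, img8.min(), img8.max())

# The result can then be handed to PIL for saving, for example:
# Image.fromarray(img8).save("demo.jpeg")
```

Rescaling once keeps the full float precision during accumulation and maps the smallest value to 0 and the largest to 255.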