# Pre-processing

DICOM stands for Digital Imaging and Communications in Medicine. It is the standard for communicating and managing medical imaging information and related data.

**PyDicom** is a Python package for inspecting and modifying DICOM files. A modified dataset can be written back out to a new file.
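
As a rough sketch of that read, modify and write cycle, assuming pydicom is installed and the file carries a `PatientID` element: the helper name `copy_with_new_patient_id` and its parameters are hypothetical and only serve as illustration, while `dcmread` and `save_as` are the pydicom calls.

```python
def copy_with_new_patient_id(src_path, dst_path, new_id):
    """Read a DICOM file, replace its PatientID and save the result to dst_path."""
    # Hypothetical helper for illustration only.
    # Imported here so the helper can be defined even before pydicom is installed.
    import pydicom

    ds = pydicom.dcmread(src_path)   # load the dataset (header and pixel data)
    print('Original PatientID:', ds.PatientID)
    ds.PatientID = new_id            # modify a header element by its keyword
    ds.save_as(dst_path)             # write the modified dataset to a new file
```

The original file is left untouched because the modified dataset is saved under a different path.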

# Visualizing images using pydicom

In [None]:
import pydicom as dicom
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
dir_img_path='C:/Users/Kevin.Diaz.nscorp/Documents/GitHub/AI-stuff/Pneumonia_Challenge/rsna-pneumonia-detection-challenge/stage_2_train_images/'
image_path=dir_img_path+'970bb2ce-6966-47ed-82d0-923d8d0b8617'+'.dcm'

ds = dicom.dcmread(image_path)

# Chest X-rays are single-channel, so display them with a grayscale colormap
plt.imshow(ds.pixel_array, cmap='gray')

plt.show()

# Converting DCM file to JPG or PNG

The following code is reused from [Vivek Kumar's post](https://medium.com/@vivek8981/dicom-to-jpg-and-extract-all-patients-information-using-python-5e6dd1f1a07d), where he explains more about PyDicom and some of its other uses. For this exercise we only use the conversion part for our image analysis.

In [None]:
# Install (if needed) and import these libraries
import pydicom as dicom
import os
import cv2

In [None]:
# Set to True if PNG output is needed; otherwise JPG is written
PNG = False

In [None]:
# Specify the .dcm folder path
# This was tested on a Windows 7 machine
# Point this at either the test or the train folder; the images to convert are read from here.
folder_path = 'C:/Users/Kevin.Diaz.nscorp/Documents/GitHub/AI-stuff/Pneumonia_Challenge/rsna-pneumonia-detection-challenge/stage_2_test_images/'

In [None]:
# Specify the output jpg/png folder path
# Converted files are saved here as .jpg or .png
jpg_folder_path = 'C:/Users/Kevin.Diaz.nscorp/Documents/GitHub/AI-stuff/Pneumonia_Challenge/rsna-pneumonia-detection-challenge/test_jpg/'
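
Neither `cv2.imwrite` nor Python's `open` creates a missing folder, so the output folder has to exist before the conversion runs. A minimal sketch of one way to guarantee that, where the helper name `ensure_folder` is hypothetical and not part of the original notebook:

```python
import os


def ensure_folder(path):
    """Create path (and any missing parent folders) if needed; return path unchanged."""
    # exist_ok=True makes the call safe to repeat when the folder is already there
    os.makedirs(path, exist_ok=True)
    return path
```

It could be applied as `jpg_folder_path = ensure_folder(jpg_folder_path)` before the conversion loop.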

In [None]:
# List the files in the folder
images_path = os.listdir(folder_path)

In [None]:
# Iterate through every image and convert it using pydicom and cv2
for n, image in enumerate(images_path):
    ds = dicom.dcmread(os.path.join(folder_path, image))
    pixel_array_numpy = ds.pixel_array  # pixel data as a NumPy array
    # Swap the .dcm extension for the requested output format
    if PNG:
        image = image.replace('.dcm', '.png')
    else:
        image = image.replace('.dcm', '.jpg')
    cv2.imwrite(os.path.join(jpg_folder_path, image), pixel_array_numpy)
    if n % 50 == 0:  # Report progress every 50 images; change the interval as needed.
        print('{} images converted'.format(n))
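
The loop builds the output name with `str.replace`, which rewrites every occurrence of `.dcm` in the file name, not just the extension. As a small sketch of an alternative (the helper name `output_name` is hypothetical), `os.path.splitext` changes only the final extension:

```python
import os


def output_name(dicom_name, png=False):
    """Return dicom_name with its final extension swapped for .png or .jpg."""
    stem, _ = os.path.splitext(dicom_name)  # split off only the last extension
    return stem + ('.png' if png else '.jpg')


print(output_name('970bb2ce-6966-47ed-82d0-923d8d0b8617.dcm'))
print(output_name('scan.dcm.backup.dcm', png=True))
```

For the RSNA file names used here both approaches give the same result; the difference only shows up when `.dcm` appears elsewhere in a name.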

# New function to convert DCM to PNG

In [1]:
import os

import numpy as np
import png  # provided by the PyPNG package
import pydicom

In [2]:
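def dicom2png(source_folder, output_folder):
    """Convert every DICOM file in source_folder to an 8-bit greyscale PNG in output_folder."""
    list_of_files = os.listdir(source_folder)
    for file in list_of_files:
        try:
            ds = pydicom.dcmread(os.path.join(source_folder, file))
            shape = ds.pixel_array.shape

            # Convert to float to avoid overflow or underflow losses
            image_2d = ds.pixel_array.astype(float)

            # Rescale the grey scale to 0-255 (negative values are clipped to 0)
            max_value = image_2d.max()
            if max_value <= 0:
                # No positive pixels: avoid dividing by zero and write an all-black image
                image_2d_scaled = np.zeros_like(image_2d)
            else:
                image_2d_scaled = (np.maximum(image_2d, 0) / max_value) * 255.0

            # Convert to 8-bit unsigned integers
            image_2d_scaled = np.uint8(image_2d_scaled)

            # Write the PNG file; the original name is kept, so 'x.dcm' becomes 'x.dcm.png'
            with open(os.path.join(output_folder, file) + '.png', 'wb') as png_file:
                w = png.Writer(shape[1], shape[0], greyscale=True)
                w.write(png_file, image_2d_scaled)

        except Exception as error:
            # Report the failure and continue with the next file
            print('Could not convert:', file, '-', error)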
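
To make the rescaling step in `dicom2png` concrete, here is a plain-Python sketch of the same arithmetic on a tiny made-up image: clip negatives to 0, divide by the maximum, multiply by 255 and truncate to an integer, as the conversion to `uint8` does for these values. The function name `rescale_to_8bit` and the sample values are hypothetical; the sketch also guards against an image with no positive pixels.

```python
def rescale_to_8bit(rows):
    """Clip negatives to 0, scale by the maximum to 0-255 and truncate to int."""
    peak = max(max(row) for row in rows)
    if peak <= 0:
        # Nothing to scale: return an all-black image of the same shape
        return [[0 for _ in row] for row in rows]
    return [[int(max(value, 0) / peak * 255.0) for value in row] for row in rows]


sample = [[-10, 0, 50], [100, 150, 200]]  # made-up pixel values
print(rescale_to_8bit(sample))  # prints [[0, 0, 63], [127, 191, 255]]
```

The brightest pixel always maps to 255, so images with different intensity ranges end up on a common 8-bit scale, at the cost of losing their absolute values.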

In [3]:
source_folder='C:/Users/Kevin.Diaz.nscorp/Documents/GitHub/AI-stuff/Pneumonia_Challenge/rsna-pneumonia-detection-challenge/stage_2_test_images/'
output_folder='C:/Users/Kevin.Diaz.nscorp/Documents/GitHub/AI-stuff/Pneumonia_Challenge/rsna-pneumonia-detection-challenge/test_png/'

dicom2png(source_folder, output_folder)