### Reading the Attributes from DICOM Files

In the VinBigData Chest X-ray Abnormalities Detection competition, the chest X-ray images are provided as DICOM files. The train.csv file does not contain all the patient data that might be important for localizing and detecting the abnormalities.

This notebook provides code for reading some of the important patient attributes from the DICOM headers.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
#for dirname, _, filenames in os.walk('/kaggle/input'):
#    for filename in filenames:
#        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import pydicom
import warnings
warnings.filterwarnings("ignore")

In [None]:
#
# The path to the dataset
#
DataDir = "../input/vinbigdata-chest-xray-abnormalities-detection/"
!ls {DataDir}

In [None]:
#
# Reading the train.csv data
#
train = pd.read_csv(DataDir+'train.csv')
train.head()

In [None]:
#
# Finding the keywords for accessing the data elements in a dicom file.
#
dcm_file = pydicom.dcmread(DataDir+ 'train/'+train['image_id'][2]+'.dicom')
dcm_file.dir()

In [None]:
#
# Accessing the image data as a NumPy array
#
dcm_pixels = dcm_file.pixel_array
dcm_pixels

In [None]:
#
# Displaying the image from the pixel array
#
plt.figure(figsize=(12,10))
plt.imshow(dcm_pixels, cmap=plt.cm.gray)
plt.show()
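
Some X-ray DICOM files declare `PhotometricInterpretation` as `MONOCHROME1`, where the lowest pixel value is meant to be shown as white, so plotting the raw array with a gray colormap can look inverted. The helper below is a minimal sketch of one way to handle this; `to_display_range` is a hypothetical name, and the simple min-max scaling is an assumption rather than the only reasonable choice.

```python
import numpy as np

def to_display_range(pixels, photometric="MONOCHROME2"):
    """Scale a pixel array to [0, 1]; invert it when the file is MONOCHROME1."""
    img = pixels.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi > lo:
        img = (img - lo) / (hi - lo)
    else:
        # Constant image: avoid dividing by zero.
        img = np.zeros_like(img)
    if photometric == "MONOCHROME1":
        # In MONOCHROME1 the minimum value is displayed as white.
        img = 1.0 - img
    return img

# Tiny made-up array standing in for dcm_file.pixel_array.
demo = np.array([[0, 100], [200, 400]], dtype=np.uint16)
scaled = to_display_range(demo, "MONOCHROME1")
print(scaled)
```

With a real file this could be called as `to_display_range(dcm_file.pixel_array, dcm_file.PhotometricInterpretation)`, assuming that element is present in the header.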

In [None]:
#
# Here is the function for reading the patients' attributes 
# from the dicom images.
# 
def get_dcm_attributes(path):

    columns = ['image_id', 'Age', 'Gender', 'Image_Height',
               'ImageWidth', 'x_spacing', 'y_spacing']
    # Read only the first 10 files for testing
    files = list(os.listdir(path))[0:10]
    # Read all files
    #files = list(os.listdir(path))

    records = []
    for file in files:

        file_path = os.path.join(path, file)
        # Only the header is needed, so skip the pixel data
        dcmData = pydicom.dcmread(file_path, stop_before_pixels=True)

        file_name = file.split(".")[0]

        attributes = dcmData.dir()
        if 'PatientAge' in attributes:
            # Age strings look like '045Y'; drop the unit and parse the number
            age_str = dcmData.PatientAge
            try:
                age = int(age_str[:-1])
            except (TypeError, ValueError):
                # Empty or malformed values such as '' or 'Y'
                age = np.nan
        else:
            age = np.nan
        if 'PatientSex' in attributes:
            gender = dcmData.PatientSex
            if gender == '':
                gender = np.nan
        else:
            gender = np.nan
        if 'Rows' in attributes:
            rows = dcmData.Rows
        else:
            rows = np.nan
        if 'Columns' in attributes:
            clmns = dcmData.Columns
        else:
            clmns = np.nan
        if 'PixelSpacing' in attributes:
            ps = dcmData.PixelSpacing
        else:
            ps = [np.nan, np.nan]

        # PixelSpacing is [row spacing, column spacing], i.e. [y, x]
        records.append({'image_id': file_name,
                        'Age': age, 'Gender': gender, 'Image_Height': rows,
                        'ImageWidth': clmns,
                        'x_spacing': ps[1], 'y_spacing': ps[0]})

    # Build the DataFrame once from the collected rows
    return pd.DataFrame(records, columns=columns)
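
The age handling above drops the final character of `PatientAge` and treats the rest as a number of years. The DICOM Age String format also allows days (`D`), weeks (`W`), and months (`M`) as units, so a stricter parser may be useful if such values occur. The following is a sketch only: `parse_dicom_age` is a hypothetical helper, and it is deliberately lenient about the number of digits.

```python
import math
import re

# DICOM Age String (AS): digits followed by D, W, M or Y, e.g. '045Y'.
# The standard uses three digits; this pattern also accepts one or two.
_AGE_PATTERN = re.compile(r"^(\d{1,3})([DWMY])$")
_UNITS_PER_YEAR = {"D": 365.25, "W": 365.25 / 7, "M": 12.0, "Y": 1.0}

def parse_dicom_age(age_str):
    """Return the age in years, or NaN if the string is not a valid age."""
    match = _AGE_PATTERN.match(str(age_str).strip().upper())
    if match is None:
        return math.nan
    value, unit = match.groups()
    return int(value) / _UNITS_PER_YEAR[unit]

print(parse_dicom_age("045Y"))
print(parse_dicom_age("006M"))
print(parse_dicom_age("Y"))
```

Returning NaN for unparseable values keeps the column numeric, so the missing-value counts later in the notebook still work.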

In [None]:
#
# Reading some image attributes (this takes several minutes for the whole dataset)
#
TrainDir = DataDir+'train/'
dcm_attr = get_dcm_attributes(TrainDir)
dcm_attr.head(10)

In [None]:
dcm_attr.isna().sum()

In [None]:
#
# Now join this info with the data in train.csv
#
train_mrg = pd.merge(train, dcm_attr, on = 'image_id')

In [None]:
train_mrg.head(20)
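
Note that `pd.merge` performs an inner join by default, so training rows whose images were not read (for example, when only the first 10 files are processed) are dropped from the result. If every training row should be kept, a left join is one option. The sketch below uses small made-up frames in place of `train` and `dcm_attr`.

```python
import pandas as pd

# Toy stand-ins for `train` and `dcm_attr`; the values are made up.
train_demo = pd.DataFrame({
    "image_id": ["a", "a", "b", "c"],
    "class_name": ["Cardiomegaly", "Aortic enlargement",
                   "No finding", "No finding"],
})
attr_demo = pd.DataFrame({
    "image_id": ["a", "b"],
    "Age": [61, 47],
})

# how="left" keeps all training rows; validate checks that each image_id
# appears at most once in the attribute table; indicator adds a `_merge`
# column showing where each row came from.
merged_demo = pd.merge(train_demo, attr_demo, on="image_id", how="left",
                       validate="many_to_one", indicator=True)

# Rows without attributes are marked 'left_only' and have NaN in Age.
missing = (merged_demo["_merge"] == "left_only").sum()
print(merged_demo)
print(missing)
```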
