# DICOM Data: Get Patient Demographics

**Purpose:** This notebook is a starter for gathering patient demographics from the DICOM metadata, one image at a time.

**Description:** I am using this notebook to practice code documentation and Python coding best practices. It is also a starting point for working with DICOM data, which is new to me. Any criticism is appreciated.

_Currently, I only see three demographic features that I would use alongside the pixel data for this project._

[Link to the overview of the "VinBigData Chest X-ray Abnormalities Detection" Competition](http://www.kaggle.com/c/vinbigdata-chest-xray-abnormalities-detection)

## Import Packages

In [None]:
import pandas as pd
import numpy as np
import pydicom
import typing
import warnings
warnings.filterwarnings("ignore")

## Get Data

Use the **path** variable to read the training data CSV file into the notebook as a pandas dataframe.

In [None]:
# filepath to chest x-ray dicom data
path = '/kaggle/input/vinbigdata-chest-xray-abnormalities-detection/'

train_data = pd.read_csv(path+'train.csv')

In [None]:
# list of demographic features
demo_features = ['PatientAge','PatientSex','PatientSize']

# arbitrarily picked test image from training data (row 6)
test_image_id = train_data.image_id[6]

## Custom Functions

+ withdraw_element
+ get_demographics

In [None]:
def withdraw_element(x):
    """
    Converts a raw DICOM data element into a DataElement; an element that is already a DataElement is returned unchanged.
    
    Parameters:
    -----------
        x (DataElement or RawDataElement): a dicom data element
    
    Returns:
    --------
        x (DataElement)
    """
    if isinstance(x, pydicom.dataelem.RawDataElement):
        return pydicom.dataelem.DataElement_from_raw(x)
    return x

def get_demographics(image_id: str, demo_features: list) -> pd.DataFrame:
    """
    Gets the demographic feature values from the dicom data associated with an image_id and returns a dataframe with values labeled by columns.
    
    Parameters:
    -----------
        image_id (str): an alphanumeric string corresponding to an image's ID
        
        demo_features (list): a list of strings corresponding to demographic features the user wants from the dicom data
        
    Returns:
    --------
        result_df (pandas dataframe): a dataframe of the demographic features and their values for the image_id
    """
    file_path = path + 'train/' + image_id + '.dicom'
    # the pixel array is not needed for demographics, so stop reading before it
    meta_data = pydicom.dcmread(file_path, stop_before_pixels=True)
    
    demo_values = [image_id]
    for demo in demo_features:
        try:
            demo_values.append(withdraw_element(meta_data[demo]).value)
        except KeyError:
            # not all DICOM files contain every demographic feature
            demo_values.append(np.nan)
    # one row per image, with plain (non-MultiIndex) column labels
    result_df = pd.DataFrame([demo_values], columns=['image_id'] + demo_features)
    
    return result_df

## Get Patient Demographics from Dicom Data

Pass an image_id to `get_demographics` to retrieve the three demographic features seen in the data so far: PatientAge, PatientSex, and PatientSize. If more are found, add them to the **demo_features** list.

In [None]:
# get_demographics function result on single test image

test_results = get_demographics(test_image_id, demo_features)
print(test_results)
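
The values returned above are raw DICOM strings. In the DICOM standard, an age string is a three-digit count followed by a unit letter (`D`, `W`, `M`, or `Y`), for example `045Y`. The sketch below is one possible way to turn such a string into a number of years; the helper name `age_in_years` and the choice to return `None` for anything malformed are assumptions made for illustration, not part of this notebook or of pydicom.

```python
import re
from typing import Optional

# Count followed by a unit letter, e.g. "045Y" or "006M".
_AGE_PATTERN = re.compile(r"^(\d{1,3})([DWMY])$")

# How many of each unit fit in one year (approximate for days and weeks).
_UNITS_PER_YEAR = {"D": 365.25, "W": 365.25 / 7, "M": 12, "Y": 1}


def age_in_years(age_string) -> Optional[float]:
    """Hypothetical helper: convert a DICOM age string to years.

    Returns None when the input is missing or does not match the
    expected count-plus-unit pattern.
    """
    if not isinstance(age_string, str):
        return None
    match = _AGE_PATTERN.match(age_string.strip().upper())
    if match is None:
        return None
    count, unit = int(match.group(1)), match.group(2)
    return count / _UNITS_PER_YEAR[unit]


print(age_in_years("045Y"))  # 45.0
print(age_in_years("006M"))  # 0.5
print(age_in_years("Y"))     # None, because the count is missing
```

Returning `None` keeps malformed entries distinguishable from a real age of zero, which may matter if the column is later converted to a numeric dtype.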

In [None]:
%%time
# get_demographics function result on dataframe

test_df = train_data[0:100]

pd.concat([get_demographics(x, demo_features) for x in test_df.image_id], ignore_index=True) # stack the one-row dataframes into a single dataframe with a fresh 0..n-1 index
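
If `train.csv` holds more than one row per image (for example, one row per annotation), the loop above would read the same DICOM file several times. A minimal sketch of one alternative is shown below: read each unique image_id once, collect the values as dictionaries, and build a single dataframe at the end. The names `collect_rows` and `fake_reader` are hypothetical, and `fake_reader` is only a stand-in for the DICOM lookup so that the block runs without any DICOM files.

```python
import pandas as pd


def collect_rows(image_ids, read_one):
    """Hypothetical helper: call read_one(image_id) once per unique id.

    read_one must return a dict mapping feature names to values.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order
    unique_ids = list(dict.fromkeys(image_ids))
    rows = [{"image_id": image_id, **read_one(image_id)} for image_id in unique_ids]
    # building the dataframe once avoids concatenating many one-row frames
    return pd.DataFrame(rows)


def fake_reader(image_id):
    """Stand-in for a DICOM lookup; the values are made up for illustration."""
    return {"PatientAge": "045Y", "PatientSex": "F"}


demo_df = collect_rows(["a", "b", "a"], fake_reader)
print(demo_df)
```

In this notebook, `read_one` could wrap the same `pydicom.dcmread` lookup that `get_demographics` performs.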

Now, you can merge this demographic info with the pixel data using the unique image_id.
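
As a rough sketch of that merge, the block below joins a small demographics table onto a table keyed by image_id. Both frames and their values are made-up stand-ins, and the `class_id` column is only illustrative; the point is the `merge` call on the shared `image_id` column.

```python
import pandas as pd

# Illustrative stand-ins: several rows per image on the left,
# exactly one demographics row per image on the right.
annotations = pd.DataFrame({"image_id": ["a", "a", "b"], "class_id": [0, 3, 14]})
demographics = pd.DataFrame({"image_id": ["a", "b"], "PatientSex": ["F", "M"]})

merged = annotations.merge(
    demographics,
    on="image_id",
    how="left",              # keep every row of the left table
    validate="many_to_one",  # raise if demographics has duplicate image_ids
)
print(merged)
```

A left join keeps images whose DICOM file lacked demographic fields; their demographic columns are simply filled with missing values.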

### **Please upvote this notebook if you found it useful!**

## References

* [Extracting raw elements from Dicom data with pydicom package](http://stackoverflow.com/questions/56601525/how-to-store-the-header-data-of-a-dicom-file-in-a-pandas-dataframe)
* [drcapa's notebook for viewing the pixel data](http://www.kaggle.com/drcapa/chest-x-ray-starter)