**Getting Started with DICOM using fastai**

I haven't spent a lot of time using fastai with DICOM files. Since I'm entering this competition, I wanted to build a starter notebook that puts my fastai knowledge to work, teaches me more about the dataset, and helps me understand the whole workflow better.

This will be one of a few notebooks. I'll link the bounding box version here soon, but first I wanted to take a look at the data and follow the [fastai medical imaging tutorial](https://docs.fast.ai/tutorial.medical_imaging.html).

In [None]:
# Upgrade to the latest fastai release (the /content check means this only runs on Colab)
! [ -e /content ] && pip install -Uqq fastai

Import what we will need. Take special note of `fastai.medical.imaging`: it gives us some great tools for working with DICOM files.

In [None]:
from fastai.basics import *
from fastai.callback.all import *
from fastai.vision.all import *
from fastai.medical.imaging import *

import pydicom,kornia,skimage
from pydicom.dataset import Dataset as DcmDataset
from pydicom.tag import BaseTag as DcmTag
from pydicom.multival import MultiValue as DcmMultiValue
from PIL import Image

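try:
    # OpenCV is optional here; disabling its internal threading is a common way
    # to avoid oversubscribing CPU cores when DataLoader workers run in parallel
    import cv2
    cv2.setNumThreads(0)
except ImportError: pass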

Set the paths to the competition data and to the folder of training DICOM files.

In [None]:
path = Path('../input/vinbigdata-chest-xray-abnormalities-detection')
train_imgs = path/'train'

A quick look to make sure we are on track before we get too far. 

In [None]:
fname = train_imgs/'000434271f63a053c4128a0ba6352c7f.dicom'
dcm = fname.dcmread()
dcm.show(scale=False)

Collect the paths of all the DICOM files in the training folder. Nothing is read from disk yet; we only get a list of file paths.

In [None]:
items = get_dicom_files(train_imgs)
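`get_dicom_files` only lists file paths. As a rough sketch of the idea in plain `pathlib` (our own hypothetical `list_dicom_files` helper, not fastai's code, assuming DICOM files are recognised by a `.dcm` or `.dicom` suffix and searched recursively):

```python
from pathlib import Path

# Assumed set of suffixes that mark a file as DICOM
DICOM_EXTENSIONS = {".dcm", ".dicom"}

def list_dicom_files(root):
    """Hypothetical helper: recursively list files whose suffix looks like DICOM."""
    root = Path(root)
    # rglob('*') walks every subdirectory; keep regular files with a DICOM suffix
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.suffix.lower() in DICOM_EXTENSIONS)
```

Calling `list_dicom_files(train_imgs)` should give the same set of paths as `items`, just as a plain sorted list instead of a fastai `L`.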

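We can now split the items into training and validation sets. `RandomSplitter` holds out 20% for validation by default and returns two lists of *indices* into `items`, not the file paths themselves.

In [None]:
train,val = RandomSplitter()(items)
train_files,val_files = items[train],items[val]  # look up the actual paths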
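The split works on positions rather than on files. A minimal sketch of that idea in plain Python (a hypothetical `random_split_indices` helper, assuming the same 20% validation fraction that fastai uses by default): shuffle the indices, then cut off the first slice as validation.

```python
import random

def random_split_indices(n, valid_pct=0.2, seed=None):
    """Hypothetical helper: shuffle range(n) and split it into (train, valid) index lists."""
    idxs = list(range(n))
    random.Random(seed).shuffle(idxs)  # local RNG, so the global random state is untouched
    cut = int(valid_pct * n)           # number of validation items
    return idxs[cut:], idxs[:cut]      # (train indices, validation indices)

train_idx, valid_idx = random_split_indices(10, valid_pct=0.2, seed=42)
```

Passing a `seed` makes the split reproducible between runs, which helps when comparing models on the same validation set.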

Pydicom is a Python package for parsing DICOM files. It makes it easy to access the DICOM header and to convert the raw `PixelData` into Python structures that are easier to manipulate. `fastai.medical.imaging` uses `pydicom.dcmread` to load DICOM files.

To plot an X-ray, we can select an entry in the items list and load the DICOM file with dcmread.

In [None]:
patient = 3
xray_sample = items[patient].dcmread()

Now we can view the header metadata stored in the DICOM file.

In [None]:
xray_sample

There is a lot of information here. The good news is that the DICOM standard documents every attribute; the pixel-related ones are described in its Image Pixel Module section:

http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.3.html#sect_C.7.6.3.1.4
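To connect the header to that section, here is a small sketch listing a few Image Pixel Module attributes with their standard (group, element) tags. The helper names are our own, not pydicom's. It also shows how a tag packs into a single integer (pydicom's `BaseTag`, imported above as `DcmTag`, is an `int` built the same way) and how many bytes uncompressed `PixelData` should occupy, assuming single-frame data with `BitsAllocated` a multiple of 8.

```python
# (group, element) tags for some Image Pixel Module attributes
IMAGE_PIXEL_TAGS = {
    "SamplesPerPixel": (0x0028, 0x0002),
    "PhotometricInterpretation": (0x0028, 0x0004),
    "Rows": (0x0028, 0x0010),
    "Columns": (0x0028, 0x0011),
    "BitsAllocated": (0x0028, 0x0100),
    "BitsStored": (0x0028, 0x0101),
    "HighBit": (0x0028, 0x0102),
    "PixelRepresentation": (0x0028, 0x0103),
    "PixelData": (0x7FE0, 0x0010),
}

def tag_to_int(group, element):
    """Pack a (group, element) pair into one 32-bit integer, e.g. (0x7FE0, 0x0010) -> 0x7FE00010."""
    return (group << 16) | element

def format_tag(group, element):
    """Format a tag the way the standard writes it, e.g. '(7FE0,0010)'."""
    return f"({group:04X},{element:04X})"

def expected_pixel_bytes(rows, columns, samples_per_pixel=1, bits_allocated=16):
    """Byte length of uncompressed, single-frame PixelData (before padding to an even length)."""
    return rows * columns * samples_per_pixel * bits_allocated // 8
```

For a loaded file, `expected_pixel_bytes(xray_sample.Rows, xray_sample.Columns, xray_sample.SamplesPerPixel, xray_sample.BitsAllocated)` can be compared with `len(xray_sample.PixelData)`; a much smaller `PixelData` suggests the pixel data is stored compressed.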

One element you will notice is `PixelData`, which holds the raw pixel bytes. We can look at it, although in its raw form it isn't very useful.

In [None]:
xray_sample.PixelData[:200]

Because `PixelData` is hard to interpret directly, pydicom provides a convenient alternative: `pixel_array`, which returns a `numpy.ndarray` containing the decoded pixel data:

In [None]:
xray_sample.pixel_array, xray_sample.pixel_array.shape
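To see roughly what `pixel_array` does in the simplest case, here is a sketch that decodes uncompressed, little-endian, single-sample `PixelData` bytes with NumPy. The `decode_uncompressed` helper is our own simplification, not pydicom's implementation: it ignores compressed transfer syntaxes (pydicom relies on extra decoder plugins for those), multi-frame data and `BitsStored` masking.

```python
import numpy as np

def decode_uncompressed(raw, rows, cols, bits_allocated=16, pixel_representation=0):
    """Hypothetical helper: interpret raw little-endian pixel bytes as a rows x cols array."""
    # PixelRepresentation 0 means unsigned integers, 1 means two's-complement signed
    kind = "i" if pixel_representation == 1 else "u"
    dtype = np.dtype(f"<{kind}{bits_allocated // 8}")
    # count= ignores any trailing padding byte; frombuffer returns a read-only view,
    # so call .copy() on the result before modifying it
    return np.frombuffer(raw, dtype=dtype, count=rows * cols).reshape(rows, cols)
```

On a file stored with an uncompressed little-endian transfer syntax, `decode_uncompressed(xray_sample.PixelData, xray_sample.Rows, xray_sample.Columns, xray_sample.BitsAllocated, xray_sample.PixelRepresentation)` should match `xray_sample.pixel_array`; for compressed files it will not.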

We can display this image as well.

In [None]:
xray_sample.show()

Remember all the metadata? It can be pulled into a dataframe, with one row per DICOM file.

Thanks to [Ben](https://www.kaggle.com/beezus666/chest-x-ray-with-fastai) for finding a solution to the dataframe creation hanging!
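Conceptually, building that dataframe means turning each file's header into one row keyed by attribute keyword, leaving out the bulky `PixelData`. Here is a minimal pure-Python sketch of the idea with a hypothetical `header_rows` helper that takes plain dicts; fastai's `from_dicoms` works on real pydicom datasets and, as far as we can tell, also records each file's name.

```python
def header_rows(headers, skip=("PixelData",)):
    """Hypothetical helper: turn per-file header dicts into (columns, rows) for a table.

    Columns are the union of all keywords in first-seen order; an attribute that a
    file does not have becomes None, so every row has the same length.
    """
    columns = list(dict.fromkeys(k for h in headers for k in h if k not in skip))
    rows = [[h.get(c) for c in columns] for h in headers]
    return columns, rows
```

With pandas, `pd.DataFrame(rows, columns=columns)` would then give the table. Filling missing attributes with `None` matters because not every file in a dataset necessarily carries the same set of header elements.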

In [None]:
%%time
# Takes 7-8 minutes, so we save the result to a pickle and reload it from there afterwards
dicom_dataframe = pd.DataFrame.from_dicoms(items, window=dicom_windows.lungs, px_summ=False)

dicom_dataframe.to_pickle('./dicom_dataframe_pickle.pkl')
dicom_dataframe.shape

In [None]:
dicom_dataframe = pd.read_pickle('./dicom_dataframe_pickle.pkl')
dicom_dataframe.shape # should be about 15k rows (one per training file) by 29 columns
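The two cells above follow a compute-once, cache-to-disk pattern. A generic sketch of it with the standard `pickle` module (the `cached` helper is hypothetical; pandas' `to_pickle` and `read_pickle` do the same job for dataframes):

```python
import pickle
from pathlib import Path

def cached(path, compute):
    """Hypothetical helper: load a pickled result if it exists, else compute it and save it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()  # the slow step, e.g. building the metadata dataframe
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code. If the inputs change, delete the cache file so the result is rebuilt.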

In [None]:
dicom_dataframe.head()
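The `from_dicoms` call passes `window=dicom_windows.lungs`, a width/level pair. Windowing clips pixel values to `[level - width/2, level + width/2]` and rescales that range to `[0, 1]`. Below is a NumPy sketch with our own `apply_window` helper, assuming a lung window of width 1500 and level -600 (which we believe is what `dicom_windows.lungs` holds). Two caveats: these presets were designed for CT Hounsfield units, so they may not map meaningfully onto raw X-ray pixel values, and as far as we can tell fastai only uses `window` for its pixel summary columns, which `px_summ=False` turns off.

```python
import numpy as np

def apply_window(pixels, width, level):
    """Clip pixels to the window [level - width/2, level + width/2] and rescale it to [0, 1]."""
    lo, hi = level - width / 2, level + width / 2
    px = np.clip(np.asarray(pixels, dtype=np.float32), lo, hi)
    return (px - lo) / (hi - lo)

# Assumed lung window: width 1500, level -600 (so the window spans -1350 to 150)
windowed = apply_window(np.array([-2000, -1350, -600, 150, 500]), width=1500, level=-600)
```

Everything below the window maps to 0 and everything above it to 1, so a narrow window spends the full display range on the intensities of interest.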