This is the 2nd notebook in a series inspired by (and borrowing heavily from) [Jeremy Howard's notebooks](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/114214) from the 2019 RSNA Intracranial Hemorrhage Challenge.

The first notebook in the series can be found [here](https://www.kaggle.com/wfwiggins203/extracting-dicom-metadata-labels-with-fast-ai).

Here we'll dig into the DICOM metadata and view some images with `fastai == 2.0.x`. I'll try to inject some of my domain knowledge as a radiologist as we go. _DISCLAIMER: I subspecialize in neuroradiology, NOT thoracic radiology. That said, we all get training in general radiology, and detecting PEs is an important skill for any radiologist reading imaging studies of anatomic regions that overlap with the chest (like the neck and spine, in my case)._

Again, we'll have to start by upgrading the `fastai` library.

In [None]:
!pip install fastai --upgrade >/dev/null

In [None]:
from fastai.medical.imaging import *
from fastai.basics import *

In [None]:
path_inp = Path('../input')
path = path_inp/'rsna-str-pulmonary-embolism-detection'
path_trn = path/'train'
path_tst = path/'test'

In [None]:
path_df = path_inp/'extracting-dicom-metadata-labels-with-fast-ai'
df_lbls = pd.read_feather(path_df/'lbls.fth')
df_trn = pd.read_feather(path_df/'df_trn.fth')

# Exploring the DICOM metadata

The `fastai.medical.imaging` library has some nice tools for dealing with DICOM files, built on top of the `pydicom` library. The label for each field in the following table corresponds to a _DICOM tag_ - with the exception of `fname`, which is added by the `fastai` function that created this DataFrame.

In [None]:
df_trn.columns

In [None]:
np.random.seed(42)
df_trn.sample(20).T

# Key findings in the DICOM metadata

1. Notice the variation in slice thickness. The vast majority of images are reconstructed with `SliceThickness == 1.25` (in mm); however, as you explore the data, you'll notice that some are reconstructed with a `SliceThickness` of 1.00, 1.50, or 2.00.
2. `InstanceNumber` is an important attribute for arranging the DICOM images of a series in the appropriate order.

These findings will have important implications if you're considering reconstructing a volumetric dataset for a 3D CNN architecture.
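As a rough sketch of how these two findings come into play when assembling a volume, the snippet below uses a small made-up DataFrame in place of `df_trn`. The column names `SeriesInstanceUID`, `InstanceNumber` and `SliceThickness` are assumed to match the DICOM keywords in the real metadata table, the values are invented, and the helper `order_slices` is hypothetical.

```python
import pandas as pd

# Toy stand-in for df_trn; all values are invented for illustration.
df_toy = pd.DataFrame({
    'SeriesInstanceUID': ['s1', 's1', 's1', 's2', 's2'],
    'InstanceNumber':    [3, 1, 2, 2, 1],
    'SliceThickness':    [1.25, 1.25, 1.25, 2.0, 2.0],
    'fname':             ['c.dcm', 'a.dcm', 'b.dcm', 'e.dcm', 'd.dcm'],
})

def order_slices(df):
    "Map each series to its file names, sorted by InstanceNumber."
    df = df.sort_values(['SeriesInstanceUID', 'InstanceNumber'])
    return {uid: grp['fname'].tolist()
            for uid, grp in df.groupby('SeriesInstanceUID')}

# File names per series, in stacking order
ordered = order_slices(df_toy)

# Slice thickness per series, needed to resample volumes to a common spacing
thickness = df_toy.groupby('SeriesInstanceUID')['SliceThickness'].first().to_dict()

print(ordered)
print(thickness)
```

Taking the first `SliceThickness` per series assumes the value is constant within a series, which is worth checking on the real data before relying on it.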

# Visualizing DICOM images

The image data in a DICOM file is stored in the `PixelData` attribute. The `.show()` method from `fastai` will automatically extract the pixel data and rescale it for display.

In [None]:
np.random.seed(42)
fns = L(df_trn.sample(12)['fname'].values.tolist())
dcms = fns.map(dcmread)

When I tried to plot some of these images, I got an error saying that the `GDCM` package was missing, so we'll install it here.

In [None]:
!conda install gdcm -c conda-forge -y >/dev/null

In [None]:
import gdcm

In [None]:
for dcm in dcms:
    try:
        dcm.show()
    except Exception as e:
        # Some files still fail to decode; report the error and keep going
        print(f'GDCM error: {e}')


Looks like installing GDCM didn't exactly fix the error, but at least we can look at most of the images.

A few comments on the above images:
* `fastai` automatically rescales the pixels for viewing
* This may not always result in optimal viewing for PE detection: if the intravenous contrast (the white material in the branching pulmonary arterial tree) is too bright, it may obscure underlying filling defects (i.e., PEs)
* However, there are other ways of _windowing_ the pixels, including with the `WindowCenter`, `WindowWidth` and `RescaleIntercept` DICOM attributes, which may result in better viewing
* This is not the ideal viewing size for this task; however, it appears that the 10th image in this series has filling defects (PEs) in the right and left lower lobes
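To make the idea of windowing concrete, here is a minimal NumPy sketch that works on a toy array, not on the actual DICOM files. Stored values are first converted to Hounsfield units with the standard linear rescale (stored value × `RescaleSlope` + `RescaleIntercept`), then clipped to a window. The function name `apply_window` is hypothetical, and the center of 100 and width of 700 are illustrative assumptions, not values taken from this dataset.

```python
import numpy as np

def apply_window(raw, slope, intercept, center, width):
    "Convert stored values to Hounsfield units, clip to a window, scale to [0, 1]."
    hu = raw.astype(np.float32) * slope + intercept
    lo, hi = center - width / 2, center + width / 2
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

# Toy stored values; with slope 1 and intercept -1024 they map to
# -1024 HU (air), 100 HU (window center) and 1976 HU (very dense)
raw = np.array([0, 1124, 3000])

# Assumed window: center 100 HU, width 700 HU, i.e. the range [-250, 450]
windowed = apply_window(raw, slope=1.0, intercept=-1024.0, center=100.0, width=700.0)
print(windowed)
```

Everything below the window floor becomes black and everything above the ceiling becomes white, so a wider window keeps very bright contrast from saturating and hiding darker filling defects inside the vessel.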

Let's take a look at the labels for these images to see if image #10 is labeled as positive for PE.

In [None]:
sops = dcms.map(lambda x: x['SOPInstanceUID'].value)
df_lbls.set_index('SOPInstanceUID').loc[sops]

Suspicion confirmed! That's all for now... I'll try to come back later and add more to this series.