# RSNA: the worst cases
Most images in this dataset are normal.  
When they are not, usually only one hemorrhage type is labeled.  
But sometimes there are several hemorrhage types in one image, or even all five of them.  
I have counted those cases, and looked at them.  

In [None]:
from glob import glob
import os
import pandas as pd
import numpy as np
import re
from PIL import Image
import seaborn as sns
import pydicom
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tqdm import tqdm_notebook as tqdm
import cv2

# checking the input files
# print(os.listdir("../input/rsna-intracranial-hemorrhage-detection/"))

# Load data

In [None]:
# Load datasets
train_images_dir = '../input/rsna-intracranial-hemorrhage-detection/stage_1_train_images/'
test_images_dir = '../input/rsna-intracranial-hemorrhage-detection/stage_1_test_images/'

train = pd.read_csv('../input/rsna-intracranial-hemorrhage-detection/stage_1_train.csv')
test = pd.read_csv('../input/rsna-intracranial-hemorrhage-detection/stage_1_sample_submission.csv')

# Transform training set. Code from https://www.kaggle.com/taindow/pytorch-efficientnet-b0-benchmark
train[['ID', 'Image', 'Diagnosis']] = train['ID'].str.split('_', expand=True)
train = train[['Image', 'Diagnosis', 'Label']]
train.drop_duplicates(inplace=True)
train = train.pivot(index='Image', columns='Diagnosis', values='Label').reset_index()
train['Image'] = 'ID_' + train['Image']
train.head(10)
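
To make the reshaping step above easier to follow, here is a minimal sketch on a tiny synthetic table. The image ids `aaa` and `bbb` are made up for illustration; only the `ID_<image>_<diagnosis>` layout is taken from the cell above.

```python
import pandas as pd

# Synthetic rows in the same "ID_<image>_<diagnosis>" layout as the training CSV
# (the image ids are hypothetical, not real dataset entries).
raw = pd.DataFrame({
    'ID': ['ID_aaa_any', 'ID_aaa_epidural', 'ID_bbb_any', 'ID_bbb_epidural'],
    'Label': [1, 1, 0, 0],
})

# Split "ID_aaa_any" into the constant prefix, the image id and the diagnosis.
parts = raw['ID'].str.split('_', expand=True)
long_df = pd.DataFrame({
    'Image': 'ID_' + parts[1],
    'Diagnosis': parts[2],
    'Label': raw['Label'],
})

# One row per image, one column per diagnosis.
wide_df = (long_df.drop_duplicates()
                  .pivot(index='Image', columns='Diagnosis', values='Label')
                  .reset_index())
print(wide_df)
```

Each image ends up as a single row, so label columns such as `any` can be filtered directly.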

# Look at some normal cases
Here are multiple images where `any == 0`.

In [None]:
def _get_first_of_dicom_field_as_int(x):
    # Return int(x[0]) if x is a 'pydicom.multival.MultiValue', otherwise int(x)
    if isinstance(x, pydicom.multival.MultiValue):
        return int(x[0])
    return int(x)

def _get_windowing(data):
    dicom_fields = [data[('0028','1050')].value, #window center
                    data[('0028','1051')].value, #window width
                    data[('0028','1052')].value, #intercept
                    data[('0028','1053')].value] #slope
    return [_get_first_of_dicom_field_as_int(x) for x in dicom_fields]

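def get_image(data, windowing=None):
    window_center, window_width, intercept, slope = windowing or _get_windowing(data)
    # Cast to float first so that a negative intercept cannot overflow unsigned pixel data
    img = data.pixel_array.astype(np.float32)
    # Rescale the stored values to Hounsfield units
    img = img * slope + intercept
    # Clip everything outside the window [center - width/2, center + width/2]
    img_min = window_center - window_width // 2
    img_max = window_center + window_width // 2
    return np.clip(img, img_min, img_max)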
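
Here is a small sketch of what the windowing above does, on a synthetic 2x2 array instead of a real DICOM file. The function name `apply_window` and the pixel values are my own illustration, and it uses `width / 2` rather than the integer division in the cell above.

```python
import numpy as np

def apply_window(raw_pixels, center, width, intercept=0.0, slope=1.0):
    """Rescale raw values to Hounsfield units, then clip them to the window."""
    hu = raw_pixels.astype(np.float32) * slope + intercept
    low, high = center - width / 2, center + width / 2
    return np.clip(hu, low, high)

# Hypothetical raw values; with intercept -1024 they map to -1024, -24, 40 and 1976 HU.
raw_pixels = np.array([[0, 1000], [1064, 3000]], dtype=np.uint16)

# Brain window: center 40, width 80, so everything is clipped to [0, 80] HU.
windowed = apply_window(raw_pixels, center=40, width=80, intercept=-1024, slope=1)
print(windowed)
```

Everything darker than the lower bound collapses to one value and everything brighter than the upper bound to another, which is why a narrow window can hide bone detail.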

In [None]:
fig, ax = plt.subplots(nrows=4, ncols=4, figsize=(30,30))

# First 16 images without any hemorrhage label
normal_ids = train.loc[train['any'] == 0, 'Image'].head(16)

for i, idx in enumerate(normal_ids):
    data = pydicom.dcmread(train_images_dir + idx + '.dcm')
    img = get_image(data)
    ax[i//4, i%4].set_title(idx)
    ax[i//4, i%4].imshow(img, cmap=plt.cm.bone)

# Look at the bad cases
Let's count images in which there are multiple hemorrhages detected.

In [None]:
subtypes = ['epidural', 'intraparenchymal', 'intraventricular', 'subarachnoid', 'subdural']

for n in range(6):
    # After the loop, `many` holds the n == 5 subset: images labeled with all five types
    many = train[train[subtypes].sum(1) == n].copy()
    print('Number of hemorrhage types: {}, number of such images: {}, share: {:.3f}%'.format(n, len(many), 100 * len(many) / len(train)))

Most images (86%) show no hemorrhage.  
10% of images have exactly one type of hemorrhage.  
**20 samples** have all the hemorrhage types at the same time:

In [None]:
many
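
The same tally can also be computed in one pass. This is only a sketch on a made-up four-row label table, not the real training set:

```python
import pandas as pd

subtypes = ['epidural', 'intraparenchymal', 'intraventricular', 'subarachnoid', 'subdural']

# Synthetic labels for four hypothetical images.
toy = pd.DataFrame(
    [[0, 0, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [1, 1, 1, 1, 1],
     [0, 0, 0, 0, 0]],
    columns=subtypes,
)

# Number of hemorrhage types per image, then how often each number occurs.
n_types = toy[subtypes].sum(axis=1)
counts = n_types.value_counts().sort_index()
print(counts)

# Images labeled with every subtype at once.
all_types = toy[n_types == len(subtypes)]
print(all_types)
```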

Let's look closer at these samples: extract the `Patient ID` and see whether they belong to the same patient.

In [None]:
# Read only the header (skip the pixel data) and take the Patient ID tag (0010,0020)
many['Patient ID'] = many['Image'].map(lambda image: pydicom.dcmread(train_images_dir + image + '.dcm', stop_before_pixels=True)[('0010', '0020')].value)
many

In [None]:
many['Patient ID'].unique()

Those 20 images belong to 8 unique patients.  
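
To see how the images are spread over patients, a `groupby` is enough. A minimal sketch with made-up image and patient ids:

```python
import pandas as pd

# Hypothetical ids, only to illustrate the grouping.
toy = pd.DataFrame({
    'Image': ['ID_a', 'ID_b', 'ID_c', 'ID_d'],
    'Patient ID': ['ID_p1', 'ID_p1', 'ID_p2', 'ID_p1'],
})

n_patients = toy['Patient ID'].nunique()
images_per_patient = (toy.groupby('Patient ID')['Image']
                         .count()
                         .sort_values(ascending=False))
print(n_patients)
print(images_per_patient)
```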

## Print bad scans

Let's look at each of them.

In [None]:
log = []

for i, patient in enumerate(many['Patient ID'].unique()):
    print('Patient №{}: {}'.format(i + 1, patient))
    ids = many[many['Patient ID'] == patient]['Image']
    fig, ax = plt.subplots(nrows=1, ncols=len(ids), figsize=(30,10))
    for j, idx in enumerate(ids):
        data = pydicom.dcmread(train_images_dir + idx + '.dcm')
        log.append(('Patient №{}: {}'.format(i + 1, patient), idx, data))
        img = get_image(data)
        if len(ids) == 1:
            ax.set_title(idx, fontsize=40)
            ax.imshow(img, cmap=plt.cm.bone)
        else:
            ax[j].set_title(idx, fontsize=30)
            ax[j].imshow(img, cmap=plt.cm.bone)
    plt.show()

I am not a doctor, so I am not exactly sure what I am looking at.  
For patients №2, 4, 5, 7 and 8 I suppose there are some serious skull injuries, so it is no surprise that all types of hemorrhage were detected.  

I am not sure about the other ones, especially №3.  
№3 looks a bit asymmetric, but maybe it's just the viewing angle.  
I cannot see any obvious defects there.   
Maybe the windowing is not correct?  

# Windowing
Let's try drawing this image with different windows.  
I took the windows from the kernel https://www.kaggle.com/dcstang/see-like-a-radiologist-with-systematic-windowing

In [None]:
# Get id of image of patient №3
idx = many[many['Patient ID'] == many['Patient ID'].unique()[2]]['Image'].iloc[0]
data = pydicom.dcmread(train_images_dir + idx + '.dcm')
c, w, intercept, slope = _get_windowing(data)

known_windows = [('Default window', c, w),
           ('Brain Matter window', 40, 80),
           ('Blood/subdural window', 80, 200),
           ('Soft tissue window', 40, 375),
           ('Bone window', 600, 2800),
           ('Grey-white differentiation window', 32, 8)]
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(30,20))

for i, (window_name, window_center, window_width) in enumerate(known_windows):
    img = get_image(data, [window_center, window_width, intercept, slope])
    ax[i//3, i%3].set_title(window_name, fontsize=40)
    ax[i//3, i%3].imshow(img, cmap=plt.cm.bone)

Now I can see, using the bone window, that there is probably a crack in the top-left corner.  
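
For reference, each (center, width) pair corresponds to a range of Hounsfield units that stays visible. A small sketch that prints those ranges for the fixed windows, using the same `width // 2` rule as `get_image`; the helper name `window_bounds` is my own:

```python
# (name, center, width) for the fixed windows used in this notebook.
fixed_windows = [('Brain Matter window', 40, 80),
                 ('Blood/subdural window', 80, 200),
                 ('Soft tissue window', 40, 375),
                 ('Bone window', 600, 2800),
                 ('Grey-white differentiation window', 32, 8)]

def window_bounds(center, width):
    """Return the (lowest, highest) value that survives clipping."""
    return center - width // 2, center + width // 2

bounds = {name: window_bounds(c, w) for name, c, w in fixed_windows}
for name, (low, high) in bounds.items():
    print('{}: {} to {} HU'.format(name, low, high))
```

The bone window keeps values far above the brain window's upper limit, which would explain why the crack only becomes visible there.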

Let's look at all those 20 images with all windows that I know:

In [None]:
for patient, idx, data in log:
    print(patient, 'Image', idx)
    c, w, intercept, slope = _get_windowing(data)

    fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(30,20))

    for i, (window_name, window_center, window_width) in enumerate(known_windows):
        img = get_image(data, [window_center, window_width, intercept, slope])
        ax[i//3, i%3].set_title(window_name, fontsize=40)
        ax[i//3, i%3].imshow(img, cmap=plt.cm.bone)
    plt.show()

It was not clear what was going on in image №1. Now, through the bone window, I can clearly see what is wrong.  
All the patients in this set, except №6, have clearly visible cracks.  

I am not sure what exactly is wrong with patient №6.  
But, thanks to https://www.kaggle.com/dcstang/see-like-a-radiologist-with-systematic-windowing, I can see with my own eyes that there is not only something big outside the skull on the left, but also a big "bad" spot inside the brain in the `Soft tissue window`. I couldn't see this using the `Default window`.  

# Conclusion
Those pictures look interesting but also scary...

Now I know that:
* There are actually a lot of samples with 2 or more types of hemorrhage
* I probably want to use multiple windows to feed multiple inputs to a neural network. If I can't see anything in the default-windowed image, why should the model? :)
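
One possible way to do the multi-window idea is to stack several windowed copies of a slice as channels. This is only a sketch under my own assumptions: the choice of three windows (brain, blood/subdural, bone), the scaling to [0, 1] and the helper names are illustrative, and the input is a synthetic array already in Hounsfield units.

```python
import numpy as np

def window_to_unit_range(hu, center, width):
    """Clip to the window and scale the result linearly to [0, 1]."""
    low, high = center - width / 2, center + width / 2
    return (np.clip(hu, low, high) - low) / (high - low)

def stack_windows(hu, windows):
    """Return an array of shape (H, W, len(windows)), one channel per window."""
    return np.stack([window_to_unit_range(hu, c, w) for c, w in windows], axis=-1)

# Synthetic 2x2 "slice" in Hounsfield units.
hu = np.array([[-1000.0, 0.0], [40.0, 2000.0]], dtype=np.float32)

# (center, width): brain, blood/subdural and bone windows from above.
windows = [(40, 80), (80, 200), (600, 2800)]
multi = stack_windows(hu, windows)
print(multi.shape)
```

The result has the same layout as an RGB image, so it could be fed to a network that expects three input channels.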

### I wish health to everyone, and good luck with this challenge!
If you have something to add, especially from a medical point of view, please leave a comment :)